ltx2

Modules

fastvideo.pipelines.preprocess.ltx2.ltx2_preprocess_pipelines

LTX-2 preprocessing pipeline for native FastVideo training data generation.

This module defines the LTX-2 preprocess pipeline used by FastVideo workflows to build precomputed training artifacts from raw text/video datasets.

Usage:
- Entry is through preprocess workflows that register PreprocessPipelineT2V.
- Input samples should provide prompt text plus video metadata/loader fields consumed by TextTransformStage and VideoTransformStage.
- Output artifacts are written by the shared preprocessing workflow into .precomputed/ (latents, conditions, and optional audio_latents).

Optional audio path:
- When audio preprocessing is enabled, this pipeline loads the native LTX-2 audio encoder and stores per-sample audio latents in batch.extra["ltx2_audio_latents"].
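As a sketch of how a downstream consumer might read the optional audio latents — the Batch class below is a hypothetical stand-in for the pipeline's batch object; only the batch.extra key name comes from this module:

```python
# Hypothetical stand-in for the pipeline batch; in the real pipeline,
# `batch.extra` is a dict populated by LTX2AudioEncodingStage when audio
# preprocessing is enabled.
class Batch:
    def __init__(self) -> None:
        self.extra: dict[str, object] = {}

batch = Batch()
batch.extra["ltx2_audio_latents"] = [0.1, 0.2]  # stand-in for a latent tensor

# Since the audio path is optional, consumers should guard on the key.
audio_latents = batch.extra.get("ltx2_audio_latents")
print(audio_latents is not None)
```

Consumers that do not enable audio simply see the key absent and skip the audio branch.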

Classes

fastvideo.pipelines.preprocess.ltx2.ltx2_preprocess_pipelines.LTX2AudioEncodingStage
LTX2AudioEncodingStage(audio_encoder: Module, audio_processor: AudioProcessor, fallback_fps: int)

Bases: PipelineStage

Extract audio from input videos and encode it into LTX-2 audio latents.

Source code in fastvideo/pipelines/preprocess/ltx2/ltx2_preprocess_pipelines.py
def __init__(
    self,
    audio_encoder: torch.nn.Module,
    audio_processor: AudioProcessor,
    fallback_fps: int,
) -> None:
    self.audio_encoder = audio_encoder.eval()
    self.audio_processor = audio_processor
    self.fallback_fps = fallback_fps
    self.audio_dtype = next(audio_encoder.parameters()).dtype
    self.audio_device = next(audio_encoder.parameters()).device
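A minimal sketch of the construction pattern above: the stage puts the encoder in eval mode once and caches the dtype and device of its first parameter, instead of querying them per batch. The stub classes are hypothetical stand-ins for a torch module and its parameters:

```python
class StubParam:
    # stand-ins for the torch.dtype / torch.device attributes on a parameter
    dtype = "float16"
    device = "cuda:0"

class StubEncoder:
    def parameters(self):
        yield StubParam()

    def eval(self):
        # torch modules return self from eval(), which enables the chained
        # `audio_encoder.eval()` call in __init__
        return self

audio_encoder = StubEncoder().eval()
audio_dtype = next(audio_encoder.parameters()).dtype
audio_device = next(audio_encoder.parameters()).device
print(audio_dtype, audio_device)
```

Caching at construction is safe because the stage is stateless with respect to batches and the encoder is not moved after loading.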

fastvideo.pipelines.preprocess.ltx2.ltx2_preprocess_pipelines.LTX2TextPrecomputeStage
LTX2TextPrecomputeStage(text_encoder: Module, tokenizer: Any, preprocess_text_fn, tokenizer_kwargs: dict[str, Any], padding_side: str)

Bases: PipelineStage

Compute pre-connector Gemma embeddings for LTX-2 training.

Source code in fastvideo/pipelines/preprocess/ltx2/ltx2_preprocess_pipelines.py
def __init__(
    self,
    text_encoder: torch.nn.Module,
    tokenizer: Any,
    preprocess_text_fn,
    tokenizer_kwargs: dict[str, Any],
    padding_side: str,
) -> None:
    self.text_encoder = text_encoder
    self.tokenizer = tokenizer
    self.preprocess_text_fn = preprocess_text_fn
    self.tokenizer_kwargs = tokenizer_kwargs
    self.padding_side = padding_side
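To illustrate why padding_side matters for this stage, here is a toy whitespace tokenizer — purely hypothetical; the real stage wraps the Gemma tokenizer and its tokenizer_kwargs — showing left- versus right-padding to a fixed length:

```python
def toy_tokenize(texts: list[str], padding_side: str = "right",
                 max_length: int = 8) -> list[list[int]]:
    # Toy whitespace tokenizer; token IDs here are arbitrary (word lengths),
    # 0 is the pad ID.
    out = []
    for text in texts:
        ids = [len(word) for word in text.split()][:max_length]
        pad = [0] * (max_length - len(ids))
        out.append(pad + ids if padding_side == "left" else ids + pad)
    return out

left = toy_tokenize(["a short prompt"], padding_side="left")
right = toy_tokenize(["a short prompt"], padding_side="right")
print(left[0], right[0])
```

Left padding keeps the real tokens adjacent to the end of the sequence, which matters for decoder-style encoders that attend to the final positions.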

fastvideo.pipelines.preprocess.ltx2.ltx2_preprocess_pipelines.PreprocessPipelineT2V
PreprocessPipelineT2V(model_path: str, fastvideo_args: FastVideoArgs | TrainingArgs, required_config_modules: list[str] | None = None, loaded_modules: dict[str, Module] | None = None)

Bases: ComposedPipelineBase

Native LTX-2 preprocessing pipeline (text/video with optional audio).

Source code in fastvideo/pipelines/composed_pipeline_base.py
def __init__(self,
             model_path: str,
             fastvideo_args: FastVideoArgs | TrainingArgs,
             required_config_modules: list[str] | None = None,
             loaded_modules: dict[str, torch.nn.Module] | None = None):
    """
    Initialize the pipeline. After __init__, the pipeline should be ready to
    use. The pipeline should be stateless and not hold any batch state.
    """
    self.fastvideo_args = fastvideo_args

    self.model_path: str = model_path
    self._stages: list[PipelineStage] = []
    self._stage_name_mapping: dict[str, PipelineStage] = {}

    if required_config_modules is not None:
        self._required_config_modules = required_config_modules

    if self._required_config_modules is None:
        raise NotImplementedError("Subclass must set _required_config_modules")

    maybe_init_distributed_environment_and_model_parallel(fastvideo_args.tp_size, fastvideo_args.sp_size)

    # Torch profiler. Enabled and configured through env vars:
    # FASTVIDEO_TORCH_PROFILER_DIR=/path/to/save/trace
    trace_dir = envs.FASTVIDEO_TORCH_PROFILER_DIR
    self.profiler_controller = get_or_create_profiler(trace_dir)
    self.profiler = self.profiler_controller.profiler

    self.local_rank = get_world_group().local_rank

    # Load modules directly in initialization
    logger.info("Loading pipeline modules...")
    with self.profiler_controller.region("profiler_region_model_loading"):
        self.modules = self.load_modules(fastvideo_args, loaded_modules)

Functions