ltx2
¶
Modules¶
fastvideo.pipelines.basic.ltx2.continuation
¶
Typed continuation state for the LTX-2 streaming pipeline.
Segment N+1 conditions on segment N's trailing decoded frames and
denoised audio latents. The streaming runtime used to hold this state as
per-worker globals; lifting it into a typed, JSON-serializable object
lets clients snapshot, migrate, or round-trip it through an HTTP/RPC
boundary. The envelope ContinuationState(kind, payload) is the
shared public API; the typed class here owns the LTX-2 payload shape.
Serialization contract:
- Video frames → PNG bytes + base64, or a BlobStore id.
- Audio latents → a self-describing safetensors blob + base64, or a BlobStore id. safetensors preserves bfloat16, which a raw-numpy round-trip cannot.
- The returned payload is always a plain JSON-serializable dict.
Attributes¶
fastvideo.pipelines.basic.ltx2.continuation.DEFAULT_INLINE_THRESHOLD_BYTES
module-attribute
¶
Tensors larger than this go to the blob store (if available). 2 MiB is below typical single-JSON-message limits (Dynamo: 4 MiB, Postgres TOAST: 1 GiB) and well above per-frame PNG payloads (~200 KiB at 512x512).
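The headroom claim can be checked with a little arithmetic: base64 turns every 3 raw bytes into 4 ASCII characters, so a tensor sitting right at the threshold grows by about a third once it is inlined. The sketch below assumes the constant equals 2 MiB as described above; the names INLINE_THRESHOLD_BYTES and base64_encoded_size are illustrative and not part of the module.

```python
import base64

# Assumed value: the text above describes the threshold as 2 MiB.
INLINE_THRESHOLD_BYTES = 2 * 1024 * 1024


def base64_encoded_size(raw_size: int) -> int:
    """Length in characters of the padded base64 encoding of raw_size bytes."""
    return 4 * ((raw_size + 2) // 3)


# Cross-check the formula against the real encoder on a small buffer.
assert base64_encoded_size(1000) == len(base64.b64encode(bytes(1000)))

encoded = base64_encoded_size(INLINE_THRESHOLD_BYTES)
print(f"{encoded} base64 chars, about {encoded / 2**20:.2f} MiB")
```

Under that assumption a threshold-sized tensor encodes to roughly 2.67 MiB, which still fits under a 4 MiB message limit with room for the rest of the payload.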
fastvideo.pipelines.basic.ltx2.continuation.LTX2_CONTINUATION_KIND
module-attribute
¶
Public ContinuationState.kind for LTX-2 payloads.
fastvideo.pipelines.basic.ltx2.continuation.LTX2_CONTINUATION_SCHEMA_VERSION
module-attribute
¶
Payload schema version carried inside payload.schema_version.
Classes¶
fastvideo.pipelines.basic.ltx2.continuation.LTX2ContinuationState
dataclass
¶
LTX2ContinuationState(segment_index: int = 0, video_frames: list[ndarray] | None = None, video_frames_blob_id: str | None = None, video_conditioning_frame_idx: int = 0, video_conditioning_strength: float = 1.0, audio_latents: Tensor | None = None, audio_latents_blob_id: str | None = None, audio_sample_rate: int | None = None, audio_conditioning_num_frames: int = 0, audio_conditioning_strength: float = 1.0, video_position_offset_sec: float = 0.0, metadata: dict[str, Any] = dict())
Typed LTX-2 continuation state carried between streaming segments.
video_frames holds trailing decoded RGB frames (uint8 HxWx3) from
segment N for conditioning segment N+1 via the VAE encode path.
audio_latents is the cached denoised audio latent tensor of shape
[B, C, T, mel] that segment N+1 will copy into the overlap
region of its clean-latent conditioning.
Most fields map 1:1 onto the internal gpu_pool's per-worker state;
the only new concept is the *_blob_id fields, which allow large
tensors to live outside the JSON payload. See module docstring.
Attributes¶
fastvideo.pipelines.basic.ltx2.continuation.LTX2ContinuationState.audio_conditioning_num_frames
class-attribute
instance-attribute
¶audio_conditioning_num_frames: int = 0
Number of trailing audio frames that carry over as clean context into segment N+1.
fastvideo.pipelines.basic.ltx2.continuation.LTX2ContinuationState.audio_conditioning_strength
class-attribute
instance-attribute
¶audio_conditioning_strength: float = 1.0
Clean-latent mask value applied to the overlap region; 0.0 keeps the cached audio entirely, 1.0 renoises from scratch.
fastvideo.pipelines.basic.ltx2.continuation.LTX2ContinuationState.audio_latents
class-attribute
instance-attribute
¶audio_latents: Tensor | None = None
Denoised audio latent tensor of shape [B, C, T, mel].
None when the state is blob-backed or unset.
fastvideo.pipelines.basic.ltx2.continuation.LTX2ContinuationState.audio_latents_blob_id
class-attribute
instance-attribute
¶audio_latents_blob_id: str | None = None
Blob store id when audio latents live outside the payload.
fastvideo.pipelines.basic.ltx2.continuation.LTX2ContinuationState.audio_sample_rate
class-attribute
instance-attribute
¶audio_sample_rate: int | None = None
Sample rate for the audio side (e.g. 24000).
fastvideo.pipelines.basic.ltx2.continuation.LTX2ContinuationState.metadata
class-attribute
instance-attribute
¶Opaque metadata bag for forward-compat fields that don't need their own typed slot yet (e.g. custom knob experiments).
fastvideo.pipelines.basic.ltx2.continuation.LTX2ContinuationState.segment_index
class-attribute
instance-attribute
¶segment_index: int = 0
Index of the just-completed segment. Segment 0 has no history;
state returned after segment 0 carries segment_index=0 and the
caller uses segment_index + 1 as the next segment number.
fastvideo.pipelines.basic.ltx2.continuation.LTX2ContinuationState.video_conditioning_frame_idx
class-attribute
instance-attribute
¶video_conditioning_frame_idx: int = 0
Target frame index inside the next segment that the trailing
frames align with (matches the LTX-2 ltx2_video_conditions
tuple's frame_idx slot).
fastvideo.pipelines.basic.ltx2.continuation.LTX2ContinuationState.video_conditioning_strength
class-attribute
instance-attribute
¶video_conditioning_strength: float = 1.0
Conditioning strength in [0, 1]. Matches the ltx2_video_conditions
tuple's strength slot.
fastvideo.pipelines.basic.ltx2.continuation.LTX2ContinuationState.video_frames
class-attribute
instance-attribute
¶video_frames: list[ndarray] | None = None
Trailing decoded frames, each an RGB uint8 np.ndarray shaped
(H, W, 3). None when the state is blob-backed or unset.
fastvideo.pipelines.basic.ltx2.continuation.LTX2ContinuationState.video_frames_blob_id
class-attribute
instance-attribute
¶video_frames_blob_id: str | None = None
Blob store id when the frames live outside the payload.
fastvideo.pipelines.basic.ltx2.continuation.LTX2ContinuationState.video_position_offset_sec
class-attribute
instance-attribute
¶video_position_offset_sec: float = 0.0
Seconds by which video RoPE is shifted forward so the audio
prefix can sit at t >= 0 when audio conditioning is longer than
video conditioning.
Methods:¶
fastvideo.pipelines.basic.ltx2.continuation.LTX2ContinuationState.from_continuation_state
classmethod
¶from_continuation_state(state: ContinuationState, *, blob_store: BlobStore | None = None) -> LTX2ContinuationState
Rebuild a typed state from a public ContinuationState.
Raises ValueError when the kind doesn't match or the
schema version is unsupported.
Source code in fastvideo/pipelines/basic/ltx2/continuation.py
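The two failure modes described above (wrong kind, unsupported schema version) can be illustrated with a small validation sketch. Everything here is a stand-in: Envelope only imitates the ContinuationState(kind, payload) shape, and EXAMPLE_KIND / EXAMPLE_SCHEMA_VERSION are hypothetical values, since the real constants' values are not shown on this page.

```python
from dataclasses import dataclass
from typing import Any, Dict

# Hypothetical stand-ins for LTX2_CONTINUATION_KIND and
# LTX2_CONTINUATION_SCHEMA_VERSION; the real values are not documented here.
EXAMPLE_KIND = "example.ltx2"
EXAMPLE_SCHEMA_VERSION = 1


@dataclass
class Envelope:
    """Simplified stand-in for the public ContinuationState envelope."""
    kind: str
    payload: Dict[str, Any]


def check_envelope(state: Envelope) -> Dict[str, Any]:
    """Return the payload if the kind and schema version are acceptable."""
    if state.kind != EXAMPLE_KIND:
        raise ValueError(f"unexpected kind {state.kind!r}")
    version = state.payload.get("schema_version")
    if version != EXAMPLE_SCHEMA_VERSION:
        raise ValueError(f"unsupported schema_version {version!r}")
    return state.payload


payload = check_envelope(
    Envelope(EXAMPLE_KIND, {"schema_version": 1, "segment_index": 3}))
```

Checking the kind before touching the payload keeps a foreign envelope from being half-parsed; the real method may order or word its checks differently.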
fastvideo.pipelines.basic.ltx2.continuation.LTX2ContinuationState.to_continuation_state
¶to_continuation_state(*, blob_store: BlobStore | None = None, inline_threshold_bytes: int = DEFAULT_INLINE_THRESHOLD_BYTES) -> ContinuationState
Serialize into a public ContinuationState.
When blob_store is given, tensors larger than
inline_threshold_bytes are stored via
BlobStore.put and referenced by id; otherwise all data
is base64-encoded inline. The payload is always a plain
JSON-serializable dict.
Source code in fastvideo/pipelines/basic/ltx2/continuation.py
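The inline-versus-blob routing described above can be sketched as follows. This is not the module's implementation: InMemoryBlobStore is a hypothetical store (only a put method is mentioned in the text; get is assumed for the round trip), and the `_b64` key suffix is an invented payload key used purely for illustration.

```python
import base64
import json
import uuid
from typing import Dict, Optional


class InMemoryBlobStore:
    """Hypothetical blob store keeping bytes in a dict."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        blob_id = uuid.uuid4().hex
        self._blobs[blob_id] = data
        return blob_id

    def get(self, blob_id: str) -> bytes:  # assumed counterpart to put
        return self._blobs[blob_id]


def encode_field(name: str, data: bytes,
                 blob_store: Optional[InMemoryBlobStore] = None,
                 inline_threshold_bytes: int = 2 * 1024 * 1024) -> Dict[str, str]:
    """Return JSON-safe entries for one binary field."""
    if blob_store is not None and len(data) > inline_threshold_bytes:
        # Large and a store is available: keep only a reference in the payload.
        return {f"{name}_blob_id": blob_store.put(data)}
    # Small, or no store configured: embed the bytes as base64 text.
    return {f"{name}_b64": base64.b64encode(data).decode("ascii")}


store = InMemoryBlobStore()
small = encode_field("audio_latents", b"abc", store, inline_threshold_bytes=8)
large = encode_field("audio_latents", bytes(16), store, inline_threshold_bytes=8)
no_store = encode_field("audio_latents", bytes(16), None, inline_threshold_bytes=8)
json.dumps({**small, **large})  # both variants stay JSON-serializable
```

Either branch yields only strings, which is what keeps the payload a plain JSON-serializable dict regardless of where the bytes end up.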
fastvideo.pipelines.basic.ltx2.pipeline_configs
¶
Classes¶
fastvideo.pipelines.basic.ltx2.pipeline_configs.LTX2T2VConfig
dataclass
¶
LTX2T2VConfig(model_path: str = '', pipeline_config_path: str | None = None, embedded_cfg_scale: float = 6.0, flow_shift: float | None = None, flow_shift_sr: float | None = None, disable_autocast: bool = False, scheduler_step_in_fp32: bool = False, is_causal: bool = False, dit_config: DiTConfig = LTX2VideoConfig(), dit_precision: str = 'bf16', upsampler_config: UpsamplerConfig = UpsamplerConfig(), upsampler_precision: str = 'fp32', vae_config: VAEConfig = LTX2VAEConfig(), vae_precision: str = 'bf16', vae_tiling: bool = True, vae_sp: bool = False, image_encoder_config: EncoderConfig = EncoderConfig(), image_encoder_precision: str = 'fp32', text_encoder_configs: tuple[EncoderConfig, ...] = (lambda: (LTX2GemmaConfig(),))(), text_encoder_precisions: tuple[str, ...] = (lambda: ('bf16',))(), preprocess_text_funcs: tuple[Callable[[str], str], ...] = (lambda: (preprocess_text,))(), postprocess_text_funcs: tuple[Callable[[BaseEncoderOutput], Tensor], ...] = (lambda: (ltx2_postprocess_text,))(), dmd_denoising_steps: list[int] | None = None, ti2v_task: bool = False, lucy_edit_task: bool = False, boundary_ratio: float | None = None, audio_decoder_config: ModelConfig = LTX2AudioDecoderConfig(), vocoder_config: ModelConfig = LTX2VocoderConfig(), audio_decoder_precision: str = 'bf16', vocoder_precision: str = 'bf16')
fastvideo.pipelines.basic.ltx2.stage_overrides
¶
Typed override surfaces for the LTX-2 two-stage refine flow.
- preset_overrides.refine: init-time knobs (see LTX2RefinePresetOverride).
- stage_overrides.refine: per-request knobs (see LTX2RefineStageOverride).
Asset paths live on fastvideo.api.schema.ComponentConfig
(upsampler_weights and lora_path).
Classes¶
fastvideo.pipelines.basic.ltx2.stage_overrides.LTX2RefinePresetOverride
dataclass
¶
Init-time refine wiring under preset_overrides.refine.
fastvideo.pipelines.basic.ltx2.stage_overrides.LTX2RefineStageOverride
dataclass
¶
LTX2RefineStageOverride(num_inference_steps: int | None = None, guidance_scale: float | None = None, image_crf: int | None = None, video_position_offset_sec: float | None = None)
Per-request refine tuning under stage_overrides.refine.
Functions:¶
fastvideo.pipelines.basic.ltx2.stage_overrides.refine_override_to_dict
¶
refine_override_to_dict(override: LTX2RefinePresetOverride | LTX2RefineStageOverride) -> dict[str, Any]
Serialise a refine override, dropping None entries so only
user-set fields reach preset_overrides.refine or
stage_overrides.refine.
Source code in fastvideo/pipelines/basic/ltx2/stage_overrides.py
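A minimal sketch of the "drop None entries" behaviour, assuming it amounts to filtering the dataclass fields. ExampleRefineOverride is an illustrative stand-in that copies the four fields listed for LTX2RefineStageOverride, and override_to_dict is a hypothetical helper, not the library function.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class ExampleRefineOverride:
    """Stand-in mirroring the documented LTX2RefineStageOverride fields."""
    num_inference_steps: Optional[int] = None
    guidance_scale: Optional[float] = None
    image_crf: Optional[int] = None
    video_position_offset_sec: Optional[float] = None


def override_to_dict(override: ExampleRefineOverride) -> Dict[str, Any]:
    """Keep only the fields the caller actually set (anything not None)."""
    return {k: v for k, v in asdict(override).items() if v is not None}


result = override_to_dict(
    ExampleRefineOverride(num_inference_steps=3, guidance_scale=0.0))
```

The filter tests `is not None` rather than truthiness, so an explicit 0 or 0.0 survives as a user-set value.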
fastvideo.pipelines.basic.ltx2.stages
¶
LTX-2 family pipeline stages.
Classes¶
fastvideo.pipelines.basic.ltx2.stages.LTX2AudioDecodingStage
¶
Bases: PipelineStage
Decode LTX-2 audio latents into a waveform.
Source code in fastvideo/pipelines/basic/ltx2/stages/ltx2_audio_decoding.py
fastvideo.pipelines.basic.ltx2.stages.LTX2DenoisingStage
¶
LTX2DenoisingStage(transformer, *, sigmas_override: list[float] | None = None, num_inference_steps_override: int | None = None, force_guidance_scale: float | None = None, initial_audio_latents_key: str | None = 'ltx2_audio_latents')
Bases: PipelineStage
Run the LTX-2 denoising loop over the sigma schedule.
Source code in fastvideo/pipelines/basic/ltx2/stages/ltx2_denoising.py
fastvideo.pipelines.basic.ltx2.stages.LTX2LatentPreparationStage
¶
Bases: PipelineStage
Prepare initial LTX-2 latents without relying on a diffusers scheduler.
Source code in fastvideo/pipelines/basic/ltx2/stages/ltx2_latent_preparation.py
fastvideo.pipelines.basic.ltx2.stages.LTX2RefineInitStage
¶
Bases: PipelineStage
Switch the request to half resolution before the stage-1 denoise.
Stashes the original target resolution on batch.extra so
LTX2UpsampleStage can recover it after stage 1 runs. When
the refine path is disabled the stage is a no-op.
fastvideo.pipelines.basic.ltx2.stages.LTX2RefineLoRAStage
¶
Bases: PipelineStage
Apply a refinement-specific LoRA before stage-2 denoising.
Source code in fastvideo/pipelines/basic/ltx2/stages/ltx2_refine.py
fastvideo.pipelines.basic.ltx2.stages.LTX2TextEncodingStage
¶
Bases: TextEncodingStage
LTX2 text encoding stage with sequence parallelism support.
When SP is enabled (sp_world_size > 1), only rank 0 runs the text encoder and broadcasts embeddings to other ranks. This avoids I/O contention from all ranks loading the Gemma model simultaneously, which can cause text encoding to take 100+ seconds instead of ~5 seconds.
Source code in fastvideo/pipelines/stages/text_encoding.py
fastvideo.pipelines.basic.ltx2.stages.LTX2UpsampleStage
¶
LTX2UpsampleStage(*, upsampler: Any, vae: Any, transformer: Any | None = None, sigmas: list[float] | None = None, add_noise: bool = True)
Bases: PipelineStage
Upsample stage-1 latents to stage-2 resolution and add refine noise.
Source code in fastvideo/pipelines/basic/ltx2/stages/ltx2_refine.py
Modules¶
fastvideo.pipelines.basic.ltx2.stages.ltx2_audio_decoding
¶
Audio decoding stage for LTX-2 pipelines.
Classes¶
fastvideo.pipelines.basic.ltx2.stages.ltx2_audio_decoding.LTX2AudioDecodingStage
¶
Bases: PipelineStage
Decode LTX-2 audio latents into a waveform.
Source code in fastvideo/pipelines/basic/ltx2/stages/ltx2_audio_decoding.py
fastvideo.pipelines.basic.ltx2.stages.ltx2_denoising
¶
LTX-2 denoising stage using the native sigma schedule.
Classes¶
fastvideo.pipelines.basic.ltx2.stages.ltx2_denoising.LTX2DenoisingStage
¶LTX2DenoisingStage(transformer, *, sigmas_override: list[float] | None = None, num_inference_steps_override: int | None = None, force_guidance_scale: float | None = None, initial_audio_latents_key: str | None = 'ltx2_audio_latents')
Bases: PipelineStage
Run the LTX-2 denoising loop over the sigma schedule.
Source code in fastvideo/pipelines/basic/ltx2/stages/ltx2_denoising.py
fastvideo.pipelines.basic.ltx2.stages.ltx2_image_conditioning
¶
FastVideo-native LTX-2 image-to-video conditioning helpers.
Public-side port of FastVideo-internal/.../ltx2_i2v_conditioning.py.
The module composes a clean_latent + denoise_mask pair that the
LTX-2 latent-prep + denoising stages mix into the noise tensor, so a
generated segment can be anchored to:
- one or more conditioning images at specific latent frame indices (ltx2_images),
- a multi-frame conditioning video clip jointly VAE-encoded (ltx2_video_conditions),
- a continuation latent carried over from the previous segment (ltx2_conditioning_latent_stage1 / _stage2).
The streaming server's session controller populates the continuation
latents between segments; the legacy from_pretrained path passes
ltx2_images / ltx2_image_crf through compat translation.
Classes¶
fastvideo.pipelines.basic.ltx2.stages.ltx2_image_conditioning.LTX2ImageConditioningState
dataclass
¶LTX2ImageConditioningState(clean_latent: Tensor, denoise_mask: Tensor, images: list[tuple[str, int, float]], latent_conditioned: bool = False)
Result of building image / continuation conditioning.
Functions:¶
fastvideo.pipelines.basic.ltx2.stages.ltx2_image_conditioning.apply_ltx2_gaussian_noiser
¶apply_ltx2_gaussian_noiser(*, noise: Tensor, clean_latent: Tensor, denoise_mask: Tensor, noise_scale: float = 1.0) -> Tensor
Mix noise into clean_latent along denoise_mask * noise_scale.
Values close to 1 in the mask produce near-pure noise (used in a fresh stage-2 latent), values near 0 leave the clean latent untouched (used in conditioning regions).
Source code in fastvideo/pipelines/basic/ltx2/stages/ltx2_image_conditioning.py
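The description above is consistent with a per-element linear blend, sketched here on flat Python lists instead of tensors. The exact formula is an assumption inferred from the docstring (weight w = mask * scale on the noise, 1 - w on the clean latent); mix_noise is an illustrative name, not the library function.

```python
from typing import List


def mix_noise(noise: List[float], clean_latent: List[float],
              denoise_mask: List[float], noise_scale: float = 1.0) -> List[float]:
    """Blend each element: w goes to noise, (1 - w) stays with the clean latent."""
    mixed = []
    for n, c, m in zip(noise, clean_latent, denoise_mask):
        w = m * noise_scale  # assumed weighting, per the docstring
        mixed.append(n * w + c * (1.0 - w))
    return mixed


noise = [10.0, 10.0, 10.0]
clean = [2.0, 2.0, 2.0]
mask = [0.0, 0.5, 1.0]
full = mix_noise(noise, clean, mask)
half = mix_noise(noise, clean, mask, noise_scale=0.5)
```

With mask 0 the clean value passes through unchanged and with mask 1 at full scale the result is pure noise, matching the two extremes named in the text.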
fastvideo.pipelines.basic.ltx2.stages.ltx2_image_conditioning.build_ltx2_image_conditioning
¶build_ltx2_image_conditioning(*, batch: ForwardBatch, latents: Tensor, vae: Module, height: int, width: int, image_crf: float | None = None, base_clean_latent: Tensor | None = None) -> LTX2ImageConditioningState | None
Build the (clean_latent, denoise_mask) state for the next segment.
Returns None for plain T2V (no images, no continuation, no
video conditions). The denoise mask is 1 where the model should
sample fresh, 0 where it should preserve the conditioning latent
exactly. base_clean_latent is None corresponds to stage 1
(fresh half-res latent); base_clean_latent set means stage 2
(already-upsampled latent from the upsampler stage).
Source code in fastvideo/pipelines/basic/ltx2/stages/ltx2_image_conditioning.py
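To make the mask convention concrete, here is a sketch that builds a per-frame mask from conditioning entries. It rests on two assumptions not confirmed by this page: that each images tuple is ordered (path, latent_frame_idx, strength), and that a conditioned frame's mask value is 1 - strength. build_frame_mask is a hypothetical helper operating on a 1-D frame axis only.

```python
from typing import List, Optional, Tuple


def build_frame_mask(num_latent_frames: int,
                     images: List[Tuple[str, int, float]]) -> Optional[List[float]]:
    """Return a per-frame denoise mask, or None when nothing is conditioned."""
    if not images:
        return None  # plain T2V: no conditioning state at all
    mask = [1.0] * num_latent_frames  # 1 = sample fresh everywhere by default
    for _path, frame_idx, strength in images:
        if not 0 <= frame_idx < num_latent_frames:
            raise IndexError(f"frame index {frame_idx} out of range")
        # Assumed mapping: full strength pins the frame (mask 0).
        mask[frame_idx] = 1.0 - strength
    return mask


mask = build_frame_mask(4, [("first.png", 0, 1.0), ("last.png", 3, 0.75)])
```

A strength below 1.0 leaves a fractional mask value, so that frame is nudged toward the conditioning latent rather than preserved exactly.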
fastvideo.pipelines.basic.ltx2.stages.ltx2_image_conditioning.load_ltx2_conditioning_video_clip
¶load_ltx2_conditioning_video_clip(frame_paths: list[str], *, height: int, width: int, dtype: dtype, device: device, image_crf: float) -> Tensor
Load multiple frames and stack as [1, C, T, H, W] for joint
VAE encoding so the resulting latent captures temporal/motion info.
Source code in fastvideo/pipelines/basic/ltx2/stages/ltx2_image_conditioning.py
fastvideo.pipelines.basic.ltx2.stages.ltx2_image_conditioning.post_process_ltx2_denoised
¶post_process_ltx2_denoised(*, denoised: Tensor, denoise_mask: Tensor, clean_latent: Tensor) -> Tensor
Restore the conditioning regions of clean_latent outside the
denoise mask after the model has filled in the masked area.
Source code in fastvideo/pipelines/basic/ltx2/stages/ltx2_image_conditioning.py
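One plausible reading of this restore step, sketched on flat lists: keep the model output where the mask is 1 and put the clean latent back where it is 0. The blend formula and the name restore_conditioning are assumptions for illustration, not the library's code.

```python
from typing import List


def restore_conditioning(denoised: List[float], denoise_mask: List[float],
                         clean_latent: List[float]) -> List[float]:
    """Model output inside the mask, original clean latent outside it."""
    return [d * m + c * (1.0 - m)
            for d, m, c in zip(denoised, denoise_mask, clean_latent)]


restored = restore_conditioning(
    denoised=[5.0, 5.0, 5.0],
    denoise_mask=[0.0, 1.0, 0.5],
    clean_latent=[1.0, 1.0, 1.0],
)
```

Under this reading, positions with mask 0 come back bit-for-bit equal to the conditioning latent no matter what the model predicted there.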
fastvideo.pipelines.basic.ltx2.stages.ltx2_image_conditioning.resolve_ltx2_images
¶Collect any LTX-2 image conditioning inputs from the batch.
Falls back to batch.image_path for the simple single-image i2v
case (anchors the first latent frame at full strength).
Source code in fastvideo/pipelines/basic/ltx2/stages/ltx2_image_conditioning.py
fastvideo.pipelines.basic.ltx2.stages.ltx2_latent_preparation
¶
Latent preparation stage for LTX-2 pipelines.
Classes¶
fastvideo.pipelines.basic.ltx2.stages.ltx2_latent_preparation.LTX2LatentPreparationStage
¶
Bases: PipelineStage
Prepare initial LTX-2 latents without relying on a diffusers scheduler.
Source code in fastvideo/pipelines/basic/ltx2/stages/ltx2_latent_preparation.py
fastvideo.pipelines.basic.ltx2.stages.ltx2_refine
¶
LTX-2 refinement stages for 2x spatial upscaling + distilled denoising.
Public-side port of FastVideo-internal/.../stages/ltx2_refine.py.
The three stages run between the stage-1 denoising pass and the stage-2
denoising pass:
- LTX2RefineInitStage: halves the requested resolution so the first denoise runs at ½× and stashes the original target resolution on batch.extra so the upsample stage can recover it.
- LTX2UpsampleStage: upsamples the stage-1 latents through the LTX-2 latent upsampler, optionally re-applies image conditioning, and mixes in fresh noise scaled by the stage-2 sigma so the next denoise has something to refine.
- LTX2RefineLoRAStage: swaps in a refinement LoRA before the stage-2 denoise (no-op when the path is unset).
Behaviour matches the internal version 1:1 for the text-to-video path;
the i2v / continuation branches inside build_ltx2_image_conditioning
raise NotImplementedError until the rest of the i2v conditioning
module is ported.
Classes¶
fastvideo.pipelines.basic.ltx2.stages.ltx2_refine.LTX2RefineInitStage
¶
Bases: PipelineStage
Switch the request to half resolution before the stage-1 denoise.
Stashes the original target resolution on batch.extra so
LTX2UpsampleStage can recover it after stage 1 runs. When
the refine path is disabled the stage is a no-op.
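The stash-and-halve behaviour can be sketched with a plain dict standing in for the batch. The key names (refine_target_height, refine_target_width) and both helper functions are hypothetical; the page only states that the original resolution is kept on batch.extra and recovered later.

```python
from typing import Any, Dict


def enter_half_resolution(request: Dict[str, Any],
                          refine_enabled: bool = True) -> Dict[str, Any]:
    """Halve height/width for stage 1 and remember the original target."""
    if not refine_enabled:
        return request  # no-op when the refine path is off
    extra = request.setdefault("extra", {})
    extra["refine_target_height"] = request["height"]  # hypothetical key
    extra["refine_target_width"] = request["width"]    # hypothetical key
    request["height"] //= 2
    request["width"] //= 2
    return request


def restore_target_resolution(request: Dict[str, Any]) -> Dict[str, Any]:
    """What a later upsample step could do to recover the stashed size."""
    extra = request["extra"]
    request["height"] = extra["refine_target_height"]
    request["width"] = extra["refine_target_width"]
    return request


req = enter_half_resolution({"height": 1024, "width": 1536, "extra": {}})
stage1_size = (req["height"], req["width"])
req = restore_target_resolution(req)
stage2_size = (req["height"], req["width"])
```

Stashing the exact original values, rather than doubling the halved ones, avoids drift when the requested size is odd.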
fastvideo.pipelines.basic.ltx2.stages.ltx2_refine.LTX2RefineLoRAStage
¶
Bases: PipelineStage
Apply a refinement-specific LoRA before stage-2 denoising.
Source code in fastvideo/pipelines/basic/ltx2/stages/ltx2_refine.py
fastvideo.pipelines.basic.ltx2.stages.ltx2_refine.LTX2UpsampleStage
¶LTX2UpsampleStage(*, upsampler: Any, vae: Any, transformer: Any | None = None, sigmas: list[float] | None = None, add_noise: bool = True)
Bases: PipelineStage
Upsample stage-1 latents to stage-2 resolution and add refine noise.
Source code in fastvideo/pipelines/basic/ltx2/stages/ltx2_refine.py
fastvideo.pipelines.basic.ltx2.stages.ltx2_text_encoding
¶
LTX2-specific text encoding stage with sequence parallelism broadcast support.
When running with sequence parallelism (SP), the Gemma text encoder is only executed on rank 0, and the embeddings are broadcast to all other ranks. This avoids I/O contention from all ranks loading the Gemma model simultaneously.
Classes¶
fastvideo.pipelines.basic.ltx2.stages.ltx2_text_encoding.LTX2TextEncodingStage
¶
Bases: TextEncodingStage
LTX2 text encoding stage with sequence parallelism support.
When SP is enabled (sp_world_size > 1), only rank 0 runs the text encoder and broadcasts embeddings to other ranks. This avoids I/O contention from all ranks loading the Gemma model simultaneously, which can cause text encoding to take 100+ seconds instead of ~5 seconds.