models
¶
Modules¶
fastvideo.train.models.base
¶
Classes¶
fastvideo.train.models.base.CausalModelBase
¶
CausalModelBase(*, trainable: bool = True, lora: LoraConfig | dict[str, Any] | None = None)
Bases: ModelBase
Extension for causal / streaming model plugins.
Cache state is internal to the model instance and keyed by cache_tag (no role handle needed).
Source code in fastvideo/train/models/base.py
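The cache_tag keying described above can be pictured with a small stand-in. This is only a sketch: ToyCausalCache, its store and cached helpers, and the "neg" tag are hypothetical names chosen for illustration, and the real cache contents (for example attention key/value tensors) are not shown in this reference.

```python
from typing import Any


class ToyCausalCache:
    """Hypothetical stand-in for cache state kept inside a causal model instance."""

    def __init__(self) -> None:
        # One independent list of cached entries per cache_tag.
        self._caches: dict[str, list[Any]] = {}

    def store(self, entry: Any, *, cache_tag: str = "pos") -> None:
        # Append to the cache that belongs to this tag, creating it on first use.
        self._caches.setdefault(cache_tag, []).append(entry)

    def cached(self, *, cache_tag: str = "pos") -> list[Any]:
        # Return a copy so callers cannot mutate internal state.
        return list(self._caches.get(cache_tag, []))

    def clear_caches(self, *, cache_tag: str = "pos") -> None:
        # Only the requested tag is dropped; other tags are left untouched.
        self._caches.pop(cache_tag, None)


cache = ToyCausalCache()
cache.store("kv-chunk-0", cache_tag="pos")
cache.store("kv-chunk-0", cache_tag="neg")  # "neg" is an assumed second tag
cache.clear_caches(cache_tag="pos")
```

Because the state lives on the instance and is selected by tag, a caller only passes a string and never needs a separate handle to the role that owns the cache.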
Methods:¶
fastvideo.train.models.base.CausalModelBase.clear_caches
abstractmethod
¶clear_caches(*, cache_tag: str = 'pos') -> None
fastvideo.train.models.base.CausalModelBase.predict_noise_streaming
abstractmethod
¶predict_noise_streaming(noisy_latents: Tensor, timestep: Tensor, batch: TrainingBatch, *, conditional: bool, cache_tag: str = 'pos', store_kv: bool = False, cur_start_frame: int = 0, cfg_uncond: dict[str, Any] | None = None, attn_kind: Literal['dense', 'vsa'] = 'dense') -> Tensor | None
Streaming predict-noise that may update internal caches.
Source code in fastvideo/train/models/base.py
fastvideo.train.models.base.CausalModelBase.predict_x0_streaming
¶predict_x0_streaming(noisy_latents: Tensor, timestep: Tensor, batch: TrainingBatch, *, conditional: bool, cache_tag: str = 'pos', store_kv: bool = False, cur_start_frame: int = 0, cfg_uncond: dict[str, Any] | None = None, attn_kind: Literal['dense', 'vsa'] = 'dense') -> Tensor | None
Predict x0 streaming via predict_noise_streaming + conversion.
Source code in fastvideo/train/models/base.py
fastvideo.train.models.base.ModelBase
¶
ModelBase(*, trainable: bool = True, lora: LoraConfig | dict[str, Any] | None = None)
Bases: ABC
Per-role model instance.
Every role (student, teacher, critic, …) gets its own ModelBase
instance. Each instance owns its own transformer and
noise_scheduler. Heavyweight resources (VAE, dataloader, RNG
seeds) are loaded lazily via init_preprocessors, which the
training method calls only on the student.
Source code in fastvideo/train/models/base.py
Attributes¶
fastvideo.train.models.base.ModelBase.device
property
¶The local CUDA device for this rank.
fastvideo.train.models.base.ModelBase.num_train_timesteps
property
¶num_train_timesteps: int
Return the scheduler's training timestep horizon.
Methods:¶
fastvideo.train.models.base.ModelBase.add_noise
abstractmethod
¶
fastvideo.train.models.base.ModelBase.backward
abstractmethod
¶
fastvideo.train.models.base.ModelBase.init_preprocessors
¶Load the VAE, build the dataloader, and seed the RNGs.
Called only on the student by the training method's __init__.
Default is a no-op so teacher/critic instances skip this.
Source code in fastvideo/train/models/base.py
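The no-op default can be sketched as follows. The classes below are hypothetical miniatures, not the real ModelBase; in particular the argument-free init_preprocessors signature and the vae_loaded flag are assumptions made to keep the example self-contained.

```python
class ToyModelBase:
    """Hypothetical miniature of the per-role pattern; not the real ModelBase."""

    def __init__(self) -> None:
        self.vae_loaded = False

    def init_preprocessors(self) -> None:
        # Default is a no-op, so an instance stays lightweight unless a
        # subclass overrides this and someone actually calls it.
        return None


class ToyPluginModel(ToyModelBase):
    def init_preprocessors(self) -> None:
        # A concrete plugin would load the VAE, build the dataloader,
        # and seed RNGs here; a flag stands in for that work.
        self.vae_loaded = True


roles = {"student": ToyPluginModel(), "teacher": ToyPluginModel()}
# The training method calls init_preprocessors on the student only.
roles["student"].init_preprocessors()
```

Both roles use the same class, yet only the student pays the loading cost, because the call site rather than the class decides which instance is initialised.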
fastvideo.train.models.base.ModelBase.on_train_start
¶
fastvideo.train.models.base.ModelBase.predict_noise
abstractmethod
¶predict_noise(noisy_latents: Tensor, timestep: Tensor, batch: TrainingBatch, *, conditional: bool, cfg_uncond: dict[str, Any] | None = None, attn_kind: Literal['dense', 'vsa'] = 'dense') -> Tensor
Predict noise/flow for the given noisy latents.
Source code in fastvideo/train/models/base.py
fastvideo.train.models.base.ModelBase.predict_x0
¶predict_x0(noisy_latents: Tensor, timestep: Tensor, batch: TrainingBatch, *, conditional: bool, cfg_uncond: dict[str, Any] | None = None, attn_kind: Literal['dense', 'vsa'] = 'dense') -> Tensor
Predict x0 via predict_noise + conversion.
Source code in fastvideo/train/models/base.py
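This reference does not spell out the conversion step. As a rough illustration only, assume a rectified-flow schedule x_t = (1 - sigma) * x0 + sigma * noise and a prediction equal to the velocity noise - x0; under those assumptions x0 follows by subtracting sigma times the prediction. The function name x0_from_velocity is hypothetical, and scalars stand in for tensors.

```python
def x0_from_velocity(x_t: float, velocity: float, sigma: float) -> float:
    """Recover x0 under the assumed schedule x_t = (1 - sigma) * x0 + sigma * noise."""
    # With velocity = noise - x0, subtracting sigma * velocity cancels the
    # noise term and restores the full weight on x0.
    return x_t - sigma * velocity


x0, noise, sigma = 0.5, -1.25, 0.25
x_t = (1.0 - sigma) * x0 + sigma * noise          # forward noising (assumed form)
recovered = x0_from_velocity(x_t, noise - x0, sigma)
```

A plugin with a different schedule or sign convention would need a different conversion, which is why this is shown as an assumption rather than the library's formula.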
fastvideo.train.models.base.ModelBase.prepare_batch
abstractmethod
¶prepare_batch(raw_batch: dict[str, Any], *, generator: Generator, latents_source: Literal['data', 'zeros'] = 'data') -> TrainingBatch
Convert a dataloader batch into forward primitives.
Source code in fastvideo/train/models/base.py
fastvideo.train.models.base.ModelBase.shift_and_clamp_timestep
¶
Functions:¶
fastvideo.train.models.cosmos
¶
Cosmos model plugin package.
Classes¶
Modules¶
fastvideo.train.models.cosmos.cosmos
¶
Cosmos model plugin (per-role instance).
Subclasses WanModel since Cosmos uses the same FlowMatchEulerDiscreteScheduler. Differences:
- transformer class name: CosmosTransformer3DModel
- normalize_dit_input("cosmos", ...) instead of ("wan", ...)
- forward kwargs: no encoder_attention_mask, needs condition_mask + padding_mask + fps
- hidden_states in (B,C,T,H,W), so no permute needed
- default flow_shift = 1.0
- single T5 text encoder (not dual like Hunyuan)
Classes¶
fastvideo.train.models.cosmos.cosmos.CosmosModel
¶CosmosModel(*, init_from: str, training_config: TrainingConfig, trainable: bool = True, disable_custom_init_weights: bool = False, flow_shift: float = 1.0, enable_gradient_checkpointing_type: str | None = None, transformer_override_safetensor: str | None = None)
Bases: WanModel
Cosmos 2.5 per-role model.
Inherits most behaviour from WanModel (noise scheduler, timestep sampling, attention metadata, backward). Overrides only the pieces that differ for Cosmos 2.5.
Cosmos 2.5 uses:
- Cosmos25Transformer3DModel (velocity prediction)
- EDM noise schedule: x_t = x_0 + sigma * eps
- No input/output preconditioning (raw latents)
- Timestep = raw sigma value
- Model output = velocity ≈ noise
Source code in fastvideo/train/models/cosmos/cosmos.py
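The noising rule quoted above, x_t = x_0 + sigma * eps, can be written out element-wise. This is a plain-Python sketch with lists in place of latent tensors; edm_add_noise is a hypothetical name and is not the plugin's add_noise implementation.

```python
import random


def edm_add_noise(x0: list[float], eps: list[float], sigma: float) -> list[float]:
    """Apply x_t = x_0 + sigma * eps element-wise (the rule quoted above)."""
    return [x + sigma * e for x, e in zip(x0, eps)]


rng = random.Random(0)                      # fixed seed, for reproducibility only
x0 = [0.5, -1.0, 2.0]
eps = [rng.gauss(0.0, 1.0) for _ in x0]     # standard-normal noise sample
x_t_clean = edm_add_noise(x0, eps, sigma=0.0)
x_t_noisy = edm_add_noise(x0, eps, sigma=2.0)
```

Note that the clean signal is not scaled down as sigma grows; only the noise term is scaled, so sigma = 0 returns the latents unchanged.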
fastvideo.train.models.cosmos.cosmos.CosmosModel.ensure_negative_conditioning
¶Create negative (unconditional) prompt embeddings.
Cosmos 2.5 uses Reason1 (Qwen2.5-VL) which is expensive
to load. This method only supports training_cfg_rate=0
(no classifier-free guidance dropout), in which case the
negative embedding is never used and a zero placeholder
sized to match the text embedding dimension is sufficient.
training_cfg_rate>0 would require real Reason1 negative
embeddings and is rejected here to avoid silently training
with zero-vector "unconditional" inputs.
Source code in fastvideo/train/models/cosmos/cosmos.py
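The guard described above might look roughly like this. The helper name make_negative_placeholder, its parameters, and the choice of ValueError are all assumptions for illustration; nested lists stand in for the embedding tensor.

```python
def make_negative_placeholder(
    training_cfg_rate: float, seq_len: int, embed_dim: int
) -> list[list[float]]:
    """Hypothetical helper: zero placeholder, valid only when CFG dropout is off."""
    if training_cfg_rate > 0:
        # A zero vector is not a real unconditional embedding, so refuse
        # rather than silently train against it.
        raise ValueError("training_cfg_rate > 0 needs real negative embeddings")
    # With dropout disabled the negative embedding is never consumed, so a
    # correctly shaped block of zeros is enough.
    return [[0.0] * embed_dim for _ in range(seq_len)]


placeholder = make_negative_placeholder(0.0, seq_len=2, embed_dim=4)
try:
    make_negative_placeholder(0.1, seq_len=2, embed_dim=4)
    rejected = False
except ValueError:
    rejected = True
```

Failing loudly for a non-zero rate trades convenience for safety: the misconfiguration surfaces at setup time instead of as degraded samples later.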
fastvideo.train.models.cosmos.cosmos.CosmosModel.prepare_batch
¶prepare_batch(raw_batch: dict[str, Any], *, generator: Generator, latents_source: Literal['data', 'zeros'] = 'data') -> TrainingBatch
Same flow as Wan, but uses Cosmos VAE normalisation.
Source code in fastvideo/train/models/cosmos/cosmos.py
Functions:¶
fastvideo.train.models.hunyuan
¶
Hunyuan model plugin package.
Classes¶
Modules¶
fastvideo.train.models.hunyuan.hunyuan
¶
Hunyuan model plugin (per-role instance).
Subclasses WanModel since HunyuanVideo uses the same FlowMatchEulerDiscreteScheduler and linear-interpolation noise schedule. Differences:
- transformer class name
- normalize_dit_input("hunyuan", ...) instead of ("wan", ...)
- forward kwargs: no encoder_attention_mask, no return_dict
- default flow_shift = 7
Classes¶
fastvideo.train.models.hunyuan.hunyuan.HunyuanModel
¶HunyuanModel(*, init_from: str, training_config: TrainingConfig, trainable: bool = True, disable_custom_init_weights: bool = False, flow_shift: float = 7.0, enable_gradient_checkpointing_type: str | None = None, transformer_override_safetensor: str | None = None, lora: LoraConfig | dict[str, Any] | None = None)
Bases: WanModel
HunyuanVideo per-role model.
Inherits most behaviour from WanModel (noise scheduler, timestep sampling, attention metadata, backward). Overrides only the pieces that differ for Hunyuan.
Source code in fastvideo/train/models/hunyuan/hunyuan.py
fastvideo.train.models.hunyuan.hunyuan.HunyuanModel.ensure_negative_conditioning
¶Encode the negative prompt with dual text encoders (LLaMA + CLIP).
Every rank encodes independently to avoid NCCL deadlocks when only a subset of ranks would otherwise participate.
Source code in fastvideo/train/models/hunyuan/hunyuan.py
fastvideo.train.models.hunyuan.hunyuan.HunyuanModel.prepare_batch
¶prepare_batch(raw_batch: dict[str, Any], *, generator: Generator, latents_source: Literal['data', 'zeros'] = 'data') -> TrainingBatch
Same flow as Wan, but uses Hunyuan VAE normalisation.
Source code in fastvideo/train/models/hunyuan/hunyuan.py
fastvideo.train.models.longcat
¶
LongCat model plugin package.
Classes¶
Modules¶
fastvideo.train.models.longcat.longcat
¶
LongCat model plugin (per-role instance).
Classes¶
fastvideo.train.models.longcat.longcat.LongCatModel
¶LongCatModel(*, init_from: str, training_config: TrainingConfig, trainable: bool = True, disable_custom_init_weights: bool = False, flow_shift: float = 12.0, enable_gradient_checkpointing_type: str | None = None, transformer_override_safetensor: str | None = None)
Bases: WanModel
LongCat per-role model for training and distillation.
Source code in fastvideo/train/models/longcat/longcat.py
fastvideo.train.models.longcat.longcat.LongCatModel.predict_noise
¶predict_noise(noisy_latents: Tensor, timestep: Tensor, batch: TrainingBatch, *, conditional: bool, cfg_uncond: dict[str, Any] | None = None, attn_kind: Literal['dense', 'vsa'] = 'dense') -> Tensor
Adapt LongCat's sign convention to FineTuneMethod's target.
LongCatTransformer3DModel is pretrained to output the
clean - noise direction; LongCatDenoisingStage (the
bidirectional inference pipeline) explicitly negates the
transformer output before handing it to
FlowMatchEulerDiscreteScheduler.step. Training methods, on
the other hand (FineTuneMethod,
DiffusionForcingSFTMethod), target noise - clean
directly (the standard rectified-flow velocity Wan uses).
Without the negation here, the MSE loss pushes the transformer
toward noise - clean, flipping its native output sign over
training. Inference then applies its own negation on top, so
the scheduler receives the wrong direction and produces noise
even while the training loss is dropping. Verified empirically
on a 100-step LongCat overfit run: step 0 generated meaningful
video, step 100 was pure noise despite low loss.
Negating in predict_noise keeps the transformer's
pretrained sign convention intact while presenting the
training methods with a Wan-compatible
pred ≈ noise - clean for MSE.
Source code in fastvideo/train/models/longcat/longcat.py
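A scalar toy makes the sign argument above concrete. The function transformer_output is a hypothetical stand-in for an ideal pretrained transformer that emits exactly clean - noise; the real model works on tensors and is only approximately correct.

```python
def transformer_output(clean: float, noise: float) -> float:
    """Stand-in for the pretrained transformer: emits the clean - noise direction."""
    return clean - noise


def predict_noise(clean: float, noise: float) -> float:
    # Negate so training methods see the Wan-style noise - clean direction
    # while the transformer keeps its pretrained sign convention.
    return -transformer_output(clean, noise)


clean, noise = 0.75, -0.5
target = noise - clean  # what the training methods regress toward

loss_with_negation = (predict_noise(clean, noise) - target) ** 2
loss_without_negation = (transformer_output(clean, noise) - target) ** 2
```

With the negation the ideal pretrained output already has zero loss, so training starts from the pretrained behaviour. Without it the loss is large at step 0 and can only be reduced by flipping the transformer's output sign, which is the failure mode described above.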
fastvideo.train.models.wan
¶
Wan model plugin package.
Classes¶
Modules¶
fastvideo.train.models.wan.wan
¶
Wan model plugin (per-role instance).
Classes¶
fastvideo.train.models.wan.wan.WanModel
¶WanModel(*, init_from: str, training_config: TrainingConfig, trainable: bool = True, disable_custom_init_weights: bool = False, flow_shift: float = 3.0, enable_gradient_checkpointing_type: str | None = None, transformer_override_safetensor: str | None = None, lora: LoraConfig | dict[str, Any] | None = None)
Bases: ModelBase
Wan per-role model: owns transformer + noise_scheduler.
Source code in fastvideo/train/models/wan/wan.py
Functions:¶
fastvideo.train.models.wan.wan_causal
¶
Wan causal model plugin (per-role instance, streaming/cache).
Classes¶
fastvideo.train.models.wan.wan_causal.WanCausalModel
¶WanCausalModel(*, init_from: str, training_config: TrainingConfig, trainable: bool = True, disable_custom_init_weights: bool = False, flow_shift: float = 3.0, enable_gradient_checkpointing_type: str | None = None, transformer_override_safetensor: str | None = None, lora: LoraConfig | dict[str, Any] | None = None)
Bases: WanModel, CausalModelBase
Wan per-role model with causal/streaming primitives.