train
¶
Modules¶
fastvideo.train.callbacks
¶
Classes¶
fastvideo.train.callbacks.Callback
¶
Base callback with no-op hooks.
Subclasses override whichever hooks they need. The
training_config and method attributes are set by
CallbackDict after instantiation.
fastvideo.train.callbacks.CallbackDict
¶
Manages a collection of named callbacks.
Instantiates each callback from its _target_ config and
dispatches hook calls to all registered callbacks.
Source code in fastvideo/train/callbacks/callback.py
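The `Callback` / `CallbackDict` relationship amounts to no-op hooks plus fan-out dispatch. A minimal Python sketch of that pattern (hook names and the `dispatch` helper here are illustrative, not the actual FastVideo API):

```python
class Callback:
    """Base callback: every hook is a no-op; subclasses override what they need."""
    def on_train_start(self): ...
    def on_step_end(self, step): ...

class CallbackDict(dict):
    """Holds named callbacks and fans a hook call out to each of them."""
    def dispatch(self, hook, *args, **kwargs):
        for cb in self.values():
            getattr(cb, hook)(*args, **kwargs)

class StepCounter(Callback):
    """Example callback that only overrides on_step_end."""
    def __init__(self):
        self.steps = 0
    def on_step_end(self, step):
        self.steps += 1

cbs = CallbackDict(counter=StepCounter())
cbs.dispatch("on_train_start")        # falls through to the base no-op
for s in range(3):
    cbs.dispatch("on_step_end", s)
print(cbs["counter"].steps)           # 3
```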
fastvideo.train.callbacks.EMACallback
¶
Bases: Callback
Manage EMA shadow weights for the student transformer.
All configuration lives in the YAML callbacks.ema section:
.. code-block:: yaml

    callbacks:
      ema:
        decay: 0.9999
        start_iter: 0
The callback creates an EMA_FSDP instance at train start,
updates it after each optimizer step, and exposes an
ema_context() context manager for temporarily swapping
EMA weights into the live model (used by validation).
Source code in fastvideo/train/callbacks/ema.py
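The per-step EMA update itself is plain exponential smoothing. A scalar sketch of the arithmetic (the real callback applies this elementwise to every parameter tensor via `EMA_FSDP`, whose internals are not shown here):

```python
def ema_update(shadow, param, decay=0.9999):
    # shadow <- decay * shadow + (1 - decay) * param
    return decay * shadow + (1 - decay) * param

# With a constant parameter of 1.0, the shadow converges as 1 - decay**n.
shadow = 0.0
for _ in range(10):
    shadow = ema_update(shadow, 1.0, decay=0.9)
print(round(shadow, 6))  # ~0.651322 (= 1 - 0.9**10)
```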
Functions¶
fastvideo.train.callbacks.EMACallback.ema_context
¶ema_context(transformer: Module) -> Generator[Module, None, None]
Temporarily swap EMA weights into transformer.
If EMA is not active, yields the transformer unchanged.
Source code in fastvideo/train/callbacks/ema.py
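The swap-and-restore contract of `ema_context` can be illustrated with a dict of weights standing in for the module's state (a simplified sketch; the real implementation swaps FSDP-sharded tensors on the live transformer):

```python
from contextlib import contextmanager

@contextmanager
def ema_context(live, ema):
    """Temporarily install EMA weights into `live`, restoring originals on exit."""
    backup = dict(live)
    live.update(ema)
    try:
        yield live
    finally:
        live.update(backup)

weights = {"w": 1.0}
ema_weights = {"w": 0.5}
with ema_context(weights, ema_weights) as m:
    inside = m["w"]      # 0.5 -- validation sees EMA weights
print(weights["w"])      # 1.0 -- live weights restored after the block
```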
fastvideo.train.callbacks.GradNormClipCallback
¶
Bases: Callback
Clip gradient norms before the optimizer step.
max_grad_norm must be set explicitly in the callback
config (callbacks.grad_clip.max_grad_norm).
Source code in fastvideo/train/callbacks/grad_clip.py
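Gradient-norm clipping rescales all gradients by `max_grad_norm / total_norm` whenever the global L2 norm exceeds the threshold. A list-based sketch of the same arithmetic (`torch.nn.utils.clip_grad_norm_` performs this over parameter tensors):

```python
import math

def clip_grad_norm(grads, max_grad_norm):
    """Scale grads down when their global L2 norm exceeds max_grad_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm > max_grad_norm:
        scale = max_grad_norm / total_norm
        grads = [g * scale for g in grads]
    return grads, total_norm

grads, norm = clip_grad_norm([3.0, 4.0], max_grad_norm=2.5)
print(norm)   # 5.0
print(grads)  # [1.5, 2.0]
```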
fastvideo.train.callbacks.ValidationCallback
¶
ValidationCallback(*, pipeline_target: str, dataset_file: str, every_steps: int = 100, sampling_steps: list[int] | None = None, guidance_scale: float | None = None, num_frames: int | None = None, output_dir: str | None = None, sampling_timesteps: list[int] | None = None, **pipeline_kwargs: Any)
Bases: Callback
Generic validation callback driven entirely by YAML config.
Works with any pipeline that follows the
PipelineCls.from_pretrained(...) + pipeline.forward()
contract.
Source code in fastvideo/train/callbacks/validation.py
Modules¶
fastvideo.train.callbacks.callback
¶
Callback base class and CallbackDict manager.
Adapted from FastGen's callback pattern to FastVideo's types.
Classes¶
fastvideo.train.callbacks.callback.Callback
¶Base callback with no-op hooks.
Subclasses override whichever hooks they need. The
training_config and method attributes are set by
CallbackDict after instantiation.
fastvideo.train.callbacks.callback.CallbackDict
¶Manages a collection of named callbacks.
Instantiates each callback from its _target_ config and
dispatches hook calls to all registered callbacks.
Source code in fastvideo/train/callbacks/callback.py
Functions¶
fastvideo.train.callbacks.ema
¶
EMA (Exponential Moving Average) callback.
Owns the full EMA lifecycle: creation, per-step updates, weight
swapping for validation, and checkpoint state. All EMA config
lives under callbacks.ema in the YAML file.
Classes¶
fastvideo.train.callbacks.ema.EMACallback
¶
Bases: Callback
Manage EMA shadow weights for the student transformer.
All configuration lives in the YAML callbacks.ema section:
.. code-block:: yaml

    callbacks:
      ema:
        decay: 0.9999
        start_iter: 0
The callback creates an EMA_FSDP instance at train start,
updates it after each optimizer step, and exposes an
ema_context() context manager for temporarily swapping
EMA weights into the live model (used by validation).
Source code in fastvideo/train/callbacks/ema.py
fastvideo.train.callbacks.ema.EMACallback.ema_context
¶ema_context(transformer: Module) -> Generator[Module, None, None]
Temporarily swap EMA weights into transformer.
If EMA is not active, yields the transformer unchanged.
Source code in fastvideo/train/callbacks/ema.py
Functions¶
fastvideo.train.callbacks.grad_clip
¶
Gradient norm clipping callback.
Clips gradients on modules returned by
method.get_grad_clip_targets() before the optimizer step.
Optionally logs per-module grad norms to the tracker.
fastvideo.train.callbacks.validation
¶
Validation callback.
All configuration is read from the YAML callbacks.validation
section. The pipeline class is resolved from
pipeline_target.
Classes¶
fastvideo.train.callbacks.validation.ValidationCallback
¶ValidationCallback(*, pipeline_target: str, dataset_file: str, every_steps: int = 100, sampling_steps: list[int] | None = None, guidance_scale: float | None = None, num_frames: int | None = None, output_dir: str | None = None, sampling_timesteps: list[int] | None = None, **pipeline_kwargs: Any)
Bases: Callback
Generic validation callback driven entirely by YAML config.
Works with any pipeline that follows the
PipelineCls.from_pretrained(...) + pipeline.forward()
contract.
Source code in fastvideo/train/callbacks/validation.py
Functions¶
fastvideo.train.entrypoint
¶
Modules¶
fastvideo.train.entrypoint.dcp_to_diffusers
¶
Convert a DCP training checkpoint to a diffusers-style model directory.
Works on a single GPU regardless of how many GPUs were used for training (DCP handles resharding automatically).
Usage (no torchrun needed)::
python -m fastvideo.train.entrypoint.dcp_to_diffusers --checkpoint /path/to/checkpoint-1000 --output-dir /path/to/diffusers_output
Or with torchrun (also fine)::
torchrun --nproc_per_node=1 -m fastvideo.train.entrypoint.dcp_to_diffusers --checkpoint ... --output-dir ...
The checkpoint must contain metadata.json (written by
CheckpointManager). If the checkpoint predates metadata
support, pass --config explicitly to provide the training
YAML.
Functions¶
fastvideo.train.entrypoint.dcp_to_diffusers.convert
¶convert(*, checkpoint_dir: str, output_dir: str, config_path: str | None = None, role: str = 'student', overwrite: bool = False) -> str
Load a DCP checkpoint and export as a diffusers model.
Returns the path to the exported model directory.
Source code in fastvideo/train/entrypoint/dcp_to_diffusers.py
fastvideo.train.entrypoint.misc
¶
Modules¶
fastvideo.train.entrypoint.misc.wan_ode_init_conversion
¶Convert Self-Forcing ode_init.pt to HuggingFace diffusers format.
The official ode_init.pt from
https://huggingface.co/gdhe17/Self-Forcing/resolve/main/checkpoints/ode_init.pt
stores weights under {"generator": {<original_wan_keys>}}.
This script converts those keys to diffusers
WanTransformer3DModel format, verifies them against a reference
model, and saves a complete diffusers-compatible model directory
(transformer + scheduler + vae + text_encoder + tokenizer).
Usage
python -m fastvideo.train.entrypoint.misc.wan_ode_init_conversion --input /path/to/ode_init.pt --output /path/to/WanOdeInit --base-model Wan-AI/Wan2.1-T2V-1.3B-Diffusers
fastvideo.train.entrypoint.train
¶
YAML-only training entrypoint.
Usage::
torchrun --nproc_per_node=<N> -m fastvideo.train.entrypoint.train --config path/to/run.yaml
Any unknown --dotted.key value arguments are applied as
overrides to the YAML config before parsing. For example::
torchrun --nproc_per_node=8 -m fastvideo.train.entrypoint.train --config path/to/run.yaml --training.distributed.num_gpus 8 --training.optimizer.learning_rate 1e-5
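Applying such dotted overrides to a nested config dict can be sketched in a few lines (illustrative only; the actual parsing in `fastvideo.train.entrypoint.train` may differ, e.g. in how string values are coerced to numbers):

```python
def apply_override(cfg, dotted_key, value):
    """Set cfg['a']['b']['c'] = value for dotted_key 'a.b.c', creating dicts as needed."""
    *parents, leaf = dotted_key.split(".")
    node = cfg
    for p in parents:
        node = node.setdefault(p, {})
    node[leaf] = value

cfg = {"training": {"optimizer": {"learning_rate": 1e-4}}}
apply_override(cfg, "training.optimizer.learning_rate", 1e-5)
apply_override(cfg, "training.distributed.num_gpus", 8)
print(cfg["training"]["optimizer"]["learning_rate"])  # 1e-05
```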
Functions¶
fastvideo.train.entrypoint.train.run_training_from_config
¶run_training_from_config(config_path: str, *, dry_run: bool = False, overrides: list[str] | None = None) -> None
YAML-only training entrypoint (schema v2).
Source code in fastvideo/train/entrypoint/train.py
fastvideo.train.methods
¶
Classes¶
fastvideo.train.methods.TrainingMethod
¶
Bases: Module, ABC
Base training method (algorithm layer).
Subclasses own their role models (student, teacher, critic, …) as
plain attributes and manage optimizers directly — no RoleManager
or RoleHandle.
The constructor receives role_models (a dict[str, ModelBase])
and a cfg object. It calls init_preprocessors on the student
and builds self.role_modules for FSDP wrapping.
A single shared CUDA RNG generator (cuda_generator) is
created in :meth:on_train_start. All torch.randn /
torch.randint calls in methods and models must use this
generator instead of relying on global RNG state.
Source code in fastvideo/train/methods/base.py
Functions¶
fastvideo.train.methods.TrainingMethod.checkpoint_state
¶Return DCP-ready checkpoint state for all trainable roles.
Keys follow the convention:
roles.<role>.<module>, optimizers.<role>,
schedulers.<role>, random_state.*.
EMA state is managed by the EMACallback and is
checkpointed through the callback state mechanism.
Source code in fastvideo/train/methods/base.py
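The key convention can be illustrated by flattening toy per-role dicts (plain dicts stand in for DCP-ready state objects; the `random_state.*` entries are omitted for brevity):

```python
def checkpoint_state(roles, optimizers, schedulers):
    """Flatten role/optimizer/scheduler state into dotted checkpoint keys."""
    state = {}
    for role, modules in roles.items():
        for name, mod_state in modules.items():
            state[f"roles.{role}.{name}"] = mod_state
    for role, opt_state in optimizers.items():
        state[f"optimizers.{role}"] = opt_state
    for role, sched_state in schedulers.items():
        state[f"schedulers.{role}"] = sched_state
    return state

state = checkpoint_state(
    roles={"student": {"transformer": {"w": 1}}},
    optimizers={"student": {"step": 10}},
    schedulers={"student": {"last_lr": 1e-5}},
)
print(sorted(state))  # ['optimizers.student', 'roles.student.transformer', 'schedulers.student']
```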
fastvideo.train.methods.TrainingMethod.get_grad_clip_targets
¶Return modules whose gradients should be clipped.
Override in subclasses to add or conditionally include modules (e.g. the critic, or conditionally the student). Default: the student transformer.
Source code in fastvideo/train/methods/base.py
fastvideo.train.methods.TrainingMethod.seed_optimizer_state_for_resume
¶Seed optimizer state so DCP can load saved state.
A fresh optimizer has empty state (exp_avg, exp_avg_sq, and step are created only on the first optimizer.step()). DCP needs matching entries to load into; without them the saved optimizer state is silently dropped.
Source code in fastvideo/train/methods/base.py
Modules¶
fastvideo.train.methods.base
¶
Classes¶
fastvideo.train.methods.base.TrainingMethod
¶
Bases: Module, ABC
Base training method (algorithm layer).
Subclasses own their role models (student, teacher, critic, …) as
plain attributes and manage optimizers directly — no RoleManager
or RoleHandle.
The constructor receives role_models (a dict[str, ModelBase])
and a cfg object. It calls init_preprocessors on the student
and builds self.role_modules for FSDP wrapping.
A single shared CUDA RNG generator (cuda_generator) is
created in :meth:on_train_start. All torch.randn /
torch.randint calls in methods and models must use this
generator instead of relying on global RNG state.
Source code in fastvideo/train/methods/base.py
fastvideo.train.methods.base.TrainingMethod.checkpoint_state
¶Return DCP-ready checkpoint state for all trainable roles.
Keys follow the convention:
roles.<role>.<module>, optimizers.<role>,
schedulers.<role>, random_state.*.
EMA state is managed by the EMACallback and is
checkpointed through the callback state mechanism.
Source code in fastvideo/train/methods/base.py
fastvideo.train.methods.base.TrainingMethod.get_grad_clip_targets
¶Return modules whose gradients should be clipped.
Override in subclasses to add or conditionally include modules (e.g. the critic, or conditionally the student). Default: the student transformer.
Source code in fastvideo/train/methods/base.py
fastvideo.train.methods.base.TrainingMethod.seed_optimizer_state_for_resume
¶Seed optimizer state so DCP can load saved state.
A fresh optimizer has empty state (exp_avg, exp_avg_sq, and step are created only on the first optimizer.step()). DCP needs matching entries to load into; without them the saved optimizer state is silently dropped.
Source code in fastvideo/train/methods/base.py
Functions¶
fastvideo.train.methods.distribution_matching
¶
Classes¶
fastvideo.train.methods.distribution_matching.DMD2Method
¶
Bases: TrainingMethod
DMD2 distillation algorithm (method layer).
Owns role model instances directly:
- self.student — trainable student :class:ModelBase
- self.teacher — frozen teacher :class:ModelBase
- self.critic — trainable critic :class:ModelBase
Source code in fastvideo/train/methods/distribution_matching/dmd2.py
fastvideo.train.methods.distribution_matching.SelfForcingMethod
¶
Bases: DMD2Method
Self-Forcing DMD2 (distribution matching) method.
Requires a causal student implementing CausalModelBase.
Source code in fastvideo/train/methods/distribution_matching/self_forcing.py
Modules¶
fastvideo.train.methods.distribution_matching.dmd2
¶DMD2 distillation method (algorithm layer).
fastvideo.train.methods.distribution_matching.dmd2.DMD2Method
¶
Bases: TrainingMethod
DMD2 distillation algorithm (method layer).
Owns role model instances directly:
- self.student — trainable student :class:ModelBase
- self.teacher — frozen teacher :class:ModelBase
- self.critic — trainable critic :class:ModelBase
Source code in fastvideo/train/methods/distribution_matching/dmd2.py
fastvideo.train.methods.distribution_matching.self_forcing
¶Self-Forcing distillation method (algorithm layer).
fastvideo.train.methods.distribution_matching.self_forcing.SelfForcingMethod
¶
Bases: DMD2Method
Self-Forcing DMD2 (distribution matching) method.
Requires a causal student implementing CausalModelBase.
Source code in fastvideo/train/methods/distribution_matching/self_forcing.py
fastvideo.train.methods.fine_tuning
¶
Classes¶
fastvideo.train.methods.fine_tuning.DiffusionForcingSFTMethod
¶
Bases: TrainingMethod
Diffusion-forcing SFT (DFSFT): train only the student with inhomogeneous timesteps.
Source code in fastvideo/train/methods/fine_tuning/dfsft.py
fastvideo.train.methods.fine_tuning.FineTuneMethod
¶
Bases: TrainingMethod
Supervised finetuning: only the student participates.
Source code in fastvideo/train/methods/fine_tuning/finetune.py
Modules¶
fastvideo.train.methods.fine_tuning.dfsft
¶Diffusion-forcing SFT method (DFSFT; algorithm layer).
fastvideo.train.methods.fine_tuning.dfsft.DiffusionForcingSFTMethod
¶
Bases: TrainingMethod
Diffusion-forcing SFT (DFSFT): train only the student with inhomogeneous timesteps.
Source code in fastvideo/train/methods/fine_tuning/dfsft.py
fastvideo.train.methods.fine_tuning.finetune
¶Supervised finetuning method (algorithm layer).
fastvideo.train.methods.fine_tuning.finetune.FineTuneMethod
¶
Bases: TrainingMethod
Supervised finetuning: only the student participates.
Source code in fastvideo/train/methods/fine_tuning/finetune.py
fastvideo.train.methods.knowledge_distillation
¶
Classes¶
fastvideo.train.methods.knowledge_distillation.KDCausalMethod
¶
Bases: KDMethod
KD for causal Wan: per-frame block-quantized timestep sampling.
Identical to :class:KDMethod except single_train_step samples
a per-frame denoising step index (block-quantized to groups of
num_frames_per_block frames) instead of one index per batch.
This matches the legacy ODEInitTrainingPipeline training scheme
required by causal / streaming student models.
Additional YAML field under method::

    num_frames_per_block: 3  # frames sharing the same noise level
Source code in fastvideo/train/methods/knowledge_distillation/kd.py
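Block-quantized sampling means each group of num_frames_per_block consecutive frames shares one randomly drawn denoising-step index. A sketch under that reading (the actual sampling logic in KDCausalMethod is not shown here):

```python
import random

def sample_block_quantized_steps(num_frames, num_frames_per_block, num_steps, rng):
    """Draw one step index per block and repeat it for every frame in the block."""
    steps = []
    for _ in range(0, num_frames, num_frames_per_block):
        idx = rng.randrange(num_steps)
        steps.extend([idx] * num_frames_per_block)
    return steps[:num_frames]

rng = random.Random(0)
steps = sample_block_quantized_steps(num_frames=9, num_frames_per_block=3,
                                     num_steps=4, rng=rng)
# Frames within each block of 3 share the same step index.
assert steps[0] == steps[1] == steps[2]
print(len(steps))  # 9
```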
fastvideo.train.methods.knowledge_distillation.KDMethod
¶
Bases: TrainingMethod
Knowledge Distillation training method.
Trains the student with MSE loss on teacher ODE trajectories cached
to method_config.teacher_path_cache.
Roles
- student (required, trainable): the model being distilled.
- teacher (optional, non-trainable): used to generate the cache on first run; freed from GPU memory afterwards.

If the cache is incomplete and no teacher is configured, an error is raised at the start of training.
Source code in fastvideo/train/methods/knowledge_distillation/kd.py
Modules¶
fastvideo.train.methods.knowledge_distillation.kd
¶Knowledge Distillation method for ODE-init training.
Trains a student model with MSE loss to reproduce a teacher model's
multi-step ODE denoising trajectories. The resulting checkpoint
(exported via dcp_to_diffusers) serves as the ode_init weight
initialization for downstream Self-Forcing training.
Teacher path generation is cached to disk so it only runs once. Interrupted generation resumes from the last completed sample.
Typical YAML::

    models:
      student:
        _target_: fastvideo.train.models.wan.WanModel
        init_from: Wan-AI/Wan2.1-T2V-1.3B-Diffusers
        trainable: true
      teacher:  # omit once cache is complete
        _target_: fastvideo.train.models.wan.WanModel
        init_from: Wan-AI/Wan2.1-T2V-14B-Diffusers
        trainable: false
        disable_custom_init_weights: true
    method:
      _target_: fastvideo.train.methods.knowledge_distillation.kd.KDMethod
      teacher_path_cache: /data/kd_cache/wan14b_4step
      t_list: [999, 937, 833, 624, 0]  # integer timesteps
      student_sample_steps: 4
      teacher_guidance_scale: 1.0
fastvideo.train.methods.knowledge_distillation.kd.KDCausalMethod
¶
Bases: KDMethod
KD for causal Wan: per-frame block-quantized timestep sampling.
Identical to :class:KDMethod except single_train_step samples
a per-frame denoising step index (block-quantized to groups of
num_frames_per_block frames) instead of one index per batch.
This matches the legacy ODEInitTrainingPipeline training scheme
required by causal / streaming student models.
Additional YAML field under method::

    num_frames_per_block: 3  # frames sharing the same noise level
Source code in fastvideo/train/methods/knowledge_distillation/kd.py
fastvideo.train.methods.knowledge_distillation.kd.KDMethod
¶
Bases: TrainingMethod
Knowledge Distillation training method.
Trains the student with MSE loss on teacher ODE trajectories cached
to method_config.teacher_path_cache.
Roles
- student (required, trainable): the model being distilled.
- teacher (optional, non-trainable): used to generate the cache on first run; freed from GPU memory afterwards.

If the cache is incomplete and no teacher is configured, an error is raised at the start of training.
Source code in fastvideo/train/methods/knowledge_distillation/kd.py
fastvideo.train.models
¶
Model build plugins for Phase 2/2.9 distillation.
These are "model plugins" selected by recipe.family / roles.<role>.family.
Modules¶
fastvideo.train.models.base
¶
Classes¶
fastvideo.train.models.base.CausalModelBase
¶
Bases: ModelBase
Extension for causal / streaming model plugins.
Cache state is internal to the model instance and keyed by cache_tag (no role handle needed).
fastvideo.train.models.base.CausalModelBase.clear_caches
abstractmethod
¶clear_caches(*, cache_tag: str = 'pos') -> None
fastvideo.train.models.base.CausalModelBase.predict_noise_streaming
abstractmethod
¶predict_noise_streaming(noisy_latents: Tensor, timestep: Tensor, batch: TrainingBatch, *, conditional: bool, cache_tag: str = 'pos', store_kv: bool = False, cur_start_frame: int = 0, cfg_uncond: dict[str, Any] | None = None, attn_kind: Literal['dense', 'vsa'] = 'dense') -> Tensor | None
Streaming predict-noise that may update internal caches.
Source code in fastvideo/train/models/base.py
fastvideo.train.models.base.CausalModelBase.predict_x0_streaming
¶predict_x0_streaming(noisy_latents: Tensor, timestep: Tensor, batch: TrainingBatch, *, conditional: bool, cache_tag: str = 'pos', store_kv: bool = False, cur_start_frame: int = 0, cfg_uncond: dict[str, Any] | None = None, attn_kind: Literal['dense', 'vsa'] = 'dense') -> Tensor | None
Predict x0 streaming via
predict_noise_streaming + conversion.
Source code in fastvideo/train/models/base.py
fastvideo.train.models.base.ModelBase
¶
Bases: ABC
Per-role model instance.
Every role (student, teacher, critic, …) gets its own ModelBase
instance. Each instance owns its own transformer and
noise_scheduler. Heavyweight resources (VAE, dataloader, RNG
seeds) are loaded lazily via :meth:init_preprocessors, which the
method calls only on the student.
fastvideo.train.models.base.ModelBase.device
property
¶The local CUDA device for this rank.
fastvideo.train.models.base.ModelBase.num_train_timesteps
property
¶num_train_timesteps: int
Return the scheduler's training timestep horizon.
fastvideo.train.models.base.ModelBase.add_noise
abstractmethod
¶
fastvideo.train.models.base.ModelBase.backward
abstractmethod
¶
fastvideo.train.models.base.ModelBase.init_preprocessors
¶Load VAE, build dataloader, seed RNGs.
Called only on the student by the method's __init__.
Default is a no-op so teacher/critic instances skip this.
Source code in fastvideo/train/models/base.py
fastvideo.train.models.base.ModelBase.on_train_start
¶
fastvideo.train.models.base.ModelBase.predict_noise
abstractmethod
¶predict_noise(noisy_latents: Tensor, timestep: Tensor, batch: TrainingBatch, *, conditional: bool, cfg_uncond: dict[str, Any] | None = None, attn_kind: Literal['dense', 'vsa'] = 'dense') -> Tensor
Predict noise/flow for the given noisy latents.
Source code in fastvideo/train/models/base.py
fastvideo.train.models.base.ModelBase.predict_x0
¶predict_x0(noisy_latents: Tensor, timestep: Tensor, batch: TrainingBatch, *, conditional: bool, cfg_uncond: dict[str, Any] | None = None, attn_kind: Literal['dense', 'vsa'] = 'dense') -> Tensor
Predict x0 via predict_noise + conversion.
Source code in fastvideo/train/models/base.py
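One common conversion, assuming a flow-matching parameterization where x_t = (1 − σ_t)·x0 + σ_t·ε and the model predicts the flow v = ε − x0, recovers x0 as x_t − σ_t·v. A scalar sketch under that assumption (the actual conversion lives in the model's noise_scheduler and may differ):

```python
def x0_from_flow(x_t, v_pred, sigma_t):
    # Flow-matching: x_t = (1 - sigma)*x0 + sigma*eps, v = eps - x0  =>  x0 = x_t - sigma*v
    return x_t - sigma_t * v_pred

# Round-trip check with known x0 and noise.
x0, eps, sigma = 2.0, -1.0, 0.25
x_t = (1 - sigma) * x0 + sigma * eps   # 1.25
v = eps - x0                           # -3.0
print(x0_from_flow(x_t, v, sigma))     # 2.0
```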
fastvideo.train.models.base.ModelBase.prepare_batch
abstractmethod
¶prepare_batch(raw_batch: dict[str, Any], *, generator: Generator, latents_source: Literal['data', 'zeros'] = 'data') -> TrainingBatch
Convert a dataloader batch into forward primitives.
fastvideo.train.models.base.ModelBase.shift_and_clamp_timestep
¶
Functions¶
fastvideo.train.models.wan
¶
Wan model plugin package.
Classes¶
Modules¶
fastvideo.train.models.wan.wan
¶Wan model plugin (per-role instance).
fastvideo.train.models.wan.wan.WanModel
¶WanModel(*, init_from: str, training_config: TrainingConfig, trainable: bool = True, disable_custom_init_weights: bool = False, flow_shift: float = 3.0, enable_gradient_checkpointing_type: str | None = None, transformer_override_safetensor: str | None = None)
Bases: ModelBase
Wan per-role model: owns transformer + noise_scheduler.
Source code in fastvideo/train/models/wan/wan.py
fastvideo.train.models.wan.wan_causal
¶Wan causal model plugin (per-role instance, streaming/cache).
fastvideo.train.models.wan.wan_causal.WanCausalModel
¶WanCausalModel(*, init_from: str, training_config: TrainingConfig, trainable: bool = True, disable_custom_init_weights: bool = False, flow_shift: float = 3.0, enable_gradient_checkpointing_type: str | None = None, transformer_override_safetensor: str | None = None)
Bases: WanModel, CausalModelBase
Wan per-role model with causal/streaming primitives.
Source code in fastvideo/train/models/wan/wan_causal.py
fastvideo.train.utils
¶
Distillation utilities shared across families/methods/entrypoints.
Modules¶
fastvideo.train.utils.builder
¶
Assembly: build method + dataloader from a _target_-based config.
Classes¶
Functions¶
fastvideo.train.utils.builder.build_from_config
¶build_from_config(cfg: RunConfig) -> tuple[TrainingConfig, TrainingMethod, Any, int]
Build method + dataloader from a v3 run config.
- Instantiate each model in cfg.models via _target_.
- Resolve the method class from cfg.method["_target_"] and construct it with (cfg=cfg, role_models=...).
- Return (training_args, method, dataloader, start_step).
Source code in fastvideo/train/utils/builder.py
fastvideo.train.utils.checkpoint
¶
Classes¶
fastvideo.train.utils.checkpoint.CheckpointManager
¶CheckpointManager(*, method: Any, dataloader: Any, output_dir: str, config: CheckpointConfig, callbacks: Any | None = None, raw_config: dict[str, Any] | None = None)
Role-based checkpoint manager for training runtime.
- Checkpoint policy lives in YAML (via TrainingArgs fields).
- Resume path is typically provided via CLI (
--resume-from-checkpoint).
Source code in fastvideo/train/utils/checkpoint.py
fastvideo.train.utils.checkpoint.CheckpointManager.load_metadata
staticmethod
¶Read metadata.json from a checkpoint dir.
Source code in fastvideo/train/utils/checkpoint.py
fastvideo.train.utils.checkpoint.CheckpointManager.load_rng_snapshot
¶load_rng_snapshot(checkpoint_path: str) -> None
Restore per-rank RNG state from the snapshot file.
Must be called AFTER dcp.load and after
iter(dataloader) so no later operation can
clobber the restored state.
Source code in fastvideo/train/utils/checkpoint.py
Functions¶
fastvideo.train.utils.config
¶
Training run config (_target_ based YAML).
Classes¶
fastvideo.train.utils.config.RunConfig
dataclass
¶RunConfig(models: dict[str, dict[str, Any]], method: dict[str, Any], training: TrainingConfig, callbacks: dict[str, dict[str, Any]], raw: dict[str, Any])
Parsed run config loaded from YAML.
fastvideo.train.utils.config.RunConfig.resolved_config
¶Return a fully-resolved config dict with defaults.
Suitable for logging to W&B so that every parameter (including defaults) is visible.
Source code in fastvideo/train/utils/config.py
Functions¶
fastvideo.train.utils.config.load_run_config
¶Load a run config from YAML.
Expected top-level keys: models, method,
training (nested), and optionally callbacks
and pipeline.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | str | Path to the YAML config file. | required |
| overrides | list[str] \| None | Optional list of CLI override tokens (dotted --key value pairs). | None |
Source code in fastvideo/train/utils/config.py
fastvideo.train.utils.config.require_bool
¶require_bool(mapping: dict[str, Any], key: str, *, default: bool | None = None, where: str | None = None) -> bool
Read a bool value.
Source code in fastvideo/train/utils/config.py
fastvideo.train.utils.config.require_choice
¶require_choice(mapping: dict[str, Any], key: str, choices: set[str] | frozenset[str], *, default: str | None = None, where: str | None = None) -> str
Read a string that must be one of choices.
Source code in fastvideo/train/utils/config.py
fastvideo.train.utils.config.require_non_negative_float
¶require_non_negative_float(mapping: dict[str, Any], key: str, *, default: float | None = None, where: str | None = None) -> float
Read a float that must be >= 0.
Source code in fastvideo/train/utils/config.py
fastvideo.train.utils.config.require_non_negative_int
¶require_non_negative_int(mapping: dict[str, Any], key: str, *, default: int | None = None, where: str | None = None) -> int
Read an int that must be >= 0.
Source code in fastvideo/train/utils/config.py
fastvideo.train.utils.config.require_positive_int
¶require_positive_int(mapping: dict[str, Any], key: str, *, default: int | None = None, where: str | None = None) -> int
Read an int that must be > 0.
Source code in fastvideo/train/utils/config.py
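These require_* helpers share one shape: read a key, fall back to an optional default, and fail loudly with the config location in the message. A sketch of require_positive_int under that reading (error wording is illustrative):

```python
def require_positive_int(mapping, key, *, default=None, where=None):
    """Read an int > 0 from mapping[key], falling back to default, else raise."""
    value = mapping.get(key, default)
    loc = f"{where}.{key}" if where else key
    if value is None:
        raise ValueError(f"missing required config key: {loc}")
    # bool is a subclass of int, so reject it explicitly.
    if not isinstance(value, int) or isinstance(value, bool) or value <= 0:
        raise ValueError(f"{loc} must be a positive int, got {value!r}")
    return value

print(require_positive_int({"every_steps": 100}, "every_steps",
                           where="callbacks.validation"))  # 100
```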
fastvideo.train.utils.dataloader
¶
Functions¶
fastvideo.train.utils.dataloader.build_parquet_t2v_train_dataloader
¶build_parquet_t2v_train_dataloader(data_config: DataConfig, *, text_len: int, parquet_schema: Any) -> Any
Build a parquet dataloader for T2V-style datasets.
Source code in fastvideo/train/utils/dataloader.py
fastvideo.train.utils.instantiate
¶
_target_-based instantiation utilities.
These helpers resolve a dotted Python path to a class and instantiate it,
filtering constructor kwargs through inspect.signature so that only
recognized parameters are forwarded. Unrecognized keys emit a warning
rather than raising — this keeps YAML configs forward-compatible when
a class drops a parameter in a later version.
Functions¶
fastvideo.train.utils.instantiate.instantiate
¶Instantiate the class specified by cfg["_target_"].
All remaining keys in cfg (minus _target_) plus any extra
keyword arguments are forwarded to the constructor. Keys that do
not match an __init__ parameter are silently warned about and
dropped, so callers can safely pass a superset.
Source code in fastvideo/train/utils/instantiate.py
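The _target_ pattern boils down to importlib plus signature filtering. A self-contained sketch using a stdlib class as the target (the real helper's exact behavior may differ):

```python
import importlib
import inspect
import warnings

def resolve_target(target):
    """Import and return the object at a fully-qualified dotted path."""
    module_path, _, attr = target.rpartition(".")
    return getattr(importlib.import_module(module_path), attr)

def instantiate(cfg, **extra):
    """Build cfg['_target_'], warning about and dropping unrecognized kwargs."""
    cls = resolve_target(cfg["_target_"])
    kwargs = {k: v for k, v in cfg.items() if k != "_target_"} | extra
    accepted = set(inspect.signature(cls).parameters)
    dropped = set(kwargs) - accepted
    if dropped:
        warnings.warn(f"dropping unrecognized kwargs: {sorted(dropped)}")
    return cls(**{k: v for k, v in kwargs.items() if k in accepted})

# 'bogus_option' is not a JSONDecoder parameter: warned about and dropped.
obj = instantiate({"_target_": "json.JSONDecoder", "strict": False, "bogus_option": 1})
print(obj.strict)  # False
```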
fastvideo.train.utils.instantiate.resolve_target
¶Import and return the class (or callable) at target.
target must be a fully-qualified dotted path, e.g.
"fastvideo.train.models.wan.wan.WanModel".
Source code in fastvideo/train/utils/instantiate.py
fastvideo.train.utils.module_state
¶
fastvideo.train.utils.moduleloader
¶
Classes¶
Functions¶
fastvideo.train.utils.moduleloader.load_module_from_path
¶load_module_from_path(*, model_path: str, module_type: str, training_config: TrainingConfig, disable_custom_init_weights: bool = False, override_transformer_cls_name: str | None = None, transformer_override_safetensor: str | None = None) -> Module
Load a single pipeline component module.
Accepts a TrainingConfig and internally builds the
TrainingArgs needed by PipelineComponentLoader.
Source code in fastvideo/train/utils/moduleloader.py
fastvideo.train.utils.moduleloader.make_inference_args
¶make_inference_args(tc: TrainingConfig, *, model_path: str) -> TrainingArgs
Build a TrainingArgs for inference (validation / pipelines).
Source code in fastvideo/train/utils/moduleloader.py
fastvideo.train.utils.optimizer
¶
Functions¶
fastvideo.train.utils.optimizer.build_optimizer_and_scheduler
¶build_optimizer_and_scheduler(*, params: list[Parameter], optimizer_config: OptimizerConfig, loop_config: TrainingLoopConfig, learning_rate: float, betas: tuple[float, float], scheduler_name: str) -> tuple[Optimizer, object]
Build an AdamW optimizer and LR scheduler.
Returns (optimizer, lr_scheduler) so the caller can store them
as method-level attributes.
Source code in fastvideo/train/utils/optimizer.py
fastvideo.train.utils.tracking
¶
Functions¶
fastvideo.train.utils.tracking.build_tracker
¶build_tracker(tracker_config: TrackerConfig, checkpoint_config: CheckpointConfig, *, config: dict[str, Any] | None) -> Any
Build a tracker instance for a distillation run.