kd¶
Knowledge Distillation method for ODE-init training.
Trains a student model with MSE loss to reproduce a teacher model's
multi-step ODE denoising trajectories. The resulting checkpoint
(exported via dcp_to_diffusers) serves as the ode_init weight
initialization for downstream Self-Forcing training.
Teacher path generation is cached to disk so it only runs once. Interrupted generation resumes from the last completed sample.
Typical YAML::

    models:
      student:
        _target_: fastvideo.train.models.wan.WanModel
        init_from: Wan-AI/Wan2.1-T2V-1.3B-Diffusers
        trainable: true
      teacher:                      # omit once cache is complete
        _target_: fastvideo.train.models.wan.WanModel
        init_from: Wan-AI/Wan2.1-T2V-14B-Diffusers
        trainable: false
        disable_custom_init_weights: true
    method:
      _target_: fastvideo.train.methods.knowledge_distillation.kd.KDMethod
      teacher_path_cache: /data/kd_cache/wan14b_4step
      t_list: [999, 937, 833, 624, 0]   # integer timesteps
      student_sample_steps: 4
      teacher_guidance_scale: 1.0
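The core training step described above (MSE against the teacher's cached ODE trajectory) can be sketched as follows. This is a minimal illustration under assumed names (`kd_train_step`, the trajectory layout, and the student call signature are hypothetical, not the actual `KDMethod` code):

```python
import torch

def kd_train_step(student, cached_trajectory, t_list):
    """One hypothetical KD step: MSE between the student's prediction
    and the teacher's cached ODE state at a randomly chosen timestep.

    cached_trajectory[i] is assumed to hold the teacher latent at
    timestep t_list[i], so consecutive entries form one ODE step.
    """
    # Pick a random denoising step (never the last entry, which has
    # no successor state to regress onto).
    i = torch.randint(0, len(t_list) - 1, (1,)).item()
    x_t, x_next = cached_trajectory[i], cached_trajectory[i + 1]
    t = torch.tensor([t_list[i]])
    pred = student(x_t, t)  # student predicts the next ODE state
    return torch.nn.functional.mse_loss(pred, x_next)
```

Because the teacher states are read from `teacher_path_cache` rather than recomputed, the teacher model itself is not needed at this point.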
Classes¶
fastvideo.train.methods.knowledge_distillation.kd.KDCausalMethod¶
Bases: KDMethod
KD for causal Wan: per-frame block-quantized timestep sampling.
Identical to KDMethod except that single_train_step samples a per-frame denoising step index (block-quantized to groups of num_frames_per_block frames) instead of a single index per batch.
This matches the legacy ODEInitTrainingPipeline training scheme
required by causal / streaming student models.
Additional YAML field under method::

    num_frames_per_block: 3   # frames sharing the same noise level
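Block-quantized per-frame sampling can be sketched like this. It is an illustrative helper (the function name and signature are assumptions), showing only the index-sampling scheme, not the full single_train_step:

```python
import torch

def sample_blockwise_indices(num_frames, num_frames_per_block, num_steps):
    """Draw one denoising-step index per block of frames, then expand it
    so every frame within a block shares the same noise level.

    Sketch of block-quantized timestep sampling; not the actual
    KDCausalMethod implementation.
    """
    # Number of blocks, rounding up so a partial trailing block is covered.
    num_blocks = (num_frames + num_frames_per_block - 1) // num_frames_per_block
    block_idx = torch.randint(0, num_steps, (num_blocks,))
    # Expand each block's index to its frames, trimming any overhang.
    return block_idx.repeat_interleave(num_frames_per_block)[:num_frames]
```

With num_frames=9 and num_frames_per_block=3, this yields nine indices in three constant groups of three, matching the "frames sharing the same noise level" comment above.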
Source code in fastvideo/train/methods/knowledge_distillation/kd.py
fastvideo.train.methods.knowledge_distillation.kd.KDMethod¶
Bases: TrainingMethod
Knowledge Distillation training method.
Trains the student with MSE loss on teacher ODE trajectories cached
to method_config.teacher_path_cache.
Roles

- student (required, trainable): the model being distilled.
- teacher (optional, non-trainable): used to generate the cache on first run; freed from GPU memory afterwards.
If the cache is incomplete and no teacher is configured, an error is raised at the start of training.