rl
RL training methods.
Classes
fastvideo.train.methods.rl.DiffusionNFTMethod
Bases: TrainingMethod
DiffusionNFT-style RL for diffusion models.
This method owns the algorithm's sample-then-inner-train loop. One Trainer step corresponds to one DiffusionNFT outer epoch.
Source code in fastvideo/train/methods/rl/diffusion_nft.py
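The sample-then-inner-train structure of one outer epoch can be sketched without any model code. This is a minimal, framework-free sketch: sample_fn, reward_fn, train_step_fn, num_inner_epochs and the per-prompt (group-relative) reward normalization are hypothetical stand-ins chosen for illustration. They are not FastVideo's interface, and DiffusionNFT's actual loss and advantage mapping are not reproduced here.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Callable, Sequence


def group_normalized_advantages(
    prompt_ids: Sequence[int], rewards: Sequence[float], eps: float = 1e-6
) -> list[float]:
    """Normalize each reward against the other samples of the same prompt (assumed scheme)."""
    groups: dict[int, list[float]] = defaultdict(list)
    for pid, r in zip(prompt_ids, rewards):
        groups[pid].append(r)
    stats = {pid: (mean(rs), pstdev(rs)) for pid, rs in groups.items()}
    return [(r - stats[pid][0]) / (stats[pid][1] + eps) for pid, r in zip(prompt_ids, rewards)]


def outer_epoch(
    prompt_ids: Sequence[int],
    sample_fn: Callable[[Sequence[int]], list],
    reward_fn: Callable[[list, Sequence[int]], list[float]],
    train_step_fn: Callable[[list, list[float]], float],
    num_inner_epochs: int = 1,
) -> list[float]:
    """Hypothetical outer epoch: one call corresponds to one Trainer step."""
    # 1) Sample: roll out the current policy once for the whole batch.
    samples = sample_fn(prompt_ids)
    # 2) Score the samples and turn rewards into per-sample advantages.
    rewards = reward_fn(samples, prompt_ids)
    advantages = group_normalized_advantages(prompt_ids, rewards)
    # 3) Inner training: reuse the same sampled batch for several passes.
    losses = []
    for _ in range(num_inner_epochs):
        losses.append(train_step_fn(samples, advantages))
    return losses
```

Reusing one sampled batch for several inner passes amortizes the cost of generation, which is the expensive part of online RL for diffusion models.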
Modules
fastvideo.train.methods.rl.common
Reusable RL training primitives.
Classes
fastvideo.train.methods.rl.common.DiffusionSampler
DiffusionSampler(config: SamplingConfig)
Thin model/scheduler sampler used by RL methods.
This intentionally does not call FastVideo's full inference pipelines. RL training needs a reusable sampling primitive that works with ModelBase wrappers and scheduler math, without binding a method to model-family pipeline classes such as WanDMDPipeline.
Source code in fastvideo/train/methods/rl/common/sampling.py
fastvideo.train.methods.rl.common.KRepeatSample (dataclass)
Local prompt indices for one distributed K-repeat sampling batch.
fastvideo.train.methods.rl.common.SamplingConfig (dataclass)
SamplingConfig(num_steps: int = 25, scheduler: SchedulerName = 'model_default', trajectory: TrajectoryName = 'ode', flow_shift: float | None = None, timesteps: list[float] | None = None, sigmas: list[float] | None = None)
YAML-backed sampling knobs shared by RL methods.
Functions
fastvideo.train.methods.rl.common.distributed_k_repeat_indices
distributed_k_repeat_indices(*, dataset_length: int, batch_size: int, repeats_per_prompt: int, world_size: int, rank: int, seed: int) -> KRepeatSample
Mirror DiffusionNFT's distributed K-repeat prompt sampler.
Adapted from DiffusionNFT's scripts/train_nft_sd3.py::DistributedKRepeatSampler.
Source code in fastvideo/train/methods/rl/common/prompt_sampling.py
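Assuming the upstream sampler follows the usual K-repeat scheme, each step draws world_size * batch_size / K distinct prompts, repeats each K times, shuffles the repeated list with a seed shared by all ranks, and gives each rank a contiguous slice of batch_size indices. The K copies of a prompt can therefore land on different ranks while every rank agrees on the layout without communicating. The sketch uses random.Random rather than a seeded torch generator, and KRepeatSampleSketch with its indices field is hypothetical, so its output will not match FastVideo's or DiffusionNFT's indices exactly.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class KRepeatSampleSketch:
    # Hypothetical stand-in for KRepeatSample: this rank's dataset row indices.
    indices: list[int]


def k_repeat_indices_sketch(
    *, dataset_length: int, batch_size: int, repeats_per_prompt: int,
    world_size: int, rank: int, seed: int,
) -> KRepeatSampleSketch:
    total = world_size * batch_size
    if total % repeats_per_prompt != 0:
        raise ValueError("world_size * batch_size must be divisible by repeats_per_prompt")
    num_prompts = total // repeats_per_prompt
    if num_prompts > dataset_length:
        raise ValueError("dataset has fewer prompts than one global batch needs")
    # Same seed on every rank, so all ranks compute the same global layout.
    rng = random.Random(seed)
    prompts = rng.sample(range(dataset_length), num_prompts)
    # Repeat each chosen prompt K times, then shuffle across the global batch.
    repeated = [p for p in prompts for _ in range(repeats_per_prompt)]
    rng.shuffle(repeated)
    # Each rank takes its own contiguous slice of the shuffled global batch.
    start = rank * batch_size
    return KRepeatSampleSketch(indices=repeated[start:start + batch_size])
```

Like the documented signature, the sketch has no epoch argument; a caller would presumably vary seed between steps to draw fresh prompts.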
fastvideo.train.methods.rl.common.media_to_video_array
media_to_video_array(media: Tensor) -> Any
Convert decoded media to a tracker video array.
Accepts [C, T, H, W] tensors; [C, H, W] tensors are treated as single-frame (T=1) media. The output follows the existing tracker convention used elsewhere in FastVideo: [T, C, H, W] uint8.
Source code in fastvideo/train/methods/rl/common/validation.py
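The conversion amounts to adding a time axis when needed, a transpose, and a dtype cast. The sketch below uses NumPy arrays in place of torch tensors and assumes decoded pixel values lie in [0, 1]; that input range and the name media_to_video_array_sketch are assumptions, not taken from the FastVideo source.

```python
import numpy as np


def media_to_video_array_sketch(media: np.ndarray) -> np.ndarray:
    if media.ndim == 3:
        # [C, H, W] -> [C, 1, H, W]: a single image becomes a one-frame video.
        media = media[:, None]
    if media.ndim != 4:
        raise ValueError(f"expected [C, T, H, W] or [C, H, W], got shape {media.shape}")
    # [C, T, H, W] -> [T, C, H, W], the tracker layout.
    video = np.transpose(media, (1, 0, 2, 3))
    # Assumed [0, 1] float input; clip first so out-of-range values saturate.
    return (np.clip(video, 0.0, 1.0) * 255.0).round().astype(np.uint8)
```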
fastvideo.train.methods.rl.common.validation_shard_indices
Return fixed validation prompt indices for one distributed rank.
Source code in fastvideo/train/methods/rl/common/validation.py
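Because the validation prompts are fixed, each rank can derive its share deterministically from its rank alone. A minimal sketch, assuming a strided (round-robin) split; the real function's parameters and split strategy are not documented here, so validation_shard_indices_sketch and its arguments are hypothetical.

```python
def validation_shard_indices_sketch(num_prompts: int, world_size: int, rank: int) -> list[int]:
    if not 0 <= rank < world_size:
        raise ValueError("rank must be in [0, world_size)")
    # Round-robin: rank r gets prompts r, r + world_size, r + 2 * world_size, ...
    # The union over all ranks covers every prompt exactly once.
    return list(range(rank, num_prompts, world_size))
```

Unlike the training-time prompt sampler, there is no seed: a given rank always scores the same prompts, which keeps validation results comparable across steps.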
Modules
fastvideo.train.methods.rl.common.prompt_sampling
Prompt-row sampling helpers for online RL methods.
This module chooses and repeats dataset prompt rows across ranks for RL training batches. Here, "sampling" means selection, not generator sampling.
Classes
fastvideo.train.methods.rl.common.prompt_sampling.KRepeatSample (dataclass)
Local prompt indices for one distributed K-repeat sampling batch.
Functions
fastvideo.train.methods.rl.common.prompt_sampling.distributed_k_repeat_indices
distributed_k_repeat_indices(*, dataset_length: int, batch_size: int, repeats_per_prompt: int, world_size: int, rank: int, seed: int) -> KRepeatSample
Mirror DiffusionNFT's distributed K-repeat prompt sampler.
Adapted from DiffusionNFT's scripts/train_nft_sd3.py::DistributedKRepeatSampler.
Source code in fastvideo/train/methods/rl/common/prompt_sampling.py
fastvideo.train.methods.rl.common.sampling
Configurable diffusion samplers for RL training methods.
Classes
fastvideo.train.methods.rl.common.sampling.DiffusionSampler
DiffusionSampler(config: SamplingConfig)
Thin model/scheduler sampler used by RL methods.
This intentionally does not call FastVideo's full inference pipelines. RL training needs a reusable sampling primitive that works with ModelBase wrappers and scheduler math, without binding a method to model-family pipeline classes such as WanDMDPipeline.
Source code in fastvideo/train/methods/rl/common/sampling.py
fastvideo.train.methods.rl.common.sampling.SamplingConfig (dataclass)
SamplingConfig(num_steps: int = 25, scheduler: SchedulerName = 'model_default', trajectory: TrajectoryName = 'ode', flow_shift: float | None = None, timesteps: list[float] | None = None, sigmas: list[float] | None = None)
YAML-backed sampling knobs shared by RL methods.
fastvideo.train.methods.rl.common.validation
Shared validation helpers for RL training methods.
Functions
fastvideo.train.methods.rl.common.validation.media_to_video_array
media_to_video_array(media: Tensor) -> Any
Convert decoded media to a tracker video array.
Accepts [C, T, H, W] tensors; [C, H, W] tensors are treated as single-frame (T=1) media. The output follows the existing tracker convention used elsewhere in FastVideo: [T, C, H, W] uint8.
Source code in fastvideo/train/methods/rl/common/validation.py
fastvideo.train.methods.rl.common.validation.validation_shard_indices
Return fixed validation prompt indices for one distributed rank.
Source code in fastvideo/train/methods/rl/common/validation.py
fastvideo.train.methods.rl.diffusion_nft
DiffusionNFT multi-reward policy optimization method.
Classes
fastvideo.train.methods.rl.diffusion_nft.DiffusionNFTMethod
Bases: TrainingMethod
DiffusionNFT-style RL for diffusion models.
This method owns the algorithm's sample-then-inner-train loop. One Trainer step corresponds to one DiffusionNFT outer epoch.
Source code in fastvideo/train/methods/rl/diffusion_nft.py
fastvideo.train.methods.rl.rewards
Reusable reward models for training methods.
Classes
fastvideo.train.methods.rl.rewards.ClipScoreScorer
ClipScoreScorer(*, device: device | str = 'cuda')
Bases: Module
CLIPScore reward, matching DiffusionNFT normalization.
Ported from DiffusionNFT's flow_grpo/clip_scorer.py.
Source code in fastvideo/train/methods/rl/rewards/frame_rewards.py
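Frame-based scorers such as CLIPScore reduce to a similarity between paired image and text embeddings. The sketch below computes the per-pair scaled cosine similarity, which equals the diagonal of a CLIP-style image-text logit matrix, from precomputed embeddings. Running the embedding model is out of scope, and logit_scale and norm are hypothetical parameters standing in for whatever scaling the DiffusionNFT port applies; the defaults are illustrative only.

```python
import numpy as np


def clip_style_scores(
    image_embeds: np.ndarray,
    text_embeds: np.ndarray,
    logit_scale: float = 100.0,
    norm: float = 1.0,
) -> np.ndarray:
    """Score each (image_i, text_i) pair; inputs are paired [B, D] embeddings."""
    if image_embeds.shape != text_embeds.shape:
        raise ValueError("image and text embeddings must be paired [B, D] arrays")
    img = image_embeds / np.linalg.norm(image_embeds, axis=-1, keepdims=True)
    txt = text_embeds / np.linalg.norm(text_embeds, axis=-1, keepdims=True)
    # Row-wise dot product equals the diagonal of img @ txt.T,
    # without materializing the B x B similarity matrix.
    cosine = np.sum(img * txt, axis=-1)
    # Hypothetical scaling: temperature-like logit_scale, then a fixed normalizer.
    return logit_scale * cosine / norm
```

Dividing by a fixed constant keeps reward magnitudes comparable when several scorers are later combined in a weighted sum.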
fastvideo.train.methods.rl.rewards.MultiRewardScorer
Weighted sum of reusable media reward scorers.
Mirrors DiffusionNFT's flow_grpo/rewards.py::multi_score behavior, while leaving frame selection to each concrete reward.
Source code in fastvideo/train/methods/rl/rewards/media.py
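The weighted-sum composition can be sketched with plain callables. Each scorer receives the full media batch and chooses its own frames, matching the division of labor described above. The name-to-(scorer, weight) mapping, the returned per-reward breakdown with a "total" entry, and all names here are assumptions, not the FastVideo or DiffusionNFT interface.

```python
from typing import Callable, Mapping, Sequence

# Hypothetical scorer type: (media batch, prompts) -> one score per prompt.
Scorer = Callable[[object, Sequence[str]], list[float]]


def multi_reward(
    media: object,
    prompts: Sequence[str],
    scorers: Mapping[str, tuple[Scorer, float]],
) -> dict[str, list[float]]:
    details: dict[str, list[float]] = {}
    total = [0.0] * len(prompts)
    for name, (scorer, weight) in scorers.items():
        # Each scorer sees the full media batch and picks its own frames.
        scores = scorer(media, prompts)
        if len(scores) != len(prompts):
            raise ValueError(f"{name} returned {len(scores)} scores for {len(prompts)} prompts")
        details[name] = list(scores)
        total = [t + weight * s for t, s in zip(total, scores)]
    # Keep per-reward scores for logging next to the weighted sum used for training.
    details["total"] = total
    return details
```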
fastvideo.train.methods.rl.rewards.PickScoreScorer
PickScoreScorer(*, device: device | str = 'cuda', dtype: dtype = float32)
Bases: Module
PickScore reward, matching DiffusionNFT normalization.
Ported from DiffusionNFT's flow_grpo/pickscore_scorer.py.
Source code in fastvideo/train/methods/rl/rewards/frame_rewards.py
Functions
fastvideo.train.methods.rl.rewards.select_first_frame
Return first-frame media as [B, C, H, W].
This is a helper for reward models that are intrinsically frame-based (for example, PickScore and CLIPScore). Video-aware rewards should inspect the full [B, C, T, H, W] tensor themselves.
Source code in fastvideo/train/methods/rl/rewards/media.py
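For a [B, C, T, H, W] batch, first-frame selection is a single integer index on the time axis. A NumPy sketch; passing an already frame-shaped [B, C, H, W] batch through unchanged is an assumption about how the helper treats image batches, and select_first_frame_sketch is a hypothetical name.

```python
import numpy as np


def select_first_frame_sketch(media: np.ndarray) -> np.ndarray:
    if media.ndim == 4:
        # Assumed pass-through: already a [B, C, H, W] image batch.
        return media
    if media.ndim != 5:
        raise ValueError(f"expected [B, C, T, H, W], got shape {media.shape}")
    # Integer indexing on the time axis (dim 2) drops that axis.
    return media[:, :, 0]
```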
Modules
fastvideo.train.methods.rl.rewards.frame_rewards
Frame-based reward scorers used by RL training methods.
Classes
fastvideo.train.methods.rl.rewards.frame_rewards.ClipScoreScorer
ClipScoreScorer(*, device: device | str = 'cuda')
Bases: Module
CLIPScore reward, matching DiffusionNFT normalization.
Ported from DiffusionNFT's flow_grpo/clip_scorer.py.
Source code in fastvideo/train/methods/rl/rewards/frame_rewards.py
fastvideo.train.methods.rl.rewards.frame_rewards.PickScoreScorer
PickScoreScorer(*, device: device | str = 'cuda', dtype: dtype = float32)
Bases: Module
PickScore reward, matching DiffusionNFT normalization.
Ported from DiffusionNFT's flow_grpo/pickscore_scorer.py.
Source code in fastvideo/train/methods/rl/rewards/frame_rewards.py
fastvideo.train.methods.rl.rewards.media
Generic media reward composition utilities.
Classes
fastvideo.train.methods.rl.rewards.media.MultiRewardScorer
Weighted sum of reusable media reward scorers.
Mirrors DiffusionNFT's flow_grpo/rewards.py::multi_score behavior, while leaving frame selection to each concrete reward.
Source code in fastvideo/train/methods/rl/rewards/media.py
Functions
fastvideo.train.methods.rl.rewards.media.select_first_frame
Return first-frame media as [B, C, H, W].
This is a helper for reward models that are intrinsically frame-based (for example, PickScore and CLIPScore). Video-aware rewards should inspect the full [B, C, T, H, W] tensor themselves.