vaes
Classes
fastvideo.configs.models.vaes.Cosmos25VAEConfig
dataclass
Cosmos25VAEConfig(arch_config: Cosmos25VAEArchConfig = Cosmos25VAEArchConfig(), load_encoder: bool = True, load_decoder: bool = True, tile_sample_min_height: int = 256, tile_sample_min_width: int = 256, tile_sample_min_num_frames: int = 16, tile_sample_stride_height: int = 192, tile_sample_stride_width: int = 192, tile_sample_stride_num_frames: int = 12, blend_num_frames: int = 0, use_tiling: bool = False, use_temporal_tiling: bool = False, use_parallel_tiling: bool = False, use_temporal_scaling_frames: bool = True, use_feature_cache: bool = True)
fastvideo.configs.models.vaes.Flux2VAEConfig
dataclass
Flux2VAEConfig(arch_config: Flux2VAEArchConfig = Flux2VAEArchConfig(), load_encoder: bool = True, load_decoder: bool = True, tile_sample_min_height: int = 256, tile_sample_min_width: int = 256, tile_sample_min_num_frames: int = 16, tile_sample_stride_height: int = 192, tile_sample_stride_width: int = 192, tile_sample_stride_num_frames: int = 12, blend_num_frames: int = 0, use_tiling: bool = False, use_temporal_tiling: bool = False, use_parallel_tiling: bool = False, use_temporal_scaling_frames: bool = True)
fastvideo.configs.models.vaes.GameCraftVAEConfig
dataclass
GameCraftVAEConfig(arch_config: VAEArchConfig = GameCraftVAEArchConfig(), load_encoder: bool = True, load_decoder: bool = True, tile_sample_min_height: int = 256, tile_sample_min_width: int = 256, tile_sample_min_num_frames: int = 16, tile_sample_stride_height: int = 192, tile_sample_stride_width: int = 192, tile_sample_stride_num_frames: int = 12, blend_num_frames: int = 0, use_tiling: bool = True, use_temporal_tiling: bool = True, use_parallel_tiling: bool = True, use_temporal_scaling_frames: bool = True)
fastvideo.configs.models.vaes.Gen3CVAEConfig
dataclass
Gen3CVAEConfig(arch_config: CosmosVAEArchConfig = CosmosVAEArchConfig(), load_encoder: bool = True, load_decoder: bool = True, tile_sample_min_height: int = 256, tile_sample_min_width: int = 256, tile_sample_min_num_frames: int = 16, tile_sample_stride_height: int = 192, tile_sample_stride_width: int = 192, tile_sample_stride_num_frames: int = 12, blend_num_frames: int = 0, use_tiling: bool = False, use_temporal_tiling: bool = False, use_parallel_tiling: bool = False, use_temporal_scaling_frames: bool = True, use_feature_cache: bool = True)
Bases: CosmosVAEConfig
GEN3C VAE config placeholder.
GEN3C uses tokenizer-backed VAE loading logic at runtime, but we keep a model-specific config class so pipeline/model configs stay model-scoped.
fastvideo.configs.models.vaes.OobleckVAEArchConfig
dataclass
OobleckVAEArchConfig(stacked_params_mapping: list[tuple[str, str, str]] = list(), scaling_factor: float | Tensor = 0, temporal_compression_ratio: int = 4, spatial_compression_ratio: int = 8, architectures: list[str] = (lambda: ['AutoencoderOobleck'])(), encoder_hidden_size: int = 128, downsampling_ratios: list[int] = (lambda: [2, 4, 4, 8, 8])(), channel_multiples: list[int] = (lambda: [1, 2, 4, 8, 16])(), decoder_channels: int = 128, decoder_input_channels: int = 64, audio_channels: int = 2, sampling_rate: int = 44100)
Bases: VAEArchConfig
Stable Audio Open 1.0 VAE architecture constants.
fastvideo.configs.models.vaes.OobleckVAEConfig
dataclass
OobleckVAEConfig(arch_config: VAEArchConfig = OobleckVAEArchConfig(), load_encoder: bool = True, load_decoder: bool = True, tile_sample_min_height: int = 256, tile_sample_min_width: int = 256, tile_sample_min_num_frames: int = 16, tile_sample_stride_height: int = 192, tile_sample_stride_width: int = 192, tile_sample_stride_num_frames: int = 12, blend_num_frames: int = 0, use_tiling: bool = False, use_temporal_tiling: bool = False, use_parallel_tiling: bool = False, use_temporal_scaling_frames: bool = True, pretrained_path: str = 'stabilityai/stable-audio-open-1.0', pretrained_subfolder: str = 'vae', pretrained_dtype: str = 'float16')
Bases: VAEConfig
FastVideo VAE config wrapping the Oobleck arch.
The temporal and spatial tiling defaults inherited from the base VAEConfig exist for video VAEs. Audio VAEs do not use them; the fields are retained but are irrelevant for audio.
Modules
fastvideo.configs.models.vaes.base
Classes
fastvideo.configs.models.vaes.base.VAEConfig
dataclass
VAEConfig(arch_config: VAEArchConfig = VAEArchConfig(), load_encoder: bool = True, load_decoder: bool = True, tile_sample_min_height: int = 256, tile_sample_min_width: int = 256, tile_sample_min_num_frames: int = 16, tile_sample_stride_height: int = 192, tile_sample_stride_width: int = 192, tile_sample_stride_num_frames: int = 12, blend_num_frames: int = 0, use_tiling: bool = True, use_temporal_tiling: bool = True, use_parallel_tiling: bool = True, use_temporal_scaling_frames: bool = True)
Bases: ModelConfig
Methods:
fastvideo.configs.models.vaes.base.VAEConfig.add_cli_args
staticmethod
Add CLI arguments for VAEConfig fields.
Source code in fastvideo/configs/models/vaes/base.py
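The tiling fields on VAEConfig come in size/stride pairs, and with the defaults above (256 and 192) neighbouring tiles would share 64 pixels. The following is a minimal sketch of that arithmetic, not FastVideo's tiling code: `TilingDefaults` and `tile_spans` are hypothetical names, and the sketch assumes tiles start at multiples of the stride and are clipped at the input edge, as in common tiled-VAE implementations.

```python
from dataclasses import dataclass


@dataclass
class TilingDefaults:
    # Stand-in for two of the VAEConfig tiling fields (values copied from
    # the signature above); this is not the FastVideo class itself.
    tile_sample_min_height: int = 256
    tile_sample_stride_height: int = 192


def tile_spans(length: int, tile: int, stride: int) -> list[tuple[int, int]]:
    """Return (start, end) spans of tiles placed every `stride` pixels.

    Assumption: tiles start at multiples of `stride` and are clipped at
    the edge of the input.
    """
    if tile <= 0 or stride <= 0:
        raise ValueError("tile and stride must be positive")
    return [(start, min(start + tile, length)) for start in range(0, length, stride)]


cfg = TilingDefaults()
# Overlap between neighbouring tiles along the height axis.
overlap = cfg.tile_sample_min_height - cfg.tile_sample_stride_height
spans = tile_spans(480, cfg.tile_sample_min_height, cfg.tile_sample_stride_height)
print(overlap, spans)
```

The same calculation applies to the width and frame axes with their own size/stride pairs; how the overlapping regions are blended is left to the VAE implementation.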
fastvideo.configs.models.vaes.cosmos2_5vae
Cosmos 2.5 (Wan2.1-style) VAE config and checkpoint-key mapping.
Classes
fastvideo.configs.models.vaes.cosmos2_5vae.Cosmos25VAEArchConfig
dataclass
Cosmos25VAEArchConfig(stacked_params_mapping: list[tuple[str, str, str]] = list(), scaling_factor: float | Tensor = 0, temporal_compression_ratio: int = 4, spatial_compression_ratio: int = 8, _name_or_path: str = '', base_dim: int = 96, decoder_base_dim: int | None = None, z_dim: int = 16, dim_mult: tuple[int, ...] = (1, 2, 4, 4), num_res_blocks: int = 2, attn_scales: tuple[float, ...] = (), temperal_downsample: tuple[bool, ...] = (False, True, True), dropout: float = 0.0, is_residual: bool = False, in_channels: int = 3, out_channels: int = 3, patch_size: int | None = None, scale_factor_temporal: int = 4, scale_factor_spatial: int = 8, clip_output: bool = True, latents_mean: tuple[float, ...] = (-0.7571, -0.7089, -0.9113, 0.1075, -0.1745, 0.9653, -0.1517, 1.5508, 0.4134, -0.0715, 0.5517, -0.3632, -0.1922, -0.9497, 0.2503, -0.2921), latents_std: tuple[float, ...] = (2.8184, 1.4541, 2.3275, 2.6558, 1.2196, 1.7708, 2.6052, 2.0743, 3.2687, 2.1526, 2.8652, 1.5579, 1.6382, 1.1253, 2.8251, 1.916), param_names_mapping: dict[str, str] = (lambda: {'^conv1\\.(.*)$': 'quant_conv.\\1', '^conv2\\.(.*)$': 'post_quant_conv.\\1', '^encoder\\.conv1\\.(.*)$': 'encoder.conv_in.\\1', '^decoder\\.conv1\\.(.*)$': 'decoder.conv_in.\\1', '^encoder\\.head\\.0\\.gamma$': 'encoder.norm_out.gamma', '^encoder\\.head\\.2\\.(.*)$': 'encoder.conv_out.\\1', '^decoder\\.head\\.0\\.gamma$': 'decoder.norm_out.gamma', '^decoder\\.head\\.2\\.(.*)$': 'decoder.conv_out.\\1'})())
Bases: VAEArchConfig
Methods:
fastvideo.configs.models.vaes.cosmos2_5vae.Cosmos25VAEArchConfig.map_official_key
staticmethod
Map a single official checkpoint key into FastVideo key space.
Source code in fastvideo/configs/models/vaes/cosmos2_5vae.py
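To make the `param_names_mapping` default easier to read, here is a rough sketch of how such regex rules could rename checkpoint keys. The patterns are copied from the signature above; `remap_key` is a hypothetical helper written for illustration, and the real `map_official_key` may cover more cases than this first-match loop.

```python
import re

# Patterns copied from the `param_names_mapping` default shown above.
PARAM_NAMES_MAPPING = {
    r"^conv1\.(.*)$": r"quant_conv.\1",
    r"^conv2\.(.*)$": r"post_quant_conv.\1",
    r"^encoder\.conv1\.(.*)$": r"encoder.conv_in.\1",
    r"^decoder\.conv1\.(.*)$": r"decoder.conv_in.\1",
    r"^encoder\.head\.0\.gamma$": "encoder.norm_out.gamma",
    r"^encoder\.head\.2\.(.*)$": r"encoder.conv_out.\1",
    r"^decoder\.head\.0\.gamma$": "decoder.norm_out.gamma",
    r"^decoder\.head\.2\.(.*)$": r"decoder.conv_out.\1",
}


def remap_key(key: str, mapping: dict[str, str] = PARAM_NAMES_MAPPING) -> str:
    """Apply the first matching regex rule; return the key unchanged otherwise.

    Hypothetical helper, not the FastVideo implementation.
    """
    for pattern, replacement in mapping.items():
        if re.match(pattern, key):
            return re.sub(pattern, replacement, key)
    return key


for name in ("conv1.weight", "encoder.head.0.gamma", "decoder.head.2.bias"):
    print(name, "->", remap_key(name))
```

Because every pattern is anchored with `^`, a key such as `encoder.conv1.weight` is matched by the encoder rule rather than the top-level `conv1` rule.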
fastvideo.configs.models.vaes.cosmos2_5vae.Cosmos25VAEConfig
dataclass
Cosmos25VAEConfig(arch_config: Cosmos25VAEArchConfig = Cosmos25VAEArchConfig(), load_encoder: bool = True, load_decoder: bool = True, tile_sample_min_height: int = 256, tile_sample_min_width: int = 256, tile_sample_min_num_frames: int = 16, tile_sample_stride_height: int = 192, tile_sample_stride_width: int = 192, tile_sample_stride_num_frames: int = 12, blend_num_frames: int = 0, use_tiling: bool = False, use_temporal_tiling: bool = False, use_parallel_tiling: bool = False, use_temporal_scaling_frames: bool = True, use_feature_cache: bool = True)
fastvideo.configs.models.vaes.flux2vae
Classes
fastvideo.configs.models.vaes.flux2vae.Flux2VAEArchConfig
dataclass
Flux2VAEArchConfig(stacked_params_mapping: list[tuple[str, str, str]] = list(), scaling_factor: float = 0.13025, temporal_compression_ratio: int = 1, spatial_compression_ratio: int = 8, in_channels: int = 3, out_channels: int = 3, down_block_types: tuple[str, ...] = ('DownEncoderBlock2D', 'DownEncoderBlock2D', 'DownEncoderBlock2D', 'AttnDownEncoderBlock2D'), up_block_types: tuple[str, ...] = ('AttnUpDecoderBlock2D', 'UpDecoderBlock2D', 'UpDecoderBlock2D', 'UpDecoderBlock2D'), block_out_channels: tuple[int, ...] = (128, 256, 512, 512), layers_per_block: int = 2, act_fn: str = 'silu', latent_channels: int = 16, norm_num_groups: int = 32, sample_size: int = 512, force_upcast: bool = False, use_quant_conv: bool = True, use_post_quant_conv: bool = True, mid_block_add_attention: bool = True, batch_norm_eps: float = 1e-05, batch_norm_momentum: float = 0.1, patch_size: tuple[int, int] = (1, 1))
Bases: VAEArchConfig
Architecture configuration for Flux2 VAE model.
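As a quick illustration of what these defaults imply, the sketch below derives a latent shape from an image size. It rests on two assumptions that this page does not confirm: that each spatial side shrinks by `spatial_compression_ratio` with no further patchification (`patch_size` defaults to `(1, 1)`), and that every encoder block except the last halves the resolution. `latent_shape` is a hypothetical helper, not part of FastVideo.

```python
# Values copied from the Flux2VAEArchConfig signature above.
BLOCK_OUT_CHANNELS = (128, 256, 512, 512)
LATENT_CHANNELS = 16
SPATIAL_COMPRESSION_RATIO = 8


def latent_shape(height: int, width: int) -> tuple[int, int, int]:
    """Return (channels, latent_height, latent_width) for an input image.

    Assumption: each spatial side shrinks by SPATIAL_COMPRESSION_RATIO and
    no extra patchification is applied.
    """
    ratio = SPATIAL_COMPRESSION_RATIO
    if height % ratio or width % ratio:
        raise ValueError(f"height and width must be multiples of {ratio}")
    return (LATENT_CHANNELS, height // ratio, width // ratio)


# Assumption: every encoder block except the last halves the resolution,
# which would make the ratio 2 ** (number of blocks - 1).
implied_ratio = 2 ** (len(BLOCK_OUT_CHANNELS) - 1)

print(latent_shape(512, 512), implied_ratio)
```

Under those assumptions the four-entry `block_out_channels` default is consistent with the configured spatial ratio of 8.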
fastvideo.configs.models.vaes.flux2vae.Flux2VAEConfig
dataclass
Flux2VAEConfig(arch_config: Flux2VAEArchConfig = Flux2VAEArchConfig(), load_encoder: bool = True, load_decoder: bool = True, tile_sample_min_height: int = 256, tile_sample_min_width: int = 256, tile_sample_min_num_frames: int = 16, tile_sample_stride_height: int = 192, tile_sample_stride_width: int = 192, tile_sample_stride_num_frames: int = 12, blend_num_frames: int = 0, use_tiling: bool = False, use_temporal_tiling: bool = False, use_parallel_tiling: bool = False, use_temporal_scaling_frames: bool = True)
fastvideo.configs.models.vaes.gamecraftvae
GameCraft VAE config - matches official config.json from Hunyuan-GameCraft-1.0.
Classes
fastvideo.configs.models.vaes.gamecraftvae.GameCraftVAEArchConfig
dataclass
GameCraftVAEArchConfig(stacked_params_mapping: list[tuple[str, str, str]] = list(), scaling_factor: float = 0.476986, temporal_compression_ratio: int = 4, spatial_compression_ratio: int = 8, in_channels: int = 3, out_channels: int = 3, latent_channels: int = 16, down_block_types: tuple[str, ...] = ('DownEncoderBlockCausal3D', 'DownEncoderBlockCausal3D', 'DownEncoderBlockCausal3D', 'DownEncoderBlockCausal3D'), up_block_types: tuple[str, ...] = ('UpDecoderBlockCausal3D', 'UpDecoderBlockCausal3D', 'UpDecoderBlockCausal3D', 'UpDecoderBlockCausal3D'), block_out_channels: tuple[int, ...] = (128, 256, 512, 512), layers_per_block: int = 2, act_fn: str = 'silu', norm_num_groups: int = 32, time_compression_ratio: int = 4, mid_block_add_attention: bool = True, mid_block_causal_attn: bool = True, sample_size: int = 256, sample_tsize: int = 64)
Bases: VAEArchConfig
Architecture config matching official AutoencoderKLCausal3D config.json.
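The compression ratios above determine the latent size for a given clip. The sketch below assumes the usual causal 3D VAE convention, in which the first frame is encoded on its own and each following group of `temporal_compression_ratio` frames becomes one latent frame; this page does not state that convention, and `latent_video_shape` is a hypothetical helper rather than FastVideo API.

```python
# Values copied from the GameCraftVAEArchConfig signature above.
TEMPORAL_COMPRESSION_RATIO = 4
SPATIAL_COMPRESSION_RATIO = 8
LATENT_CHANNELS = 16


def latent_video_shape(num_frames: int, height: int, width: int) -> tuple[int, int, int, int]:
    """Return (channels, latent_frames, latent_height, latent_width).

    Assumption: causal convention, i.e. one latent frame for the first
    input frame plus one per TEMPORAL_COMPRESSION_RATIO frames after it.
    """
    if num_frames < 1:
        raise ValueError("num_frames must be at least 1")
    latent_frames = (num_frames - 1) // TEMPORAL_COMPRESSION_RATIO + 1
    return (
        LATENT_CHANNELS,
        latent_frames,
        height // SPATIAL_COMPRESSION_RATIO,
        width // SPATIAL_COMPRESSION_RATIO,
    )


print(latent_video_shape(33, 256, 256))
```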
fastvideo.configs.models.vaes.gamecraftvae.GameCraftVAEConfig
dataclass
GameCraftVAEConfig(arch_config: VAEArchConfig = GameCraftVAEArchConfig(), load_encoder: bool = True, load_decoder: bool = True, tile_sample_min_height: int = 256, tile_sample_min_width: int = 256, tile_sample_min_num_frames: int = 16, tile_sample_stride_height: int = 192, tile_sample_stride_width: int = 192, tile_sample_stride_num_frames: int = 12, blend_num_frames: int = 0, use_tiling: bool = True, use_temporal_tiling: bool = True, use_parallel_tiling: bool = True, use_temporal_scaling_frames: bool = True)
fastvideo.configs.models.vaes.gen3cvae
Classes
fastvideo.configs.models.vaes.gen3cvae.Gen3CVAEConfig
dataclass
Gen3CVAEConfig(arch_config: CosmosVAEArchConfig = CosmosVAEArchConfig(), load_encoder: bool = True, load_decoder: bool = True, tile_sample_min_height: int = 256, tile_sample_min_width: int = 256, tile_sample_min_num_frames: int = 16, tile_sample_stride_height: int = 192, tile_sample_stride_width: int = 192, tile_sample_stride_num_frames: int = 12, blend_num_frames: int = 0, use_tiling: bool = False, use_temporal_tiling: bool = False, use_parallel_tiling: bool = False, use_temporal_scaling_frames: bool = True, use_feature_cache: bool = True)
Bases: CosmosVAEConfig
GEN3C VAE config placeholder.
GEN3C uses tokenizer-backed VAE loading logic at runtime, but we keep a model-specific config class so pipeline/model configs stay model-scoped.
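The placeholder pattern described above can be sketched with plain dataclasses. The classes below are stand-ins with hypothetical names and only two of the real fields; they show how a subclass that adds nothing still gives pipelines a model-specific type to reference, and they are not the FastVideo classes themselves.

```python
from dataclasses import dataclass, fields


@dataclass
class StandInCosmosVAEConfig:
    # Two fields borrowed from the signature above, for illustration only.
    use_tiling: bool = False
    use_feature_cache: bool = True


@dataclass
class StandInGen3CVAEConfig(StandInCosmosVAEConfig):
    """Adds no fields; exists so callers can name a model-specific type."""


base = StandInCosmosVAEConfig()
gen3c = StandInGen3CVAEConfig()

# Same fields and defaults, but a distinct type that code can dispatch on.
same_fields = [f.name for f in fields(base)] == [f.name for f in fields(gen3c)]
print(same_fields, isinstance(gen3c, StandInCosmosVAEConfig), type(gen3c).__name__)
```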
fastvideo.configs.models.vaes.oobleck
Config for the Stable Audio Open 1.0 "Oobleck" VAE.
Mirrors the per-channel vae/config.json shipped in stabilityai/stable-audio-open-1.0 1:1 (see fastvideo/models/vaes/oobleck.py::OobleckVAE.from_pretrained, which constructs the VAE from these fields). Inherits the FastVideo VAEConfig base, so the standard load_encoder / load_decoder flags and tiling knobs apply.
Naming: the VAE architecture is officially "Oobleck" (per Stability AI's stable-audio-tools), while the surrounding model family is "Stable Audio Open 1.0". This config is named after the architecture (OobleckVAEConfig) since the same VAE is shared across Stable Audio checkpoints; downstream pipelines reference it by its arch name, not by a host-pipeline name.
Classes
fastvideo.configs.models.vaes.oobleck.OobleckVAEArchConfig
dataclass
OobleckVAEArchConfig(stacked_params_mapping: list[tuple[str, str, str]] = list(), scaling_factor: float | Tensor = 0, temporal_compression_ratio: int = 4, spatial_compression_ratio: int = 8, architectures: list[str] = (lambda: ['AutoencoderOobleck'])(), encoder_hidden_size: int = 128, downsampling_ratios: list[int] = (lambda: [2, 4, 4, 8, 8])(), channel_multiples: list[int] = (lambda: [1, 2, 4, 8, 16])(), decoder_channels: int = 128, decoder_input_channels: int = 64, audio_channels: int = 2, sampling_rate: int = 44100)
Bases: VAEArchConfig
Stable Audio Open 1.0 VAE architecture constants.
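A short worked calculation may help relate these constants to each other. It assumes the encoder's total temporal downsampling is the product of `downsampling_ratios`, which is typical for stacked strided-convolution encoders but is not stated on this page; `hop_length` is a hypothetical helper.

```python
import math

# Values copied from the OobleckVAEArchConfig signature above.
DOWNSAMPLING_RATIOS = [2, 4, 4, 8, 8]
SAMPLING_RATE = 44100


def hop_length(ratios: list[int]) -> int:
    """Audio samples per latent frame, assuming the strides multiply."""
    return math.prod(ratios)


hop = hop_length(DOWNSAMPLING_RATIOS)
# Latent frames per second of audio under the same assumption.
latent_rate_hz = SAMPLING_RATE / hop
print(hop, round(latent_rate_hz, 2))
```

Under this assumption one latent frame spans 2048 audio samples, or roughly 21.5 latent frames per second at 44.1 kHz.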
fastvideo.configs.models.vaes.oobleck.OobleckVAEConfig
dataclass
OobleckVAEConfig(arch_config: VAEArchConfig = OobleckVAEArchConfig(), load_encoder: bool = True, load_decoder: bool = True, tile_sample_min_height: int = 256, tile_sample_min_width: int = 256, tile_sample_min_num_frames: int = 16, tile_sample_stride_height: int = 192, tile_sample_stride_width: int = 192, tile_sample_stride_num_frames: int = 12, blend_num_frames: int = 0, use_tiling: bool = False, use_temporal_tiling: bool = False, use_parallel_tiling: bool = False, use_temporal_scaling_frames: bool = True, pretrained_path: str = 'stabilityai/stable-audio-open-1.0', pretrained_subfolder: str = 'vae', pretrained_dtype: str = 'float16')
Bases: VAEConfig
FastVideo VAE config wrapping the Oobleck arch.
The temporal and spatial tiling defaults inherited from the base VAEConfig exist for video VAEs. Audio VAEs do not use them; the fields are retained but are irrelevant for audio.