oobleck
Config for the Stable Audio Open 1.0 "Oobleck" VAE.
Mirrors the per-channel vae/config.json shipped in
stabilityai/stable-audio-open-1.0 1:1 (see
fastvideo/models/vaes/oobleck.py::OobleckVAE.from_pretrained, which
constructs the VAE from these fields). It inherits the FastVideo
VAEConfig base, so the standard load_encoder / load_decoder flags and
tiling knobs apply.
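A minimal sketch of what a 1:1 mirror implies: each JSON key maps onto a dataclass field with the same name. The stand-in dataclass `ArchFieldsSketch` and the helper `arch_from_config_json` are hypothetical and are not FastVideo's API. The sketch also assumes that bookkeeping keys such as `_class_name` are skipped, and the JSON string in the usage lines is illustrative rather than a copy of the shipped file.

```python
import json
from dataclasses import dataclass, field, fields


@dataclass
class ArchFieldsSketch:
    # Hypothetical stand-in for the Oobleck-specific fields of OobleckVAEArchConfig,
    # using the defaults listed in its signature.
    encoder_hidden_size: int = 128
    downsampling_ratios: list[int] = field(default_factory=lambda: [2, 4, 4, 8, 8])
    channel_multiples: list[int] = field(default_factory=lambda: [1, 2, 4, 8, 16])
    decoder_channels: int = 128
    decoder_input_channels: int = 64
    audio_channels: int = 2
    sampling_rate: int = 44100


def arch_from_config_json(text: str) -> ArchFieldsSketch:
    """Copy JSON keys onto same-named fields; ignore keys the dataclass lacks."""
    raw = json.loads(text)
    known = {f.name for f in fields(ArchFieldsSketch)}
    return ArchFieldsSketch(**{k: v for k, v in raw.items() if k in known})


if __name__ == "__main__":
    example = '{"_class_name": "AutoencoderOobleck", "sampling_rate": 44100, "audio_channels": 2}'
    print(arch_from_config_json(example))
```

Because the field names match the JSON keys, no per-field translation table is needed. Missing keys simply fall back to the dataclass defaults.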
Naming: the VAE architecture is officially "Oobleck" (per Stability
AI's stable-audio-tools), while the surrounding model family is "Stable
Audio Open 1.0". This config is named after the architecture
(OobleckVAEConfig) because the same VAE is shared across Stable Audio
checkpoints; downstream pipelines reference it by its architecture name,
not by a host-pipeline name.
Classes
fastvideo.configs.models.vaes.oobleck.OobleckVAEArchConfig
dataclass
OobleckVAEArchConfig(stacked_params_mapping: list[tuple[str, str, str]] = list(), scaling_factor: float | Tensor = 0, temporal_compression_ratio: int = 4, spatial_compression_ratio: int = 8, architectures: list[str] = (lambda: ['AutoencoderOobleck'])(), encoder_hidden_size: int = 128, downsampling_ratios: list[int] = (lambda: [2, 4, 4, 8, 8])(), channel_multiples: list[int] = (lambda: [1, 2, 4, 8, 16])(), decoder_channels: int = 128, decoder_input_channels: int = 64, audio_channels: int = 2, sampling_rate: int = 44100)
Bases: VAEArchConfig
Stable Audio Open 1.0 VAE architecture constants.
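The arithmetic these constants imply can be sketched as follows. It assumes, as in the diffusers `AutoencoderOobleck`, that the encoder stages compose multiplicatively, giving an overall hop length equal to the product of `downsampling_ratios` (2·4·4·8·8 = 2048 samples per latent frame). It also assumes the latent has `decoder_input_channels` (64) channels. At 44,100 Hz that works out to about 21.5 latent frames per second. The helpers `hop_length` and `latent_shape` are hypothetical, not part of FastVideo.

```python
from math import prod


def hop_length(downsampling_ratios: list[int]) -> int:
    # Overall temporal compression: each encoder stage downsamples by its ratio.
    return prod(downsampling_ratios)


def latent_shape(num_samples: int, downsampling_ratios: list[int],
                 latent_channels: int) -> tuple[int, int]:
    """(channels, frames) of the latent for one clip, ignoring the batch axis."""
    hop = hop_length(downsampling_ratios)
    if num_samples % hop:
        # Assumption: only exact multiples of the hop length are modelled here;
        # the real encoder's padding behaviour for other lengths is not reproduced.
        raise ValueError(f"num_samples must be a multiple of {hop}")
    return (latent_channels, num_samples // hop)


ratios = [2, 4, 4, 8, 8]
sampling_rate = 44100
print(hop_length(ratios))                    # 2048
print(sampling_rate / hop_length(ratios))    # 21.533203125 latent frames per second
print(latent_shape(2048 * 10, ratios, 64))   # (64, 10)
```

Note that `audio_channels` (stereo) is folded into the 64 latent channels by the encoder, so it does not appear in the latent shape.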
fastvideo.configs.models.vaes.oobleck.OobleckVAEConfig
dataclass
OobleckVAEConfig(arch_config: VAEArchConfig = OobleckVAEArchConfig(), load_encoder: bool = True, load_decoder: bool = True, tile_sample_min_height: int = 256, tile_sample_min_width: int = 256, tile_sample_min_num_frames: int = 16, tile_sample_stride_height: int = 192, tile_sample_stride_width: int = 192, tile_sample_stride_num_frames: int = 12, blend_num_frames: int = 0, use_tiling: bool = False, use_temporal_tiling: bool = False, use_parallel_tiling: bool = False, use_temporal_scaling_frames: bool = True, pretrained_path: str = 'stabilityai/stable-audio-open-1.0', pretrained_subfolder: str = 'vae', pretrained_dtype: str = 'float16')
Bases: VAEConfig
FastVideo VAE config wrapping the Oobleck arch.
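The tiling fields inherited from the base VAEConfig (the tile_sample_* sizes and strides, blend_num_frames, and use_tiling with its temporal and parallel variants) are shaped for video VAEs. Audio VAEs don't use them; they are retained because they are inherited, but they are irrelevant for audio.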
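As a hedged illustration of the load flags, a pipeline that only generates audio from noise needs the decoder but not the encoder. Since the signature above exposes `load_encoder` as a constructor field, the real class can presumably be built the same way. The frozen dataclass `OobleckVAEConfigSketch` below is a hypothetical stand-in that mirrors a few fields and defaults, so the example runs without FastVideo installed.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class OobleckVAEConfigSketch:
    # Hypothetical stand-in mirroring a subset of OobleckVAEConfig's fields/defaults.
    load_encoder: bool = True
    load_decoder: bool = True
    use_tiling: bool = False  # inherited video-VAE knob; irrelevant for audio
    pretrained_path: str = "stabilityai/stable-audio-open-1.0"
    pretrained_subfolder: str = "vae"
    pretrained_dtype: str = "float16"


# Derive a decode-only variant without mutating the defaults.
decode_only = replace(OobleckVAEConfigSketch(), load_encoder=False)
print(decode_only.load_encoder, decode_only.load_decoder)  # False True
print(f"{decode_only.pretrained_path}/{decode_only.pretrained_subfolder}")
```

Using `dataclasses.replace` on a frozen instance keeps the shared default config unchanged, which matters if several pipelines start from the same defaults.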