
oobleck

Config for the Stable Audio Open 1.0 "Oobleck" VAE.

Mirrors the vae/config.json shipped in stabilityai/stable-audio-open-1.0 field for field (see fastvideo/models/vaes/oobleck.py::OobleckVAE.from_pretrained, which constructs the VAE from these fields). Inherits the FastVideo VAEConfig base, so the standard load_encoder / load_decoder flags and tiling knobs apply.
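To illustrate what mirroring vae/config.json field for field means in practice, the sketch below parses a JSON object shaped like that file into a dataclass. `ArchFields` is a hypothetical stand-in that copies the Oobleck-specific fields and defaults from the `OobleckVAEArchConfig` signature; it is not FastVideo's class. The sample JSON is illustrative, not the exact file contents. The `_class_name` metadata key is included because diffusers-style configs usually carry one.

```python
import json
from dataclasses import dataclass, field, fields


@dataclass
class ArchFields:
    """Hypothetical stand-in mirroring the Oobleck fields of OobleckVAEArchConfig."""
    encoder_hidden_size: int = 128
    downsampling_ratios: list[int] = field(default_factory=lambda: [2, 4, 4, 8, 8])
    channel_multiples: list[int] = field(default_factory=lambda: [1, 2, 4, 8, 16])
    decoder_channels: int = 128
    decoder_input_channels: int = 64
    audio_channels: int = 2
    sampling_rate: int = 44100


def arch_from_config_json(text: str) -> ArchFields:
    """Build ArchFields from a config.json string, keeping only known field names."""
    raw = json.loads(text)
    known = {f.name for f in fields(ArchFields)}
    # Metadata keys (e.g. "_class_name") and any unknown keys are dropped.
    kwargs = {k: v for k, v in raw.items() if k in known}
    return ArchFields(**kwargs)


# Illustrative JSON only; the values match the documented defaults.
SAMPLE = """{
  "_class_name": "AutoencoderOobleck",
  "encoder_hidden_size": 128,
  "downsampling_ratios": [2, 4, 4, 8, 8],
  "channel_multiples": [1, 2, 4, 8, 16],
  "decoder_channels": 128,
  "decoder_input_channels": 64,
  "audio_channels": 2,
  "sampling_rate": 44100
}"""

cfg = arch_from_config_json(SAMPLE)
print(cfg == ArchFields())  # True: every parsed field equals its default
```

Silently dropping unknown keys is a choice made for this sketch. A stricter loader might raise instead, so that drift between the shipped config.json and the dataclass is noticed.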

Naming: the VAE architecture is officially "Oobleck" (per Stability AI's stable-audio-tools), while the surrounding model family is "Stable Audio Open 1.0". This config is named after the architecture (OobleckVAEConfig) because the same VAE is shared across Stable Audio checkpoints. Downstream pipelines therefore reference it by its architecture name, not by a host-pipeline name.

Classes

fastvideo.configs.models.vaes.oobleck.OobleckVAEArchConfig dataclass

OobleckVAEArchConfig(
    stacked_params_mapping: list[tuple[str, str, str]] = [],
    scaling_factor: float | Tensor = 0,
    temporal_compression_ratio: int = 4,
    spatial_compression_ratio: int = 8,
    architectures: list[str] = ['AutoencoderOobleck'],
    encoder_hidden_size: int = 128,
    downsampling_ratios: list[int] = [2, 4, 4, 8, 8],
    channel_multiples: list[int] = [1, 2, 4, 8, 16],
    decoder_channels: int = 128,
    decoder_input_channels: int = 64,
    audio_channels: int = 2,
    sampling_rate: int = 44100
)

Bases: VAEArchConfig

Stable Audio Open 1.0 VAE architecture constants.
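These constants fix the latent geometry. The sketch below rests on three assumptions: the overall hop length is the product of downsampling_ratios (diffusers' AutoencoderOobleck exposes a hop_length computed this way), the latent has decoder_input_channels channels, and the waveform length is an exact multiple of the hop length. The `latent_shape` helper is hypothetical.

```python
import math

# Defaults copied from the OobleckVAEArchConfig signature.
DOWNSAMPLING_RATIOS = [2, 4, 4, 8, 8]
DECODER_INPUT_CHANNELS = 64  # assumed to equal the latent channel count
SAMPLING_RATE = 44100

# Assumed overall downsampling factor: 2 * 4 * 4 * 8 * 8 = 2048 samples per latent frame.
HOP_LENGTH = math.prod(DOWNSAMPLING_RATIOS)


def latent_shape(num_samples: int) -> tuple[int, int]:
    """Return (channels, frames) of the latent for a waveform of num_samples samples."""
    if num_samples % HOP_LENGTH != 0:
        raise ValueError(f"num_samples must be a multiple of {HOP_LENGTH}")
    return DECODER_INPUT_CHANNELS, num_samples // HOP_LENGTH


print(HOP_LENGTH)                  # 2048
print(SAMPLING_RATE / HOP_LENGTH)  # latent frames per second, about 21.53
print(latent_shape(2048 * 215))    # (64, 215), roughly 10 s of audio at 44.1 kHz
```

Rejecting lengths that do not divide evenly keeps the arithmetic exact. Real pipelines typically pad or trim the waveform first, and how FastVideo handles that is not covered here.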

fastvideo.configs.models.vaes.oobleck.OobleckVAEConfig dataclass

OobleckVAEConfig(
    arch_config: VAEArchConfig = OobleckVAEArchConfig(),
    load_encoder: bool = True,
    load_decoder: bool = True,
    tile_sample_min_height: int = 256,
    tile_sample_min_width: int = 256,
    tile_sample_min_num_frames: int = 16,
    tile_sample_stride_height: int = 192,
    tile_sample_stride_width: int = 192,
    tile_sample_stride_num_frames: int = 12,
    blend_num_frames: int = 0,
    use_tiling: bool = False,
    use_temporal_tiling: bool = False,
    use_parallel_tiling: bool = False,
    use_temporal_scaling_frames: bool = True,
    pretrained_path: str = 'stabilityai/stable-audio-open-1.0',
    pretrained_subfolder: str = 'vae',
    pretrained_dtype: str = 'float16'
)

Bases: VAEConfig

FastVideo VAE config wrapping the Oobleck arch.

The base VAEConfig's temporal and spatial tiling defaults exist for video VAEs and are not used by audio VAEs. OobleckVAEConfig retains those fields, but they are irrelevant for audio.
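OobleckVAEConfig is a dataclass that inherits the load_encoder / load_decoder flags, so a decoder-only variant can be derived with `dataclasses.replace`. This is useful, for example, when latents come from a diffusion model rather than from the encoder. `VAEConfigSketch` is a hypothetical stand-in that copies a subset of the documented fields and defaults, so the sketch runs without FastVideo installed. With FastVideo, the real class would be imported from fastvideo.configs.models.vaes.oobleck. Whether its loader then skips the encoder weights exactly as implied here is an assumption.

```python
from dataclasses import dataclass, replace


@dataclass
class VAEConfigSketch:
    """Hypothetical stand-in holding a subset of OobleckVAEConfig's documented defaults."""
    load_encoder: bool = True
    load_decoder: bool = True
    use_tiling: bool = False  # video-oriented knob; left at its default for audio
    pretrained_path: str = "stabilityai/stable-audio-open-1.0"
    pretrained_subfolder: str = "vae"
    pretrained_dtype: str = "float16"


base = VAEConfigSketch()
# Decoder-only copy, e.g. for turning generated latents into audio.
decode_only = replace(base, load_encoder=False)

print(decode_only.load_encoder, decode_only.load_decoder)  # False True
print(base.load_encoder)  # True: replace() returns a new object and leaves base unchanged
```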