stable_audio

PipelineConfig for Stable Audio Open 1.0.

Classes

fastvideo.configs.pipelines.stable_audio.StableAudioOpenSmallConfig dataclass

StableAudioOpenSmallConfig(model_path: str = '', pipeline_config_path: str | None = None, embedded_cfg_scale: float = 6.0, flow_shift: float | None = None, flow_shift_sr: float | None = None, disable_autocast: bool = False, is_causal: bool = False, dit_config: DiTConfig = StableAudioConfig(), dit_precision: str = 'fp16', upsampler_config: UpsamplerConfig = UpsamplerConfig(), upsampler_precision: str = 'fp32', vae_config: VAEConfig = OobleckVAEConfig(), vae_precision: str = 'fp16', vae_tiling: bool = False, vae_sp: bool = False, image_encoder_config: EncoderConfig = EncoderConfig(), image_encoder_precision: str = 'fp32', text_encoder_configs: tuple = tuple(), text_encoder_precisions: tuple[str, ...] = tuple(), preprocess_text_funcs: tuple = tuple(), postprocess_text_funcs: tuple = tuple(), dmd_denoising_steps: list[int] | None = None, ti2v_task: bool = False, boundary_ratio: float | None = None, num_inference_steps: int = 100, guidance_scale: float = 7.0, audio_end_in_s: float = 6.0, audio_start_in_s: float = 0.0, sampling_rate: int = 44100, audio_channels: int = 2, sample_size: int = 524288, max_audio_duration_s: float = 524288 / 44100, precision: str = 'fp16')

Bases: StableAudioT2AConfig

Overrides for stable-audio-open-small: a shorter training window (524288 samples ≈ 11.89 s at 44.1 kHz) and the faster default sampler configuration carried by the small preset.
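The window length quoted above follows from dividing `sample_size` by `sampling_rate`, which is also how the `max_audio_duration_s` defaults are written in the signatures on this page. The sketch below reproduces that arithmetic for both configs. It does not import fastvideo; the constants are copied from the signatures, and `window_seconds` is a hypothetical helper used only for illustration.

```python
# Values copied from the two class signatures on this page.
SAMPLING_RATE = 44100        # Hz, shared by both configs
SMALL_SAMPLE_SIZE = 524288   # StableAudioOpenSmallConfig.sample_size
BASE_SAMPLE_SIZE = 2097152   # StableAudioT2AConfig.sample_size


def window_seconds(sample_size: int, sampling_rate: int) -> float:
    """Hypothetical helper: duration in seconds of `sample_size` samples."""
    return sample_size / sampling_rate


small_s = window_seconds(SMALL_SAMPLE_SIZE, SAMPLING_RATE)
base_s = window_seconds(BASE_SAMPLE_SIZE, SAMPLING_RATE)

print(f"small window: {small_s:.2f} s")  # 11.89 s
print(f"base window:  {base_s:.2f} s")   # 47.55 s
```

The small window is exactly one quarter of the base window, since 2097152 = 4 × 524288.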

fastvideo.configs.pipelines.stable_audio.StableAudioT2AConfig dataclass

StableAudioT2AConfig(model_path: str = '', pipeline_config_path: str | None = None, embedded_cfg_scale: float = 6.0, flow_shift: float | None = None, flow_shift_sr: float | None = None, disable_autocast: bool = False, is_causal: bool = False, dit_config: DiTConfig = StableAudioConfig(), dit_precision: str = 'fp16', upsampler_config: UpsamplerConfig = UpsamplerConfig(), upsampler_precision: str = 'fp32', vae_config: VAEConfig = OobleckVAEConfig(), vae_precision: str = 'fp16', vae_tiling: bool = False, vae_sp: bool = False, image_encoder_config: EncoderConfig = EncoderConfig(), image_encoder_precision: str = 'fp32', text_encoder_configs: tuple = tuple(), text_encoder_precisions: tuple[str, ...] = tuple(), preprocess_text_funcs: tuple = tuple(), postprocess_text_funcs: tuple = tuple(), dmd_denoising_steps: list[int] | None = None, ti2v_task: bool = False, boundary_ratio: float | None = None, num_inference_steps: int = 100, guidance_scale: float = 7.0, audio_end_in_s: float = 10.0, audio_start_in_s: float = 0.0, sampling_rate: int = 44100, audio_channels: int = 2, sample_size: int = 2097152, max_audio_duration_s: float = 2097152 / 44100, precision: str = 'fp16')

Bases: PipelineConfig

Stable Audio Open 1.0 pipeline config.
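One way to picture the relationship between the two configs is as a dataclass that redeclares a few defaults of its base. The sketch below is a minimal illustration using stand-in classes (`BaseAudioConfig`, `SmallAudioConfig`) and a hypothetical helper `changed_defaults`; none of these are fastvideo names, and only a handful of fields from the signatures are mirrored.

```python
from dataclasses import dataclass, fields


@dataclass
class BaseAudioConfig:
    """Stand-in mirroring a few StableAudioT2AConfig defaults (not the real class)."""
    audio_end_in_s: float = 10.0
    sampling_rate: int = 44100
    sample_size: int = 2097152
    max_audio_duration_s: float = 2097152 / 44100


@dataclass
class SmallAudioConfig(BaseAudioConfig):
    """Stand-in mirroring the StableAudioOpenSmallConfig defaults that differ."""
    audio_end_in_s: float = 6.0
    sample_size: int = 524288
    max_audio_duration_s: float = 524288 / 44100


def changed_defaults(base: type, derived: type) -> dict:
    """Hypothetical helper: map field name -> (base default, derived default)."""
    b, d = base(), derived()
    return {
        f.name: (getattr(b, f.name), getattr(d, f.name))
        for f in fields(base)
        if getattr(b, f.name) != getattr(d, f.name)
    }


overrides = changed_defaults(BaseAudioConfig, SmallAudioConfig)
for name, (old, new) in overrides.items():
    print(f"{name}: {old} -> {new}")
```

Going by the signatures shown on this page, `audio_end_in_s`, `sample_size`, and `max_audio_duration_s` are the defaults that differ between the two classes; the remaining listed defaults are identical.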