
nvfp4_config

LTX-2 NVFP4 quantization (FlashInfer-backed).

NVFP4 is NVIDIA's block-scaled FP4 format (E2M1 elements, i.e. 2 exponent bits and 1 mantissa bit; fp32 alpha; layout_128x4 scale layout; group size 16). It is distinct from generic FP4, OCP-FP4, and MX-FP4. We name the public surface NVFP4 explicitly so downstream callers don't conflate it with other FP4 variants that may land later (e.g. AMD's MX-FP4 or vendor-neutral e3m0).
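To make the block-scaled layout concrete, here is a minimal pure-Python sketch of "fake" quantization onto the E2M1 grid with one scale per 16-element group. It is an illustration only, not the FlashInfer kernel: the function name `fake_quantize_nvfp4` is hypothetical, the per-block scale is kept as a Python float rather than in the packed storage format, the fp32 alpha and the layout_128x4 scale swizzle are omitted, and ties are rounded toward the smaller magnitude, which may differ from hardware rounding.

```python
import math

# Magnitudes representable by an E2M1 value (the sign is a separate bit).
E2M1_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
GROUP_SIZE = 16  # number of elements that share one block scale


def fake_quantize_nvfp4(values):
    """Round each 16-element block to the E2M1 grid, then dequantize.

    Hypothetical helper for illustration; scales stay in full precision.
    """
    if len(values) % GROUP_SIZE != 0:
        raise ValueError("length must be a multiple of the group size (16)")
    out = []
    for start in range(0, len(values), GROUP_SIZE):
        block = values[start:start + GROUP_SIZE]
        amax = max(abs(v) for v in block)
        # The scale maps the block's largest magnitude onto 6.0, the top
        # of the E2M1 grid. An all-zero block gets a neutral scale.
        scale = amax / E2M1_GRID[-1] if amax > 0 else 1.0
        for v in block:
            # Nearest grid point to the scaled magnitude (ties pick the
            # smaller grid point because min() returns the first match).
            nearest = min(E2M1_GRID, key=lambda g: abs(abs(v) / scale - g))
            out.append(math.copysign(nearest * scale, v))
    return out
```

Values that already sit on the grid (after scaling) survive unchanged, while everything else snaps to the nearest of the eight magnitudes, so the absolute error within a block grows with that block's largest element.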

Upstreamed from FastVideo-internal so consumers that load LTX-2 weights with NVFP4 quantization can drive the public package end-to-end.

flashinfer is imported lazily inside the call paths that need it. This keeps import fastvideo cheap on hosts where flashinfer is not installed: if the package is missing, only the NVFP4 quantize and matmul ops fail, at call time, with a clear error.
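The lazy-import pattern described above can be sketched as follows. This is a generic illustration and not the module's actual code: `require_module` and its `purpose` argument are hypothetical names, and the error wording is an assumption.

```python
import importlib


def require_module(name: str, purpose: str):
    """Import ``name`` on first use; raise a descriptive error if missing.

    Hypothetical helper: callers invoke it inside the function that needs
    the dependency, so importing the enclosing package stays cheap.
    """
    try:
        return importlib.import_module(name)
    except ImportError as exc:
        raise ImportError(
            f"{purpose} requires the '{name}' package, which is not "
            f"installed on this host."
        ) from exc
```

An op that needs the backend would call, for example, `require_module("flashinfer", "NVFP4 quantization")` as its first statement, so hosts without flashinfer only see the error when that op actually runs.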

Classes

fastvideo.layers.quantization.nvfp4_config.NVFP4Config

NVFP4Config(layer_profile: str = 'refine')

Bases: QuantizationConfig

LTX-2-specific NVFP4 quantization configuration.

NVFP4 is NVIDIA's block-scaled FP4 (E2M1 elements, fp32 alpha, layout_128x4 scale layout, group size 16). Today this class hardcodes the LTX-2 layer paths it covers. When a second model needs NVFP4, lift the layer-path list into a config field instead of hardcoding it here.

Source code in fastvideo/layers/quantization/nvfp4_config.py
def __init__(self, layer_profile: str = "refine"):
    super().__init__()
    # ``base``: stage-1 set (no attn2.to_out, no cross-modal AV
    # projections). ``refine``: full stage-2 set.
    self.layer_profile = layer_profile
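The constructor shown above stores layer_profile without checking it. A caller that wants to catch typos early could validate the value first; the sketch below assumes that "base" and "refine", the two profiles named in the source comment, are the only valid ones. `KNOWN_LAYER_PROFILES` and `check_layer_profile` are hypothetical names, not part of fastvideo.

```python
# Assumption: these are the only two profiles, per the constructor comment.
KNOWN_LAYER_PROFILES = ("base", "refine")


def check_layer_profile(layer_profile: str = "refine") -> str:
    """Return the profile unchanged, or raise if it is not a known name."""
    if layer_profile not in KNOWN_LAYER_PROFILES:
        raise ValueError(
            f"unknown layer_profile {layer_profile!r}; "
            f"expected one of {KNOWN_LAYER_PROFILES}"
        )
    return layer_profile
```

With fastvideo installed, the result could then be passed straight through, e.g. `NVFP4Config(layer_profile=check_layer_profile("base"))`.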

Functions