nvfp4_config
LTX-2 NVFP4 quantization (FlashInfer-backed).
NVFP4 is NVIDIA's block-scaled FP4 format (e2m1 elements, fp32 alpha,
layout_128x4 scale-factor layout, group size 16), distinct from
generic FP4, OCP-FP4, and MX-FP4. We name the public surface NVFP4
explicitly so downstream callers don't conflate it with other FP4
variants that may land later (e.g. MX-FP4 on AMD hardware, or a
vendor-neutral e3m0 format).
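To make "block-scaled, group size 16" concrete, here is a minimal pure-Python reference sketch of the idea, not the FlashInfer kernel or FastVideo's code path. It assumes the standard e2m1 magnitudes {0, 0.5, 1, 1.5, 2, 3, 4, 6}, picks one scale per 16-element block so the block's largest magnitude maps to 6.0, and rounds each element to the nearest e2m1 value. For clarity it keeps block scales as plain Python floats (real NVFP4 stores them as FP8 E4M3), omits the second-level fp32 scaling, ignores the layout_128x4 scale swizzle, and does not necessarily match the hardware's rounding on ties. The function names are hypothetical.

```python
import math

# Non-negative magnitudes representable in FP4 e2m1 (1 sign, 2 exponent, 1 mantissa bit).
E2M1_VALUES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
E2M1_MAX = 6.0
GROUP_SIZE = 16  # NVFP4 block size: one scale per 16 consecutive elements


def _round_to_e2m1(x: float) -> float:
    """Round x to the nearest signed e2m1 value, saturating at +/-6.0.

    Ties resolve toward the smaller magnitude (first match in E2M1_VALUES).
    """
    mag = min(abs(x), E2M1_MAX)
    nearest = min(E2M1_VALUES, key=lambda v: abs(v - mag))
    return math.copysign(nearest, x)


def quantize_nvfp4_reference(values):
    """Hypothetical reference: return a list of (block_scale, e2m1_codes) pairs."""
    if len(values) % GROUP_SIZE:
        raise ValueError(f"length must be a multiple of {GROUP_SIZE}")
    blocks = []
    for start in range(0, len(values), GROUP_SIZE):
        block = values[start:start + GROUP_SIZE]
        amax = max(abs(v) for v in block)
        # Map the block's largest magnitude onto e2m1's largest value (6.0).
        scale = amax / E2M1_MAX if amax > 0 else 1.0
        codes = [_round_to_e2m1(v / scale) for v in block]
        blocks.append((scale, codes))
    return blocks


def dequantize_nvfp4_reference(blocks):
    """Multiply each e2m1 code by its block's scale to recover approximate values."""
    return [scale * code for scale, codes in blocks for code in codes]


if __name__ == "__main__":
    x = [0.1 * i for i in range(-8, 8)] + [float(i) for i in range(16)]
    blocks = quantize_nvfp4_reference(x)
    print([round(s, 4) for s, _ in blocks])
    print(dequantize_nvfp4_reference(blocks)[16:])
```

Because each block carries its own scale, a single outlier only coarsens the 16 elements around it rather than the whole tensor, which is the main point of a small group size.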
Upstreamed from FastVideo-internal so consumers that load LTX-2
weights with NVFP4 quantization can drive the public package
end-to-end.
flashinfer is imported lazily, inside the call paths that need it.
This keeps importing fastvideo cheap and lets it succeed on hosts
where flashinfer is not installed; only the NVFP4 quantize and matmul
ops themselves fail, at call time, with a clear error.
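The lazy-import pattern described above can be sketched with the standard library alone. This is a minimal illustration under our own assumptions, not FastVideo's actual helper: the name `require_module` and its error wording are hypothetical, and no flashinfer API is called here. The import is deferred until a call path runs, and a missing package is re-raised as an ImportError that names the feature that needed it.

```python
import importlib
from types import ModuleType


def require_module(name: str, purpose: str) -> ModuleType:
    """Import `name` on first use; raise a descriptive ImportError if it is missing.

    Hypothetical helper: repeated calls are cheap because importlib caches
    successfully imported modules in sys.modules.
    """
    try:
        return importlib.import_module(name)
    except ImportError as exc:
        raise ImportError(
            f"{purpose} requires the optional dependency '{name}', "
            "which is not installed on this host."
        ) from exc


def nvfp4_op_entry_point():
    # Hypothetical call path: flashinfer is imported only when this runs,
    # so merely importing the enclosing module never touches flashinfer.
    return require_module("flashinfer", "NVFP4 quantize/matmul")
```

On a host without flashinfer, importing the module that defines `nvfp4_op_entry_point` succeeds; only calling it raises, with a message that says which feature needed the missing package.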
Classes
fastvideo.layers.quantization.nvfp4_config.NVFP4Config
NVFP4Config(layer_profile: str = 'refine')
Bases: QuantizationConfig
LTX-2-specific NVFP4 quantization configuration.
NVFP4 is NVIDIA's block-scaled FP4 (e2m1 elements, fp32 alpha,
layout_128x4 scale-factor layout, group size 16). Today this class
hardcodes the LTX-2 layer paths it covers. When a second model
wants NVFP4, lift the layer-path list into a config field
instead of hardcoding it here.
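One way the suggested refactor could look is sketched below: the layer-path list becomes data on a config object, and layer selection becomes a pattern match. Everything here is hypothetical, including the class name `NVFP4LayerSelection`, the `target_layers` field, and the glob-style matching via `fnmatch.fnmatchcase`; the example patterns are made up for illustration and are not the LTX-2 paths NVFP4Config actually covers.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase


@dataclass
class NVFP4LayerSelection:
    """Hypothetical: the NVFP4 layer-path list held as a config field."""

    # Glob-style patterns over dotted module paths ('*' also matches dots).
    target_layers: list[str] = field(default_factory=list)

    def covers(self, layer_path: str) -> bool:
        """Return True if any configured pattern matches this layer path."""
        return any(fnmatchcase(layer_path, pattern) for pattern in self.target_layers)


# Made-up patterns for illustration only.
selection = NVFP4LayerSelection(
    target_layers=["transformer_blocks.*.attn1.to_q", "transformer_blocks.*.ff.*"]
)
```

With the list in a field, a second model would supply its own patterns instead of editing the class, and `fnmatchcase` keeps matching case-sensitive on every platform.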