tests

Modules

fastvideo.tests.api

Modules

fastvideo.tests.api.test_compat_translation

Tests for fastvideo.api.compat translation helpers covering the typed CompileConfig + PipelineSelection.vae_tiling surfaces promoted in PR 6.

Classes
fastvideo.tests.api.test_compat_translation.TestCompileConfigRoundTrip

typed CompileConfig -> FastVideoArgs.torch_compile_kwargs reconstruction drops None typed fields and merges extras.

fastvideo.tests.api.test_compat_translation.TestLegacyLtx2VaeTilingTranslation

ltx2_vae_tiling flat kwarg promotes to generator.pipeline.vae_tiling; reverse direction emits the legacy name back to FastVideoArgs.

fastvideo.tests.api.test_compat_translation.TestLegacyTextEncoderCompileTranslation

enable_torch_compile_text_encoder flat kwarg promotes to generator.engine.compile.text_encoder_enabled; reverse direction emits the legacy name back onto the FastVideoArgs kwargs dict so realtime-runtime consumers can read it before FastVideoArgs filters unknown fields.

fastvideo.tests.api.test_compat_translation.TestLegacyTorchCompileKwargsTranslation

Legacy torch_compile_kwargs={...} is split across the four first-class CompileConfig fields; anything unknown falls into extras.
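The split-and-reconstruct behavior these two test classes describe can be sketched as follows. This is a minimal illustration, not the fastvideo.api.compat implementation: the four typed field names (backend, mode, fullgraph, dynamic) and the helper names are assumptions chosen for the example, since this page does not list the real fields.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

# Hypothetical typed field names, chosen for illustration only.
TYPED_FIELDS = ("backend", "mode", "fullgraph", "dynamic")


@dataclass
class CompileConfigSketch:
    backend: Optional[str] = None
    mode: Optional[str] = None
    fullgraph: Optional[bool] = None
    dynamic: Optional[bool] = None
    # Anything that is not a typed field is carried here unchanged.
    extras: Dict[str, Any] = field(default_factory=dict)


def from_legacy_kwargs(kwargs: Dict[str, Any]) -> CompileConfigSketch:
    """Split a flat legacy dict into typed fields plus extras."""
    typed = {k: v for k, v in kwargs.items() if k in TYPED_FIELDS}
    extras = {k: v for k, v in kwargs.items() if k not in TYPED_FIELDS}
    return CompileConfigSketch(extras=extras, **typed)


def to_legacy_kwargs(cfg: CompileConfigSketch) -> Dict[str, Any]:
    """Rebuild the flat dict: drop typed fields that are None, merge extras."""
    out = {
        name: getattr(cfg, name)
        for name in TYPED_FIELDS
        if getattr(cfg, name) is not None
    }
    out.update(cfg.extras)
    return out
```

Dropping None on the way back keeps unset typed fields from appearing as explicit None entries in the reconstructed dict, so a legacy dict that only set two keys round-trips to the same two keys.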

fastvideo.tests.api.test_extra_overrides_routing

LTX-2 audio-conditioning kwargs must reach ForwardBatch.extra and SamplingParam.update() must reject unknown kwargs instead of silently dropping them.

Background: before PR-1288, a chain of LTX-2 audio kwargs (audio_num_frames, ltx2_audio_clean_latent …) flowed into SamplingParam.update(), which logged them via logger.error and dropped them. As a result, every continuation segment generated audio for the default num_frames duration. The resulting A/V duration mismatch reached av_streaming.stream_fmp4, whose -shortest ffmpeg invocation closed stdin before the writer thread had pushed every frame, and the streaming server surfaced this as a BrokenPipeError.

These tests pin two contracts:
  1. _BATCH_EXTRA_PASSTHROUGH_KEYS lists the exact set of kwargs pulled out of generate_video(**kwargs) for batch.extra.
  2. SamplingParam.update() raises ValueError on unknown keys.
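The two contracts above can be illustrated with a small stand-alone sketch. It assumes a dataclass-based parameter object and a hypothetical route_kwargs helper; the class, the helper, the default values, and the exact error wording are illustrative and are not taken from video_generator.py. Only the two key names come from the background paragraph.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Tuple

# Illustrative subset; the real _BATCH_EXTRA_PASSTHROUGH_KEYS may list more.
PASSTHROUGH_KEYS = ("audio_num_frames", "ltx2_audio_clean_latent")


@dataclass
class SamplingParamSketch:
    prompt: str = ""
    num_frames: int = 121  # arbitrary default for the sketch

    def update(self, overrides: Dict[str, Any]) -> None:
        known = {f.name for f in fields(self)}
        unknown = sorted(k for k in overrides if k not in known)
        if unknown:
            # Validate before assigning anything, so no partial update occurs.
            raise ValueError(
                f"unknown field(s) {unknown}; batch-only kwargs belong in "
                "PASSTHROUGH_KEYS")
        for key, value in overrides.items():
            setattr(self, key, value)


def route_kwargs(kwargs: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Pop passthrough keys into a separate dict before update() sees them."""
    remaining = dict(kwargs)
    extra = {k: remaining.pop(k) for k in PASSTHROUGH_KEYS if k in remaining}
    return remaining, extra
```

Checking for unknown keys before any assignment is what rules out the partial-success mode: either every key is applied or none is.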
Functions:
fastvideo.tests.api.test_extra_overrides_routing.test_passthrough_keys_are_not_sampling_param_fields
test_passthrough_keys_are_not_sampling_param_fields() -> None

If any of these become SamplingParam fields, the routing block in video_generator.py needs to be re-evaluated — they would no longer need to be popped before sampling_param.update().

Source code in fastvideo/tests/api/test_extra_overrides_routing.py
def test_passthrough_keys_are_not_sampling_param_fields() -> None:
    """If any of these become SamplingParam fields, the routing block
    in video_generator.py needs to be re-evaluated — they would no
    longer need to be popped before ``sampling_param.update()``."""
    import dataclasses
    sp_fields = {f.name for f in dataclasses.fields(SamplingParam())}
    leaked = sp_fields & set(_BATCH_EXTRA_PASSTHROUGH_KEYS)
    assert not leaked, (
        f"Passthrough keys collide with SamplingParam fields: {leaked}. "
        "Remove from _BATCH_EXTRA_PASSTHROUGH_KEYS or rename the field.")
fastvideo.tests.api.test_extra_overrides_routing.test_sampling_param_update_error_mentions_passthrough_route
test_sampling_param_update_error_mentions_passthrough_route() -> None

The error message should point future contributors at the right routing mechanism so they don't re-introduce silent dropping.

Source code in fastvideo/tests/api/test_extra_overrides_routing.py
def test_sampling_param_update_error_mentions_passthrough_route() -> None:
    """The error message should point future contributors at the right
    routing mechanism so they don't re-introduce silent dropping."""
    sp = SamplingParam()
    with pytest.raises(ValueError, match="_BATCH_EXTRA_PASSTHROUGH_KEYS"):
        sp.update({"audio_num_frames": 161})
fastvideo.tests.api.test_extra_overrides_routing.test_sampling_param_update_rejects_audio_passthrough_keys
test_sampling_param_update_rejects_audio_passthrough_keys() -> None

LTX-2 audio kwargs must NOT slip through update() — they belong to ForwardBatch.extra and the routing block in video_generator.py is responsible for popping them first.

Source code in fastvideo/tests/api/test_extra_overrides_routing.py
def test_sampling_param_update_rejects_audio_passthrough_keys() -> None:
    """LTX-2 audio kwargs must NOT slip through ``update()`` — they
    belong to ``ForwardBatch.extra`` and the routing block in
    ``video_generator.py`` is responsible for popping them first."""
    sp = SamplingParam()
    for key in _BATCH_EXTRA_PASSTHROUGH_KEYS:
        with pytest.raises(ValueError, match="unknown field"):
            sp.update({key: object()})
fastvideo.tests.api.test_extra_overrides_routing.test_sampling_param_update_rejects_partially_unknown_keys
test_sampling_param_update_rejects_partially_unknown_keys() -> None

Even when most keys are valid, a single unknown key must raise. Partial-success was the silent-failure mode this regression fixes.

Source code in fastvideo/tests/api/test_extra_overrides_routing.py
def test_sampling_param_update_rejects_partially_unknown_keys() -> None:
    """Even when most keys are valid, a single unknown key must raise.
    Partial-success was the silent-failure mode this regression fixes."""
    sp = SamplingParam()
    with pytest.raises(ValueError, match=r"\['bogus_field'\]"):
        sp.update({"prompt": "hello", "bogus_field": 42})
fastvideo.tests.api.test_ltx2_continuation

Tests for the typed LTX-2 continuation state.

Covers:

  • round-trip through ContinuationState (inline and blob-backed)
  • payload is JSON-serializable (Dynamo RPC / HTTP client constraint)
  • kind / schema_version validation on deserialization
  • compat-layer validation (known kinds, payload shape)
  • round-trip through request_to_sampling_param attaches the state to the resulting SamplingParam without losing fidelity
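The validation items in the list above (known kind, schema_version, JSON-serializable payload) could be checked along these lines. This is a sketch over plain dicts, not the ContinuationState implementation; validate_envelope is a hypothetical name, and the "ltx2.v1" kind and schema version 1 are taken from the example state used elsewhere on this page.

```python
import json
from typing import Any, Dict

KNOWN_KINDS = {"ltx2.v1"}
SUPPORTED_SCHEMA_VERSION = 1


def validate_envelope(envelope: Dict[str, Any]) -> Dict[str, Any]:
    """Return the payload if the envelope is well-formed, else raise ValueError."""
    kind = envelope.get("kind")
    if kind not in KNOWN_KINDS:
        raise ValueError(f"unknown continuation kind: {kind!r}")
    payload = envelope.get("payload")
    if not isinstance(payload, dict):
        raise ValueError("payload must be a dict")
    version = payload.get("schema_version")
    if version != SUPPORTED_SCHEMA_VERSION:
        raise ValueError(f"unsupported schema_version: {version!r}")
    try:
        # The payload crosses RPC / HTTP boundaries, so it must survive JSON.
        json.dumps(payload)
    except TypeError as exc:
        raise ValueError(f"payload is not JSON-serializable: {exc}") from exc
    return payload
```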
Classes
fastvideo.tests.api.test_ltx2_continuation.TestBlobIndirection

Large tensors live in the BlobStore instead of the payload.

Methods:
fastvideo.tests.api.test_ltx2_continuation.TestBlobIndirection.test_blob_id_held_when_store_unavailable
test_blob_id_held_when_store_unavailable()

Deserializing without a blob store preserves the blob id so the caller can fetch it later.

Source code in fastvideo/tests/api/test_ltx2_continuation.py
def test_blob_id_held_when_store_unavailable(self):
    """Deserializing without a blob store preserves the blob id so
    the caller can fetch it later."""
    blob_store = InMemoryBlobStore()
    envelope = _make_typed_state().to_continuation_state(
        blob_store=blob_store,
        inline_threshold_bytes=0,
    )
    blob_id_video = envelope.payload["video"]["blob_id"]
    blob_id_audio = envelope.payload["audio"]["blob_id"]

    restored = LTX2ContinuationState.from_continuation_state(envelope)
    assert restored.video_frames is None
    assert restored.video_frames_blob_id == blob_id_video
    assert restored.audio_latents is None
    assert restored.audio_latents_blob_id == blob_id_audio
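The blob indirection exercised by this test can be sketched with plain bytes standing in for tensors. The store class, encode_field, decode_field, and the inline_b64 key are hypothetical names for illustration; only the blob_id key and the inline_threshold_bytes parameter appear in the test above.

```python
import base64
import uuid
from typing import Dict, Optional, Tuple


class InMemoryBlobStoreSketch:
    """Toy stand-in for a blob store: maps generated ids to raw bytes."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        blob_id = uuid.uuid4().hex
        self._blobs[blob_id] = data
        return blob_id

    def get(self, blob_id: str) -> bytes:
        return self._blobs[blob_id]


def encode_field(data: bytes,
                 store: Optional[InMemoryBlobStoreSketch],
                 inline_threshold_bytes: int) -> Dict[str, str]:
    """Store large data out of band; keep small data inline as base64."""
    if store is not None and len(data) > inline_threshold_bytes:
        return {"blob_id": store.put(data)}
    return {"inline_b64": base64.b64encode(data).decode("ascii")}


def decode_field(entry: Dict[str, str],
                 store: Optional[InMemoryBlobStoreSketch] = None
                 ) -> Tuple[Optional[bytes], Optional[str]]:
    """Return (data, blob_id); data is None when no store can resolve the id."""
    if "inline_b64" in entry:
        return base64.b64decode(entry["inline_b64"]), None
    blob_id = entry["blob_id"]
    if store is None:
        return None, blob_id  # hold the id so the caller can fetch later
    return store.get(blob_id), blob_id
```

With inline_threshold_bytes=0, any non-empty field is pushed to the store, which mirrors how the test forces the blob-backed path.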
fastvideo.tests.api.test_ltx2_continuation.TestCompatLayerWireUp

The public compat layer accepts request.state without reverting to NotImplementedError and attaches it to the SamplingParam path.

fastvideo.tests.api.test_ltx2_continuation.TestRoundTrip

Round-trip through ContinuationState preserves all fields.

Methods:
fastvideo.tests.api.test_ltx2_continuation.TestRoundTrip.test_bf16_audio_latents_preserved
test_bf16_audio_latents_preserved()

safetensors serialization must preserve bf16 dtype (numpy has no bf16, so a raw-bytes path would silently promote).

Source code in fastvideo/tests/api/test_ltx2_continuation.py
def test_bf16_audio_latents_preserved(self):
    """safetensors serialization must preserve bf16 dtype (numpy
    has no bf16, so a raw-bytes path would silently promote)."""
    state = LTX2ContinuationState(
        segment_index=0,
        audio_latents=torch.randn(1, 4, 16, 64, dtype=torch.bfloat16),
    )
    envelope = state.to_continuation_state()
    restored = LTX2ContinuationState.from_continuation_state(envelope)
    assert restored.audio_latents is not None
    assert restored.audio_latents.dtype == torch.bfloat16
    torch.testing.assert_close(
        restored.audio_latents, state.audio_latents)
fastvideo.tests.api.test_ltx2_continuation.TestValidation

Invalid payloads error cleanly.

Modules
fastvideo.tests.api.test_ltx2_gpu_pool_translation

gpu_pool-style flat-kwarg integration tests.

Mirrors the load_kwargs dict that the FastVideo-internal ui/ltx2-streaming/server/gpu_pool.py passes to VideoGenerator.from_pretrained(**load_kwargs) and asserts that the public typed GeneratorConfig surface (introduced across PRs 0-6) can represent it end-to-end, with no fields silently falling through to pipeline.experimental.

This is the parity guard PR 7.6 depends on: the public gpu_pool upstream must be able to construct a typed GeneratorConfig without knowing any legacy LTX-2 kwarg name, and downstream Dynamo (FastVideoArgGroup) must be able to do the same.

Classes
fastvideo.tests.api.test_ltx2_gpu_pool_translation.TestCompileExtrasPreserved

Additional torch.compile kwargs beyond the four typed fields round-trip through CompileConfig.extras.

fastvideo.tests.api.test_ltx2_gpu_pool_translation.TestGpuPoolForwardTranslation

gpu_pool flat kwargs -> typed GeneratorConfig.

Methods:
fastvideo.tests.api.test_ltx2_gpu_pool_translation.TestGpuPoolForwardTranslation.test_no_experimental_leakage
test_no_experimental_leakage(config) -> None

Every gpu_pool kwarg should have a typed home — nothing should silently fall through to pipeline.experimental.

Source code in fastvideo/tests/api/test_ltx2_gpu_pool_translation.py
def test_no_experimental_leakage(self, config) -> None:
    """Every gpu_pool kwarg should have a typed home — nothing should
    silently fall through to ``pipeline.experimental``."""
    assert config.pipeline.experimental == {}
fastvideo.tests.api.test_ltx2_gpu_pool_translation.TestGpuPoolReverseTranslation

typed GeneratorConfig -> FastVideoArgs kwargs reproduces the original gpu_pool flat-kwarg shape.

This is what lets PR 7.6 wire the public gpu_pool through generator_config_to_fastvideo_args without the runtime noticing.

Methods:
fastvideo.tests.api.test_ltx2_gpu_pool_translation.TestGpuPoolReverseTranslation.test_no_stray_refine_dict
test_no_stray_refine_dict(args_kwargs) -> None

preset_overrides.refine must flatten to ltx2_refine_* kwargs rather than landing as a nested refine kwarg that FastVideoArgs doesn't understand.

Source code in fastvideo/tests/api/test_ltx2_gpu_pool_translation.py
def test_no_stray_refine_dict(self, args_kwargs) -> None:
    """preset_overrides.refine must flatten to ltx2_refine_* kwargs
    rather than landing as a nested ``refine`` kwarg that
    FastVideoArgs doesn't understand."""
    assert "refine" not in args_kwargs
fastvideo.tests.api.test_ltx2_gpu_pool_translation.TestRefineFlattenCoversAllTypedFields

Every field on LTX2Refine{Preset,Stage}Override must survive the round-trip through preset_overrides.refine back to ltx2_refine_* kwargs. Guards against the hardcoded-key-tuple regression where image_crf / video_position_offset_sec silently dropped.
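One way to avoid the hardcoded-key-tuple regression described above is to derive the flat kwarg names from the dataclass itself. The sketch below assumes a single flat override dataclass and assumes that None-valued fields are omitted; the class name, field types, and flatten_refine helper are illustrative, while the field names come from the kwargs mentioned on this page.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Optional


@dataclass
class RefineOverrideSketch:
    # Field types are guesses made for the sketch.
    enabled: Optional[bool] = None
    upsampler_path: Optional[str] = None
    image_crf: Optional[int] = None
    video_position_offset_sec: Optional[float] = None


def flatten_refine(override: RefineOverrideSketch,
                   prefix: str = "ltx2_refine_") -> Dict[str, Any]:
    """Emit one prefixed flat kwarg per set field, driven by fields()."""
    return {
        f"{prefix}{f.name}": getattr(override, f.name)
        for f in fields(override)
        if getattr(override, f.name) is not None
    }
```

Because the loop iterates over dataclasses.fields, a newly added field is flattened automatically instead of being dropped until someone updates a key tuple.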

fastvideo.tests.api.test_ltx2_param_mapping

Regression tests for LTX2VideoArchConfig.param_names_mapping.

The to_gate_compress -> to_gate_logits rename is the LTX-2.3 gated attention loader rule. It must only fire when apply_gated_attention=True; otherwise it would silently retarget:

  • LTX-2.0 VIDEO_SPARSE_ATTN checkpoints, whose attention modules legitimately carry a to_gate_compress VSA-QAT gate (a sibling of attn_masked in fastvideo/models/dits/ltx2.py).
  • LoRAs trained with the default lora_target_modules list (which includes to_gate_compress; see fastvideo/train/utils/lora.py:36 and fastvideo/pipelines/lora_pipeline.py:171).
Classes
fastvideo.tests.api.test_ltx2_param_mapping.TestLTX20ParamMappingDefault

apply_gated_attention=False (LTX-2.0 default): to_gate_compress must pass through unmodified except for prefix normalization.

Methods:
fastvideo.tests.api.test_ltx2_param_mapping.TestLTX20ParamMappingDefault.test_default_lora_target_not_renamed
test_default_lora_target_not_renamed(source)

to_gate_compress is in DEFAULT_LORA_TARGET_MODULES, so LoRAs trained with default targets ship these keys.

Source code in fastvideo/tests/api/test_ltx2_param_mapping.py
@pytest.mark.parametrize("source", [
    "transformer_blocks.0.attn1.to_gate_compress.lora_A.weight",
    "transformer_blocks.0.attn1.to_gate_compress.lora_B.weight",
    "model.transformer_blocks.3.attn1.to_gate_compress.lora_A.weight",
])
def test_default_lora_target_not_renamed(self, source):
    """``to_gate_compress`` is in ``DEFAULT_LORA_TARGET_MODULES``, so LoRAs
    trained with default targets ship these keys."""
    target = _map(source, apply_gated_attention=False)
    assert "to_gate_logits" not in target, (
        f"Default-target LoRA key {source!r} was rewritten to {target!r}; "
        "this would break every LTX-2.0 LoRA in the wild.")
fastvideo.tests.api.test_ltx2_param_mapping.TestLTX20ParamMappingDefault.test_unrelated_param_still_prefix_stripped
test_unrelated_param_still_prefix_stripped()

Generic prefix-strip behavior is unchanged in LTX-2.0 mode.

Source code in fastvideo/tests/api/test_ltx2_param_mapping.py
def test_unrelated_param_still_prefix_stripped(self):
    """Generic prefix-strip behavior is unchanged in LTX-2.0 mode."""
    assert (_map("diffusion_model.transformer_blocks.0.attn1.to_q.weight",
                 apply_gated_attention=False) ==
            "model.transformer_blocks.0.attn1.to_q.weight")
fastvideo.tests.api.test_ltx2_param_mapping.TestLTX23ParamMappingGated

apply_gated_attention=True (LTX-2.3): the gated-attention to_gate_compress upstream key is renamed to to_gate_logits.

Methods:
fastvideo.tests.api.test_ltx2_param_mapping.TestLTX23ParamMappingGated.test_non_gate_param_still_prefix_stripped
test_non_gate_param_still_prefix_stripped()

Non-gate params still get the prefix-strip behavior.

Source code in fastvideo/tests/api/test_ltx2_param_mapping.py
def test_non_gate_param_still_prefix_stripped(self):
    """Non-gate params still get the prefix-strip behavior."""
    assert (_map("diffusion_model.transformer_blocks.0.attn1.to_q.weight",
                 apply_gated_attention=True) ==
            "model.transformer_blocks.0.attn1.to_q.weight")
fastvideo.tests.api.test_ltx2_param_mapping.TestParamMappingRuleOrdering

The gate rules must be inserted before the generic prefix-strip rules so first-match-wins matching fires the rename first.
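A minimal sketch of first-match-wins rule matching, showing why the gate rule has to precede the generic prefix-strip rule. The regex patterns and the build_rules / map_name helpers are assumptions for illustration and are not the actual LTX2VideoArchConfig.param_names_mapping; the expected outputs for to_q follow the assertions shown above.

```python
import re
from typing import List, Tuple


def build_rules(apply_gated_attention: bool) -> List[Tuple[str, str]]:
    rules: List[Tuple[str, str]] = []
    if apply_gated_attention:
        # Gate rename goes first so it wins over the generic rule below.
        rules.append((
            r"^(?:diffusion_model\.|model\.)?(.*)\.to_gate_compress\.(.*)$",
            r"model.\1.to_gate_logits.\2",
        ))
    # Generic prefix normalization.
    rules.append((r"^diffusion_model\.(.*)$", r"model.\1"))
    return rules


def map_name(name: str, apply_gated_attention: bool) -> str:
    """Apply the first rule whose pattern matches; later rules are skipped."""
    for pattern, replacement in build_rules(apply_gated_attention):
        new_name, count = re.subn(pattern, replacement, name)
        if count:
            return new_name
    return name
```

If the generic rule were listed first, a diffusion_model.-prefixed gate key would match it, return immediately, and never reach the rename.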

fastvideo.tests.api.test_ltx2_stage_overrides

Tests for typed LTX-2 stage override dataclasses.

Classes
fastvideo.tests.api.test_ltx2_stage_overrides.TestStageOverridesMirrorPresetSchema

The ltx2_two_stage preset's refine stage schema must list exactly the LTX2RefineStageOverride field names.

fastvideo.tests.api.test_presets
Classes
fastvideo.tests.api.test_presets.TestPresetCountIntegrity
Methods:
fastvideo.tests.api.test_presets.TestPresetCountIntegrity.test_total_preset_count
test_total_preset_count() -> None

At least the baseline 37 presets from 13 families are registered.

Source code in fastvideo/tests/api/test_presets.py
def test_total_preset_count(self) -> None:
    """At least the baseline 37 presets from 13 families are registered."""
    import fastvideo.registry  # noqa: F401
    names = get_all_preset_names()
    assert len(names) >= 37
fastvideo.tests.api.test_presets.TestPresetDefaultTypes

Preset defaults values must match the types on SamplingParam. Assigning None to a typed-str field (e.g. negative_prompt) breaks downstream stages that assert the runtime type; see the CFG branch in pipelines/stages/text_encoding.py:81.

Methods:
fastvideo.tests.api.test_presets.TestPresetDefaultTypes.test_ltx2_cfg_defaults_are_off
test_ltx2_cfg_defaults_are_off() -> None

SamplingParam's LTX-2 CFG class defaults must be 1.0 (CFG off). ForwardBatch.__post_init__ force-enables do_classifier_free_guidance when either ltx2_cfg_scale_video or ltx2_cfg_scale_audio is != 1.0, so any non-1.0 default silently forces CFG on for every model family that doesn't explicitly override these fields. Guard against the regression that surfaced as the TurboDiffusion I2V SSIM crash (text_encoding.py:81 assertion on negative_prompt).

Source code in fastvideo/tests/api/test_presets.py
def test_ltx2_cfg_defaults_are_off(self) -> None:
    """SamplingParam's LTX-2 CFG class defaults must be 1.0 (CFG
    off). ``ForwardBatch.__post_init__`` force-enables
    ``do_classifier_free_guidance`` when either
    ``ltx2_cfg_scale_video`` or ``ltx2_cfg_scale_audio`` is != 1.0,
    so any non-1.0 default silently forces CFG on for every model
    family that doesn't explicitly override these fields. Guard
    against the regression that surfaced as the TurboDiffusion I2V
    SSIM crash (``text_encoding.py:81`` assertion on
    ``negative_prompt``)."""
    from fastvideo.api.sampling_param import SamplingParam
    sp = SamplingParam()
    assert sp.ltx2_cfg_scale_video == 1.0
    assert sp.ltx2_cfg_scale_audio == 1.0
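The force-enable behavior the docstring describes can be pictured with a small dataclass. This is a sketch of the described rule only; ForwardBatchSketch is a hypothetical stand-in and the real ForwardBatch has many more fields and conditions.

```python
from dataclasses import dataclass


@dataclass
class ForwardBatchSketch:
    ltx2_cfg_scale_video: float = 1.0
    ltx2_cfg_scale_audio: float = 1.0
    do_classifier_free_guidance: bool = False

    def __post_init__(self) -> None:
        # Either scale differing from 1.0 switches CFG on for the batch.
        if (self.ltx2_cfg_scale_video != 1.0
                or self.ltx2_cfg_scale_audio != 1.0):
            self.do_classifier_free_guidance = True
```

With defaults of 1.0 the flag stays off, which is why a non-1.0 class default would turn CFG on for every model family that never sets these fields.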
fastvideo.tests.api.test_presets.TestWanPresets

Verify the Wan presets registered from registry.py.

fastvideo.tests.api.test_schema_parity_inventory
Classes
Modules
fastvideo.tests.api.test_typed_quant_flow

Typed quantization flow contract tests.

Locks in the path from typed GeneratorConfig.engine.quantization.transformer_quant: "NVFP4" through the compat layer to a concrete NVFP4Config instance pinned on pipeline_config.dit_config.quant_config.

The model loader detects FP4 by isinstance(quant_method, NVFP4QuantizeMethod) rather than by a flag, so the typed surface must reliably produce that class on the DiT config — otherwise the loader silently runs full bf16.

Classes
Functions:
fastvideo.tests.api.test_typed_quant_flow.captured_kwargs
captured_kwargs(monkeypatch)

Replace FastVideoArgs.from_kwargs with a capturer so the test doesn't try to download model_index.json.

Source code in fastvideo/tests/api/test_typed_quant_flow.py
@pytest.fixture
def captured_kwargs(monkeypatch):
    """Replace ``FastVideoArgs.from_kwargs`` with a capturer so the
    test doesn't try to download model_index.json.
    """
    from fastvideo import fastvideo_args as fva

    captured: dict[str, object] = {}

    def _capture(**kw):
        captured.update(kw)

        class _Stub:
            kwargs = kw

        return _Stub()

    monkeypatch.setattr(fva.FastVideoArgs, "from_kwargs", _capture)
    return captured
fastvideo.tests.api.test_typed_quant_flow.test_apply_transformer_quant_does_not_overwrite_explicit_dit_config
test_apply_transformer_quant_does_not_overwrite_explicit_dit_config() -> None

When the caller has explicitly set pipeline_config.dit_config.quant_config already, the typed carrier defers — the explicit setter wins.

Source code in fastvideo/tests/api/test_typed_quant_flow.py
def test_apply_transformer_quant_does_not_overwrite_explicit_dit_config(
) -> None:
    """When the caller has explicitly set
    ``pipeline_config.dit_config.quant_config`` already, the typed
    carrier defers — the explicit setter wins.
    """
    from fastvideo.fastvideo_args import FastVideoArgs

    explicit = NVFP4Config(layer_profile="base")
    args = FastVideoArgs(model_path="FastVideo/LTX2-Distilled-Diffusers")
    args.pipeline_config.dit_config.quant_config = explicit
    args.transformer_quant = NVFP4Config(layer_profile="refine")
    args._apply_transformer_quant()
    assert args.pipeline_config.dit_config.quant_config is explicit
fastvideo.tests.api.test_typed_quant_flow.test_apply_transformer_quant_pins_to_dit_config
test_apply_transformer_quant_pins_to_dit_config(monkeypatch) -> None

FastVideoArgs.__post_init__._apply_transformer_quant must copy the transformer_quant instance onto pipeline_config.dit_config.quant_config so the DiT loader sees it during construction.

Source code in fastvideo/tests/api/test_typed_quant_flow.py
def test_apply_transformer_quant_pins_to_dit_config(monkeypatch) -> None:
    """``FastVideoArgs.__post_init__._apply_transformer_quant`` must
    copy the ``transformer_quant`` instance onto
    ``pipeline_config.dit_config.quant_config`` so the DiT loader sees
    it during construction.
    """
    from fastvideo.fastvideo_args import FastVideoArgs

    args = FastVideoArgs(model_path="FastVideo/LTX2-Distilled-Diffusers")
    # ``transformer_quant`` defaults to None so __post_init__ leaves it.
    assert args.transformer_quant is None
    assert args.pipeline_config.dit_config.quant_config is None

    nvfp4 = NVFP4Config()
    args.transformer_quant = nvfp4
    args._apply_transformer_quant()
    assert args.pipeline_config.dit_config.quant_config is nvfp4
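Taken together, the two _apply_transformer_quant tests describe a "pin unless already set" rule. A minimal sketch under that reading follows; all class names here are hypothetical stand-ins, and the real method lives on FastVideoArgs and may do more.

```python
from dataclasses import dataclass, field
from typing import Optional


class QuantConfigSketch:
    """Stand-in for a quantization config such as NVFP4Config."""

    def __init__(self, layer_profile: str = "base") -> None:
        self.layer_profile = layer_profile


@dataclass
class DitConfigSketch:
    quant_config: Optional[QuantConfigSketch] = None


@dataclass
class ArgsSketch:
    dit_config: DitConfigSketch = field(default_factory=DitConfigSketch)
    transformer_quant: Optional[QuantConfigSketch] = None

    def apply_transformer_quant(self) -> None:
        if self.transformer_quant is None:
            return  # nothing typed was requested; legacy path untouched
        if self.dit_config.quant_config is not None:
            return  # an explicit setting already exists, so it wins
        # Pin the same instance so the loader's isinstance check sees it.
        self.dit_config.quant_config = self.transformer_quant
```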
fastvideo.tests.api.test_typed_quant_flow.test_no_typed_quant_omits_transformer_quant_kwarg
test_no_typed_quant_omits_transformer_quant_kwarg(captured_kwargs) -> None

Default GeneratorConfig has quantization=None — the carrier must not be set, so the existing legacy path (pipeline_config.dit_config.quant_config = NVFP4Config()) keeps working as before.

Source code in fastvideo/tests/api/test_typed_quant_flow.py
def test_no_typed_quant_omits_transformer_quant_kwarg(captured_kwargs) -> None:
    """Default GeneratorConfig has ``quantization=None`` — the carrier
    must not be set, so the existing legacy path
    (``pipeline_config.dit_config.quant_config = NVFP4Config()``)
    keeps working as before.
    """
    cfg = GeneratorConfig(model_path="FastVideo/LTX2-Distilled-Diffusers")
    generator_config_to_fastvideo_args(cfg)
    assert "transformer_quant" not in captured_kwargs

fastvideo.tests.conftest

Functions:

fastvideo.tests.conftest.distributed_setup
distributed_setup()

Fixture to set up and tear down the distributed environment for tests.

This ensures proper cleanup even if tests fail.

Source code in fastvideo/tests/conftest.py
@pytest.fixture(scope="function")
def distributed_setup():
    """
    Fixture to set up and tear down the distributed environment for tests.

    This ensures proper cleanup even if tests fail.
    """
    torch.manual_seed(42)
    np.random.seed(42)
    maybe_init_distributed_environment_and_model_parallel(1, 1)
    yield

    cleanup_dist_env_and_memory()

fastvideo.tests.contract

Contract tests guarding FastVideo's public API against drift.

These tests run against the public surface only (fastvideo.VideoGenerator, fastvideo.api.*) — never via private helpers. They fail at FastVideo CI if a change breaks the shape the Dynamo backend package and the private Dreamverse adapter depend on, so drift is caught here before it reaches downstream integrators.

Modules

fastvideo.tests.contract.test_dreamverse_shape

Contract test: Dreamverse-style inputs normalize through the public typed API without needing any private-only compatibility promise.

The private Dreamverse UI server (FastVideo-internal/ui/ltx2-streaming/server/gpu_pool.py) has historically called VideoGenerator.from_pretrained(**load_kwargs) with a flat kwarg bag containing LTX-2-specific names (ltx2_refine_enabled, ltx2_refine_upsampler_path, etc.). PR 6 gave every one of those kwargs a typed home under GeneratorConfig.

This test makes sure:

  1. The public typed API can represent everything Dreamverse currently passes at init time (legacy_from_pretrained_to_config).
  2. The request-path Dreamverse uses (generator.generate_video(**kwargs) with per-segment flags) round-trips through the typed GenerationRequest without reintroducing private-only fields at the public boundary.
  3. Private-only Dreamverse fields that don't belong on the public surface either go to pipeline.experimental / request.extensions (the documented escape hatch) or raise explicitly, rather than silently becoming part of the public compatibility promise.

Regression guard for the scoping rule in apirefactor.md §"Schema Parity Requirement".

Classes
fastvideo.tests.contract.test_dreamverse_shape.TestDreamverseLoadKwargsShape

Every current Dreamverse init-time kwarg must land on a typed field, not in the experimental escape hatch.

fastvideo.tests.contract.test_dreamverse_shape.TestDreamverseNoPrivateImports

The public entry points must not force a Dreamverse integrator to import from fastvideo.pipelines.* or other internal paths.

fastvideo.tests.contract.test_dreamverse_shape.TestDreamversePrivateOnlyFields

Dreamverse carries a handful of private-only names (e.g. legacy internal aliases). These must NOT silently turn into a public compatibility promise — the documented contract is that unknown fields land on pipeline.experimental so integrators see them but FastVideo does not promise to preserve them.

fastvideo.tests.contract.test_dreamverse_shape.TestDreamverseRequestShape

The per-segment Dreamverse request path mirrors OpenAI's shape plus a few LTX-2 knobs. All of them must have a typed home.

Methods:
fastvideo.tests.contract.test_dreamverse_shape.TestDreamverseRequestShape.test_return_state_reaches_output_config
test_return_state_reaches_output_config()

PR 7 added output.return_state; it must survive the legacy translation path so Dreamverse callers can opt in.

Source code in fastvideo/tests/contract/test_dreamverse_shape.py
def test_return_state_reaches_output_config(self):
    """PR 7 added ``output.return_state`` — must survive the legacy
    translation path so Dreamverse callers can opt in."""
    request = GenerationRequest(
        prompt="x",
        output=__import__(
            "fastvideo.api", fromlist=["OutputConfig"]).OutputConfig(
                return_state=True),
    )
    normalized = normalize_generation_request(request)
    assert normalized.output.return_state is True
fastvideo.tests.contract.test_dynamo_shape

Contract test: a mock Dynamo-style handler wraps FastVideo's public API without touching any private module.

The Dynamo backend package (components/src/dynamo/fastvideo/ in the Dynamo repo) imports only these symbols:

from fastvideo import VideoGenerator
from fastvideo.api import (
    ContinuationState, GenerationRequest, InputConfig, OutputConfig,
    SamplingConfig,
)

If a FastVideo refactor breaks the adapter shape this test fails at FastVideo CI — before the Dynamo-side integration knows. The plan (PR 7.10) requires the backend to be expressible without flat legacy LTX-2 kwargs or FastVideo-internal imports; this file asserts the subset that exists today and is stable.

Classes
fastvideo.tests.contract.test_dynamo_shape.TestDynamoHandlerContract
Methods:
fastvideo.tests.contract.test_dynamo_shape.TestDynamoHandlerContract.test_handler_serializes_state_back_to_nvext
test_handler_serializes_state_back_to_nvext()

When the request carries state, the handler should be able to include a matching serialized state on the response. (Dynamo's NvVideosResponse has nvext.continuation_state reserved for this in the pending disaggregation path.)

Source code in fastvideo/tests/contract/test_dynamo_shape.py
def test_handler_serializes_state_back_to_nvext(self):
    """When the request carries state, the handler should be able to
    include a matching serialized state on the response. (Dynamo's
    NvVideosResponse has nvext.continuation_state reserved for this
    in the pending disaggregation path.)"""
    handler = _MockFastVideoHandler({"b64_json": "abc"})
    state_in = {
        "kind": "ltx2.v1",
        "payload": {"schema_version": 1, "segment_index": 4},
    }

    async def run():
        async for chunk in handler.generate(
                _nv_create_video_request(
                    nvext={"continuation_state": state_in}),
                _FakeContext()):
            return chunk
        raise AssertionError("no chunk emitted")

    chunk = asyncio.run(run())
    assert chunk["nvext"]["continuation_state"]["kind"] == "ltx2.v1"
    assert (
        chunk["nvext"]["continuation_state"]["payload"]["segment_index"]
        == 4)
fastvideo.tests.contract.test_dynamo_shape.TestNoInternalImports

The adapter template in this file imports only the public surface.

Any change to FastVideo that requires the Dynamo adapter to reach into a private module would make this test fail at review time.
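A check of this kind can be written with the standard ast module. The sketch below is one possible approach and is not necessarily how TestNoInternalImports is implemented; the helper name is hypothetical, and the allowed set assumes that the public surface is fastvideo itself plus fastvideo.api and its submodules.

```python
import ast
from typing import List


def private_fastvideo_imports(source: str) -> List[str]:
    """Return fastvideo modules imported from outside the public surface."""
    offenders = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0:
            modules = [node.module or ""]
        else:
            continue
        for module in modules:
            is_fastvideo = (module == "fastvideo"
                            or module.startswith("fastvideo."))
            is_public = (module in ("fastvideo", "fastvideo.api")
                         or module.startswith("fastvideo.api."))
            if is_fastvideo and not is_public:
                offenders.add(module)
    return sorted(offenders)
```

Parsing the source rather than importing it means the check runs without loading the adapter or any GPU dependencies.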

Functions:
fastvideo.tests.contract.test_dynamo_shape.nv_request_to_generation_request
nv_request_to_generation_request(request: dict[str, Any]) -> GenerationRequest

Translate Dynamo's request shape into FastVideo's typed request.

This function is the template integrators copy into the Dynamo repo. It uses only public FastVideo symbols.

Source code in fastvideo/tests/contract/test_dynamo_shape.py
def nv_request_to_generation_request(
    request: dict[str, Any], ) -> GenerationRequest:
    """Translate Dynamo's request shape into FastVideo's typed request.

    This function is the template integrators copy into the Dynamo repo.
    It uses only public FastVideo symbols.
    """
    nvext = request.get("nvext") or {}
    width, height = _parse_size(request.get("size"))
    fps = nvext.get("fps") or 24
    num_frames = nvext.get("num_frames") or (request.get("seconds") or 4) * fps

    state = nvext.get("continuation_state")
    if isinstance(state, dict):
        state = ContinuationState(kind=state["kind"], payload=state["payload"])

    return GenerationRequest(
        prompt=request["prompt"],
        negative_prompt=nvext.get("negative_prompt"),
        inputs=InputConfig(image_path=request.get("input_reference")),
        sampling=SamplingConfig(
            width=width,
            height=height,
            num_frames=num_frames,
            fps=fps,
            num_inference_steps=nvext.get("num_inference_steps", 50),
            guidance_scale=nvext.get("guidance_scale", 1.0),
            seed=nvext.get("seed", 1024),
            true_cfg_scale=nvext.get("true_cfg_scale"),
        ),
        output=OutputConfig(save_video=False, return_frames=False),
        state=state,
    )
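The template above relies on a _parse_size helper that this page does not show. The sketch below assumes an OpenAI-style "WIDTHxHEIGHT" string and an arbitrary fallback of 1280x720; both are assumptions, as are the function names. It also isolates the num_frames fallback chain so its defaults are easier to see.

```python
from typing import Any, Dict, Optional, Tuple


def parse_size_sketch(size: Optional[str],
                      default: Tuple[int, int] = (1280, 720)
                      ) -> Tuple[int, int]:
    """Parse 'WIDTHxHEIGHT'; fall back to a default when size is missing."""
    if not size:
        return default
    width, _, height = size.lower().partition("x")
    return int(width), int(height)


def resolve_num_frames(request: Dict[str, Any]) -> int:
    """Mirror the template: explicit num_frames, else seconds * fps."""
    nvext = request.get("nvext") or {}
    fps = nvext.get("fps") or 24
    return nvext.get("num_frames") or (request.get("seconds") or 4) * fps
```

Because the fallbacks use `or`, a value of 0 for fps, seconds, or num_frames is treated the same as a missing value.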
fastvideo.tests.contract.test_generate_async

Contract tests for VideoGenerator.generate_async.

These tests monkey-patch the synchronous _generate_request_impl so the suite runs CPU-only -- the async wrapper is the piece under test, not the pipeline.
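The testing approach described here, replacing the synchronous implementation so only the async wrapper is exercised, can be sketched as follows. The wrapper shown uses asyncio.to_thread, which is an assumption; this page does not show how VideoGenerator.generate_async is actually implemented, and GeneratorSketch is a hypothetical stand-in.

```python
import asyncio
from typing import Any, Dict


class GeneratorSketch:
    def _generate_request_impl(self, request: Dict[str, Any]) -> Dict[str, Any]:
        # Stand-in for the blocking pipeline call (GPU work in practice).
        raise RuntimeError("real pipeline not available in this sketch")

    async def generate_async(self, request: Dict[str, Any]) -> Dict[str, Any]:
        # Run the blocking call in a worker thread so the event loop stays free.
        return await asyncio.to_thread(self._generate_request_impl, request)


def run_with_stub(prompt: str) -> Dict[str, Any]:
    gen = GeneratorSketch()
    # Replace the sync implementation, as a monkey-patching test would.
    gen._generate_request_impl = lambda request: {"stub": request["prompt"]}
    return asyncio.run(gen.generate_async({"prompt": prompt}))
```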

Classes
fastvideo.tests.contract.test_generate_async.TestDynamoStyleHandlerIntegration

Mirror the shape the Dynamo backend package uses.

fastvideo.tests.eval

Modules

fastvideo.tests.eval.test_datasets_vbench

Smoke tests for the VBench prompt dataset and dataset registry.

Classes
Functions:
fastvideo.tests.eval.test_evaluator_multi_gpu

Multi-replica eval through the public Evaluator API.

Skipped automatically when fewer than 2 CUDA devices are visible.

Functions:
fastvideo.tests.eval.test_evaluator_multi_gpu.baseline_scores
baseline_scores()

Reference scores computed on a single-GPU evaluator. The multi-GPU runs must reproduce these exactly when handed the same input list — that's the only way to verify round-robin dispatch isn't dropping or reordering samples.

Source code in fastvideo/tests/eval/test_evaluator_multi_gpu.py
@pytest.fixture
def baseline_scores():
    """Reference scores computed on a single-GPU evaluator. The multi-GPU
    runs must reproduce these exactly when handed the same input list —
    that's the only way to verify round-robin dispatch isn't dropping or
    reordering samples."""
    samples = _make_samples(8)
    ev = create_evaluator(
        metrics=["common.psnr", "common.ssim"],
        device="cuda:0",
        num_gpus=1,
    )
    try:
        out = ev.evaluate(samples=samples)
    finally:
        ev.shutdown()
    return samples, out
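The ordering property this fixture is meant to verify can be illustrated without GPUs. In the sketch below, workers are plain callables and samples are dispatched by index modulo the worker count; this is an assumed dispatch scheme for illustration, and the real Evaluator may schedule work differently.

```python
from typing import Any, Callable, List, Sequence


def round_robin_evaluate(samples: Sequence[Any],
                         workers: Sequence[Callable[[Any], Any]]) -> List[Any]:
    """Send sample i to worker i % len(workers); return results in input order."""
    results: List[Any] = [None] * len(samples)
    for worker_idx, worker in enumerate(workers):
        # Each worker handles every len(workers)-th sample, offset by its index.
        for i in range(worker_idx, len(samples), len(workers)):
            results[i] = worker(samples[i])
    return results
```

Writing each result back to its original index keeps the output aligned with the input list regardless of which worker produced it.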
fastvideo.tests.eval.test_evaluator_multi_gpu.test_multi_gpu_dispatch_preserves_order_and_scores
test_multi_gpu_dispatch_preserves_order_and_scores(baseline_scores)

Same samples, multi-GPU dispatch: results must match the single-GPU baseline element-for-element. This verifies (a) the round-robin doesn't reorder, (b) every sample is scored exactly once, (c) the workers don't share mutable state.

Source code in fastvideo/tests/eval/test_evaluator_multi_gpu.py
def test_multi_gpu_dispatch_preserves_order_and_scores(baseline_scores):
    """Same samples, multi-GPU dispatch — results must match the single-GPU
    baseline element-for-element. This verifies (a) the round-robin doesn't
    reorder, (b) every sample is scored exactly once, (c) the workers
    don't share mutable state."""
    samples, expected = baseline_scores

    ev = create_evaluator(
        metrics=["common.psnr", "common.ssim"],
        num_gpus=2,
    )
    try:
        got = ev.evaluate(samples=samples)
    finally:
        ev.shutdown()

    assert len(got) == len(expected)
    for i, (g, e) in enumerate(zip(got, expected)):
        assert set(g.keys()) == {"common.psnr", "common.ssim"}, f"row {i}"
        assert g["common.psnr"].score == pytest.approx(e["common.psnr"].score), \
            f"row {i} psnr drift"
        assert g["common.ssim"].score == pytest.approx(e["common.ssim"].score), \
            f"row {i} ssim drift"
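The order-preservation contract pinned above can be illustrated with a minimal sketch. The dispatcher internals are not shown on this page, so `round_robin_dispatch` and its plain-callable workers are hypothetical stand-ins, not the Evaluator's actual implementation.

```python
from typing import Any, Callable, Sequence


def round_robin_dispatch(
    samples: Sequence[Any],
    workers: Sequence[Callable[[Any], Any]],
) -> list:
    """Send sample i to worker i % len(workers); return results in input order."""
    if not workers:
        raise ValueError("need at least one worker")
    results: list = [None] * len(samples)
    # Build one shard per worker, remembering each sample's original index.
    shards: list = [[] for _ in workers]
    for idx, sample in enumerate(samples):
        shards[idx % len(workers)].append((idx, sample))
    # Shards are processed sequentially here; a real evaluator would run
    # them in parallel, which is why results are written back by index.
    for worker, shard in zip(workers, shards):
        for idx, sample in shard:
            results[idx] = worker(sample)
    return results
```

Writing each result back to its original index is what makes the output independent of which worker finishes first.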
fastvideo.tests.eval.test_evaluator_multi_gpu.test_multi_gpu_evaluator_kwargs_form_runs_on_one_replica
test_multi_gpu_evaluator_kwargs_form_runs_on_one_replica()

The kwargs form (single sample) is documented to always hit worker 0; this test pins the contract so future refactors don't accidentally fan out a single call.

Source code in fastvideo/tests/eval/test_evaluator_multi_gpu.py
def test_multi_gpu_evaluator_kwargs_form_runs_on_one_replica():
    """The kwargs form (single sample) is documented to always hit worker
    0; this test pins the contract so future refactors don't accidentally
    fan out a single call."""
    ev = create_evaluator(metrics=["common.psnr"], num_gpus=2)
    try:
        torch.manual_seed(0)
        gen = torch.rand(_T, _C, _H, _W)
        out = ev.evaluate(video=gen, reference=gen)
        assert out["common.psnr"].score > 50.0     # PSNR(x, x) is huge
    finally:
        ev.shutdown()
fastvideo.tests.eval.test_evaluator_multi_gpu.test_multi_gpu_release_cuda_memory_runs_clean
test_multi_gpu_release_cuda_memory_runs_clean()

release_cuda_memory must hit every replica without crashing.

Source code in fastvideo/tests/eval/test_evaluator_multi_gpu.py
def test_multi_gpu_release_cuda_memory_runs_clean():
    """``release_cuda_memory`` must hit every replica without crashing."""
    ev = create_evaluator(metrics=["common.psnr"], num_gpus=2)
    try:
        samples = _make_samples(2)
        _ = ev.evaluate(samples=samples)
        ev.release_cuda_memory()                   # should not raise
    finally:
        ev.shutdown()
fastvideo.tests.eval.test_evaluator_paths

Path-input variants of the public Evaluator API.

The worker boundary accepts video / reference as either a pre-loaded (T, C, H, W) tensor or a path-like (str / Path). These tests pin the path-form so future refactors don't accidentally re-require pre-loaded tensors.

Classes
Functions:
fastvideo.tests.eval.test_evaluator_paths.test_dispatcher_holds_paths_not_tensors_in_queue
test_dispatcher_holds_paths_not_tensors_in_queue(tmp_path)

Memory invariant: when many paths are passed, the queued samples are tiny strings, not full tensors. Verify by checking the length of the per-sample reference set the dispatcher materializes.

Source code in fastvideo/tests/eval/test_evaluator_paths.py
def test_dispatcher_holds_paths_not_tensors_in_queue(tmp_path):
    """Memory invariant: when many paths are passed, the queued samples
    are tiny strings, not full tensors. Verify by checking the length
    of the per-sample reference set the dispatcher materializes."""
    torch.manual_seed(1)
    n = 8
    paths: list[Path] = []
    for i in range(n):
        t = torch.rand(_T, _C, _H, _W)
        p = tmp_path / f"clip_{i}.mp4"
        _write_tensor_as_mp4(t, p)
        paths.append(p)

    samples = [{"video": str(p), "reference": str(p)} for p in paths]
    # Each sample dict is just two strings — no tensor allocations until
    # the worker's _resolve_video_input runs.
    for s in samples:
        assert isinstance(s["video"], str)
        assert isinstance(s["reference"], str)

    ev = create_evaluator(metrics=["common.psnr"], device="cpu")
    try:
        out = ev.evaluate(samples=samples)
    finally:
        ev.shutdown()
    assert len(out) == n
    # Self-paired ⇒ all PSNRs should be very high.
    for row in out:
        assert row["common.psnr"].score > 50.0
fastvideo.tests.eval.test_evaluator_paths.test_missing_path_surfaces_as_exception
test_missing_path_surfaces_as_exception(evaluator, tmp_path)

Decode failures must propagate, not silently produce a None score.

Source code in fastvideo/tests/eval/test_evaluator_paths.py
def test_missing_path_surfaces_as_exception(evaluator, tmp_path):
    """Decode failures must propagate, not silently produce a None score."""
    bogus = tmp_path / "does_not_exist.mp4"
    with pytest.raises(Exception):
        evaluator.evaluate(video=str(bogus), reference=str(bogus))
fastvideo.tests.eval.test_evaluator_paths.test_one_shot_evaluate_accepts_paths
test_one_shot_evaluate_accepts_paths(video_paths)

The top-level fastvideo.eval.evaluate helper also flows paths.

Source code in fastvideo/tests/eval/test_evaluator_paths.py
def test_one_shot_evaluate_accepts_paths(video_paths):
    """The top-level ``fastvideo.eval.evaluate`` helper also flows paths."""
    paths, _ = video_paths
    out = evaluate(generated=str(paths[0]), reference=str(paths[0]),
                   metrics=["common.psnr"], device="cpu")
    assert out["common.psnr"].score > 50.0
fastvideo.tests.eval.test_evaluator_paths.test_path_form_score_matches_tensor_form
test_path_form_score_matches_tensor_form(evaluator, video_paths)

Loading via path must produce the same score as loading via the public load_video helper and passing the tensor in directly.

Source code in fastvideo/tests/eval/test_evaluator_paths.py
def test_path_form_score_matches_tensor_form(evaluator, video_paths):
    """Loading via path must produce the same score as loading via the
    public ``load_video`` helper and passing the tensor in directly."""
    from fastvideo.eval.io import load_video

    paths, _ = video_paths
    via_path = evaluator.evaluate(video=str(paths[0]), reference=str(paths[0]))
    tensor = load_video(str(paths[0]))
    via_tensor = evaluator.evaluate(video=tensor, reference=tensor)
    assert via_path["common.psnr"].score == pytest.approx(
        via_tensor["common.psnr"].score, abs=1e-4)
    assert via_path["common.ssim"].score == pytest.approx(
        via_tensor["common.ssim"].score, abs=1e-4)
fastvideo.tests.eval.test_evaluator_paths.test_samples_list_can_mix_paths_and_tensors
test_samples_list_can_mix_paths_and_tensors(evaluator, video_paths)

A single samples call can mix path and tensor entries.

Source code in fastvideo/tests/eval/test_evaluator_paths.py
def test_samples_list_can_mix_paths_and_tensors(evaluator, video_paths):
    """A single ``samples`` call can mix path and tensor entries."""
    paths, tensors = video_paths
    samples = [
        {"video": str(paths[0]), "reference": tensors[0]},   # path + tensor
        {"video": tensors[1], "reference": str(paths[1])},   # tensor + path
    ]
    out = evaluator.evaluate(samples=samples)
    assert len(out) == 2
    for row in out:
        assert row["common.psnr"].score > 0.0
fastvideo.tests.eval.test_evaluator_paths.video_paths
video_paths(tmp_path)

Two reproducible mp4s on disk + their pre-loaded tensors for parity.

Source code in fastvideo/tests/eval/test_evaluator_paths.py
@pytest.fixture
def video_paths(tmp_path):
    """Two reproducible mp4s on disk + their pre-loaded tensors for parity."""
    torch.manual_seed(0)
    paths: list[Path] = []
    tensors: list[torch.Tensor] = []
    for i in range(2):
        t = torch.rand(_T, _C, _H, _W)
        p = tmp_path / f"clip_{i}.mp4"
        _write_tensor_as_mp4(t, p)
        paths.append(p)
        tensors.append(t)
    return paths, tensors
fastvideo.tests.eval.test_evaluator_single

End-to-end tests for single-replica eval through the public API.

Runs the lightweight pixel-space metrics — common.psnr and common.ssim — under both shapes that real callers use:

  • one-shot evaluate(video=..., reference=...) (the helper in fastvideo.eval.api);
  • a long-lived Evaluator, called once per sample;
  • a long-lived Evaluator, called with a list of sample dicts to fan out (samples=[...]).

GPU-only metrics live in separate test modules / classes; everything here runs on CPU so the suite stays cheap to invoke.

Classes
Functions:
fastvideo.tests.eval.test_evaluator_single.gen_ref
gen_ref()

Reproducible (gen, ref) pair shaped (T, C, H, W).

Source code in fastvideo/tests/eval/test_evaluator_single.py
@pytest.fixture
def gen_ref():
    """Reproducible (gen, ref) pair shaped (T, C, H, W)."""
    torch.manual_seed(0)
    gen = torch.rand(_T, _C, _H, _W)
    ref = torch.rand(_T, _C, _H, _W)
    return gen, ref
fastvideo.tests.eval.test_evaluator_single.test_evaluator_accepts_legacy_5d_input
test_evaluator_accepts_legacy_5d_input(evaluator, gen_ref)

Callers that still pass (1, T, C, H, W) should get unwrapped.

Source code in fastvideo/tests/eval/test_evaluator_single.py
def test_evaluator_accepts_legacy_5d_input(evaluator, gen_ref):
    """Callers that still pass ``(1, T, C, H, W)`` should get unwrapped."""
    gen, ref = gen_ref
    out = evaluator.evaluate(video=gen.unsqueeze(0), reference=ref.unsqueeze(0))
    _assert_well_formed(out["common.psnr"], "common.psnr")
fastvideo.tests.eval.test_evaluator_single.test_evaluator_psnr_identical_videos_is_high
test_evaluator_psnr_identical_videos_is_high(evaluator, gen_ref)

PSNR(x, x) is unbounded above; with our clamp it caps near 100 dB.

Source code in fastvideo/tests/eval/test_evaluator_single.py
def test_evaluator_psnr_identical_videos_is_high(evaluator, gen_ref):
    """PSNR(x, x) is unbounded above; with our clamp it caps near 100 dB."""
    gen, _ = gen_ref
    out = evaluator.evaluate(video=gen, reference=gen)
    assert out["common.psnr"].score > 50.0
    assert out["common.ssim"].score == pytest.approx(1.0, abs=1e-5)
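As a rough illustration of why PSNR(x, x) caps near 100 dB, here is a pure-Python sketch over flat lists of pixel values in [0, 1]. The clamp used by `common.psnr` is not shown on this page; the `mse_floor=1e-10` value below is an assumption chosen because it yields a cap of about 100 dB.

```python
import math
from typing import Sequence


def psnr(gen: Sequence[float], ref: Sequence[float],
         max_val: float = 1.0, mse_floor: float = 1e-10) -> float:
    """PSNR in dB: 10 * log10(max_val^2 / MSE), with MSE clamped from below."""
    if len(gen) != len(ref) or not gen:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((g - r) ** 2 for g, r in zip(gen, ref)) / len(gen)
    # Without the floor, identical inputs give MSE == 0 and a division by zero.
    mse = max(mse, mse_floor)
    return 10.0 * math.log10(max_val ** 2 / mse)
```

With identical inputs the MSE is zero, so the floor takes over and the score lands at roughly 100 dB, comfortably above the test's 50 dB threshold.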
fastvideo.tests.eval.test_evaluator_single.test_evaluator_samples_list_preserves_input_order
test_evaluator_samples_list_preserves_input_order(evaluator)

When samples=[...] is passed, results must come back per sample.

Source code in fastvideo/tests/eval/test_evaluator_single.py
def test_evaluator_samples_list_preserves_input_order(evaluator):
    """When ``samples=[...]`` is passed, results must come back per sample."""
    torch.manual_seed(1)
    samples = []
    for i in range(4):
        # Vary the reference enough that scores differ across rows.
        gen = torch.rand(_T, _C, _H, _W)
        ref = gen + 0.01 * (i + 1) * torch.rand_like(gen)
        samples.append({"video": gen, "reference": ref})

    out = evaluator.evaluate(samples=samples)
    assert isinstance(out, list)
    assert len(out) == len(samples)
    for row in out:
        assert set(row.keys()) == {"common.psnr", "common.ssim"}
        _assert_well_formed(row["common.psnr"], "common.psnr")

    # Re-running should give bit-identical scores (no nondeterministic
    # scheduling effects under single-GPU dispatch).
    out2 = evaluator.evaluate(samples=samples)
    for a, b in zip(out, out2):
        assert a["common.psnr"].score == pytest.approx(b["common.psnr"].score)
fastvideo.tests.eval.test_evaluator_with_dataset

End-to-end test: prompt dataset → Evaluator.

Mirrors the canonical user flow:

ds = get_dataset("vbench", dimensions=[...])
ev = create_evaluator(metrics=[...], device=...)
for row in ds:
    video = my_generator(row["prompt"])
    scores = ev.evaluate(video=video, **row)

We don't actually generate videos — that would pull in a diffusion model. Instead we synthesize a reproducible random tensor per row, so the test exercises the dataset-iteration → evaluator-call wiring without depending on any model weights.

Classes
Functions:
fastvideo.tests.eval.test_evaluator_with_dataset.test_dataset_samples_form_through_evaluator
test_dataset_samples_form_through_evaluator()

Evaluator.evaluate(samples=[...]) is the canonical batched entry point; verify it works when the per-row dicts come from a dataset (kwargs form) rather than being hand-built in the test.

Source code in fastvideo/tests/eval/test_evaluator_with_dataset.py
def test_dataset_samples_form_through_evaluator():
    """``Evaluator.evaluate(samples=[...])`` is the canonical batched
    entry point; verify it works when the per-row dicts come from a
    dataset (kwargs form) rather than being hand-built in the test."""
    ds = get_dataset("vbench", dimensions=["color"])
    rows = [ds[i] for i in range(3)]

    samples = []
    for i, row in enumerate(rows):
        samples.append({
            "video": _synth_video(seed=i),
            "reference": _synth_video(seed=100 + i),
            **row,
        })

    ev = create_evaluator(metrics=["common.psnr"], device="cpu")
    try:
        out = ev.evaluate(samples=samples)
    finally:
        ev.shutdown()

    assert len(out) == len(rows)
    for row in out:
        assert "common.psnr" in row
        assert row["common.psnr"].score is not None
fastvideo.tests.eval.test_evaluator_with_dataset.test_vbench_dataset_full_corpus_iteration
test_vbench_dataset_full_corpus_iteration()

Iterating the whole dataset should be cheap (no evaluator calls). This guards against a future refactor that accidentally makes __iter__ do real work.

Source code in fastvideo/tests/eval/test_evaluator_with_dataset.py
def test_vbench_dataset_full_corpus_iteration():
    """Iterating the whole dataset should be cheap (no evaluator calls).
    This guards against a future refactor that accidentally makes
    ``__iter__`` do real work."""
    ds = get_dataset("vbench")
    rows = list(ds)
    assert len(rows) == len(ds)
    assert all(isinstance(r, dict) for r in rows)
fastvideo.tests.eval.test_evaluator_with_dataset.test_vbench_dataset_rows_drop_into_evaluator
test_vbench_dataset_rows_drop_into_evaluator()

Every row from the corpus must be a kwargs-friendly dict for Evaluator.evaluate: extra keys flow through without breaking the metric, and the metric returns a well-formed MetricResult.

Source code in fastvideo/tests/eval/test_evaluator_with_dataset.py
def test_vbench_dataset_rows_drop_into_evaluator():
    """Every row from the corpus must be a kwargs-friendly dict for
    ``Evaluator.evaluate``: extra keys flow through without breaking the
    metric, and the metric returns a well-formed ``MetricResult``."""
    ds = get_dataset("vbench", dimensions=["color"])
    rows = [ds[i] for i in range(3)]
    assert all("prompt" in r for r in rows)
    assert all("auxiliary_info" in r for r in rows)

    ev = create_evaluator(metrics=["common.psnr"], device="cpu")
    try:
        for i, row in enumerate(rows):
            video = _synth_video(seed=i)
            reference = _synth_video(seed=100 + i)
            # Real-world shape: pass dataset row through verbatim, plus
            # the generated/reference tensors. Extra dataset fields
            # ('prompt', 'n_samples', 'dimensions', 'auxiliary_info') are
            # ignored by common.psnr — that's the contract we want to pin.
            scores = ev.evaluate(video=video, reference=reference, **row)
            mr = scores["common.psnr"]
            assert isinstance(mr, MetricResult)
            assert mr.name == "common.psnr"
            assert mr.score is not None
    finally:
        ev.shutdown()
fastvideo.tests.eval.test_registry

Smoke tests for the metric registry surface.

These exercise the public fastvideo.eval API only: list_metrics, get_metric, and the group-resolution logic that create_evaluator(metrics="vbench") uses.

Functions:
fastvideo.tests.eval.test_registry.test_create_evaluator_resolves_group_prefix
test_create_evaluator_resolves_group_prefix()

metrics="<group>" should expand to every <group>.* sub-metric.

Use the physics_iq group because it has multiple sub-metrics and none of them load model weights — the group-resolution behavior is what we're testing, not metric setup.

Source code in fastvideo/tests/eval/test_registry.py
def test_create_evaluator_resolves_group_prefix():
    """``metrics="<group>"`` should expand to every ``<group>.*`` sub-metric.

    Use the ``physics_iq`` group because it has multiple sub-metrics and
    none of them load model weights — the group-resolution behavior is
    what we're testing, not metric setup."""
    ev = create_evaluator(metrics="physics_iq", device="cpu")
    try:
        names = ev.metric_names
        assert all(n.startswith("physics_iq.") for n in names)
        # Don't pin an exact count — sub-metrics may be added later.
        assert "physics_iq.spatial_iou" in names
        assert "physics_iq.spatiotemporal_iou" in names
        assert "physics_iq.mse" in names
    finally:
        ev.shutdown()
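The group-prefix expansion being tested can be sketched as follows. The registry here is a plain set of names and `resolve_metrics` is a hypothetical helper; the real resolution logic inside `create_evaluator` is not shown on this page.

```python
from typing import Iterable, Union

# Illustrative registry; the metric names are the ones referenced in these tests.
REGISTRY = {
    "common.psnr",
    "common.ssim",
    "physics_iq.mse",
    "physics_iq.spatial_iou",
    "physics_iq.spatiotemporal_iou",
}


def resolve_metrics(spec: Union[str, Iterable[str]], registry: set) -> list:
    """Expand each entry to itself (exact match) or to every '<entry>.*' name."""
    names = [spec] if isinstance(spec, str) else list(spec)
    resolved: list = []
    for name in names:
        if name in registry:
            matches = [name]
        else:
            prefix = name + "."
            matches = sorted(n for n in registry if n.startswith(prefix))
        if not matches:
            raise KeyError(f"unknown metric or group: {name!r}")
        for match in matches:
            if match not in resolved:  # keep first occurrence, drop duplicates
                resolved.append(match)
    return resolved
```

Matching on `name + "."` rather than on `name` alone keeps a group such as `physics_iq` from accidentally matching an unrelated metric whose name merely starts with the same characters.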

fastvideo.tests.performance

Modules

fastvideo.tests.performance.compare_baseline

Track performance results and compare against historical baseline.

This script:

  1. reads current benchmark results from fastvideo/tests/performance/results,
  2. syncs the canonical baseline from the configured HF dataset repo,
  3. compares each current record against the median of up to 5 prior records (filtered by gpu_type, successful only),
  4. on persist runs (full-suite on main branch), writes the normalized record back to the HF dataset repo,
  5. exits non-zero if any metric regresses by more than PERF_MAX_REGRESSION (default 5%).
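The median-based comparison can be sketched as below. This is not the script's actual code: `find_regressions` is a hypothetical helper, and the split into lower-is-better and higher-is-better metrics is an assumption, since the page does not state which direction counts as a regression for each field.

```python
from statistics import median

# Assumed metric directions (not confirmed by the source).
LOWER_IS_BETTER = {"latency", "memory"}
HIGHER_IS_BETTER = {"throughput"}


def find_regressions(current: dict, history: list,
                     max_regression: float = 0.05, window: int = 5) -> dict:
    """Return {metric: fractional regression} for metrics worse than the
    median of the last `window` prior records by more than `max_regression`."""
    regressions = {}
    recent = history[-window:] if window > 0 else []
    for metric in sorted(LOWER_IS_BETTER | HIGHER_IS_BETTER):
        prior = [r[metric] for r in recent if r.get(metric) is not None]
        value = current.get(metric)
        if not prior or value is None:
            continue  # nothing to compare against
        base = median(prior)
        if base == 0:
            continue  # relative change is undefined
        change = (value - base) / base
        if metric in HIGHER_IS_BETTER:
            change = -change  # a drop in throughput is the regression
        if change > max_regression:
            regressions[metric] = change
    return regressions
```

Using the median rather than the mean keeps a single noisy historical run from shifting the baseline.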

Functions:
fastvideo.tests.performance.compare_baseline.normalize_performance_result
normalize_performance_result(result: dict[str, Any]) -> dict[str, Any]

Normalize a raw perf_*.json result into the HF tracking schema.

The Buildkite artifact intentionally keeps the raw benchmark output from test_inference_performance.py. Baseline comparison, main-branch persistence, and manual baseline reseeds should all use this mapping so the stored HF records do not drift from the artifact schema.

Source code in fastvideo/tests/performance/compare_baseline.py
def normalize_performance_result(result: dict[str, Any]) -> dict[str, Any]:
    """Normalize a raw perf_*.json result into the HF tracking schema.

    The Buildkite artifact intentionally keeps the raw benchmark output from
    test_inference_performance.py. Baseline comparison, main-branch persistence,
    and manual baseline reseeds should all use this mapping so the stored HF
    records do not drift from the artifact schema.
    """
    benchmark_id = result.get("benchmark_id", "unknown")
    model_id = benchmark_id

    timestamp = result.get("timestamp")
    if not timestamp:
        timestamp = datetime.now(timezone.utc).isoformat()

    commit_sha = result.get("commit") or os.environ.get("BUILDKITE_COMMIT", "")
    latency = safe_float(result.get("avg_generation_time_s"))
    throughput = safe_float(result.get("throughput_fps"))
    memory = safe_float(result.get("max_peak_memory_mb"))
    text_encoder_time = safe_float(result.get("text_encoder_time_s"))
    dit_time = safe_float(result.get("dit_time_s"))
    vae_decode_time = safe_float(result.get("vae_decode_time_s"))

    return {
        "model_id": model_id,
        "timestamp": timestamp,
        "commit_sha": commit_sha,
        "gpu_type": result.get("device", "unknown"),
        "latency": latency,
        "throughput": throughput,
        "memory": memory,
        "text_encoder_time_s": text_encoder_time,
        "dit_time_s": dit_time,
        "vae_decode_time_s": vae_decode_time,
        "success": True,
    }
fastvideo.tests.performance.hf_store

Shared HuggingFace storage utilities for performance tracking.

Provides a single place for:

  • Syncing the HF dataset repo to a local directory
  • Loading raw JSON records (with optional recency filter)
  • Loading records as a normalized pandas DataFrame
  • Uploading individual result files back to HF
  • Common helpers: sanitize, safe_float

Functions:
fastvideo.tests.performance.hf_store.load_as_dataframe
load_as_dataframe(local_dir: str, *, days: int | None = None, successful_only: bool = False) -> DataFrame

Load and normalize records from local_dir into a pandas DataFrame.

Combines :func:load_records + :func:normalize_dataframe into a single call for consumers (e.g. the dashboard) that work exclusively with DataFrames.

Parameters:

Name Type Description Default
local_dir str

Root directory previously populated by :func:sync_from_hf.

required
days int | None

Passed through to :func:load_records.

None
successful_only bool

Passed through to :func:load_records.

False

Returns:

Type Description
DataFrame

Normalized DataFrame, or an empty DataFrame if no records were found.

Source code in fastvideo/tests/performance/hf_store.py
def load_as_dataframe(
    local_dir: str,
    *,
    days: int | None = None,
    successful_only: bool = False,
) -> pd.DataFrame:
    """Load and normalize records from *local_dir* into a pandas DataFrame.

    Combines :func:`load_records` + :func:`normalize_dataframe` into a single
    call for consumers (e.g. the dashboard) that work exclusively with
    DataFrames.

    Args:
        local_dir: Root directory previously populated by :func:`sync_from_hf`.
        days: Passed through to :func:`load_records`.
        successful_only: Passed through to :func:`load_records`.

    Returns:
        Normalized DataFrame, or an empty DataFrame if no records were found.
    """
    records = load_records(local_dir, days=days, successful_only=successful_only)
    if not records:
        return pd.DataFrame()

    df = pd.DataFrame(records)
    return normalize_dataframe(df)
fastvideo.tests.performance.hf_store.load_records
load_records(local_dir: str, *, days: int | None = None, successful_only: bool = False) -> list[dict[str, Any]]

Return raw JSON dicts from local_dir.

Parameters:

Name Type Description Default
local_dir str

Root directory previously populated by :func:sync_from_hf.

required
days int | None

When set, discard records whose timestamp is older than this many days. Records with a missing/unparsable timestamp are kept.

None
successful_only bool

When True, only records with success=True are returned. Useful when building a regression baseline.

False

Returns:

Type Description
list[dict[str, Any]]

List of raw dicts sorted by timestamp ascending (records that could not be parsed are silently skipped).

Source code in fastvideo/tests/performance/hf_store.py
def load_records(
    local_dir: str,
    *,
    days: int | None = None,
    successful_only: bool = False,
) -> list[dict[str, Any]]:
    """Return raw JSON dicts from *local_dir*.

    Args:
        local_dir: Root directory previously populated by :func:`sync_from_hf`.
        days: When set, discard records whose ``timestamp`` is older than this
            many days. Records with a missing/unparsable timestamp are kept.
        successful_only: When True, only records with ``success=True`` are
            returned. Useful when building a regression baseline.

    Returns:
        List of raw dicts sorted by ``timestamp`` ascending (records that could
        not be parsed are silently skipped).
    """
    cutoff: datetime | None = None
    if days is not None:
        cutoff = datetime.now(timezone.utc) - timedelta(days=days)

    records: list[dict[str, Any]] = []

    for path in sorted(glob.glob(os.path.join(local_dir, "**", "*.json"), recursive=True)):
        try:
            with open(path, encoding="utf-8") as fh:
                data: dict[str, Any] = json.load(fh)
        except (OSError, json.JSONDecodeError):
            continue

        if successful_only and not data.get("success", True):
            continue

        if cutoff is not None:
            raw_ts = data.get("timestamp")
            if raw_ts:
                try:
                    ts = datetime.fromisoformat(raw_ts)
                    if ts.tzinfo is None:
                        ts = ts.replace(tzinfo=timezone.utc)
                    if ts < cutoff:
                        continue
                except ValueError:
                    pass  # keep records with unparsable timestamps

        records.append(data)

    return records
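The recency rule inside the loop above can be isolated into a small predicate to make its edge cases explicit. `is_recent` is a hypothetical name used only for this sketch; the logic mirrors the source: missing or unparsable timestamps are kept, and naive timestamps are treated as UTC.

```python
from datetime import datetime, timezone
from typing import Optional


def is_recent(raw_ts: Optional[str], cutoff: datetime) -> bool:
    """True if a record with this timestamp should be kept for `cutoff`."""
    if not raw_ts:
        return True  # missing timestamp: keep the record
    try:
        ts = datetime.fromisoformat(raw_ts)
    except ValueError:
        return True  # unparsable timestamp: keep the record
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)  # treat naive values as UTC
    return ts >= cutoff
```

Normalizing naive timestamps before comparing matters because Python raises `TypeError` when an offset-naive datetime is compared with an offset-aware one.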
fastvideo.tests.performance.hf_store.load_records_for_model
load_records_for_model(local_dir: str, model_id: str, gpu_type: str | None = None, *, last_n: int | None = None, successful_only: bool = True) -> list[dict[str, Any]]

Return records for a specific model_id, optionally filtered by GPU.

Parameters:

Name Type Description Default
local_dir str

Root directory previously populated by :func:sync_from_hf.

required
model_id str

Matches the model_id field inside each JSON record.

required
gpu_type str | None

When set, only records whose gpu_type matches are returned.

None
last_n int | None

When set, return only the most recent n records (after all other filters). Useful for sliding-window baseline calculations.

None
successful_only bool

Passed through to :func:load_records.

True

Returns:

Type Description
list[dict[str, Any]]

List of matching dicts sorted by timestamp ascending.

Source code in fastvideo/tests/performance/hf_store.py
def load_records_for_model(
    local_dir: str,
    model_id: str,
    gpu_type: str | None = None,
    *,
    last_n: int | None = None,
    successful_only: bool = True,
) -> list[dict[str, Any]]:
    """Return records for a specific *model_id*, optionally filtered by GPU.

    Args:
        local_dir: Root directory previously populated by :func:`sync_from_hf`.
        model_id: Matches the ``model_id`` field inside each JSON record.
        gpu_type: When set, only records whose ``gpu_type`` matches are returned.
        last_n: When set, return only the most recent *n* records (after all
            other filters). Useful for sliding-window baseline calculations.
        successful_only: Passed through to :func:`load_records`.

    Returns:
        List of matching dicts sorted by timestamp ascending.
    """
    model_dir = os.path.join(local_dir, sanitize(model_id))
    if not os.path.isdir(model_dir):
        return []

    records = load_records(model_dir, successful_only=successful_only)

    if gpu_type is not None:
        records = [r for r in records if r.get("gpu_type") == gpu_type]

    if last_n is not None:
        records = records[-last_n:]

    return records
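One edge case of the `records[-last_n:]` slice is worth noting: because `-0 == 0` in Python, `last_n=0` returns every record rather than none. A caller that may pass zero could guard against this with a helper along these lines (`take_last` is a hypothetical name, not part of hf_store):

```python
from typing import Optional


def take_last(records: list, last_n: Optional[int]) -> list:
    """Return the most recent `last_n` records; None means no limit."""
    if last_n is None:
        return list(records)
    if last_n <= 0:
        return []  # records[-0:] would otherwise return the whole list
    return records[-last_n:]
```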
fastvideo.tests.performance.hf_store.normalize_dataframe
normalize_dataframe(df: DataFrame) -> DataFrame

Apply standard type coercions to a raw records DataFrame.

  • Parses timestamp to UTC-aware datetime.
  • Coerces latency, throughput, memory, text_encoder_time_s, dit_time_s, vae_decode_time_s to float.
  • Adds a config_id column (first 7 chars of commit_sha).

Returns the mutated DataFrame (also modifies in place for efficiency).

Source code in fastvideo/tests/performance/hf_store.py
def normalize_dataframe(df: pd.DataFrame) -> pd.DataFrame:
    """Apply standard type coercions to a raw records DataFrame.

    - Parses ``timestamp`` to UTC-aware datetime.
    - Coerces ``latency``, ``throughput``, ``memory``, ``text_encoder_time_s``,
      ``dit_time_s``, ``vae_decode_time_s`` to float.
    - Adds a ``config_id`` column (first 7 chars of ``commit_sha``).

    Returns the mutated DataFrame (also modifies in place for efficiency).
    """
    if df.empty:
        return df

    df["timestamp"] = pd.to_datetime(df["timestamp"], utc=True, errors="coerce")
    df["config_id"] = df.get("commit_sha", pd.Series(dtype=str)).fillna("unknown").str[:7]

    for col in _NUMERIC_COLS:
        if col in df.columns:
            df[col] = pd.to_numeric(df[col], errors="coerce")

    return df
fastvideo.tests.performance.hf_store.safe_float
safe_float(value: Any) -> float | None

Coerce value to float, returning None on failure.

Source code in fastvideo/tests/performance/hf_store.py
def safe_float(value: Any) -> float | None:
    """Coerce *value* to float, returning None on failure."""
    if value is None:
        return None
    try:
        return float(value)
    except (TypeError, ValueError):
        return None
fastvideo.tests.performance.hf_store.sanitize
sanitize(value: str) -> str

Return a filesystem- and HF-path-safe version of value.

Source code in fastvideo/tests/performance/hf_store.py
def sanitize(value: str) -> str:
    """Return a filesystem- and HF-path-safe version of *value*."""
    return re.sub(r"[^A-Za-z0-9._-]", "_", value)
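A short usage sketch of the substitution above. The function body is copied from the source; the input strings are made-up examples, not real benchmark identifiers.

```python
import re


def sanitize(value: str) -> str:
    """Replace every character outside [A-Za-z0-9._-] with an underscore."""
    return re.sub(r"[^A-Za-z0-9._-]", "_", value)


# Slashes, spaces, and colons are each replaced one-for-one.
example = sanitize("org/model name:v1")
# Dots, dashes, and underscores pass through unchanged.
unchanged = sanitize("model-1.3B_fp16")
```

Because each disallowed character maps to a single underscore, distinct inputs such as `a/b` and `a b` collapse to the same directory name, which is something to keep in mind when choosing model identifiers.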
fastvideo.tests.performance.hf_store.sync_from_hf
sync_from_hf(local_dir: str, *, strict: bool = False, reuse_existing: bool = False) -> str

Download the HF dataset repo snapshot to local_dir.

Returns local_dir so callers can chain: load_records(sync_from_hf(...)).

By default (strict=False) failures are logged and local_dir is returned unchanged, so dashboard / PR consumers stay resilient when HF is unavailable. Callers that depend on the sync for correctness (e.g. the main-branch baseline writer) must pass strict=True so that misconfig or transient HF errors fail loud rather than silently reset the baseline.

When reuse_existing=True, a previous successful sync in local_dir is reused only while its marker is fresh. This avoids duplicate HF snapshot checks when compare and dashboard scripts run sequentially in the same CI job, without silently reusing stale data in persistent local or long-lived runner environments.

Source code in fastvideo/tests/performance/hf_store.py
def sync_from_hf(
    local_dir: str,
    *,
    strict: bool = False,
    reuse_existing: bool = False,
) -> str:
    """Download the HF dataset repo snapshot to *local_dir*.

    Returns *local_dir* so callers can chain: ``load_records(sync_from_hf(...))``.

    By default (``strict=False``) failures are logged and *local_dir* is
    returned unchanged, so dashboard / PR consumers stay resilient when HF is
    unavailable. Callers that depend on the sync for correctness (e.g. the
    main-branch baseline writer) must pass ``strict=True`` so that misconfig
    or transient HF errors fail loud rather than silently reset the baseline.

    When ``reuse_existing=True``, a previous successful sync in ``local_dir``
    is reused only while its marker is fresh. This avoids duplicate HF
    snapshot checks when compare and dashboard scripts run sequentially in the
    same CI job, without silently reusing stale data in persistent local or
    long-lived runner environments.
    """
    marker_path = _sync_marker_path(local_dir)
    if reuse_existing and os.path.exists(marker_path):
        if _sync_marker_is_fresh(marker_path):
            print(f"hf_store: reusing existing sync at {local_dir}")
            return local_dir
        os.remove(marker_path)
        print(f"hf_store: existing sync at {local_dir} is stale; refreshing")

    if not reuse_existing and os.path.exists(marker_path):
        os.remove(marker_path)

    if not HF_REPO_ID:
        msg = "hf_store: HF_REPO_ID not set"
        if strict:
            raise RuntimeError(f"{msg}; cannot sync.")
        print(f"{msg}, skipping sync.")
        return local_dir

    print(f"hf_store: syncing from {HF_REPO_ID} → {local_dir}")
    try:
        snapshot_download(
            repo_id=HF_REPO_ID,
            repo_type="dataset",
            local_dir=local_dir,
            token=HF_TOKEN,
            allow_patterns="*.json",
        )
        os.makedirs(local_dir, exist_ok=True)
        with open(marker_path, "w", encoding="utf-8") as marker:
            json.dump({
                "repo_id": HF_REPO_ID,
                "synced_at": datetime.now(timezone.utc).isoformat(),
            }, marker)
    except Exception as exc:
        if strict:
            raise
        print(f"hf_store: sync skipped — {exc}")

    return local_dir
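
The freshness check itself (_sync_marker_is_fresh) is not shown in this listing. As a rough sketch of how such a check could work against the marker written above, assuming a one-hour window (the function name marker_is_fresh, the write_marker helper, and the max_age default are illustrative and not taken from hf_store.py):

```python
import json
from datetime import datetime, timedelta, timezone


def marker_is_fresh(marker_path, max_age=timedelta(hours=1), now=None):
    """Return True if the marker's ``synced_at`` lies within ``max_age``.

    Hypothetical stand-in for ``_sync_marker_is_fresh``; the one-hour
    default is an assumption, not the value used by hf_store.
    """
    now = now or datetime.now(timezone.utc)
    try:
        with open(marker_path, encoding="utf-8") as handle:
            synced_at = datetime.fromisoformat(json.load(handle)["synced_at"])
        # Timestamps from the future are treated as stale as well.
        return timedelta(0) <= now - synced_at <= max_age
    except (OSError, ValueError, KeyError, TypeError):
        # Missing, unreadable, or malformed markers count as stale.
        return False


def write_marker(marker_path, synced_at):
    """Write a marker with the same ``synced_at`` field as sync_from_hf."""
    with open(marker_path, "w", encoding="utf-8") as handle:
        json.dump({"synced_at": synced_at.isoformat()}, handle)
```

Treating any unreadable marker as stale keeps the failure mode conservative: the worst case is one extra snapshot check rather than silently reused data.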
fastvideo.tests.performance.hf_store.upload_record
upload_record(local_path: str, record: dict[str, Any], *, strict: bool = False) -> None

Upload local_path to the HF repo under <model_id>/<filename>.

By default failures (missing token, network errors) are logged and swallowed. Pass strict=True when the upload is part of a write-path that must not silently lose records — otherwise the rolling baseline can stop advancing without any signal in the build log.

Source code in fastvideo/tests/performance/hf_store.py
def upload_record(
    local_path: str,
    record: dict[str, Any],
    *,
    strict: bool = False,
) -> None:
    """Upload *local_path* to the HF repo under ``<model_id>/<filename>``.

    By default failures (missing token, network errors) are logged and
    swallowed. Pass ``strict=True`` when the upload is part of a write-path
    that must not silently lose records — otherwise the rolling baseline can
    stop advancing without any signal in the build log.
    """
    if not HF_TOKEN:
        msg = "hf_store: HF_API_KEY not set"
        if strict:
            raise RuntimeError(f"{msg}; cannot upload.")
        print(f"{msg}, skipping upload.")
        return

    model_id = record.get("model_id", "unknown")
    path_in_repo = f"{sanitize(model_id)}/{os.path.basename(local_path)}"
    commit_sha = (record.get("commit_sha") or "unknown")[:7]

    api = HfApi(token=HF_TOKEN)
    try:
        api.upload_file(
            path_or_fileobj=local_path,
            path_in_repo=path_in_repo,
            repo_id=HF_REPO_ID,
            repo_type="dataset",
            commit_message=f"Perf: {model_id} at {commit_sha}",
        )
        print(f"hf_store: uploaded → {HF_REPO_ID}/{path_in_repo}")
    except Exception as exc:
        if strict:
            raise
        print(f"hf_store: upload failed — {exc}")
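
To make the destination layout concrete, the sketch below reproduces how upload_record derives the repo path and commit message from a record. The sanitize shown here is a hypothetical stand-in (the real helper is not part of this listing), and describe_upload is an illustrative name:

```python
import os
import re


def sanitize(name: str) -> str:
    """Hypothetical stand-in: replace path-unsafe characters with '_'."""
    return re.sub(r"[^A-Za-z0-9._-]+", "_", name)


def describe_upload(local_path: str, record: dict) -> tuple[str, str]:
    """Return (path_in_repo, commit_message) following upload_record's logic."""
    model_id = record.get("model_id", "unknown")
    path_in_repo = f"{sanitize(model_id)}/{os.path.basename(local_path)}"
    # A missing or empty commit_sha falls back to the literal "unknown".
    commit_sha = (record.get("commit_sha") or "unknown")[:7]
    return path_in_repo, f"Perf: {model_id} at {commit_sha}"
```

Under these assumptions, a record for model_id "org/model" saved as run_001.json would land at org_model/run_001.json.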
fastvideo.tests.performance.test_inference_performance

Config-driven inference performance tests.

Benchmark configs live in .buildkite/performance-benchmarks/tests/*.json. Each JSON file defines model params, generation kwargs, run config, and per-device thresholds. This test module auto-discovers all configs and parametrizes a single test function over them.
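
A minimal sketch of what such auto-discovery might look like, assuming each file is a flat JSON object. Only the benchmark_id key is taken from the test's parametrization; the function name and the file-stem fallback are assumptions made for illustration:

```python
import glob
import json
import os


def discover_benchmark_configs(config_dir: str) -> list[dict]:
    """Load every *.json file in ``config_dir``, sorted by filename."""
    configs = []
    for path in sorted(glob.glob(os.path.join(config_dir, "*.json"))):
        with open(path, encoding="utf-8") as handle:
            cfg = json.load(handle)
        # Fall back to the file stem when a config omits benchmark_id
        # (an assumption of this sketch, not documented behaviour).
        stem = os.path.splitext(os.path.basename(path))[0]
        cfg.setdefault("benchmark_id", stem)
        configs.append(cfg)
    return configs
```

Sorting the paths keeps the pytest parameter order stable across machines, which makes test ids easier to compare between CI runs.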

Functions:
fastvideo.tests.performance.test_inference_performance.test_inference_performance
test_inference_performance(cfg)

Measure generation latency, peak GPU memory, and component-level timings (text encoder, DiT, VAE decode). Assert each against device-aware thresholds.

Source code in fastvideo/tests/performance/test_inference_performance.py
@pytest.mark.parametrize(
    "cfg",
    _BENCHMARK_CONFIGS,
    ids=[c["benchmark_id"] for c in _BENCHMARK_CONFIGS],
)
def test_inference_performance(cfg):
    """Measure generation latency, peak GPU memory, and component-level timings
    (text encoder, DiT, VAE decode). Assert each against device-aware thresholds.
    """

    original_env = os.environ.get("FASTVIDEO_STAGE_LOGGING")
    os.environ["FASTVIDEO_STAGE_LOGGING"] = "1"
    try:
        _run_benchmark(cfg)
    finally:
        if original_env is None:
            os.environ.pop("FASTVIDEO_STAGE_LOGGING", None)
        else:
            os.environ["FASTVIDEO_STAGE_LOGGING"] = original_env
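
The save/set/restore sequence above can also be factored into a small context manager. This is a sketch of the same pattern, not code from the test module; temporary_env is an illustrative name:

```python
import contextlib
import os


@contextlib.contextmanager
def temporary_env(name: str, value: str):
    """Set an environment variable for the duration of a with-block."""
    original = os.environ.get(name)
    os.environ[name] = value
    try:
        yield
    finally:
        # Restore the previous state, including "was not set at all".
        if original is None:
            os.environ.pop(name, None)
        else:
            os.environ[name] = original
```

With this helper the test body would reduce to a single `with temporary_env("FASTVIDEO_STAGE_LOGGING", "1"):` block around the benchmark call. Inside pytest, the built-in monkeypatch fixture's setenv method offers the same restore-on-exit behaviour.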

fastvideo.tests.ssim

Modules

fastvideo.tests.ssim.conftest
Functions:
fastvideo.tests.ssim.conftest.pytest_collection_modifyitems
pytest_collection_modifyitems(config, items)

Optionally keep only tests with a matching model_id parameter.

Source code in fastvideo/tests/ssim/conftest.py
def pytest_collection_modifyitems(config, items):
    """Optionally keep only tests with a matching model_id parameter."""
    model_id = os.environ.get("FASTVIDEO_SSIM_MODEL_ID")
    if not model_id:
        return

    selected = []
    deselected = []
    for item in items:
        callspec = getattr(item, "callspec", None)
        if callspec is None:
            deselected.append(item)
            continue
        if callspec.params.get("model_id") == model_id:
            selected.append(item)
        else:
            deselected.append(item)

    if deselected:
        config.hook.pytest_deselected(items=deselected)
    items[:] = selected
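
The selection rule can be exercised without pytest by isolating the partition step. The sketch below uses SimpleNamespace objects as stand-ins for collected items; partition_by_model_id and fake_item are illustrative names, and model_id is assumed to be non-empty, as the hook guarantees:

```python
from types import SimpleNamespace


def partition_by_model_id(items, model_id):
    """Split items into (selected, deselected) by their model_id parameter."""
    selected, deselected = [], []
    for item in items:
        callspec = getattr(item, "callspec", None)
        # Non-parametrized items have no callspec and are always deselected.
        params = callspec.params if callspec is not None else {}
        if params.get("model_id") == model_id:
            selected.append(item)
        else:
            deselected.append(item)
    return selected, deselected


def fake_item(name, **params):
    """Build a stand-in for a pytest item, with a callspec only if parametrized."""
    item = SimpleNamespace(name=name)
    if params:
        item.callspec = SimpleNamespace(params=params)
    return item
```

Note the consequence for plain tests: once FASTVIDEO_SSIM_MODEL_ID is set, any test in the directory that is not parametrized over model_id is deselected too.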
fastvideo.tests.ssim.latent_similarity_utils

Latent-space regression helpers for numerically fragile SSIM tests.

Motivation

Pixel-space SSIM is a poor regression signal for distilled / few-step models (e.g. LTX-2 distilled): a single mis-rounded bf16 accumulator in the VAE decoder can drive mean SSIM from ~0.95 to ~0.50 without any real quality regression.

The approach follows diffusers' "small signature slice + bounded full-tensor distance" testing philosophy, but applies it to the pre-VAE latent rather than to the decoded pixel/audio output. For comparison, diffusers' own tests assert on decoded outputs:

  • tests/pipelines/ltx2/test_ltx2.py (diffusers) compares output_type='pt' (pixel) slices via torch.allclose(generated_slice, expected_slice, atol=1e-4);
  • tests/pipelines/stable_audio/test_stable_audio.py (diffusers) compares decoded audio samples via np.abs(expected - actual).max() < 1.5e-3;
  • tests/pipelines/cogvideo/test_cogvideox.py (diffusers) compares full pixel video tensors via numpy_cosine_similarity_distance(...) < 1e-3.

Diffusers does not assert on latents directly — that is a FastVideo adaptation. Distilled few-step pipelines amplify per-step bf16 noise enough that VAE-decoded comparisons are unreliable across our heterogeneous CI pool, so we move the assertion upstream of the VAE.

Design
  • Inference is run with output_type='latent' so DecodingStage hands back the un-decoded latent on result["samples"].
  • The reference artefact is a .pt bundle (tensor + metadata) hosted on the same HF dataset as the mp4 references, selected by <GPU>_reference_videos/<model_id>/<backend>/<prompt>.pt.
  • Two assertions are performed:
      1. A small signature slice (default video: latent[0, :, 0, :3, :3]; audio: latent[0, :, :8]) is compared via cosine distance against the slice tolerance. This is the primary pass/fail gate.
      2. The full latent is compared via cosine distance against the full-tensor tolerance, guarding against shape-correct but globally drifted outputs.
  • Tolerances default to 5e-3 (slice) and 1e-2 (full). diffusers uses 1e-3 against deterministic CPU dummy components; we relax for cross-GPU-arch bf16 differences on the rented CI pool (A40/L40S/H100/B200).

The helper intentionally reuses build_init_kwargs / build_generation_kwargs from inference_similarity_utils so model params (vae tiling, sp_size, flow shift, …) flow through a single source of truth.
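
The two cosine-distance assertions from the design above can be sketched in plain Python. The real helper operates on torch tensors and its internals are not shown here, so the nested-list layout [B][C][T][H][W], the function names, and the use of <= against the thresholds are all assumptions of this sketch:

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def flatten(nested):
    """Flatten arbitrarily nested lists into one flat list of numbers."""
    if isinstance(nested, (list, tuple)):
        return [v for part in nested for v in flatten(part)]
    return [nested]


def video_signature(latent):
    """Equivalent of latent[0, :, 0, :3, :3] for a [B][C][T][H][W] nested list."""
    return [v for channel in latent[0] for row in channel[0][:3] for v in row[:3]]


def check_latent(generated, reference, slice_tol=5e-3, full_tol=1e-2):
    """Return (metrics, passed) for the slice gate and the full-tensor gate."""
    metrics = {
        "slice_cosine": cosine_distance(video_signature(generated),
                                        video_signature(reference)),
        "full_cosine": cosine_distance(flatten(generated), flatten(reference)),
    }
    passed = (metrics["slice_cosine"] <= slice_tol
              and metrics["full_cosine"] <= full_tol)
    return metrics, passed
```

Cosine distance ignores overall magnitude, so a uniformly rescaled latent would still pass both gates; it is the direction of the tensor that is being pinned.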

Functions:
fastvideo.tests.ssim.latent_similarity_utils.load_latent_reference
load_latent_reference(path: str) -> dict[str, Any]

Inverse of save_latent_reference; always loads to cpu.

Enforces format_version == LATENT_REFERENCE_FORMAT_VERSION so a schema change forces a deliberate reseed instead of silently misinterpreting old artefacts.

Source code in fastvideo/tests/ssim/latent_similarity_utils.py
def load_latent_reference(path: str) -> dict[str, Any]:
    """Inverse of :func:`save_latent_reference` — always loads to cpu.

    Enforces ``format_version == LATENT_REFERENCE_FORMAT_VERSION`` so a
    schema change forces a deliberate reseed instead of silently
    misinterpreting old artefacts.
    """
    # ``weights_only=False`` is required because the payload is a dict of
    # tensors + plain-Python metadata (slice_spec, prompt, ...). The trust
    # boundary is the controlled HF dataset configured via
    # FASTVIDEO_SSIM_REFERENCE_HF_REPO (default
    # FastVideo/ssim-reference-videos), which is org-write-gated.
    payload = torch.load(path, map_location="cpu", weights_only=False)
    fmt = payload.get("format_version") if isinstance(payload, dict) else None
    if fmt != LATENT_REFERENCE_FORMAT_VERSION:
        raise ValueError(
            f"Latent reference at {path!r} has format_version={fmt!r}; "
            f"expected {LATENT_REFERENCE_FORMAT_VERSION}. Re-seed via the "
            "test that produces this artefact, then re-upload through "
            "fastvideo/tests/ssim/reference_videos_cli.py upload.")
    return payload
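
The version gate can be illustrated in isolation, without torch. In this sketch EXPECTED_FORMAT_VERSION is a placeholder value and check_format_version is an illustrative name; the actual constant LATENT_REFERENCE_FORMAT_VERSION is defined elsewhere in the module:

```python
EXPECTED_FORMAT_VERSION = 1  # placeholder value, not the real constant


def check_format_version(payload, expected=EXPECTED_FORMAT_VERSION):
    """Return payload unchanged, or raise ValueError on a version mismatch."""
    # Non-dict payloads (e.g. a bare tensor or list) have no version at all.
    fmt = payload.get("format_version") if isinstance(payload, dict) else None
    if fmt != expected:
        raise ValueError(f"format_version={fmt!r}; expected {expected}")
    return payload
```

Because a missing key and a non-dict payload both resolve to None, artefacts written before versioning existed are rejected by the same path as artefacts with an outdated version.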
fastvideo.tests.ssim.latent_similarity_utils.run_text_to_latent_similarity_test
run_text_to_latent_similarity_test(*, logger: Logger, script_dir: str, device_reference_folder: str, prompt: str, attention_backend_name: str, model_id: str, default_params_map: dict[str, dict[str, object]], full_quality_params_map: dict[str, dict[str, object]], slice_cosine_threshold: float = 0.005, full_cosine_threshold: float = 0.01, init_kwargs_override: dict[str, object] | None = None, generation_kwargs_override: dict[str, object] | None = None, slice_spec: dict[str, Any] | None = None) -> dict[str, float]

Run T2V (or T2A) inference with output_type='latent' and compare to a reference latent.

Returns the computed metrics dict on success. Raises AssertionError if any cosine tolerance is exceeded and FileNotFoundError if the reference artefact is missing.

Source code in fastvideo/tests/ssim/latent_similarity_utils.py
def run_text_to_latent_similarity_test(
    *,
    logger: Logger,
    script_dir: str,
    device_reference_folder: str,
    prompt: str,
    attention_backend_name: str,
    model_id: str,
    default_params_map: dict[str, dict[str, object]],
    full_quality_params_map: dict[str, dict[str, object]],
    slice_cosine_threshold: float = 5e-3,
    full_cosine_threshold: float = 1e-2,
    init_kwargs_override: dict[str, object] | None = None,
    generation_kwargs_override: dict[str, object] | None = None,
    slice_spec: dict[str, Any] | None = None,
) -> dict[str, float]:
    """Run T2V (or T2A) inference with ``output_type='latent'`` and
    compare to a reference latent.

    Returns the computed metrics dict on success. Raises
    ``AssertionError`` if any cosine tolerance is exceeded and
    ``FileNotFoundError`` if the reference artefact is missing.
    """
    spec = slice_spec if slice_spec is not None else DEFAULT_SLICE_SPEC
    with attention_backend(attention_backend_name):
        output_dir = build_generated_output_dir(
            script_dir,
            device_reference_folder,
            model_id,
            attention_backend_name,
        )
        prompt_prefix = prompt[:100].strip()
        output_latent_name = f"{prompt_prefix}{LATENT_REFERENCE_EXTENSION}"
        os.makedirs(output_dir, exist_ok=True)

        params_map = select_ssim_params(
            default_params_map,
            full_quality_params_map,
        )
        base_params = params_map[model_id]
        num_inference_steps = int(base_params["num_inference_steps"])

        init_kwargs = build_init_kwargs(base_params)
        if init_kwargs_override:
            init_kwargs.update(init_kwargs_override)
        # Always wins: the helper exists specifically to compare on latents,
        # so an override can never silently turn it back into a pixel run.
        init_kwargs["output_type"] = "latent"

        generation_kwargs = build_generation_kwargs(
            base_params,
            num_inference_steps,
            output_dir,
        )
        # We serialize latents ourselves; skip the RGB encoder path.
        generation_kwargs["save_video"] = False
        generation_kwargs["return_frames"] = True
        if generation_kwargs_override:
            generation_kwargs.update(generation_kwargs_override)

        generator: VideoGenerator | None = None
        try:
            generator = VideoGenerator.from_pretrained(
                model_path=base_params["model_path"],
                **init_kwargs,
            )
            result = generator.generate_video(prompt, **generation_kwargs)
        finally:
            shutdown_executor(generator)

    gen_latent = _extract_latent_from_result(result)

    generated_latent_path = os.path.join(output_dir, output_latent_name)
    save_latent_reference(
        generated_latent_path,
        gen_latent,
        metadata={
            "prompt": prompt,
            "model_id": model_id,
            "attention_backend": attention_backend_name,
            "num_inference_steps": num_inference_steps,
        },
        slice_spec=spec,
    )
    logger.info("Saved generated latent to %s", generated_latent_path)

    reference_folder = build_reference_folder_path(
        script_dir,
        device_reference_folder,
        model_id,
        attention_backend_name,
    )
    if not os.path.exists(reference_folder):
        raise FileNotFoundError(
            f"Reference folder does not exist: {reference_folder}\n"
            f"To download references, run:\n"
            f"  python fastvideo/tests/ssim/reference_videos_cli.py download")

    reference_latent_path = os.path.join(
        reference_folder,
        output_latent_name,
    )
    if not os.path.exists(reference_latent_path):
        raise FileNotFoundError(
            "Reference latent missing for prompt/backend: "
            f"{reference_latent_path}")

    return _assert_latent_similarity(
        logger=logger,
        gen_latent=gen_latent,
        reference_path=reference_latent_path,
        slice_cosine_threshold=slice_cosine_threshold,
        full_cosine_threshold=full_cosine_threshold,
        model_id=model_id,
        attention_backend_name=attention_backend_name,
        output_dir=output_dir,
        generated_path=generated_latent_path,
        num_inference_steps=num_inference_steps,
        prompt=prompt,
    )
fastvideo.tests.ssim.latent_similarity_utils.save_latent_reference
save_latent_reference(path: str, latent: Tensor, *, metadata: dict[str, Any], slice_spec: dict[str, Any] | None = None) -> None

Persist a latent bundle to path.

Storage format (dict pickled via torch.save):

  • latent: full latent as fp16 on cpu
  • shape: original shape (list)
  • dtype_original: str
  • expected_slice: fp32 1-D signature slice
  • slice_spec: dict describing how the slice was built
  • metadata: caller-provided context (prompt, backend, steps, …)
  • format_version: int

fp16 is lossy but bounded; it keeps ref artefacts small (~a few MB per prompt) while preserving enough dynamic range for cosine-based regression. Slice values stay fp32 because the primary assertion is computed against them.

Source code in fastvideo/tests/ssim/latent_similarity_utils.py
def save_latent_reference(
    path: str,
    latent: torch.Tensor,
    *,
    metadata: dict[str, Any],
    slice_spec: dict[str, Any] | None = None,
) -> None:
    """Persist a latent bundle to ``path``.

    Storage format (dict pickled via ``torch.save``):

    * ``latent``: full latent as fp16 on cpu
    * ``shape``: original shape (list)
    * ``dtype_original``: str
    * ``expected_slice``: fp32 1-D signature slice
    * ``slice_spec``: dict describing how the slice was built
    * ``metadata``: caller-provided context (prompt, backend, steps, …)
    * ``format_version``: int

    fp16 is lossy but bounded; it keeps ref artefacts small (~a few MB
    per prompt) while preserving enough dynamic range for cosine-based
    regression. Slice values stay fp32 because the primary assertion is
    computed against them.
    """
    spec = slice_spec if slice_spec is not None else DEFAULT_SLICE_SPEC
    parent = os.path.dirname(path)
    if parent:
        os.makedirs(parent, exist_ok=True)

    latent_cpu = latent.detach().to("cpu")
    payload: dict[str, Any] = {
        "latent": latent_cpu.to(torch.float16),
        "shape": list(latent_cpu.shape),
        "dtype_original": str(latent_cpu.dtype),
        "expected_slice": _extract_expected_slice(latent_cpu, spec),
        "slice_spec": spec,
        "metadata": metadata,
        "format_version": LATENT_REFERENCE_FORMAT_VERSION,
    }
    torch.save(payload, path)
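
The "lossy but bounded" claim about fp16 can be checked with the standard library alone, since struct supports IEEE 754 half precision through the "e" format. This is a sketch for intuition rather than part of the helper; the function names are illustrative, and the bound applies to values in fp16's normal range (roughly 6.1e-5 to 65504 in magnitude):

```python
import struct


def fp16_round_trip(value: float) -> float:
    """Round a Python float to IEEE 754 half precision and back."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


def max_relative_error(values):
    """Largest relative error introduced by an fp16 round trip."""
    return max(abs(fp16_round_trip(v) - v) / abs(v) for v in values)
```

For normal-range values, round-to-nearest keeps the relative error at or below 2**-11 (about 4.9e-4) per element, which is far smaller than the cosine tolerances used by the regression helper.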
fastvideo.tests.ssim.latent_similarity_utils.write_latent_similarity_results
write_latent_similarity_results(output_dir: str, metrics: dict[str, float], *, reference_path: str, generated_path: str, num_inference_steps: int, prompt: str, model_id: str, attention_backend_name: str, slice_spec: dict[str, Any], slice_cosine_threshold: float, full_cosine_threshold: float, passed: bool) -> bool

Persist latent regression metrics next to the generated artefact.

Mirrors fastvideo.tests.utils.write_ssim_results so downstream CI tooling can scrape one schema for both pixel and latent runs. The filename is steps{N}_{prompt[:100]}_latent.json.

Source code in fastvideo/tests/ssim/latent_similarity_utils.py
def write_latent_similarity_results(
    output_dir: str,
    metrics: dict[str, float],
    *,
    reference_path: str,
    generated_path: str,
    num_inference_steps: int,
    prompt: str,
    model_id: str,
    attention_backend_name: str,
    slice_spec: dict[str, Any],
    slice_cosine_threshold: float,
    full_cosine_threshold: float,
    passed: bool,
) -> bool:
    """Persist latent regression metrics next to the generated artefact.

    Mirrors :func:`fastvideo.tests.utils.write_ssim_results` so downstream
    CI tooling can scrape one schema for both pixel and latent runs.
    The filename is ``steps{N}_{prompt[:100]}_latent.json``.
    """
    try:
        os.makedirs(output_dir, exist_ok=True)
        prompt_prefix = prompt[:100].strip()
        filename = f"steps{num_inference_steps}_{prompt_prefix}_latent.json"
        target = os.path.join(output_dir, filename)
        payload = {
            "metrics": metrics,
            "reference_latent": reference_path,
            "generated_latent": generated_path,
            "model_id": model_id,
            "attention_backend": attention_backend_name,
            "slice_spec": slice_spec,
            "thresholds": {
                "slice_cosine": slice_cosine_threshold,
                "full_cosine": full_cosine_threshold,
            },
            "passed": passed,
            "parameters": {
                "num_inference_steps": num_inference_steps,
                "prompt": prompt,
            },
        }
        with open(target, "w", encoding="utf-8") as handle:
            json.dump(payload, handle, indent=2, sort_keys=True)
        return True
    except OSError:
        return False
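
As an example of the kind of downstream scraping this schema allows, the sketch below collects every result file in a directory and returns the runs that did not pass. It relies only on the passed key and the _latent.json suffix visible in the listing above; the function name is an assumption:

```python
import glob
import json
import os


def collect_latent_failures(output_dir: str) -> list[dict]:
    """Return the payloads of all *_latent.json files whose run did not pass."""
    failures = []
    for path in sorted(glob.glob(os.path.join(output_dir, "*_latent.json"))):
        with open(path, encoding="utf-8") as handle:
            payload = json.load(handle)
        # A file without a "passed" key is treated as a failure.
        if not payload.get("passed", False):
            failures.append(payload)
    return failures
```

Since write_latent_similarity_results returns False rather than raising on OSError, a scraper like this may legitimately find no file for a run that did execute.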
fastvideo.tests.ssim.reference_videos_cli
Functions:
fastvideo.tests.ssim.reference_videos_cli.ensure_reference_videos_available
ensure_reference_videos_available(*, local_dir: Path | None = None, repo_id: str | None = None, repo_type: str | None = None, quality_tier: str = DEFAULT_OUTPUT_QUALITY_TIER) -> bool

Return True if downloaded from HF, False if already present locally.

Source code in fastvideo/tests/ssim/reference_videos_cli.py
def ensure_reference_videos_available(
    *,
    local_dir: Path | None = None,
    repo_id: str | None = None,
    repo_type: str | None = None,
    quality_tier: str = DEFAULT_OUTPUT_QUALITY_TIER,
) -> bool:
    """Return True if downloaded from HF, False if already present locally."""
    if quality_tier not in QUALITY_TIERS:
        raise ValueError(f"Unsupported quality tier: {quality_tier}")
    target_dir = local_dir or _ssim_dir()
    lock_path = target_dir / ".reference_videos_download.lock"
    with _exclusive_download_lock(lock_path):
        if _has_local_reference_videos(target_dir, quality_tier):
            print(f"Reference videos ({quality_tier}) already available at {target_dir}")
            return False

        resolved_repo_id = repo_id or _default_repo_id()
        resolved_repo_type = repo_type or _default_repo_type()
        if not resolved_repo_id:
            raise RuntimeError(
                f"No local reference videos found and no HF repo configured.\nSet {HF_REPO_ENV_KEY} or pass --repo-id."
            )

        print(f"Repo ID: {resolved_repo_id}")
        print(f"Quality tier: {quality_tier}")
        print(f"No local {quality_tier} reference videos found under {target_dir}. Starting download...")
        try:
            download_reference_videos(
                repo_id=resolved_repo_id,
                repo_type=resolved_repo_type,
                local_dir=target_dir,
                quality_tiers=[quality_tier],
            )
            print(f"Download completed for {quality_tier} reference videos.")
        except Exception as exc:
            print(f"ERROR: Failed to download {quality_tier} reference videos from {resolved_repo_id}.")
            print(
                f"Suggested command to retry: "
                f"python fastvideo/tests/ssim/reference_videos_cli.py download "
                f"--quality-tier {quality_tier}"
            )
            raise

        if not _has_local_reference_videos(target_dir, quality_tier):
            raise RuntimeError(f"HF download completed but no {quality_tier} *_reference_videos content found.")
        return True
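
The function follows a check-inside-lock pattern: the presence check runs only after the lock is held, so concurrent test workers do not download the same references twice. The sketch below shows one way to build that pattern on POSIX systems with fcntl.flock. The real _exclusive_download_lock is not shown in this listing, so the locking mechanism, the lock filename, and the function names here are assumptions:

```python
import contextlib
import fcntl  # POSIX only; not available on Windows
import os


@contextlib.contextmanager
def exclusive_lock(lock_path: str):
    """Hold an exclusive advisory lock on ``lock_path`` for the with-block."""
    with open(lock_path, "w") as handle:
        fcntl.flock(handle, fcntl.LOCK_EX)  # blocks until the lock is free
        try:
            yield
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)


def ensure_available(target_dir, is_present, download) -> bool:
    """Return True if download() ran, False if the content was already there."""
    os.makedirs(target_dir, exist_ok=True)
    with exclusive_lock(os.path.join(target_dir, ".download.lock")):
        # Re-check under the lock: another process may have finished first.
        if is_present():
            return False
        download()
        if not is_present():
            raise RuntimeError("download completed but nothing was found")
        return True
```

A worker that was blocked on the lock will see the content on its re-check and return False without downloading.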
fastvideo.tests.ssim.test_flux2_similarity

Latent-slice regression tests for Flux2 text-to-image variants.

Flux2 currently has local parity coverage against the official/reference pipeline, but CI needs a small deterministic regression gate for seeded HF artefacts. Pixel-space comparisons are unnecessarily brittle for this first slot, so the test follows the latent helper pattern used by LTX-2: generate a single-image latent with the production recipe, persist the generated latent, and compare a stable latent signature plus the full tensor against the device reference.

The default and full-quality parameter maps intentionally carry the same recipe values for now. The --ssim-full-quality flag still switches the reference tier through conftest.py; separate full-quality recipes can be introduced after the initial Flux2 references have a stable CI window.

fastvideo.tests.ssim.test_gamecraft_similarity

SSIM regression test for HunyuanGameCraft (T2V and I2V).

Generates a video with deterministic seed and camera trajectory, then compares against a device-specific reference video via MS-SSIM.

Reference videos must be pre-generated and stored under

reference_videos//_reference_videos/HunyuanGameCraft/ /

To create initial reference videos, run this test once and copy the generated videos into the appropriate reference folder.

Functions:
fastvideo.tests.ssim.test_gamecraft_similarity.test_gamecraft_i2v_similarity
test_gamecraft_i2v_similarity(prompt, ATTENTION_BACKEND, model_id)

Generate an I2V video with GameCraft and compare to reference via SSIM.

Source code in fastvideo/tests/ssim/test_gamecraft_similarity.py
@pytest.mark.parametrize("prompt", I2V_TEST_PROMPTS)
@pytest.mark.parametrize("ATTENTION_BACKEND", ["FLASH_ATTN"])
@pytest.mark.parametrize("model_id", list(I2V_MODEL_TO_PARAMS.keys()))
def test_gamecraft_i2v_similarity(prompt, ATTENTION_BACKEND, model_id):
    """Generate an I2V video with GameCraft and compare to reference via SSIM."""
    os.environ["FASTVIDEO_ATTENTION_BACKEND"] = ATTENTION_BACKEND

    script_dir = os.path.dirname(os.path.abspath(__file__))
    output_dir = build_generated_output_dir(
        script_dir,
        device_reference_folder,
        model_id,
        ATTENTION_BACKEND,
    )
    output_video_name = f"{prompt[:100].strip()}.mp4"

    os.makedirs(output_dir, exist_ok=True)

    params_map = select_ssim_params(
        I2V_MODEL_TO_PARAMS,
        FULL_QUALITY_I2V_MODEL_TO_PARAMS,
    )
    BASE_PARAMS = params_map[model_id]
    num_inference_steps = BASE_PARAMS["num_inference_steps"]

    # Build camera trajectory
    camera_states = _create_camera_trajectory(
        action=BASE_PARAMS["action"],
        height=BASE_PARAMS["height"],
        width=BASE_PARAMS["width"],
        num_frames=BASE_PARAMS["num_frames"],
        action_speed=BASE_PARAMS["action_speed"],
        dtype=torch.bfloat16,
    )

    init_kwargs = {
        "num_gpus": BASE_PARAMS["num_gpus"],
        "use_fsdp_inference": True,
        "dit_cpu_offload": True,
        "vae_cpu_offload": True,
        "text_encoder_cpu_offload": True,
        "pin_cpu_memory": True,
    }

    generation_kwargs = {
        "num_inference_steps": num_inference_steps,
        "output_path": output_dir,
        "image_path": BASE_PARAMS["image_path"],
        "height": BASE_PARAMS["height"],
        "width": BASE_PARAMS["width"],
        "num_frames": BASE_PARAMS["num_frames"],
        "guidance_scale": BASE_PARAMS["guidance_scale"],
        "seed": BASE_PARAMS["seed"],
        "fps": 24,
        "camera_states": camera_states,
        "negative_prompt": BASE_PARAMS.get("negative_prompt", ""),
        "save_video": True,
    }

    generator: VideoGenerator | None = None
    try:
        generator = VideoGenerator.from_pretrained(
            model_path=BASE_PARAMS["model_path"], **init_kwargs
        )
        generator.generate_video(prompt, **generation_kwargs)
    finally:
        _shutdown_executor(generator)

    assert os.path.exists(output_dir), (
        f"Output video was not generated at {output_dir}"
    )

    # Compare to reference
    reference_folder = build_reference_folder_path(
        script_dir,
        device_reference_folder,
        model_id,
        ATTENTION_BACKEND,
    )

    if not os.path.exists(reference_folder):
        logger.error("Reference folder missing")
        raise FileNotFoundError(
            f"Reference video folder does not exist: {reference_folder}"
        )

    reference_video_name = None
    for filename in os.listdir(reference_folder):
        if filename.endswith(".mp4") and prompt[:100].strip() in filename:
            reference_video_name = filename
            break

    if not reference_video_name:
        logger.error(
            f"Reference video not found for prompt: {prompt} "
            f"with backend: {ATTENTION_BACKEND}"
        )
        raise FileNotFoundError("Reference video missing")

    reference_video_path = os.path.join(reference_folder, reference_video_name)
    generated_video_path = os.path.join(output_dir, output_video_name)

    logger.info(
        f"Computing SSIM between {reference_video_path} and {generated_video_path}"
    )
    ssim_values = compute_video_ssim_torchvision(
        reference_video_path, generated_video_path, use_ms_ssim=True
    )

    mean_ssim = ssim_values[0]
    logger.info(f"SSIM mean value: {mean_ssim}")
    logger.info(f"Writing SSIM results to directory: {output_dir}")

    success = write_ssim_results(
        output_dir,
        ssim_values,
        reference_video_path,
        generated_video_path,
        num_inference_steps,
        prompt,
    )

    if not success:
        logger.error("Failed to write SSIM results to file")

    min_acceptable_ssim = 0.93
    assert mean_ssim >= min_acceptable_ssim, (
        f"SSIM value {mean_ssim} is below threshold {min_acceptable_ssim} "
        f"for {model_id} with backend {ATTENTION_BACKEND}"
    )
fastvideo.tests.ssim.test_gamecraft_similarity.test_gamecraft_t2v_similarity
test_gamecraft_t2v_similarity(prompt, ATTENTION_BACKEND, model_id)

Generate a T2V video with GameCraft and compare to reference via SSIM.

Source code in fastvideo/tests/ssim/test_gamecraft_similarity.py
@pytest.mark.parametrize("prompt", TEST_PROMPTS)
@pytest.mark.parametrize("ATTENTION_BACKEND", ["FLASH_ATTN"])
@pytest.mark.parametrize("model_id", list(MODEL_TO_PARAMS.keys()))
def test_gamecraft_t2v_similarity(prompt, ATTENTION_BACKEND, model_id):
    """Generate a T2V video with GameCraft and compare to reference via SSIM."""
    os.environ["FASTVIDEO_ATTENTION_BACKEND"] = ATTENTION_BACKEND

    script_dir = os.path.dirname(os.path.abspath(__file__))
    output_dir = build_generated_output_dir(
        script_dir,
        device_reference_folder,
        model_id,
        ATTENTION_BACKEND,
    )
    output_video_name = f"{prompt[:100].strip()}.mp4"

    os.makedirs(output_dir, exist_ok=True)

    params_map = select_ssim_params(
        MODEL_TO_PARAMS,
        FULL_QUALITY_MODEL_TO_PARAMS,
    )
    BASE_PARAMS = params_map[model_id]
    num_inference_steps = BASE_PARAMS["num_inference_steps"]

    # Build camera trajectory
    camera_states = _create_camera_trajectory(
        action=BASE_PARAMS["action"],
        height=BASE_PARAMS["height"],
        width=BASE_PARAMS["width"],
        num_frames=BASE_PARAMS["num_frames"],
        action_speed=BASE_PARAMS["action_speed"],
        dtype=torch.bfloat16,
    )

    init_kwargs = {
        "num_gpus": BASE_PARAMS["num_gpus"],
        "use_fsdp_inference": True,
        "dit_cpu_offload": True,
        "vae_cpu_offload": True,
        "text_encoder_cpu_offload": True,
        "pin_cpu_memory": True,
    }

    generation_kwargs = {
        "num_inference_steps": num_inference_steps,
        "output_path": output_dir,
        "height": BASE_PARAMS["height"],
        "width": BASE_PARAMS["width"],
        "num_frames": BASE_PARAMS["num_frames"],
        "guidance_scale": BASE_PARAMS["guidance_scale"],
        "seed": BASE_PARAMS["seed"],
        "fps": 24,
        "camera_states": camera_states,
        "negative_prompt": BASE_PARAMS.get("negative_prompt", ""),
        "save_video": True,
    }

    generator: VideoGenerator | None = None
    try:
        generator = VideoGenerator.from_pretrained(
            model_path=BASE_PARAMS["model_path"], **init_kwargs
        )
        generator.generate_video(prompt, **generation_kwargs)
    finally:
        _shutdown_executor(generator)

    assert os.path.exists(output_dir), (
        f"Output video was not generated at {output_dir}"
    )

    # Compare to reference
    reference_folder = build_reference_folder_path(
        script_dir,
        device_reference_folder,
        model_id,
        ATTENTION_BACKEND,
    )

    if not os.path.exists(reference_folder):
        logger.error("Reference folder missing")
        raise FileNotFoundError(
            f"Reference video folder does not exist: {reference_folder}"
        )

    reference_video_name = None
    for filename in os.listdir(reference_folder):
        if filename.endswith(".mp4") and prompt[:100].strip() in filename:
            reference_video_name = filename
            break

    if not reference_video_name:
        logger.error(
            f"Reference video not found for prompt: {prompt} "
            f"with backend: {ATTENTION_BACKEND}"
        )
        raise FileNotFoundError("Reference video missing")

    reference_video_path = os.path.join(reference_folder, reference_video_name)
    generated_video_path = os.path.join(output_dir, output_video_name)

    logger.info(
        f"Computing SSIM between {reference_video_path} and {generated_video_path}"
    )
    ssim_values = compute_video_ssim_torchvision(
        reference_video_path, generated_video_path, use_ms_ssim=True
    )

    mean_ssim = ssim_values[0]
    logger.info(f"SSIM mean value: {mean_ssim}")
    logger.info(f"Writing SSIM results to directory: {output_dir}")

    success = write_ssim_results(
        output_dir,
        ssim_values,
        reference_video_path,
        generated_video_path,
        num_inference_steps,
        prompt,
    )

    if not success:
        logger.error("Failed to write SSIM results to file")

    min_acceptable_ssim = 0.93
    assert mean_ssim >= min_acceptable_ssim, (
        f"SSIM value {mean_ssim} is below threshold {min_acceptable_ssim} "
        f"for {model_id} with backend {ATTENTION_BACKEND}"
    )
fastvideo.tests.ssim.test_gen3c_similarity

SSIM regression test for GEN3C video generation.

Compares newly generated GEN3C videos against device-specific reference videos using MS-SSIM to detect quality regressions across code changes.

Usage
Requires 1+ GPU and reference videos.

pytest fastvideo/tests/ssim/test_gen3c_similarity.py -v

Environment variables

GEN3C_MODEL_PATH - Diffusers-format GEN3C model path/repo id. Default: FastVideo/GEN3C-Cosmos-7B-Diffusers (local converted path also supported)

Functions:
fastvideo.tests.ssim.test_gen3c_similarity.test_gen3c_inference_similarity
test_gen3c_inference_similarity(prompt, ATTENTION_BACKEND, model_id)

Generate a GEN3C video and compare against the reference using MS-SSIM.

Source code in fastvideo/tests/ssim/test_gen3c_similarity.py
@pytest.mark.skipif(
    device_reference_folder is None,
    reason=f"No reference videos for device {device_name}",
)
@pytest.mark.parametrize("prompt", TEST_PROMPTS)
@pytest.mark.parametrize("ATTENTION_BACKEND", ["TORCH_SDPA"])
@pytest.mark.parametrize("model_id", list(MODEL_TO_PARAMS.keys()))
def test_gen3c_inference_similarity(prompt, ATTENTION_BACKEND, model_id):
    """
    Generate a GEN3C video and compare against the reference using MS-SSIM.
    """
    os.environ["FASTVIDEO_ATTENTION_BACKEND"] = ATTENTION_BACKEND

    script_dir = os.path.dirname(os.path.abspath(__file__))
    base_output_dir = os.path.join(script_dir, "generated_videos", model_id)
    output_dir = os.path.join(base_output_dir, ATTENTION_BACKEND)
    output_video_name = CANDIDATE_VIDEO_NAME
    os.makedirs(output_dir, exist_ok=True)

    BASE_PARAMS = MODEL_TO_PARAMS[model_id]
    num_inference_steps = BASE_PARAMS["num_inference_steps"]
    model_path = BASE_PARAMS["model_path"]

    # Guard common misconfigurations to keep CI behavior explicit.
    if model_path.lower() == "nvidia/gen3c-cosmos-7b":
        pytest.skip(
            "nvidia/GEN3C-Cosmos-7B is the official raw checkpoint repo, not Diffusers format. "
            "Use GEN3C_MODEL_PATH=FastVideo/GEN3C-Cosmos-7B-Diffusers or a local converted path."
        )

    local_like = model_path.startswith(("/", "./", "../"))
    if local_like and not os.path.exists(model_path):
        pytest.skip(
            f"Local GEN3C model path not found: {model_path}. "
            "Set GEN3C_MODEL_PATH to a valid local path or HF Diffusers repo id."
        )

    if os.path.exists(model_path):
        model_index_path = os.path.join(model_path, "model_index.json")
        if not os.path.exists(model_index_path):
            pytest.skip(
                f"GEN3C_MODEL_PATH is not Diffusers-format (missing model_index.json): {model_path}"
            )

    init_kwargs = {
        "num_gpus": BASE_PARAMS["num_gpus"],
        "sp_size": BASE_PARAMS["sp_size"],
        "tp_size": BASE_PARAMS["tp_size"],
    }
    if "flow_shift" in BASE_PARAMS:
        init_kwargs["flow_shift"] = BASE_PARAMS["flow_shift"]

    generation_kwargs = {
        "num_inference_steps": num_inference_steps,
        "output_path": os.path.join(output_dir, output_video_name),
        "height": BASE_PARAMS["height"],
        "width": BASE_PARAMS["width"],
        "num_frames": BASE_PARAMS["num_frames"],
        "guidance_scale": BASE_PARAMS["guidance_scale"],
        "embedded_cfg_scale": BASE_PARAMS["embedded_cfg_scale"],
        "seed": BASE_PARAMS["seed"],
        "image_path": BASE_PARAMS["image_path"],
        "fps": BASE_PARAMS["fps"],
    }

    if not os.path.exists(generation_kwargs["image_path"]):
        pytest.skip(
            f"GEN3C test image not found: {generation_kwargs['image_path']}. "
            "Set GEN3C_TEST_IMAGE_PATH to a valid local image."
        )

    # Keep local reruns deterministic: remove prior candidate outputs so
    # VideoGenerator does not auto-suffix (_1, _2, ...).
    stale_pattern = os.path.join(output_dir, "gen3c_ssim_candidate*.mp4")
    for stale_video in glob.glob(stale_pattern):
        os.remove(stale_video)

    generator = VideoGenerator.from_pretrained(
        model_path=model_path, **init_kwargs
    )
    generator.generate_video(prompt, **generation_kwargs)

    if isinstance(generator.executor, MultiprocExecutor):
        generator.executor.shutdown()

    assert os.path.exists(generation_kwargs["output_path"]), (
        f"Output not generated at {generation_kwargs['output_path']}"
    )

    reference_folder = os.path.join(
        script_dir, device_reference_folder, model_id, ATTENTION_BACKEND
    )
    if not os.path.exists(reference_folder):
        raise FileNotFoundError(
            f"Reference video folder does not exist: {reference_folder}"
        )

    reference_video_path = os.path.join(reference_folder, BASELINE_VIDEO_NAME)
    if not os.path.exists(reference_video_path):
        raise FileNotFoundError(
            f"Reference video not found: {reference_video_path}"
        )

    generated_video_path = os.path.join(output_dir, output_video_name)

    logger.info(f"Computing SSIM: {reference_video_path} vs {generated_video_path}")
    ssim_values = compute_video_ssim_torchvision(
        reference_video_path, generated_video_path, use_ms_ssim=True
    )

    mean_ssim = ssim_values[0]
    logger.info(f"GEN3C SSIM mean: {mean_ssim}")

    write_ssim_results(
        output_dir,
        ssim_values,
        reference_video_path,
        generated_video_path,
        num_inference_steps,
        prompt,
    )

    # GEN3C SSIM threshold for stable L40S reference comparisons.
    min_acceptable_ssim = 0.93
    assert mean_ssim >= min_acceptable_ssim, (
        f"SSIM {mean_ssim:.4f} < {min_acceptable_ssim} for {model_id} / {ATTENTION_BACKEND}"
    )
fastvideo.tests.ssim.test_lingbot_similarity

SSIM-based similarity test for LingBotWorld I2V with camera control.

Camera trajectory is loaded from LingBot example npy files (poses.npy/intrinsics.npy), matching the official script workflow.

Note: num_inference_steps is reduced to 4 for faster CI.

Classes
Functions:
fastvideo.tests.ssim.test_longcat_similarity

SSIM-based similarity tests for LongCat video generation.

Tests three LongCat modes:

  • T2V (Text-to-Video): 480p video from text prompt
  • I2V (Image-to-Video): 480p video from image + text prompt
  • VC (Video Continuation): 480p video continuation from input video + text prompt

Sampling parameters are derived from:

  • examples/inference/basic/basic_longcat_t2v.py
  • examples/inference/basic/basic_longcat_i2v.py
  • examples/inference/basic/basic_longcat_vc.py

Note: num_inference_steps is reduced for CI speed (4 steps vs 50 in examples).

Classes
Functions:
fastvideo.tests.ssim.test_longcat_similarity.test_longcat_i2v_similarity
test_longcat_i2v_similarity(prompt: str, ATTENTION_BACKEND: str)

Test LongCat I2V inference and compare output to reference videos using SSIM.

Parameters derived from examples/inference/basic/basic_longcat_i2v.py

Source code in fastvideo/tests/ssim/test_longcat_similarity.py
@pytest.mark.parametrize("prompt", I2V_TEST_PROMPTS)
@pytest.mark.parametrize("ATTENTION_BACKEND", ["FLASH_ATTN"])
def test_longcat_i2v_similarity(prompt: str, ATTENTION_BACKEND: str):
    """
    Test LongCat I2V inference and compare output to reference videos using SSIM.

    Parameters derived from examples/inference/basic/basic_longcat_i2v.py
    """
    os.environ["FASTVIDEO_ATTENTION_BACKEND"] = ATTENTION_BACKEND

    params = select_ssim_params(LONGCAT_I2V_PARAMS, LONGCAT_I2V_FULL_QUALITY_PARAMS)

    script_dir = os.path.dirname(os.path.abspath(__file__))
    model_id = "LongCat-Video-I2V"

    output_dir = build_generated_output_dir(
        script_dir,
        device_reference_folder,
        model_id,
        ATTENTION_BACKEND,
    )
    output_video_name = f"{prompt[:100].strip()}.mp4"
    os.makedirs(output_dir, exist_ok=True)

    # Get image path for this prompt
    prompt_idx = I2V_TEST_PROMPTS.index(prompt)
    image_path = _resolve_asset_path(I2V_IMAGE_PATHS[prompt_idx])

    init_kwargs = {
        "num_gpus": params["num_gpus"],
        "use_fsdp_inference": True,
        "dit_cpu_offload": True,
        "vae_cpu_offload": True,
        "text_encoder_cpu_offload": True,
        "enable_bsa": False,
    }

    generation_kwargs = {
        "output_path": output_dir,
        "image_path": image_path,
        "height": params["height"],
        "width": params["width"],
        "num_frames": params["num_frames"],
        "num_inference_steps": params["num_inference_steps"],
        "guidance_scale": params["guidance_scale"],
        "fps": params["fps"],
        "seed": params["seed"],
        "negative_prompt": params["negative_prompt"],
    }

    generator = VideoGenerator.from_pretrained(
        model_path=params["model_path"], **init_kwargs
    )
    generator.generate_video(prompt, **generation_kwargs)
    generator.shutdown()

    generated_video_path = os.path.join(output_dir, output_video_name)
    assert os.path.exists(generated_video_path), (
        f"Output video was not generated at {generated_video_path}"
    )

    # Find reference video
    reference_folder = build_reference_folder_path(
        script_dir,
        device_reference_folder,
        model_id,
        ATTENTION_BACKEND,
    )
    if not os.path.exists(reference_folder):
        raise FileNotFoundError(
            f"Reference video folder does not exist: {reference_folder}"
        )

    reference_video_name = None
    for filename in os.listdir(reference_folder):
        if filename.endswith(".mp4") and prompt[:100].strip() in filename:
            reference_video_name = filename
            break

    if not reference_video_name:
        raise FileNotFoundError(
            f"Reference video not found for prompt: {prompt[:50]}... with backend: {ATTENTION_BACKEND}"
        )

    reference_video_path = os.path.join(reference_folder, reference_video_name)

    logger.info(f"Computing SSIM between {reference_video_path} and {generated_video_path}")
    ssim_values = compute_video_ssim_torchvision(
        reference_video_path, generated_video_path, use_ms_ssim=True
    )

    mean_ssim = ssim_values[0]
    logger.info(f"SSIM mean value: {mean_ssim}")

    write_ssim_results(
        output_dir, ssim_values, reference_video_path, generated_video_path,
        params["num_inference_steps"], prompt
    )

    min_acceptable_ssim = 0.90
    assert mean_ssim >= min_acceptable_ssim, (
        f"SSIM value {mean_ssim} is below threshold {min_acceptable_ssim} "
        f"for {model_id} with backend {ATTENTION_BACKEND}"
    )
fastvideo.tests.ssim.test_longcat_similarity.test_longcat_t2v_similarity
test_longcat_t2v_similarity(prompt: str, ATTENTION_BACKEND: str)

Test LongCat T2V inference and compare output to reference videos using SSIM.

Parameters derived from examples/inference/basic/basic_longcat_t2v.py

Source code in fastvideo/tests/ssim/test_longcat_similarity.py
@pytest.mark.parametrize("prompt", T2V_TEST_PROMPTS)
@pytest.mark.parametrize("ATTENTION_BACKEND", ["FLASH_ATTN"])
def test_longcat_t2v_similarity(prompt: str, ATTENTION_BACKEND: str):
    """
    Test LongCat T2V inference and compare output to reference videos using SSIM.

    Parameters derived from examples/inference/basic/basic_longcat_t2v.py
    """
    os.environ["FASTVIDEO_ATTENTION_BACKEND"] = ATTENTION_BACKEND

    params = select_ssim_params(LONGCAT_T2V_PARAMS, LONGCAT_T2V_FULL_QUALITY_PARAMS)

    script_dir = os.path.dirname(os.path.abspath(__file__))
    model_id = "LongCat-Video-T2V"

    output_dir = build_generated_output_dir(
        script_dir,
        device_reference_folder,
        model_id,
        ATTENTION_BACKEND,
    )
    output_video_name = f"{prompt[:100].strip()}.mp4"
    os.makedirs(output_dir, exist_ok=True)

    init_kwargs = {
        "num_gpus": params["num_gpus"],
        "use_fsdp_inference": True,
        "dit_cpu_offload": True,
        "vae_cpu_offload": True,
        "text_encoder_cpu_offload": True,
        "enable_bsa": False,
    }

    generation_kwargs = {
        "output_path": output_dir,
        "height": params["height"],
        "width": params["width"],
        "num_frames": params["num_frames"],
        "num_inference_steps": params["num_inference_steps"],
        "guidance_scale": params["guidance_scale"],
        "fps": params["fps"],
        "seed": params["seed"],
        "negative_prompt": params["negative_prompt"],
    }

    generator = VideoGenerator.from_pretrained(
        model_path=params["model_path"], **init_kwargs
    )
    generator.generate_video(prompt, **generation_kwargs)
    generator.shutdown()

    generated_video_path = os.path.join(output_dir, output_video_name)
    assert os.path.exists(generated_video_path), (
        f"Output video was not generated at {generated_video_path}"
    )

    # Find reference video
    reference_folder = build_reference_folder_path(
        script_dir,
        device_reference_folder,
        model_id,
        ATTENTION_BACKEND,
    )
    if not os.path.exists(reference_folder):
        raise FileNotFoundError(
            f"Reference video folder does not exist: {reference_folder}"
        )

    reference_video_name = None
    for filename in os.listdir(reference_folder):
        if filename.endswith(".mp4") and prompt[:100].strip() in filename:
            reference_video_name = filename
            break

    if not reference_video_name:
        raise FileNotFoundError(
            f"Reference video not found for prompt: {prompt[:50]}... with backend: {ATTENTION_BACKEND}"
        )

    reference_video_path = os.path.join(reference_folder, reference_video_name)

    logger.info(f"Computing SSIM between {reference_video_path} and {generated_video_path}")
    ssim_values = compute_video_ssim_torchvision(
        reference_video_path, generated_video_path, use_ms_ssim=True
    )

    mean_ssim = ssim_values[0]
    logger.info(f"SSIM mean value: {mean_ssim}")

    write_ssim_results(
        output_dir, ssim_values, reference_video_path, generated_video_path,
        params["num_inference_steps"], prompt
    )

    min_acceptable_ssim = 0.90
    assert mean_ssim >= min_acceptable_ssim, (
        f"SSIM value {mean_ssim} is below threshold {min_acceptable_ssim} "
        f"for {model_id} with backend {ATTENTION_BACKEND}"
    )
fastvideo.tests.ssim.test_longcat_similarity.test_longcat_vc_similarity
test_longcat_vc_similarity(prompt: str, ATTENTION_BACKEND: str)

Test LongCat VC (Video Continuation) inference and compare output to reference videos using SSIM.

Parameters derived from examples/inference/basic/basic_longcat_vc.py

Source code in fastvideo/tests/ssim/test_longcat_similarity.py
@pytest.mark.parametrize("prompt", VC_TEST_PROMPTS)
@pytest.mark.parametrize("ATTENTION_BACKEND", ["FLASH_ATTN"])
def test_longcat_vc_similarity(prompt: str, ATTENTION_BACKEND: str):
    """
    Test LongCat VC (Video Continuation) inference and compare output to reference videos using SSIM.

    Parameters derived from examples/inference/basic/basic_longcat_vc.py
    """
    os.environ["FASTVIDEO_ATTENTION_BACKEND"] = ATTENTION_BACKEND

    params = select_ssim_params(LONGCAT_VC_PARAMS, LONGCAT_VC_FULL_QUALITY_PARAMS)

    script_dir = os.path.dirname(os.path.abspath(__file__))
    model_id = "LongCat-Video-VC"

    output_dir = build_generated_output_dir(
        script_dir,
        device_reference_folder,
        model_id,
        ATTENTION_BACKEND,
    )
    output_video_name = f"{prompt[:100].strip()}.mp4"
    os.makedirs(output_dir, exist_ok=True)

    # Get video path for this prompt
    prompt_idx = VC_TEST_PROMPTS.index(prompt)
    video_path = _resolve_asset_path(VC_VIDEO_PATHS[prompt_idx])

    if not os.path.exists(video_path):
        pytest.skip(f"Input video not found at {video_path}")

    init_kwargs = {
        "num_gpus": params["num_gpus"],
        "use_fsdp_inference": False,
        "dit_cpu_offload": False,
        "vae_cpu_offload": True,
        "text_encoder_cpu_offload": True,
        "pin_cpu_memory": False,
        "enable_bsa": False,
    }

    generation_kwargs = {
        "output_path": output_dir,
        "video_path": video_path,
        "num_cond_frames": params["num_cond_frames"],
        "height": params["height"],
        "width": params["width"],
        "num_frames": params["num_frames"],
        "num_inference_steps": params["num_inference_steps"],
        "guidance_scale": params["guidance_scale"],
        "fps": params["fps"],
        "seed": params["seed"],
        "negative_prompt": params["negative_prompt"],
    }

    generator = VideoGenerator.from_pretrained(
        model_path=params["model_path"], **init_kwargs
    )
    generator.generate_video(prompt, **generation_kwargs)
    generator.shutdown()

    generated_video_path = os.path.join(output_dir, output_video_name)
    assert os.path.exists(generated_video_path), (
        f"Output video was not generated at {generated_video_path}"
    )

    # Find reference video
    reference_folder = build_reference_folder_path(
        script_dir,
        device_reference_folder,
        model_id,
        ATTENTION_BACKEND,
    )
    if not os.path.exists(reference_folder):
        raise FileNotFoundError(
            f"Reference video folder does not exist: {reference_folder}"
        )

    reference_video_name = None
    for filename in os.listdir(reference_folder):
        if filename.endswith(".mp4") and prompt[:100].strip() in filename:
            reference_video_name = filename
            break

    if not reference_video_name:
        raise FileNotFoundError(
            f"Reference video not found for prompt: {prompt[:50]}... with backend: {ATTENTION_BACKEND}"
        )

    reference_video_path = os.path.join(reference_folder, reference_video_name)

    logger.info(f"Computing SSIM between {reference_video_path} and {generated_video_path}")
    ssim_values = compute_video_ssim_torchvision(
        reference_video_path, generated_video_path, use_ms_ssim=True
    )

    mean_ssim = ssim_values[0]
    logger.info(f"SSIM mean value: {mean_ssim}")

    write_ssim_results(
        output_dir, ssim_values, reference_video_path, generated_video_path,
        params["num_inference_steps"], prompt
    )

    min_acceptable_ssim = 0.90
    assert mean_ssim >= min_acceptable_ssim, (
        f"SSIM value {mean_ssim} is below threshold {min_acceptable_ssim} "
        f"for {model_id} with backend {ATTENTION_BACKEND}"
    )
fastvideo.tests.ssim.test_ltx2_similarity

Latent-slice regression test for LTX-2 distilled text-to-video.

Pixel-space SSIM is not a useful signal for this model: 4 distilled steps + bf16 attention + tiled VAE decode produce outputs that pass visual QA but occupy a very wide region in pixel space.

Inspired by diffusers' slice-vs-full regression philosophy — see diffusers/tests/pipelines/ltx2/test_ltx2.py (compares pixel slices via torch.allclose(..., atol=1e-4)) and diffusers/tests/pipelines/cogvideo/test_cogvideox.py (full pixel tensors via numpy_cosine_similarity_distance(...) < 1e-3). Diffusers itself does NOT compare latents; we apply the same "small signature slice + bounded full-tensor distance" idea to the pre-VAE latent because distilled few-step pipelines amplify per-step bf16 noise enough that VAE-decoded comparisons are unreliable.

Parameters are kept identical to the original SSIM run so that reference artefacts generated on Modal L40S remain bit-compatible with production inference.
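
The "small signature slice + bounded full-tensor distance" idea can be sketched in plain Python as follows. This is only an illustration on flat lists of floats: the real test presumably works on torch tensors, and `latents_match`, `slice_len`, `atol`, and `max_dist` are hypothetical names and values, not the test's actual thresholds.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity of two flat float sequences (assumes neither is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def latents_match(candidate, reference, slice_len=8, atol=1e-2, max_dist=1e-3):
    """Check a small leading slice element-wise, then the full latent by cosine distance.

    slice_len, atol and max_dist are illustrative values only.
    """
    if len(candidate) != len(reference):
        return False
    slice_ok = all(
        abs(c - r) <= atol
        for c, r in zip(candidate[:slice_len], reference[:slice_len])
    )
    return slice_ok and cosine_distance(candidate, reference) < max_dist
```

The slice check catches localized drift cheaply, while the cosine bound guards the whole latent without requiring element-wise equality.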

Classes
Functions:
fastvideo.tests.ssim.test_matrixgame2_similarity
Classes
Functions:
fastvideo.tests.ssim.test_matrixgame2_similarity.test_matrixgame2_similarity
test_matrixgame2_similarity(prompt, ATTENTION_BACKEND, model_id)

Test that runs inference with different parameters and compares the output to reference videos using SSIM.

Source code in fastvideo/tests/ssim/test_matrixgame2_similarity.py
@pytest.mark.parametrize("prompt", TEST_PROMPTS)
@pytest.mark.parametrize("ATTENTION_BACKEND", ["FLASH_ATTN"])
@pytest.mark.parametrize("model_id", list(MODEL_TO_PARAMS.keys()))
def test_matrixgame2_similarity(prompt, ATTENTION_BACKEND, model_id):
    """
    Test that runs inference with different parameters and compares the output
    to reference videos using SSIM.
    """
    os.environ["FASTVIDEO_ATTENTION_BACKEND"] = ATTENTION_BACKEND

    script_dir = os.path.dirname(os.path.abspath(__file__))

    output_dir = build_generated_output_dir(
        script_dir,
        device_reference_folder,
        model_id,
        ATTENTION_BACKEND,
    )
    output_video_name = "output.mp4"

    os.makedirs(output_dir, exist_ok=True)

    params_map = select_ssim_params(
        MODEL_TO_PARAMS,
        FULL_QUALITY_MODEL_TO_PARAMS,
    )
    BASE_PARAMS = params_map[model_id]
    num_inference_steps = BASE_PARAMS["num_inference_steps"]

    # Create action conditions for Matrix-Game 2.0
    actions = create_action_presets(
        BASE_PARAMS["num_frames"], keyboard_dim=BASE_PARAMS["keyboard_dim"], seed=BASE_PARAMS["seed"]
    )
    latent_frames = (BASE_PARAMS["num_frames"] - 1) // 4 + 1
    grid_sizes = torch.tensor([latent_frames, 44, 80])

    init_kwargs = {
        "num_gpus": BASE_PARAMS["num_gpus"],
        "use_fsdp_inference": True,
        "dit_layerwise_offload": False,
        "dit_cpu_offload": False,
        "vae_cpu_offload": False,
        "text_encoder_cpu_offload": True,
        "pin_cpu_memory": True,
    }

    generation_kwargs = {
        "num_inference_steps": num_inference_steps,
        "output_path": output_dir,
        "image_path": TEST_IMAGE_PATHS[0],
        "height": BASE_PARAMS["height"],
        "width": BASE_PARAMS["width"],
        "num_frames": BASE_PARAMS["num_frames"],
        "seed": BASE_PARAMS["seed"],
        "mouse_cond": actions["mouse"].unsqueeze(0),
        "keyboard_cond": actions["keyboard"].unsqueeze(0),
        "grid_sizes": grid_sizes,
        "save_video": True,
    }

    generator = VideoGenerator.from_pretrained(model_path=BASE_PARAMS["model_path"], **init_kwargs)
    generator.generate_video(prompt, **generation_kwargs)

    if isinstance(generator.executor, MultiprocExecutor):
        generator.executor.shutdown()

    assert os.path.exists(output_dir), f"Output video was not generated at {output_dir}"

    reference_folder = build_reference_folder_path(
        script_dir,
        device_reference_folder,
        model_id,
        ATTENTION_BACKEND,
    )

    if not os.path.exists(reference_folder):
        logger.error("Reference folder missing")
        raise FileNotFoundError(f"Reference video folder does not exist: {reference_folder}")

    # Use the first .mp4 in the reference folder (the lookup does not filter by prompt)
    reference_video_name = None

    for filename in os.listdir(reference_folder):
        if filename.endswith(".mp4"):
            reference_video_name = filename
            break

    if not reference_video_name:
        logger.error(f"Reference video not found for model: {model_id} with backend: {ATTENTION_BACKEND}")
        raise FileNotFoundError("Reference video missing")

    reference_video_path = os.path.join(reference_folder, reference_video_name)
    generated_video_path = os.path.join(output_dir, output_video_name)

    logger.info(f"Computing SSIM between {reference_video_path} and {generated_video_path}")
    ssim_values = compute_video_ssim_torchvision(reference_video_path, generated_video_path, use_ms_ssim=True)

    mean_ssim = ssim_values[0]
    logger.info(f"SSIM mean value: {mean_ssim}")
    logger.info(f"Writing SSIM results to directory: {output_dir}")

    success = write_ssim_results(
        output_dir,
        ssim_values,
        reference_video_path,
        generated_video_path,
        num_inference_steps,
        prompt,
    )

    if not success:
        logger.error("Failed to write SSIM results to file")

    min_acceptable_ssim = 0.98
    assert mean_ssim >= min_acceptable_ssim, (
        f"SSIM value {mean_ssim} is below threshold {min_acceptable_ssim} for {model_id} with backend {ATTENTION_BACKEND}"
    )
fastvideo.tests.ssim.test_matrixgame3_similarity
Classes
Functions:
fastvideo.tests.ssim.test_matrixgame3_similarity.test_matrixgame3_similarity
test_matrixgame3_similarity(prompt, ATTENTION_BACKEND, model_id)

Test that runs MG3 inference (action conditions auto-generated from seed) and compares the output to reference videos using SSIM.

Source code in fastvideo/tests/ssim/test_matrixgame3_similarity.py
@pytest.mark.parametrize("prompt", TEST_PROMPTS)
@pytest.mark.parametrize("ATTENTION_BACKEND", ["FLASH_ATTN"])
@pytest.mark.parametrize("model_id", list(MODEL_TO_PARAMS.keys()))
def test_matrixgame3_similarity(prompt, ATTENTION_BACKEND, model_id):
    """
    Test that runs MG3 inference (action conditions auto-generated from seed)
    and compares the output to reference videos using SSIM.
    """
    os.environ["FASTVIDEO_ATTENTION_BACKEND"] = ATTENTION_BACKEND

    script_dir = os.path.dirname(os.path.abspath(__file__))

    output_dir = build_generated_output_dir(
        script_dir,
        device_reference_folder,
        model_id,
        ATTENTION_BACKEND,
    )

    os.makedirs(output_dir, exist_ok=True)

    params_map = select_ssim_params(
        MODEL_TO_PARAMS,
        FULL_QUALITY_MODEL_TO_PARAMS,
    )
    BASE_PARAMS = params_map[model_id]
    num_inference_steps = BASE_PARAMS["num_inference_steps"]

    init_kwargs = {
        "num_gpus": BASE_PARAMS["num_gpus"],
        "use_fsdp_inference": True,
        "dit_layerwise_offload": False,
        "dit_cpu_offload": False,
        "vae_cpu_offload": False,
        "text_encoder_cpu_offload": True,
        "pin_cpu_memory": True,
    }

    generation_kwargs = {
        "num_inference_steps": num_inference_steps,
        "output_path": output_dir,
        "image_path": TEST_IMAGE_PATHS[0],
        "height": BASE_PARAMS["height"],
        "width": BASE_PARAMS["width"],
        "num_frames": BASE_PARAMS["num_frames"],
        "guidance_scale": BASE_PARAMS["guidance_scale"],
        "seed": BASE_PARAMS["seed"],
        "save_video": True,
    }

    generator = VideoGenerator.from_pretrained(model_path=BASE_PARAMS["model_path"], **init_kwargs)
    generator.generate_video(prompt, **generation_kwargs)

    if isinstance(generator.executor, MultiprocExecutor):
        generator.executor.shutdown()

    assert os.path.exists(output_dir), f"Output video was not generated at {output_dir}"

    reference_folder = build_reference_folder_path(
        script_dir,
        device_reference_folder,
        model_id,
        ATTENTION_BACKEND,
    )

    if not os.path.exists(reference_folder):
        logger.error("Reference folder missing")
        raise FileNotFoundError(f"Reference video folder does not exist: {reference_folder}")

    prompt_prefix = prompt[:100].strip().rstrip(".")

    def _find_mp4(folder: str) -> str | None:
        for filename in sorted(os.listdir(folder)):
            if filename.endswith(".mp4") and prompt_prefix in filename:
                return filename
        return None

    reference_video_name = _find_mp4(reference_folder)
    if not reference_video_name:
        logger.error(f"Reference video not found for model: {model_id} with backend: {ATTENTION_BACKEND}")
        raise FileNotFoundError("Reference video missing")

    generated_video_name = _find_mp4(output_dir)
    if not generated_video_name:
        logger.error(f"Generated video not found for model: {model_id} with backend: {ATTENTION_BACKEND}")
        raise FileNotFoundError(f"Generated video missing in {output_dir}")

    reference_video_path = os.path.join(reference_folder, reference_video_name)
    generated_video_path = os.path.join(output_dir, generated_video_name)

    logger.info(f"Computing SSIM between {reference_video_path} and {generated_video_path}")
    ssim_values = compute_video_ssim_torchvision(reference_video_path, generated_video_path, use_ms_ssim=True)

    mean_ssim = ssim_values[0]
    logger.info(f"SSIM mean value: {mean_ssim}")
    logger.info(f"Writing SSIM results to directory: {output_dir}")

    success = write_ssim_results(
        output_dir,
        ssim_values,
        reference_video_path,
        generated_video_path,
        num_inference_steps,
        prompt,
    )

    if not success:
        logger.error("Failed to write SSIM results to file")

    min_acceptable_ssim = 0.98
    assert mean_ssim >= min_acceptable_ssim, (
        f"SSIM value {mean_ssim} is below threshold {min_acceptable_ssim} for {model_id} with backend {ATTENTION_BACKEND}"
    )
fastvideo.tests.ssim.test_stable_audio_similarity

Latent-slice regression test for Stable Audio Open 1.0 text-to-audio.

Companion to test_ltx2_similarity.py — applies the same latent cosine-distance philosophy to 3-D audio latents [B, 64, T_latent].

Why latent-space and not waveform-space SSIM:

  • dpmpp-3m-sde (k-diffusion) accumulates per-step bf16 noise; the Oobleck VAE then magnifies any residual drift into the time-domain waveform. A few mis-rounded accumulators drive sample-wise diff well past audible thresholds without indicating a real regression.
  • Diffusers' own tests/pipelines/stable_audio/test_stable_audio.py compares decoded audio samples via np.abs(expected - actual).max() < 1.5e-3; that bound holds for CPU dummy components but does not survive cross-architecture bf16 on our CI pool (L40S/A40/H100/B200).
  • Comparing the pre-VAE latent moves the assertion upstream of the dominant noise source.

Slice spec: audio_first_8_timesteps returns latent[0, :, :8] (= 64 channels × 8 latent timesteps = 512 elements). The full latent [1, 64, 1024] for SA-1.0 is also compared via cosine distance.
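
To make the slice arithmetic concrete, here is a minimal sketch that uses nested Python lists as a stand-in for the `[1, 64, 1024]` latent. The function body is an assumption written for illustration; the real `audio_first_8_timesteps` presumably indexes a torch tensor directly.

```python
def audio_first_8_timesteps(latent):
    """Return latent[0, :, :8] for a nested-list latent shaped [B, C, T]."""
    return [channel[:8] for channel in latent[0]]


# Stand-in for a [1, 64, 1024] latent, filled with zeros.
latent = [[[0.0] * 1024 for _ in range(64)]]
signature = audio_first_8_timesteps(latent)

# 64 channels x 8 latent timesteps.
num_elements = sum(len(row) for row in signature)
```

Here `num_elements` works out to 64 × 8 = 512, matching the slice spec.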

Classes
Functions:

fastvideo.tests.train

Modules

fastvideo.tests.train.callbacks
Modules
fastvideo.tests.train.callbacks.test_callback

CPU-only unit tests for :mod:fastvideo.train.callbacks.callback.

Covers the Callback base class no-op contract and the CallbackDict instantiation / dispatch / state-dict logic.

The concrete callback subclasses (GradNormClipCallback, EMACallback, ValidationCallback) have their own test files.
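
The no-op contract and dispatch behaviour described above might look roughly like the sketch below. `BaseCallback`, `CallbackRegistry`, `StepCounter`, and the `on_train_start` hook are hypothetical stand-ins written for illustration; they are not the actual `Callback` / `CallbackDict` implementation.

```python
class BaseCallback:
    """Stand-in base class: every hook is a no-op unless a subclass overrides it."""

    def on_train_start(self, **kwargs):  # hypothetical hook name
        pass

    def on_before_optimizer_step(self, **kwargs):
        pass

    def state_dict(self):
        return {}

    def load_state_dict(self, state):
        pass


class CallbackRegistry:
    """Stand-in container: forwards each hook call to every registered callback."""

    def __init__(self, callbacks):
        self._callbacks = dict(callbacks)

    def dispatch(self, hook, **kwargs):
        for callback in self._callbacks.values():
            getattr(callback, hook)(**kwargs)

    def state_dict(self):
        return {name: cb.state_dict() for name, cb in self._callbacks.items()}


class StepCounter(BaseCallback):
    """Example subclass that overrides a single hook."""

    def __init__(self):
        self.calls = 0

    def on_before_optimizer_step(self, **kwargs):
        self.calls += 1

    def state_dict(self):
        return {"calls": self.calls}
```

Because the base hooks do nothing, a subclass only has to implement the events it cares about, and the container can call any hook on any callback safely.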

Classes
fastvideo.tests.train.callbacks.test_ema

CPU-only unit tests for :mod:fastvideo.train.callbacks.ema.

Exercises the EMA lifecycle (lazy init, start_iter gating, decay math, ema_context swap, state-dict round-trip) on a tiny CPU nn.Linear. EMA_FSDP works without dist.init_process_group because dist.is_initialized() returns False and _to_local_tensor falls through to raw tensors for non-DTensor inputs.
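
For readers unfamiliar with the lifecycle being tested, the following sketch shows lazy init, start_iter gating, and the decay update on plain floats. `ScalarEMA` is a hypothetical class; in particular, copying the live weights on the first eligible step is an assumption about how the real callback initializes its shadow copy.

```python
class ScalarEMA:
    """EMA over a list of floats with lazy init and start_iter gating."""

    def __init__(self, decay=0.9, start_iter=0):
        self.decay = decay
        self.start_iter = start_iter
        self.shadow = None  # created lazily on the first eligible update

    def update(self, params, iteration):
        if iteration < self.start_iter:
            return  # gated: EMA has not started yet
        if self.shadow is None:
            self.shadow = list(params)  # lazy init copies the live weights
            return
        # Standard decay update: shadow <- decay * shadow + (1 - decay) * param
        self.shadow = [
            self.decay * s + (1.0 - self.decay) * p
            for s, p in zip(self.shadow, params)
        ]
```

With `decay=0.5`, a shadow value of 1.0 and a live value of 3.0 blend to 2.0 on the next update.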

Classes
fastvideo.tests.train.callbacks.test_grad_clip

CPU-only unit tests for :mod:fastvideo.train.callbacks.grad_clip.

Exercises GradNormClipCallback.on_before_optimizer_step against synthetic nn.Module targets with manually populated gradients.
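
Global-norm gradient clipping, the operation this callback is named for, can be sketched on a flat list of floats as below. `clip_grad_norm` here is a hypothetical pure-Python helper that follows the usual "scale by max_norm / (norm + eps) when above the limit" rule; it is not the callback's own code.

```python
import math


def clip_grad_norm(grads, max_norm, eps=1e-6):
    """Scale grads in place so their global L2 norm is at most max_norm.

    Returns the norm measured before clipping.
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    scale = max_norm / (total_norm + eps)
    if scale < 1.0:  # only shrink, never amplify
        for i, g in enumerate(grads):
            grads[i] = g * scale
    return total_norm
```

Returning the pre-clip norm is convenient for logging, since it shows how often clipping actually triggers.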

Classes
fastvideo.tests.train.callbacks.test_validation

CPU-only unit tests for :mod:fastvideo.train.callbacks.validation.

Covers the parts of ValidationCallback that don't need a real pipeline or distributed init:

  • constructor type coercions and defaults,
  • on_validation_begin gating logic (every_steps + modulo),
  • _find_ema_callback lookup via _callback_dict,
  • state_dict / load_state_dict rng round-trip.

The heavy _run_validation path needs a real diffusion pipeline plus distributed init and is exercised by Phase 2/3 tests.
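
The "every_steps + modulo" gating can be pictured with a small sketch. `should_validate` is a hypothetical function; treating a non-positive `every_steps` as "disabled" and skipping step 0 are assumptions, not confirmed behaviour of `on_validation_begin`.

```python
def should_validate(step, every_steps):
    """True when validation is enabled and step is a positive multiple of every_steps."""
    if every_steps is None or every_steps <= 0:
        return False  # assumed meaning: validation disabled
    return step > 0 and step % every_steps == 0
```

For example, with `every_steps=50` this sketch fires at steps 50, 100, 150 and so on, and stays silent in between.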

Classes
fastvideo.tests.train.methods
Modules
fastvideo.tests.train.methods.test_wan_causal_dfsft

Per-method GPU smoke test: WanCausalModel + DiffusionForcingSFTMethod.

Mirrors test_wan_finetune.py for the diffusion-forcing SFT (DFSFT) algorithm on the causal Wan transformer. The harness is intentionally identical so the two tests are easy to compare and so future per-method tests can copy this template verbatim.

DFSFT samples inhomogeneous timesteps per chunk (chunk_size=3 in the fixture) and is the natural training counterpart of the WanCausalModel plugin.
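
One way to read "inhomogeneous timesteps per chunk" is that frames inside a chunk share a noise level while different chunks draw independently. The sketch below illustrates that reading only; `sample_chunk_timesteps`, the uniform sampling, and the range of 1000 timesteps are assumptions, not the method's actual sampler.

```python
import random


def sample_chunk_timesteps(num_frames, chunk_size=3, num_train_timesteps=1000, seed=None):
    """Draw one timestep per chunk and repeat it for every frame in that chunk."""
    rng = random.Random(seed)
    timesteps = []
    for start in range(0, num_frames, chunk_size):
        t = rng.randrange(num_train_timesteps)
        # The last chunk may be shorter when num_frames is not a multiple of chunk_size.
        frames_in_chunk = min(chunk_size, num_frames - start)
        timesteps.extend([t] * frames_in_chunk)
    return timesteps
```

The result has one entry per frame, so it can be compared directly against a per-frame timestep tensor of the same length.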

Classes Functions:
fastvideo.tests.train.methods.test_wan_finetune

Per-method GPU smoke test: WanModel + FineTuneMethod.

Establishes the per-method test pattern for fastvideo/train:

  1. Instantiate the model + method via their public constructors (no Trainer setup, no FSDP wrapping).
  2. Feed a synthetic raw_batch dict through method.single_train_step() + method.backward().
  3. Assert that the loss is finite and that the first transformer block received a finite, non-zero gradient.

The first block's gradient is the last one computed during backprop, so a healthy grad there implies the full forward + chain-rule path is intact. Keeping the assertion to a single block keeps the reference surface tiny — a later PR layers a device-keyed grad-norm regression on top of this same harness.
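
The assertions in step 3 can be sketched without any model code. `check_train_step` is a hypothetical helper operating on a float loss and a flat list of gradient values; the real test presumably inspects torch tensors on the first transformer block.

```python
import math


def check_train_step(loss, first_block_grads):
    """Raise AssertionError unless the loss is finite and the gradient is finite and non-zero.

    Returns the gradient L2 norm so callers can log it.
    """
    assert math.isfinite(loss), f"loss is not finite: {loss}"
    assert all(math.isfinite(g) for g in first_block_grads), "gradient has NaN/inf"
    grad_norm = math.sqrt(sum(g * g for g in first_block_grads))
    assert grad_norm > 0.0, "gradient is identically zero"
    return grad_norm
```

Checking for a non-zero norm matters as much as checking for finiteness: an all-zero gradient usually means the block was detached from the loss.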

Classes Functions:
fastvideo.tests.train.models
Modules
fastvideo.tests.train.models.test_load_hunyuan

GPU loading + forward smoke test for HunyuanModel.

Loads the real HunyuanVideo checkpoint (~13B at bf16) via HunyuanModel.__init__ and runs one transformer forward pass on synthetic inputs. Hunyuan's transformer takes a slightly different forward signature than Wan (no encoder_attention_mask, no return_dict); this test mirrors the kwargs in HunyuanModel._build_distill_input_kwargs.

Classes Functions:
fastvideo.tests.train.models.test_load_wan

GPU loading + forward smoke test for WanModel.

Loads the real Wan2.1 1.3B checkpoint via WanModel.__init__ and runs one transformer forward pass on synthetic inputs. Catches loader or forward-signature regressions in fastvideo.train.models.wan.WanModel and the underlying WanTransformer3DModel.

Classes Functions:
fastvideo.tests.train.models.test_load_wan_causal

GPU loading smoke test for WanCausalModel.

Verifies that WanCausalModel.__init__ resolves the CausalWanTransformer3DModel class override and successfully loads weights from the regular Wan2.1 1.3B checkpoint.

A real forward pass is intentionally omitted here: the causal transformer requires per-frame timesteps, a block-causal attention mask, and KV cache state that WanCausalModel.predict_noise_streaming manages for production callers. PR 5 (per-method tests) exercises that streaming forward path end-to-end.
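To make "block-causal attention mask" concrete, here is a frame-level sketch in plain Python: a query frame may attend to every frame in its own chunk and in earlier chunks, but not to later chunks. The function name and the frame-level granularity are assumptions; the real transformer works on token sequences and its mask layout may differ.

```python
def block_causal_mask(num_frames: int, chunk_size: int) -> list[list[bool]]:
    """mask[q][k] is True when query frame q may attend to key frame k."""
    mask = []
    for q in range(num_frames):
        q_chunk = q // chunk_size
        # Allowed keys: same chunk as the query, or any earlier chunk.
        mask.append([k // chunk_size <= q_chunk for k in range(num_frames)])
    return mask


for row in block_causal_mask(num_frames=4, chunk_size=2):
    print("".join("x" if allowed else "." for allowed in row))
# xx..
# xx..
# xxxx
# xxxx
```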

Classes Functions:
fastvideo.tests.train.utils
Modules
fastvideo.tests.train.utils.test_checkpoint

CPU-only unit tests for :mod:fastvideo.train.utils.checkpoint.

Covers the pure-Python portions of the checkpoint manager: name parsing, resume-path resolution, metadata round-trip, rolling-delete cleanup, the _is_stateful predicate, and the maybe_save gating logic. Code paths that touch DCP (dcp.save / dcp.load) and CUDA RNG snapshots are intentionally not covered here — those need a GPU runner and will be tested in later phases.
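Name parsing and rolling-delete cleanup can be illustrated with a small, self-contained sketch. The `checkpoint-<step>` naming scheme and both helper names are assumptions made for this example; the actual checkpoint manager may use a different layout and a different cleanup order.

```python
import re
from typing import Optional

# Assumed naming scheme: "checkpoint-<step>".
_CKPT_RE = re.compile(r"^checkpoint-(\d+)$")


def parse_step(name: str) -> Optional[int]:
    """Return the step encoded in a checkpoint dir name, or None if it does not match."""
    match = _CKPT_RE.match(name)
    return int(match.group(1)) if match else None


def checkpoints_to_delete(names: list[str], total_limit: int) -> list[str]:
    """Return the oldest checkpoint names that exceed total_limit."""
    parsed = [(parse_step(n), n) for n in names]
    by_step = sorted((s, n) for s, n in parsed if s is not None)
    excess = max(0, len(by_step) - total_limit)
    return [n for _, n in by_step[:excess]]


names = ["checkpoint-100", "checkpoint-300", "notes.txt", "checkpoint-200"]
print(checkpoints_to_delete(names, total_limit=2))  # ['checkpoint-100']
```

Because both helpers are pure functions over names, they can be tested on CPU with nothing more than a list of strings or a tmp_path directory.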

Classes Functions:
fastvideo.tests.train.utils.test_checkpoint.test_resolve_unknown_dir_raises
test_resolve_unknown_dir_raises(tmp_path: Path) -> None

A dir that is neither a checkpoint nor an output_dir-with-checkpoints.

Source code in fastvideo/tests/train/utils/test_checkpoint.py
def test_resolve_unknown_dir_raises(tmp_path: Path) -> None:
    """A dir that is neither a checkpoint nor an output_dir-with-checkpoints."""
    bogus = tmp_path / "bogus"
    bogus.mkdir()
    with pytest.raises(ValueError, match="Could not resolve"):
        _resolve_resume_checkpoint(str(bogus), output_dir=str(tmp_path))
fastvideo.tests.train.utils.test_config

CPU-only unit tests for :func:load_run_config.

Classes Functions:
fastvideo.tests.train.utils.test_config.test_hsdp_shard_dim_defaults_to_num_gpus
test_hsdp_shard_dim_defaults_to_num_gpus(tmp_path: Path) -> None

When unset, hsdp_shard_dim and sp_size fall back to num_gpus.

Source code in fastvideo/tests/train/utils/test_config.py
def test_hsdp_shard_dim_defaults_to_num_gpus(tmp_path: Path) -> None:
    """When unset, hsdp_shard_dim and sp_size fall back to num_gpus."""
    data = _minimal_yaml()
    data["training"] = {"distributed": {"num_gpus": 4}}
    cfg = load_run_config(_write_yaml(tmp_path, data))
    assert cfg.training.distributed.hsdp_shard_dim == 4
    assert cfg.training.distributed.sp_size == 4
fastvideo.tests.train.utils.test_config.test_overrides_create_intermediate_keys
test_overrides_create_intermediate_keys(tmp_path: Path) -> None

Overrides into a nested key absent from YAML should still apply.

Source code in fastvideo/tests/train/utils/test_config.py
def test_overrides_create_intermediate_keys(tmp_path: Path) -> None:
    """Overrides into a nested key absent from YAML should still apply."""
    data = _minimal_yaml()
    # No `training.checkpoint` block in the minimal YAML.
    path = _write_yaml(tmp_path, data)
    cfg = load_run_config(
        path,
        overrides=["--training.checkpoint.checkpoints_total_limit=5"],
    )
    assert cfg.training.checkpoint.checkpoints_total_limit == 5
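
The behaviour this test pins down, an override creating intermediate keys that are absent from the YAML, can be sketched on a plain nested dict. `apply_override` is a hypothetical helper, and the integer coercion rule is an assumption; load_run_config itself may parse and type values differently.

```python
def apply_override(config: dict, override: str) -> None:
    """Apply one '--a.b.c=value' override in place, creating missing levels."""
    dotted_key, _, raw_value = override.lstrip("-").partition("=")
    *parents, leaf = dotted_key.split(".")
    node = config
    for key in parents:
        # setdefault creates the intermediate dict when the key is absent.
        node = node.setdefault(key, {})
    # Assumption: values that look like integers are coerced to int.
    node[leaf] = int(raw_value) if raw_value.lstrip("-").isdigit() else raw_value


config = {"training": {"distributed": {"num_gpus": 4}}}
apply_override(config, "--training.checkpoint.checkpoints_total_limit=5")
print(config["training"]["checkpoint"])  # {'checkpoints_total_limit': 5}
```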

fastvideo.tests.utils

Functions:

fastvideo.tests.utils.compare_folders
compare_folders(reference_folder, generated_folder, use_ms_ssim=True)
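Compare videos with the same filename between reference_folder and generated_folder. Only .mp4 files found in reference_folder are considered; the returned dict maps each compared video name to a (mean, min, max) SSIM tuple, or to None if the comparison raised an error. Videos with no matching file in generated_folder are skipped.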

Example usage:
    results = compare_folders(reference_folder, generated_folder,
                          args.use_ms_ssim)
    for video_name, ssim_value in results.items():
        if ssim_value is not None:
            print(
                f"{video_name}: {ssim_value[0]:.4f}, Min SSIM: {ssim_value[1]:.4f}, Max SSIM: {ssim_value[2]:.4f}"
            )
        else:
            print(f"{video_name}: Error during comparison")

    valid_ssims = [v for v in results.values() if v is not None]
    if valid_ssims:
        avg_ssim = np.mean([v[0] for v in valid_ssims])
        print(f"\nAverage SSIM across all videos: {avg_ssim:.4f}")
    else:
        print("\nNo valid SSIM values to average")

Source code in fastvideo/tests/utils.py
def compare_folders(reference_folder, generated_folder, use_ms_ssim=True):
    """
    Compare videos with the same filename between reference_folder and generated_folder

    Example usage:
        results = compare_folders(reference_folder, generated_folder,
                              args.use_ms_ssim)
        for video_name, ssim_value in results.items():
            if ssim_value is not None:
                print(
                    f"{video_name}: {ssim_value[0]:.4f}, Min SSIM: {ssim_value[1]:.4f}, Max SSIM: {ssim_value[2]:.4f}"
                )
            else:
                print(f"{video_name}: Error during comparison")

        valid_ssims = [v for v in results.values() if v is not None]
        if valid_ssims:
            avg_ssim = np.mean([v[0] for v in valid_ssims])
            print(f"\nAverage SSIM across all videos: {avg_ssim:.4f}")
        else:
            print("\nNo valid SSIM values to average")
    """

    reference_videos = [f for f in os.listdir(reference_folder) if f.endswith(".mp4")]

    results = {}

    for video_name in reference_videos:
        ref_path = os.path.join(reference_folder, video_name)
        gen_path = os.path.join(generated_folder, video_name)

        if os.path.exists(gen_path):
            print(f"\nComparing {video_name}...")
            try:
                ssim_value = compute_video_ssim_torchvision(ref_path, gen_path, use_ms_ssim)
                results[video_name] = ssim_value
            except Exception as e:
                print(f"Error comparing {video_name}: {e}")
                results[video_name] = None
        else:
            print(f"\nSkipping {video_name} - no matching file in generated folder")

    return results
fastvideo.tests.utils.compute_video_ssim_torchvision
compute_video_ssim_torchvision(video1_path, video2_path, use_ms_ssim=True)

Compute SSIM between two videos.

Parameters:

  • video1_path (required): Path to the first video.
  • video2_path (required): Path to the second video.
  • use_ms_ssim (default: True): Whether to use Multi-Scale Structural Similarity (MS-SSIM) instead of SSIM.

Source code in fastvideo/tests/utils.py
def compute_video_ssim_torchvision(video1_path, video2_path, use_ms_ssim=True):
    """
    Compute SSIM between two videos.

    Args:
        video1_path: Path to the first video.
        video2_path: Path to the second video.
        use_ms_ssim: Whether to use Multi-Scale Structural Similarity(MS-SSIM) instead of SSIM.
    """
    print(f"Computing SSIM between {video1_path} and {video2_path}...")
    if not os.path.exists(video1_path):
        raise FileNotFoundError(f"Video1 not found: {video1_path}")
    if not os.path.exists(video2_path):
        raise FileNotFoundError(f"Video2 not found: {video2_path}")

    frames1 = _read_video_frames(video1_path)
    frames2 = _read_video_frames(video2_path)

    # Ensure same number of frames
    min_frames = min(frames1.shape[0], frames2.shape[0])
    frames1 = frames1[:min_frames]
    frames2 = frames2[:min_frames]

    frames1 = frames1.float() / 255.0
    frames2 = frames2.float() / 255.0

    if torch.cuda.is_available():
        frames1 = frames1.cuda()
        frames2 = frames2.cuda()

    ssim_values = []

    # Process each frame individually
    for i in range(min_frames):
        img1 = frames1[i : i + 1]
        img2 = frames2[i : i + 1]

        with torch.no_grad():
            value = ms_ssim(img1, img2, data_range=1.0) if use_ms_ssim else ssim(img1, img2, data_range=1.0)

            ssim_values.append(value.item())

    if ssim_values:
        mean_ssim = np.mean(ssim_values)
        min_ssim = np.min(ssim_values)
        max_ssim = np.max(ssim_values)
        min_frame_idx = np.argmin(ssim_values)
        max_frame_idx = np.argmax(ssim_values)

        print(f"Mean SSIM: {mean_ssim:.4f}")
        print(f"Min SSIM: {min_ssim:.4f} (at frame {min_frame_idx})")
        print(f"Max SSIM: {max_ssim:.4f} (at frame {max_frame_idx})")

        return mean_ssim, min_ssim, max_ssim
    else:
        print("No SSIM values calculated")
        return 0, 0, 0
fastvideo.tests.utils.write_ssim_results
write_ssim_results(output_dir, ssim_values, reference_path, generated_path, num_inference_steps, prompt)

Write SSIM results to a JSON file in the same directory as the generated videos.

Source code in fastvideo/tests/utils.py
def write_ssim_results(output_dir, ssim_values, reference_path, generated_path, num_inference_steps, prompt):
    """
    Write SSIM results to a JSON file in the same directory as the generated videos.
    """
    try:
        logger.info(f"Attempting to write SSIM results to directory: {output_dir}")

        if not os.path.exists(output_dir):
            os.makedirs(output_dir, exist_ok=True)

        mean_ssim, min_ssim, max_ssim = ssim_values

        result = {
            "mean_ssim": mean_ssim,
            "min_ssim": min_ssim,
            "max_ssim": max_ssim,
            "reference_video": reference_path,
            "generated_video": generated_path,
            "parameters": {"num_inference_steps": num_inference_steps, "prompt": prompt},
        }

        test_name = f"steps{num_inference_steps}_{prompt[:100]}"
        result_file = os.path.join(output_dir, f"{test_name}_ssim.json")
        logger.info(f"Writing JSON results to: {result_file}")
        with open(result_file, "w") as f:
            json.dump(result, f, indent=2)

        logger.info(f"SSIM results written to {result_file}")
        return True
    except Exception as e:
        logger.error(f"ERROR writing SSIM results: {str(e)}")
        return False
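
For reference, the sketch below reproduces the file name and JSON layout that the source above builds, writes it to a temporary directory, and reads it back. The SSIM numbers, paths, and prompt are placeholder values for illustration, not measured results.

```python
import json
import os
import tempfile

# Placeholder inputs; real values come from compute_video_ssim_torchvision.
num_inference_steps = 4
prompt = "a cat surfing a wave"
result = {
    "mean_ssim": 0.98,
    "min_ssim": 0.95,
    "max_ssim": 0.99,
    "reference_video": "reference/sample.mp4",
    "generated_video": "generated/sample.mp4",
    "parameters": {"num_inference_steps": num_inference_steps, "prompt": prompt},
}

with tempfile.TemporaryDirectory() as output_dir:
    # Same naming rule as write_ssim_results: steps<N>_<first 100 chars of prompt>.
    test_name = f"steps{num_inference_steps}_{prompt[:100]}"
    result_file = os.path.join(output_dir, f"{test_name}_ssim.json")
    with open(result_file, "w") as f:
        json.dump(result, f, indent=2)
    with open(result_file) as f:
        loaded = json.load(f)

print(os.path.basename(result_file))  # steps4_a cat surfing a wave_ssim.json
print(loaded["parameters"])
```

Note that the prompt is copied into the file name verbatim, so a prompt containing a path separator would point at a non-existent subdirectory; in write_ssim_results that failure is caught by the broad except and reported as a False return value.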