tests
¶
Modules¶
fastvideo.tests.api
¶
Modules¶
fastvideo.tests.api.test_compat_translation
¶
Tests for fastvideo.api.compat translation helpers covering the
typed CompileConfig + PipelineSelection.vae_tiling surfaces promoted in
PR 6.
Classes¶
fastvideo.tests.api.test_compat_translation.TestCompileConfigRoundTrip
¶typed CompileConfig -> FastVideoArgs.torch_compile_kwargs
reconstruction drops None typed fields and merges extras.
fastvideo.tests.api.test_compat_translation.TestLegacyLtx2VaeTilingTranslation
¶ltx2_vae_tiling flat kwarg promotes to
generator.pipeline.vae_tiling; reverse direction emits the
legacy name back to FastVideoArgs.
fastvideo.tests.api.test_compat_translation.TestLegacyTextEncoderCompileTranslation
¶enable_torch_compile_text_encoder flat kwarg promotes to
generator.engine.compile.text_encoder_enabled; reverse direction
emits the legacy name back onto the FastVideoArgs kwargs dict so
realtime-runtime consumers can read it before FastVideoArgs filters
unknown fields.
fastvideo.tests.api.test_compat_translation.TestLegacyTorchCompileKwargsTranslation
¶Legacy torch_compile_kwargs={...} gets split across the four
first-class :class:CompileConfig fields and anything unknown falls
into extras.
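The split and its reverse can be sketched as follows. This is a minimal illustration, not the compat layer's code: the four typed field names are borrowed from torch.compile's keyword arguments as an assumption, and CompileConfigSketch, from_legacy, and to_legacy are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Any, Optional

# Assumption: illustrative field names taken from torch.compile's keyword
# arguments; the real CompileConfig fields may differ.
TYPED_FIELDS = ("backend", "mode", "fullgraph", "dynamic")


@dataclass
class CompileConfigSketch:
    backend: Optional[str] = None
    mode: Optional[str] = None
    fullgraph: Optional[bool] = None
    dynamic: Optional[bool] = None
    extras: dict[str, Any] = field(default_factory=dict)


def from_legacy(kwargs: dict[str, Any]) -> CompileConfigSketch:
    """Known keys go to typed fields; everything else lands in extras."""
    typed = {k: v for k, v in kwargs.items() if k in TYPED_FIELDS}
    extras = {k: v for k, v in kwargs.items() if k not in TYPED_FIELDS}
    return CompileConfigSketch(**typed, extras=extras)


def to_legacy(cfg: CompileConfigSketch) -> dict[str, Any]:
    """Rebuild the flat dict, dropping typed fields that are None."""
    out = {k: getattr(cfg, k) for k in TYPED_FIELDS
           if getattr(cfg, k) is not None}
    out.update(cfg.extras)
    return out


legacy = {"mode": "max-autotune", "fullgraph": True, "options": {"x": 1}}
config = from_legacy(legacy)
round_tripped = to_legacy(config)
```

Under these assumptions the round trip reproduces the original dict, because unset typed fields are dropped rather than emitted as None.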
fastvideo.tests.api.test_extra_overrides_routing
¶
LTX-2 audio-conditioning kwargs must reach ForwardBatch.extra and SamplingParam.update() must reject unknown kwargs instead of silently dropping them.
Background: prior to PR-1288 a chain of LTX-2 audio kwargs
(audio_num_frames, ltx2_audio_clean_latent …) silently flowed
into SamplingParam.update() which logger.error'd and dropped
them. That made every continuation segment generate audio for the
default num_frames duration, which in turn fed an A/V duration
mismatch into av_streaming.stream_fmp4 whose -shortest ffmpeg
invocation closed stdin before the writer thread had pushed every
frame, surfacing as BrokenPipeError in the streaming server.
These tests pin two contracts:
- _BATCH_EXTRA_PASSTHROUGH_KEYS lists the exact set of kwargs pulled out of generate_video(**kwargs) for batch.extra.
- SamplingParam.update() raises ValueError on unknown keys.
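The two contracts can be sketched in isolation. The classes and helper below are hypothetical stand-ins, not FastVideo's SamplingParam or the routing block in video_generator.py; only the two audio key names come from the background above, and the default values are arbitrary.

```python
from dataclasses import dataclass, fields

# Stand-in for the real passthrough key set (assumption: only two keys shown).
PASSTHROUGH_KEYS = frozenset({"audio_num_frames", "ltx2_audio_clean_latent"})


@dataclass
class SamplingParamSketch:
    num_frames: int = 16  # arbitrary illustrative default
    seed: int = 0

    def update(self, overrides: dict) -> None:
        known = {f.name for f in fields(self)}
        unknown = sorted(set(overrides) - known)
        if unknown:
            # Validate before assigning anything, so one bad key cannot
            # leave the object partially updated.
            raise ValueError(
                f"Unknown sampling kwargs {unknown}; pipeline extras "
                "must be routed through batch.extra instead")
        for key, value in overrides.items():
            setattr(self, key, value)


def split_kwargs(kwargs: dict) -> tuple[dict, dict]:
    """Pop passthrough keys first; the remainder goes to update()."""
    extra = {k: v for k, v in kwargs.items() if k in PASSTHROUGH_KEYS}
    sampling = {k: v for k, v in kwargs.items() if k not in PASSTHROUGH_KEYS}
    return extra, sampling
```

In this sketch a caller would run split_kwargs first, attach the first dict to the batch, and pass only the second to update().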
Classes¶
Functions:¶
fastvideo.tests.api.test_extra_overrides_routing.test_passthrough_keys_are_not_sampling_param_fields
¶If any of these become SamplingParam fields, the routing block
in video_generator.py needs to be re-evaluated — they would no
longer need to be popped before sampling_param.update().
Source code in fastvideo/tests/api/test_extra_overrides_routing.py
fastvideo.tests.api.test_extra_overrides_routing.test_sampling_param_update_error_mentions_passthrough_route
¶The error message should point future contributors at the right routing mechanism so they don't re-introduce silent dropping.
Source code in fastvideo/tests/api/test_extra_overrides_routing.py
fastvideo.tests.api.test_extra_overrides_routing.test_sampling_param_update_rejects_audio_passthrough_keys
¶LTX-2 audio kwargs must NOT slip through update() — they
belong to ForwardBatch.extra and the routing block in
video_generator.py is responsible for popping them first.
Source code in fastvideo/tests/api/test_extra_overrides_routing.py
fastvideo.tests.api.test_extra_overrides_routing.test_sampling_param_update_rejects_partially_unknown_keys
¶Even when most keys are valid, a single unknown key must raise. Partial-success was the silent-failure mode this regression fixes.
Source code in fastvideo/tests/api/test_extra_overrides_routing.py
fastvideo.tests.api.test_ltx2_continuation
¶
Tests for the typed LTX-2 continuation state.
Covers:
- round-trip through :class:ContinuationState (inline and blob-backed)
- payload is JSON-serializable (Dynamo RPC / HTTP client constraint)
- kind / schema_version validation on deserialization
- compat-layer validation (known kinds, payload shape)
- round-trip through :func:request_to_sampling_param attaches the state to the resulting :class:SamplingParam without losing fidelity
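A minimal sketch of the kind of round-trip and validation these tests cover is shown below. ContinuationStateSketch is a hypothetical stand-in; the kind value "ltx2", the schema version 1, and the payload keys are assumptions chosen for illustration only.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any

KNOWN_KINDS = {"ltx2"}  # hypothetical kind name
SCHEMA_VERSION = 1      # hypothetical version number


@dataclass
class ContinuationStateSketch:
    kind: str
    schema_version: int
    payload: dict[str, Any]

    def to_json(self) -> str:
        # json.dumps raises TypeError if the payload is not JSON-serializable.
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, raw: str) -> "ContinuationStateSketch":
        data = json.loads(raw)
        if data.get("kind") not in KNOWN_KINDS:
            raise ValueError(f"unknown kind: {data.get('kind')!r}")
        if data.get("schema_version") != SCHEMA_VERSION:
            raise ValueError(
                f"unsupported schema_version: {data.get('schema_version')!r}")
        if not isinstance(data.get("payload"), dict):
            raise ValueError("payload must be a JSON object")
        return cls(kind=data["kind"],
                   schema_version=data["schema_version"],
                   payload=data["payload"])


state = ContinuationStateSketch("ltx2", 1, {"segment_index": 2, "blob_id": "abc"})
restored = ContinuationStateSketch.from_json(state.to_json())
```

Keeping only a blob id in the payload, as in the example, is one way to stay JSON-serializable while large tensors live elsewhere.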
Attributes¶
Classes¶
fastvideo.tests.api.test_ltx2_continuation.TestBlobIndirection
¶Large tensors live in the :class:BlobStore instead of the payload.
fastvideo.tests.api.test_ltx2_continuation.TestBlobIndirection.test_blob_id_held_when_store_unavailable
¶Deserializing without a blob store preserves the blob id so the caller can fetch it later.
Source code in fastvideo/tests/api/test_ltx2_continuation.py
fastvideo.tests.api.test_ltx2_continuation.TestCompatLayerWireUp
¶The public compat layer accepts request.state without reverting to NotImplementedError and attaches it to the SamplingParam path.
fastvideo.tests.api.test_ltx2_continuation.TestRoundTrip
¶Round-trip through :class:ContinuationState preserves all fields.
fastvideo.tests.api.test_ltx2_continuation.TestRoundTrip.test_bf16_audio_latents_preserved
¶safetensors serialization must preserve bf16 dtype (numpy has no bf16, so a raw-bytes path would silently promote).
Source code in fastvideo/tests/api/test_ltx2_continuation.py
fastvideo.tests.api.test_ltx2_continuation.TestValidation
¶Invalid payloads error cleanly.
Modules¶
fastvideo.tests.api.test_ltx2_gpu_pool_translation
¶
gpu_pool-style flat-kwarg integration tests.
Mirrors the load_kwargs dict that the FastVideo-internal
ui/ltx2-streaming/server/gpu_pool.py passes to
VideoGenerator.from_pretrained(**load_kwargs) and asserts that the
public typed GeneratorConfig surface (introduced across PRs 0-6)
can represent it end-to-end, with no fields silently falling through
to pipeline.experimental.
This is the parity guard PR 7.6 depends on: the public gpu_pool
upstream must be able to construct a typed GeneratorConfig without
knowing any legacy LTX-2 kwarg name, and downstream Dynamo
(FastVideoArgGroup) must be able to do the same.
Classes¶
fastvideo.tests.api.test_ltx2_gpu_pool_translation.TestCompileExtrasPreserved
¶Additional torch.compile kwargs beyond the four typed fields
round-trip through CompileConfig.extras.
fastvideo.tests.api.test_ltx2_gpu_pool_translation.TestGpuPoolForwardTranslation
¶gpu_pool flat kwargs -> typed GeneratorConfig.
fastvideo.tests.api.test_ltx2_gpu_pool_translation.TestGpuPoolForwardTranslation.test_no_experimental_leakage
¶Every gpu_pool kwarg should have a typed home — nothing should
silently fall through to pipeline.experimental.
Source code in fastvideo/tests/api/test_ltx2_gpu_pool_translation.py
fastvideo.tests.api.test_ltx2_gpu_pool_translation.TestGpuPoolReverseTranslation
¶typed GeneratorConfig -> FastVideoArgs kwargs reproduces the original gpu_pool flat-kwarg shape.
This is what lets PR 7.6 wire the public gpu_pool through
generator_config_to_fastvideo_args without the runtime noticing.
fastvideo.tests.api.test_ltx2_gpu_pool_translation.TestGpuPoolReverseTranslation.test_no_stray_refine_dict
¶preset_overrides.refine must flatten to ltx2_refine_* kwargs
rather than landing as a nested refine kwarg that
FastVideoArgs doesn't understand.
Source code in fastvideo/tests/api/test_ltx2_gpu_pool_translation.py
fastvideo.tests.api.test_ltx2_gpu_pool_translation.TestRefineFlattenCoversAllTypedFields
¶Every field on LTX2Refine{Preset,Stage}Override must survive the round-trip through preset_overrides.refine back to ltx2_refine_* kwargs. Guards against the hardcoded-key-tuple regression where image_crf / video_position_offset_sec silently dropped.
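One way to avoid a hardcoded key tuple is to derive the flat names from the dataclass fields themselves. The sketch below assumes a hypothetical RefineOverrideSketch class and assumes that each flat name is simply the field name with an ltx2_refine_ prefix; the real override classes and naming rules may differ.

```python
from dataclasses import dataclass, fields
from typing import Any, Optional


@dataclass
class RefineOverrideSketch:
    # Illustrative subset of fields; not the real LTX2Refine*Override schema.
    enabled: Optional[bool] = None
    upsampler_path: Optional[str] = None
    image_crf: Optional[int] = None
    video_position_offset_sec: Optional[float] = None


def flatten_refine(override: RefineOverrideSketch) -> dict[str, Any]:
    """Emit one ltx2_refine_* kwarg per field that is set.

    Iterating over fields() means a newly added field is picked up
    automatically instead of being silently dropped.
    """
    flat = {}
    for f in fields(override):
        value = getattr(override, f.name)
        if value is not None:
            flat[f"ltx2_refine_{f.name}"] = value
    return flat


flat_kwargs = flatten_refine(
    RefineOverrideSketch(enabled=True, image_crf=23,
                         video_position_offset_sec=0.5))
```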
fastvideo.tests.api.test_ltx2_param_mapping
¶
Regression tests for LTX2VideoArchConfig.param_names_mapping.
The to_gate_compress -> to_gate_logits rename is the LTX-2.3 gated
attention loader rule. It must only fire when apply_gated_attention=True;
otherwise it would silently retarget:
- LTX-2.0 VIDEO_SPARSE_ATTN checkpoints, whose attention modules legitimately carry a to_gate_compress VSA-QAT gate (a sibling of attn_masked in fastvideo/models/dits/ltx2.py).
- LoRAs trained with the default lora_target_modules list (which includes to_gate_compress; see fastvideo/train/utils/lora.py:36 and fastvideo/pipelines/lora_pipeline.py:171).
Classes¶
fastvideo.tests.api.test_ltx2_param_mapping.TestLTX20ParamMappingDefault
¶apply_gated_attention=False (LTX-2.0 default): to_gate_compress
must pass through unmodified except for prefix normalization.
fastvideo.tests.api.test_ltx2_param_mapping.TestLTX20ParamMappingDefault.test_default_lora_target_not_renamed
¶to_gate_compress is in DEFAULT_LORA_TARGET_MODULES, so LoRAs
trained with default targets ship these keys.
Source code in fastvideo/tests/api/test_ltx2_param_mapping.py
fastvideo.tests.api.test_ltx2_param_mapping.TestLTX20ParamMappingDefault.test_unrelated_param_still_prefix_stripped
¶Generic prefix-strip behavior is unchanged in LTX-2.0 mode.
Source code in fastvideo/tests/api/test_ltx2_param_mapping.py
fastvideo.tests.api.test_ltx2_param_mapping.TestLTX23ParamMappingGated
¶apply_gated_attention=True (LTX-2.3): the gated-attention
to_gate_compress upstream key is renamed to to_gate_logits.
fastvideo.tests.api.test_ltx2_param_mapping.TestParamMappingRuleOrdering
¶The gate rules must be inserted before the generic prefix-strip rules so first-match-wins matching fires the rename first.
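The ordering requirement can be illustrated with a small first-match-wins mapper. This is a sketch only: the model. prefix and both regex patterns are hypothetical and are not the actual entries in param_names_mapping.

```python
import re


def build_rules(apply_gated_attention: bool) -> list[tuple[str, str]]:
    rules = []
    if apply_gated_attention:
        # The gate rule is inserted first so it wins over the generic rule.
        rules.append((r"^model\.(.*)\.to_gate_compress\.(.*)$",
                      r"\1.to_gate_logits.\2"))
    # Generic prefix strip (hypothetical "model." prefix).
    rules.append((r"^model\.(.*)$", r"\1"))
    return rules


def map_name(name: str, rules: list[tuple[str, str]]) -> str:
    """Apply the first rule whose pattern matches; later rules are ignored."""
    for pattern, replacement in rules:
        if re.match(pattern, name):
            return re.sub(pattern, replacement, name)
    return name


key = "model.blocks.0.attn.to_gate_compress.weight"
gated = map_name(key, build_rules(apply_gated_attention=True))
ungated = map_name(key, build_rules(apply_gated_attention=False))
# With the order reversed, the generic rule matches first and the rename is lost.
misordered = map_name(key, list(reversed(build_rules(True))))
```

With the gate rule first, the key is renamed; with the flag off or the order reversed, only the prefix is stripped and to_gate_compress passes through.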
Functions:¶
fastvideo.tests.api.test_ltx2_stage_overrides
¶
fastvideo.tests.api.test_presets
¶
Classes¶
fastvideo.tests.api.test_presets.TestPresetCountIntegrity
¶
fastvideo.tests.api.test_presets.TestPresetDefaultTypes
¶Preset default values must match the types on
:class:SamplingParam. Assigning None to a typed-str field
(e.g. negative_prompt) breaks downstream stages that assert the
runtime type — see the CFG branch in
pipelines/stages/text_encoding.py:81.
fastvideo.tests.api.test_presets.TestPresetDefaultTypes.test_ltx2_cfg_defaults_are_off
¶SamplingParam's LTX-2 CFG class defaults must be 1.0 (CFG
off). ForwardBatch.__post_init__ force-enables
do_classifier_free_guidance when either
ltx2_cfg_scale_video or ltx2_cfg_scale_audio is != 1.0,
so any non-1.0 default silently forces CFG on for every model
family that doesn't explicitly override these fields. Guard
against the regression that surfaced as the TurboDiffusion I2V
SSIM crash (text_encoding.py:81 assertion on
negative_prompt).
Source code in fastvideo/tests/api/test_presets.py
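The force-enable behavior described for test_ltx2_cfg_defaults_are_off can be sketched with a small dataclass. ForwardBatchSketch is a hypothetical stand-in that keeps only the three fields named in the docstring; it is not the real ForwardBatch.

```python
from dataclasses import dataclass


@dataclass
class ForwardBatchSketch:
    ltx2_cfg_scale_video: float = 1.0
    ltx2_cfg_scale_audio: float = 1.0
    do_classifier_free_guidance: bool = False

    def __post_init__(self) -> None:
        # Any scale other than 1.0 switches CFG on, whatever the caller passed.
        if self.ltx2_cfg_scale_video != 1.0 or self.ltx2_cfg_scale_audio != 1.0:
            self.do_classifier_free_guidance = True


default_batch = ForwardBatchSketch()
audio_cfg_batch = ForwardBatchSketch(ltx2_cfg_scale_audio=3.0)
```

The sketch shows why the class defaults matter: if either default were not 1.0, every batch built without explicit overrides would end up with CFG enabled.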
fastvideo.tests.api.test_presets.TestWanPresets
¶Verify the Wan presets registered from registry.py.
Functions:¶
fastvideo.tests.api.test_typed_quant_flow
¶
Typed quantization flow contract tests.
Locks in the path from typed
GeneratorConfig.engine.quantization.transformer_quant: "NVFP4"
through the compat layer to a concrete NVFP4Config instance pinned
on pipeline_config.dit_config.quant_config.
The model loader detects FP4 by isinstance(quant_method,
NVFP4QuantizeMethod) rather than by a flag, so the typed surface
must reliably produce that class on the DiT config — otherwise the
loader silently runs full bf16.
Classes¶
Functions:¶
fastvideo.tests.api.test_typed_quant_flow.captured_kwargs
¶Replace FastVideoArgs.from_kwargs with a capturer so the
test doesn't try to download model_index.json.
Source code in fastvideo/tests/api/test_typed_quant_flow.py
fastvideo.tests.api.test_typed_quant_flow.test_apply_transformer_quant_does_not_overwrite_explicit_dit_config
¶When the caller has explicitly set
pipeline_config.dit_config.quant_config already, the typed
carrier defers — the explicit setter wins.
Source code in fastvideo/tests/api/test_typed_quant_flow.py
fastvideo.tests.api.test_typed_quant_flow.test_apply_transformer_quant_pins_to_dit_config
¶FastVideoArgs.__post_init__._apply_transformer_quant must
copy the transformer_quant instance onto
pipeline_config.dit_config.quant_config so the DiT loader sees
it during construction.
Source code in fastvideo/tests/api/test_typed_quant_flow.py
fastvideo.tests.api.test_typed_quant_flow.test_no_typed_quant_omits_transformer_quant_kwarg
¶Default GeneratorConfig has quantization=None — the carrier
must not be set, so the existing legacy path
(pipeline_config.dit_config.quant_config = NVFP4Config())
keeps working as before.
Source code in fastvideo/tests/api/test_typed_quant_flow.py
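The pin-or-defer rule these tests describe can be sketched as a small function. The names below are hypothetical stand-ins (NVFP4ConfigSketch, a SimpleNamespace in place of the DiT config), and the real _apply_transformer_quant may be structured differently.

```python
from types import SimpleNamespace


class NVFP4ConfigSketch:
    """Stand-in for a concrete quantization config class."""


def apply_transformer_quant(dit_config, transformer_quant) -> None:
    if transformer_quant is None:
        # No typed quantization requested: leave the legacy path untouched.
        return
    if dit_config.quant_config is not None:
        # An explicit quant_config was already set by the caller; it wins.
        return
    dit_config.quant_config = transformer_quant


typed = NVFP4ConfigSketch()
explicit = NVFP4ConfigSketch()

pinned = SimpleNamespace(quant_config=None)
apply_transformer_quant(pinned, typed)

deferred = SimpleNamespace(quant_config=explicit)
apply_transformer_quant(deferred, typed)

untouched = SimpleNamespace(quant_config=None)
apply_transformer_quant(untouched, None)
```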
fastvideo.tests.conftest
¶
Functions:¶
fastvideo.tests.conftest.distributed_setup
¶
Fixture to set up and tear down the distributed environment for tests.
This ensures proper cleanup even if tests fail.
Source code in fastvideo/tests/conftest.py
fastvideo.tests.contract
¶
Contract tests guarding FastVideo's public API against drift.
These tests run against the public surface only (fastvideo.VideoGenerator,
fastvideo.api.*) — never via private helpers. They fail at FastVideo CI
if a change breaks the shape the Dynamo backend package and the private
Dreamverse adapter depend on, so drift is caught here before it reaches
downstream integrators.
Modules¶
fastvideo.tests.contract.test_dreamverse_shape
¶
Contract test: Dreamverse-style inputs normalize through the public typed API without needing any private-only compatibility promise.
The private Dreamverse UI server (FastVideo-internal/ui/ltx2-streaming/
server/gpu_pool.py) has historically called
VideoGenerator.from_pretrained(**load_kwargs) with a flat kwarg bag
containing LTX-2-specific names (ltx2_refine_enabled,
ltx2_refine_upsampler_path, etc.). PR 6 gave every one of those
kwargs a typed home under GeneratorConfig.
This test makes sure:
- The public typed API can represent everything Dreamverse currently passes at init time (legacy_from_pretrained_to_config).
- The request-path Dreamverse uses (generator.generate_video(**kwargs) with per-segment flags) round-trips through the typed GenerationRequest without reintroducing private-only fields at the public boundary.
- Private-only Dreamverse fields that don't belong on the public surface either go to pipeline.experimental / request.extensions (the documented escape hatch) or raise explicitly, rather than silently becoming part of the public compatibility promise.
Regression guard for the scoping rule in apirefactor.md §"Schema
Parity Requirement".
Classes¶
fastvideo.tests.contract.test_dreamverse_shape.TestDreamverseLoadKwargsShape
¶Every current Dreamverse init-time kwarg must land on a typed
field, not in the experimental escape hatch.
fastvideo.tests.contract.test_dreamverse_shape.TestDreamverseNoPrivateImports
¶The public entry points must not force a Dreamverse integrator
to import from fastvideo.pipelines.* or other internal paths.
fastvideo.tests.contract.test_dreamverse_shape.TestDreamversePrivateOnlyFields
¶Dreamverse carries a handful of private-only names (e.g. legacy
internal aliases). These must NOT silently turn into a public
compatibility promise — the documented contract is that unknown
fields land on pipeline.experimental so integrators see them
but FastVideo does not promise to preserve them.
fastvideo.tests.contract.test_dreamverse_shape.TestDreamverseRequestShape
¶The per-segment Dreamverse request path mirrors OpenAI's shape plus a few LTX-2 knobs. All of them must have a typed home.
fastvideo.tests.contract.test_dreamverse_shape.TestDreamverseRequestShape.test_return_state_reaches_output_config
¶PR 7 added output.return_state — must survive the legacy
translation path so Dreamverse callers can opt in.
Source code in fastvideo/tests/contract/test_dreamverse_shape.py
fastvideo.tests.contract.test_dynamo_shape
¶
Contract test: a mock Dynamo-style handler wraps FastVideo's public API without touching any private module.
The Dynamo backend package (components/src/dynamo/fastvideo/ in the
Dynamo repo) imports only these symbols:
from fastvideo import VideoGenerator
from fastvideo.api import (
    ContinuationState, GenerationRequest, InputConfig, OutputConfig,
    SamplingConfig,
)
If a FastVideo refactor breaks the adapter shape this test fails at FastVideo CI — before the Dynamo-side integration knows. The plan (PR 7.10) requires the backend to be expressible without flat legacy LTX-2 kwargs or FastVideo-internal imports; this file asserts the subset that exists today and is stable.
Classes¶
fastvideo.tests.contract.test_dynamo_shape.TestDynamoHandlerContract
¶
fastvideo.tests.contract.test_dynamo_shape.TestDynamoHandlerContract.test_handler_serializes_state_back_to_nvext
¶When the request carries state, the handler should be able to include a matching serialized state on the response. (Dynamo's NvVideosResponse has nvext.continuation_state reserved for this in the pending disaggregation path.)
Source code in fastvideo/tests/contract/test_dynamo_shape.py
fastvideo.tests.contract.test_dynamo_shape.TestNoInternalImports
¶The adapter template in this file imports only the public surface.
Any change to FastVideo that requires the Dynamo adapter to reach into a private module would make this test fail at review time.
Functions:¶
fastvideo.tests.contract.test_dynamo_shape.nv_request_to_generation_request
¶Translate Dynamo's request shape into FastVideo's typed request.
This function is the template integrators copy into the Dynamo repo. It uses only public FastVideo symbols.
Source code in fastvideo/tests/contract/test_dynamo_shape.py
fastvideo.tests.contract.test_generate_async
¶
Contract tests for VideoGenerator.generate_async.
These tests monkey-patch the synchronous _generate_request_impl so
the suite runs CPU-only -- the async wrapper is the piece under test,
not the pipeline.
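The testing approach can be sketched as follows. This assumes, purely for illustration, that the async wrapper offloads the synchronous implementation to a worker thread with asyncio.to_thread; GeneratorSketch and fake_impl are hypothetical, and the real generate_async may be implemented differently.

```python
import asyncio


class GeneratorSketch:
    def _generate_request_impl(self, request: dict) -> dict:
        # The real implementation would run a GPU pipeline.
        raise RuntimeError("pipeline not available in this sketch")

    async def generate_async(self, request: dict) -> dict:
        # Assumption: run the blocking call in a thread so the event loop
        # stays responsive.
        return await asyncio.to_thread(self._generate_request_impl, request)


def fake_impl(self, request: dict) -> dict:
    """CPU-only replacement for the synchronous implementation."""
    return {"prompt": request["prompt"], "frames": 0}


# Patch the sync implementation, as a test's monkeypatch fixture would.
GeneratorSketch._generate_request_impl = fake_impl
result = asyncio.run(GeneratorSketch().generate_async({"prompt": "a cat"}))
```

Because only the synchronous method is replaced, the sketch still exercises the async wrapper end to end.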
fastvideo.tests.eval
¶
Modules¶
fastvideo.tests.eval.test_datasets_vbench
¶
fastvideo.tests.eval.test_evaluator_multi_gpu
¶
Multi-replica eval through the public Evaluator API.
Skipped automatically when fewer than 2 CUDA devices are visible.
Functions:¶
fastvideo.tests.eval.test_evaluator_multi_gpu.baseline_scores
¶Reference scores computed on a single-GPU evaluator. The multi-GPU runs must reproduce these exactly when handed the same input list — that's the only way to verify round-robin dispatch isn't dropping or reordering samples.
Source code in fastvideo/tests/eval/test_evaluator_multi_gpu.py
fastvideo.tests.eval.test_evaluator_multi_gpu.test_multi_gpu_dispatch_preserves_order_and_scores
¶Same samples, multi-GPU dispatch: results must match the single-GPU baseline element-for-element. This verifies (a) the round-robin doesn't reorder, (b) every sample is scored exactly once, (c) the workers don't share mutable state.
Source code in fastvideo/tests/eval/test_evaluator_multi_gpu.py
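The order-preservation property can be sketched without GPUs. The dispatcher below is a hypothetical, single-process stand-in for the multi-replica evaluator: it shards samples round-robin and writes each score back to the sample's original index.

```python
from typing import Any, Callable


def round_robin_dispatch(samples: list, num_workers: int,
                         score_fn: Callable[[Any], Any]) -> list:
    # Assign sample i to worker i % num_workers, remembering its index.
    shards = [[] for _ in range(num_workers)]
    for index, sample in enumerate(samples):
        shards[index % num_workers].append((index, sample))

    results = [None] * len(samples)
    for shard in shards:  # each shard would run on its own replica
        for index, sample in shard:
            results[index] = score_fn(sample)
    return results


samples = list(range(7))
baseline = round_robin_dispatch(samples, 1, lambda x: x * x)
multi = round_robin_dispatch(samples, 3, lambda x: x * x)
```

Comparing multi against baseline element by element is the same check the test performs against the single-GPU reference scores.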
fastvideo.tests.eval.test_evaluator_multi_gpu.test_multi_gpu_evaluator_kwargs_form_runs_on_one_replica
¶The kwargs form (single sample) is documented to always hit worker 0; this test pins the contract so future refactors don't accidentally fan out a single call.
Source code in fastvideo/tests/eval/test_evaluator_multi_gpu.py
fastvideo.tests.eval.test_evaluator_multi_gpu.test_multi_gpu_release_cuda_memory_runs_clean
¶release_cuda_memory must hit every replica without crashing.
Source code in fastvideo/tests/eval/test_evaluator_multi_gpu.py
fastvideo.tests.eval.test_evaluator_paths
¶
Path-input variants of the public Evaluator API.
The worker boundary accepts video / reference as either a
pre-loaded (T, C, H, W) tensor or a path-like (str / Path).
These tests pin the path-form so future refactors don't accidentally
re-require pre-loaded tensors.
Classes¶
Functions:¶
fastvideo.tests.eval.test_evaluator_paths.test_dispatcher_holds_paths_not_tensors_in_queue
¶Memory invariant: when many paths are passed, the queued samples are tiny strings, not full tensors. Verify by checking the length of the per-sample reference set the dispatcher materializes.
Source code in fastvideo/tests/eval/test_evaluator_paths.py
fastvideo.tests.eval.test_evaluator_paths.test_missing_path_surfaces_as_exception
¶Decode failures must propagate, not silently produce a None score.
Source code in fastvideo/tests/eval/test_evaluator_paths.py
fastvideo.tests.eval.test_evaluator_paths.test_one_shot_evaluate_accepts_paths
¶The top-level fastvideo.eval.evaluate helper also flows paths.
Source code in fastvideo/tests/eval/test_evaluator_paths.py
fastvideo.tests.eval.test_evaluator_paths.test_path_form_score_matches_tensor_form
¶Loading via path must produce the same score as loading via the
public load_video helper and passing the tensor in directly.
Source code in fastvideo/tests/eval/test_evaluator_paths.py
fastvideo.tests.eval.test_evaluator_paths.test_samples_list_can_mix_paths_and_tensors
¶A single samples call can mix path and tensor entries.
Source code in fastvideo/tests/eval/test_evaluator_paths.py
fastvideo.tests.eval.test_evaluator_paths.video_paths
¶Two reproducible mp4s on disk + their pre-loaded tensors for parity.
Source code in fastvideo/tests/eval/test_evaluator_paths.py
fastvideo.tests.eval.test_evaluator_single
¶
End-to-end tests for single-replica eval through the public API.
Runs the lightweight pixel-space metrics — common.psnr and
common.ssim — under both shapes that real callers use:
- one-shot evaluate(video=..., reference=...) (the helper in fastvideo.eval.api);
- a long-lived Evaluator, called once per sample;
- a long-lived Evaluator, called with a list of sample dicts to fan out (samples=[...]).
GPU-only metrics live in separate test modules / classes; everything here runs on CPU so the suite stays cheap to invoke.
Classes¶
Functions:¶
fastvideo.tests.eval.test_evaluator_single.gen_ref
¶Reproducible (gen, ref) pair shaped (T, C, H, W).
fastvideo.tests.eval.test_evaluator_single.test_evaluator_accepts_legacy_5d_input
¶Callers that still pass (1, T, C, H, W) should get unwrapped.
Source code in fastvideo/tests/eval/test_evaluator_single.py
fastvideo.tests.eval.test_evaluator_single.test_evaluator_psnr_identical_videos_is_high
¶PSNR(x, x) is unbounded above; with our clamp it caps near 100 dB.
Source code in fastvideo/tests/eval/test_evaluator_single.py
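The cap on PSNR for identical inputs follows from clamping the mean squared error away from zero. The sketch below is an assumption about how such a clamp could work, not the common.psnr implementation: with pixel values in [0, 1] and a hypothetical floor of 1e-10, the result tops out at about 100 dB.

```python
import math


def psnr(gen: list[float], ref: list[float],
         max_value: float = 1.0, mse_floor: float = 1e-10) -> float:
    """PSNR in dB over flat pixel lists, with the MSE clamped from below."""
    mse = sum((g - r) ** 2 for g, r in zip(gen, ref)) / len(gen)
    mse = max(mse, mse_floor)  # avoids log10 of infinity when gen == ref
    return 10.0 * math.log10(max_value ** 2 / mse)


frame = [0.0, 0.25, 0.5, 1.0]
identical_score = psnr(frame, frame)
noisy_score = psnr([v + 0.1 for v in frame], frame)
```

In formula form, PSNR = 10 log10(MAX^2 / MSE), so flooring MSE at 1e-10 with MAX = 1 bounds the score near 100 dB under these assumed values.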
fastvideo.tests.eval.test_evaluator_single.test_evaluator_samples_list_preserves_input_order
¶When samples=[...] is passed, results must come back per sample.
Source code in fastvideo/tests/eval/test_evaluator_single.py
fastvideo.tests.eval.test_evaluator_with_dataset
¶
End-to-end test: prompt dataset → Evaluator.
Mirrors the canonical user flow:
ds = get_dataset("vbench", dimensions=[...])
ev = create_evaluator(metrics=[...], device=...)
for row in ds:
    video = my_generator(row["prompt"])
    scores = ev.evaluate(video=video, **row)
We don't actually generate videos — that would pull in a diffusion model. Instead we synthesize a reproducible random tensor per row, so the test exercises the dataset-iteration → evaluator-call wiring without depending on any model weights.
Classes¶
Functions:¶
fastvideo.tests.eval.test_evaluator_with_dataset.test_dataset_samples_form_through_evaluator
¶Evaluator.evaluate(samples=[...]) is the canonical batched
entry point; verify it works when the per-row dicts come from a
dataset (kwargs form) rather than being hand-built in the test.
Source code in fastvideo/tests/eval/test_evaluator_with_dataset.py
fastvideo.tests.eval.test_evaluator_with_dataset.test_vbench_dataset_full_corpus_iteration
¶Iterating the whole dataset should be cheap (no evaluator calls).
This guards against a future refactor that accidentally makes
__iter__ do real work.
Source code in fastvideo/tests/eval/test_evaluator_with_dataset.py
fastvideo.tests.eval.test_evaluator_with_dataset.test_vbench_dataset_rows_drop_into_evaluator
¶Every row from the corpus must be a kwargs-friendly dict for
Evaluator.evaluate: extra keys flow through without breaking the
metric, and the metric returns a well-formed MetricResult.
Source code in fastvideo/tests/eval/test_evaluator_with_dataset.py
fastvideo.tests.eval.test_registry
¶
Smoke tests for the metric registry surface.
These exercise the public fastvideo.eval API only:
list_metrics, get_metric, and the group-resolution logic that
create_evaluator(metrics="vbench") uses.
Functions:¶
fastvideo.tests.eval.test_registry.test_create_evaluator_resolves_group_prefix
¶metrics="<group>" should expand to every <group>.* sub-metric.
Use the physics_iq group because it has multiple sub-metrics and
none of them load model weights — the group-resolution behavior is
what we're testing, not metric setup.
Source code in fastvideo/tests/eval/test_registry.py
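Group-prefix resolution can be sketched against a toy registry. The registry contents and resolve_metrics are hypothetical; in particular the physics_iq sub-metric names are placeholders, since the real names are not listed here.

```python
REGISTRY = [
    "common.psnr",
    "common.ssim",
    "physics_iq.metric_a",  # placeholder sub-metric name
    "physics_iq.metric_b",  # placeholder sub-metric name
]


def resolve_metrics(spec, registry=REGISTRY) -> list[str]:
    """Expand each entry: exact names pass through, groups become <group>.*."""
    names = [spec] if isinstance(spec, str) else list(spec)
    resolved = []
    for name in names:
        if name in registry:
            resolved.append(name)
            continue
        members = sorted(m for m in registry if m.startswith(name + "."))
        if not members:
            raise KeyError(f"unknown metric or group: {name!r}")
        resolved.extend(members)
    return resolved


group = resolve_metrics("physics_iq")
mixed = resolve_metrics(["common.psnr", "physics_iq"])
```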
fastvideo.tests.performance
¶
Modules¶
fastvideo.tests.performance.compare_baseline
¶
Track performance results and compare against historical baseline.
This script:
1. reads current benchmark results from fastvideo/tests/performance/results,
2. syncs the canonical baseline from the configured HF dataset repo,
3. compares each current record against the median of up to 5 prior records (filtered by gpu_type, successful only),
4. on persist runs (full-suite on main branch), writes the normalized record back to the HF dataset repo,
5. exits non-zero if any metric regresses by more than PERF_MAX_REGRESSION (default 5%).
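The median-baseline comparison can be sketched as follows. This is an illustration under stated assumptions: find_regressions is a hypothetical helper, records are plain dicts, history is ordered oldest to newest, and every metric is treated as higher-is-worse (as with latency or peak memory).

```python
from statistics import median


def find_regressions(current: dict, history: list[dict],
                     max_regression: float = 0.05, window: int = 5) -> dict:
    """Return {metric: (baseline, value)} for metrics above the allowed slack."""
    regressions = {}
    recent = history[-window:]  # up to `window` most recent prior records
    for metric, value in current.items():
        prior = [rec[metric] for rec in recent if metric in rec]
        if not prior:
            continue  # nothing to compare against yet
        baseline = median(prior)
        if baseline > 0 and (value - baseline) / baseline > max_regression:
            regressions[metric] = (baseline, value)
    return regressions


history = [{"latency": v} for v in (10.0, 10.2, 9.8, 10.1, 9.9, 10.0)]
within_budget = find_regressions({"latency": 10.4}, history)
regressed = find_regressions({"latency": 10.6}, history)
```

A caller could then exit with a non-zero status whenever the returned dict is non-empty.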
Functions:¶
fastvideo.tests.performance.compare_baseline.normalize_performance_result
¶Normalize a raw perf_*.json result into the HF tracking schema.
The Buildkite artifact intentionally keeps the raw benchmark output from test_inference_performance.py. Baseline comparison, main-branch persistence, and manual baseline reseeds should all use this mapping so the stored HF records do not drift from the artifact schema.
Source code in fastvideo/tests/performance/compare_baseline.py
fastvideo.tests.performance.hf_store
¶
Shared HuggingFace storage utilities for performance tracking.
Provides a single place for:
- Syncing the HF dataset repo to a local directory
- Loading raw JSON records (with optional recency filter)
- Loading records as a normalized pandas DataFrame
- Uploading individual result files back to HF
- Common helpers: sanitize, safe_float
Functions:¶
fastvideo.tests.performance.hf_store.load_as_dataframe
¶load_as_dataframe(local_dir: str, *, days: int | None = None, successful_only: bool = False) -> DataFrame
Load and normalize records from local_dir into a pandas DataFrame.
Combines :func:load_records + :func:normalize_dataframe into a single
call for consumers (e.g. the dashboard) that work exclusively with
DataFrames.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| local_dir | str | Root directory previously populated by :func: | required |
| days | int \| None | Passed through to :func: | None |
| successful_only | bool | Passed through to :func: | False |
Returns:
| Type | Description |
|---|---|
| DataFrame | Normalized DataFrame, or an empty DataFrame if no records were found. |
Source code in fastvideo/tests/performance/hf_store.py
fastvideo.tests.performance.hf_store.load_records
¶load_records(local_dir: str, *, days: int | None = None, successful_only: bool = False) -> list[dict[str, Any]]
Return raw JSON dicts from local_dir.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| local_dir | str | Root directory previously populated by :func: | required |
| days | int \| None | When set, discard records whose | None |
| successful_only | bool | When True, only records with | False |
Returns:
| Type | Description |
|---|---|
| list[dict[str, Any]] | List of raw dicts sorted by |
| list[dict[str, Any]] | not be parsed are silently skipped). |
Source code in fastvideo/tests/performance/hf_store.py
fastvideo.tests.performance.hf_store.load_records_for_model
¶load_records_for_model(local_dir: str, model_id: str, gpu_type: str | None = None, *, last_n: int | None = None, successful_only: bool = True) -> list[dict[str, Any]]
Return records for a specific model_id, optionally filtered by GPU.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| local_dir | str | Root directory previously populated by :func: | required |
| model_id | str | Matches the | required |
| gpu_type | str \| None | When set, only records whose | None |
| last_n | int \| None | When set, return only the most recent n records (after all other filters). Useful for sliding-window baseline calculations. | None |
| successful_only | bool | Passed through to :func: | True |
Returns:
| Type | Description |
|---|---|
| list[dict[str, Any]] | List of matching dicts sorted by timestamp ascending. |
Source code in fastvideo/tests/performance/hf_store.py
fastvideo.tests.performance.hf_store.normalize_dataframe
¶Apply standard type coercions to a raw records DataFrame.
- Parses timestamp to UTC-aware datetime.
- Coerces latency, throughput, memory, text_encoder_time_s, dit_time_s, vae_decode_time_s to float.
- Adds a config_id column (first 7 chars of commit_sha).
Returns the mutated DataFrame (also modifies in place for efficiency).
Source code in fastvideo/tests/performance/hf_store.py
fastvideo.tests.performance.hf_store.safe_float
¶Coerce value to float, returning None on failure.
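One plausible shape for such a helper is sketched below. safe_float_sketch is a hypothetical name and the choice of caught exceptions is an assumption; the actual safe_float may handle more cases.

```python
from typing import Any, Optional


def safe_float_sketch(value: Any) -> Optional[float]:
    """Return float(value), or None when the conversion fails."""
    try:
        return float(value)
    except (TypeError, ValueError):
        # TypeError covers None and other non-numeric objects;
        # ValueError covers strings that are not numbers.
        return None
```

Returning None rather than raising lets callers normalize partially filled records without special-casing every missing field.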
fastvideo.tests.performance.hf_store.sanitize
¶
fastvideo.tests.performance.hf_store.sync_from_hf
¶Download the HF dataset repo snapshot to local_dir.
Returns local_dir so callers can chain: load_records(sync_from_hf(...)).
By default (strict=False) failures are logged and local_dir is
returned unchanged, so dashboard / PR consumers stay resilient when HF is
unavailable. Callers that depend on the sync for correctness (e.g. the
main-branch baseline writer) must pass strict=True so that misconfig
or transient HF errors fail loud rather than silently reset the baseline.
When reuse_existing=True, a previous successful sync in local_dir
is reused only while its marker is fresh. This avoids duplicate HF
snapshot checks when compare and dashboard scripts run sequentially in the
same CI job, without silently reusing stale data in persistent local or
long-lived runner environments.
Source code in fastvideo/tests/performance/hf_store.py
fastvideo.tests.performance.hf_store.upload_record
¶Upload local_path to the HF repo under <model_id>/<filename>.
By default failures (missing token, network errors) are logged and
swallowed. Pass strict=True when the upload is part of a write-path
that must not silently lose records — otherwise the rolling baseline can
stop advancing without any signal in the build log.
Source code in fastvideo/tests/performance/hf_store.py
fastvideo.tests.performance.test_inference_performance
¶
Config-driven inference performance tests.
Benchmark configs live in .buildkite/performance-benchmarks/tests/*.json. Each JSON file defines model params, generation kwargs, run config, and per-device thresholds. This test module auto-discovers all configs and parametrizes a single test function over them.
Classes¶
Functions:¶
fastvideo.tests.performance.test_inference_performance.test_inference_performance
¶Measure generation latency, peak GPU memory, and component-level timings (text encoder, DiT, VAE decode). Assert each against device-aware thresholds.
Source code in fastvideo/tests/performance/test_inference_performance.py
fastvideo.tests.ssim
¶
Modules¶
fastvideo.tests.ssim.conftest
¶
Functions:¶
fastvideo.tests.ssim.conftest.pytest_collection_modifyitems
¶Optionally keep only tests with a matching model_id parameter.
Source code in fastvideo/tests/ssim/conftest.py
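A filter of this kind can be sketched with the standard pytest_collection_modifyitems hook, which receives the collected items and may edit the list in place. The environment variable name and the decision to drop unparametrized tests are assumptions for illustration, not the conftest's actual behavior.

```python
import os

MODEL_ID_ENV = "FASTVIDEO_SSIM_MODEL_ID"  # hypothetical variable name


def keep_matching(items: list, wanted: str) -> list:
    """Keep only items whose parametrization has model_id == wanted."""
    kept = []
    for item in items:
        # callspec is absent on tests that are not parametrized.
        callspec = getattr(item, "callspec", None)
        params = callspec.params if callspec is not None else {}
        if params.get("model_id") == wanted:
            kept.append(item)
    return kept


def pytest_collection_modifyitems(config, items):
    wanted = os.environ.get(MODEL_ID_ENV)
    if wanted:  # filtering is optional: no variable, no change
        items[:] = keep_matching(items, wanted)
```

Assigning to items[:] matters here, because pytest keeps a reference to the original list and would not see a rebound name.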
fastvideo.tests.ssim.latent_similarity_utils
¶
Latent-space regression helpers for numerically fragile SSIM tests.
Motivation¶
Pixel-space SSIM is a poor regression signal for distilled / few-step models (e.g. LTX-2 distilled): a single mis-rounded bf16 accumulator in the VAE decoder can drive mean SSIM from ~0.95 to ~0.50 without any real quality regression.
Inspired by diffusers' "small signature slice + bounded full-tensor distance" testing philosophy, applied here to the pre-VAE latent rather than the decoded pixel/audio output:
- tests/pipelines/ltx2/test_ltx2.py (diffusers) compares output_type='pt' (pixel) slices via torch.allclose(generated_slice, expected_slice, atol=1e-4);
- tests/pipelines/stable_audio/test_stable_audio.py (diffusers) compares decoded audio samples via np.abs(expected - actual).max() < 1.5e-3;
- tests/pipelines/cogvideo/test_cogvideox.py (diffusers) compares full pixel video tensors via numpy_cosine_similarity_distance(...) < 1e-3.
Diffusers does not assert on latents directly — that is a FastVideo adaptation. Distilled few-step pipelines amplify per-step bf16 noise enough that VAE-decoded comparisons are unreliable across our heterogeneous CI pool, so we move the assertion upstream of the VAE.
Design¶
- Inference is run with output_type='latent' so DecodingStage hands back the un-decoded latent on result["samples"].
- The reference artefact is a .pt bundle (tensor + metadata) hosted on the same HF dataset as the mp4 references, selected by <GPU>_reference_videos/<model_id>/<backend>/<prompt>.pt.
- Two assertions are performed:
  1. A small signature slice (default video: latent[0, :, 0, :3, :3]; audio: latent[0, :, :8]) is compared via cosine distance against the slice threshold. Primary pass/fail gate.
  2. The full latent is compared via cosine distance against the full-tensor threshold, guarding against shape-correct but globally drifted outputs.
- Tolerances default to 5e-3 (slice) and 1e-2 (full). diffusers uses 1e-3 against deterministic CPU dummy components; we relax for cross-GPU-arch bf16 differences on the rented CI pool (A40/L40S/H100/B200).
The helper intentionally reuses build_init_kwargs /
build_generation_kwargs from :mod:inference_similarity_utils so
model params (vae tiling, sp_size, flow shift, …) flow through a single
source of truth.
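The two-gate comparison can be illustrated with a small pure-Python sketch. It is not the helper's implementation (which operates on tensors): the function names are hypothetical, latents are represented as nested lists, the audio slice latent[0, :, :8] is used for brevity, and the `<=` comparison against each threshold is an assumption.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity of two equal-length, non-zero flat sequences."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def audio_signature_slice(latent, timesteps=8):
    """Flatten latent[0, :, :timesteps] from a nested [B][C][T] list."""
    return [v for channel in latent[0] for v in channel[:timesteps]]


def flatten(latent):
    """Flatten a nested [B][C][T] list into one flat list."""
    return [v for batch in latent for channel in batch for v in channel]


def check_latent(generated, reference, slice_threshold=5e-3, full_threshold=1e-2):
    """Apply the slice gate first, then the full-tensor gate."""
    metrics = {
        "slice_cosine_distance": cosine_distance(
            audio_signature_slice(generated), audio_signature_slice(reference)),
        "full_cosine_distance": cosine_distance(
            flatten(generated), flatten(reference)),
    }
    assert metrics["slice_cosine_distance"] <= slice_threshold, metrics
    assert metrics["full_cosine_distance"] <= full_threshold, metrics
    return metrics


# Toy 1 x 4 x 16 latent compared against a slightly perturbed copy.
reference = [[[float(c * 16 + t + 1) for t in range(16)] for c in range(4)]]
generated = [[[v * 1.001 + 0.01 for v in ch] for ch in reference[0]]]
print(check_latent(generated, reference))
```

Because cosine distance ignores overall scale, a uniformly rescaled latent passes both gates; only a change in direction (relative structure) moves the metric.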
Classes¶
Functions:¶
fastvideo.tests.ssim.latent_similarity_utils.load_latent_reference
¶Inverse of :func:save_latent_reference — always loads to cpu.
Enforces format_version == LATENT_REFERENCE_FORMAT_VERSION so a
schema change forces a deliberate reseed instead of silently
misinterpreting old artefacts.
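The version gate can be sketched as below. The constant's value, the helper name `validate_bundle`, and the choice of `ValueError` are all assumptions made for illustration; only the `format_version` key and the constant's name come from the text above.

```python
LATENT_REFERENCE_FORMAT_VERSION = 1  # hypothetical value, for illustration only


def validate_bundle(bundle):
    """Reject a loaded bundle whose schema version does not match exactly."""
    found = bundle.get("format_version")
    if found != LATENT_REFERENCE_FORMAT_VERSION:
        raise ValueError(
            f"latent reference has format_version={found!r}, expected "
            f"{LATENT_REFERENCE_FORMAT_VERSION}; reseed the reference artefact")
    return bundle
```

An exact-match check (rather than "at least version N") means that any schema bump invalidates every older artefact, which is the deliberate-reseed behaviour described above.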
Source code in fastvideo/tests/ssim/latent_similarity_utils.py
fastvideo.tests.ssim.latent_similarity_utils.run_text_to_latent_similarity_test
¶run_text_to_latent_similarity_test(*, logger: Logger, script_dir: str, device_reference_folder: str, prompt: str, attention_backend_name: str, model_id: str, default_params_map: dict[str, dict[str, object]], full_quality_params_map: dict[str, dict[str, object]], slice_cosine_threshold: float = 0.005, full_cosine_threshold: float = 0.01, init_kwargs_override: dict[str, object] | None = None, generation_kwargs_override: dict[str, object] | None = None, slice_spec: dict[str, Any] | None = None) -> dict[str, float]
Run T2V (or T2A) inference with output_type='latent' and
compare to a reference latent.
Returns the computed metrics dict on success. Raises
AssertionError if any cosine tolerance is exceeded and
FileNotFoundError if the reference artefact is missing.
Source code in fastvideo/tests/ssim/latent_similarity_utils.py
fastvideo.tests.ssim.latent_similarity_utils.save_latent_reference
¶save_latent_reference(path: str, latent: Tensor, *, metadata: dict[str, Any], slice_spec: dict[str, Any] | None = None) -> None
Persist a latent bundle to path.
Storage format (dict pickled via torch.save):
- latent: full latent as fp16 on cpu
- shape: original shape (list)
- dtype_original: str
- expected_slice: fp32 1-D signature slice
- slice_spec: dict describing how the slice was built
- metadata: caller-provided context (prompt, backend, steps, …)
- format_version: int
fp16 is lossy but bounded; it keeps ref artefacts small (~a few MB per prompt) while preserving enough dynamic range for cosine-based regression. Slice values stay fp32 because the primary assertion is computed against them.
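The "lossy but bounded" claim can be checked without any tensor library, since Python's `struct` module supports IEEE 754 half precision via the `e` format. The sketch below is only an illustration on synthetic values in fp16's normal range; the helper names are hypothetical and it does not touch real latents.

```python
import math
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def cosine_distance(a, b):
    """1 - cosine similarity of two equal-length, non-zero flat sequences."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Synthetic values with alternating sign and magnitudes between 0.25 and ~5.4.
values = [(-1) ** i * (0.25 + 0.01 * i) for i in range(512)]
stored = [to_fp16(v) for v in values]

# Per-element relative rounding error, and its effect on the cosine metric.
max_rel_err = max(abs(s - v) / abs(v) for s, v in zip(stored, values))
print("max relative error:", max_rel_err)
print("cosine distance after fp16 round-trip:", cosine_distance(values, stored))
```

For values in the normal range, the per-element relative error stays below 2**-10, so the cosine distance introduced by storage is orders of magnitude smaller than the 5e-3 and 1e-2 default thresholds.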
Source code in fastvideo/tests/ssim/latent_similarity_utils.py
fastvideo.tests.ssim.latent_similarity_utils.write_latent_similarity_results
¶write_latent_similarity_results(output_dir: str, metrics: dict[str, float], *, reference_path: str, generated_path: str, num_inference_steps: int, prompt: str, model_id: str, attention_backend_name: str, slice_spec: dict[str, Any], slice_cosine_threshold: float, full_cosine_threshold: float, passed: bool) -> bool
Persist latent regression metrics next to the generated artefact.
Mirrors :func:fastvideo.tests.utils.write_ssim_results so downstream
CI tooling can scrape one schema for both pixel and latent runs.
The filename is steps{N}_{prompt[:100]}_latent.json.
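The naming rule can be sketched as follows. The helper name `latent_results_path` is hypothetical, and whether the real function sanitises characters in the prompt is not stated here, so no sanitisation is shown.

```python
import os


def latent_results_path(output_dir, num_inference_steps, prompt):
    """Build steps{N}_{prompt[:100]}_latent.json inside output_dir."""
    filename = f"steps{num_inference_steps}_{prompt[:100]}_latent.json"
    return os.path.join(output_dir, filename)


print(latent_results_path("generated", 4, "A red fox running through snow"))
```

Truncating the prompt to 100 characters keeps the filename within typical filesystem limits even for long prompts.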
Source code in fastvideo/tests/ssim/latent_similarity_utils.py
fastvideo.tests.ssim.reference_videos_cli
¶
Functions:¶
fastvideo.tests.ssim.reference_videos_cli.ensure_reference_videos_available
¶ensure_reference_videos_available(*, local_dir: Path | None = None, repo_id: str | None = None, repo_type: str | None = None, quality_tier: str = DEFAULT_OUTPUT_QUALITY_TIER) -> bool
Return True if downloaded from HF, False if already present locally.
Source code in fastvideo/tests/ssim/reference_videos_cli.py
fastvideo.tests.ssim.test_flux2_similarity
¶
Latent-slice regression tests for Flux2 text-to-image variants.
Flux2 currently has local parity coverage against the official/reference pipeline, but CI needs a small deterministic regression gate for seeded HF artefacts. Pixel-space comparisons are unnecessarily brittle for this first slot, so the test follows the latent helper pattern used by LTX-2: generate a single-image latent with the production recipe, persist the generated latent, and compare a stable latent signature plus the full tensor against the device reference.
The default and full-quality parameter maps intentionally carry the same
recipe values for now. The --ssim-full-quality flag still switches the
reference tier through conftest.py; separate full-quality recipes can be
introduced after the initial Flux2 references have a stable CI window.
Functions:¶
fastvideo.tests.ssim.test_gamecraft_similarity
¶
SSIM regression test for HunyuanGameCraft (T2V and I2V).
Generates a video with deterministic seed and camera trajectory, then compares against a device-specific reference video via MS-SSIM.
Reference videos must be pre-generated and stored under
reference_videos/
To create initial reference videos, run this test once and copy the generated videos into the appropriate reference folder.
Classes¶
Functions:¶
fastvideo.tests.ssim.test_gamecraft_similarity.test_gamecraft_i2v_similarity
¶Generate an I2V video with GameCraft and compare to reference via SSIM.
Source code in fastvideo/tests/ssim/test_gamecraft_similarity.py
fastvideo.tests.ssim.test_gamecraft_similarity.test_gamecraft_t2v_similarity
¶Generate a T2V video with GameCraft and compare to reference via SSIM.
Source code in fastvideo/tests/ssim/test_gamecraft_similarity.py
fastvideo.tests.ssim.test_gen3c_similarity
¶
SSIM regression test for GEN3C video generation.
Compares newly generated GEN3C videos against device-specific reference videos using MS-SSIM to detect quality regressions across code changes.
Usage (requires 1+ GPU and reference videos):
    pytest fastvideo/tests/ssim/test_gen3c_similarity.py -v
Environment variables
GEN3C_MODEL_PATH - Diffusers-format GEN3C model path/repo id. Default: FastVideo/GEN3C-Cosmos-7B-Diffusers (local converted path also supported)
Classes¶
Functions:¶
fastvideo.tests.ssim.test_gen3c_similarity.test_gen3c_inference_similarity
¶Generate a GEN3C video and compare against the reference using MS-SSIM.
Source code in fastvideo/tests/ssim/test_gen3c_similarity.py
fastvideo.tests.ssim.test_lingbot_similarity
¶
fastvideo.tests.ssim.test_longcat_similarity
¶
SSIM-based similarity tests for LongCat video generation.
Tests three LongCat modes:
- T2V (Text-to-Video): 480p video from text prompt
- I2V (Image-to-Video): 480p video from image + text prompt
- VC (Video Continuation): 480p video continuation from input video + text prompt
Sampling parameters are derived from:
- examples/inference/basic/basic_longcat_t2v.py
- examples/inference/basic/basic_longcat_i2v.py
- examples/inference/basic/basic_longcat_vc.py
Note: num_inference_steps is reduced for CI speed (4 steps vs 50 in examples).
Classes¶
Functions:¶
fastvideo.tests.ssim.test_longcat_similarity.test_longcat_i2v_similarity
¶Test LongCat I2V inference and compare output to reference videos using SSIM.
Parameters derived from examples/inference/basic/basic_longcat_i2v.py
Source code in fastvideo/tests/ssim/test_longcat_similarity.py
fastvideo.tests.ssim.test_longcat_similarity.test_longcat_t2v_similarity
¶Test LongCat T2V inference and compare output to reference videos using SSIM.
Parameters derived from examples/inference/basic/basic_longcat_t2v.py
Source code in fastvideo/tests/ssim/test_longcat_similarity.py
fastvideo.tests.ssim.test_longcat_similarity.test_longcat_vc_similarity
¶Test LongCat VC (Video Continuation) inference and compare output to reference videos using SSIM.
Parameters derived from examples/inference/basic/basic_longcat_vc.py
Source code in fastvideo/tests/ssim/test_longcat_similarity.py
fastvideo.tests.ssim.test_ltx2_similarity
¶
Latent-slice regression test for LTX-2 distilled text-to-video.
Pixel-space SSIM is not a useful signal for this model: 4 distilled steps + bf16 attention + tiled VAE decode produce outputs that pass visual QA but occupy a very wide region in pixel space.
Inspired by diffusers' slice-vs-full regression philosophy — see
diffusers/tests/pipelines/ltx2/test_ltx2.py (compares pixel slices
via torch.allclose(..., atol=1e-4)) and
diffusers/tests/pipelines/cogvideo/test_cogvideox.py (full pixel
tensors via numpy_cosine_similarity_distance(...) < 1e-3).
Diffusers itself does NOT compare latents; we apply the same "small
signature slice + bounded full-tensor distance" idea to the pre-VAE
latent because distilled few-step pipelines amplify per-step bf16
noise enough that VAE-decoded comparisons are unreliable.
Parameters are kept identical to the original SSIM run so that reference artefacts generated on Modal L40S remain bit-compatible with production inference.
fastvideo.tests.ssim.test_matrixgame2_similarity
¶
Classes¶
Functions:¶
fastvideo.tests.ssim.test_matrixgame2_similarity.test_matrixgame2_similarity
¶Test that runs inference with different parameters and compares the output to reference videos using SSIM.
Source code in fastvideo/tests/ssim/test_matrixgame2_similarity.py
fastvideo.tests.ssim.test_matrixgame3_similarity
¶
Classes¶
Functions:¶
fastvideo.tests.ssim.test_matrixgame3_similarity.test_matrixgame3_similarity
¶Test that runs MG3 inference (action conditions auto-generated from seed) and compares the output to reference videos using SSIM.
Source code in fastvideo/tests/ssim/test_matrixgame3_similarity.py
fastvideo.tests.ssim.test_stable_audio_similarity
¶
Latent-slice regression test for Stable Audio Open 1.0 text-to-audio.
Companion to test_ltx2_similarity.py — applies the same latent
cosine-distance philosophy to 3-D audio latents [B, 64, T_latent].
Why latent-space and not waveform-space SSIM:
- dpmpp-3m-sde (k-diffusion) accumulates per-step bf16 noise; the
Oobleck VAE then magnifies any residual drift into the time-domain
waveform. A few mis-rounded accumulators drive sample-wise diff well
past audible thresholds without indicating a real regression.
- Diffusers' own
tests/pipelines/stable_audio/test_stable_audio.py compares
decoded audio samples via
np.abs(expected - actual).max() < 1.5e-3; that bound holds for
CPU dummy components but does not survive cross-architecture bf16
on our CI pool (L40S/A40/H100/B200).
- Comparing the pre-VAE latent moves the assertion upstream of
the dominant noise source.
Slice spec: audio_first_8_timesteps returns
latent[0, :, :8] (= 64 channels × 8 latent timesteps = 512
elements). The full latent [1, 64, 1024] for SA-1.0 is also
compared via cosine distance.
fastvideo.tests.train
¶
Modules¶
fastvideo.tests.train.callbacks
¶
Modules¶
fastvideo.tests.train.callbacks.test_callback
¶CPU-only unit tests for :mod:fastvideo.train.callbacks.callback.
Covers the Callback base class no-op contract and the
CallbackDict instantiation / dispatch / state-dict logic.
The concrete callback subclasses (GradNormClipCallback,
EMACallback, ValidationCallback) have their own test files.
fastvideo.tests.train.callbacks.test_ema
¶CPU-only unit tests for :mod:fastvideo.train.callbacks.ema.
Exercises the EMA lifecycle (lazy init, start_iter gating, decay
math, ema_context swap, state-dict round-trip) on a tiny CPU
nn.Linear. EMA_FSDP works without dist.init_process_group
because dist.is_initialized() returns False and _to_local_tensor
falls through to raw tensors for non-DTensor inputs.
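The lifecycle being tested (lazy init, start_iter gating, decay math) can be illustrated with a toy scalar version. This is a sketch under assumptions, not the callback's code: the class `TinyEMA` is hypothetical, it uses the standard update ema = decay * ema + (1 - decay) * param, and it assumes the shadow copy is initialised from the first parameters seen at or after start_iter.

```python
class TinyEMA:
    """Toy EMA over a list of floats (hypothetical, for illustration only)."""

    def __init__(self, decay, start_iter=0):
        self.decay = decay
        self.start_iter = start_iter
        self.shadow = None  # lazily initialised on the first eligible update

    def update(self, params, iteration):
        if iteration < self.start_iter:
            return  # gated: EMA has not started yet
        if self.shadow is None:
            self.shadow = list(params)  # lazy init copies the current params
            return
        self.shadow = [
            self.decay * s + (1.0 - self.decay) * p
            for s, p in zip(self.shadow, params)
        ]


ema = TinyEMA(decay=0.9, start_iter=2)
ema.update([1.0], iteration=0)  # ignored, before start_iter
ema.update([1.0], iteration=2)  # lazy init
ema.update([2.0], iteration=3)  # 0.9 * 1.0 + 0.1 * 2.0
print(ema.shadow)
```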
fastvideo.tests.train.callbacks.test_grad_clip
¶CPU-only unit tests for :mod:fastvideo.train.callbacks.grad_clip.
Exercises GradNormClipCallback.on_before_optimizer_step against
synthetic nn.Module targets with manually populated gradients.
fastvideo.tests.train.callbacks.test_validation
¶CPU-only unit tests for :mod:fastvideo.train.callbacks.validation.
Covers the parts of ValidationCallback that don't need a real
pipeline or distributed init:
- constructor type coercions and defaults,
- on_validation_begin gating logic (every_steps + modulo),
- _find_ema_callback lookup via _callback_dict,
- state_dict / load_state_dict rng round-trip.
The heavy _run_validation path needs a real diffusion pipeline plus
distributed init and is exercised by Phase 2/3 tests.
fastvideo.tests.train.methods
¶
Modules¶
fastvideo.tests.train.methods.test_wan_causal_dfsft
¶Per-method GPU smoke test: WanCausalModel + DiffusionForcingSFTMethod.
Mirrors test_wan_finetune.py for the diffusion-forcing SFT
(DFSFT) algorithm on the causal Wan transformer. The harness is
intentionally identical so the two tests are easy to compare and so
future per-method tests can copy this template verbatim.
DFSFT samples inhomogeneous timesteps per chunk (chunk_size=3
in the fixture) and is the natural training counterpart of the
WanCausalModel plugin.
fastvideo.tests.train.methods.test_wan_finetune
¶Per-method GPU smoke test: WanModel + FineTuneMethod.
Establishes the per-method test pattern for fastvideo/train:
- Instantiate the model + method via their public constructors (no Trainer setup, no FSDP wrapping).
- Feed a synthetic raw_batch dict through method.single_train_step() + method.backward().
- Assert that the loss is finite and that the first transformer block received a finite, non-zero gradient.
The first block's gradient is the last one computed during backprop, so a healthy grad there implies the full forward + chain-rule path is intact. Keeping the assertion to a single block keeps the reference surface tiny — a later PR layers a device-keyed grad-norm regression on top of this same harness.
fastvideo.tests.train.models
¶
Modules¶
fastvideo.tests.train.models.test_load_hunyuan
¶GPU loading + forward smoke test for HunyuanModel.
Loads the real HunyuanVideo checkpoint (~13B at bf16) via
HunyuanModel.__init__ and runs one transformer forward pass on
synthetic inputs. Hunyuan's transformer takes a slightly different
forward signature than Wan (no encoder_attention_mask, no
return_dict); this test mirrors the kwargs in
HunyuanModel._build_distill_input_kwargs.
fastvideo.tests.train.models.test_load_wan
¶GPU loading + forward smoke test for WanModel.
Loads the real Wan2.1 1.3B checkpoint via WanModel.__init__ and
runs one transformer forward pass on synthetic inputs. Catches loader
or forward-signature regressions in
fastvideo.train.models.wan.WanModel and the underlying
WanTransformer3DModel.
fastvideo.tests.train.models.test_load_wan_causal
¶GPU loading smoke test for WanCausalModel.
Verifies that WanCausalModel.__init__ resolves the
CausalWanTransformer3DModel class override and successfully loads
weights from the regular Wan2.1 1.3B checkpoint.
A real forward pass is intentionally omitted here: the causal
transformer requires per-frame timesteps, a block-causal attention
mask, and KV cache state that WanCausalModel.predict_noise_streaming
manages for production callers. PR 5 (per-method tests) exercises that
streaming forward path end-to-end.
fastvideo.tests.train.utils
¶
Modules¶
fastvideo.tests.train.utils.test_checkpoint
¶CPU-only unit tests for :mod:fastvideo.train.utils.checkpoint.
Covers the pure-Python portions of the checkpoint manager: name
parsing, resume-path resolution, metadata round-trip, rolling-delete
cleanup, the _is_stateful predicate, and the maybe_save gating
logic. Code paths that touch DCP (dcp.save / dcp.load) and
CUDA RNG snapshots are intentionally not covered here — those need a
GPU runner and will be tested in later phases.
fastvideo.tests.train.utils.test_checkpoint.test_resolve_unknown_dir_raises
¶test_resolve_unknown_dir_raises(tmp_path: Path) -> None
A dir that is neither a checkpoint nor an output_dir-with-checkpoints.
Source code in fastvideo/tests/train/utils/test_checkpoint.py
fastvideo.tests.train.utils.test_config
¶CPU-only unit tests for :func:load_run_config.
fastvideo.tests.train.utils.test_config.test_hsdp_shard_dim_defaults_to_num_gpus
¶test_hsdp_shard_dim_defaults_to_num_gpus(tmp_path: Path) -> None
When unset, hsdp_shard_dim and sp_size fall back to num_gpus.
Source code in fastvideo/tests/train/utils/test_config.py
fastvideo.tests.train.utils.test_config.test_overrides_create_intermediate_keys
¶test_overrides_create_intermediate_keys(tmp_path: Path) -> None
Overrides into a nested key absent from YAML should still apply.
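The behaviour under test can be sketched with a small helper. The dotted-key syntax and the name `apply_override` are assumptions for illustration; the actual override format accepted by load_run_config is not shown in this page.

```python
def apply_override(config, dotted_key, value):
    """Set config[a][b][c] = value for "a.b.c", creating missing dicts.

    Assumes every existing intermediate value is itself a dict.
    """
    node = config
    *parents, leaf = dotted_key.split(".")
    for key in parents:
        # setdefault creates the intermediate mapping when the YAML lacked it.
        node = node.setdefault(key, {})
    node[leaf] = value
    return config


cfg = {"training": {"lr": 1e-4}}
apply_override(cfg, "training.optimizer.betas", [0.9, 0.99])
print(cfg)
```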
Source code in fastvideo/tests/train/utils/test_config.py
fastvideo.tests.utils
¶
Functions:¶
fastvideo.tests.utils.compare_folders
¶
Compare videos with the same filename between reference_folder and generated_folder
Example usage:
    results = compare_folders(reference_folder, generated_folder,
                              args.use_ms_ssim)
    for video_name, ssim_value in results.items():
        if ssim_value is not None:
            print(
                f"{video_name}: {ssim_value[0]:.4f}, Min SSIM: {ssim_value[1]:.4f}, Max SSIM: {ssim_value[2]:.4f}"
            )
        else:
            print(f"{video_name}: Error during comparison")
    valid_ssims = [v for v in results.values() if v is not None]
    if valid_ssims:
        avg_ssim = np.mean([v[0] for v in valid_ssims])
        print(f"\nAverage SSIM across all videos: {avg_ssim:.4f}")
    else:
        print("\nNo valid SSIM values to average")
Source code in fastvideo/tests/utils.py
fastvideo.tests.utils.compute_video_ssim_torchvision
¶
Compute SSIM between two videos.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| video1_path | | Path to the first video. | required |
| video2_path | | Path to the second video. | required |
| use_ms_ssim | | Whether to use Multi-Scale Structural Similarity (MS-SSIM) instead of SSIM. | True |
Source code in fastvideo/tests/utils.py
fastvideo.tests.utils.write_ssim_results
¶
write_ssim_results(output_dir, ssim_values, reference_path, generated_path, num_inference_steps, prompt)
Write SSIM results to a JSON file in the same directory as the generated videos.