eval
¶
Modules¶
fastvideo.tests.eval.test_datasets_vbench
¶
fastvideo.tests.eval.test_evaluator_multi_gpu
¶
Multi-replica eval through the public Evaluator API.
Skipped automatically when fewer than 2 CUDA devices are visible.
Functions¶
fastvideo.tests.eval.test_evaluator_multi_gpu.baseline_scores
¶
Reference scores computed on a single-GPU evaluator. The multi-GPU runs must reproduce these exactly when handed the same input list — that's the only way to verify round-robin dispatch isn't dropping or reordering samples.
Source code in fastvideo/tests/eval/test_evaluator_multi_gpu.py
fastvideo.tests.eval.test_evaluator_multi_gpu.test_multi_gpu_dispatch_preserves_order_and_scores
¶
Same samples, multi-GPU dispatch: results must match the single-GPU baseline element for element. This verifies that (a) the round-robin doesn't reorder, (b) every sample is scored exactly once, and (c) the workers don't share mutable state.
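The ordering and exactly-once properties can be illustrated with a toy dispatcher. This is a minimal sketch, not FastVideo's implementation: `dispatch_round_robin`, `gather_in_order`, and the index-tagging scheme are hypothetical names chosen for illustration, and plain floats stand in for video samples.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

S = TypeVar("S")
R = TypeVar("R")


def dispatch_round_robin(samples: Sequence[S], num_workers: int) -> List[List[Tuple[int, S]]]:
    """Assign sample i to worker i % num_workers, tagged with its input index."""
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    shards: List[List[Tuple[int, S]]] = [[] for _ in range(num_workers)]
    for index, sample in enumerate(samples):
        shards[index % num_workers].append((index, sample))
    return shards


def gather_in_order(
    shards: List[List[Tuple[int, S]]],
    score_fn: Callable[[S], R],
    total: int,
) -> List[R]:
    """Score every shard and put each result back at its original index."""
    results: list = [None] * total
    seen = [False] * total
    for shard in shards:
        for index, sample in shard:
            if seen[index]:
                raise RuntimeError(f"sample {index} was scored twice")
            results[index] = score_fn(sample)
            seen[index] = True
    if not all(seen):
        raise RuntimeError("at least one sample was dropped")
    return results


# Hypothetical stand-ins: floats as samples, doubling as the "metric".
samples = [0.5, 1.5, 2.5, 3.5, 4.5]
baseline = [s * 2 for s in samples]
multi = gather_in_order(dispatch_round_robin(samples, 2), lambda s: s * 2, len(samples))
assert multi == baseline
```

Tagging each sample with its input index is one simple way to make reordering and dropped samples detectable. The real dispatcher may track order differently.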
Source code in fastvideo/tests/eval/test_evaluator_multi_gpu.py
fastvideo.tests.eval.test_evaluator_multi_gpu.test_multi_gpu_evaluator_kwargs_form_runs_on_one_replica
¶
The kwargs form (single sample) is documented to always hit worker 0; this test pins the contract so future refactors don't accidentally fan out a single call.
Source code in fastvideo/tests/eval/test_evaluator_multi_gpu.py
fastvideo.tests.eval.test_evaluator_multi_gpu.test_multi_gpu_release_cuda_memory_runs_clean
¶
release_cuda_memory must hit every replica without crashing.
Source code in fastvideo/tests/eval/test_evaluator_multi_gpu.py
fastvideo.tests.eval.test_evaluator_paths
¶
Path-input variants of the public Evaluator API.
The worker boundary accepts video / reference as either a
pre-loaded (T, C, H, W) tensor or a path-like (str / Path).
These tests pin the path-form so future refactors don't accidentally
re-require pre-loaded tensors.
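A worker boundary that accepts both forms might look like the sketch below. It is illustrative only: `resolve_video` and the injected `loader` callable are hypothetical names, not part of the documented fastvideo.eval API.

```python
from pathlib import Path
from typing import Any, Callable, Union

VideoInput = Union[str, Path, Any]


def resolve_video(video: VideoInput, loader: Callable[[Path], Any]) -> Any:
    """Decode path-like inputs with `loader`; pass pre-loaded inputs through."""
    if isinstance(video, (str, Path)):
        path = Path(video)
        if not path.exists():
            # Fail loudly rather than returning a placeholder score later.
            raise FileNotFoundError(f"video not found: {path}")
        return loader(path)
    # Anything else is assumed to be an already-decoded (T, C, H, W) tensor.
    return video
```

Deferring the decode to this boundary means the caller only ever holds short path strings, and a missing file surfaces as an exception instead of a silent None.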
Classes¶
Functions¶
fastvideo.tests.eval.test_evaluator_paths.test_dispatcher_holds_paths_not_tensors_in_queue
¶
Memory invariant: when many paths are passed, the queued samples are tiny strings, not full tensors. Verify by checking the length of the per-sample reference set the dispatcher materializes.
Source code in fastvideo/tests/eval/test_evaluator_paths.py
fastvideo.tests.eval.test_evaluator_paths.test_missing_path_surfaces_as_exception
¶
Decode failures must propagate, not silently produce a None score.
Source code in fastvideo/tests/eval/test_evaluator_paths.py
fastvideo.tests.eval.test_evaluator_paths.test_one_shot_evaluate_accepts_paths
¶
The top-level fastvideo.eval.evaluate helper also flows paths.
Source code in fastvideo/tests/eval/test_evaluator_paths.py
fastvideo.tests.eval.test_evaluator_paths.test_path_form_score_matches_tensor_form
¶
Loading via path must produce the same score as loading via the
public load_video helper and passing the tensor in directly.
Source code in fastvideo/tests/eval/test_evaluator_paths.py
fastvideo.tests.eval.test_evaluator_paths.test_samples_list_can_mix_paths_and_tensors
¶
A single samples call can mix path and tensor entries.
Source code in fastvideo/tests/eval/test_evaluator_paths.py
fastvideo.tests.eval.test_evaluator_paths.video_paths
¶
Two reproducible mp4s on disk + their pre-loaded tensors for parity.
Source code in fastvideo/tests/eval/test_evaluator_paths.py
fastvideo.tests.eval.test_evaluator_single
¶
End-to-end tests for single-replica eval through the public API.
Runs the lightweight pixel-space metrics, common.psnr and common.ssim, under the three call shapes that real callers use:
- one-shot evaluate(video=..., reference=...) (the helper in fastvideo.eval.api);
- a long-lived Evaluator, called once per sample;
- a long-lived Evaluator, called with a list of sample dicts to fan out (samples=[...]).
GPU-only metrics live in separate test modules / classes; everything here runs on CPU so the suite stays cheap to invoke.
Classes¶
Functions¶
fastvideo.tests.eval.test_evaluator_single.gen_ref
¶
Reproducible (gen, ref) pair shaped (T, C, H, W).
fastvideo.tests.eval.test_evaluator_single.test_evaluator_accepts_legacy_5d_input
¶
Callers that still pass (1, T, C, H, W) should get unwrapped.
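The unwrapping step can be sketched on nested lists, which stand in for tensors here. The helper names are hypothetical, and rejecting a leading dimension larger than 1 is an assumption rather than documented behavior.

```python
from typing import Any


def ndim(x: Any) -> int:
    """Nesting depth of a rectangular nested list (a scalar has depth 0)."""
    depth = 0
    while isinstance(x, list) and x:
        depth += 1
        x = x[0]
    return depth


def unwrap_legacy_batch(video: list) -> list:
    """Turn a (1, T, C, H, W) input into (T, C, H, W); leave 4D inputs alone."""
    if ndim(video) == 5:
        if len(video) != 1:
            # Assumption: only a singleton batch dimension is accepted.
            raise ValueError("expected a leading batch dimension of size 1")
        return video[0]
    return video
```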
Source code in fastvideo/tests/eval/test_evaluator_single.py
fastvideo.tests.eval.test_evaluator_single.test_evaluator_psnr_identical_videos_is_high
¶
PSNR(x, x) is unbounded above; with our clamp it caps near 100 dB.
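The cap follows from the definition PSNR = 10 * log10(MAX^2 / MSE): identical inputs give MSE = 0, so the ratio diverges unless the MSE is clamped. The sketch below assumes a floor of 1e-10 on the MSE and a peak value of 1.0, which together give a 100 dB ceiling. Both values are assumptions chosen to match the figure quoted above, and the actual clamp in common.psnr may differ.

```python
import math
from typing import Sequence


def psnr(
    gen: Sequence[float],
    ref: Sequence[float],
    max_val: float = 1.0,
    eps: float = 1e-10,  # assumed MSE floor; yields a 100 dB cap when max_val is 1.0
) -> float:
    """PSNR in dB over flattened pixel values, with the MSE clamped at eps."""
    if len(gen) != len(ref) or not gen:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((g - r) ** 2 for g, r in zip(gen, ref)) / len(gen)
    return 10.0 * math.log10(max_val ** 2 / max(mse, eps))
```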
Source code in fastvideo/tests/eval/test_evaluator_single.py
fastvideo.tests.eval.test_evaluator_single.test_evaluator_samples_list_preserves_input_order
¶
When samples=[...] is passed, results must come back one per sample, in input order.
Source code in fastvideo/tests/eval/test_evaluator_single.py
fastvideo.tests.eval.test_evaluator_with_dataset
¶
End-to-end test: prompt dataset → Evaluator.
Mirrors the canonical user flow:
ds = get_dataset("vbench", dimensions=[...])
ev = create_evaluator(metrics=[...], device=...)
for row in ds:
    video = my_generator(row["prompt"])
    scores = ev.evaluate(video=video, **row)
We don't actually generate videos — that would pull in a diffusion model. Instead we synthesize a reproducible random tensor per row, so the test exercises the dataset-iteration → evaluator-call wiring without depending on any model weights.
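One way to get a reproducible stand-in per row is to derive the random seed from the prompt text. The sketch below is an assumption about how this could be done, not a description of the test's actual seeding: `seed_from_prompt` and `synth_video` are hypothetical helpers, and a flat list of floats stands in for a (T, C, H, W) tensor.

```python
import hashlib
import random
from typing import List


def seed_from_prompt(prompt: str) -> int:
    """Stable seed from the prompt; the built-in hash() varies between processes."""
    digest = hashlib.sha256(prompt.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def synth_video(prompt: str, num_values: int = 16) -> List[float]:
    """Deterministic pseudo-random values in [0, 1) standing in for pixel data."""
    rng = random.Random(seed_from_prompt(prompt))
    return [rng.random() for _ in range(num_values)]
```

Hashing with hashlib rather than the built-in hash() keeps the output identical across interpreter runs, which matters if scores are compared between separate test invocations.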
Classes¶
Functions¶
fastvideo.tests.eval.test_evaluator_with_dataset.test_dataset_samples_form_through_evaluator
¶
Evaluator.evaluate(samples=[...]) is the canonical batched
entry point; verify it works when the per-row dicts come from a
dataset (kwargs form) rather than being hand-built in the test.
Source code in fastvideo/tests/eval/test_evaluator_with_dataset.py
fastvideo.tests.eval.test_evaluator_with_dataset.test_vbench_dataset_full_corpus_iteration
¶
Iterating the whole dataset should be cheap (no evaluator calls).
This guards against a future refactor that accidentally makes
__iter__ do real work.
Source code in fastvideo/tests/eval/test_evaluator_with_dataset.py
fastvideo.tests.eval.test_evaluator_with_dataset.test_vbench_dataset_rows_drop_into_evaluator
¶
Every row from the corpus must be a kwargs-friendly dict for
Evaluator.evaluate: extra keys flow through without breaking the
metric, and the metric returns a well-formed MetricResult.
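The "extra keys flow through" contract can be pictured with a toy metric that accepts and ignores unknown keyword arguments. Everything here is illustrative: `toy_metric` is hypothetical, the `dimension` key is an assumed example of a dataset column, and a plain dict stands in for MetricResult, whose fields are not shown in this page.

```python
from typing import Any, Dict, Sequence


def toy_metric(video: Sequence[float], **_unused: Any) -> Dict[str, Any]:
    """Score a video while tolerating extra dataset columns such as the prompt."""
    if not video:
        raise ValueError("video must be non-empty")
    return {"name": "toy.mean", "score": sum(video) / len(video)}


# A dataset row carries keys the metric does not need; they pass through harmlessly.
row = {"prompt": "a cat on a sofa", "dimension": "subject_consistency"}
result = toy_metric(video=[0.2, 0.4, 0.6], **row)
```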
Source code in fastvideo/tests/eval/test_evaluator_with_dataset.py
fastvideo.tests.eval.test_registry
¶
Smoke tests for the metric registry surface.
These exercise the public fastvideo.eval API only:
list_metrics, get_metric, and the group-resolution logic that
create_evaluator(metrics="vbench") uses.
Functions¶
fastvideo.tests.eval.test_registry.test_create_evaluator_resolves_group_prefix
¶
metrics="<group>" should expand to every <group>.* sub-metric.
Use the physics_iq group because it has multiple sub-metrics and
none of them load model weights — the group-resolution behavior is
what we're testing, not metric setup.
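Group-prefix expansion can be sketched against a toy registry. This is a sketch under stated assumptions: `resolve_metrics` is a hypothetical helper, and the `physics_iq.metric_a` / `physics_iq.metric_b` names are placeholders, not the real sub-metric names.

```python
from typing import Iterable, List, Set, Union

# Toy registry; the physics_iq entries are placeholder names.
REGISTERED: Set[str] = {
    "common.psnr",
    "common.ssim",
    "physics_iq.metric_a",
    "physics_iq.metric_b",
}


def resolve_metrics(spec: Union[str, Iterable[str]], registered: Set[str]) -> List[str]:
    """Expand each entry to itself if registered, else to every '<entry>.*' metric."""
    names = [spec] if isinstance(spec, str) else list(spec)
    known = sorted(registered)
    resolved: List[str] = []
    for name in names:
        if name in registered:
            matches = [name]
        else:
            prefix = name + "."
            matches = [m for m in known if m.startswith(prefix)]
        if not matches:
            raise KeyError(f"unknown metric or group: {name!r}")
        for match in matches:
            if match not in resolved:  # avoid duplicates when specs overlap
                resolved.append(match)
    return resolved


expanded = resolve_metrics("physics_iq", REGISTERED)
```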