test_evaluator_with_dataset
End-to-end test: prompt dataset → Evaluator.
Mirrors the canonical user flow:
    ds = get_dataset("vbench", dimensions=[...])
    ev = create_evaluator(metrics=[...], device=...)
    for row in ds:
        # my_generator stands in for your own text-to-video model
        video = my_generator(row["prompt"])
        scores = ev.evaluate(video=video, **row)
We don't actually generate videos, because that would pull in a diffusion model. Instead, we synthesize a reproducible random tensor per row, so the test exercises the dataset-iteration → evaluator-call wiring without depending on any model weights.
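One way to make the per-row stand-in video reproducible is to derive a seed from the row itself. The sketch below assumes the seed comes from the prompt text (the real test may seed differently, for example from the row index) and uses plain nested lists instead of a framework tensor so it runs with the standard library alone. `synthetic_video` is a hypothetical name, not part of fastvideo.

```python
import hashlib
import random
from typing import List

# (frames, channels, height, width) as nested lists of floats
Video = List[List[List[List[float]]]]


def synthetic_video(prompt: str, frames: int = 4, channels: int = 3,
                    height: int = 8, width: int = 8) -> Video:
    """Hypothetical stand-in for my_generator: a seeded random 'video'.

    The seed is the first 8 bytes of SHA-256(prompt), so the same prompt
    yields the same values in every process. Values lie in [0, 1).
    """
    digest = hashlib.sha256(prompt.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return [[[[rng.random() for _ in range(width)]
              for _ in range(height)]
             for _ in range(channels)]
            for _ in range(frames)]
```

Hashing with SHA-256 rather than the built-in `hash()` matters here: string hashing is salted per process, so `hash(prompt)` would give a different seed on each test run.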
Functions
fastvideo.tests.eval.test_evaluator_with_dataset.test_dataset_samples_form_through_evaluator
Evaluator.evaluate(samples=[...]) is the canonical batched
entry point; verify it works when the per-row dicts come from a
dataset (kwargs form) rather than being hand-built in the test.
Source code in fastvideo/tests/eval/test_evaluator_with_dataset.py
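The kwargs form and the batched `samples=[...]` form should agree when fed the same dataset rows. A minimal sketch of that equivalence check, using a hypothetical `StubEvaluator` that only mimics the two call shapes described above (it is not fastvideo's `Evaluator`, and its scoring rule is a placeholder):

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class StubEvaluator:
    """Hypothetical stand-in for Evaluator; NOT the fastvideo implementation.

    Accepts one sample as keyword arguments, or a batch via samples=[...],
    and records the prompt of every per-sample call in `calls`.
    """
    calls: List[str] = field(default_factory=list)

    def _score_one(self, video: Any, prompt: str, **extra: Any) -> Dict[str, Any]:
        # Extra dataset columns are accepted and ignored.
        self.calls.append(prompt)
        # Placeholder score: prompt length, purely for illustration.
        return {"prompt": prompt, "score": float(len(prompt))}

    def evaluate(self, samples: Optional[List[Dict[str, Any]]] = None,
                 **kwargs: Any) -> Any:
        if samples is not None:
            # Batched form: each dict is unpacked exactly like the kwargs form.
            return [self._score_one(**s) for s in samples]
        return self._score_one(**kwargs)
```

A usage check builds the batch from the same rows a dataset would yield and compares it against per-row calls.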
fastvideo.tests.eval.test_evaluator_with_dataset.test_vbench_dataset_full_corpus_iteration
Iterating the whole dataset should be cheap (no evaluator calls).
This guards against a future refactor that accidentally makes
__iter__ do real work.
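A cheap-iteration guard can be sketched as: walk every row, count them, and assert that the evaluator was never touched. `LazyPromptDataset`, `CountingEvaluator`, and `iterate_full_corpus` are hypothetical stand-ins, and the loose wall-clock budget is an assumption that may need tuning to avoid flakiness on slow CI machines.

```python
import time
from typing import Dict, Iterator, List


class LazyPromptDataset:
    """Hypothetical dataset whose __iter__ only yields lightweight dicts."""

    def __init__(self, prompts: List[str]) -> None:
        self._prompts = list(prompts)

    def __len__(self) -> int:
        return len(self._prompts)

    def __iter__(self) -> Iterator[Dict[str, str]]:
        # No model loading, decoding, or scoring happens here.
        for p in self._prompts:
            yield {"prompt": p}


class CountingEvaluator:
    """Hypothetical evaluator stub that only counts how often it is called."""

    def __init__(self) -> None:
        self.calls = 0

    def evaluate(self, **kwargs) -> float:
        self.calls += 1
        return 0.0


def iterate_full_corpus(ds, ev, budget_s: float = 5.0) -> int:
    """Walk every row without scoring; fail if the evaluator was called."""
    start = time.perf_counter()
    n_rows = 0
    for row in ds:
        assert isinstance(row, dict) and "prompt" in row
        n_rows += 1
    elapsed = time.perf_counter() - start
    assert n_rows == len(ds), "iteration should visit every row exactly once"
    assert ev.calls == 0, "iterating must not call the evaluator"
    assert elapsed < budget_s, f"iteration took {elapsed:.2f}s"
    return n_rows
```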
fastvideo.tests.eval.test_evaluator_with_dataset.test_vbench_dataset_rows_drop_into_evaluator
Every row from the corpus must be a kwargs-friendly dict for
Evaluator.evaluate: extra keys flow through without breaking the
metric, and the metric returns a well-formed MetricResult.
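The "extra keys flow through" requirement usually comes down to the metric accepting keyword arguments it does not use. A hypothetical sketch follows: `SketchMetricResult` only approximates a result object (the real `MetricResult` fields are not shown here), and `toy_metric` scores by frame count purely for illustration.

```python
import math
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class SketchMetricResult:
    """Hypothetical stand-in for MetricResult; the real fields may differ."""
    name: str
    score: float
    details: Dict[str, Any]


def toy_metric(video: Any, prompt: str, **extra: Any) -> SketchMetricResult:
    """Hypothetical metric: uses video and prompt, tolerates unknown keys."""
    return SketchMetricResult(
        name="toy_frame_count",
        score=float(len(video)),  # placeholder score: number of frames
        details={"prompt": prompt, "ignored_keys": sorted(extra)},
    )


def is_well_formed(result: Any) -> bool:
    """Named result with a finite float score."""
    return (isinstance(result, SketchMetricResult)
            and bool(result.name)
            and isinstance(result.score, float)
            and math.isfinite(result.score))
```

Because unknown keys land in `extra` instead of raising `TypeError`, a dataset can add new columns without breaking existing metrics.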