evaluator
User-facing scorer.
Layering (mirrors FastVideo's VideoGenerator → Worker pattern, but in-process)::
    Evaluator                  ← user-facing
    └── EvalWorker × N         ← single-GPU; owns metric replicas
        └── VideoPool          ← async path-→-tensor prefetch (per evaluate call)
The constructor builds one :class:EvalWorker per GPU and eagerly loads
every metric on every worker. :meth:evaluate is the single entry
point: pass kwargs to score one sample, or pass a list of sample dicts
to fan out across GPU replicas with pipelined decoding. The method is
the same in both cases, and the return type follows the input shape.
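The pipelined decoding mentioned above can be pictured as a bounded producer/consumer queue. The following is a minimal sketch, not the actual :class:VideoPool implementation: `decode` and `prefetch` are hypothetical names, and the only detail taken from this page is the capacity rule `max_size = prefetch_factor * num_workers`.

```python
import queue
import threading


def decode(path: str) -> str:
    """Stand-in for video decoding (hypothetical); a real loader would return a tensor."""
    return f"decoded:{path}"


def prefetch(paths, *, loader_threads=1, prefetch_factor=2, num_workers=1):
    """Yield decoded samples while background threads decode ahead of the consumer."""
    pending = queue.Queue()
    for path in paths:
        pending.put(path)

    # Bounded buffer: loaders block once this many decoded samples are waiting.
    ready = queue.Queue(maxsize=prefetch_factor * num_workers)
    done = object()  # sentinel, one per loader thread

    def loader():
        while True:
            try:
                path = pending.get_nowait()
            except queue.Empty:
                break
            ready.put(decode(path))  # blocks while the buffer is full
        ready.put(done)

    threads = [threading.Thread(target=loader, daemon=True) for _ in range(loader_threads)]
    for thread in threads:
        thread.start()

    finished = 0
    while finished < loader_threads:
        item = ready.get()
        if item is done:
            finished += 1
        else:
            yield item
```

With a single loader thread the samples come out in submission order; with several loaders the order depends on thread scheduling, so a consumer that needs ordering would have to tag each sample with its index.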
Classes
fastvideo.eval.evaluator.Evaluator
Evaluator(metrics: list[str] | str = 'all', device: str = 'cuda:0', num_gpus: int = 1, compile: bool = False, *, loader_threads: int = 1, prefetch_factor: int = 2, pre_upload: bool = True, skip_missing_deps: bool = False)
Pre-initialized scorer for repeated evaluation.
Parameters
metrics : list[str] | str
Metric names, group prefixes ("vbench"), or "all".
device : str
Single-GPU device (e.g. "cuda:0"). Ignored when num_gpus > 1.
num_gpus : int
Number of GPU replicas. Each gets its own :class:EvalWorker.
compile : bool
Apply :func:torch.compile to each metric's _model.
loader_threads : int
Number of background decode threads in the :class:VideoPool.
Default 1, intended to hide decoding behind compute. Increase it for
I/O-heavy benchmark sets where one loader can't keep up with the workers.
prefetch_factor : int
Sets the pool capacity: max_size = prefetch_factor * num_workers.
Default 2: per worker, one sample being consumed and one prefetched.
pre_upload : bool
When True (default), the worker performs a single
host→device upload of video / reference per sample
before the metric loop, and every metric reads from that
shared GPU-resident tensor. Without it, each metric pays its
own .to(self.device) — N transfers of the same clip for N
metrics, which dominates at high resolution. Set False for
training-time eval, where keeping a clip resident on GPU
across the metric loop would fight the training step for VRAM.
skip_missing_deps : bool
When True, silently drop explicit metric names whose
optional deps aren't importable (with a one-line warning per
skipped metric). Default False — an explicit name with a
missing dep raises :class:ImportError at construction time.
Group selectors ("vbench", "all") always silently skip metrics
with missing deps, regardless of this flag.
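The skip_missing_deps policy above can be sketched as follows. This is an illustration under stated assumptions, not the library's resolver: the registry, the metric names, and `resolve_metrics` are all hypothetical, and the sketch assumes group selectors skip without any warning, which this page does not spell out.

```python
import importlib.util
import warnings

# Hypothetical registry: metric name -> optional top-level modules it needs.
REQUIRED_DEPS = {
    "demo.always_available": [],
    "demo.needs_extra": ["surely_not_installed_pkg_xyz"],
}
# Hypothetical group selector expanding to several metric names.
GROUPS = {"demo": ["demo.always_available", "demo.needs_extra"]}


def missing_deps(name):
    """Return the required modules that cannot be found on this interpreter."""
    return [mod for mod in REQUIRED_DEPS[name] if importlib.util.find_spec(mod) is None]


def resolve_metrics(requested, skip_missing_deps=False):
    """Expand groups and apply the missing-dependency policy."""
    selected = []
    for item in requested:
        is_group = item in GROUPS
        for name in GROUPS[item] if is_group else [item]:
            missing = missing_deps(name)
            if not missing:
                selected.append(name)
            elif is_group:
                continue  # group selectors skip regardless of the flag
            elif skip_missing_deps:
                warnings.warn(f"skipping {name}: missing {missing}")
            else:
                raise ImportError(f"{name} requires {missing}")
    return selected
```

The point of the split is that an explicit name signals intent, so failing loudly is the safer default, while a group selector is a best-effort request for whatever is available.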
Source code in fastvideo/eval/evaluator.py
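To put a rough number on the pre_upload trade-off described in the parameter list, the sketch below counts host→device bytes per sample. The clip shape (81 frames, 3 channels, 720×1280, float32) and the metric count are assumptions chosen for illustration, and both helper functions are hypothetical.

```python
def clip_bytes(frames, channels, height, width, bytes_per_element=4):
    """Size of one dense video tensor in bytes (float32 by default)."""
    return frames * channels * height * width * bytes_per_element


def transfer_bytes(num_metrics, clip, pre_upload):
    """Host-to-device bytes moved for one sample.

    pre_upload=True: one shared upload reused by every metric.
    pre_upload=False: each metric performs its own upload of the same clip.
    """
    return clip if pre_upload else num_metrics * clip


# Assumed example clip: 81 frames of 3x720x1280 float32.
clip = clip_bytes(frames=81, channels=3, height=720, width=1280)
shared = transfer_bytes(num_metrics=5, clip=clip, pre_upload=True)
per_metric = transfer_bytes(num_metrics=5, clip=clip, pre_upload=False)
print(f"shared upload: {shared / 2**20:.0f} MiB, per-metric uploads: {per_metric / 2**20:.0f} MiB")
```

The flip side, as noted for training-time eval, is that the shared copy stays resident on the GPU for the whole metric loop.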
Functions
fastvideo.eval.evaluator.Evaluator.evaluate
evaluate(samples: Iterable[dict] | None = None, *, metrics: list[str] | None = None, **kwargs) -> dict[str, MetricResult] | EvalResults
Score one sample (kwargs form) or many samples (list form).
Both forms go through the same :class:VideoPool pipeline;
video / reference paths are decoded asynchronously.
Parameters
samples :
Iterable of sample dicts. Omit and pass kwargs for a
single-sample call.
metrics :
Subset of this Evaluator's registered metrics to actually
run on this batch. None (default) runs all registered.
Lets a single long-lived Evaluator score different (gen,
ref) corpora with different metric subsets across multiple
evaluate() calls — e.g. LPIPS on a paired corpus, FVD
on an unequal-cardinality corpus — without burning model
loads. Set-metric accumulators are reset only for the
metrics included in metrics, so state for other set
metrics is preserved across calls.
Single sample::

    ev.evaluate(video=tensor, text_prompt="...", fps=24.0)

Many samples::

    ev.evaluate(samples=[{"video": ..., "reference": ...}, ...])

Many samples with a metric filter::

    ev.evaluate(samples=lpips_samples, metrics=["common.lpips"])
    ev.evaluate(samples=fvd_samples, metrics=["common.fvd"])
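The selective-reset behaviour of the metrics filter can be illustrated with a toy model. `MeanAccumulator` and `ToyEvaluator` are hypothetical stand-ins written for this sketch; they only mimic the documented rule that accumulators are reset for the metrics named in the call and left alone otherwise.

```python
class MeanAccumulator:
    """Toy set-metric that accumulates values across samples (hypothetical)."""

    def __init__(self):
        self.values = []

    def reset(self):
        self.values.clear()

    def update(self, value):
        self.values.append(value)

    def compute(self):
        return sum(self.values) / len(self.values) if self.values else None


class ToyEvaluator:
    """Long-lived scorer holding one accumulator per registered metric."""

    def __init__(self, names):
        self.accumulators = {name: MeanAccumulator() for name in names}

    def evaluate(self, samples, metrics=None):
        active = list(self.accumulators) if metrics is None else list(metrics)
        unknown = set(active) - set(self.accumulators)
        if unknown:
            raise KeyError(f"not registered: {sorted(unknown)}")
        # Reset only the metrics that run in this call.
        for name in active:
            self.accumulators[name].reset()
        for sample in samples:
            for name in active:
                self.accumulators[name].update(sample[name])
        return {name: self.accumulators[name].compute() for name in active}


ev = ToyEvaluator(["a", "b"])
ev.evaluate([{"a": 1.0, "b": 2.0}])
filtered = ev.evaluate([{"a": 3.0}], metrics=["a"])  # "b" is neither run nor reset
```

After the second call, the state accumulated for "b" in the first call is still available, which is what lets one instance alternate between corpora.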
Returns
dict[str, MetricResult] for the single-sample form;
:class:EvalResults (list-of-dict subclass with .corpus) for
the list form.
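The "return type follows the input shape" contract can be sketched as below. `ToyEvalResults`, `score_one`, and the free function `evaluate` are hypothetical stand-ins; in particular, what the real `.corpus` holds is not described on this page, so the per-metric mean used here is purely an assumption.

```python
class ToyEvalResults(list):
    """List of per-sample result dicts with a corpus-level summary attached."""

    def __init__(self, rows, corpus=None):
        super().__init__(rows)
        self.corpus = corpus if corpus is not None else {}


def score_one(sample):
    """Stand-in scorer: reports how many fields the sample carries."""
    return {"toy.num_fields": float(len(sample))}


def evaluate(samples=None, **kwargs):
    """Kwargs form returns one dict; list form returns a ToyEvalResults."""
    if samples is None:
        return score_one(kwargs)
    rows = [score_one(sample) for sample in samples]
    corpus = {}
    if rows:
        # Assumed corpus summary: mean of the per-sample scores.
        corpus["toy.num_fields"] = sum(r["toy.num_fields"] for r in rows) / len(rows)
    return ToyEvalResults(rows, corpus)


single = evaluate(video="clip.mp4", fps=24.0)
many = evaluate(samples=[{"video": "a.mp4"}, {"video": "b.mp4", "reference": "r.mp4"}])
```

Because the list form subclasses `list`, callers can index and iterate over per-sample results as usual and read the aggregate from `.corpus` when they need it.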