Skip to content

evaluator

User-facing scorer.

Layering (mirrors FastVideo's VideoGenerator → Worker pattern, but in-process)::

Evaluator               ← user-facing
  └── EvalWorker × N    ← single-GPU; owns metric replicas
  └── VideoPool         ← async path-→-tensor prefetch (per evaluate call)

The constructor builds one :class:EvalWorker per GPU and loads every metric on every worker eagerly. :meth:evaluate is the single entry point: pass kwargs for one sample, or pass a list of sample dicts to fan-out across GPU replicas with pipelined decoding — same method, return type follows the input shape.

Classes

fastvideo.eval.evaluator.Evaluator

Evaluator(metrics: list[str] | str = 'all', device: str = 'cuda:0', num_gpus: int = 1, compile: bool = False, *, loader_threads: int = 1, prefetch_factor: int = 2, pre_upload: bool = True, skip_missing_deps: bool = False)

Pre-initialized scorer for repeated evaluation.

Parameters

metrics : list[str] | str Metric names, group prefixes ("vbench"), or "all". device : str Single-GPU device (e.g. "cuda:0"). Ignored when num_gpus > 1. num_gpus : int Number of GPU replicas. Each gets its own :class:EvalWorker. compile : bool Apply :func:torch.compile to each metric's _model. loader_threads : int Background decode threads in the :class:VideoPool. Default 1 (hide decode behind compute). Bump for I/O-heavy benchmark sets where one loader can't keep up with the workers. prefetch_factor : int pool max_size = prefetch_factor * num_workers. Default 2 — one sample being consumed, one prefetched per worker. pre_upload : bool When True (default), the worker performs a single host→device upload of video / reference per sample before the metric loop, and every metric reads from that shared GPU-resident tensor. Without it, each metric pays its own .to(self.device) — N transfers of the same clip for N metrics, which dominates at high resolution. Set False for training-time eval, where keeping a clip resident on GPU across the metric loop would fight the training step for VRAM. skip_missing_deps : bool When True, silently drop explicit metric names whose optional deps aren't importable (with a one-line warning per skipped metric). Default False — an explicit name with a missing dep raises :class:ImportError at construction time. Group selectors ("vbench", "all") always silent-skip regardless of this flag.

Source code in fastvideo/eval/evaluator.py
def __init__(
    self,
    metrics: list[str] | str = "all",
    device: str = "cuda:0",
    num_gpus: int = 1,
    compile: bool = False,
    *,
    loader_threads: int = 1,
    prefetch_factor: int = 2,
    pre_upload: bool = True,
    skip_missing_deps: bool = False,
) -> None:
    names = _resolve_metric_names(metrics, skip_missing_deps=skip_missing_deps)
    if num_gpus > 1:
        self._workers = [
            EvalWorker(names,
                       f"cuda:{i}",
                       compile=compile,
                       pre_upload=pre_upload,
                       skip_missing_deps=skip_missing_deps) for i in range(num_gpus)
        ]
    else:
        self._workers = [
            EvalWorker(names, device, compile=compile, pre_upload=pre_upload, skip_missing_deps=skip_missing_deps)
        ]
    self._loader_threads = max(1, loader_threads)
    self._prefetch_factor = max(1, prefetch_factor)

Functions

fastvideo.eval.evaluator.Evaluator.evaluate
evaluate(samples: Iterable[dict] | None = None, *, metrics: list[str] | None = None, **kwargs) -> dict[str, MetricResult] | EvalResults

Score one sample (kwargs form) or many samples (list form).

Both forms go through the same :class:VideoPool pipeline; video / reference paths are decoded asynchronously.

Parameters

samples : Iterable of sample dicts. Omit and pass kwargs for a single-sample call. metrics : Subset of this Evaluator's registered metrics to actually run on this batch. None (default) runs all registered. Lets a single long-lived Evaluator score different (gen, ref) corpora with different metric subsets across multiple evaluate() calls — e.g. LPIPS on a paired corpus, FVD on an unequal-cardinality corpus — without burning model loads. Set-metric accumulators are reset only for the metrics included in metrics, so state for other set metrics is preserved across calls.

Single sample::

ev.evaluate(video=tensor, text_prompt="...", fps=24.0)

Many samples::

ev.evaluate(samples=[{"video": ..., "reference": ...}, ...])

Many samples with a metric filter::

ev.evaluate(samples=lpips_samples, metrics=["common.lpips"])
ev.evaluate(samples=fvd_samples,   metrics=["common.fvd"])
Returns

dict[str, MetricResult] for the single-sample form; :class:EvalResults (list-of-dict subclass with .corpus) for the list form.

Source code in fastvideo/eval/evaluator.py
def evaluate(
    self,
    samples: Iterable[dict] | None = None,
    *,
    metrics: list[str] | None = None,
    **kwargs,
) -> dict[str, MetricResult] | EvalResults:
    """Score one sample (kwargs form) or many samples (list form).

    Both forms go through the same :class:`VideoPool` pipeline;
    ``video`` / ``reference`` paths are decoded asynchronously.

    Parameters
    ----------
    samples :
        Iterable of sample dicts.  Omit and pass kwargs for a
        single-sample call.
    metrics :
        Subset of this Evaluator's registered metrics to actually
        run on this batch.  ``None`` (default) runs all registered.
        Lets a single long-lived Evaluator score different (gen,
        ref) corpora with different metric subsets across multiple
        ``evaluate()`` calls — e.g. LPIPS on a paired corpus, FVD
        on an unequal-cardinality corpus — without burning model
        loads.  Set-metric accumulators are reset only for the
        metrics included in *metrics*, so state for other set
        metrics is preserved across calls.

    Single sample::

        ev.evaluate(video=tensor, text_prompt="...", fps=24.0)

    Many samples::

        ev.evaluate(samples=[{"video": ..., "reference": ...}, ...])

    Many samples with a metric filter::

        ev.evaluate(samples=lpips_samples, metrics=["common.lpips"])
        ev.evaluate(samples=fvd_samples,   metrics=["common.fvd"])

    Returns
    -------
    dict[str, MetricResult] for the single-sample form;
    :class:`EvalResults` (list-of-dict subclass with ``.corpus``) for
    the list form.
    """
    if metrics is not None:
        unknown = [m for m in metrics if m not in self.metric_names]
        if unknown:
            raise ValueError(f"metrics filter contains names not registered on this Evaluator: "
                             f"{unknown}; registered: {self.metric_names}")

    single = samples is None
    sample_list: list[dict] = [kwargs] if samples is None else list(samples)
    if not sample_list:
        return EvalResults(samples=[], corpus={})

    if single:
        set_names = self._workers[0].set_metrics().keys()
        active_set = (set_names if metrics is None else (set_names & set(metrics)))
        if active_set:
            # Set metrics need a population. A single sample can't produce a
            # meaningful corpus result, and silently discarding it (return
            # ``per_sample[0]`` only) hides the no-op. Force the list form.
            raise ValueError("Set-vs-set metrics require samples=[...] with >=2 entries; "
                             "the kwargs form (single sample) cannot produce a corpus "
                             f"result. Active set metrics: {sorted(active_set)}")

    per_sample, corpus = self._run(sample_list, metric_filter=metrics)

    if single:
        return per_sample[0]
    return EvalResults(samples=per_sample, corpus=corpus)
fastvideo.eval.evaluator.Evaluator.release_cuda_memory
release_cuda_memory() -> None

Free CUDA caches on every replica without dropping models.

Source code in fastvideo/eval/evaluator.py
def release_cuda_memory(self) -> None:
    """Free CUDA caches on every replica without dropping models."""
    for w in self._workers:
        w.release_cuda_memory()
fastvideo.eval.evaluator.Evaluator.reload
reload() -> None

Rebuild metrics dropped by :meth:unload.

Source code in fastvideo/eval/evaluator.py
def reload(self) -> None:
    """Rebuild metrics dropped by :meth:`unload`."""
    for w in self._workers:
        w.reload()
fastvideo.eval.evaluator.Evaluator.shutdown
shutdown() -> None

No-op; kept for API compatibility with older callers.

Source code in fastvideo/eval/evaluator.py
def shutdown(self) -> None:
    """No-op; kept for API compatibility with older callers."""
fastvideo.eval.evaluator.Evaluator.unload
unload() -> None

Drop metric refs on every replica. Reverse with :meth:reload.

Source code in fastvideo/eval/evaluator.py
def unload(self) -> None:
    """Drop metric refs on every replica. Reverse with :meth:`reload`."""
    for w in self._workers:
        w.unload()

Functions