datasets ¶

Prompt-corpus datasets for end-to-end benchmark evaluation.

The public API mirrors that of the fastvideo.eval module (the metrics side):

from fastvideo.eval.datasets import (
    PromptDataset, Sample,
    register_dataset, get_dataset, list_datasets,
)

A dataset is an iterable of plain dicts, one per sample. Built-in datasets register themselves at import time. To add one, drop a module into this package that subclasses PromptDataset and decorate the class with @register_dataset("name"); auto-discovery picks it up.

Classes¶

fastvideo.eval.datasets.PromptDataset ¶

PromptDataset()

Iterable corpus of sample dicts. Subclasses populate self._rows.

Source code in fastvideo/eval/datasets/base.py

def __init__(self) -> None:
    self._rows: list[dict] = []

Methods:¶

fastvideo.eval.datasets.PromptDataset.by_dimension ¶

by_dimension() -> dict[str, list[dict]]

Group samples by dimension. A sample tagged with several dimensions appears under each of them.

Source code in fastvideo/eval/datasets/base.py

def by_dimension(self) -> dict[str, list[dict]]:
    """Group samples by dimension. A multi-dim sample appears under each."""
    out: dict[str, list[dict]] = {}
    for s in self._rows:
        for d in s.get("dimensions", ()):
            out.setdefault(d, []).append(s)
    return out
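As a worked example, the same grouping logic is written below as a free function over a list of rows so it runs standalone. The rows are hypothetical and exist only to illustrate the behaviour.

```python
def by_dimension(rows: list[dict]) -> dict[str, list[dict]]:
    # Same grouping logic as PromptDataset.by_dimension, taking the rows
    # as an argument instead of reading self._rows.
    out: dict[str, list[dict]] = {}
    for s in rows:
        for d in s.get("dimensions", ()):
            out.setdefault(d, []).append(s)
    return out


# Hypothetical rows for illustration.
rows = [
    {"prompt": "a red car", "dimensions": ["color", "object_class"]},
    {"prompt": "a person walking", "dimensions": ["object_class"]},
    {"prompt": "no dimensions here"},  # no "dimensions" key, so never grouped
]

groups = by_dimension(rows)
color_prompts = [r["prompt"] for r in groups["color"]]
object_prompts = [r["prompt"] for r in groups["object_class"]]
```

Each group holds references to the original row dicts, not copies, so mutating a row through one group is visible in every other group that contains it. Rows without a `dimensions` key are silently left out of the result.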

fastvideo.eval.datasets.Sample ¶

Bases: TypedDict

Documented schema for a row yielded by PromptDataset.

Only prompt is required. Extra keys beyond these are forwarded to the runner's eval-kwargs builder verbatim, so action-conditioned or audio-bearing benchmarks can add their own fields without changing the base class.
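This page does not list the fields of `Sample`, so the following is only a sketch of what a schema with a single required key could look like. The optional key names are taken from the rows that `VBenchPromptDataset` builds; their types, the class names `_SampleRequired` and `SampleSketch`, and all values are assumptions.

```python
from typing import TypedDict


class _SampleRequired(TypedDict):
    prompt: str  # the only key documented as required


class SampleSketch(_SampleRequired, total=False):
    # Optional keys; names mirror the rows built by VBenchPromptDataset,
    # types are assumed.
    n_samples: int
    dimensions: list[str]
    auxiliary_info: dict


# A minimal row and a fuller one (hypothetical values).
minimal: SampleSketch = {"prompt": "a cat on a sofa"}
full: SampleSketch = {
    "prompt": "a red car",
    "n_samples": 5,
    "dimensions": ["color"],
    "auxiliary_info": {"color": "red"},
}

required = SampleSketch.__required_keys__
optional = SampleSketch.__optional_keys__
```

A `TypedDict` is not enforced at runtime, which is consistent with extra keys passing through untouched: a row remains a plain dict, and the schema only guides static type checkers and readers.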

fastvideo.eval.datasets.VBenchPromptDataset ¶

VBenchPromptDataset(dimensions: list[str] | str = 'all', full_info_path: str | Path | None = None)

Bases: PromptDataset

VBench prompts filtered by evaluation dimension.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `dimensions` | `list[str] \| str` | List of dimension names, or `"all"`. Unknown dimensions raise `ValueError`. | `'all'` |
| `full_info_path` | `str \| Path \| None` | Optional override for `VBench_full_info.json`; defaults to autodetection. | `None` |

A prompt that belongs to several requested dimensions is yielded once; its dimensions list carries every match so the scorer can route it to each matching dimension.
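The selection rule can be sketched as follows. `resolve_dimensions` is a hypothetical helper that mirrors the validation in the constructor source on this page, and the entries are made-up records using the same `prompt_en` and `dimension` keys the constructor reads.

```python
def resolve_dimensions(dimensions, all_dims):
    # Mirrors the constructor's validation step.
    if dimensions == "all":
        return list(all_dims)
    unknown = set(dimensions) - set(all_dims)
    if unknown:
        raise ValueError(f"Unknown VBench dimensions: {sorted(unknown)}. "
                         f"Available: {all_dims}")
    return list(dimensions)


# Hypothetical entries shaped like the records the constructor reads.
entries = [
    {"prompt_en": "a red car", "dimension": ["color", "object_class"]},
    {"prompt_en": "a calm lake", "dimension": ["temporal_flickering"]},
]
all_dims = sorted({d for e in entries for d in e["dimension"]})

wanted = set(resolve_dimensions(["color", "object_class"], all_dims))
rows = []
for entry in entries:
    relevant = [d for d in entry["dimension"] if d in wanted]
    if not relevant:
        continue  # the prompt matches none of the requested dimensions
    rows.append({"prompt": entry["prompt_en"], "dimensions": relevant})

# A bare string other than "all" is iterated character by character.
try:
    resolve_dimensions("color", all_dims)
    error_message = ""
except ValueError as exc:
    error_message = str(exc)
```

The prompt matching both requested dimensions produces a single row whose `dimensions` list holds both names. Because the validation calls `set(dimensions)`, a single dimension passed as a bare string is split into characters and rejected, so wrapping it in a list (`["color"]`) is the safer call.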

Source code in fastvideo/eval/datasets/vbench.py

def __init__(
    self,
    dimensions: list[str] | str = "all",
    full_info_path: str | Path | None = None,
) -> None:
    super().__init__()
    path = Path(full_info_path) if full_info_path else _locate_full_info()
    with path.open() as f:
        entries = json.load(f)

    all_dims = sorted({d for e in entries for d in e["dimension"]})
    if dimensions == "all":
        self.dimensions: list[str] = all_dims
    else:
        unknown = set(dimensions) - set(all_dims)
        if unknown:
            raise ValueError(f"Unknown VBench dimensions: {sorted(unknown)}. "
                             f"Available: {all_dims}")
        self.dimensions = list(dimensions)

    wanted = set(self.dimensions)
    for entry in entries:
        relevant = [d for d in entry["dimension"] if d in wanted]
        if not relevant:
            continue
        n = (TEMPORAL_FLICKERING_SAMPLES if "temporal_flickering" in relevant else DEFAULT_SAMPLES)

        # Strip the outer {dim_name: ...} wrapper from upstream's aux
        # schema so every metric reads its inputs from a flat dict.
        #
        # This unwraps exactly one level — the dimension key. Whatever
        # shape lives inside is the metric's contract:
        #
        #   color:                {"color": {"color": "red"}}
        #     → flat: {"color": "red"}                  (scalar)
        #
        #   object_class:         {"object_class": {"object": "person"}}
        #     → flat: {"object": "person"}              (scalar)
        #
        #   multiple_objects:     {"multiple_objects": {"object": "a and b"}}
        #     → flat: {"object": "a and b"}             (scalar)
        #
        #   spatial_relationship: {"spatial_relationship":
        #                            {"spatial_relationship":
        #                                {"object_a": ..., "object_b": ...,
        #                                 "relationship": ...}}}
        #     → flat: {"spatial_relationship": {object_a,object_b,relationship}}
        #
        # Note the spatial_relationship case keeps a nested inner dict
        # by design — upstream double-wraps it, the SpatialRelationship
        # metric reads ``aux["spatial_relationship"]`` expecting that
        # inner dict. Don't "simplify" the wrapping away.
        raw_aux = entry.get("auxiliary_info") or {}
        flat_aux: dict = {}
        for v in raw_aux.values():
            if isinstance(v, dict):
                flat_aux.update(v)

        self._rows.append({
            "prompt": entry["prompt_en"],
            "n_samples": n,
            "dimensions": relevant,
            "auxiliary_info": flat_aux,
        })
    self.full_info_path = path
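The auxiliary-info unwrapping described in the comments above can be exercised in isolation. `flatten_aux` is a hypothetical helper that repeats the loop from the constructor; the input values are illustrative.

```python
from typing import Optional


def flatten_aux(raw_aux: Optional[dict]) -> dict:
    # Strip exactly one level: the outer {dimension_name: {...}} wrapper.
    flat: dict = {}
    for v in (raw_aux or {}).values():
        if isinstance(v, dict):
            flat.update(v)
    return flat


# Scalar case: the inner dict maps straight to a value.
color = flatten_aux({"color": {"color": "red"}})

# Double-wrapped case: one level is removed, the inner dict survives.
spatial = flatten_aux({
    "spatial_relationship": {
        "spatial_relationship": {
            "object_a": "a dog",
            "object_b": "a cat",
            "relationship": "on the left of",
        }
    }
})

# Missing or empty auxiliary info yields an empty dict.
missing = flatten_aux(None)

# Non-dict values under a dimension key are ignored.
ignored = flatten_aux({"color": "red"})
```

Since the inner dicts are merged with `update`, two dimension wrappers that share an inner key (for example `object`, used by both `object_class` and `multiple_objects`) would overwrite each other if a single entry ever carried both. Whether that happens in the upstream data is not stated here.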

Functions:¶

fastvideo.eval.datasets.get_dataset ¶

get_dataset(name: str, **kwargs: Any) -> BasePromptDataset

Instantiate a registered dataset by name.

Source code in fastvideo/eval/datasets/registry.py

def get_dataset(name: str, **kwargs: Any) -> BasePromptDataset:
    """Instantiate a registered dataset by name."""
    cls = _REGISTRY.get(name)
    if cls is None:
        available = ", ".join(sorted(_REGISTRY.keys()))
        raise KeyError(f"Unknown dataset '{name}'. Available: {available}")
    return cls(**kwargs)
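A standalone sketch of the lookup and its failure mode is shown below. It uses a local `_REGISTRY` stand-in and a hypothetical `EchoDataset` class. The return annotation is omitted because `BasePromptDataset` is not defined in this snippet.

```python
from typing import Any

_REGISTRY: dict[str, type] = {}  # local stand-in for the package registry


def get_dataset(name: str, **kwargs: Any):
    cls = _REGISTRY.get(name)
    if cls is None:
        available = ", ".join(sorted(_REGISTRY.keys()))
        raise KeyError(f"Unknown dataset '{name}'. Available: {available}")
    return cls(**kwargs)


class EchoDataset:  # hypothetical dataset class
    def __init__(self, dimensions="all"):
        self.dimensions = dimensions


_REGISTRY["echo"] = EchoDataset

# Keyword arguments are forwarded to the class constructor unchanged.
ds = get_dataset("echo", dimensions=["color"])

# An unknown name raises KeyError listing the registered names.
try:
    get_dataset("ech0")
    message = ""
except KeyError as exc:
    message = exc.args[0]
```

`str()` of a `KeyError` returns the repr of its argument, with surrounding quotes and escaped inner quotes, so reading `exc.args[0]` gives the cleaner message for logging.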

fastvideo.eval.datasets.list_datasets ¶

list_datasets() -> list[str]

Return sorted list of all registered dataset names.

Source code in fastvideo/eval/datasets/registry.py

def list_datasets() -> list[str]:
    """Return sorted list of all registered dataset names."""
    return sorted(_REGISTRY.keys())

fastvideo.eval.datasets.register_dataset ¶

register_dataset(name: str)

Decorator to register a prompt-dataset class.

Usage:

@register_dataset("vbench")
class VBenchPromptDataset(BasePromptDataset):
    ...

Source code in fastvideo/eval/datasets/registry.py

def register_dataset(name: str):
    """Decorator to register a prompt-dataset class.

    Usage::

        @register_dataset("vbench")
        class VBenchPromptDataset(BasePromptDataset):
            ...
    """

    def wrapper(cls):
        cls.name = name
        _REGISTRY[name] = cls
        return cls

    return wrapper
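Two side effects of the decorator follow directly from the source above: it sets a `name` attribute on the class, and a second registration under the same name silently replaces the first. A minimal sketch with a local registry and hypothetical classes `First` and `Second`:

```python
_REGISTRY: dict[str, type] = {}  # local stand-in for the package registry


def register_dataset(name: str):
    def wrapper(cls):
        cls.name = name
        _REGISTRY[name] = cls
        return cls

    return wrapper


@register_dataset("demo")
class First:  # hypothetical
    pass


@register_dataset("demo")  # same name: replaces First in the registry
class Second:  # hypothetical
    pass


winner = _REGISTRY["demo"]
registered_names = sorted(_REGISTRY.keys())
```

Both classes keep `name == "demo"`, but only `Second` remains reachable through the registry. Since no error is raised on a name collision, choosing a unique name for each new dataset module avoids shadowing a built-in one.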