vbench

VBench prompt corpus.

Single source of truth: upstream's VBench_full_info.json (946 entries, each with a prompt_en string, a dimension list, and an optional auxiliary_info dict keyed by dimension).
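As a rough illustration of that entry shape, the sketch below parses two hand-written entries. The prompts and the particular dimension names are made up for the example and are not copied from VBench_full_info.json; only the field names (prompt_en, dimension, auxiliary_info) and the dimension-keyed nesting come from the description above. `summarize` is a hypothetical helper, not part of fastvideo.

```python
import json

# Illustrative entries only: prompts and dimension choices are invented.
SAMPLE_FULL_INFO = """
[
  {"prompt_en": "a red car",
   "dimension": ["color"],
   "auxiliary_info": {"color": {"color": "red"}}},
  {"prompt_en": "a person walking",
   "dimension": ["subject_consistency", "motion_smoothness"]}
]
"""

entries = json.loads(SAMPLE_FULL_INFO)


def summarize(entry):
    """Return (prompt, dimensions, dimensions that carry auxiliary info)."""
    # auxiliary_info is optional, so fall back to an empty dict.
    aux = entry.get("auxiliary_info") or {}
    return entry["prompt_en"], entry["dimension"], sorted(aux)


summaries = [summarize(e) for e in entries]
for summary in summaries:
    print(summary)
```

The second entry has no auxiliary_info key at all, which is why readers of the file should treat that field as optional.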

Classes

fastvideo.eval.datasets.vbench.VBenchPromptDataset

VBenchPromptDataset(dimensions: list[str] | str = 'all', full_info_path: str | Path | None = None)

Bases: PromptDataset

VBench prompts filtered by evaluation dimension.

Parameters:

dimensions (list[str] | str, default: 'all')
    List of dimension names, or "all". Unknown dimensions raise ValueError.

full_info_path (str | Path | None, default: None)
    Optional override for VBench_full_info.json; defaults to autodetection.
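The dimension check can be sketched in isolation. `resolve_dimensions` below is a hypothetical standalone function that mirrors the validation step of the constructor; the set of available dimensions is a small made-up sample rather than the real list. One consequence worth seeing: a bare string other than "all" is treated as an iterable of characters, so a single dimension should be passed as a one-element list.

```python
from typing import Iterable, List, Sequence, Union


def resolve_dimensions(
    requested: Union[str, Sequence[str]],
    available: Iterable[str],
) -> List[str]:
    """Hypothetical mirror of the constructor's dimension validation."""
    all_dims = sorted(available)
    if requested == "all":
        return all_dims
    unknown = set(requested) - set(all_dims)
    if unknown:
        raise ValueError(f"Unknown VBench dimensions: {sorted(unknown)}. "
                         f"Available: {all_dims}")
    # The caller's order is preserved for an explicit list.
    return list(requested)


# Made-up sample of dimension names, for illustration only.
available = {"color", "object_class", "temporal_flickering"}

print(resolve_dimensions("all", available))
print(resolve_dimensions(["object_class", "color"], available))

try:
    # set("color") is {'c', 'o', 'l', 'r'}, none of which is a dimension.
    resolve_dimensions("color", available)
except ValueError as exc:
    print("rejected:", exc)
```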

A prompt that belongs to several requested dimensions is yielded once; its dimensions list carries every match so the scorer can route the prompt to each matching metric.
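A minimal sketch of that yield-once behavior, written as a standalone function rather than the class itself. `select_rows` is a hypothetical name, the entries are invented, and the two sample-count constants are placeholder values assumed for the example; the real constants live in fastvideo/eval/datasets/vbench.py and may differ.

```python
# Placeholder values, assumed for illustration only.
DEFAULT_SAMPLES = 5
TEMPORAL_FLICKERING_SAMPLES = 25


def select_rows(entries, wanted):
    """Keep each entry at most once, recording every requested dimension it matches."""
    rows = []
    for entry in entries:
        relevant = [d for d in entry["dimension"] if d in wanted]
        if not relevant:
            continue  # the prompt matches none of the requested dimensions
        n = (TEMPORAL_FLICKERING_SAMPLES
             if "temporal_flickering" in relevant else DEFAULT_SAMPLES)
        rows.append({
            "prompt": entry["prompt_en"],
            "n_samples": n,
            "dimensions": relevant,
        })
    return rows


# Invented entries, not taken from VBench_full_info.json.
entries = [
    {"prompt_en": "a red car", "dimension": ["color", "object_class"]},
    {"prompt_en": "a still lake", "dimension": ["temporal_flickering"]},
    {"prompt_en": "a dog running", "dimension": ["motion_smoothness"]},
]

rows = select_rows(entries, wanted={"color", "object_class", "temporal_flickering"})
for row in rows:
    print(row)
```

The first entry matches two requested dimensions but produces a single row whose dimensions list holds both names; the third entry matches nothing and is dropped.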

Source code in fastvideo/eval/datasets/vbench.py
def __init__(
    self,
    dimensions: list[str] | str = "all",
    full_info_path: str | Path | None = None,
) -> None:
    super().__init__()
    path = Path(full_info_path) if full_info_path else _locate_full_info()
    with path.open() as f:
        entries = json.load(f)

    all_dims = sorted({d for e in entries for d in e["dimension"]})
    if dimensions == "all":
        self.dimensions: list[str] = all_dims
    else:
        unknown = set(dimensions) - set(all_dims)
        if unknown:
            raise ValueError(f"Unknown VBench dimensions: {sorted(unknown)}. "
                             f"Available: {all_dims}")
        self.dimensions = list(dimensions)

    wanted = set(self.dimensions)
    for entry in entries:
        relevant = [d for d in entry["dimension"] if d in wanted]
        if not relevant:
            continue
        n = (TEMPORAL_FLICKERING_SAMPLES if "temporal_flickering" in relevant else DEFAULT_SAMPLES)

        # Strip the outer {dim_name: ...} wrapper from upstream's aux
        # schema so every metric reads its inputs from a flat dict.
        #
        # This unwraps exactly one level — the dimension key. Whatever
        # shape lives inside is the metric's contract:
        #
        #   color:                {"color": {"color": "red"}}
        #     → flat: {"color": "red"}                  (scalar)
        #
        #   object_class:         {"object_class": {"object": "person"}}
        #     → flat: {"object": "person"}              (scalar)
        #
        #   multiple_objects:     {"multiple_objects": {"object": "a and b"}}
        #     → flat: {"object": "a and b"}             (scalar)
        #
        #   spatial_relationship: {"spatial_relationship":
        #                            {"spatial_relationship":
        #                                {"object_a": ..., "object_b": ...,
        #                                 "relationship": ...}}}
        #     → flat: {"spatial_relationship": {object_a,object_b,relationship}}
        #
        # Note the spatial_relationship case keeps a nested inner dict
        # by design — upstream double-wraps it, the SpatialRelationship
        # metric reads ``aux["spatial_relationship"]`` expecting that
        # inner dict. Don't "simplify" the wrapping away.
        raw_aux = entry.get("auxiliary_info") or {}
        flat_aux: dict = {}
        for v in raw_aux.values():
            if isinstance(v, dict):
                flat_aux.update(v)

        self._rows.append({
            "prompt": entry["prompt_en"],
            "n_samples": n,
            "dimensions": relevant,
            "auxiliary_info": flat_aux,
        })
    self.full_info_path = path
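The auxiliary-info unwrapping in the listing above can be tried on its own. `flatten_aux` is a hypothetical standalone copy of that loop, and the sample values (such as the objects and the relationship string) are invented. The last call shows a property that follows from using dict.update: the loop merges every dimension wrapper in the entry, so if two wrappers share an inner key, the later one wins. Whether upstream entries ever combine such dimensions is not something this sketch establishes.

```python
def flatten_aux(raw_aux):
    """Strip exactly one level (the dimension key) from an auxiliary_info dict."""
    flat = {}
    for value in (raw_aux or {}).values():
        # Non-dict values are skipped, as in the listing above.
        if isinstance(value, dict):
            flat.update(value)
    return flat


color = flatten_aux({"color": {"color": "red"}})

# The double-wrapped case keeps its inner dict after one level is removed.
spatial = flatten_aux({
    "spatial_relationship": {
        "spatial_relationship": {
            "object_a": "dog",          # invented sample values
            "object_b": "cat",
            "relationship": "on the left of",
        }
    }
})

# Two wrappers sharing the inner key "object": the later one overwrites.
merged = flatten_aux({
    "object_class": {"object": "person"},
    "multiple_objects": {"object": "a and b"},
})

empty = flatten_aux(None)

print(color)
print(spatial)
print(merged)
print(empty)
```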

Functions