
scene

Modules

fastvideo.eval.metrics.vbench.scene.metric

VLM-based scene matching using AVoCaDO (Qwen2.5-Omni).

Replaces VBench's Tag2Text-based scene metric with one built on a modern VLM captioner. The algorithm follows VBench:

1. Caption the video.
2. Check whether all scene keywords appear in the caption.
3. Score = 1.0 if all keywords match, 0.0 otherwise.
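The keyword check and binary score in steps 2 and 3 can be sketched as follows, assuming the AVoCaDO caption is already available as a string. `scene_keywords` and `scene_score` are hypothetical helpers, not fastvideo APIs. The sketch also assumes that scene keywords come from splitting the scene description on non-alphanumeric characters and that matching is on whole, case-insensitive words. The actual metric may use plain substring matching instead.

```python
import re


def scene_keywords(scene: str) -> list[str]:
    """Hypothetical helper: split a scene description such as "living room"
    into lowercase alphanumeric keywords."""
    return re.findall(r"[a-z0-9]+", scene.lower())


def scene_score(caption: str, scene: str) -> float:
    """Hypothetical helper: return 1.0 if every scene keyword occurs as a whole
    word in the caption (case-insensitive), else 0.0."""
    # Assumption: tokenize the caption the same way as the scene description,
    # so that "bar" does not match inside "barn".
    caption_words = set(re.findall(r"[a-z0-9]+", caption.lower()))
    keywords = scene_keywords(scene)
    # All keywords must match; a single miss scores the video as 0.0.
    return 1.0 if all(k in caption_words for k in keywords) else 0.0
```

For example, `scene_score("A cozy living room with a sofa.", "living room")` returns `1.0`, and the same caption scored against `"kitchen"` returns `0.0`. Whole-word matching is stricter than a substring test. Because every keyword must match, the result stays all-or-nothing, as in step 3.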

Unlike VBench (which captions each frame separately with Tag2Text), AVoCaDO captions the entire video in one pass with rich natural language, making the keyword check more robust.

Classes

fastvideo.eval.metrics.vbench.scene.metric.SceneMetric
SceneMetric(model_path: str = 'AVoCaDO-Captioner/AVoCaDO')

Bases: BaseMetric

Source code in fastvideo/eval/metrics/vbench/scene/metric.py
def __init__(self, model_path: str = "AVoCaDO-Captioner/AVoCaDO") -> None:
    super().__init__()
    self._model: Any = None
    self._processor: Any = None
    self._model_path = model_path
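The constructor only records `model_path` and sets `_model` and `_processor` to `None`. This suggests that the AVoCaDO weights are loaded later, on first use, rather than at construction time; the loading code itself is not shown on this page. The sketch below isolates that lazy-loading pattern. `LazyCaptioner`, its `loader` argument, and `_ensure_loaded` are hypothetical names, not part of fastvideo.

```python
from typing import Any, Callable


class LazyCaptioner:
    """Hypothetical illustration of deferring model loading until first use."""

    def __init__(self, model_path: str, loader: Callable[[str], tuple[Any, Any]]) -> None:
        self._model_path = model_path
        # Assumption: the loader returns a (model, processor) pair for a path.
        self._loader = loader
        self._model: Any = None
        self._processor: Any = None

    def _ensure_loaded(self) -> None:
        # Load weights only once, on the first access that needs them.
        if self._model is None:
            self._model, self._processor = self._loader(self._model_path)

    @property
    def model(self) -> Any:
        self._ensure_loaded()
        return self._model

    @property
    def processor(self) -> Any:
        self._ensure_loaded()
        return self._processor
```

With this design, creating the metric object is cheap, and the cost of loading a large VLM is paid only if the metric is actually evaluated.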

Functions