scene
Modules
fastvideo.eval.metrics.vbench.scene.metric
VLM-based scene matching using AVoCaDO (Qwen2.5-Omni).
Replaces VBench's Tag2Text-based scene metric with one built on a modern VLM caption. The algorithm follows VBench:
1. Caption the video.
2. Check whether all scene keywords appear in the caption.
3. Score = 1.0 if all keywords match, 0.0 otherwise.
Unlike VBench (which captions each frame separately with Tag2Text), AVoCaDO captions the entire video in one pass with rich natural language, making the keyword check more robust.
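The steps above reduce to a single caption per video followed by a binary keyword check. The sketch below illustrates that flow under stated assumptions: `scene_match`, `scene_score`, and `caption_fn` are hypothetical names, not this module's API; `caption_fn` stands in for the AVoCaDO captioner, whose actual interface is not shown here. Case-insensitive whole-word matching is also an assumption, because the source does not say how the keyword check tokenizes text.

```python
import re
from typing import Callable, Iterable, Set


def _tokens(text: str) -> Set[str]:
    # Assumed tokenization: lowercase, then keep runs of letters and digits.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def scene_match(caption: str, scene_keywords: Iterable[str]) -> float:
    """Return 1.0 if every keyword word appears in the caption, else 0.0."""
    caption_words = _tokens(caption)
    required: Set[str] = set()
    for keyword in scene_keywords:
        # Multi-word scenes such as "living room" require every word.
        required |= _tokens(keyword)
    if not required:
        raise ValueError("no scene keywords given")
    # Step 3: binary score, all keywords present or nothing.
    return 1.0 if required <= caption_words else 0.0


def scene_score(
    video: object,
    scene_keywords: Iterable[str],
    caption_fn: Callable[[object], str],  # hypothetical stand-in for AVoCaDO
) -> float:
    # Step 1: caption the whole video in one pass.
    caption = caption_fn(video)
    # Steps 2 and 3: keyword check and binary score.
    return scene_match(caption, scene_keywords)
```

For example, `scene_match("A quiet beach at sunset.", ["beach"])` returns `1.0`, while `["living room"]` returns `0.0`. Whole-word matching keeps a keyword like "art" from matching inside "party", at the cost of missing plural or inflected forms; substring matching would make the opposite trade-off.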