vbench
¶
VBench metrics. Bootstraps the upstream submodule onto sys.path and installs runtime compat shims for modern torch/transformers/numpy/timm.
The upstream vbench source lives as a git submodule at
fastvideo/third_party/eval/vbench (pinned to a specific
Vchitect/VBench SHA). We do not pip-install it — we only need its
Python modules importable. Its runtime deps (clip, transformers, etc.)
are already in FastVideo's main env.
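A minimal sketch of making a vendored source tree importable without pip-installing it. The helper name `add_to_sys_path` and the example directory are hypothetical, and inserting at the front of `sys.path` is an assumption of this sketch, not necessarily what this package does.

```python
import sys
from pathlib import Path


def add_to_sys_path(directory):
    """Make modules under `directory` importable; safe to call repeatedly."""
    path = str(Path(directory).resolve())
    if path not in sys.path:
        # Front insertion lets the vendored copy shadow any installed copy.
        sys.path.insert(0, path)
    return path


# Hypothetical directory, used only to show that the call is idempotent.
vendored = add_to_sys_path("example_third_party/vendored_pkg")
add_to_sys_path("example_third_party/vendored_pkg")
```

Checking membership before inserting keeps `sys.path` free of duplicates when the package is imported more than once.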
Compat with modern dependency versions is achieved at import time, in this file, instead of via on-disk patches to upstream files. Each shim below corresponds to a specific drift between vbench's pinned-2023 deps and FastVideo's current pins. Adding a new shim is preferable to editing the submodule.
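One common shape for an import-time shim is to restore a name that a newer dependency release dropped, and to do so only when the name is actually missing. The sketch below illustrates that pattern on a stand-in module; `install_alias`, `fake_lib`, `old_name`, and `new_name` are hypothetical and are not the shims defined in this file.

```python
import types


def install_alias(module, name, value):
    """Set `module.<name>` to `value` only if the attribute is missing.

    Returns True when the alias was installed, False when the module
    already provided the name (so the real attribute is never overwritten).
    """
    if hasattr(module, name):
        return False
    setattr(module, name, value)
    return True


# Stand-in for a dependency whose newer release renamed a function.
fake_lib = types.ModuleType("fake_lib")
fake_lib.new_name = lambda x: x * 2

# Older calling code still expects `fake_lib.old_name`.
installed = install_alias(fake_lib, "old_name", fake_lib.new_name)
```

Because the shim is a no-op when the attribute exists, it can stay in place across dependency upgrades and downgrades.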
Modules¶
fastvideo.eval.metrics.vbench.aesthetic_quality
¶
fastvideo.eval.metrics.vbench.appearance_style
¶
fastvideo.eval.metrics.vbench.background_consistency
¶
fastvideo.eval.metrics.vbench.dynamic_degree
¶
Modules¶
fastvideo.eval.metrics.vbench.dynamic_degree.metric
¶
VBench Dynamic Degree — RAFT optical flow motion detection.
For each consecutive frame pair, computes optical flow via RAFT and takes the mean of the top 5% of flow magnitudes. If enough pairs exceed an adaptive threshold, the video is classified as dynamic (1.0); otherwise it is classified as static (0.0).
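The decision rule can be sketched in plain Python as follows. This is an illustration under stated assumptions: flow fields are given as lists of `(u, v)` vectors instead of RAFT output tensors, and `magnitude_threshold` and `min_dynamic_pairs` are hypothetical parameters standing in for the adaptive threshold, whose actual values are not given here.

```python
import math


def top_fraction_mean(flow, fraction=0.05):
    """Mean of the largest `fraction` of flow magnitudes for one frame pair."""
    magnitudes = sorted((math.hypot(u, v) for u, v in flow), reverse=True)
    if not magnitudes:
        raise ValueError("flow field is empty")
    k = max(1, round(len(magnitudes) * fraction))
    return sum(magnitudes[:k]) / k


def dynamic_degree(pair_flows, magnitude_threshold, min_dynamic_pairs):
    """Return 1.0 if enough frame pairs show strong motion, else 0.0."""
    dynamic_pairs = sum(
        top_fraction_mean(flow) > magnitude_threshold for flow in pair_flows
    )
    return 1.0 if dynamic_pairs >= min_dynamic_pairs else 0.0


# Toy flow fields: 5% of the vectors move by 5 pixels, the rest are still.
moving = [(3.0, 4.0)] * 5 + [(0.0, 0.0)] * 95
still = [(0.0, 0.0)] * 100
```

Averaging only the largest magnitudes keeps a small moving subject from being washed out by a large static background.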
Classes¶
fastvideo.eval.metrics.vbench.dynamic_degree.metric.DynamicDegreeMetric
¶
Functions¶
fastvideo.eval.metrics.vbench.human_action
¶
fastvideo.eval.metrics.vbench.imaging_quality
¶
fastvideo.eval.metrics.vbench.motion_smoothness
¶
Modules¶
fastvideo.eval.metrics.vbench.motion_smoothness.metric
¶
VBench Motion Smoothness — AMT-S frame interpolation quality.
Takes every other frame, uses AMT-S to interpolate the missing middle frames, then compares the interpolated frames against the actual ones. Score = (255 - mean_pixel_diff) / 255. Higher = smoother motion.
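The scoring formula can be illustrated with a small sketch. Assumptions: frames are flat lists of pixel values in the 0 to 255 range, `linear_midpoint` is a hypothetical stand-in for the AMT-S interpolator, and the difference is averaged per frame and then across frames, which may not match the exact reduction used by the metric.

```python
def linear_midpoint(frame_a, frame_b):
    """Stand-in interpolator: per-pixel average of the two neighbours."""
    return [(a + b) / 2 for a, b in zip(frame_a, frame_b)]


def motion_smoothness(frames, interpolate=linear_midpoint):
    """Score how well skipped frames are predicted from their neighbours."""
    kept = frames[::2]  # frames 0, 2, 4, ... are fed to the interpolator
    diffs = []
    for i in range(len(kept) - 1):
        predicted = interpolate(kept[i], kept[i + 1])
        actual = frames[2 * i + 1]  # the frame that was skipped
        per_pixel = [abs(p - a) for p, a in zip(predicted, actual)]
        diffs.append(sum(per_pixel) / len(per_pixel))
    if not diffs:
        raise ValueError("need at least three frames")
    mean_pixel_diff = sum(diffs) / len(diffs)
    return (255 - mean_pixel_diff) / 255


smooth = [[0, 0], [10, 10], [20, 20]]   # middle frame is exactly in between
jumpy = [[0, 0], [255, 255], [0, 0]]    # middle frame is a full-range flash
```

A video whose skipped frames are perfectly predictable scores 1.0, and the score falls toward 0.0 as the actual frames diverge from the interpolation.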
fastvideo.eval.metrics.vbench.multiple_objects
¶
fastvideo.eval.metrics.vbench.object_class
¶
fastvideo.eval.metrics.vbench.overall_consistency
¶
fastvideo.eval.metrics.vbench.scene
¶
Modules¶
fastvideo.eval.metrics.vbench.scene.metric
¶
VLM-based scene matching using AVoCaDO (Qwen2.5-Omni).
Replaces VBench's Tag2Text-based scene metric with a modern VLM caption. The algorithm follows VBench:
1. Caption the video
2. Check if all scene keywords appear in the caption
3. Score = 1.0 if all match, 0.0 otherwise
Unlike VBench (which captions each frame separately with Tag2Text), AVoCaDO captions the entire video in one pass with rich natural language, making the keyword check more robust.
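The keyword check in steps 2 and 3 can be sketched as below, with the caption supplied as a plain string instead of coming from the VLM. The function name `scene_score`, the whitespace split of the scene label into keywords, and the whole-word matching are assumptions of this sketch.

```python
import re


def scene_score(caption, scene):
    """Return 1.0 if every word of the scene label occurs in the caption."""
    caption_words = set(re.findall(r"[a-z0-9]+", caption.lower()))
    keywords = scene.lower().split()
    if not keywords:
        raise ValueError("scene label is empty")
    return 1.0 if all(word in caption_words for word in keywords) else 0.0


alley_caption = "A narrow alley runs between two brick buildings."
garden_caption = "Visitors walk through a botanical garden."
```

Matching on whole words avoids false positives from substrings, for example "bar" inside "barn".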
fastvideo.eval.metrics.vbench.spatial_relationship
¶
fastvideo.eval.metrics.vbench.subject_consistency
¶
fastvideo.eval.metrics.vbench.temporal_flickering
¶
fastvideo.eval.metrics.vbench.temporal_style
¶
Modules¶
fastvideo.eval.metrics.vbench.temporal_style.metric
¶
VBench Temporal Style — ViCLIP text-video alignment (style focus).
Identical logic to overall_consistency — same ViCLIP cosine similarity. The difference is semantic: overall_consistency measures general prompt alignment while temporal_style measures style consistency over time. VBench uses different prompts for each from its metadata JSON.
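The shared scoring step is a cosine similarity between a text embedding and a video embedding. The sketch below shows only that step, with short hand-written vectors standing in for ViCLIP embeddings; the vectors and variable names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy embeddings; real ones would come from the text and video encoders.
video_embedding = [3.0, 4.0]
style_prompt_embedding = [3.0, 4.0]
unrelated_prompt_embedding = [-4.0, 3.0]
```

Under this reading, the two metrics differ only in which prompt is embedded, not in how the similarity is computed.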