metric

VBench Overall Consistency: ViCLIP text-video alignment. Encodes 8 sampled video frames with the ViCLIP vision encoder and the text prompt with the ViCLIP text encoder, then computes the cosine similarity between the two embeddings.
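The two mechanical steps in that description, picking 8 frames and scoring by cosine similarity, can be sketched in plain Python. This is an illustrative sketch only: the helper names `sample_frame_indices` and `cosine_similarity` are hypothetical, evenly spaced sampling is an assumption (the description does not say how the 8 frames are chosen), and the embeddings are hand-written placeholders rather than real ViCLIP outputs.

```python
import math
from typing import List, Sequence


def sample_frame_indices(num_frames: int, num_samples: int = 8) -> List[int]:
    """Return num_samples evenly spaced frame indices (assumed sampling scheme)."""
    if num_frames <= 0 or num_samples <= 0:
        raise ValueError("num_frames and num_samples must be positive")
    segment = num_frames / num_samples
    # Take the midpoint of each equal-length segment, clamped to a valid index.
    return [min(num_frames - 1, int(segment * (i + 0.5))) for i in range(num_samples)]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Placeholder embeddings standing in for the ViCLIP vision and text encoder outputs.
video_embedding = [0.6, 0.8, 0.0]
text_embedding = [0.6, 0.0, 0.8]

indices = sample_frame_indices(num_frames=64)  # which frames would be encoded
score = cosine_similarity(video_embedding, text_embedding)
print(indices, round(score, 4))
```

In a real pipeline the placeholder vectors would be replaced by the encoder outputs for the sampled frames and the prompt; the score lies in [-1, 1], with higher values indicating closer text-video alignment.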