
metric

VBench Overall Consistency — ViCLIP text-video alignment.

Samples 8 frames from the video and encodes them with the ViCLIP vision encoder, encodes the text prompt with the ViCLIP text encoder, then computes the cosine similarity between the resulting video and text embeddings.
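The pipeline described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the VBench implementation. `encode_video` and `encode_text` are hypothetical callables standing in for the ViCLIP vision and text encoders. The midpoint-of-segment frame sampling in `sample_frame_indices` is also an assumed strategy; VBench may select its 8 frames differently.

```python
import math
from typing import Callable, Sequence, TypeVar

Frame = TypeVar("Frame")


def sample_frame_indices(total_frames: int, num_samples: int = 8) -> list[int]:
    """Pick num_samples indices spread uniformly over the clip (assumed strategy)."""
    if total_frames <= 0:
        raise ValueError("video has no frames")
    # Split the clip into num_samples equal segments and take each segment's midpoint.
    segment = total_frames / num_samples
    return [min(int(segment * i + segment / 2), total_frames - 1) for i in range(num_samples)]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embedding dimensions differ")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("zero-length embedding")
    return dot / (norm_a * norm_b)


def overall_consistency(
    frames: Sequence[Frame],
    prompt: str,
    encode_video: Callable[[list[Frame]], Sequence[float]],  # hypothetical ViCLIP vision encoder
    encode_text: Callable[[str], Sequence[float]],  # hypothetical ViCLIP text encoder
    num_samples: int = 8,
) -> float:
    """Score text-video alignment as cosine similarity of the two embeddings."""
    indices = sample_frame_indices(len(frames), num_samples)
    clip = [frames[i] for i in indices]
    video_emb = encode_video(clip)  # one embedding for the whole sampled clip
    text_emb = encode_text(prompt)
    return cosine_similarity(video_emb, text_emb)
```

With toy encoders, e.g. `overall_consistency(list(range(16)), "a cat", lambda clip: [1.0, 0.0], lambda p: [1.0, 1.0])`, the score is about 0.7071. Passing the encoders in as callables keeps the scoring logic independent of how the real ViCLIP model is loaded.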

Classes

Functions