metric

VBench Appearance Style: CLIP ViT-B/32 text-image alignment. For each frame, the metric computes the cosine similarity between the CLIP image features of that frame and the CLIP text features of a prompt describing the expected style. Requires sample["text_prompt"].
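The per-frame computation can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the metric's actual implementation. `encode_image` and `encode_text` are hypothetical callables standing in for the CLIP ViT-B/32 image and text encoders. `sample["frames"]` is a hypothetical key for the video frames. Only `sample["text_prompt"]` comes from the documentation. Cosine similarity is written out by hand so the example runs without CLIP or NumPy installed.

```python
import math
from typing import Any, Callable, List, Mapping, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two feature vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cannot compare a zero-norm feature vector")
    return dot / (norm_a * norm_b)


def per_frame_style_scores(
    sample: Mapping[str, Any],
    encode_image: Callable[[Any], Vector],  # hypothetical stand-in for the CLIP ViT-B/32 image encoder
    encode_text: Callable[[str], Vector],   # hypothetical stand-in for the CLIP ViT-B/32 text encoder
) -> List[float]:
    """Return one image-text cosine similarity per frame."""
    if "text_prompt" not in sample:
        raise KeyError('the metric requires sample["text_prompt"]')
    # The style prompt is encoded once and compared against every frame.
    text_features = encode_text(sample["text_prompt"])
    # "frames" is a hypothetical key; the real sample layout may differ.
    return [cosine_similarity(encode_image(frame), text_features) for frame in sample["frames"]]


if __name__ == "__main__":
    # Toy 2-D "features" so the sketch runs without CLIP installed.
    toy_sample = {
        "text_prompt": "a watercolor painting",
        "frames": [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    }
    scores = per_frame_style_scores(
        toy_sample,
        encode_image=lambda frame: frame,       # frames are already feature vectors here
        encode_text=lambda prompt: [1.0, 0.0],  # fixed toy text embedding
    )
    print(scores)  # 1.0, 0.0, then about 0.707 (cosine of 45 degrees)
```

Each score lies in [-1, 1]. The text above does not say how per-frame scores are reduced to a single video score, so any aggregation (for example, a mean over frames) would be an additional assumption.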