VBench Human Action — UMT ViT-L/16 action classification (Kinetics-400).
Classifies human actions in 16-frame clips. Top-5 predictions with
confidence >= 0.85 are compared against the ground-truth action label.
Score = 1.0 if a match is found, 0.0 otherwise.
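
The scoring rule above can be sketched in plain Python. This is an illustrative sketch, not the module's actual code: the function name `human_action_score`, its parameters, and the assumption that labels are compared by exact string equality are all hypothetical, and the model's per-class confidences are assumed to arrive as a flat sequence of floats.

```python
from typing import Sequence


def human_action_score(
    probs: Sequence[float],        # per-class confidences for one clip (assumed input shape)
    class_names: Sequence[str],    # class names aligned with probs (hypothetical argument)
    gt_label: str,                 # ground-truth action label
    top_k: int = 5,                # "top-5" from the description above
    threshold: float = 0.85,       # confidence cut-off from the description above
) -> float:
    """Return 1.0 if gt_label is among the confident top-k predictions, else 0.0."""
    # Indices of the top_k highest-confidence classes.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    # Keep only the predictions that clear the confidence threshold.
    confident = {class_names[i] for i in ranked if probs[i] >= threshold}
    # Exact string match against the ground truth (an assumption of this sketch).
    return 1.0 if gt_label in confident else 0.0


names = ["archery", "bowling", "juggling", "surfing", "yoga", "zumba"]
probs = [0.90, 0.04, 0.03, 0.01, 0.01, 0.01]
print(human_action_score(probs, names, "archery"))  # confident top-1 match
print(human_action_score(probs, names, "bowling"))  # in the top-5, but below 0.85
```

If the confidences are softmax probabilities that sum to 1, at most one class can reach 0.85, so in that case the top-5 filter effectively reduces to checking a single confident top-1 prediction.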
Classes
Functions