extractors

Pluggable video feature extractors for FVD.

Three extractors, sharing the _BaseExtractor contract:

  • i3d — Kinetics-400 I3D (TorchScript, flateon/FVD-I3D-torchscript). The standard FVD feature space used in the literature.
  • clip — CLIP ViT-B/32 per-frame embeddings, mean-pooled over time. Captures semantic / content quality.
  • videomae — VideoMAE-base last-hidden-state, mean-pooled over patch tokens. Captures structural / motion quality.

The contract is intentionally narrow: each extractor takes a (B, T, C, H, W) float tensor with values in [0, 1] and returns a (B, D) NumPy feature array. Preprocessing (resize, normalize, layout) is the extractor's job, so callers do not need to know what any particular model expects as input.
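The shape contract above can be illustrated with a toy extractor. This is only a sketch under stated assumptions: `ToyFrameMeanExtractor` is a hypothetical name, it is not part of fastvideo and does not subclass `_BaseExtractor`, it uses NumPy arrays in place of torch tensors so it runs without any model weights, and its "embedding" is a fixed random projection rather than a real network. It follows the same outline as the clip extractor described above (per-frame features, mean-pooled over time).

```python
import numpy as np


class ToyFrameMeanExtractor:
    """Hypothetical stand-in for an extractor; illustrates the (B, T, C, H, W) -> (B, D) contract."""

    def __init__(self, feature_dim: int = 8, seed: int = 0) -> None:
        rng = np.random.default_rng(seed)
        # Fixed random projection from per-frame channel means (assumes C == 3) to D features.
        self.proj = rng.standard_normal((3, feature_dim)).astype(np.float32)

    def __call__(self, videos: np.ndarray) -> np.ndarray:
        if videos.ndim != 5:
            raise ValueError(f"expected (B, T, C, H, W), got shape {videos.shape}")
        if videos.min() < 0.0 or videos.max() > 1.0:
            raise ValueError("expected pixel values in [0, 1]")
        # Preprocessing is owned by the extractor: here, rescale [0, 1] to [-1, 1].
        x = videos.astype(np.float32) * 2.0 - 1.0
        # Per-frame embedding: spatial mean per channel gives (B, T, C), projected to (B, T, D).
        per_frame = x.mean(axis=(3, 4)) @ self.proj
        # Mean-pool over the time axis to get one (D,) vector per video.
        return per_frame.mean(axis=1)


videos = np.random.default_rng(1).random((2, 16, 3, 32, 32), dtype=np.float32)
features = ToyFrameMeanExtractor()(videos)
print(features.shape)  # (2, 8)
```

The caller only supplies frames in [0, 1] and receives a (B, D) array; the rescaling and pooling stay inside the extractor, which is the point of keeping the contract narrow.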

Functions

fastvideo.eval.metrics.common.fvd.extractors.load_extractor

load_extractor(name: str, device: torch.device) -> _BaseExtractor

Instantiate the named extractor on device. Raises ValueError on unknown names.

Source code in fastvideo/eval/metrics/common/fvd/extractors.py
def load_extractor(name: str, device: torch.device) -> _BaseExtractor:
    """Instantiate the named extractor on *device*. Raises ``ValueError`` on unknown names."""
    cls = _EXTRACTORS.get(name)
    if cls is None:
        raise ValueError(f"Unknown FVD extractor '{name}'. Available: {available_extractors()}")
    return cls(device)
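The lookup-then-raise pattern in `load_extractor` can be tried in isolation with a self-contained sketch. Everything here is hypothetical and only mirrors the structure of the source above: `_TOY_EXTRACTORS`, `ToyI3D`, `ToyClip`, `available_toy_extractors`, and `load_toy_extractor` are illustrative names, the device is a plain string rather than a `torch.device`, and the real registry's contents and the real `available_extractors()` are not shown in this section.

```python
from typing import Dict, List, Type


class ToyExtractor:
    """Hypothetical base class standing in for _BaseExtractor."""

    def __init__(self, device: str) -> None:
        self.device = device


class ToyI3D(ToyExtractor):
    pass


class ToyClip(ToyExtractor):
    pass


# Hypothetical registry mapping extractor names to classes.
_TOY_EXTRACTORS: Dict[str, Type[ToyExtractor]] = {"i3d": ToyI3D, "clip": ToyClip}


def available_toy_extractors() -> List[str]:
    """Return the registered names in a stable order for error messages."""
    return sorted(_TOY_EXTRACTORS)


def load_toy_extractor(name: str, device: str) -> ToyExtractor:
    """Instantiate the named toy extractor, or raise ValueError on unknown names."""
    cls = _TOY_EXTRACTORS.get(name)
    if cls is None:
        raise ValueError(
            f"Unknown extractor '{name}'. Available: {available_toy_extractors()}"
        )
    return cls(device)


extractor = load_toy_extractor("clip", "cpu")

try:
    load_toy_extractor("resnet", "cpu")
except ValueError as exc:
    error_message = str(exc)
```

Listing the available names in the error message lets a caller who mistypes a name see the valid choices without opening the source.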