io
¶
Classes¶
fastvideo.eval.io.NoAudioStreamError
¶
Bases: ValueError
Raised when a video file has no audio stream to extract.
Functions¶
fastvideo.eval.io.as_video
¶
Coerce path/tensor/Video → :class:Video for the pool to decode.
Path strings and :class:pathlib.Path become Video(source=str(x));
the pool then calls :func:load_video on first use. Tensors become
Video(source=None, frames=x) — the pool sees .frames already
populated and forwards untouched. :class:Video instances pass through.
Source code in fastvideo/eval/io/inputs.py
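The three coercion branches described above can be sketched as follows. This is an illustrative sketch only: the `Video` dataclass here is a hypothetical stand-in with just the two fields the description mentions (`source`, `frames`), and `as_video_sketch` is not the library function.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Optional


@dataclass
class Video:
    """Hypothetical stand-in for the library's Video container."""
    source: Optional[str] = None
    frames: Any = None  # would hold an already-decoded tensor


def as_video_sketch(x: Any) -> Video:
    """Mirror the three documented branches: Video, path, tensor."""
    if isinstance(x, Video):
        return x  # Video instances pass through unchanged
    if isinstance(x, (str, Path)):
        return Video(source=str(x))  # decoding is deferred to the pool
    # Assumption: anything else is treated as pre-decoded frames.
    return Video(source=None, frames=x)


from_path = as_video_sketch(Path("clip.mp4"))
existing = Video(source="a.mp4")
from_frames = as_video_sketch([[0.0]])  # a nested list stands in for a tensor
```

The point of the pass-through branch is that callers can mix paths, tensors, and Video objects in one list without checking types themselves.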
fastvideo.eval.io.build_eval_kwargs
¶
Build evaluator kwargs from a sample row + a video on disk.
Loads the video as (T,C,H,W) and adds the leading batch dim.
Forwards prompt (as scalar text_prompt) and
auxiliary_info (as scalar dict) when present on the row —
matches the one-sample-per-call contract that the evaluator and
every metric assume.
Source code in fastvideo/eval/io/paths.py
fastvideo.eval.io.default_filename
¶
fastvideo.eval.io.extract_audio_track
¶
extract_audio_track(video_path: str | Path, *, output_dir: str | Path, sample_rate: int | None = None, codec: str = 'pcm_s16le') -> Path
Pull the first audio stream from video_path to a .wav in output_dir.
Idempotent — if output_dir / {stem}.wav already exists, the
existing file is returned without re-decoding. This makes the
helper safe to call repeatedly from parallel workers on overlapping
inputs (the cache hit is the fast path).
Parameters¶
video_path :
Source video file (anything PyAV can open: mp4, mkv, webm, …).
output_dir :
Directory the .wav lands in. Created if it doesn't exist.
sample_rate :
If set, resample to this rate. Default (None) preserves the
source rate — most audio metrics resample again internally to
their own target rate, so adding a resample step here would just
be wasted work.
codec :
PCM codec for the output. pcm_s16le (default) gives 16-bit
little-endian PCM, the most broadly compatible WAV format.
Returns¶
Path
Absolute path to the extracted .wav.
Raises¶
NoAudioStreamError
If video_path has no audio stream.
Source code in fastvideo/eval/io/audio.py
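The idempotent "return the existing .wav if present" behaviour can be illustrated with a small stdlib-only sketch. Everything here is hypothetical: `extract_once` is not the library function, and the `decode` callback stands in for the real PyAV decoding step, which is not shown in this reference. The sketch also does not guard against two workers missing the cache at the same instant.

```python
import tempfile
from pathlib import Path
from typing import Callable, Union

PathLike = Union[str, Path]


def extract_once(video_path: PathLike, output_dir: PathLike,
                 decode: Callable[[Path, Path], None]) -> Path:
    """Write <output_dir>/<stem>.wav unless it already exists."""
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)  # created if missing
    wav = (out_dir / f"{Path(video_path).stem}.wav").resolve()
    if wav.exists():
        return wav  # cache hit: skip decoding entirely
    decode(Path(video_path), wav)  # hypothetical decode step
    return wav


calls = []


def fake_decode(src: Path, dst: Path) -> None:
    """Stand-in decoder that records the call and writes a dummy file."""
    calls.append(src)
    dst.write_bytes(b"RIFF")


with tempfile.TemporaryDirectory() as tmp:
    first = extract_once("clip.mp4", tmp, fake_decode)
    second = extract_once("clip.mp4", tmp, fake_decode)
```

In this sketch the second call returns the same path without invoking the decoder again, which is the property that makes repeated calls cheap.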
fastvideo.eval.io.extract_frames
¶
extract_frames(video: Tensor, n_frames: int | None = None) -> Tensor
Uniformly sample n_frames from a (T, C, H, W) video tensor.
Source code in fastvideo/eval/io/video.py
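Uniform sampling amounts to picking evenly spaced indices along the T axis. The sketch below shows one common way to compute such indices with integer arithmetic; the exact index rule used by extract_frames is not stated here, and treating `n_frames=None` (or a request for more frames than exist) as "keep every frame" is an assumption.

```python
from typing import Optional


def uniform_indices(total: int, n_frames: Optional[int] = None) -> list:
    """Return evenly spaced frame indices in the range [0, total)."""
    if n_frames is None or n_frames >= total:
        return list(range(total))  # assumption: nothing to subsample
    if n_frames < 1:
        raise ValueError("n_frames must be positive")
    # Start of each of n_frames equal-width bins along the time axis.
    return [(i * total) // n_frames for i in range(n_frames)]


picked = uniform_indices(10, 5)
```

With a real (T, C, H, W) tensor, the resulting list would be used to index the first dimension.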
fastvideo.eval.io.glob_videos
¶
Find every generated video for row, sorted by trailing -<idx>.
Source code in fastvideo/eval/io/paths.py
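Sorting by a trailing `-<idx>` matters because plain string sorting puts `-10` before `-2`. The following sketch shows the idea; the filename pattern (`<name>-<idx>.<ext>`) and the helper names are assumptions for illustration, not the library's implementation.

```python
import re
from pathlib import Path

# Assumed pattern: the file stem ends in "-<digits>".
_TRAILING_IDX = re.compile(r"-(\d+)$")


def trailing_index(path) -> int:
    """Parse the numeric suffix of the stem; -1 when there is none."""
    match = _TRAILING_IDX.search(Path(path).stem)
    return int(match.group(1)) if match else -1


def sort_by_trailing_index(paths):
    """Order paths numerically by their trailing index."""
    return sorted(paths, key=trailing_index)


names = ["cat-10.mp4", "cat-2.mp4", "cat-1.mp4"]
numeric_order = sort_by_trailing_index(names)
lexical_order = sorted(names)
```

Here `numeric_order` places `cat-2.mp4` before `cat-10.mp4`, while `lexical_order` does not.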
fastvideo.eval.io.load_video
¶
Load a video as a (T, C, H, W) float32 tensor in [0, 1].
Supported source types:
- str / Path – path to a .mp4/.avi/.gif file, or a directory of frame images (sorted alphabetically).
- torch.Tensor – returned as-is after shape validation.
- list[PIL.Image] – stacked into a tensor.
Source code in fastvideo/eval/io/video.py
fastvideo.eval.io.samples_from
¶
samples_from(*, video: PathSpec | None = None, reference: PathSpec | None = None, audio: PathSpec | None = None, reference_audio: PathSpec | None = None, text_prompt: str | None = None, text_prompts: str | Path | list[str] | None = None, fps: float | None = None, auxiliary_info: dict | list[dict] | None = None, extras: dict | list[dict] | None = None, extract_audio: bool | str | Path = False, extract_workers: int = 4) -> list[dict]
Build a samples list from path-style inputs.
Parameters¶
video, reference, audio, reference_audio :
File path, directory of files (sorted by name), or any iterable
of paths. Pass whichever modalities apply to the metrics you
plan to run — they attach to sample["video"] /
sample["reference"] / sample["audio"] /
sample["reference_audio"] respectively. Video paths are
wrapped in :class:Video so :class:VideoPool decodes them
lazily in parallel; audio paths stay as strings (audio metrics
each load with their own resample / preprocess).
text_prompt :
A single prompt string broadcast onto every sample.
text_prompts :
A list of strings (one per sample), or a path to a .jsonl /
.json file containing per-sample prompts.
fps :
Scalar fps broadcast onto every sample.
auxiliary_info :
Single dict (broadcast) or list of dicts (zipped) for
sample["auxiliary_info"] — vbench structured-prompt
metrics read this.
extras :
Catch-all per-sample attachments. Use for metric-specific keys
the dedicated kwargs don't cover (scenario, view,
actions, calibration, reference_take2, ...). Pass
a single dict to broadcast or a list-of-dicts to zip; the keys
merge into each sample dict.
extract_audio :
If truthy, auto-extract audio from each video /
reference source into .wav files via PyAV and attach
the paths under sample["audio"] / sample["reference_audio"].
Pass a path for a persistent cache, True for a tempdir.
Skipped silently for videos with no audio stream; ignored
wherever audio / reference_audio is already explicit.
extract_workers :
Parallel workers for extract_audio.
Returns¶
list[dict]
Canonical samples shape — hand directly to
:meth:Evaluator.evaluate.
Cardinality and shape¶
Let N = len(generated inputs) (the common length of whichever
of video / audio you passed). References are attached
1:1 onto the first N samples; any surplus references (when |ref| > N)
become standalone role-tagged samples at the end of the list, so
set metrics like FVD see the full reference corpus while per-sample
paired metrics like LPIPS run only on the first N pairs.
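The pairing rule can be sketched in a few lines. This is a simplified illustration, not the library code: plain strings stand in for paths, and the `"role"` key and its value are hypothetical, since the actual layout of a role-tagged sample is not spelled out here.

```python
def pair_samples(generated, references):
    """Attach references 1:1, then append unmatched ones as set samples."""
    n = len(generated)
    samples = [{"video": g} for g in generated]
    # zip stops at the shorter input, so only the first N references pair up.
    for sample, ref in zip(samples, references):
        sample["reference"] = ref
    # References beyond N become standalone samples at the end of the list.
    for ref in references[n:]:
        samples.append({"reference": ref, "role": "reference"})  # hypothetical tag
    return samples


samples = pair_samples(["g0", "g1"], ["r0", "r1", "r2", "r3"])
```

In this sketch a paired metric would look only at dicts that carry both `"video"` and `"reference"`, while a set metric would collect every `"reference"` value.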
Notes¶
"Missing" keys are simply absent from the sample dict. Metrics
handle them per their own contract (sample.get(...) for
optional, sample[...] raises for required, or
:meth:BaseMetric._skip for opt-in skip behavior). One fat samples
list with many keys can serve many metrics — each reads its subset.
Source code in fastvideo/eval/io/inputs.py
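The optional-versus-required key convention from the notes can be illustrated with two toy readers. Both functions are hypothetical examples and not real metrics; the default fps value is arbitrary.

```python
sample = {"video": "clip.mp4", "fps": 24.0, "scenario": "kitchen"}


def fps_or_default(sample, default=30.0):
    """Optional key: fall back when the sample does not carry it."""
    return sample.get("fps", default)


def prompt_word_count(sample):
    """Required key: a missing prompt surfaces as KeyError."""
    return len(sample["text_prompt"].split())


# The sample above has no "text_prompt", so the required read fails loudly.
try:
    prompt_word_count(sample)
    missing_prompt_raised = False
except KeyError:
    missing_prompt_raised = True
```

Neither reader touches `"scenario"`, which shows how unrelated keys can ride along on the same sample without affecting a metric that does not use them.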
fastvideo.eval.io.sanitize_prompt
¶
Modules¶
fastvideo.eval.io.audio
¶
Audio-extraction helper for video files.
Audio metrics (audio.clap_score, audio.frechet_distance, …) read
file paths under sample["audio"] and do their own per-metric
preprocessing (CLAP at 48 kHz, PaSST at 32 kHz, whisper at 16 kHz mono,
…). When the source video carries an audio track (V2A / T2A-V model
outputs), :func:samples_from(extract_audio=True) calls
:func:extract_audio_track to pull a .wav next to it once per video.
Why a separate utility instead of in-pool decode: every audio metric wants a different sample rate / channel count / format, so the pool can't usefully pre-load audio into a single canonical tensor the way it pre-loads video frames. Paths-in / paths-out is the lingua franca for audio in this codebase.
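The "extract once per video, skip videos without audio" flow can be sketched with a thread pool. This is a rough illustration under stated assumptions: whether the library uses threads or processes is not documented here, `fake_extract` is a stand-in for extract_audio_track, and the exception class is redefined locally so the snippet runs on its own.

```python
from concurrent.futures import ThreadPoolExecutor


class NoAudioStreamError(ValueError):
    """Local stand-in mirroring the documented exception."""


def fake_extract(video_path: str) -> str:
    """Hypothetical extractor: fails for paths containing 'silent'."""
    if "silent" in video_path:
        raise NoAudioStreamError(video_path)
    return video_path.rsplit(".", 1)[0] + ".wav"


def extract_all(video_paths, extract, workers=4):
    """Map each video to its .wav path, or None when it has no audio."""
    def safe(path):
        try:
            return extract(path)
        except NoAudioStreamError:
            return None  # skipped silently, as described for samples_from

    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(safe, video_paths))  # map preserves input order
    return dict(zip(video_paths, results))


audio_paths = extract_all(["a.mp4", "silent.mp4", "b.mkv"], fake_extract)
```

Because the results are file paths rather than tensors, each downstream audio metric remains free to load and resample the file in its own way.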
Classes¶
fastvideo.eval.io.audio.NoAudioStreamError
¶
Bases: ValueError
Raised when a video file has no audio stream to extract.
Functions¶
fastvideo.eval.io.audio.extract_audio_track
¶
extract_audio_track(video_path: str | Path, *, output_dir: str | Path, sample_rate: int | None = None, codec: str = 'pcm_s16le') -> Path
Pull the first audio stream from video_path to a .wav in output_dir.
Idempotent — if output_dir / {stem}.wav already exists, the
existing file is returned without re-decoding. This makes the
helper safe to call repeatedly from parallel workers on overlapping
inputs (the cache hit is the fast path).
Parameters¶
video_path :
Source video file (anything PyAV can open: mp4, mkv, webm, …).
output_dir :
Directory the .wav lands in. Created if it doesn't exist.
sample_rate :
If set, resample to this rate. Default (None) preserves the
source rate — most audio metrics resample again internally to
their own target rate, so adding a resample step here would just
be wasted work.
codec :
PCM codec for the output. pcm_s16le (default) gives 16-bit
little-endian PCM, the most broadly compatible WAV format.
Returns¶
Path
Absolute path to the extracted .wav.
Raises¶
NoAudioStreamError
If video_path has no audio stream.
Source code in fastvideo/eval/io/audio.py
fastvideo.eval.io.inputs
¶
Input-shape helpers: paths → samples list.
The :class:Evaluator and every metric consume the same internal
representation — a list[dict] with one dict per sample. Building
that list by hand is the largest source of ceremony in user scripts.
:func:samples_from is a pure function that turns path-style inputs
(generated videos / audio, optional paired references, optional
per-sample prompts / fps / metadata) into the canonical samples list,
ready to hand to :meth:Evaluator.evaluate.
Design rules:
- No Evaluator state, no metric introspection, no I/O beyond reading the prompts file when given. Just dict assembly.
- "Extra" keys on a sample are free — metrics each read what they need and ignore the rest. So one fat samples list naturally serves many metrics in one Evaluator.
- No "primary modality" axis. Each modality is a named kwarg (video=, audio=); pass whichever you have. Both is fine.
- No mode axis. The shape of the output is determined by cardinality: when |gen| == |ref| the samples list is pair-zipped; when |gen| < |ref| the unmatched references become role-tagged set samples for corpus-shaped metrics (FVD / FAD) without disturbing per-sample paired metrics (LPIPS / PSNR / SSIM / gt_optical_flow).
Classes¶
Functions¶
fastvideo.eval.io.inputs.as_video
¶
Coerce path/tensor/Video → :class:Video for the pool to decode.
Path strings and :class:pathlib.Path become Video(source=str(x));
the pool then calls :func:load_video on first use. Tensors become
Video(source=None, frames=x) — the pool sees .frames already
populated and forwards untouched. :class:Video instances pass through.
Source code in fastvideo/eval/io/inputs.py
fastvideo.eval.io.inputs.samples_from
¶
samples_from(*, video: PathSpec | None = None, reference: PathSpec | None = None, audio: PathSpec | None = None, reference_audio: PathSpec | None = None, text_prompt: str | None = None, text_prompts: str | Path | list[str] | None = None, fps: float | None = None, auxiliary_info: dict | list[dict] | None = None, extras: dict | list[dict] | None = None, extract_audio: bool | str | Path = False, extract_workers: int = 4) -> list[dict]
Build a samples list from path-style inputs.
Parameters¶
video, reference, audio, reference_audio :
File path, directory of files (sorted by name), or any iterable
of paths. Pass whichever modalities apply to the metrics you
plan to run — they attach to sample["video"] /
sample["reference"] / sample["audio"] /
sample["reference_audio"] respectively. Video paths are
wrapped in :class:Video so :class:VideoPool decodes them
lazily in parallel; audio paths stay as strings (audio metrics
each load with their own resample / preprocess).
text_prompt :
A single prompt string broadcast onto every sample.
text_prompts :
A list of strings (one per sample), or a path to a .jsonl /
.json file containing per-sample prompts.
fps :
Scalar fps broadcast onto every sample.
auxiliary_info :
Single dict (broadcast) or list of dicts (zipped) for
sample["auxiliary_info"] — vbench structured-prompt
metrics read this.
extras :
Catch-all per-sample attachments. Use for metric-specific keys
the dedicated kwargs don't cover (scenario, view,
actions, calibration, reference_take2, ...). Pass
a single dict to broadcast or a list-of-dicts to zip; the keys
merge into each sample dict.
extract_audio :
If truthy, auto-extract audio from each video /
reference source into .wav files via PyAV and attach
the paths under sample["audio"] / sample["reference_audio"].
Pass a path for a persistent cache, True for a tempdir.
Skipped silently for videos with no audio stream; ignored
wherever audio / reference_audio is already explicit.
extract_workers :
Parallel workers for extract_audio.
Returns¶
list[dict]
Canonical samples shape — hand directly to
:meth:Evaluator.evaluate.
Cardinality and shape¶
Let N = len(generated inputs) (the common length of whichever
of video / audio you passed). References are attached
1:1 onto the first N samples; any surplus references (when |ref| > N)
become standalone role-tagged samples at the end of the list, so
set metrics like FVD see the full reference corpus while per-sample
paired metrics like LPIPS run only on the first N pairs.
Notes¶
"Missing" keys are simply absent from the sample dict. Metrics
handle them per their own contract (sample.get(...) for
optional, sample[...] raises for required, or
:meth:BaseMetric._skip for opt-in skip behavior). One fat samples
list with many keys can serve many metrics — each reads its subset.
Source code in fastvideo/eval/io/inputs.py
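The broadcast-or-zip rule described for auxiliary_info and extras (a single dict applies to every sample, a list of dicts is matched one per sample) can be sketched as below. The helper name is hypothetical, and two behaviours are assumptions: raising on a length mismatch, and letting the attached keys win when they collide with existing ones.

```python
def attach(samples, extras):
    """Merge extras into samples: dict broadcasts, list zips one-to-one."""
    if isinstance(extras, dict):
        per_sample = [extras] * len(samples)  # broadcast the same dict
    else:
        if len(extras) != len(samples):
            # Assumption: a mismatch is treated as a caller error.
            raise ValueError("extras list must match the number of samples")
        per_sample = extras
    # Build new dicts so the inputs are left untouched.
    return [{**sample, **extra} for sample, extra in zip(samples, per_sample)]


base = [{"video": "a.mp4"}, {"video": "b.mp4"}]
broadcast = attach(base, {"view": "front"})
zipped = attach(base, [{"view": "front"}, {"view": "side"}])
```

Both calls return two samples; only the zipped form lets each sample carry a different value for the same key.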
fastvideo.eval.io.paths
¶
Filesystem helpers shared by eval scripts.
Provides the prompt-sanitization, default filename convention, and
(row, video_path) → eval-kwargs builder. Free functions, not a
class — :class:fastvideo.eval.Evaluator is the only stateful object
in the eval surface; loops live in user scripts.
Functions¶
fastvideo.eval.io.paths.build_eval_kwargs
¶
Build evaluator kwargs from a sample row + a video on disk.
Loads the video as (T,C,H,W) and adds the leading batch dim.
Forwards prompt (as scalar text_prompt) and
auxiliary_info (as scalar dict) when present on the row —
matches the one-sample-per-call contract that the evaluator and
every metric assume.
Source code in fastvideo/eval/io/paths.py
fastvideo.eval.io.paths.default_filename
¶
fastvideo.eval.io.paths.glob_videos
¶
Find every generated video for row, sorted by trailing -<idx>.
Source code in fastvideo/eval/io/paths.py
fastvideo.eval.io.paths.sanitize_prompt
¶
fastvideo.eval.io.video
¶
Functions¶
fastvideo.eval.io.video.extract_frames
¶
extract_frames(video: Tensor, n_frames: int | None = None) -> Tensor
Uniformly sample n_frames from a (T, C, H, W) video tensor.
Source code in fastvideo/eval/io/video.py
fastvideo.eval.io.video.load_video
¶
Load a video as a (T, C, H, W) float32 tensor in [0, 1].
Supported source types:
- str / Path – path to a .mp4/.avi/.gif file, or a directory of frame images (sorted alphabetically).
- torch.Tensor – returned as-is after shape validation.
- list[PIL.Image] – stacked into a tensor.