inputs
Input-shape helpers: paths → samples list.
The Evaluator and every metric consume the same internal
representation: a list[dict] with one dict per sample. Building
that list by hand is the largest source of ceremony in user scripts.
samples_from is a pure function that turns path-style inputs
(generated videos / audio, optional paired references, optional
per-sample prompts / fps / metadata) into the canonical samples list,
ready to hand to Evaluator.evaluate.
Design rules:
- No Evaluator state, no metric introspection, and no I/O beyond reading the prompts file when given (plus writing .wav files when extract_audio is enabled). Just dict assembly.
- "Extra" keys on a sample are free — metrics each read what they need and ignore the rest. So one fat samples list naturally serves many metrics in one Evaluator.
- No "primary modality" axis. Each modality is a named kwarg (video=, audio=); pass whichever you have. Passing both is fine.
- No mode axis. The shape of the output is determined by cardinality: when |gen| == |ref| the samples list is pair-zipped; when |gen| < |ref| the unmatched references become role-tagged set samples for corpus-shaped metrics (FVD / FAD) without disturbing per-sample paired metrics (LPIPS / PSNR / SSIM / gt_optical_flow).
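A minimal sketch of the "extra keys are free" rule: one fat samples list serving two metrics, each of which reads only the keys it needs. Both metric functions are hypothetical stand-ins, not FastVideo metrics, and plain path strings stand in for the Video objects samples_from would build.

```python
# Hypothetical toy metrics: each reads only its own keys and ignores the rest.
def paired_metric(sample: dict) -> str:
    # A per-sample paired metric (LPIPS / PSNR style) needs both sides.
    return f"compare {sample['video']} vs {sample['reference']}"


def scenario_metric(sample: dict) -> str:
    # A metric keyed on an extras-supplied field; "reference" is ignored here.
    return f"{sample['video']} in scenario {sample['scenario']}"


# One fat samples list carrying the keys both metrics need.
samples = [
    {"video": "gen/000.mp4", "reference": "ref/000.mp4", "scenario": "indoor"},
    {"video": "gen/001.mp4", "reference": "ref/001.mp4", "scenario": "outdoor"},
]

paired = [paired_metric(s) for s in samples]
scenarios = [scenario_metric(s) for s in samples]
```

Neither metric needs the list to be reshaped for it, which is what lets a single Evaluator run both over the same input.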
Functions
fastvideo.eval.io.inputs.as_video
Coerce path/tensor/Video → Video for the pool to decode.
Path strings and pathlib.Path become Video(source=str(x));
the pool then calls load_video on first use. Tensors become
Video(source=None, frames=x); the pool sees .frames already
populated and forwards them untouched. Video instances pass through.
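A minimal sketch of the coercion rules above, for illustration only. Video here is a hypothetical stand-in dataclass with just the source and frames fields named in the description, as_video_sketch is not the library function, and two details are assumptions: any object with a .shape attribute is treated as a tensor, and anything else raises TypeError.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Optional


@dataclass
class Video:
    # Hypothetical stand-in: only the two fields named in the docs are modelled.
    source: Optional[str] = None
    frames: Any = None


def as_video_sketch(x: Any) -> Video:
    """Illustrative re-implementation of the documented coercion rules."""
    if isinstance(x, Video):
        # Video instances pass through unchanged.
        return x
    if isinstance(x, (str, Path)):
        # Paths become lazy sources; decoding is deferred to the pool.
        return Video(source=str(x))
    if hasattr(x, "shape"):
        # Assumption: anything array-like counts as already-decoded frames.
        return Video(source=None, frames=x)
    # Assumption: unsupported inputs are rejected rather than guessed at.
    raise TypeError(f"cannot coerce {type(x).__name__} to Video")
```

Keeping the path case lazy means the function itself never touches the filesystem; all decoding cost lands in the pool, which can parallelize it.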
Source code in fastvideo/eval/io/inputs.py
fastvideo.eval.io.inputs.samples_from
samples_from(*, video: PathSpec | None = None, reference: PathSpec | None = None, audio: PathSpec | None = None, reference_audio: PathSpec | None = None, text_prompt: str | None = None, text_prompts: str | Path | list[str] | None = None, fps: float | None = None, auxiliary_info: dict | list[dict] | None = None, extras: dict | list[dict] | None = None, extract_audio: bool | str | Path = False, extract_workers: int = 4) -> list[dict]
Build a samples list from path-style inputs.
Parameters
video, reference, audio, reference_audio :
File path, directory of files (sorted by name), or any iterable
of paths. Pass whichever modalities apply to the metrics you
plan to run; they attach to sample["video"] /
sample["reference"] / sample["audio"] /
sample["reference_audio"] respectively. Video paths are
wrapped in Video so VideoPool decodes them
lazily in parallel; audio paths stay as strings (each audio metric
loads them with its own resampling / preprocessing).
text_prompt :
A single prompt string broadcast onto every sample.
text_prompts :
A list of strings (one per sample), or a path to a .jsonl /
.json file containing per-sample prompts.
fps :
Scalar fps broadcast onto every sample.
auxiliary_info :
Single dict (broadcast) or list of dicts (zipped) for
sample["auxiliary_info"] — vbench structured-prompt
metrics read this.
extras :
Catch-all per-sample attachments. Use for metric-specific keys
the dedicated kwargs don't cover (scenario, view,
actions, calibration, reference_take2, ...). Pass
a single dict to broadcast or a list-of-dicts to zip; the keys
merge into each sample dict.
extract_audio :
If truthy, auto-extract audio from each video /
reference source into .wav files via PyAV and attach
the paths under sample["audio"] / sample["reference_audio"].
Pass a path for a persistent cache, True for a tempdir.
Skipped silently for videos with no audio stream; ignored
wherever audio / reference_audio is already explicit.
extract_workers :
Parallel workers for extract_audio.
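A minimal sketch of two of the rules above: how a PathSpec (file, directory sorted by name, or iterable of paths) can resolve to an ordered list, and how auxiliary_info / extras broadcast a single dict or zip a list of dicts. resolve_paths and broadcast_or_zip are hypothetical helpers, not this module's internals; applying no extension filter to directories and raising ValueError on a length mismatch are assumptions.

```python
from __future__ import annotations

from pathlib import Path
from typing import Iterable, Union

PathLike = Union[str, Path]


def resolve_paths(spec: PathLike | Iterable[PathLike] | None) -> list[str]:
    """Hypothetical helper: file, directory, or iterable of paths -> list[str]."""
    if spec is None:
        return []
    if isinstance(spec, (str, Path)):
        p = Path(spec)
        if p.is_dir():
            # Directory: its regular files, sorted by name (no extension filter).
            return [str(f) for f in sorted(p.iterdir(), key=lambda f: f.name) if f.is_file()]
        # A single file path becomes a one-element list.
        return [str(p)]
    # Any other iterable of paths keeps the caller's order.
    return [str(x) for x in spec]


def broadcast_or_zip(value: dict | list[dict] | None, n: int) -> list[dict]:
    """Single dict -> one copy per sample; list of dicts -> zipped 1:1."""
    if value is None:
        return [{} for _ in range(n)]
    if isinstance(value, dict):
        # Shallow copy per sample so mutating one sample cannot leak into others.
        return [dict(value) for _ in range(n)]
    value = list(value)
    if len(value) != n:
        # Assumption: a length mismatch is an error, not a silent truncation.
        raise ValueError(f"expected {n} per-sample dicts, got {len(value)}")
    return [dict(v) for v in value]


# Usage: plain strings stand in for the Video objects the real function builds.
videos = resolve_paths(["gen/000.mp4", "gen/001.mp4"])
extras = broadcast_or_zip([{"view": "front"}, {"view": "side"}], len(videos))
samples = [{"video": v, **e} for v, e in zip(videos, extras)]
```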
Returns
list[dict]
Canonical samples shape; hand it directly to
Evaluator.evaluate.
Cardinality and shape
Let N = len(generated inputs) (the agreed length of whichever
of video / audio you passed). References are attached
1:1 onto the first N samples; any surplus references (when |ref| > N)
become standalone role-tagged samples at the end of the list, so
set metrics like FVD see the full reference corpus while per-sample
paired metrics like LPIPS run only on the first N pairs.
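A minimal sketch of the cardinality rule, assuming each generated sample is already a dict. attach_references is hypothetical; the layout of the surplus samples (a "reference" key plus a "role": "reference" tag) is a placeholder, and rejecting fewer references than generated samples is an assumption, since this section only describes |ref| >= N.

```python
from __future__ import annotations


def attach_references(samples: list[dict], refs: list[str]) -> list[dict]:
    """Hypothetical sketch of the cardinality rule, not the library's code."""
    n = len(samples)
    if len(refs) < n:
        # Assumption: fewer references than generated samples is rejected here.
        raise ValueError(f"{len(refs)} references for {n} generated samples")
    # 1:1 pairing onto the first N samples (zip stops after N).
    paired = [dict(s, reference=r) for s, r in zip(samples, refs)]
    # Surplus references become standalone, role-tagged set samples at the end.
    # The "role" key and its value are hypothetical placeholders.
    surplus = [{"reference": r, "role": "reference"} for r in refs[n:]]
    return paired + surplus


gen = [{"video": "gen/000.mp4"}, {"video": "gen/001.mp4"}]
out = attach_references(gen, ["ref/000.mp4", "ref/001.mp4", "ref/002.mp4"])
# A paired metric (LPIPS-like) only sees samples carrying both sides ...
pairs = [s for s in out if "video" in s and "reference" in s]
# ... while a set metric (FVD-like) sees the whole reference corpus.
ref_corpus = [s["reference"] for s in out if "reference" in s]
```

Appending the surplus at the end, rather than interleaving it, keeps the first N entries aligned with the generated inputs, so per-sample results still index cleanly.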
Notes
"Missing" keys are simply absent from the sample dict. Metrics
handle them per their own contract: sample.get(...) for
optional keys, sample[...] (which raises) for required ones, or
BaseMetric._skip for opt-in skip behavior. One fat samples
list with many keys can serve many metrics; each reads its subset.
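To make the get-versus-index contract concrete, here is a sketch with two hypothetical metric bodies (not FastVideo metrics) run on a sample that lacks both "reference" and "auxiliary_info". BaseMetric._skip is deliberately not modelled, since its signature is not shown on this page.

```python
def needs_reference(sample: dict) -> str:
    # Required key: plain indexing raises KeyError when it is absent.
    return sample["reference"]


def uses_auxiliary_info(sample: dict) -> dict:
    # Optional key: .get() with a default tolerates its absence.
    return sample.get("auxiliary_info", {})


# A sample that carries only a video.
sample = {"video": "gen/000.mp4"}

aux = uses_auxiliary_info(sample)  # falls back to {}
try:
    needs_reference(sample)
    required_present = True
except KeyError:
    # A required-key metric fails loudly on this sample.
    required_present = False
```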