wer
¶
Modules¶
fastvideo.eval.metrics.audio.wer.metric
¶
Word Error Rate for generated audio.
Per-sample. Default backend is Whisper-base; glm_asr (vendored under
third_party/eval/glmasr/) and sensevoice (FunASR, install-on-
demand) are selectable via asr_backend=. Text is normalized per the
MagiHuman convention and CJK inputs are scored at the character level.
Classes¶
fastvideo.eval.metrics.audio.wer.metric.WERMetric
¶
WERMetric(asr_backend: str = 'whisper', model_name: str | None = None, instruction: str = 'Please transcribe this audio into text')
Bases: BaseMetric
WER metric with selectable ASR backend.
Sample fields: audio, reference_text (or text_prompt),
optional language. instruction is passed to GLM-ASR.