Track performance results and compare them against the historical baseline.
This script:
1) reads current benchmark results from fastvideo/tests/performance/results,
2) syncs the canonical baseline from the configured HF dataset repo,
3) compares each current record against the median of up to 5 prior records (filtered by gpu_type, successful only),
4) on persist runs (full-suite on main branch), writes the normalized record back to the HF dataset repo,
5) exits non-zero if any metric regresses by more than PERF_MAX_REGRESSION (default 5%).
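The comparison in steps 3 and 5 can be sketched roughly as follows. This is a minimal illustration, not the script's actual code. The helper names (`baseline_for`, `regression_fraction`, `find_regressions`), the `HIGHER_IS_BETTER` set, and the assumption that `history` is ordered oldest to newest are all hypothetical. The constant `MAX_REGRESSION` only mirrors the documented 5% default of PERF_MAX_REGRESSION.

```python
from statistics import median

# Assumed to mirror the documented PERF_MAX_REGRESSION default of 5%.
MAX_REGRESSION = 0.05
# Assumption: throughput is the only metric where a larger value is better.
HIGHER_IS_BETTER = {"throughput"}


def baseline_for(history, gpu_type, metric, window=5):
    """Median of the last `window` successful values recorded on the same GPU.

    `history` is assumed to be ordered oldest to newest.
    """
    values = [
        record[metric]
        for record in history
        if record.get("gpu_type") == gpu_type
        and record.get("success")
        and record.get(metric) is not None
    ]
    recent = values[-window:]
    return median(recent) if recent else None


def regression_fraction(current, baseline, metric):
    """Relative change versus the baseline; positive means worse."""
    if baseline is None or baseline == 0:
        return 0.0  # no usable baseline, so nothing to compare against
    change = (current - baseline) / baseline
    return -change if metric in HIGHER_IS_BETTER else change


def find_regressions(current, history, metrics=("latency", "throughput", "memory")):
    """Map each regressed metric to the fraction by which it got worse."""
    failures = {}
    for metric in metrics:
        value = current.get(metric)
        if value is None:
            continue
        base = baseline_for(history, current["gpu_type"], metric)
        fraction = regression_fraction(value, base, metric)
        if fraction > MAX_REGRESSION:
            failures[metric] = fraction
    return failures
```

A caller could then exit with a non-zero status whenever `find_regressions` returns a non-empty mapping. Using the median rather than the mean keeps a single noisy prior run from shifting the baseline.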
Normalize a raw perf_*.json result into the HF tracking schema.
The Buildkite artifact intentionally keeps the raw benchmark output from
test_inference_performance.py. Baseline comparison, main-branch persistence,
and manual baseline reseeds should all use this mapping so the stored HF
records do not drift from the artifact schema.
Source code in fastvideo/tests/performance/compare_baseline.py
def normalize_performance_result(result: dict[str, Any]) -> dict[str, Any]:
    """Normalize a raw perf_*.json result into the HF tracking schema.

    The Buildkite artifact intentionally keeps the raw benchmark output from
    test_inference_performance.py. Baseline comparison, main-branch persistence,
    and manual baseline reseeds should all use this mapping so the stored HF
    records do not drift from the artifact schema.
    """
    benchmark_id = result.get("benchmark_id", "unknown")
    model_id = benchmark_id
    timestamp = result.get("timestamp")
    if not timestamp:
        timestamp = datetime.now(timezone.utc).isoformat()
    commit_sha = result.get("commit") or os.environ.get("BUILDKITE_COMMIT", "")
    latency = safe_float(result.get("avg_generation_time_s"))
    throughput = safe_float(result.get("throughput_fps"))
    memory = safe_float(result.get("max_peak_memory_mb"))
    text_encoder_time = safe_float(result.get("text_encoder_time_s"))
    dit_time = safe_float(result.get("dit_time_s"))
    vae_decode_time = safe_float(result.get("vae_decode_time_s"))
    return {
        "model_id": model_id,
        "timestamp": timestamp,
        "commit_sha": commit_sha,
        "gpu_type": result.get("device", "unknown"),
        "latency": latency,
        "throughput": throughput,
        "memory": memory,
        "text_encoder_time_s": text_encoder_time,
        "dit_time_s": dit_time,
        "vae_decode_time_s": vae_decode_time,
        "success": True,
    }
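The listing above relies on a `safe_float` helper that is not shown on this page. The following is a hypothetical stand-in, assuming the helper converts a value to a finite float and returns None when that is not possible. The real helper in compare_baseline.py may behave differently, for example by returning a default number instead of None.

```python
import math
from typing import Any, Optional


def safe_float(value: Any) -> Optional[float]:
    """Hypothetical stand-in: return a finite float, or None if conversion fails."""
    if value is None:
        return None
    try:
        number = float(value)
    except (TypeError, ValueError):
        return None
    # Treat NaN and infinities as missing so they never reach a comparison.
    return number if math.isfinite(number) else None


# Illustrative raw record; the keys match those read by the listing above.
raw = {"avg_generation_time_s": "12.5", "throughput_fps": None}
latency = safe_float(raw.get("avg_generation_time_s"))
throughput = safe_float(raw.get("throughput_fps"))
```

Under this assumption, a missing or malformed timing field becomes None in the normalized record instead of raising during normalization.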