compare_baseline

Track performance results and compare them against a historical baseline.

This script:

1. Reads current benchmark results from fastvideo/tests/performance/results.
2. Syncs the canonical baseline from the configured HF dataset repo.
3. Compares each current record against the median of up to 5 prior records (filtered by gpu_type, successful only).
4. On persist runs (full-suite on main branch), writes the normalized record back to the HF dataset repo.
5. Exits non-zero if any metric regresses by more than PERF_MAX_REGRESSION (default 5%).
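The comparison step can be illustrated with a minimal sketch. It is not the script's actual implementation: the names `baseline_for`, `find_regressions`, and `HIGHER_IS_WORSE` are hypothetical, the history is assumed to be ordered oldest to newest, the threshold is assumed to be a fraction (0.05 for 5%), and the direction in which each metric counts as a regression is an assumption.

```python
from statistics import median
from typing import Any, Optional

# Assumption: higher latency/memory is worse, lower throughput is worse.
HIGHER_IS_WORSE = {"latency": True, "memory": True, "throughput": False}


def baseline_for(history: list[dict[str, Any]], gpu_type: str, metric: str,
                 window: int = 5) -> Optional[float]:
    """Median of the last `window` successful values for one GPU type."""
    values = [
        r[metric]
        for r in history
        if r.get("gpu_type") == gpu_type
        and r.get("success")
        and r.get(metric) is not None
    ]
    recent = values[-window:]  # history is assumed oldest -> newest
    return median(recent) if recent else None


def find_regressions(current: dict[str, Any], history: list[dict[str, Any]],
                     max_regression: float = 0.05) -> dict[str, float]:
    """Return {metric: relative regression} for metrics over the threshold."""
    regressions: dict[str, float] = {}
    for metric, higher_is_worse in HIGHER_IS_WORSE.items():
        value = current.get(metric)
        base = baseline_for(history, current["gpu_type"], metric)
        if value is None or not base:
            continue  # no usable baseline (missing or zero): skip the metric
        change = (value - base) / base
        if not higher_is_worse:
            change = -change  # a throughput drop becomes a positive regression
        if change > max_regression:
            regressions[metric] = change
    return regressions
```

A caller could then exit with a non-zero status whenever `find_regressions` returns a non-empty dict. Using the median rather than the latest record keeps a single noisy run from shifting the baseline.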

Functions

fastvideo.tests.performance.compare_baseline.normalize_performance_result

normalize_performance_result(result: dict[str, Any]) -> dict[str, Any]

Normalize a raw perf_*.json result into the HF tracking schema.

The Buildkite artifact intentionally keeps the raw benchmark output from test_inference_performance.py. Baseline comparison, main-branch persistence, and manual baseline reseeds should all use this mapping so the stored HF records do not drift from the artifact schema.

Source code in fastvideo/tests/performance/compare_baseline.py

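import os
from datetime import datetime, timezone
from typing import Any

# safe_float is a helper available at module level in compare_baseline.py (not shown here).


def normalize_performance_result(result: dict[str, Any]) -> dict[str, Any]: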
    """Normalize a raw perf_*.json result into the HF tracking schema.

    The Buildkite artifact intentionally keeps the raw benchmark output from
    test_inference_performance.py. Baseline comparison, main-branch persistence,
    and manual baseline reseeds should all use this mapping so the stored HF
    records do not drift from the artifact schema.
    """
    benchmark_id = result.get("benchmark_id", "unknown")
    model_id = benchmark_id

    timestamp = result.get("timestamp")
    if not timestamp:
        timestamp = datetime.now(timezone.utc).isoformat()

    commit_sha = result.get("commit") or os.environ.get("BUILDKITE_COMMIT", "")
    latency = safe_float(result.get("avg_generation_time_s"))
    throughput = safe_float(result.get("throughput_fps"))
    memory = safe_float(result.get("max_peak_memory_mb"))
    text_encoder_time = safe_float(result.get("text_encoder_time_s"))
    dit_time = safe_float(result.get("dit_time_s"))
    vae_decode_time = safe_float(result.get("vae_decode_time_s"))

    return {
        "model_id": model_id,
        "timestamp": timestamp,
        "commit_sha": commit_sha,
        "gpu_type": result.get("device", "unknown"),
        "latency": latency,
        "throughput": throughput,
        "memory": memory,
        "text_encoder_time_s": text_encoder_time,
        "dit_time_s": dit_time,
        "vae_decode_time_s": vae_decode_time,
        "success": True,
    }
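
The key mapping above can be tried in isolation with a small sketch. `safe_float` is not shown in this section, so the version below is a hypothetical stand-in that assumes it returns `None` for values that cannot be converted. `FIELD_MAP` and the sample values in `raw` are illustrative only.

```python
from typing import Any, Optional


def safe_float(value: Any) -> Optional[float]:
    """Hypothetical stand-in: float(value), or None if it cannot be converted."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


# Raw artifact key -> HF tracking key, mirroring the return dict above.
FIELD_MAP = {
    "avg_generation_time_s": "latency",
    "throughput_fps": "throughput",
    "max_peak_memory_mb": "memory",
    "text_encoder_time_s": "text_encoder_time_s",
    "dit_time_s": "dit_time_s",
    "vae_decode_time_s": "vae_decode_time_s",
}

# Illustrative raw result; vae_decode_time_s is deliberately missing.
raw = {
    "benchmark_id": "example-benchmark",
    "device": "example-gpu",
    "avg_generation_time_s": "12.5",
    "throughput_fps": 3.2,
    "max_peak_memory_mb": 40960,
    "text_encoder_time_s": 0.8,
    "dit_time_s": 10.1,
}

metrics = {hf_key: safe_float(raw.get(raw_key)) for raw_key, hf_key in FIELD_MAP.items()}
```

Under this assumption, a metric that is absent from the raw result ends up as `None` in the normalized record instead of raising, so a partially populated artifact can still be stored and compared on the metrics it does have.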