performance
¶
Modules¶
fastvideo.tests.performance.compare_baseline
¶
Track performance results and compare against historical baseline.
This script: 1) reads current benchmark results from fastvideo/tests/performance/results, 2) syncs the canonical baseline from the configured HF dataset repo, 3) compares each current record against the median of up to 5 prior records (filtered by gpu_type, successful only), 4) on persist runs (full-suite on main branch), writes the normalized record back to the HF dataset repo, 5) exits non-zero if any metric regresses by more than PERF_MAX_REGRESSION (default 5%).
Functions:¶
fastvideo.tests.performance.compare_baseline.normalize_performance_result
¶
Normalize a raw perf_*.json result into the HF tracking schema.
The Buildkite artifact intentionally keeps the raw benchmark output from test_inference_performance.py. Baseline comparison, main-branch persistence, and manual baseline reseeds should all use this mapping so the stored HF records do not drift from the artifact schema.
Source code in fastvideo/tests/performance/compare_baseline.py
fastvideo.tests.performance.hf_store
¶
Shared HuggingFace storage utilities for performance tracking.
Provides a single place for: - Syncing the HF dataset repo to a local directory - Loading raw JSON records (with optional recency filter) - Loading records as a normalized pandas DataFrame - Uploading individual result files back to HF - Common helpers: sanitize, safe_float
Functions:¶
fastvideo.tests.performance.hf_store.load_as_dataframe
¶
load_as_dataframe(local_dir: str, *, days: int | None = None, successful_only: bool = False) -> DataFrame
Load and normalize records from local_dir into a pandas DataFrame.
Combines :func:load_records + :func:normalize_dataframe into a single
call for consumers (e.g. the dashboard) that work exclusively with
DataFrames.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
local_dir
|
str
|
Root directory previously populated by :func: |
required |
days
|
int | None
|
Passed through to :func: |
None
|
successful_only
|
bool
|
Passed through to :func: |
False
|
Returns:
| Type | Description |
|---|---|
DataFrame
|
Normalized DataFrame, or an empty DataFrame if no records were found. |
Source code in fastvideo/tests/performance/hf_store.py
fastvideo.tests.performance.hf_store.load_records
¶
load_records(local_dir: str, *, days: int | None = None, successful_only: bool = False) -> list[dict[str, Any]]
Return raw JSON dicts from local_dir.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
local_dir
|
str
|
Root directory previously populated by :func: |
required |
days
|
int | None
|
When set, discard records whose |
None
|
successful_only
|
bool
|
When True, only records with |
False
|
Returns:
| Type | Description |
|---|---|
list[dict[str, Any]]
|
List of raw dicts sorted by |
list[dict[str, Any]]
|
not be parsed are silently skipped). |
Source code in fastvideo/tests/performance/hf_store.py
fastvideo.tests.performance.hf_store.load_records_for_model
¶
load_records_for_model(local_dir: str, model_id: str, gpu_type: str | None = None, *, last_n: int | None = None, successful_only: bool = True) -> list[dict[str, Any]]
Return records for a specific model_id, optionally filtered by GPU.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
local_dir
|
str
|
Root directory previously populated by :func: |
required |
model_id
|
str
|
Matches the |
required |
gpu_type
|
str | None
|
When set, only records whose |
None
|
last_n
|
int | None
|
When set, return only the most recent n records (after all other filters). Useful for sliding-window baseline calculations. |
None
|
successful_only
|
bool
|
Passed through to :func: |
True
|
Returns:
| Type | Description |
|---|---|
list[dict[str, Any]]
|
List of matching dicts sorted by timestamp ascending. |
Source code in fastvideo/tests/performance/hf_store.py
fastvideo.tests.performance.hf_store.normalize_dataframe
¶
Apply standard type coercions to a raw records DataFrame.
- Parses
timestampto UTC-aware datetime. - Coerces
latency,throughput,memory,text_encoder_time_s,dit_time_s,vae_decode_time_sto float. - Adds a
config_idcolumn (first 7 chars ofcommit_sha).
Returns the mutated DataFrame (also modifies in place for efficiency).
Source code in fastvideo/tests/performance/hf_store.py
fastvideo.tests.performance.hf_store.safe_float
¶
Coerce value to float, returning None on failure.
fastvideo.tests.performance.hf_store.sanitize
¶
fastvideo.tests.performance.hf_store.sync_from_hf
¶
Download the HF dataset repo snapshot to local_dir.
Returns local_dir so callers can chain: load_records(sync_from_hf(...)).
By default (strict=False) failures are logged and local_dir is
returned unchanged, so dashboard / PR consumers stay resilient when HF is
unavailable. Callers that depend on the sync for correctness (e.g. the
main-branch baseline writer) must pass strict=True so that misconfig
or transient HF errors fail loud rather than silently reset the baseline.
When reuse_existing=True, a previous successful sync in local_dir
is reused only while its marker is fresh. This avoids duplicate HF
snapshot checks when compare and dashboard scripts run sequentially in the
same CI job, without silently reusing stale data in persistent local or
long-lived runner environments.
Source code in fastvideo/tests/performance/hf_store.py
fastvideo.tests.performance.hf_store.upload_record
¶
Upload local_path to the HF repo under <model_id>/<filename>.
By default failures (missing token, network errors) are logged and
swallowed. Pass strict=True when the upload is part of a write-path
that must not silently lose records — otherwise the rolling baseline can
stop advancing without any signal in the build log.
Source code in fastvideo/tests/performance/hf_store.py
fastvideo.tests.performance.test_inference_performance
¶
Config-driven inference performance tests.
Benchmark configs live in .buildkite/performance-benchmarks/tests/*.json. Each JSON file defines model params, generation kwargs, run config, and per-device thresholds. This test module auto-discovers all configs and parametrizes a single test function over them.
Classes¶
Functions:¶
fastvideo.tests.performance.test_inference_performance.test_inference_performance
¶
Measure generation latency, peak GPU memory, and component-level timings (text encoder, DiT, VAE decode). Assert each against device-aware thresholds.