hf_store
Shared HuggingFace storage utilities for performance tracking.
Provides a single place for:

- Syncing the HF dataset repo to a local directory
- Loading raw JSON records (with optional recency filter)
- Loading records as a normalized pandas DataFrame
- Uploading individual result files back to HF
- Common helpers: sanitize, safe_float

Functions
fastvideo.tests.performance.hf_store.load_as_dataframe
load_as_dataframe(local_dir: str, *, days: int | None = None, successful_only: bool = False) -> DataFrame
Load and normalize records from local_dir into a pandas DataFrame.
Combines `load_records` and `normalize_dataframe` into a single
call for consumers (e.g. the dashboard) that work exclusively with
DataFrames.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `local_dir` | `str` | Root directory previously populated by :func: | required |
| `days` | `int \| None` | Passed through to :func: | `None` |
| `successful_only` | `bool` | Passed through to :func: | `False` |
Returns:
| Type | Description |
|---|---|
| `DataFrame` | Normalized DataFrame, or an empty DataFrame if no records were found. |
Source code in fastvideo/tests/performance/hf_store.py
fastvideo.tests.performance.hf_store.load_records
load_records(local_dir: str, *, days: int | None = None, successful_only: bool = False) -> list[dict[str, Any]]
Return raw JSON dicts from local_dir.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `local_dir` | `str` | Root directory previously populated by :func: | required |
| `days` | `int \| None` | When set, discard records whose | `None` |
| `successful_only` | `bool` | When True, only records with | `False` |
Returns:
| Type | Description |
|---|---|
| `list[dict[str, Any]]` | List of raw dicts sorted by |
| `list[dict[str, Any]]` | not be parsed are silently skipped). |
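
The filtering described above can be sketched with the standard library alone. This is an illustration, not the module's implementation: the function name `load_records_sketch`, the `success` key, the ISO 8601 format of `timestamp`, and the ascending sort by timestamp are all assumptions made for the example.

```python
import json
from datetime import datetime, timedelta, timezone
from pathlib import Path
from typing import Any, Optional


def load_records_sketch(
    local_dir: str,
    *,
    days: Optional[int] = None,
    successful_only: bool = False,
) -> list:
    """Illustrative stand-in for load_records (hypothetical, not the real code)."""
    cutoff = None
    if days is not None:
        cutoff = datetime.now(timezone.utc) - timedelta(days=days)

    kept = []
    for path in sorted(Path(local_dir).rglob("*.json")):
        try:
            record: Any = json.loads(path.read_text(encoding="utf-8"))
            # Assumption: "timestamp" holds an ISO 8601 string.
            ts = datetime.fromisoformat(record["timestamp"])
        except (OSError, ValueError, KeyError, TypeError):
            continue  # files that cannot be parsed are skipped silently
        if ts.tzinfo is None:
            ts = ts.replace(tzinfo=timezone.utc)  # treat naive stamps as UTC
        if cutoff is not None and ts < cutoff:
            continue  # older than the recency window
        # Assumption: a boolean "success" key marks successful runs.
        if successful_only and not record.get("success"):
            continue
        kept.append((ts, record))

    kept.sort(key=lambda pair: pair[0])  # oldest first (assumed order)
    return [record for _, record in kept]
```

Skipping unparseable files instead of raising keeps one corrupt result file from blocking every consumer of the directory.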
fastvideo.tests.performance.hf_store.load_records_for_model
load_records_for_model(local_dir: str, model_id: str, gpu_type: str | None = None, *, last_n: int | None = None, successful_only: bool = True) -> list[dict[str, Any]]
Return records for a specific model_id, optionally filtered by GPU.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `local_dir` | `str` | Root directory previously populated by :func: | required |
| `model_id` | `str` | Matches the | required |
| `gpu_type` | `str \| None` | When set, only records whose | `None` |
| `last_n` | `int \| None` | When set, return only the most recent n records (after all other filters). Useful for sliding-window baseline calculations. | `None` |
| `successful_only` | `bool` | Passed through to :func: | `True` |
Returns:
| Type | Description |
|---|---|
| `list[dict[str, Any]]` | List of matching dicts sorted by timestamp ascending. |
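
To make the `last_n` sliding window concrete, here is a minimal sketch that works on records already loaded into memory. The helper names `filter_for_model` and `baseline_latency`, and the record keys `model_id` and `gpu_type`, are assumptions for illustration; only `latency` and `timestamp` are field names that appear in this module's documentation.

```python
from statistics import mean
from typing import Any, Optional


def filter_for_model(
    records: list,
    model_id: str,
    gpu_type: Optional[str] = None,
    *,
    last_n: Optional[int] = None,
) -> list:
    """Hypothetical in-memory version of the model/GPU filter."""
    matches = [
        r for r in records
        if r.get("model_id") == model_id  # assumed key name
        and (gpu_type is None or r.get("gpu_type") == gpu_type)  # assumed key name
    ]
    # Assumption: timestamps are ISO 8601 strings in one timezone and format,
    # so lexicographic order equals chronological order.
    matches.sort(key=lambda r: r["timestamp"])
    if last_n is not None:
        # Apply the window last, after every other filter.
        matches = matches[-last_n:] if last_n > 0 else []
    return matches


def baseline_latency(records: list) -> Optional[float]:
    """Mean latency over a window, or None when the window is empty."""
    values = [float(r["latency"]) for r in records]
    return mean(values) if values else None
```

Applying `last_n` after the other filters means the window always holds the most recent matching runs, rather than whatever matches happen to fall inside the most recent runs overall.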
fastvideo.tests.performance.hf_store.normalize_dataframe
Apply standard type coercions to a raw records DataFrame.

- Parses `timestamp` to UTC-aware datetime.
- Coerces `latency`, `throughput`, `memory`, `text_encoder_time_s`, `dit_time_s`, `vae_decode_time_s` to float.
- Adds a `config_id` column (first 7 chars of `commit_sha`).
Returns the mutated DataFrame (also modifies in place for efficiency).
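
The three coercions listed above might look roughly like the following in pandas. This is a sketch under assumptions, not the module's code: the name `normalize_sketch` is hypothetical, and turning unparseable values into `NaT`/`NaN` (via `errors="coerce"`) is a choice made for the example that the source does not confirm.

```python
import pandas as pd

# Column names taken from the description above.
FLOAT_COLUMNS = [
    "latency",
    "throughput",
    "memory",
    "text_encoder_time_s",
    "dit_time_s",
    "vae_decode_time_s",
]


def normalize_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical normalizer: mutates df in place and returns it."""
    # Parse timestamps as UTC-aware datetimes.
    df["timestamp"] = pd.to_datetime(df["timestamp"], utc=True, errors="coerce")
    # Coerce metric columns to float; only columns that exist are touched.
    for col in FLOAT_COLUMNS:
        if col in df.columns:
            df[col] = pd.to_numeric(df[col], errors="coerce").astype(float)
    # Short identifier: the first 7 characters of the commit SHA.
    df["config_id"] = df["commit_sha"].astype(str).str[:7]
    return df
```

Because the frame is modified in place, a caller that still needs the raw values should pass a copy, for example `normalize_sketch(df.copy())`.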
fastvideo.tests.performance.hf_store.safe_float
Coerce value to float, returning None on failure.
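
A helper with this contract is typically a thin wrapper around `float()`. The version below is a minimal sketch under a hypothetical name; which exception types the real helper catches is an assumption.

```python
from typing import Any, Optional


def safe_float_sketch(value: Any) -> Optional[float]:
    """Return float(value), or None when the conversion fails."""
    try:
        return float(value)
    except (TypeError, ValueError):
        # TypeError covers None and containers; ValueError covers bad strings.
        return None
```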
fastvideo.tests.performance.hf_store.sanitize
fastvideo.tests.performance.hf_store.sync_from_hf
Download the HF dataset repo snapshot to local_dir.
Returns local_dir so callers can chain: load_records(sync_from_hf(...)).
By default (strict=False), failures are logged and local_dir is
returned unchanged, so dashboard / PR consumers stay resilient when HF is
unavailable. Callers that depend on the sync for correctness (e.g. the
main-branch baseline writer) must pass strict=True so that misconfiguration
or transient HF errors fail loudly rather than silently resetting the baseline.
When reuse_existing=True, a previous successful sync in local_dir
is reused only while its marker is fresh. This avoids duplicate HF
snapshot checks when compare and dashboard scripts run sequentially in the
same CI job, without silently reusing stale data in persistent local or
long-lived runner environments.
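
The interaction between `strict` and `reuse_existing` can be illustrated with a small stand-in that takes the download step as a callable. Everything specific here is hypothetical: the function name `sync_sketch`, the marker file name `.sync_ok`, and the 10-minute freshness window are invented for the example, and the real function's signature is not shown in this page.

```python
import logging
import time
from pathlib import Path
from typing import Callable

logger = logging.getLogger(__name__)

MARKER_NAME = ".sync_ok"    # hypothetical marker file name
MARKER_MAX_AGE_S = 600.0    # hypothetical freshness window, in seconds


def sync_sketch(
    local_dir: str,
    download: Callable[[str], None],
    *,
    strict: bool = False,
    reuse_existing: bool = False,
) -> str:
    """Illustrative sync wrapper; `download` stands in for the HF snapshot call."""
    marker = Path(local_dir) / MARKER_NAME
    if reuse_existing and marker.exists():
        age_s = time.time() - marker.stat().st_mtime
        if age_s < MARKER_MAX_AGE_S:
            return local_dir  # a fresh successful sync exists; skip the download
    try:
        download(local_dir)
    except Exception:
        if strict:
            raise  # write paths must fail loudly
        logger.warning("sync failed; using existing contents of %s", local_dir)
        return local_dir
    # Record success so a later call in the same job can reuse it.
    Path(local_dir).mkdir(parents=True, exist_ok=True)
    marker.touch()
    return local_dir
```

Writing the marker only after a successful download means a failed non-strict sync is never mistaken for fresh data on the next call.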
fastvideo.tests.performance.hf_store.upload_record
Upload local_path to the HF repo under <model_id>/<filename>.
By default, failures (missing token, network errors) are logged and
swallowed. Pass strict=True when the upload is part of a write path
that must not silently lose records; otherwise the rolling baseline can
stop advancing without any signal in the build log.
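
As a rough illustration of the `<model_id>/<filename>` layout and the strict switch, the sketch below builds the destination path and delegates the transfer to a caller-supplied function. The name `upload_sketch`, the `upload` callable, and the return value are assumptions for the example; whether the real function sanitizes `model_id` before building the path is not stated here.

```python
import logging
import os
from typing import Callable, Optional

logger = logging.getLogger(__name__)


def upload_sketch(
    local_path: str,
    model_id: str,
    upload: Callable[[str, str], None],
    *,
    strict: bool = False,
) -> Optional[str]:
    """Hypothetical uploader: returns the repo path on success, else None."""
    # Destination inside the repo: <model_id>/<filename>.
    path_in_repo = f"{model_id}/{os.path.basename(local_path)}"
    try:
        upload(local_path, path_in_repo)
    except Exception:
        if strict:
            raise  # the caller asked not to lose records silently
        logger.warning("upload of %s failed; record not stored", local_path)
        return None
    return path_in_repo
```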