benchmarks
Modules
fastvideo.models.loader.benchmarks.benchmark_weight_loading
Benchmark for model weight loading speed.
Measures the time to load model weights from safetensors files using different strategies (CPU vs GPU, broadcast vs independent).
Usage (single GPU): python fastvideo/models/loader/benchmarks/benchmark_weight_loading.py --model-path /path/to/model
Usage (multi-GPU, e.g. 4 GPUs): torchrun --nproc_per_node=4 fastvideo/models/loader/benchmarks/benchmark_weight_loading.py --model-path /path/to/model
Functions
fastvideo.models.loader.benchmarks.benchmark_weight_loading.benchmark_loading
benchmark_loading(files: list[str], to_cpu: bool, broadcast: bool, warmup: int, repeats: int, label: str) -> None
Run the weight loading benchmark and print results.
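The `warmup` and `repeats` parameters suggest a conventional warm-up-then-measure loop. The source of `benchmark_loading` is not reproduced on this page, so the following is only a minimal sketch of that pattern using the standard library; `time_with_warmup` and `report` are hypothetical names, not functions from FastVideo.

```python
import statistics
import time
from typing import Callable


def time_with_warmup(fn: Callable[[], object], warmup: int,
                     repeats: int) -> dict[str, float]:
    """Call fn() `warmup` times untimed, then time `repeats` further calls.

    Hypothetical helper for illustration; not part of FastVideo.
    """
    if repeats < 1:
        raise ValueError("repeats must be at least 1")

    # Warm-up calls are discarded so one-time costs (for example a cold
    # OS page cache) do not skew the measured runs.
    for _ in range(warmup):
        fn()

    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)

    return {
        "min": min(samples),
        "mean": statistics.fmean(samples),
        "max": max(samples),
        # stdev needs at least two samples.
        "stdev": statistics.stdev(samples) if len(samples) > 1 else 0.0,
    }


def report(label: str, stats: dict[str, float]) -> str:
    """Format and print one result line, tagged with the run label."""
    line = (f"[{label}] mean={stats['mean']:.4f}s "
            f"min={stats['min']:.4f}s max={stats['max']:.4f}s "
            f"stdev={stats['stdev']:.4f}s")
    print(line)
    return line


if __name__ == "__main__":
    # Stand-in workload; a real run would load safetensors files here.
    stats = time_with_warmup(lambda: sum(range(100_000)), warmup=1, repeats=5)
    report("demo", stats)
```

When the timed function launches GPU work, the clock should only be read after the device has finished (in PyTorch, after `torch.cuda.synchronize()`), because CUDA operations are asynchronous with respect to the host.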
Source code in fastvideo/models/loader/benchmarks/benchmark_weight_loading.py
fastvideo.models.loader.benchmarks.benchmark_weight_loading_comparison
A/B benchmark: independent-read vs rank-0-broadcast weight loading.
Compares two strategies:
- "before" (independent): every rank reads safetensors from disk to GPU
- "after" (broadcast): rank 0 reads from disk, broadcasts to other ranks
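The trade-off between the two strategies can be made concrete with a back-of-envelope accounting of bytes moved. This is an illustrative model only, not code from the benchmark: `IoCost`, `independent_cost`, and `broadcast_cost` are hypothetical names, and the model ignores page-cache hits, overlap between I/O and communication, and the collective algorithm used for the broadcast.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IoCost:
    disk_bytes: int       # bytes read from storage, summed over all ranks
    broadcast_bytes: int  # bytes received over the interconnect by non-zero ranks


def independent_cost(tensor_bytes: list[int], world_size: int) -> IoCost:
    """'before': every rank reads every tensor from disk itself."""
    total = sum(tensor_bytes)
    return IoCost(disk_bytes=total * world_size, broadcast_bytes=0)


def broadcast_cost(tensor_bytes: list[int], world_size: int) -> IoCost:
    """'after': rank 0 reads once; each other rank receives a full copy."""
    total = sum(tensor_bytes)
    return IoCost(disk_bytes=total, broadcast_bytes=total * (world_size - 1))


if __name__ == "__main__":
    sizes = [4_000_000, 16_000_000]  # made-up tensor sizes in bytes
    for world_size in (1, 2, 4):
        print(world_size,
              independent_cost(sizes, world_size),
              broadcast_cost(sizes, world_size))
```

Under this model the two strategies are identical on one GPU, and with more ranks the broadcast strategy trades repeated disk reads for interconnect traffic. Which one is faster in practice depends on storage and interconnect bandwidth, which is what the A/B benchmark measures.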
Usage (1 GPU): torchrun --nproc_per_node=1 fastvideo/models/loader/benchmarks/benchmark_weight_loading_comparison.py --model-path /path/to/model --subfolder transformer
Usage (2 GPUs): torchrun --nproc_per_node=2 fastvideo/models/loader/benchmarks/benchmark_weight_loading_comparison.py --model-path /path/to/model --subfolder transformer
Usage (4 GPUs): torchrun --nproc_per_node=4 fastvideo/models/loader/benchmarks/benchmark_weight_loading_comparison.py --model-path /path/to/model --subfolder transformer
Functions
fastvideo.models.loader.benchmarks.benchmark_weight_loading_comparison.load_broadcast
After-PR behavior: rank 0 reads from disk, broadcasts to other ranks.
Source code in fastvideo/models/loader/benchmarks/benchmark_weight_loading_comparison.py
fastvideo.models.loader.benchmarks.benchmark_weight_loading_comparison.load_independent
Before-PR behavior: every rank reads every tensor from disk to GPU.