video_sparse_attn
Classes
fastvideo.attention.backends.video_sparse_attn.VideoSparseAttentionImpl
VideoSparseAttentionImpl(num_heads: int, head_size: int, causal: bool, softmax_scale: float, num_kv_heads: int | None = None, prefix: str = '', **extra_impl_args)
Bases: AttentionImpl
Source code in fastvideo/attention/backends/video_sparse_attn.py
Methods:
fastvideo.attention.backends.video_sparse_attn.VideoSparseAttentionImpl.preprocess_qkv
Tile the QKV input. The returned tensor follows the same aliasing contract as tile().
fastvideo.attention.backends.video_sparse_attn.VideoSparseAttentionImpl.tile
Tile x into attn_metadata.tile_buf and return it.
The returned tensor aliases the per-metadata buffer and is only
valid until the next tile() or preprocess_qkv() call on the
same attn_metadata. Callers must consume (or copy) the
result before invoking another VSA layer with the same metadata.
Currently, both call sites materialize copies via
.transpose(...).contiguous() inside forward(), so the
contract holds; future callers must preserve it.
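The aliasing contract can be illustrated with a small, dependency-free sketch. Everything in it is hypothetical: TileMeta and tile_into are stand-ins invented for illustration (the real method works on torch tensors and real attention metadata), and the sketch assumes every dimension divides evenly by its tile size. It shows that a second call on the same metadata overwrites the buffer returned by the first call, so the caller has to copy the first result before reusing the metadata.

```python
from dataclasses import dataclass, field
from itertools import product


@dataclass
class TileMeta:
    # Hypothetical stand-in for attn_metadata: the (T, H, W) token grid,
    # the tile size per axis, and one reusable output buffer.
    seq_shape: tuple[int, int, int]
    tile_size: tuple[int, int, int]
    tile_buf: list = field(default_factory=list)


def tile_into(x: list, meta: TileMeta) -> list:
    """Reorder a flat T*H*W token list into tile-major order, writing into
    meta.tile_buf and returning that same buffer (no copy is made)."""
    t, h, w = meta.seq_shape
    ts_t, ts_h, ts_w = meta.tile_size
    order = []
    # Outer loops walk tiles in (t-tile, h-tile, w-tile) order; inner loops
    # walk the tokens inside one tile. Assumes no padding is needed.
    for bt, bh, bw in product(range(t // ts_t), range(h // ts_h), range(w // ts_w)):
        for it, ih, iw in product(range(ts_t), range(ts_h), range(ts_w)):
            ti, hi, wi = bt * ts_t + it, bh * ts_h + ih, bw * ts_w + iw
            order.append((ti * h + hi) * w + wi)  # row-major flat index
    meta.tile_buf[:] = [x[i] for i in order]  # overwrite the shared buffer
    return meta.tile_buf


meta = TileMeta(seq_shape=(1, 2, 4), tile_size=(1, 2, 2))
first = tile_into(list(range(8)), meta)
saved = list(first)  # copy before the next call on the same metadata
# saved == [0, 1, 4, 5, 2, 3, 6, 7]
second = tile_into([10 * v for v in range(8)], meta)
# first is second: both names now see the second call's output.
```

Copying with list(...) plays the role that .transpose(...).contiguous() plays in forward(): once the result is materialized into fresh storage, later calls that reuse tile_buf cannot corrupt it.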
Functions:
fastvideo.attention.backends.video_sparse_attn.construct_variable_block_sizes
cached
construct_variable_block_sizes(dit_seq_shape: tuple[int, int, int], num_tiles: tuple[int, int, int], device: device) -> LongTensor
Compute the number of valid (non-padded) tokens inside every
(ts_t × ts_h × ts_w) tile after padding, flattened in the
(t-tile, h-tile, w-tile) order that rearrange uses.
Returns
torch.LongTensor # shape: [∏ full_window_size], one valid-token count per tile
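To make the counting concrete, here is a dependency-free sketch of the same computation. It is not FastVideo's implementation: variable_block_sizes is a hypothetical name, it returns a tuple instead of a LongTensor, and because the documented signature has no tile-size parameter, the tile size is passed explicitly here with an assumed default of (4, 4, 4). It also assumes the per-axis tile count is ceil(dim / tile), so only the last tile along each axis can be partial.

```python
from functools import lru_cache
from itertools import product
from math import ceil


@lru_cache(maxsize=16)
def variable_block_sizes(
    seq_shape: tuple[int, int, int],
    tile_size: tuple[int, int, int] = (4, 4, 4),  # assumed tile size
) -> tuple[int, ...]:
    """Valid (non-padded) token count of every tile, flattened in
    (t-tile, h-tile, w-tile) order with the w-tile index varying fastest."""
    num_tiles = tuple(ceil(d / ts) for d, ts in zip(seq_shape, tile_size))
    # Per axis: every tile is full except the last, which holds the remainder.
    per_axis = [
        [ts] * (n - 1) + [d - (n - 1) * ts]
        for d, ts, n in zip(seq_shape, tile_size, num_tiles)
    ]
    # A tile's valid count is the product of its per-axis extents.
    return tuple(a * b * c for a, b, c in product(*per_axis))


sizes = variable_block_sizes((5, 6, 8))
# Per-axis extents: t -> [4, 1], h -> [4, 2], w -> [4, 4]
# sizes == (64, 64, 32, 32, 16, 16, 8, 8), and sum(sizes) == 5 * 6 * 8
```

Returning an immutable tuple keeps the lru_cache result safe to share: the cache hands every caller the same object, so a cached mutable tensor would only be safe as long as no caller modifies it in place.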