entrypoints
¶
Modules¶
fastvideo.entrypoints.cli
¶
Modules¶
fastvideo.entrypoints.cli.bench
¶
Runs a benchmark against a running FastVideo OpenAI-compatible server.
Example usage
fastvideo bench --dataset vbench --num-prompts 20 --port 8000
fastvideo.entrypoints.cli.bench_serving
¶
Benchmark online serving for diffusion models (Image/Video Generation).
Example usage
# launch a server and benchmark on it
# T2V or T2I or any other multimodal generation model
fastvideo serve --config serve.yaml
# benchmark it and make sure the port is the same as the server's port
fastvideo bench --dataset vbench --num-prompts 20 --port 8000
fastvideo.entrypoints.cli.cli_types
¶
Classes¶
fastvideo.entrypoints.cli.cli_types.CLISubcommand
¶Base class for CLI subcommands
fastvideo.entrypoints.cli.cli_types.CLISubcommand.subparser_init
¶subparser_init(subparsers: _SubParsersAction) -> FlexibleArgumentParser
fastvideo.entrypoints.cli.eval
¶
fastvideo eval CLI: list registered eval metrics and run them
against a set of videos.
This is a thin wrapper around :mod:fastvideo.eval. Heavy lifting
(metric loading, GPU handling, batching) lives in
:func:fastvideo.eval.create_evaluator.
Examples::
fastvideo eval list
fastvideo eval list --group vbench
fastvideo eval run --videos path/to/videos/*.mp4 \
    --metrics common.ssim --reference path/to/refs/
fastvideo eval run --videos clip.mp4 --metrics vbench.aesthetic_quality \
    --output scores.json
fastvideo.entrypoints.cli.generate
¶
Classes¶
fastvideo.entrypoints.cli.generate.GenerateSubcommand
¶
Bases: CLISubcommand
The generate subcommand for the FastVideo CLI
Source code in fastvideo/entrypoints/cli/generate.py
fastvideo.entrypoints.cli.generate.GenerateSubcommand.validate
¶validate(args: Namespace) -> None
Validate the arguments for this command
Source code in fastvideo/entrypoints/cli/generate.py
Functions:¶
fastvideo.entrypoints.cli.main
¶
Classes¶
Functions:¶
fastvideo.entrypoints.cli.main.cmd_init
¶cmd_init() -> list[CLISubcommand]
Initialize all commands from separate modules
Source code in fastvideo/entrypoints/cli/main.py
fastvideo.entrypoints.cli.router_serve
¶
fastvideo router-serve CLI subcommand.
Launches the streaming router from a YAML config. Separate from
fastvideo serve because the router is an orthogonal process: it
fronts one or more running servers rather than hosting a generator
itself.
Classes¶
fastvideo.entrypoints.cli.router_serve.RouterServeSubcommand
¶
Bases: CLISubcommand
Start the multi-replica WebSocket router.
Source code in fastvideo/entrypoints/cli/router_serve.py
Functions:¶
fastvideo.entrypoints.cli.serve
¶
fastvideo.entrypoints.cli.utils
¶
Functions:¶
fastvideo.entrypoints.cli.utils.launch_distributed
¶Launch a distributed job with the given arguments
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| num_gpus | int | Number of GPUs to use | required |
| args | list[str] | Arguments to pass to v1_fastvideo_inference.py (defaults to sys.argv[1:]) | required |
| master_port | int \| None | Port for the master process (default: random) | None |
Source code in fastvideo/entrypoints/cli/utils.py
fastvideo.entrypoints.openai
¶
Modules¶
fastvideo.entrypoints.openai.api_server
¶
Classes¶
Functions:¶
fastvideo.entrypoints.openai.api_server.create_app
¶create_app(fastvideo_args: FastVideoArgs, output_dir: str = DEFAULT_OUTPUT_DIR, default_request: GenerationRequest | None = None) -> FastAPI
Build the FastAPI application with all routers mounted
Source code in fastvideo/entrypoints/openai/api_server.py
fastvideo.entrypoints.openai.api_server.lifespan
async
¶lifespan(app: FastAPI) -> AsyncIterator[None]
Load model on startup, clean up on shutdown
Source code in fastvideo/entrypoints/openai/api_server.py
fastvideo.entrypoints.openai.api_server.run_server
¶run_server(fastvideo_args: FastVideoArgs, host: str = DEFAULT_HOST, port: int = DEFAULT_PORT, output_dir: str = DEFAULT_OUTPUT_DIR, default_request: GenerationRequest | None = None)
Create the app and run it with uvicorn
Source code in fastvideo/entrypoints/openai/api_server.py
fastvideo.entrypoints.openai.common_api
¶
Classes¶
fastvideo.entrypoints.openai.common_api.ModelCard
¶
Bases: BaseModel
OpenAI-compatible model card
Functions:¶
fastvideo.entrypoints.openai.common_api.available_models
async
¶Show available models
Source code in fastvideo/entrypoints/openai/common_api.py
fastvideo.entrypoints.openai.common_api.model_info
async
¶
fastvideo.entrypoints.openai.common_api.retrieve_model
async
¶retrieve_model(model: str)
Retrieve a model by name
Source code in fastvideo/entrypoints/openai/common_api.py
fastvideo.entrypoints.openai.protocol
¶
fastvideo.entrypoints.openai.state
¶
Global server state shared across API modules.
Keeping state in a dedicated module prevents the classic 'main vs package
module' duplication that occurs when api_server.py is run with python -m.
All modules that need the generator or server args should import from here.
Classes¶
Functions:¶
fastvideo.entrypoints.openai.state.clear_state
¶
fastvideo.entrypoints.openai.state.get_default_request
¶
fastvideo.entrypoints.openai.state.get_generator
¶get_generator() -> VideoGenerator
Return the global VideoGenerator instance (set during startup).
fastvideo.entrypoints.openai.state.get_server_args
¶get_server_args() -> FastVideoArgs
Return the global FastVideoArgs (set during startup).
fastvideo.entrypoints.openai.state.set_state
¶set_state(generator: VideoGenerator, fastvideo_args: FastVideoArgs, output_dir: str, default_request: GenerationRequest | None = None) -> None
Set all server state at once (called from lifespan).
Source code in fastvideo/entrypoints/openai/state.py
fastvideo.entrypoints.openai.stores
¶
Classes¶
fastvideo.entrypoints.openai.stores.AsyncDictStore
¶A small async-safe in-memory key-value store for dict items.
This encapsulates the usual pattern of a module-level dict guarded by an asyncio.Lock and provides simple CRUD methods that are safe to call concurrently from FastAPI request handlers and background tasks.
Source code in fastvideo/entrypoints/openai/stores.py
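The pattern described above (a dict guarded by an asyncio.Lock) can be sketched as follows. This is a minimal illustration, not the actual AsyncDictStore: the class name TinyAsyncDictStore and the method names upsert, get, and pop are assumptions made for the example.

```python
import asyncio
from typing import Any, Dict, List, Optional


class TinyAsyncDictStore:
    """Hypothetical stand-in: a dict whose every access holds an asyncio.Lock."""

    def __init__(self) -> None:
        self._items: Dict[str, Dict[str, Any]] = {}
        self._lock = asyncio.Lock()

    async def upsert(self, key: str, item: Dict[str, Any]) -> None:
        async with self._lock:
            self._items[key] = dict(item)  # store a copy so callers cannot mutate it

    async def get(self, key: str) -> Optional[Dict[str, Any]]:
        async with self._lock:
            item = self._items.get(key)
            return dict(item) if item is not None else None

    async def pop(self, key: str) -> Optional[Dict[str, Any]]:
        async with self._lock:
            return self._items.pop(key, None)


async def demo() -> List[Optional[Dict[str, Any]]]:
    store = TinyAsyncDictStore()
    await store.upsert("job-1", {"status": "queued"})
    await store.upsert("job-1", {"status": "done"})  # overwrite in place
    before = await store.get("job-1")
    await store.pop("job-1")
    return [before, await store.get("job-1")]
```

Returning copies from get keeps handlers from mutating shared state outside the lock; whether the real store does this is not stated here.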
fastvideo.entrypoints.openai.utils
¶
Functions:¶
fastvideo.entrypoints.openai.utils.choose_image_ext
¶Pick a file extension for image outputs
Source code in fastvideo/entrypoints/openai/utils.py
fastvideo.entrypoints.openai.utils.merge_image_input_list
¶Merge multiple image input sources into a single flat list
Source code in fastvideo/entrypoints/openai/utils.py
fastvideo.entrypoints.openai.utils.parse_size
¶Parse a 'WIDTHxHEIGHT' string into (width, height)
Source code in fastvideo/entrypoints/openai/utils.py
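As an illustration of the 'WIDTHxHEIGHT' format, a parser could look like the sketch below. The name parse_size_sketch and its error handling (raising ValueError on malformed input, accepting an upper-case X) are assumptions; the behavior of the real parse_size on bad input is not documented here.

```python
from typing import Tuple


def parse_size_sketch(size: str) -> Tuple[int, int]:
    """Parse 'WIDTHxHEIGHT' into (width, height); raise ValueError if malformed."""
    width_text, separator, height_text = size.lower().partition("x")
    if not separator:
        raise ValueError(f"expected 'WIDTHxHEIGHT', got {size!r}")
    width, height = int(width_text), int(height_text)  # int() raises ValueError on junk
    if width <= 0 or height <= 0:
        raise ValueError(f"dimensions must be positive, got {size!r}")
    return width, height
```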
fastvideo.entrypoints.openai.utils.save_image_to_path
async
¶Save an uploaded file or download from URL to target_path
Source code in fastvideo/entrypoints/openai/utils.py
fastvideo.entrypoints.streaming
¶
Classes¶
fastvideo.entrypoints.streaming.BlobStore
¶
Bases: ABC
Opaque byte-blob storage keyed by id.
A :class:ContinuationState payload can reference large tensors
stored in a :class:BlobStore rather than inlining them, so the
JSON payload stays small when the state travels over the wire.
fastvideo.entrypoints.streaming.FragmentedMP4Chunk
dataclass
¶
A single fMP4 byte chunk emitted by :class:FragmentedMP4Encoder.
kind identifies whether the chunk is the init segment (must be
fed into the client's SourceBuffer first) or a media fragment.
fastvideo.entrypoints.streaming.FragmentedMP4Encoder
¶
FragmentedMP4Encoder(*, width: int, height: int, fps: int, segment_idx: int, stream_id: str | None = None, ffmpeg_path: str = 'ffmpeg', preset: str = 'ultrafast', pixel_format_out: str = 'yuv420p', extra_args: list[str] | None = None)
Stream RGB frames in, fMP4 chunks out.
One encoder covers one segment. The server creates a new encoder per :class:ltx2_segment_start boundary so each segment becomes one media fragment the client can append independently.
Example::
encoder = FragmentedMP4Encoder(width=1024, height=576, fps=24,
                               segment_idx=0)
async with encoder:
    async for chunk in encoder.encode(frames):
        await websocket.send_bytes(chunk.data)
Source code in fastvideo/entrypoints/streaming/stream.py
Methods:¶
fastvideo.entrypoints.streaming.FragmentedMP4Encoder.encode
async
¶encode(frames: list[ndarray] | AsyncIterator[ndarray]) -> AsyncIterator[FragmentedMP4Chunk]
Feed frames into ffmpeg and yield fMP4 chunks as they appear.
Source code in fastvideo/entrypoints/streaming/stream.py
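To make "RGB frames in, fMP4 chunks out" concrete, the sketch below builds one plausible ffmpeg command line for that job. It is an assumption about how such an encoder might invoke ffmpeg, not the command FragmentedMP4Encoder actually runs: the function name build_fmp4_command, the libx264 codec choice, and the specific movflags are illustrative. The parameter names mirror the constructor arguments shown above.

```python
from typing import List


def build_fmp4_command(
    width: int,
    height: int,
    fps: int,
    ffmpeg_path: str = "ffmpeg",
    preset: str = "ultrafast",
    pixel_format_out: str = "yuv420p",
) -> List[str]:
    """Return an argv that reads raw RGB24 frames on stdin and writes fMP4 to stdout."""
    return [
        ffmpeg_path,
        # Input: headerless RGB frames, so size and rate must be given explicitly.
        "-f", "rawvideo", "-pix_fmt", "rgb24",
        "-s", f"{width}x{height}", "-r", str(fps),
        "-i", "pipe:0",
        # Output: H.264 in a fragmented MP4 (init segment first, then moof/mdat pairs).
        "-c:v", "libx264", "-preset", preset,
        "-pix_fmt", pixel_format_out,
        "-movflags", "frag_keyframe+empty_moov+default_base_moof",
        "-f", "mp4", "pipe:1",
    ]
```

With empty_moov the muxer writes the init segment before any media data, which matches the init-then-fragments ordering that FragmentedMP4Chunk.kind distinguishes.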
fastvideo.entrypoints.streaming.GpuPool
¶
Bases: ABC
Abstract GPU pool.
acquire binds a session to a worker and holds that binding
across segments so continuation state can stay hot. run submits
a single GenerationRequest for a bound session.
Acquire / release are independent of run — a session can run many segments on one acquired worker, and must release on disconnect.
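The acquire / run / release contract can be illustrated with a toy pool. Everything in this sketch is hypothetical (TinyPool, AcquireTimeout, the string "request"); it only shows the shape of the contract: a binding survives across runs, and a second session waits until the first releases.

```python
import asyncio
from typing import Dict, List


class AcquireTimeout(RuntimeError):
    """Hypothetical analogue of PoolAcquireTimeout."""


class TinyPool:
    def __init__(self, worker_ids: List[int]) -> None:
        self._free = asyncio.Queue()
        for worker_id in worker_ids:
            self._free.put_nowait(worker_id)
        self._bound: Dict[str, int] = {}

    async def acquire(self, session_id: str, timeout: float) -> int:
        if session_id in self._bound:  # binding is sticky across segments
            return self._bound[session_id]
        try:
            worker_id = await asyncio.wait_for(self._free.get(), timeout)
        except asyncio.TimeoutError:
            raise AcquireTimeout(f"no free worker for {session_id}") from None
        self._bound[session_id] = worker_id
        return worker_id

    async def run(self, session_id: str, request: str) -> str:
        return f"worker {self._bound[session_id]} ran {request}"

    def release(self, session_id: str) -> None:
        self._free.put_nowait(self._bound.pop(session_id))


async def demo() -> List[str]:
    pool = TinyPool([0])
    log = []
    await pool.acquire("a", timeout=0.1)
    log.append(await pool.run("a", "segment-0"))
    log.append(await pool.run("a", "segment-1"))  # same worker, no re-acquire
    try:
        await pool.acquire("b", timeout=0.05)
    except AcquireTimeout:
        log.append("b timed out")
    pool.release("a")  # e.g. on disconnect
    await pool.acquire("b", timeout=0.1)
    log.append(await pool.run("b", "segment-0"))
    return log
```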
fastvideo.entrypoints.streaming.InMemoryBlobStore
¶
Bases: BlobStore
Thread-safe in-memory :class:BlobStore for single-process servers.
No eviction policy — callers are responsible for calling
:meth:drop when a blob's owning state is replaced or a session
ends. A redis- or filesystem-backed :class:BlobStore should
replace this when the streaming server lands as a real service
(PR 7.5+).
Source code in fastvideo/entrypoints/streaming/session_store.py
fastvideo.entrypoints.streaming.InMemorySessionStore
¶
Bases: SessionStore
Thread-safe in-memory :class:SessionStore.
Default implementation used by single-process deployments; a future Redis-backed store can be dropped in without changes to the server.
No eviction / TTL / bounded capacity — sessions only leave via
:meth:drop. The live streaming server (PR 7.5+) is responsible
for bounding growth and for dropping any :class:BlobStore blobs
referenced by a state when that state is replaced or a session
ends; this class does not know about blobs.
Source code in fastvideo/entrypoints/streaming/session_store.py
fastvideo.entrypoints.streaming.InProcessGpuPool
¶
InProcessGpuPool(generator: _GeneratorLike, *, gpu_id: int = 0, session_store: SessionStore | None = None)
Bases: GpuPool
Single-process pool backed by one :class:_GeneratorLike.
This is what PR 7.5's server uses by default; PR 7.6 adds the real
SubprocessGpuPool alternative but keeps this one for tests and
small deployments.
Source code in fastvideo/entrypoints/streaming/gpu_pool.py
fastvideo.entrypoints.streaming.LLMProvider
¶
Bases: Protocol
Provider interface every LLM adapter implements.
Providers are async-first because every built-in implementation
talks to an HTTP API. Synchronous providers can wrap their call in
asyncio.to_thread internally.
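A synchronous backend wrapped with asyncio.to_thread might look like the sketch below. The method name complete and the blocking_complete helper are assumptions for illustration; the methods that LLMProvider actually requires are not listed on this page.

```python
import asyncio
import time


def blocking_complete(prompt: str) -> str:
    """Hypothetical synchronous client call (stands in for a blocking SDK)."""
    time.sleep(0.01)
    return prompt.upper()


class SyncBackedProvider:
    """Presents an async interface while the real work runs in a worker thread."""

    name = "sync-sketch"

    async def complete(self, prompt: str) -> str:
        # to_thread keeps the event loop free while the blocking call runs.
        return await asyncio.to_thread(blocking_complete, prompt)
```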
fastvideo.entrypoints.streaming.MockGenerator
dataclass
¶
MockGenerator(sleep_ms: float = 0.0)
Generator stand-in that returns synthetic gradient frames.
Each call produces one segment worth of frames whose pixels vary by
a constant derived from the request seed and segment index. Latency
is configurable via sleep_ms so the caller can exercise slow-
generate scenarios without spinning a GPU.
fastvideo.entrypoints.streaming.PoolAcquireTimeout
¶
Bases: RuntimeError
Raised when acquire times out waiting for a free worker.
fastvideo.entrypoints.streaming.PromptEnhancer
¶
PromptEnhancer(*, providers: Sequence[LLMProvider], model: str, timeout_ms: int = 20000, temperature: float = 0.7, max_tokens: int | None = 256, system_prompt_dir: str | None = None)
Orchestrates prompt operations across a priority-ordered provider list with structured fallback + hot-reloadable system prompts.
Usage::
enhancer = PromptEnhancer(
    providers=[CerebrasProvider(), GroqProvider()],
    model="gpt-oss-120b",
    system_prompt_dir="/etc/fastvideo/prompts",
)
response = await enhancer.enhance("a fox running through snow")
Source code in fastvideo/entrypoints/streaming/prompt/enhancer.py
Methods:¶
fastvideo.entrypoints.streaming.PromptEnhancer.register_provider
¶register_provider(provider: LLMProvider, *, priority: int = -1) -> None
Insert an additional provider. priority=0 makes it primary;
priority=-1 (default) appends as a fallback.
Source code in fastvideo/entrypoints/streaming/prompt/enhancer.py
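The priority semantics can be sketched on a plain list. This is an assumption about the behavior, not the real implementation: insert_provider is a hypothetical helper, and providers are represented by their names. Note that list.insert(-1, x) would place x before the last element, so a negative priority has to be handled as an explicit append.

```python
from typing import List


def insert_provider(providers: List[str], provider: str, priority: int = -1) -> None:
    """Insert at `priority`; negative or out-of-range values append as a fallback."""
    if priority < 0 or priority >= len(providers):
        providers.append(provider)
    else:
        providers.insert(priority, provider)


chain = ["cerebras"]
insert_provider(chain, "groq")               # default: appended as a fallback
insert_provider(chain, "local", priority=0)  # becomes the primary provider
```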
fastvideo.entrypoints.streaming.PromptEnhancer.reload_system_prompts
¶Re-read the system prompt files from system_prompt_dir.
The streaming server exposes this via a management endpoint so operators can iterate on prompt templates without restarting workers.
Source code in fastvideo/entrypoints/streaming/prompt/enhancer.py
fastvideo.entrypoints.streaming.PromptSafetyFilter
¶
PromptSafetyFilter(*, classifier_path: str | None, enabled: bool = True, block_threshold: float = 0.5)
Minimal fastText-backed prompt safety filter.
Loads the classifier lazily on first use so the streaming server can construct the filter eagerly at startup without paying the model-load cost when safety is disabled.
Source code in fastvideo/entrypoints/streaming/prompt/safety.py
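The lazy-loading behavior described above can be sketched as follows. LazyFilter, its loader argument, and the "score >= threshold blocks" rule are assumptions for the example; the real filter loads a fastText model from classifier_path, which this sketch replaces with an injected callable.

```python
from typing import Callable, Optional

Classifier = Callable[[str], float]  # returns an "unsafe" score in [0, 1]


class LazyFilter:
    def __init__(self, loader: Callable[[], Classifier], *, enabled: bool = True,
                 block_threshold: float = 0.5) -> None:
        self._loader = loader
        self._enabled = enabled
        self._threshold = block_threshold
        self._classifier: Optional[Classifier] = None
        self.load_count = 0  # exposed only so the sketch can show when loading happens

    def is_blocked(self, prompt: str) -> bool:
        if not self._enabled:
            return False  # never touches the loader when safety is off
        if self._classifier is None:
            self._classifier = self._loader()  # pay the load cost on first use only
            self.load_count += 1
        return self._classifier(prompt) >= self._threshold


def fake_loader() -> Classifier:
    return lambda prompt: 0.9 if "forbidden" in prompt else 0.1
```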
fastvideo.entrypoints.streaming.SafetyDecision
¶
Bases: Enum
fastvideo.entrypoints.streaming.Session
dataclass
¶
Session(id: str = (lambda: hex)(), state: SessionState = INITIALIZING, created_at: float = monotonic(), last_activity: float = monotonic(), client_id: str | None = None, preset: str | None = None, preset_label: str | None = None, curated_prompts: list[str] = list(), segment_idx: int = 0, enhancement_enabled: bool = False, auto_extension_enabled: bool = False, loop_generation_enabled: bool = False, single_clip_mode: bool = False, generation_paused: bool = False, stream_mode: str = 'av_fmp4', gpu_id: int | None = None, continuation_state: ContinuationState | None = None, metadata: dict[str, Any] = dict())
Methods:¶
fastvideo.entrypoints.streaming.Session.transition
¶transition(target: SessionState) -> None
Move to target if the edge is allowed.
Raises :class:InvalidSessionTransition on illegal moves. The
self-loop on ACTIVE is legal so the server can re-assert
ACTIVE on segment completion without special casing.
Source code in fastvideo/entrypoints/streaming/session.py
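A transition check with a legal ACTIVE self-loop can be sketched like this. Only INITIALIZING and ACTIVE appear on this page; the CLOSED state, the edge table, and the names State, InvalidTransition, and TinySession are hypothetical. The full diagram lives in the design document referenced under SessionState.

```python
from enum import Enum


class State(Enum):
    INITIALIZING = "initializing"
    ACTIVE = "active"
    CLOSED = "closed"  # hypothetical terminal state for the sketch


class InvalidTransition(RuntimeError):
    """Hypothetical analogue of InvalidSessionTransition."""


# Allowed edges; ACTIVE -> ACTIVE is deliberately present.
ALLOWED = {
    State.INITIALIZING: {State.ACTIVE, State.CLOSED},
    State.ACTIVE: {State.ACTIVE, State.CLOSED},
    State.CLOSED: set(),
}


class TinySession:
    def __init__(self) -> None:
        self.state = State.INITIALIZING

    def transition(self, target: State) -> None:
        if target not in ALLOWED[self.state]:
            raise InvalidTransition(f"{self.state.name} -> {target.name}")
        self.state = target
```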
fastvideo.entrypoints.streaming.SessionLogEvent
dataclass
¶
One line in the session JSONL file.
fastvideo.entrypoints.streaming.SessionLogger
¶
SessionLogger(log_dir: str | None)
Append-only JSONL logger keyed by session id.
Thread-safe; the server may be writing from multiple asyncio tasks (fMP4 encoder thread + control-frame handler) for the same session.
Source code in fastvideo/entrypoints/streaming/session_logger.py
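An append-only JSONL log that tolerates concurrent writers can be sketched with a threading.Lock around each write. The class JsonlSessionLog, the one-file-per-session layout, and the event shape are assumptions; the real SessionLogger may organize files and locking differently.

```python
import json
import tempfile
import threading
from pathlib import Path
from typing import List


class JsonlSessionLog:
    def __init__(self, log_dir: str) -> None:
        self._dir = Path(log_dir)
        self._dir.mkdir(parents=True, exist_ok=True)
        self._lock = threading.Lock()

    def append(self, session_id: str, event: dict) -> None:
        line = json.dumps(event, sort_keys=True)
        with self._lock:  # one writer at a time, so lines never interleave
            with open(self._dir / f"{session_id}.jsonl", "a", encoding="utf-8") as handle:
                handle.write(line + "\n")


def demo() -> List[int]:
    with tempfile.TemporaryDirectory() as log_dir:
        log = JsonlSessionLog(log_dir)
        threads = [
            threading.Thread(target=log.append, args=("s1", {"seq": i}))
            for i in range(8)
        ]
        for thread in threads:
            thread.start()
        for thread in threads:
            thread.join()
        text = (Path(log_dir) / "s1.jsonl").read_text(encoding="utf-8")
    # Every line must parse on its own; arrival order is not guaranteed.
    return sorted(json.loads(line)["seq"] for line in text.splitlines())
```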
fastvideo.entrypoints.streaming.SessionManager
¶
Registers sessions and enforces per-server session limits.
Source code in fastvideo/entrypoints/streaming/session.py
Methods:¶
fastvideo.entrypoints.streaming.SessionManager.reap_timed_out
¶Return the ids of sessions that have exceeded the idle timeout.
The caller is responsible for actually closing them — this
method only identifies dead sessions so the server can emit
session_timeout frames before dropping the WebSocket.
TODO: unused until a background driver calls it. Per-connection idle enforcement currently happens via asyncio.wait_for on receive_json; this helper catches sessions stuck before any receive (e.g. future QUEUED state) and is expected to be wired into the GPU-pool reaper.
Source code in fastvideo/entrypoints/streaming/session.py
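The "identify, do not close" contract can be sketched as a pure function over last-activity timestamps. The helper find_idle_sessions and its arguments are hypothetical; the sketch assumes timestamps come from time.monotonic, matching the monotonic() defaults in the Session signature above.

```python
import time
from typing import Dict, List, Optional


def find_idle_sessions(
    last_activity: Dict[str, float],
    idle_timeout_s: float,
    now: Optional[float] = None,
) -> List[str]:
    """Return ids idle for longer than the timeout; the caller closes them."""
    current = time.monotonic() if now is None else now
    return [
        session_id
        for session_id, seen_at in last_activity.items()
        if current - seen_at > idle_timeout_s
    ]
```

Passing now explicitly keeps the function deterministic in tests; a background driver would omit it.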
fastvideo.entrypoints.streaming.SessionState
¶
Bases: Enum
State-machine positions for a streaming session.
Transitions are server-owned. See
docs/design/server_contracts/streaming.md for the full diagram.
fastvideo.entrypoints.streaming.SessionStore
¶
Bases: ABC
Keyed store for per-session continuation state.
Implementations own the session-id → state mapping. The streaming
server calls :meth:store after each segment and :meth:snapshot
when a client explicitly asks for an exportable state handle.
Methods:¶
fastvideo.entrypoints.streaming.SessionStore.hydrate
abstractmethod
¶Install state as the starting point for a session.
When session_id is None the store allocates a fresh id
(UUID4); when provided the store uses it verbatim, overwriting
any prior state at that id.
Source code in fastvideo/entrypoints/streaming/session_store.py
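The hydrate semantics (allocate a UUID4 when no id is given, otherwise overwrite at the given id) can be sketched with a dict-backed store. DictSessionStore, the argument order, and the returned id are assumptions, since the hydrate signature is not shown here; state is a plain dict standing in for ContinuationState.

```python
import uuid
from typing import Dict, Optional


class DictSessionStore:
    def __init__(self) -> None:
        self._states: Dict[str, dict] = {}

    def hydrate(self, state: dict, session_id: Optional[str] = None) -> str:
        if session_id is None:
            session_id = str(uuid.uuid4())  # fresh id for a new session
        self._states[session_id] = state    # overwrites any prior state at that id
        return session_id

    def snapshot(self, session_id: str) -> Optional[dict]:
        return self._states.get(session_id)


store = DictSessionStore()
new_id = store.hydrate({"segment_idx": 3})
store.hydrate({"segment_idx": 0}, session_id="fixed-id")
store.hydrate({"segment_idx": 7}, session_id="fixed-id")
```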
fastvideo.entrypoints.streaming.SessionStore.snapshot
abstractmethod
¶snapshot(session_id: str) -> ContinuationState | None
fastvideo.entrypoints.streaming.SubprocessGpuPool
¶
SubprocessGpuPool(generator_config: GeneratorConfig, *, pool_config: GpuPoolConfig, warmup_config: WarmupConfig | None = None, session_store: SessionStore | None = None, worker_factory: WorkerFactory | None = None)
Bases: GpuPool
One multiprocessing.Process per GPU.
Each worker boots :class:fastvideo.VideoGenerator from a typed
:class:GeneratorConfig inside the child process (post-
CUDA_VISIBLE_DEVICES setup) and consumes jobs from an mp Queue.
This is the production shape: the parent process stays CPU-only, and
GPU state never crosses process boundaries. Continuation state is
serialized through :class:SessionStore for cross-GPU handoff.
PR 7.6 ships this as an opt-in; PR 7.5's in-process pool remains the default until nightly runs validate the subprocess path.
Source code in fastvideo/entrypoints/streaming/gpu_pool.py
Methods:¶
fastvideo.entrypoints.streaming.SubprocessGpuPool.start
async
¶Spawn worker processes and wait for each to report ready.
Source code in fastvideo/entrypoints/streaming/gpu_pool.py
Functions:¶
fastvideo.entrypoints.streaming.build_app
¶
build_app(serve_config: ServeConfig, generator: _GeneratorProto | None = None, *, pool: GpuPool | None = None, session_store: SessionStore | None = None) -> FastAPI
Build the FastAPI app used by :func:run_server.
Exposed so tests can drive the WebSocket endpoint in-process via
starlette.testclient.TestClient(app).websocket_connect(...).
Exactly one of generator (backed by :class:InProcessGpuPool)
or pool (for the subprocess-backed production shape) must be
given.
Source code in fastvideo/entrypoints/streaming/server.py
fastvideo.entrypoints.streaming.build_health_router
¶
Build a router exposing streaming liveness/readiness endpoints.
pool may be either a concrete pool or a zero-argument callable that
returns the current pool. The callable form lets product servers keep their
own lifespan-managed runtime singleton without adding public global state.
Source code in fastvideo/entrypoints/streaming/health.py
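Accepting either a concrete pool or a zero-argument callable usually comes down to one resolution step at request time. The sketch below shows that step with hypothetical names (resolve_pool, FakePool); it assumes the pool object itself is not callable, which the real router may check differently.

```python
from typing import Callable, TypeVar, Union

T = TypeVar("T")


def resolve_pool(pool: Union[T, Callable[[], T]]) -> T:
    """Return the pool, calling it first if a provider callable was passed."""
    return pool() if callable(pool) else pool


class FakePool:
    def __init__(self, name: str) -> None:
        self.name = name


runtime = {"pool": FakePool("first")}
static_pool = FakePool("static")
provider = lambda: runtime["pool"]  # resolved on every call, so swaps are visible
```

Because the callable is evaluated per request, a server can replace its lifespan-managed pool without rebuilding the router.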
fastvideo.entrypoints.streaming.build_mock_app
¶
build_mock_app(*, sleep_ms: float = 0.0)
Build a FastAPI app backed by :class:MockGenerator.
Source code in fastvideo/entrypoints/streaming/mock_server.py
fastvideo.entrypoints.streaming.get_pool_status
async
¶
Return the generic GPU pool status payload used by /status.
Source code in fastvideo/entrypoints/streaming/health.py
fastvideo.entrypoints.streaming.run_server
¶
run_server(serve_config: ServeConfig, *, generator: _GeneratorProto | None = None) -> None
Launch the streaming server.
Boots a :class:fastvideo.VideoGenerator from
serve_config.generator unless generator is provided, then
serves build_app(...) via uvicorn.
Source code in fastvideo/entrypoints/streaming/server.py
Modules¶
fastvideo.entrypoints.streaming.gpu_pool
¶
GPU pool manager for the streaming server.
Replaces the single-generator path in PR 7.5 with a typed pool abstraction. Three implementations ship here:
- :class:InProcessGpuPool: one in-process VideoGenerator; used by tests and single-GPU dev deployments.
- :class:SubprocessGpuPool: one multiprocessing.Process per GPU, each running :func:worker_main against a GeneratorConfig. Jobs are dispatched via multiprocessing.Queue.
- :class:GpuPool (abstract): the interface both use.
Session-to-GPU binding lives in the pool so continuation state stays
on the GPU that generated the previous segment (matching the internal
gpu_pool.py's per-GPU cache behavior). Cross-GPU handoff is
supported via :class:SessionStore snapshot + hydrate, which
serializes the state before the migration and rehydrates it on the
new worker.
Typed config: workers start from a :class:GeneratorConfig (no flat
LTX-2 kwargs), satisfying the PR 6 + PR 7 contracts that the public
surface doesn't reintroduce the legacy kwarg bag.
Classes¶
fastvideo.entrypoints.streaming.gpu_pool.GpuPool
¶
Bases: ABC
Abstract GPU pool.
acquire binds a session to a worker and holds that binding
across segments so continuation state can stay hot. run submits
a single GenerationRequest for a bound session.
Acquire / release are independent of run — a session can run many segments on one acquired worker, and must release on disconnect.
fastvideo.entrypoints.streaming.gpu_pool.InProcessGpuPool
¶InProcessGpuPool(generator: _GeneratorLike, *, gpu_id: int = 0, session_store: SessionStore | None = None)
Bases: GpuPool
Single-process pool backed by one :class:_GeneratorLike.
This is what PR 7.5's server uses by default; PR 7.6 adds the real
SubprocessGpuPool alternative but keeps this one for tests and
small deployments.
Source code in fastvideo/entrypoints/streaming/gpu_pool.py
fastvideo.entrypoints.streaming.gpu_pool.PoolAcquireTimeout
¶
Bases: RuntimeError
Raised when acquire times out waiting for a free worker.
fastvideo.entrypoints.streaming.gpu_pool.PoolAssignment
dataclass
¶The worker a session is currently bound to.
fastvideo.entrypoints.streaming.gpu_pool.SubprocessGpuPool
¶SubprocessGpuPool(generator_config: GeneratorConfig, *, pool_config: GpuPoolConfig, warmup_config: WarmupConfig | None = None, session_store: SessionStore | None = None, worker_factory: WorkerFactory | None = None)
Bases: GpuPool
One multiprocessing.Process per GPU.
Each worker boots :class:fastvideo.VideoGenerator from a typed
:class:GeneratorConfig inside the child process (post-
CUDA_VISIBLE_DEVICES setup) and consumes jobs from an mp Queue.
This is the production shape: the parent process stays CPU-only, and
GPU state never crosses process boundaries. Continuation state is
serialized through :class:SessionStore for cross-GPU handoff.
PR 7.6 ships this as an opt-in; PR 7.5's in-process pool remains the default until nightly runs validate the subprocess path.
Source code in fastvideo/entrypoints/streaming/gpu_pool.py
fastvideo.entrypoints.streaming.gpu_pool.SubprocessGpuPool.start
async
¶Spawn worker processes and wait for each to report ready.
Source code in fastvideo/entrypoints/streaming/gpu_pool.py
Functions:¶
fastvideo.entrypoints.streaming.gpu_pool.worker_main
¶worker_main(*, gpu_id: int, worker_id: str, generator_config: GeneratorConfig, warmup_config: WarmupConfig, job_queue: Queue, result_queue: Queue, shutdown_event: Any) -> None
Per-worker subprocess entry.
Runs inside the child spawned by SubprocessGpuPool. Blocking
VideoGenerator construction + generation happens here, not in
the parent's event loop.
Source code in fastvideo/entrypoints/streaming/worker.py
fastvideo.entrypoints.streaming.health
¶
Health, readiness, and status routes for streaming servers.
Functions:¶
fastvideo.entrypoints.streaming.health.build_health_router
¶Build a router exposing streaming liveness/readiness endpoints.
pool may be either a concrete pool or a zero-argument callable that
returns the current pool. The callable form lets product servers keep their
own lifespan-managed runtime singleton without adding public global state.
Source code in fastvideo/entrypoints/streaming/health.py
fastvideo.entrypoints.streaming.health.get_pool_status
async
¶Return the generic GPU pool status payload used by /status.
Source code in fastvideo/entrypoints/streaming/health.py
fastvideo.entrypoints.streaming.mock_server
¶
Mock streaming server — a frontend dev aid.
Boots the same FastAPI app the real streaming server uses, but backs
it with :class:InProcessGpuPool wrapping a synthetic generator that
emits pre-baked RGB frames. No GPU or model weights required.
Use cases:
- Frontend development without a real model loaded.
- Integration tests that exercise the WS protocol end-to-end.
- Reproducing protocol bugs locally.
Launch: python -m fastvideo.entrypoints.streaming.mock_server.
Classes¶
fastvideo.entrypoints.streaming.mock_server.MockGenerator
dataclass
¶MockGenerator(sleep_ms: float = 0.0)
Generator stand-in that returns synthetic gradient frames.
Each call produces one segment worth of frames whose pixels vary by
a constant derived from the request seed and segment index. Latency
is configurable via sleep_ms so the caller can exercise slow-
generate scenarios without spinning a GPU.
Functions:¶
fastvideo.entrypoints.streaming.mock_server.build_mock_app
¶build_mock_app(*, sleep_ms: float = 0.0)
Build a FastAPI app backed by :class:MockGenerator.
Source code in fastvideo/entrypoints/streaming/mock_server.py
fastvideo.entrypoints.streaming.prompt
¶
Prompt pipeline for the streaming server.
- :mod:providers: LLM backend abstraction + built-in adapters
- :mod:enhancer: provider-agnostic enhance / auto-extend / rewrite operations on top of the provider layer
All of this is optional; the streaming server runs fine without it
(PR 7.5's skeleton never invokes the enhancer). When the operator
enables ServeConfig.streaming.prompt.enabled, the server routes
each session_init_v2 curated prompt through enhance before the
first segment.
Classes¶
fastvideo.entrypoints.streaming.prompt.LLMProvider
¶
Bases: Protocol
Provider interface every LLM adapter implements.
Providers are async-first because every built-in implementation
talks to an HTTP API. Synchronous providers can wrap their call in
asyncio.to_thread internally.
fastvideo.entrypoints.streaming.prompt.LLMProviderError
¶
Bases: RuntimeError
Raised when an LLM provider fails a request.
retryable controls whether the enhancer falls back to the next
provider. It is settable per-instance so the same exception type
can describe retryable transport errors (5xx, 429) and
non-retryable client errors (4xx auth/bad-request) without forcing
a separate subclass for every status family.
Source code in fastvideo/entrypoints/streaming/prompt/providers/base.py
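A per-instance retryable flag lets one exception type cover both status families. The sketch below is illustrative only: SketchProviderError and error_from_status are hypothetical, and the mapping (429 and 5xx retryable, everything else not) simply follows the examples given in the description above.

```python
class SketchProviderError(RuntimeError):
    """Hypothetical analogue of LLMProviderError with a per-instance flag."""

    def __init__(self, message: str, *, retryable: bool = False) -> None:
        super().__init__(message)
        self.retryable = retryable


def error_from_status(status: int) -> SketchProviderError:
    # Rate limits and server-side failures may succeed elsewhere; other 4xx will not.
    retryable = status == 429 or 500 <= status <= 599
    return SketchProviderError(f"provider returned HTTP {status}", retryable=retryable)
```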
fastvideo.entrypoints.streaming.prompt.LLMTimeoutError
¶LLMTimeoutError(message: str)
Bases: LLMProviderError
Raised when an LLM provider times out — always retryable.
Source code in fastvideo/entrypoints/streaming/prompt/providers/base.py
fastvideo.entrypoints.streaming.prompt.PromptEnhancer
¶PromptEnhancer(*, providers: Sequence[LLMProvider], model: str, timeout_ms: int = 20000, temperature: float = 0.7, max_tokens: int | None = 256, system_prompt_dir: str | None = None)
Orchestrates prompt operations across a priority-ordered provider list with structured fallback + hot-reloadable system prompts.
Usage::
enhancer = PromptEnhancer(
    providers=[CerebrasProvider(), GroqProvider()],
    model="gpt-oss-120b",
    system_prompt_dir="/etc/fastvideo/prompts",
)
response = await enhancer.enhance("a fox running through snow")
Source code in fastvideo/entrypoints/streaming/prompt/enhancer.py
fastvideo.entrypoints.streaming.prompt.PromptEnhancer.register_provider
¶register_provider(provider: LLMProvider, *, priority: int = -1) -> None
Insert an additional provider. priority=0 makes it primary;
priority=-1 (default) appends as a fallback.
Source code in fastvideo/entrypoints/streaming/prompt/enhancer.py
fastvideo.entrypoints.streaming.prompt.PromptEnhancer.reload_system_prompts
¶Re-read the system prompt files from system_prompt_dir.
The streaming server exposes this via a management endpoint so operators can iterate on prompt templates without restarting workers.
Source code in fastvideo/entrypoints/streaming/prompt/enhancer.py
Modules¶
fastvideo.entrypoints.streaming.prompt.enhancer
¶Provider-agnostic prompt orchestration for the streaming server.
Three operations the streaming server needs:
- enhance: polish a user prompt (add cinematic detail, fix syntax)
- auto_extend: generate a follow-on prompt for loop generation
- rewrite: rewrite a seed prompt for a user-directed rewrite flow
All three share the same orchestration: pick a provider in priority
order, submit an LLMRequest, fall back to the next provider on
retryable errors, and surface a structured :class:LLMResponse back
to the caller.
System prompts are loaded from system_prompt_dir on construction
and can be hot-reloaded via :meth:PromptEnhancer.reload_system_prompts.
The streaming server's management endpoint calls that method in
response to a rewrite_seed_prompts_started frame.
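The shared orchestration (try providers in priority order, fall back only on retryable errors) can be sketched as below. All names are hypothetical stand-ins: ProviderError, FlakyProvider, EchoProvider, and complete_with_fallback are not part of FastVideo, and plain strings replace the LLMRequest and LLMResponse types.

```python
import asyncio
from typing import Optional, Sequence, Tuple


class ProviderError(RuntimeError):
    def __init__(self, message: str, *, retryable: bool) -> None:
        super().__init__(message)
        self.retryable = retryable


class FlakyProvider:
    name = "flaky"

    async def complete(self, prompt: str) -> str:
        raise ProviderError("HTTP 503", retryable=True)


class EchoProvider:
    name = "echo"

    async def complete(self, prompt: str) -> str:
        return f"cinematic: {prompt}"


async def complete_with_fallback(providers: Sequence, prompt: str) -> Tuple[str, str]:
    """Return (provider name, text) from the first provider that succeeds."""
    last_error: Optional[ProviderError] = None
    for provider in providers:  # list order is priority order
        try:
            text = await provider.complete(prompt)
        except ProviderError as exc:
            if not exc.retryable:
                raise  # e.g. bad request: another provider would not help
            last_error = exc
            continue
        return provider.name, text
    raise RuntimeError("all providers failed") from last_error
```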
fastvideo.entrypoints.streaming.prompt.enhancer.PromptEnhancer
¶PromptEnhancer(*, providers: Sequence[LLMProvider], model: str, timeout_ms: int = 20000, temperature: float = 0.7, max_tokens: int | None = 256, system_prompt_dir: str | None = None)
Orchestrates prompt operations across a priority-ordered provider list with structured fallback + hot-reloadable system prompts.
Usage::
enhancer = PromptEnhancer(
    providers=[CerebrasProvider(), GroqProvider()],
    model="gpt-oss-120b",
    system_prompt_dir="/etc/fastvideo/prompts",
)
response = await enhancer.enhance("a fox running through snow")
Source code in fastvideo/entrypoints/streaming/prompt/enhancer.py
fastvideo.entrypoints.streaming.prompt.enhancer.PromptEnhancer.register_provider
¶register_provider(provider: LLMProvider, *, priority: int = -1) -> None
Insert an additional provider. priority=0 makes it primary;
priority=-1 (default) appends as a fallback.
Source code in fastvideo/entrypoints/streaming/prompt/enhancer.py
fastvideo.entrypoints.streaming.prompt.enhancer.PromptEnhancer.reload_system_prompts
¶Re-read the system prompt files from system_prompt_dir.
The streaming server exposes this via a management endpoint so operators can iterate on prompt templates without restarting workers.
Source code in fastvideo/entrypoints/streaming/prompt/enhancer.py
fastvideo.entrypoints.streaming.prompt.providers
¶LLM provider implementations used by the prompt enhancer.
fastvideo.entrypoints.streaming.prompt.providers.CerebrasProvider
dataclass
¶CerebrasProvider(api_key: str | None = None, base_url: str = _DEFAULT_BASE_URL, name: str = 'cerebras')
Cerebras inference adapter.
api_key falls back to CEREBRAS_API_KEY when unset.
fastvideo.entrypoints.streaming.prompt.providers.GroqProvider
dataclass
¶Groq inference adapter.
Identical wire format to :class:CerebrasProvider; both go through
:func:complete_openai_compatible. The two providers differ only
in base URL, env var, and model id conventions.
fastvideo.entrypoints.streaming.prompt.providers.LLMProvider
¶
Bases: Protocol
Provider interface every LLM adapter implements.
Providers are async-first because every built-in implementation
talks to an HTTP API. Synchronous providers can wrap their call in
asyncio.to_thread internally.
fastvideo.entrypoints.streaming.prompt.providers.LLMProviderError
¶
Bases: RuntimeError
Raised when an LLM provider fails a request.
retryable controls whether the enhancer falls back to the next
provider. It is settable per-instance so the same exception type
can describe retryable transport errors (5xx, 429) and
non-retryable client errors (4xx auth/bad-request) without forcing
a separate subclass for every status family.
Source code in fastvideo/entrypoints/streaming/prompt/providers/base.py
fastvideo.entrypoints.streaming.prompt.providers.LLMTimeoutError
¶LLMTimeoutError(message: str)
Bases: LLMProviderError
Raised when an LLM provider times out — always retryable.
Source code in fastvideo/entrypoints/streaming/prompt/providers/base.py
fastvideo.entrypoints.streaming.prompt.providers.base
¶LLM provider protocol + DTOs used by the prompt enhancer.
Third-party users add a new provider by implementing
:class:LLMProvider and registering it with a prompt enhancer
instance. The shipped providers live in sibling modules
(cerebras.py, groq.py) and each is ~100-200 LOC — the
provider layer is intentionally thin so the enhancer stays
provider-agnostic.
fastvideo.entrypoints.streaming.prompt.providers.base.LLMProvider
¶
Bases: Protocol
Provider interface every LLM adapter implements.
Providers are async-first because every built-in implementation
talks to an HTTP API. Synchronous providers can wrap their call in
asyncio.to_thread internally.
fastvideo.entrypoints.streaming.prompt.providers.base.LLMProviderError
¶
Bases: RuntimeError
Raised when an LLM provider fails a request.
retryable controls whether the enhancer falls back to the next
provider. It is settable per-instance so the same exception type
can describe retryable transport errors (5xx, 429) and
non-retryable client errors (4xx auth/bad-request) without forcing
a separate subclass for every status family.
Source code in fastvideo/entrypoints/streaming/prompt/providers/base.py
fastvideo.entrypoints.streaming.prompt.providers.base.LLMTimeoutError
¶LLMTimeoutError(message: str)
Bases: LLMProviderError
Raised when an LLM provider times out — always retryable.
Source code in fastvideo/entrypoints/streaming/prompt/providers/base.py
fastvideo.entrypoints.streaming.prompt.providers.cerebras
¶Cerebras LLM provider (OpenAI-compatible chat endpoint).
fastvideo.entrypoints.streaming.prompt.providers.groq
¶Groq LLM provider (OpenAI-compatible chat endpoint).
fastvideo.entrypoints.streaming.prompt.providers.groq.GroqProvider
dataclass
¶Groq inference adapter.
Identical wire format to :class:CerebrasProvider; both go through
:func:complete_openai_compatible. The two providers differ only
in base URL, env var, and model id conventions.
fastvideo.entrypoints.streaming.prompt.rewrite
¶Rewrite payload builder.
The UI's "rewrite seed prompts" flow asks the enhancer to produce a
batch of alternative prompts given one seed. This module packages the
seed + options into the payload the enhancer expects and unpacks the
response back into a typed :class:RewriteResult.
Separating this from :mod:enhancer keeps the enhancer provider-
agnostic; anything UI-specific (how many alternatives to request, how
to split the response, temperature) lives here.
fastvideo.entrypoints.streaming.prompt.rewrite.RewriteOptions
dataclass
¶
fastvideo.entrypoints.streaming.prompt.rewrite.build_rewrite
async
¶build_rewrite(enhancer: PromptEnhancer, seed_prompt: str, *, options: RewriteOptions | None = None) -> RewriteResult
Run a rewrite op through the enhancer and return a typed result.
Source code in fastvideo/entrypoints/streaming/prompt/rewrite.py
fastvideo.entrypoints.streaming.prompt.safety
¶Optional prompt safety filter.
Uses a fastText classifier to score prompts against a banned-content
rubric. Only loaded when ServeConfig.streaming.safety.enabled is
True and fastText is installed — users who don't need it see no
runtime cost.
Install: pip install fastvideo[prompt-safety] (ships fasttext as an
optional extra) or install fasttext directly.
fastvideo.entrypoints.streaming.prompt.safety.PromptSafetyFilter
¶PromptSafetyFilter(*, classifier_path: str | None, enabled: bool = True, block_threshold: float = 0.5)
Minimal fastText-backed prompt safety filter.
Loads the classifier lazily on first use so the streaming server can construct the filter eagerly at startup without paying the model-load cost when safety is disabled.
Source code in fastvideo/entrypoints/streaming/prompt/safety.py
fastvideo.entrypoints.streaming.prompt.safety.SafetyDecision
¶
Bases: Enum
fastvideo.entrypoints.streaming.prompt.safety.SafetyDecision.UNAVAILABLE
class-attribute
instance-attribute
¶Returned when the classifier can't run (not configured, fastText
missing). Safety is opt-in; the server treats UNAVAILABLE as
ALLOW but logs it so operators know the filter is off.
fastvideo.entrypoints.streaming.prompt.safety.first_blocked
¶first_blocked(filter_: PromptSafetyFilter, prompts: list[str]) -> SafetyResult | None
Return the first prompt the filter blocks, or None.
Source code in fastvideo/entrypoints/streaming/prompt/safety.py
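The behaviour described above (score each prompt, block at a threshold, report UNAVAILABLE when the filter is off) can be sketched without fastText. Everything here is illustrative: KeywordFilter is a toy scorer, and the check method, the Result fields, and the BLOCK member are assumed names.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    BLOCK = "block"  # assumed member name
    UNAVAILABLE = "unavailable"


@dataclass
class Result:
    prompt: str
    decision: Decision
    score: float


class KeywordFilter:
    """Toy filter: scores 1.0 when a banned word appears, else 0.0."""

    def __init__(self, banned, *, enabled=True, block_threshold=0.5):
        self.banned = set(banned)
        self.enabled = enabled
        self.block_threshold = block_threshold

    def check(self, prompt: str) -> Result:
        if not self.enabled:
            # Filter is off: callers treat this like ALLOW.
            return Result(prompt, Decision.UNAVAILABLE, 0.0)
        hit = any(word in self.banned for word in prompt.lower().split())
        score = 1.0 if hit else 0.0
        decision = Decision.BLOCK if score >= self.block_threshold else Decision.ALLOW
        return Result(prompt, decision, score)


def first_blocked_sketch(filter_, prompts):
    """Return the result for the first blocked prompt, or None."""
    for prompt in prompts:
        result = filter_.check(prompt)
        if result.decision is Decision.BLOCK:
            return result
    return None
```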
fastvideo.entrypoints.streaming.protocol
¶
JSON WebSocket protocol schemas for the streaming server.
Every control message shares the envelope {"type": <str>, ...}.
Pydantic models live here so the server can parse / validate incoming
frames and emit well-typed outgoing frames without hand-rolled dicts.
The message catalogue matches the contract in
docs/design/server_contracts/streaming.md; additions must land in
both places in the same PR.
Classes¶
fastvideo.entrypoints.streaming.protocol.ContinuationStateSnapshot
¶
fastvideo.entrypoints.streaming.protocol.MediaInit
¶
Bases: BaseModel
Descriptor for the fMP4 initialization segment that follows.
fastvideo.entrypoints.streaming.protocol.SegmentPromptSource
¶
Bases: BaseModel
Request a new segment using a specific prompt.
fastvideo.entrypoints.streaming.protocol.SessionInitV2
¶
Bases: BaseModel
Opening frame the client sends after the WebSocket handshake.
fastvideo.entrypoints.streaming.protocol.SnapshotState
¶
Bases: BaseModel
Request the current ContinuationState for export.
Functions:¶
fastvideo.entrypoints.streaming.protocol.parse_client_message
¶Parse an incoming WebSocket dict into a typed client message.
Unknown type values raise :class:pydantic.ValidationError; the
server handler turns that into an error frame with
code="invalid_message".
Source code in fastvideo/entrypoints/streaming/protocol.py
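To illustrate the envelope idea, the following stdlib-only sketch dispatches on the type field and converts parse failures into an error frame. The real module uses Pydantic models; the dataclasses, the type tags, and the ack frame below are hypothetical stand-ins.

```python
from dataclasses import dataclass


@dataclass
class SegmentPrompt:
    prompt: str


@dataclass
class SnapshotRequest:
    pass


# Hypothetical type tags; the real catalogue lives in the streaming contract.
_MESSAGE_TYPES = {"segment_prompt": SegmentPrompt, "snapshot_state": SnapshotRequest}


def parse_message(raw: dict):
    fields = dict(raw)
    kind = fields.pop("type", None)
    model = _MESSAGE_TYPES.get(kind)
    if model is None:
        raise ValueError(f"unknown message type: {kind!r}")
    # Missing or unexpected fields raise TypeError from the dataclass constructor.
    return model(**fields)


def handle(raw: dict) -> dict:
    try:
        message = parse_message(raw)
    except (ValueError, TypeError) as exc:
        return {"type": "error", "code": "invalid_message", "detail": str(exc)}
    return {"type": "ack", "message": type(message).__name__}
```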
fastvideo.entrypoints.streaming.router
¶
Multi-replica load balancer + WebSocket proxy for the streaming server.
Sits in front of one-or-more streaming-server replicas and forwards
WebSocket sessions to a healthy primary, with failover to secondaries.
Kept in-repo under fastvideo/entrypoints/streaming/router/ per the
PR plan's default; the alternative (separate package) is an open
question deferred to review.
Classes¶
fastvideo.entrypoints.streaming.router.ReplicaRegistry
¶ReplicaRegistry(replicas: list[ReplicaEndpoint])
Stateful map of replica URL → :class:Replica.
Selection favors primary replicas when healthy; otherwise the first
healthy non-primary is returned. When none are healthy, the
registry returns None so the router can reject incoming
sessions with gpu_unavailable.
Source code in fastvideo/entrypoints/streaming/router/registry.py
fastvideo.entrypoints.streaming.router.ReplicaRegistry.select
¶Pick the best healthy replica.
Priority order:
- The first healthy primary (insertion order).
- The first healthy non-primary (insertion order).
- None when nothing is healthy.
This MVP picks the first match within each tier; it does NOT load-balance across multiple healthy replicas of the same tier. Round-robin and weighted distribution are deferred until a real N-way active deployment exists.
Source code in fastvideo/entrypoints/streaming/router/registry.py
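The priority order above amounts to two passes over the replicas in insertion order. A minimal sketch, assuming a simplified ReplicaSketch record in place of the real Replica class:

```python
from dataclasses import dataclass


@dataclass
class ReplicaSketch:
    url: str
    primary: bool = False
    healthy: bool = True


def select(replicas):
    # Tier 1: the first healthy primary, in insertion order.
    for replica in replicas:
        if replica.primary and replica.healthy:
            return replica
    # Tier 2: the first healthy non-primary, in insertion order.
    for replica in replicas:
        if not replica.primary and replica.healthy:
            return replica
    # Nothing is healthy.
    return None
```

Because each tier returns its first match, two healthy secondaries never share traffic in this scheme; the earlier one always wins.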
fastvideo.entrypoints.streaming.router.RouterConfig
dataclass
¶RouterConfig(host: str = '0.0.0.0', port: int = 9000, replicas: list[ReplicaEndpoint] = list(), health_check_path: str = '/health', health_check_interval_seconds: float = 5.0, health_check_timeout_seconds: float = 2.0, failure_threshold: int = 3, recovery_threshold: int = 2)
Typed router config loaded from a YAML file.
Example::
router:
host: 0.0.0.0
port: 9000
replicas:
- url: http://streamer-a:8000
primary: true
- url: http://streamer-b:8000
health_check:
path: /health
interval_seconds: 5
failure_threshold: 3
Validation runs in __post_init__: empty replicas, non-positive
intervals/timeouts, thresholds < 1, non-http(s) URLs, and more than
one primary all raise ValueError so misconfigurations surface at
load time rather than as confusing runtime failures.
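The validation rules listed above could look roughly like the following. This is a sketch of the checks only, using simplified stand-in dataclasses (EndpointSketch, RouterConfigSketch) and made-up error messages rather than the real RouterConfig.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class EndpointSketch:
    url: str
    primary: bool = False


@dataclass
class RouterConfigSketch:
    replicas: list = field(default_factory=list)
    health_check_interval_seconds: float = 5.0
    health_check_timeout_seconds: float = 2.0
    failure_threshold: int = 3
    recovery_threshold: int = 2

    def __post_init__(self) -> None:
        if not self.replicas:
            raise ValueError("at least one replica is required")
        if self.health_check_interval_seconds <= 0 or self.health_check_timeout_seconds <= 0:
            raise ValueError("interval and timeout must be positive")
        if self.failure_threshold < 1 or self.recovery_threshold < 1:
            raise ValueError("thresholds must be >= 1")
        for replica in self.replicas:
            if urlparse(replica.url).scheme not in ("http", "https"):
                raise ValueError(f"replica URL must be http(s): {replica.url}")
        if sum(1 for replica in self.replicas if replica.primary) > 1:
            raise ValueError("at most one replica may be primary")
```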
Functions:¶
fastvideo.entrypoints.streaming.router.build_router_app
¶build_router_app(config: RouterConfig, *, registry: ReplicaRegistry | None = None) -> FastAPI
Build the router FastAPI app.
registry can be injected for tests; defaults to one built from
config.replicas.
Source code in fastvideo/entrypoints/streaming/router/main.py
Modules¶
fastvideo.entrypoints.streaming.router.config
¶Typed router configuration.
fastvideo.entrypoints.streaming.router.config.ReplicaEndpoint
dataclass
¶
fastvideo.entrypoints.streaming.router.config.RouterConfig
dataclass
¶RouterConfig(host: str = '0.0.0.0', port: int = 9000, replicas: list[ReplicaEndpoint] = list(), health_check_path: str = '/health', health_check_interval_seconds: float = 5.0, health_check_timeout_seconds: float = 2.0, failure_threshold: int = 3, recovery_threshold: int = 2)
Typed router config loaded from a YAML file.
Example::
router:
host: 0.0.0.0
port: 9000
replicas:
- url: http://streamer-a:8000
primary: true
- url: http://streamer-b:8000
health_check:
path: /health
interval_seconds: 5
failure_threshold: 3
Validation runs in __post_init__: empty replicas, non-positive
intervals/timeouts, thresholds < 1, non-http(s) URLs, and more than
one primary all raise ValueError so misconfigurations surface at
load time rather than as confusing runtime failures.
fastvideo.entrypoints.streaming.router.main
¶Router FastAPI entry point.
Exposes the same /v1/stream WebSocket path the backend servers do,
accepts a client, picks a healthy replica from the registry, and
proxies frames bidirectionally.
PR 7.9 ships the minimum-viable shape: explicit replica list, single
primary, JSON + binary passthrough in both directions, and a
/status endpoint for operators. Sticky-session routing (so a
reconnect lands on the same backend) is left for a follow-up.
fastvideo.entrypoints.streaming.router.main.build_router_app
¶build_router_app(config: RouterConfig, *, registry: ReplicaRegistry | None = None) -> FastAPI
Build the router FastAPI app.
registry can be injected for tests; defaults to one built from
config.replicas.
Source code in fastvideo/entrypoints/streaming/router/main.py
fastvideo.entrypoints.streaming.router.registry
¶Replica registry + health-check loop.
The registry tracks the set of known backend replicas and their live health. The router consults it for "pick a backend for this session" decisions and a background task updates it from periodic HTTP probes.
State machine per replica::
HEALTHY ──(N consecutive failures)──▶ UNHEALTHY
▲ │
└──────(M consecutive successes)──────┘
Where N = :attr:RouterConfig.failure_threshold and
M = :attr:RouterConfig.recovery_threshold.
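The two-state machine can be sketched with a single streak counter that tracks consecutive probe results contradicting the current state. HealthTracker and its record method are hypothetical names; the defaults mirror the RouterConfig defaults shown above.

```python
from dataclasses import dataclass


@dataclass
class HealthTracker:
    failure_threshold: int = 3   # N consecutive failures flip to unhealthy
    recovery_threshold: int = 2  # M consecutive successes flip back to healthy
    healthy: bool = True
    _streak: int = 0  # consecutive results that contradict the current state

    def record(self, success: bool) -> bool:
        """Record one probe result and return the resulting health state."""
        if success == self.healthy:
            # The result agrees with the current state, so any streak is broken.
            self._streak = 0
            return self.healthy
        self._streak += 1
        needed = self.failure_threshold if self.healthy else self.recovery_threshold
        if self._streak >= needed:
            self.healthy = not self.healthy
            self._streak = 0
        return self.healthy
```

A single success between failures resets the count, so only consecutive failures take a replica out of rotation.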
fastvideo.entrypoints.streaming.router.registry.HttpProbe
module-attribute
¶HttpProbe = Any
Structural alias for health-probe callables. Concrete signature is
async def __call__(url: str, *, timeout: float) -> tuple[float,
str | None]; typing.Callable cannot express keyword-only parameters,
so duck-typing is the pragmatic compromise.
fastvideo.entrypoints.streaming.router.registry.ReplicaRegistry
¶ReplicaRegistry(replicas: list[ReplicaEndpoint])
Stateful map of replica URL → :class:Replica.
Selection favors primary replicas when healthy; otherwise the first
healthy non-primary is returned. When none are healthy, the
registry returns None so the router can reject incoming
sessions with gpu_unavailable.
Source code in fastvideo/entrypoints/streaming/router/registry.py
fastvideo.entrypoints.streaming.router.registry.ReplicaRegistry.select
¶Pick the best healthy replica.
Priority order:
- The first healthy primary (insertion order).
- The first healthy non-primary (insertion order).
- None when nothing is healthy.
This MVP picks the first match within each tier; it does NOT load-balance across multiple healthy replicas of the same tier. Round-robin and weighted distribution are deferred until a real N-way active deployment exists.
Source code in fastvideo/entrypoints/streaming/router/registry.py
fastvideo.entrypoints.streaming.router.registry.run_health_check_loop
async
¶run_health_check_loop(registry: ReplicaRegistry, config: RouterConfig, *, stop_event: Event, http_get: HttpProbe | None = None) -> None
Poll all replicas' health endpoints in parallel on a fixed interval.
http_get is pluggable so unit tests can inject a deterministic
probe without hitting the network. The default builds a single
httpx.AsyncClient shared across the loop's lifetime so the
common case (steady polling against a stable replica set) reuses
TCP/TLS connections instead of paying handshake cost per probe.
Probes within one polling cycle run concurrently via asyncio.gather
so a slow replica doesn't push the cycle past
health_check_interval_seconds.
Source code in fastvideo/entrypoints/streaming/router/registry.py
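A stripped-down version of such a loop, with an injectable probe and concurrent probing via asyncio.gather, might look like this. The function names, the on_cycle callback, and the reading of the probe's return tuple as (latency, error) are assumptions for illustration; the real loop updates a ReplicaRegistry and defaults to an httpx-based probe.

```python
import asyncio


async def poll_once(urls, probe, *, timeout: float) -> dict:
    """Probe every URL concurrently; map each URL to an error string or None."""
    results = await asyncio.gather(*(probe(url, timeout=timeout) for url in urls))
    return {url: error for url, (_latency, error) in zip(urls, results)}


async def health_loop(urls, probe, on_cycle, *, interval: float, timeout: float,
                      stop_event: asyncio.Event) -> None:
    while not stop_event.is_set():
        on_cycle(await poll_once(urls, probe, timeout=timeout))
        try:
            # Sleep for one interval, but wake immediately if asked to stop.
            await asyncio.wait_for(stop_event.wait(), timeout=interval)
        except asyncio.TimeoutError:
            pass


async def fake_probe(url: str, *, timeout: float):
    # Deterministic probe for tests: URLs containing "bad" report a failure.
    return (0.001, "connection refused" if "bad" in url else None)
```

Passing fake_probe in place of a network-backed probe keeps unit tests deterministic.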
fastvideo.entrypoints.streaming.server
¶
Single-generator FastAPI + WebSocket streaming server.
Classes¶
Functions:¶
fastvideo.entrypoints.streaming.server.build_app
¶build_app(serve_config: ServeConfig, generator: _GeneratorProto | None = None, *, pool: GpuPool | None = None, session_store: SessionStore | None = None) -> FastAPI
Build the FastAPI app used by :func:run_server.
Exposed so tests can drive the WebSocket endpoint in-process via
starlette.testclient.TestClient(app).websocket_connect(...).
Exactly one of generator (backed by :class:InProcessGpuPool)
or pool (for the subprocess-backed production shape) must be
given.
Source code in fastvideo/entrypoints/streaming/server.py
fastvideo.entrypoints.streaming.server.run_server
¶run_server(serve_config: ServeConfig, *, generator: _GeneratorProto | None = None) -> None
Launch the streaming server.
Boots a :class:fastvideo.VideoGenerator from
serve_config.generator unless generator is provided, then
serves build_app(...) via uvicorn.
Source code in fastvideo/entrypoints/streaming/server.py
fastvideo.entrypoints.streaming.session
¶
Per-connection session lifecycle for the streaming server.
Each WebSocket opens exactly one :class:Session. :class:SessionManager
enforces the generation_segment_cap and session_timeout_seconds
budgets from :class:fastvideo.api.StreamingConfig.
Classes¶
fastvideo.entrypoints.streaming.session.InvalidSessionTransition
¶
Bases: RuntimeError
Raised when a session is asked to transition along an illegal edge.
fastvideo.entrypoints.streaming.session.Session
dataclass
¶Session(id: str = (lambda: hex)(), state: SessionState = INITIALIZING, created_at: float = monotonic(), last_activity: float = monotonic(), client_id: str | None = None, preset: str | None = None, preset_label: str | None = None, curated_prompts: list[str] = list(), segment_idx: int = 0, enhancement_enabled: bool = False, auto_extension_enabled: bool = False, loop_generation_enabled: bool = False, single_clip_mode: bool = False, generation_paused: bool = False, stream_mode: str = 'av_fmp4', gpu_id: int | None = None, continuation_state: ContinuationState | None = None, metadata: dict[str, Any] = dict())
fastvideo.entrypoints.streaming.session.Session.transition
¶transition(target: SessionState) -> None
Move to target if the edge is allowed.
Raises :class:InvalidSessionTransition on illegal moves. The
self-loop on ACTIVE is legal so the server can re-assert
ACTIVE on segment completion without special casing.
Source code in fastvideo/entrypoints/streaming/session.py
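A transition guard of this kind is usually a lookup in an edge table. The sketch below assumes a reduced state set and a hypothetical edge table (the real SessionState has its own members and edges); it does include the ACTIVE self-loop mentioned above.

```python
from enum import Enum


class State(Enum):
    INITIALIZING = "initializing"
    ACTIVE = "active"
    CLOSED = "closed"  # assumed terminal state


class InvalidTransition(RuntimeError):
    """Stand-in for InvalidSessionTransition."""


# Hypothetical edge table; note the ACTIVE -> ACTIVE self-loop.
_ALLOWED = {
    State.INITIALIZING: {State.ACTIVE, State.CLOSED},
    State.ACTIVE: {State.ACTIVE, State.CLOSED},
    State.CLOSED: set(),
}


class SessionSketch:
    def __init__(self) -> None:
        self.state = State.INITIALIZING

    def transition(self, target: State) -> None:
        if target not in _ALLOWED[self.state]:
            raise InvalidTransition(f"{self.state.name} -> {target.name}")
        self.state = target
```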
fastvideo.entrypoints.streaming.session.SessionManager
¶Registers sessions and enforces per-server session limits.
Source code in fastvideo/entrypoints/streaming/session.py
fastvideo.entrypoints.streaming.session.SessionManager.reap_timed_out
¶Return the ids of sessions that have exceeded the idle timeout.
The caller is responsible for actually closing them — this
method only identifies dead sessions so the server can emit
session_timeout frames before dropping the WebSocket.
TODO: unused until a background driver calls it. Per-connection idle enforcement currently happens via asyncio.wait_for on receive_json; this helper catches sessions stuck before any receive (e.g. future QUEUED state) and is expected to be wired into the GPU-pool reaper.
Source code in fastvideo/entrypoints/streaming/session.py
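Identifying idle sessions without closing them reduces to a comparison against a monotonic clock. A minimal sketch, assuming sessions are represented as a hypothetical mapping from session id to last-activity timestamp:

```python
import time


def idle_session_ids(last_activity: dict, timeout_seconds: float, *, now=None) -> list:
    """Return ids whose last activity is older than timeout_seconds.

    This only identifies sessions; closing them is left to the caller.
    """
    now = time.monotonic() if now is None else now
    return [sid for sid, seen in last_activity.items() if now - seen > timeout_seconds]
```

The now parameter exists only so tests can pin the clock.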
fastvideo.entrypoints.streaming.session.SessionRejected
¶
Bases: RuntimeError
Raised when session creation fails (queue full, auth, etc.).
fastvideo.entrypoints.streaming.session_init_image
¶
Persist the initial-image blob attached to a streaming session.
Classes¶
fastvideo.entrypoints.streaming.session_init_image.SessionInitImage
dataclass
¶Location of the persisted init image.
Callers pass path to InputConfig.image_path; display_name
is only used for logs.
Functions:¶
fastvideo.entrypoints.streaming.session_init_image.persist_session_init_image
¶persist_session_init_image(payload: Any, *, output_dir: str | None = None) -> SessionInitImage | None
Decode a client init-image blob and persist it to disk.
payload shape (matches the internal UI protocol)::
{
"mime": "image/png",
"name": "ref.png",
"data": "<base64 bytes>",
}
Returns None when payload is falsy (no init image). Raises
:class:ValueError on schema / size / decode errors so the caller
can surface a user-facing error frame.
Source code in fastvideo/entrypoints/streaming/session_init_image.py
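The decode-and-persist flow for that payload shape can be sketched as follows. The size cap, the MIME allow-list, the error messages, and the function name are assumptions; the sketch returns a plain path rather than a SessionInitImage.

```python
import base64
import binascii
import os
import tempfile

MAX_BYTES = 10 * 1024 * 1024  # assumed size cap
_EXTENSIONS = {"image/png": ".png", "image/jpeg": ".jpg"}  # assumed allow-list


def persist_image_sketch(payload, *, output_dir=None):
    if not payload:
        return None  # no init image attached
    if not isinstance(payload, dict):
        raise ValueError("init image payload must be an object")
    extension = _EXTENSIONS.get(payload.get("mime"))
    if extension is None:
        raise ValueError("unsupported mime type")
    try:
        data = base64.b64decode(payload.get("data", ""), validate=True)
    except (binascii.Error, ValueError, TypeError) as exc:
        raise ValueError("init image is not valid base64") from exc
    if not data or len(data) > MAX_BYTES:
        raise ValueError("init image is empty or too large")
    # mkstemp creates a uniquely named file and returns an open descriptor.
    fd, path = tempfile.mkstemp(suffix=extension, dir=output_dir)
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
    return path
```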
fastvideo.entrypoints.streaming.session_logger
¶
Per-session JSONL event logger.
Each session gets its own JSONL file under the configured log root so post-hoc analytics (enhancer latency, GPU assignment, segment timings) can be recovered without a tracing backend. The internal UI uses this format; keeping the same shape makes log tooling portable.
Classes¶
fastvideo.entrypoints.streaming.session_logger.SessionLogEvent
dataclass
¶One line in the session JSONL file.
fastvideo.entrypoints.streaming.session_logger.SessionLogger
¶SessionLogger(log_dir: str | None)
Append-only JSONL logger keyed by session id.
Thread-safe; the server may be writing from multiple asyncio tasks (fMP4 encoder thread + control-frame handler) for the same session.
Source code in fastvideo/entrypoints/streaming/session_logger.py
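An append-only JSONL logger with one file per session can be sketched in a few lines. The class name, the log method, the field names, and the treatment of log_dir=None as "logging disabled" are assumptions, not the SessionLogger API.

```python
import json
import os
import threading
import time


class JsonlLoggerSketch:
    def __init__(self, log_dir):
        self._log_dir = log_dir
        self._lock = threading.Lock()
        if log_dir is not None:
            os.makedirs(log_dir, exist_ok=True)

    def log(self, session_id: str, event: str, **fields):
        if self._log_dir is None:
            return None  # assumed behaviour: no directory means logging is off
        line = json.dumps({"ts": time.time(), "event": event, **fields})
        path = os.path.join(self._log_dir, f"{session_id}.jsonl")
        # The lock keeps lines from concurrent writers from interleaving.
        with self._lock:
            with open(path, "a", encoding="utf-8") as handle:
                handle.write(line + "\n")
        return path
```

Each line is an independent JSON object, so the files can be read back with one json.loads call per line.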
fastvideo.entrypoints.streaming.session_store
¶
Session state store for the FastVideo streaming server.
The streaming server keeps continuation state (decoded frames + audio latents from the previous segment) server-side so the client doesn't re-upload multi-megabyte tensors each WebSocket message. Two operations are needed:
- snapshot(session_id) -> ContinuationState: serialize the current state so it can be exported (e.g. over HTTP) or migrated to a different server.
- hydrate(state) -> session_id: load a previously serialized state into a new session (for resume-after-disconnect flows).
The store is an ABC with an :class:InMemorySessionStore default; Redis
or other backends can drop in without touching the pipeline.
Large tensor payloads (video frames, audio latents) are kept out of the
JSON payload via an accompanying :class:BlobStore. Both stores share a
process today; they are separate types so that a future implementation
can put blobs on S3 while keeping session metadata in Redis.
Classes¶
fastvideo.entrypoints.streaming.session_store.BlobStore
¶
Bases: ABC
Opaque byte-blob storage keyed by id.
A :class:ContinuationState payload can reference large tensors
stored in a :class:BlobStore rather than inlining them, so the
JSON payload stays small when the state travels over the wire.
fastvideo.entrypoints.streaming.session_store.InMemoryBlobStore
¶
Bases: BlobStore
Thread-safe in-memory :class:BlobStore for single-process servers.
No eviction policy — callers are responsible for calling
:meth:drop when a blob's owning state is replaced or a session
ends. A redis- or filesystem-backed :class:BlobStore should
replace this when the streaming server lands as a real service
(PR 7.5+).
Source code in fastvideo/entrypoints/streaming/session_store.py
fastvideo.entrypoints.streaming.session_store.InMemorySessionStore
¶
Bases: SessionStore
Thread-safe in-memory :class:SessionStore.
Default implementation used by single-process deployments; a future Redis-backed store can be dropped in without changes to the server.
No eviction / TTL / bounded capacity — sessions only leave via
:meth:drop. The live streaming server (PR 7.5+) is responsible
for bounding growth and for dropping any :class:BlobStore blobs
referenced by a state when that state is replaced or a session
ends; this class does not know about blobs.
Source code in fastvideo/entrypoints/streaming/session_store.py
fastvideo.entrypoints.streaming.session_store.SessionStore
¶
Bases: ABC
Keyed store for per-session continuation state.
Implementations own the session-id → state mapping. The streaming
server calls :meth:store after each segment and :meth:snapshot
when a client explicitly asks for an exportable state handle.
fastvideo.entrypoints.streaming.session_store.SessionStore.drop
abstractmethod
¶drop(session_id: str) -> None
fastvideo.entrypoints.streaming.session_store.SessionStore.hydrate
abstractmethod
¶Install state as the starting point for a session.
When session_id is None the store allocates a fresh id
(UUID4); when provided the store uses it verbatim, overwriting
any prior state at that id.
Source code in fastvideo/entrypoints/streaming/session_store.py
fastvideo.entrypoints.streaming.session_store.SessionStore.snapshot
abstractmethod
¶snapshot(session_id: str) -> ContinuationState | None
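Taken together, the four operations (store, snapshot, hydrate, drop) suggest an interface along these lines. This is a sketch with hypothetical class names and loosely typed state; the exact parameter lists, in particular hydrate's, are assumptions based on the descriptions above.

```python
import threading
import uuid
from abc import ABC, abstractmethod


class StoreSketch(ABC):
    @abstractmethod
    def store(self, session_id, state): ...

    @abstractmethod
    def snapshot(self, session_id): ...

    @abstractmethod
    def hydrate(self, state, session_id=None): ...

    @abstractmethod
    def drop(self, session_id): ...


class InMemoryStoreSketch(StoreSketch):
    def __init__(self):
        self._states = {}
        self._lock = threading.Lock()

    def store(self, session_id, state):
        with self._lock:
            self._states[session_id] = state

    def snapshot(self, session_id):
        with self._lock:
            return self._states.get(session_id)  # None for unknown ids

    def hydrate(self, state, session_id=None):
        # Allocate a fresh UUID4 id unless the caller supplies one.
        sid = str(uuid.uuid4()) if session_id is None else session_id
        self.store(sid, state)  # overwrites any prior state at that id
        return sid

    def drop(self, session_id):
        with self._lock:
            self._states.pop(session_id, None)
```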
fastvideo.entrypoints.streaming.stream
¶
fMP4 stream encoder used by the streaming server.
The client's Media Source Extensions player needs a continuous fMP4
byte stream: first an initialization segment (ftyp + moov),
then one or more media segments (moof + mdat). We pipe raw
RGB frames into an ffmpeg subprocess configured for fragmented output
via -movflags empty_moov+default_base_moof+frag_keyframe+faststart
and stream the bytes back out.
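An ffmpeg invocation of that shape could be assembled as below. Only the -movflags value comes from the description above; the rgb24 input format, the libx264 codec, and the helper names are assumptions for this sketch, and the actual encoder may pass different arguments.

```python
def build_ffmpeg_args(width: int, height: int, fps: int, *, ffmpeg_path: str = "ffmpeg",
                      preset: str = "ultrafast", pixel_format_out: str = "yuv420p") -> list:
    return [
        ffmpeg_path,
        "-f", "rawvideo",            # input is headerless raw frames
        "-pix_fmt", "rgb24",         # assumed: 8-bit RGB, 3 bytes per pixel
        "-s", f"{width}x{height}",
        "-r", str(fps),
        "-i", "pipe:0",              # read frames from stdin
        "-c:v", "libx264",           # assumed codec
        "-preset", preset,
        "-pix_fmt", pixel_format_out,
        "-movflags", "empty_moov+default_base_moof+frag_keyframe+faststart",
        "-f", "mp4",
        "pipe:1",                    # write fMP4 bytes to stdout
    ]


def frame_size_bytes(width: int, height: int) -> int:
    # One rgb24 frame occupies width * height * 3 bytes on the input pipe.
    return width * height * 3
```

With rgb24 input, a 1024x576 frame is 1,769,472 bytes, which is the unit in which frames would be written to the subprocess.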
Classes¶
fastvideo.entrypoints.streaming.stream.FragmentedMP4Chunk
dataclass
¶A single fMP4 byte chunk emitted by :class:FragmentedMP4Encoder.
kind identifies whether the chunk is the init segment (must be
fed into the client's SourceBuffer first) or a media fragment.
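The init/media distinction can be checked from the bytes themselves, because every MP4 box starts with a 4-byte big-endian size followed by a 4-byte type. The following sketch lists top-level box types and classifies a chunk; the "init" and "media" labels are assumed values, not necessarily those of the kind field.

```python
import struct


def top_level_boxes(data: bytes) -> list:
    """List the 4-character types of the top-level boxes in data."""
    types, offset = [], 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, offset)
        if size < 8:
            # Sizes 0 (box runs to end of data) and 1 (64-bit size follows)
            # are not handled in this sketch.
            break
        types.append(box_type.decode("ascii", errors="replace"))
        offset += size
    return types


def chunk_kind(data: bytes) -> str:
    boxes = top_level_boxes(data)
    if "ftyp" in boxes or "moov" in boxes:
        return "init"
    if "moof" in boxes or "mdat" in boxes:
        return "media"
    return "unknown"
```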
fastvideo.entrypoints.streaming.stream.FragmentedMP4Encoder
¶FragmentedMP4Encoder(*, width: int, height: int, fps: int, segment_idx: int, stream_id: str | None = None, ffmpeg_path: str = 'ffmpeg', preset: str = 'ultrafast', pixel_format_out: str = 'yuv420p', extra_args: list[str] | None = None)
Stream RGB frames in, fMP4 chunks out.
One encoder covers one segment. The server creates a new encoder at each ltx2_segment_start boundary so that each segment becomes one media fragment the client can append independently.
Example::
encoder = FragmentedMP4Encoder(width=1024, height=576, fps=24,
segment_idx=0)
async with encoder:
async for chunk in encoder.encode(frames):
await websocket.send_bytes(chunk.data)
Source code in fastvideo/entrypoints/streaming/stream.py
fastvideo.entrypoints.streaming.stream.FragmentedMP4Encoder.encode
async
¶encode(frames: list[ndarray] | AsyncIterator[ndarray]) -> AsyncIterator[FragmentedMP4Chunk]
Feed frames into ffmpeg and yield fMP4 chunks as they appear.
Source code in fastvideo/entrypoints/streaming/stream.py
fastvideo.entrypoints.streaming.worker
¶
Per-GPU worker subprocess entry for :class:SubprocessGpuPool.
The pool manages binding, lifecycle, and message dispatch in the parent
process. The worker constructs its :class:VideoGenerator from a typed
:class:GeneratorConfig, runs the two-segment warmup so both
initial-segment and continuation-branch compile graphs are hot, and
then loops on the job queue.
Functions:¶
fastvideo.entrypoints.streaming.worker.worker_main
¶worker_main(*, gpu_id: int, worker_id: str, generator_config: GeneratorConfig, warmup_config: WarmupConfig, job_queue: Queue, result_queue: Queue, shutdown_event: Any) -> None
Per-worker subprocess entry.
Runs inside the child spawned by SubprocessGpuPool. Blocking
VideoGenerator construction + generation happens here, not in
the parent's event loop.
Source code in fastvideo/entrypoints/streaming/worker.py
fastvideo.entrypoints.streaming_generator
¶
Classes¶
fastvideo.entrypoints.streaming_generator.StreamingVideoGenerator
¶
StreamingVideoGenerator(fastvideo_args: FastVideoArgs, executor_class: type[Executor], log_stats: bool, use_queue_mode: bool = True)
Bases: VideoGenerator
This class extends VideoGenerator with streaming capabilities, allowing incremental video generation with step-by-step control.
Source code in fastvideo/entrypoints/streaming_generator.py
Functions:¶
fastvideo.entrypoints.video_generator
¶
VideoGenerator module for FastVideo.
This module provides a consolidated interface for generating videos using diffusion models.
Classes¶
fastvideo.entrypoints.video_generator.VideoGenerator
¶
VideoGenerator(fastvideo_args: FastVideoArgs, executor_class: type[Executor], log_stats: bool, *, log_queue=None)
A unified class for generating videos using diffusion models.
This class provides a simple interface for video generation with rich customization options, similar to popular frameworks like HF Diffusers.
Initialize the video generator.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| fastvideo_args | FastVideoArgs | The inference arguments | required |
| executor_class | type[Executor] | The executor class to use for inference | required |
| log_stats | bool | Whether to log statistics | required |
| log_queue | | Optional multiprocessing.Queue to forward worker logs to | None |
Source code in fastvideo/entrypoints/video_generator.py
Methods:¶
fastvideo.entrypoints.video_generator.VideoGenerator.default_health_check_request
staticmethod
¶Return the minimal typed request Dynamo uses for probes.
256x256, 8 frames, 1 inference step -- fast enough to be a
viable liveness check, non-trivial enough to exercise the
DiT -> VAE -> decode path. Consumers adapt this shape to their
transport's health-check payload (see
docs/design/server_contracts/dynamo.md).
Source code in fastvideo/entrypoints/video_generator.py
fastvideo.entrypoints.video_generator.VideoGenerator.from_fastvideo_args
classmethod
¶from_fastvideo_args(fastvideo_args: FastVideoArgs, *, log_queue=None) -> VideoGenerator
Create a video generator with the specified arguments.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| fastvideo_args | FastVideoArgs | The inference arguments | required |
| log_queue | | Optional multiprocessing.Queue to forward worker logs to | None |
Returns:
| Type | Description |
|---|---|
| VideoGenerator | The created video generator |
Source code in fastvideo/entrypoints/video_generator.py
fastvideo.entrypoints.video_generator.VideoGenerator.from_pretrained
classmethod
¶from_pretrained(model_path: str | GeneratorConfig | Mapping[str, Any] | None = None, **kwargs) -> VideoGenerator
Create a video generator from a pretrained model.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| model_path | str \| GeneratorConfig \| Mapping[str, Any] \| None | Path or identifier for the pretrained model | None |
| pipeline_config | | Pipeline config to use for inference | required |
| **kwargs | | Additional arguments to customize model loading; set any FastVideoArgs or PipelineConfig attributes here. | {} |
Returns:
| Type | Description |
|---|---|
| VideoGenerator | The created video generator |
Priority level: Default pipeline config < User's pipeline config < User's kwargs
Stable convenience kwargs remain supported here for common engine and offload settings. Advanced model- or pipeline-specific options should move to VideoGenerator.from_config(...).
Source code in fastvideo/entrypoints/video_generator.py
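The stated priority (default pipeline config < user's pipeline config < user's kwargs) behaves like layered dictionary updates, where later layers win. A toy sketch with hypothetical setting names; the real method resolves typed config objects rather than plain dicts:

```python
def resolve_settings(defaults: dict, user_config=None, **kwargs) -> dict:
    merged = dict(defaults)           # lowest priority
    merged.update(user_config or {})  # overrides defaults
    merged.update(kwargs)             # highest priority
    return merged
```

For example, resolve_settings({"num_gpus": 1, "flow_shift": 3.0}, {"flow_shift": 5.0}, num_gpus=2) keeps the user config's flow_shift and the keyword argument's num_gpus.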
fastvideo.entrypoints.video_generator.VideoGenerator.generate
¶generate(request: GenerationRequest | Mapping[str, Any], *, log_queue=None) -> GenerationResult | list[GenerationResult]
Generate video or image outputs from a typed inference request.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
request
|
GenerationRequest | Mapping[str, Any]
|
A |
required |
log_queue
|
Optional multiprocessing.Queue to forward worker logs to during this request. |
None
|
Returns:
| Type | Description |
|---|---|
GenerationResult | list[GenerationResult]
|
A |
GenerationResult | list[GenerationResult]
|
|
GenerationResult | list[GenerationResult]
|
prompts. |
Source code in fastvideo/entrypoints/video_generator.py
fastvideo.entrypoints.video_generator.VideoGenerator.generate_async
async
¶Async generation that yields typed :class:VideoEvents.
Three consumers share this substrate:
- Streaming server (:mod:fastvideo.entrypoints.streaming): pipes :class:VideoPartialEvent frames into fMP4.
- Stateless OpenAI server: ignores progress events and forwards :class:VideoFinalEvent as the HTTP response body.
- Dynamo native backend (components/src/dynamo/fastvideo/): wraps each event as an NvVideosResponse chunk.
The aggregated code path shipped here yields a single
:class:VideoProgressEvent at start and one
:class:VideoFinalEvent at end. Future work will thread
per-step progress events through the pipeline's denoise loop
so streaming consumers don't have to wait on a materialized
final.
Source code in fastvideo/entrypoints/video_generator.py
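The aggregated shape described above (one progress event at start, one final event at end) maps naturally onto an async generator. The event classes and function names below are simplified stand-ins, not the real VideoProgressEvent or VideoFinalEvent types.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class ProgressEvent:
    step: int
    total_steps: int


@dataclass
class FinalEvent:
    output_path: str


async def generate_events(prompt: str):
    yield ProgressEvent(step=0, total_steps=1)
    await asyncio.sleep(0)  # stands in for the actual generation work
    yield FinalEvent(output_path=f"{prompt}.mp4")


async def final_only(prompt: str) -> FinalEvent:
    # A stateless consumer ignores progress events and returns the final one.
    async for event in generate_events(prompt):
        if isinstance(event, FinalEvent):
            return event
    raise RuntimeError("stream ended without a final event")
```

A streaming consumer would instead act on every event as it arrives.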
fastvideo.entrypoints.video_generator.VideoGenerator.generate_video
¶generate_video(prompt: str | None = None, sampling_param: SamplingParam | None = None, mouse_cond: Tensor | None = None, keyboard_cond: Tensor | None = None, grid_sizes: tuple[int, int, int] | list[int] | Tensor | None = None, **kwargs) -> dict[str, Any] | list[dict[str, Any]]
Generate a video based on the given prompt.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| prompt | str \| None | The prompt to use for generation (optional if prompt_txt is provided) | None |
| negative_prompt | | The negative prompt to use (overrides the one in fastvideo_args) | required |
| output_path | | Path to save the video (overrides the one in fastvideo_args) | required |
| prompt_path | | Path to prompt file | required |
| save_video | | Whether to save the video to disk | required |
| return_frames | | Whether to include raw frames in the result dict | required |
| num_inference_steps | | Number of denoising steps (overrides fastvideo_args) | required |
| guidance_scale | | Classifier-free guidance scale (overrides fastvideo_args) | required |
| num_frames | | Number of frames to generate (overrides fastvideo_args) | required |
| height | | Height of generated video (overrides fastvideo_args) | required |
| width | | Width of generated video (overrides fastvideo_args) | required |
| fps | | Frames per second for saved video (overrides fastvideo_args) | required |
| seed | | Random seed for generation (overrides fastvideo_args) | required |
| callback | | Callback function called after each step | required |
| callback_steps | | Number of steps between each callback | required |
Returns:
| Type | Description |
|---|---|
| dict[str, Any] \| list[dict[str, Any]] | A metadata dictionary for single-prompt generation, or a list of metadata dictionaries for prompt-file batch generation. |
Source code in fastvideo/entrypoints/video_generator.py
fastvideo.entrypoints.video_generator.VideoGenerator.shutdown
¶
fastvideo.entrypoints.video_generator.VideoGenerator.unmerge_lora_weights
¶Use unmerged weights for inference to produce videos that align with validation videos generated during training.