Streaming WebSocket Server Contract
The streaming server (fastvideo/entrypoints/streaming/server.py) speaks
a JSON-over-WebSocket protocol with binary fMP4 chunks for media. This
document is the authoritative spec for the message catalogue and the
session state machine. Any change to either must update this document
in the same PR that touches protocol.py or session.py.
Endpoint
| Path | Protocol | Purpose |
|---|---|---|
| WS /v1/stream | WebSocket (JSON + binary) | Per-session realtime streaming |
| GET /health | HTTP | Liveness probe (status, stream_mode, active sessions) |
The server is launched by fastvideo serve --config <serve.yaml> when
the config carries a streaming: block. Without that block the same CLI
launches the OpenAI-compatible stateless HTTP server instead.
Connection lifecycle
Every WebSocket connection holds exactly one Session. Sessions move
through the states in SessionState (fastvideo/entrypoints/streaming/session.py).
                ┌──────────────┐
                │ INITIALIZING │  ← WebSocket accepted, before init frame
                └──────┬───────┘
                       │ session_init_v2 received
        ┌──────────────┼──────────────┐
        ▼              ▼              ▼
      QUEUED      GPU_BINDING      REJECTED
        │              │              ↑
        │ slot ready   │              │ max-sessions hit
        ▼              ▼              │ or invalid init
                  ┌────────┐          │
                  │ ACTIVE │ ─────────┘
                  └────┬───┘
          segment loop │
                       │
           ┌───────────┼───────────┐
           ▼           ▼           ▼
       COMPLETE      ERROR      TIMEOUT
    (clean leave) (any failure) (idle / segment_cap reached)
Terminal states (COMPLETE, ERROR, TIMEOUT, REJECTED) are sinks: no
transitions lead out of them. The transition matrix is enforced in
session.py::_VALID_TRANSITIONS; an invalid transition raises.
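
As an illustration of how such a transition matrix can be enforced, here is a minimal sketch. The edges are read off the diagram above, plus the assumption that any non-terminal state may move to ERROR or TIMEOUT; the real _VALID_TRANSITIONS in session.py may differ, and the names transition and InvalidTransition are hypothetical.

```python
from enum import Enum


class SessionState(Enum):
    INITIALIZING = "initializing"
    QUEUED = "queued"
    GPU_BINDING = "gpu_binding"
    ACTIVE = "active"
    COMPLETE = "complete"
    ERROR = "error"
    TIMEOUT = "timeout"
    REJECTED = "rejected"


_S = SessionState
TERMINAL_STATES = frozenset({_S.COMPLETE, _S.ERROR, _S.TIMEOUT, _S.REJECTED})

# Happy-path edges as drawn in the diagram (assumed, not copied from session.py).
VALID_TRANSITIONS = {
    _S.INITIALIZING: frozenset({_S.QUEUED, _S.GPU_BINDING, _S.REJECTED}),
    _S.QUEUED: frozenset({_S.GPU_BINDING}),
    _S.GPU_BINDING: frozenset({_S.ACTIVE}),
    _S.ACTIVE: frozenset({_S.COMPLETE}),
}
# Assumption: every non-terminal state may also fail or time out.
for _state in list(VALID_TRANSITIONS):
    VALID_TRANSITIONS[_state] = VALID_TRANSITIONS[_state] | {_S.ERROR, _S.TIMEOUT}
# Terminal states are sinks: no outgoing edges.
for _state in TERMINAL_STATES:
    VALID_TRANSITIONS[_state] = frozenset()


class InvalidTransition(ValueError):
    """Raised when a transition is not listed in VALID_TRANSITIONS."""


def transition(current: SessionState, target: SessionState) -> SessionState:
    """Return target if the edge current -> target is allowed, else raise."""
    if target not in VALID_TRANSITIONS[current]:
        raise InvalidTransition(f"{current.name} -> {target.name} is not allowed")
    return target
```

Keeping the matrix as data rather than scattered if-statements makes it easy to diff against the diagram whenever either one changes.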
SessionManager enforces the per-process budgets pulled from
StreamingConfig:
- session_timeout_seconds: the idle reaper drops sessions that haven't advanced; non-terminal sessions transition to TIMEOUT.
- generation_segment_cap: a session that hits the cap transitions to COMPLETE after the last segment ships.
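
The two budgets can be pictured as a single check run after each segment and on each reaper tick. The sketch below is only an assumption about how that decision might look: the Budgets class, the budget_outcome function, and the use of >= for both comparisons are hypothetical and not taken from SessionManager.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Budgets:
    # Field names mirror the StreamingConfig keys named above.
    session_timeout_seconds: float
    generation_segment_cap: int


def budget_outcome(budgets: Budgets, *, idle_seconds: float,
                   segments_shipped: int) -> Optional[str]:
    """Return the terminal state a budget forces, or None if within budget.

    Assumption: the segment cap is checked first, so a session that just
    shipped its last allowed segment ends as COMPLETE rather than TIMEOUT.
    """
    if segments_shipped >= budgets.generation_segment_cap:
        return "COMPLETE"
    if idle_seconds >= budgets.session_timeout_seconds:
        return "TIMEOUT"
    return None


# Example values are made up for illustration.
limits = Budgets(session_timeout_seconds=300.0, generation_segment_cap=8)
print(budget_outcome(limits, idle_seconds=12.0, segments_shipped=3))
```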
Message catalogue
Every JSON frame carries {"type": <str>, ...}. Pydantic models in
protocol.py are the source of truth; this table is the human-readable
view.
Client → server
| type | Required fields | Purpose |
|---|---|---|
| session_init_v2 | (none) | Opening frame. Carries preset, curated prompts, optional initial image, feature toggles, and an optional continuation_state to resume from a snapshot. |
| segment_prompt_source | prompt | Request the next segment using the supplied prompt; optional sampling overrides (seed, num_inference_steps, guidance_scale, negative_prompt). |
| seed_prompts_updated | seed_prompts | Replace the session's seed-prompt list; takes effect on the next segment. |
| enhancement_updated | enabled | Toggle prompt enhancement for subsequent segments. |
| auto_extension_updated | enabled | Toggle automatic per-segment prompt extension. |
| loop_generation_updated | enabled | Toggle loop-generation mode. |
| generation_paused_updated | paused | Pause/resume segment generation; queued requests defer. |
| snapshot_state | (none) | Request the current ContinuationState for export; server replies with continuation_state_snapshot. |
The opening frame must be session_init_v2. Any other first frame is
rejected with an error (code invalid_message) and the WebSocket is
closed.
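
The opening-frame rule can be sketched with the standard library alone. This is an illustrative check, not the server's code: the function name check_opening_frame is hypothetical, and treating malformed JSON as invalid_message is an assumption; only the error frame's type, code, and message fields come from the catalogue.

```python
import json
from typing import Optional


def check_opening_frame(raw: str) -> Optional[dict]:
    """Return None if raw is an acceptable opening frame, else an error frame.

    The caller would send the returned error frame and then close the socket.
    """
    try:
        frame = json.loads(raw)
    except json.JSONDecodeError:
        frame = None  # assumption: unparsable input is also invalid_message
    if not isinstance(frame, dict) or frame.get("type") != "session_init_v2":
        return {
            "type": "error",
            "code": "invalid_message",
            "message": "first frame must be session_init_v2",
        }
    return None


print(check_opening_frame('{"type": "snapshot_state"}'))
```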
Server → client
| type | Carries | When emitted |
|---|---|---|
| queue_status | position, queue_depth | After session_init_v2 accepted, before GPU binding. |
| gpu_assigned | GPU id, model id | Once a generator slot is bound. |
| ltx2_stream_start | session-level metadata | Once the session enters ACTIVE. |
| ltx2_segment_start | segment_idx, prompt, prompt source | When a segment_prompt_source request begins generation. |
| step_complete | segment_idx, denoise timings | After the segment's denoising loop finishes (before media emission). |
| media_init | segment_idx, mime, stream id | First frame of fMP4 output for the segment. |
| binary frame | fMP4 fragment bytes | Subsequent media chunks; the protocol enforces that media_init precedes any binary frames. |
| media_segment_complete | segment_idx, chunk count, byte count | Last media chunk for the segment. |
| ltx2_segment_complete | segment_idx, segment summary | Segment fully shipped; ready for the next segment_prompt_source. |
| ltx2_stream_complete | session summary | Session reached generation_segment_cap or client requested clean shutdown. |
| session_timeout | reason | Session hit session_timeout_seconds; immediately followed by close. |
| continuation_state_snapshot | kind, payload | Reply to snapshot_state. The payload is the same shape produced by LTX2ContinuationState.to_continuation_state(...). |
| error | code, message | Any validation/runtime error. Non-fatal errors keep the connection open; fatal errors precede a close. |
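
On the client side, the ordering rule for media frames (media_init before any binary frame) can be checked while chunks are collected. The following is a hedged sketch: MediaAssembler and ProtocolError are hypothetical names, and the assumption is that text frames arrive as str and binary frames as bytes, as most WebSocket client libraries deliver them.

```python
import json
from typing import Dict, List, Optional, Union


class ProtocolError(RuntimeError):
    """Raised when frames arrive in an order the contract forbids."""


class MediaAssembler:
    """Collects binary fMP4 chunks per segment_idx."""

    def __init__(self) -> None:
        self.segments: Dict[int, List[bytes]] = {}
        self._open_segment: Optional[int] = None

    def feed(self, frame: Union[str, bytes]) -> None:
        if isinstance(frame, (bytes, bytearray)):
            # Binary frames are only legal between media_init and
            # media_segment_complete.
            if self._open_segment is None:
                raise ProtocolError("binary frame before media_init")
            self.segments[self._open_segment].append(bytes(frame))
            return
        message = json.loads(frame)
        kind = message.get("type")
        if kind == "media_init":
            self._open_segment = message["segment_idx"]
            self.segments.setdefault(self._open_segment, [])
        elif kind == "media_segment_complete":
            self._open_segment = None
        # Every other message type is ignored by this sketch.
```

A caller would feed each received frame in arrival order and, once media_segment_complete has been seen, join the chunks for that segment with b"".join(...).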
Continuation state
The session optionally accepts a continuation_state dict inside the
opening session_init_v2 frame. When present, the server hydrates it
into a ContinuationState(kind, payload) envelope and feeds it as the
request.state on the first segment's GenerationRequest — letting a
client resume after a disconnect, migrate sessions across processes,
or replay a prior session.
After every segment, if the runtime returns a fresh state, the server
persists it to the SessionStore so a snapshot_state request can
export it. The store and serialization contracts live with the model
family (e.g. fastvideo/pipelines/basic/ltx2/continuation.py for LTX-2).
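
To make the resume path concrete, here is a minimal client-side sketch that turns a received continuation_state_snapshot frame into a new opening frame. It assumes the continuation_state dict uses the same kind and payload keys as the ContinuationState envelope; that shape, and the helper name build_resume_init, are assumptions rather than something protocol.py is known to define.

```python
import json
from typing import Any, Dict, List


def build_resume_init(snapshot_frame: Dict[str, Any], *, preset: str,
                      curated_prompts: List[str]) -> str:
    """Build a session_init_v2 frame that resumes from a snapshot frame."""
    if snapshot_frame.get("type") != "continuation_state_snapshot":
        raise ValueError("expected a continuation_state_snapshot frame")
    init = {
        "type": "session_init_v2",
        "preset": preset,
        "curated_prompts": list(curated_prompts),
        # Assumed shape: mirror the (kind, payload) envelope unchanged.
        "continuation_state": {
            "kind": snapshot_frame["kind"],
            "payload": snapshot_frame["payload"],
        },
    }
    return json.dumps(init)


snapshot = {
    "type": "continuation_state_snapshot",
    "kind": "ltx2.v1",
    "payload": {"schema_version": 1},
}
print(build_resume_init(snapshot, preset="ltx2_two_stage",
                        curated_prompts=["the fox jumps"]))
```

The payload is passed through untouched here, since its serialization contract belongs to the model family rather than to the client.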
Example flow
    client                                                      server
    ──────                                                      ──────
    WS /v1/stream ─────── connect ─────────────────────────►
                                                    ◄────── (accept)
    {"type": "session_init_v2",
     "preset": "ltx2_two_stage",
     "curated_prompts": ["a fox in snow", "the fox jumps"],
     "initial_image": {...},
     "stream_mode": "av_fmp4"} ─────────────────────────────►
                                        (validate, queue, bind)
                            ◄──── {"type": "queue_status",
                                   "position": 0, "queue_depth": 0}
                            ◄──── {"type": "gpu_assigned",
                                   "gpu_id": 0, "model_id": "..."}
                            ◄──── {"type": "ltx2_stream_start", ...}
    {"type": "segment_prompt_source",
     "prompt": "a fox in snow",
     "source": "curated"} ───────────────────────────────────►
                                        (run pipeline)
                            ◄──── {"type": "ltx2_segment_start",
                                   "segment_idx": 1, ...}
                            ◄──── {"type": "step_complete",
                                   "segment_idx": 1, "timings": {...}}
                            ◄──── {"type": "media_init",
                                   "segment_idx": 1,
                                   "mime": "video/mp4", ...}
                            ◄──── <binary fMP4 init segment>
                            ◄──── <binary fMP4 fragment>
                            ◄──── <binary fMP4 fragment>
                            ◄──── {"type": "media_segment_complete",
                                   "segment_idx": 1, "chunks": 12}
                            ◄──── {"type": "ltx2_segment_complete",
                                   "segment_idx": 1, ...}
    {"type": "segment_prompt_source",
     "prompt": "the fox jumps"} ─────────────────────────────►
                                        (segment 2 …)
    {"type": "snapshot_state"} ──────────────────────────────►
                            ◄──── {"type": "continuation_state_snapshot",
                                   "kind": "ltx2.v1",
                                   "payload": {"schema_version": 1, ...}}
    (close) ──────────────────────────────────────────────────►
                                        (session → COMPLETE)
Backward / forward compatibility
- Adding a new client message: append a Pydantic model to protocol.py with a unique type; add the discriminator entry to ClientMessage; add a row to the table above. Old clients that don't send the new message remain compatible.
- Adding a new server message: emit only when a new feature flag is enabled (or always emit, since clients ignore unknown types).
- Changing an existing message: bump the type (e.g. session_init_v2 → session_init_v3) and accept both for one release cycle. Never silently change field semantics under the same type.
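
Two of these rules lend themselves to a short sketch: a server that accepts both init versions during the overlap release, and a client that skips server message types it does not recognise. Everything here is illustrative; session_init_v3 is the hypothetical successor used as the example in the list, and the helper names are assumptions rather than part of protocol.py.

```python
from typing import Any, Callable, Dict

# Server side: both versions are accepted for one release cycle.
ACCEPTED_INIT_TYPES = frozenset({"session_init_v2", "session_init_v3"})


def is_accepted_init(frame: Dict[str, Any]) -> bool:
    """True if the frame's type is one of the currently accepted init types."""
    return frame.get("type") in ACCEPTED_INIT_TYPES


Handler = Callable[[Dict[str, Any]], None]


def make_client_dispatcher(handlers: Dict[str, Handler]) -> Callable[[Dict[str, Any]], bool]:
    """Build a dispatcher that ignores server messages with unknown types."""

    def dispatch(frame: Dict[str, Any]) -> bool:
        handler = handlers.get(frame.get("type"))
        if handler is None:
            return False  # unknown type: skip it, keep the connection alive
        handler(frame)
        return True

    return dispatch


seen = []
dispatch = make_client_dispatcher({"queue_status": seen.append})
dispatch({"type": "queue_status", "position": 0, "queue_depth": 0})
dispatch({"type": "some_future_message"})  # silently ignored
```

Because the client returns instead of raising on an unknown type, a server can start emitting a new message without breaking older clients.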