Source: examples/inference/gradio/local

FastVideo Gradio Local Demo¶

This is a Gradio-based web interface for generating videos using the FastVideo framework. The demo allows users to create videos from text prompts with various customization options.

Overview¶

The interface is built with Gradio and runs FastVideo inference locally. It lets users:

Enter text prompts to generate videos
Customize video parameters (dimensions, number of frames, etc.)
Use negative prompts to guide the generation process
Set or randomize seeds for reproducibility

Usage¶

Run the demo with:

python examples/inference/gradio/local/gradio_local_demo.py

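This starts a web server bound to 0.0.0.0:7860 by default, so it listens on all network interfaces. Open http://localhost:7860 in a browser on the same machine to access the interface, or pass --host and --port to change the bind address.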

Model Initialization¶

This demo initializes a VideoGenerator with the minimum arguments required for inference. Users can change inference options between generations, including the prompt, resolution, and video length, without reloading the model.
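
The load-once pattern described above can be sketched as follows. `StubGenerator` and `StubParams` are hypothetical stand-ins for `VideoGenerator` and `SamplingParam`, used so the example runs without FastVideo installed; only the copy-per-request structure mirrors the demo script, and the default values are the fallback values that appear in it.

```python
from copy import deepcopy
from dataclasses import dataclass


@dataclass
class StubParams:
    """Hypothetical stand-in for SamplingParam (a subset of its fields)."""
    prompt: str = ""
    height: int = 448
    width: int = 832
    num_frames: int = 61
    seed: int = 1024


class StubGenerator:
    """Hypothetical stand-in for VideoGenerator; counts how often it is built."""
    loads = 0

    def __init__(self):
        StubGenerator.loads += 1  # the expensive model load happens here

    def generate_video(self, prompt, sampling_param):
        # Real inference would run here; the stub only echoes the request.
        return {"prompt": prompt, "num_frames": sampling_param.num_frames}


generator = StubGenerator()  # built once at startup
defaults = StubParams()      # per-model defaults, never mutated


def run_request(prompt, **overrides):
    params = deepcopy(defaults)  # copy so the shared defaults stay untouched
    params.prompt = prompt
    for name, value in overrides.items():
        setattr(params, name, value)
    return generator.generate_video(prompt=prompt, sampling_param=params)


first = run_request("a fox in the snow")
second = run_request("a city at night", num_frames=33)
```

Both requests reuse the same generator object, and the second request's shorter video length does not leak into the defaults used by later requests.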

Video Generation¶

The core functionality is in the generate_video function, which:

1. Processes user inputs
2. Uses the FastVideo VideoGenerator from earlier to run inference (generator.generate_video())
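
The input-processing step can be sketched in isolation. `StubParams` and `prepare_params` are hypothetical names; the logic follows what generate_video does in the script below (integer coercion, optional seed randomization, negative-prompt fallback), with the standard library's `random` substituted for the script's `torch.randint` so the example has no dependencies.

```python
import random
from copy import deepcopy
from dataclasses import dataclass


@dataclass
class StubParams:
    """Hypothetical stand-in for SamplingParam."""
    prompt: str = ""
    negative_prompt: str = "default negative prompt"
    seed: int = 1024
    guidance_scale: float = 3.0
    num_frames: int = 61
    height: int = 448
    width: int = 832


def prepare_params(defaults, prompt, negative_prompt, use_negative_prompt,
                   seed, guidance_scale, num_frames, height, width,
                   randomize_seed):
    """Turn raw UI values into a per-request parameter object."""
    params = deepcopy(defaults)
    params.prompt = prompt
    # UI widgets may deliver numbers as floats, so coerce the integer fields.
    params.seed = int(seed)
    params.guidance_scale = guidance_scale
    params.num_frames = int(num_frames)
    params.height = int(height)
    params.width = int(width)
    if randomize_seed:
        # Same range as torch.randint(0, 1000000, ...): upper bound excluded.
        params.seed = random.randrange(0, 1_000_000)
    if use_negative_prompt and negative_prompt:
        params.negative_prompt = negative_prompt
    # Otherwise the default negative prompt from the copy is kept.
    return params


params = prepare_params(StubParams(), "a paper boat on a river", "",
                        use_negative_prompt=True, seed=7.0,
                        guidance_scale=4.5, num_frames=61.0,
                        height=448, width=832, randomize_seed=False)
```

Here the negative-prompt checkbox is ticked but the text box is empty, so the default negative prompt is kept rather than being replaced by an empty string.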

Gradio Interface¶

The interface is built with several components:

- A text input for the prompt
- A video display for the result
- Inference options in a collapsible accordion:
  - Height and width sliders
  - Number of frames slider
  - Guidance scale slider
  - Negative prompt options
  - Seed controls

Inference Options¶

Height/Width: Control the resolution of the generated video
Number of Frames: Set how many frames to generate
Guidance Scale: Control how closely the generation follows the prompt
Negative Prompt: Specify what you don't want to see in the video
Seed: Control randomness for reproducible results

Additional Files¶

gradio_local_demo.py

import argparse
import os
import base64
import time
from copy import deepcopy

import gradio as gr
import torch  # needed by generate_video when "Randomize seed" is checked
from fastvideo.entrypoints.video_generator import VideoGenerator
from fastvideo.api.sampling_param import SamplingParam


MODEL_PATH_MAPPING = {
    "FastWan2.1-T2V-1.3B": "FastVideo/FastWan2.1-T2V-1.3B-Diffusers",
    # "FastWan2.2-TI2V-5B-FullAttn": "FastVideo/FastWan2.2-TI2V-5B-FullAttn-Diffusers",
}

def create_timing_display(inference_time, total_time, stage_execution_times, num_frames):
    dit_denoising_time = f"{stage_execution_times[5]:.2f}s" if len(stage_execution_times) > 5 else "N/A"

    timing_html = f"""
    <div style="margin: 10px 0;">
        <h3 style="text-align: center; margin-bottom: 10px;">⏱️ Timing Breakdown</h3>
        <div style="display: grid; grid-template-columns: repeat(5, 1fr); gap: 10px; margin-bottom: 10px;">
            <div class="timing-card timing-card-highlight">
                <div style="font-size: 20px;">🚀</div>
                <div style="font-weight: bold; margin: 3px 0; font-size: 14px;">DiT Denoising</div>
                <div style="font-size: 16px; color: #ffa200; font-weight: bold;">{dit_denoising_time}</div>
            </div>
            <div class="timing-card">
                <div style="font-size: 20px;">🧠</div>
                <div style="font-weight: bold; margin: 3px 0; font-size: 14px;">E2E (w. vae/text encoder)</div>
                <div style="font-size: 16px; color: #2563eb;">{inference_time:.2f}s</div>
            </div>
            <div class="timing-card">
                <div style="font-size: 20px;">🎬</div>
                <div style="font-weight: bold; margin: 3px 0; font-size: 14px;">Video Encoding</div>
                <div style="font-size: 16px; color: #dc2626;">N/A</div>
            </div>
            <div class="timing-card">
                <div style="font-size: 20px;">🌐</div>
                <div style="font-weight: bold; margin: 3px 0; font-size: 14px;">Network Transfer</div>
                <div style="font-size: 16px; color: #059669;">N/A</div>
            </div>
            <div class="timing-card">
                <div style="font-size: 20px;">📊</div>
                <div style="font-weight: bold; margin: 3px 0; font-size: 14px;">Total Processing</div>
                <div style="font-size: 18px; color: #0277bd;">{total_time:.2f}s</div>
            </div>
        </div>"""

    if inference_time > 0:
        fps = num_frames / inference_time
        timing_html += f"""
        <div class="performance-card" style="margin-top: 15px;">
            <span style="font-weight: bold;">Generation Speed: </span>
            <span style="font-size: 18px; color: #6366f1; font-weight: bold;">{fps:.1f} frames/second</span>
        </div>"""

    return timing_html + "</div>"


def setup_model_environment(model_path: str) -> None:
    if "fullattn" in model_path.lower():
        os.environ["FASTVIDEO_ATTENTION_BACKEND"] = "FLASH_ATTN"
    else:
        os.environ["FASTVIDEO_ATTENTION_BACKEND"] = "VIDEO_SPARSE_ATTN"
    os.environ["FASTVIDEO_STAGE_LOGGING"] = "1"

def load_example_prompts():
    def contains_chinese(text):
        return any('\u4e00' <= char <= '\u9fff' for char in text)

    def load_from_file(filepath):
        prompts, labels = [], []
        try:
            with open(filepath, "r", encoding='utf-8') as f:
                for line in f:
                    line = line.strip()
                    if line and not contains_chinese(line):
                        label = line[:100] + "..." if len(line) > 100 else line
                        labels.append(label)
                        prompts.append(line)
        except Exception as e:
            print(f"Warning: Could not read {filepath}: {e}")
        return prompts, labels

    examples, example_labels = load_from_file("examples/inference/gradio/local/prompts_final.txt")

    if not examples:
        examples = ["A crowded rooftop bar buzzes with energy, the city skyline twinkling like a field of stars in the background."]
        example_labels = ["Crowded rooftop bar at night"]

    return examples, example_labels


def create_gradio_interface(default_params: dict[str, SamplingParam], generators: dict[str, VideoGenerator]):
    def generate_video(
        prompt, negative_prompt, use_negative_prompt, seed, guidance_scale,
        num_frames, height, width, randomize_seed, model_selection, progress
    ):
        model_path = MODEL_PATH_MAPPING.get(model_selection, "FastVideo/FastWan2.1-T2V-1.3B-Diffusers")
        setup_model_environment(model_path)
        try:
            if progress:
                progress(0.1, desc="Loading model for local inference...")

            generator = generators[model_path]
            params = deepcopy(default_params[model_path])
            total_start_time = time.time()
            if progress:
                progress(0.2, desc="Configuring parameters...")

            params.prompt = prompt
            params.seed = int(seed)
            params.guidance_scale = guidance_scale
            params.num_frames = int(num_frames)
            params.height = int(height)
            params.width = int(width)

            if randomize_seed:
                params.seed = torch.randint(0, 1000000, (1, )).item()

            if use_negative_prompt and negative_prompt:
                params.negative_prompt = negative_prompt
            else:
                params.negative_prompt = default_params[model_path].negative_prompt

            if progress:
                progress(0.4, desc="Generating video locally...")

            output_dir = "outputs/"
            os.makedirs(output_dir, exist_ok=True)
            start_time = time.time()
            result = generator.generate_video(prompt=prompt, sampling_param=params, save_video=True, return_frames=False)
            inference_time = time.time() - start_time
            logging_info = result.get("logging_info", None)
            if logging_info:
                stage_names = logging_info.get_execution_order()
                stage_execution_times = [
                    logging_info.get_stage_info(stage_name).get("execution_time", 0.0) 
                    for stage_name in stage_names
                ]
            else:
                stage_names = []
                stage_execution_times = []
            total_time = time.time() - total_start_time
            timing_details = create_timing_display(
                inference_time=inference_time,
                total_time=total_time,
                stage_execution_times=stage_execution_times,
                num_frames=params.num_frames,
            )
            safe_prompt = params.prompt[:100].replace(' ', '_').replace('/', '_').replace('\\', '_')
            video_filename = f"{params.prompt[:100]}.mp4"
            output_path = os.path.join(output_dir, video_filename)

            if progress:
                progress(1.0, desc="Generation complete!")

            return output_path, params.seed, timing_details

        except Exception as e:
            print(f"An error occurred during local generation: {e}")
            return None, f"Generation failed: {str(e)}", ""

    examples, example_labels = load_example_prompts()

    theme = gr.themes.Base().set(
        button_primary_background_fill="#2563eb",
        button_primary_background_fill_hover="#1d4ed8",
        button_primary_text_color="white",
        slider_color="#2563eb",
        checkbox_background_color_selected="#2563eb",
    )

    def get_default_values(model_name):
        model_path = MODEL_PATH_MAPPING.get(model_name)
        if model_path and model_path in default_params:
            params = default_params[model_path]
            return {
                'height': params.height,
                'width': params.width,
                'num_frames': params.num_frames,
                'guidance_scale': params.guidance_scale,
                'seed': params.seed,
            }

        return {
            'height': 448,
            'width': 832,
            'num_frames': 61,
            'guidance_scale': 3.0,
            'seed': 1024,
        }

    initial_values = get_default_values("FastWan2.1-T2V-1.3B")

    with gr.Blocks(title="FastWan", theme=theme) as demo:
        gr.Image("assets/full.svg", show_label=False, container=False, height=80)

        gr.HTML("""
        <div style="text-align: center; margin-bottom: 10px;">
            <p style="font-size: 18px;"> Make Video Generation Go Blurrrrrrr </p>
            <p style="font-size: 18px;"> <a href="https://github.com/hao-ai-lab/FastVideo/tree/main" target="_blank">Code</a> | <a href="https://hao-ai-lab.github.io/blogs/fastvideo_post_training/" target="_blank">Blog</a> | <a href="https://hao-ai-lab.github.io/FastVideo/" target="_blank">Docs</a>  </p>
        </div>
        """)

        with gr.Accordion("🎥 What Is FastVideo?", open=False):
            gr.HTML("""
            <div style="padding: 20px; line-height: 1.6;">
                <p style="font-size: 16px; margin-bottom: 15px;">
                    FastVideo is an inference and post-training framework for diffusion models. It features an end-to-end unified pipeline for accelerating diffusion models, starting from data preprocessing to model training, finetuning, distillation, and inference. FastVideo is designed to be modular and extensible, allowing users to easily add new optimizations and techniques. Whether it is training-free optimizations or post-training optimizations, FastVideo has you covered.
                </p>
            </div>
            """)

        with gr.Row():
            model_selection = gr.Dropdown(
                choices=list(MODEL_PATH_MAPPING.keys()),
                value="FastWan2.1-T2V-1.3B",
                label="Select Model",
                interactive=True
            )

        with gr.Row():
            example_dropdown = gr.Dropdown(
                choices=example_labels,
                label="Example Prompts",
                value=None,
                interactive=True,
                allow_custom_value=False
            )

        with gr.Row():
            with gr.Column(scale=6):
                prompt = gr.Text(
                    label="Prompt",
                    show_label=False,
                    max_lines=3,
                    placeholder="Describe your scene...",
                    container=False,
                    lines=3,
                    autofocus=True,
                )
            with gr.Column(scale=1, min_width=120, elem_classes="center-button"):
                run_button = gr.Button("Run", variant="primary", size="lg")

        with gr.Row():
            with gr.Column():
                error_output = gr.Text(label="Error", visible=False)
                timing_display = gr.Markdown(label="Timing Breakdown", visible=False)

        with gr.Row(equal_height=True, elem_classes="main-content-row"):
            with gr.Column(scale=1, elem_classes="advanced-options-column"):
                with gr.Group():
                    gr.HTML("<div style='margin: 0 0 15px 0; text-align: center; font-size: 16px;'>Advanced Options</div>")
                    with gr.Row():
                        height = gr.Number(
                            label="Height",
                            value=initial_values['height'],
                            interactive=False,
                            container=True
                        )
                        width = gr.Number(
                            label="Width",
                            value=initial_values['width'],
                            interactive=False,
                            container=True
                        )

                    with gr.Row():
                        num_frames = gr.Number(
                            label="Number of Frames",
                            value=initial_values['num_frames'],
                            interactive=False,
                            container=True
                        )
                        guidance_scale = gr.Slider(
                            label="Guidance Scale",
                            minimum=1,
                            maximum=12,
                            value=initial_values['guidance_scale'],
                        )

                    with gr.Row():
                        use_negative_prompt = gr.Checkbox(
                            label="Use negative prompt", value=False)
                        negative_prompt = gr.Text(
                            label="Negative prompt",
                            max_lines=3,
                            lines=3,
                            placeholder="Enter a negative prompt",
                            visible=False,
                        )

                    seed = gr.Slider(
                        label="Seed",
                        minimum=0,
                        maximum=1000000,
                        step=1,
                        value=initial_values['seed'],
                    )
                    randomize_seed = gr.Checkbox(label="Randomize seed", value=False)
                    seed_output = gr.Number(label="Used Seed")

            with gr.Column(scale=1, elem_classes="video-column"):
                result = gr.Video(
                    label="Generated Video", 
                    show_label=True,
                    height=466,
                    width=600,
                    container=True,
                    elem_classes="video-component"
                )

        gr.HTML("""
        <style>
        .center-button {
            display: flex !important;
            justify-content: center !important;
            height: 100% !important;
            padding-top: 1.4em !important;
        }

        .gradio-container {
            max-width: 1200px !important;
            margin: 0 auto !important;
        }

        .main {
            max-width: 1200px !important;
            margin: 0 auto !important;
        }

        .gr-form, .gr-box, .gr-group {
            max-width: 1200px !important;
        }

        .gr-video {
            max-width: 500px !important;
            margin: 0 auto !important;
        }

        .main-content-row {
            display: flex !important;
            align-items: flex-start !important;
            min-height: 500px !important;
            gap: 20px !important;
        }

        .advanced-options-column,
        .video-column {
            display: flex !important;
            flex-direction: column !important;
            flex: 1 !important;
            min-height: 400px !important;
            align-items: stretch !important;
        }

        .video-column > * {
            margin-top: 0 !important;
        }

        .video-column .gr-video,
        .video-component {
            margin-top: 0 !important;
            padding-top: 0 !important;
        }

        .video-column .gr-video .gr-form {
            margin-top: 0 !important;
        }

        .advanced-options-column .gr-group,
        .video-column .gr-video {
            margin-top: 0 !important;
            vertical-align: top !important;
        }

        .advanced-options-column > *:last-child,
        .video-column > *:last-child {
            flex-grow: 0 !important;
        }

        @media (max-width: 1400px) {
            .main-content-row {
                min-height: 600px !important;
            }

            .advanced-options-column,
            .video-column {
                min-height: 600px !important;
            }
        }

        @media (max-width: 1200px) {
            .main-content-row {
                flex-direction: column !important;
                align-items: stretch !important;
            }

            .advanced-options-column,
            .video-column {
                min-height: auto !important;
                width: 100% !important;
            }
        }

        .timing-card {
            background: var(--background-fill-secondary) !important;
            border: 1px solid var(--border-color-primary) !important;
            color: var(--body-text-color) !important;
            padding: 10px;
            border-radius: 8px;
            text-align: center;
            min-height: 80px;
            display: flex;
            flex-direction: column;
            justify-content: center;
        }

        .timing-card-highlight {
            background: var(--background-fill-primary) !important;
            border: 2px solid var(--color-accent) !important;
        }

        .performance-card {
            background: var(--background-fill-secondary) !important;
            border: 1px solid var(--border-color-primary) !important;
            color: var(--body-text-color) !important;
            padding: 10px;
            border-radius: 6px;
            text-align: center;
        }

        .gr-number input[readonly] {
            background-color: var(--background-fill-secondary) !important;
            border: 1px solid var(--border-color-primary) !important;
            color: var(--body-text-color-subdued) !important;
            cursor: default !important;
            text-align: center !important;
            font-weight: 500 !important;
        }
        </style>
        """)

        def on_example_select(example_label):
            if example_label and example_label in example_labels:
                index = example_labels.index(example_label)
                return examples[index]
            return ""

        example_dropdown.change(
            fn=on_example_select,
            inputs=example_dropdown,
            outputs=prompt,
        )

        gr.HTML("""
        <div style="text-align: center; margin-top: 10px; margin-bottom: 15px;">
            <p style="font-size: 16px; margin: 0;">Note that this demo is meant to showcase FastWan's quality and that under a large number of requests, generation speed may be affected.</p>
        </div>
        """)

        use_negative_prompt.change(
            fn=lambda x: gr.update(visible=x),
            inputs=use_negative_prompt,
            outputs=negative_prompt,
        )

        def on_model_selection_change(selected_model):
            if not selected_model:
                selected_model = "FastWan2.1-T2V-1.3B"

            model_path = MODEL_PATH_MAPPING.get(selected_model)

            if model_path and model_path in default_params:
                params = default_params[model_path]
                return (
                    gr.update(value=params.height),
                    gr.update(value=params.width),
                    gr.update(value=params.num_frames),
                    gr.update(value=params.guidance_scale),
                    gr.update(value=params.seed),
                )

            return (
                gr.update(value=448),
                gr.update(value=832),
                gr.update(value=61),
                gr.update(value=3.0),
                gr.update(value=1024),
            )

        model_selection.change(
            fn=on_model_selection_change,
            inputs=model_selection,
            outputs=[height, width, num_frames, guidance_scale, seed],
        )

        def handle_generation(*args, progress=None, request: gr.Request = None):
            model_selection, prompt, negative_prompt, use_negative_prompt, seed, guidance_scale, num_frames, height, width, randomize_seed = args

            result_path, seed_or_error, timing_details = generate_video(
                prompt, negative_prompt, use_negative_prompt, seed, guidance_scale, 
                num_frames, height, width, randomize_seed, model_selection, progress
            )
            if result_path and os.path.exists(result_path):
                return (
                    result_path, 
                    seed_or_error, 
                    gr.update(visible=False),
                    gr.update(visible=True, value=timing_details),
                )
            else:
                return (
                    None, 
                    seed_or_error, 
                    gr.update(visible=True, value=seed_or_error),
                    gr.update(visible=False),
                )

        run_button.click(
            fn=handle_generation,
            inputs=[
                model_selection,
                prompt,
                negative_prompt,
                use_negative_prompt,
                seed,
                guidance_scale,
                num_frames,
                height,
                width,
                randomize_seed,
            ],
            outputs=[result, seed_output, error_output, timing_display],
            concurrency_limit=20,
        )

    return demo


def main():
    parser = argparse.ArgumentParser(description="FastVideo Gradio Local Demo")
    parser.add_argument("--t2v_model_paths", type=str,
                        default="FastVideo/FastWan2.1-T2V-1.3B-Diffusers",
                        help="Comma separated list of paths to the T2V model(s)")
    parser.add_argument("--host", type=str, default="0.0.0.0",
                        help="Host to bind to")
    parser.add_argument("--port", type=int, default=7860,
                        help="Port to bind to")
    args = parser.parse_args()
    generators = {}
    default_params = {}
    model_paths = args.t2v_model_paths.split(",")
    for model_path in model_paths:
        print(f"Loading model: {model_path}")
        setup_model_environment(model_path)
        generators[model_path] = VideoGenerator.from_pretrained(model_path)
        default_params[model_path] = SamplingParam.from_pretrained(model_path)
    demo = create_gradio_interface(default_params, generators)
    print(f"Starting Gradio frontend at http://{args.host}:{args.port}")
    print(f"T2V Models: {args.t2v_model_paths}")

    from fastapi import FastAPI, Request, HTTPException
    from fastapi.responses import HTMLResponse, FileResponse
    import uvicorn

    app = FastAPI()

    @app.get("/logo.png")
    def get_logo():
        return FileResponse(
            "assets/full.svg",
            media_type="image/svg+xml",
            headers={
                "Cache-Control": "public, max-age=3600",
                "Access-Control-Allow-Origin": "*"
            }
        )

    @app.get("/favicon.ico")
    def get_favicon():
        favicon_path = "assets/icon-simple.svg"

        if os.path.exists(favicon_path):
            return FileResponse(
                favicon_path, 
                media_type="image/svg+xml",
                headers={
                    "Cache-Control": "public, max-age=3600",
                    "Access-Control-Allow-Origin": "*"
                }
            )
        else:
            raise HTTPException(status_code=404, detail="Favicon not found")

    @app.get("/", response_class=HTMLResponse)
    def index(request: Request):
        base_url = str(request.base_url).rstrip('/')
        return f"""
        <!DOCTYPE html>
        <html lang="en">
        <head>
            <meta charset="UTF-8" />
            <meta name="viewport" content="width=device-width, initial-scale=1.0" />

            <title>FastWan</title>
            <meta name="title" content="FastWan">
            <meta name="description" content="Make video generation go blurrrrrrr">
            <meta name="keywords" content="FastVideo, video generation, AI, machine learning, FastWan">

            <meta property="og:type" content="website">
            <meta property="og:url" content="{base_url}/">
            <meta property="og:title" content="FastWan">
            <meta property="og:description" content="Make video generation go blurrrrrrr">
            <meta property="og:image" content="{base_url}/logo.png">
            <meta property="og:image:width" content="1200">
            <meta property="og:image:height" content="630">
            <meta property="og:site_name" content="FastWan">

            <meta property="twitter:card" content="summary_large_image">
            <meta property="twitter:url" content="{base_url}/">
            <meta property="twitter:title" content="FastWan">
            <meta property="twitter:description" content="Make video generation go blurrrrrrr">
            <meta property="twitter:image" content="{base_url}/logo.png">
            <link rel="icon" type="image/png" sizes="32x32" href="/favicon.ico">
            <link rel="icon" type="image/png" sizes="16x16" href="/favicon.ico">
            <link rel="apple-touch-icon" href="/favicon.ico">
            <style>
                body, html {{
                    margin: 0;
                    padding: 0;
                    height: 100%;
                    overflow: hidden;
                }}
                iframe {{
                    width: 100%;
                    height: 100vh;
                    border: none;
                }}
            </style>
        </head>
        <body>
            <iframe src="/gradio" width="100%" height="100%" style="border: none;"></iframe>
        </body>
        </html>
        """

    app = gr.mount_gradio_app(
        app, 
        demo, 
        path="/gradio",
        allowed_paths=[os.path.abspath("outputs"), os.path.abspath("fastvideo-logos")]
    )

    uvicorn.run(app, host=args.host, port=args.port)


if __name__ == "__main__":

    main()

gradio_local_demo_ltx2_3/README.md

FastLTX-2.3 Gradio local demo¶

Local Gradio + FastAPI demo for FastLTX-2.3 text-to-video generation. This directory is a package (gradio_local_demo_ltx2_3/) split out from the original single-file version to make each concern independently reviewable. The folder name matches the sibling gradio_local_demo*.py demos in this directory so both flat and packaged demos read consistently.

Status: draft. This package is structurally in place but will not run against the current upstream fastvideo package. See Blocking prereqs below.

Layout¶

| File | Purpose |
| --- | --- |
| `app.py` | `main()`: CLI args, `VideoGenerator` / `SamplingParam` boot, FastAPI mount with logo/favicon/generated-clip routes, `uvicorn.run`. |
| `config.py` | Constants, defaults, env-var resolution, Inductor tuning flags, `setup_model_environment`, `resolve_model_path`, `resolve_refine_upsampler_path`, `apply_ltx2_defaults`. |
| `safety.py` | fastText NSFW + hate-speech classifiers, `PromptSafetyCheck`, `get_prompt_safety_check`. |
| `prompt_rewrite.py` | Cerebras-backed prompt enhancer wrapper (`maybe_enhance_prompt`, `get_prompt_enhancer`). Curated prompts bypass enhancement. |
| `prompt_enhancer.py` | Cerebras API client used by `prompt_rewrite.py`. |
| `rendering.py` | HTML helpers: timing cards, error cards, completed-clip gallery, image-upload status. |
| `examples.py` | `load_example_prompts`: reads `selected_ltx2_prompts.jsonl`. |
| `ui.py` | `create_gradio_interface`: Gradio Blocks, CSS, event wiring, generation closure. |
| `__main__.py` | Enables `python -m gradio_local_demo_ltx2_3`. |
| `__init__.py` | Re-exports `main` from `app`. |
| `selected_ltx2_prompts.jsonl` | Curated example prompts. |
| `prompts/prompt_extension_system_prompt.md` | System prompt for the Cerebras enhancer. |
| `download_fasttext_classifiers.py` | Helper to download NSFW/hate-speech classifier binaries from Hugging Face Hub. |

How to run¶

cd examples/inference/gradio/local
python -m gradio_local_demo_ltx2_3 --port 7860

GPU requirement: the "real-time 1080p" speed claim assumes a single FP4-capable GPU (B200 or comparable). Lower-tier GPUs will still run the demo, but more slowly.

Blocking prereqs (why this draft PR cannot be merged yet)¶

The upstream fastvideo package is missing three pieces that the demo currently depends on verbatim. Each needs its own upstreaming PR before this demo can actually boot:

fastvideo.layers.quantization.fp4_config.FP4Config — the demo sets pipeline_config.dit_config.quant_config = FP4Config() in app.py. Upstream only ships absmax_fp8.py and base_config.py under fastvideo/layers/quantization/.
LTX-2.3 refine / image-conditioning kwargs on VideoGenerator — ltx2_refine_enabled, ltx2_refine_upsampler_path, ltx2_refine_lora_path, ltx2_refine_num_inference_steps, ltx2_refine_guidance_scale, ltx2_refine_add_noise, ltx2_images, ltx2_image_crf. Upstream fastvideo/fastvideo_args.py currently wires only ltx2_vae_tiling. The backing stages (ltx2_refine.py, ltx2_i2v_conditioning.py) are also missing from fastvideo/pipelines/stages/.
fastvideo.configs.sample.base.SamplingParam — the import path used by this demo. Upstream moved sampling params to fastvideo.api.sampling_param. A re-export shim at the old path, or an import update here once the other two prereqs land, will resolve it.
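
One way to bridge the third prerequisite on the demo side is an import that tries the new module path first and falls back to the old one. The sketch below is an assumption about how that could look, not code from the package; `import_first_available` is a hypothetical helper, and it is exercised with standard-library modules so it runs without fastvideo installed.

```python
import importlib


def import_first_available(module_paths, attribute):
    """Return `attribute` from the first importable module in `module_paths`."""
    errors = []
    for path in module_paths:
        try:
            module = importlib.import_module(path)
            return getattr(module, attribute)
        except (ImportError, AttributeError) as exc:
            errors.append(f"{path}: {exc}")
    raise ImportError(f"{attribute} not found; tried: " + "; ".join(errors))


# In the demo this could be called as follows (requires fastvideo):
#   SamplingParam = import_first_available(
#       ("fastvideo.api.sampling_param", "fastvideo.configs.sample.base"),
#       "SamplingParam",
#   )

# Standard-library demonstration: the first module is missing, the second exists.
dumps = import_first_available(("no_such_module_for_demo", "json"), "dumps")
```

A re-export shim at the old path inside fastvideo would make this unnecessary; the fallback only keeps the demo importable across both layouts in the meantime.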

Environment variables¶

| Var | Default | Purpose |
| --- | --- | --- |
| `LTX2_3_MODEL_PATH` | `FastVideo/LTX-2.3-Distilled-Diffusers` | Model ID or local snapshot. |
| `LTX2_CLASSIFIER_DIR` | package dir | Where to look for fastText classifiers. |
| `LTX2_NSFW_CLASSIFIER_PATH` | (none) | Explicit path to NSFW classifier `.bin`. |
| `LTX2_HATESPEECH_CLASSIFIER_PATH` | (none) | Explicit path to hate-speech classifier `.bin`. |
| `LTX2_REFINE_UPSAMPLER_PATH` | (none) | Explicit path to the spatial upsampler dir. |
| `FASTVIDEO_PROMPT_API_KEY` / `CEREBRAS_API_KEY` | (none) | Cerebras API key for prompt enhancement. When missing, enhancer returns the raw prompt. |
| `LTX2_PROMPT_MODEL` | `gpt-oss-120b` | Cerebras model name for the enhancer. |
| `LTX2_PROMPT_TEMPERATURE` | `1.0` | Enhancer LLM temperature. |
| `LTX2_PROMPT_EXTENSION_SYSTEM_PROMPT_PATH` | `prompts/prompt_extension_system_prompt.md` | System prompt for the enhancer. |
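
A minimal sketch of how a few of these variables might be resolved with their documented defaults. The actual logic lives in `config.py`, which is not reproduced here; `resolve_settings` is a hypothetical function, and the precedence of `FASTVIDEO_PROMPT_API_KEY` over `CEREBRAS_API_KEY` is an assumption.

```python
import os

DEFAULT_MODEL_ID = "FastVideo/LTX-2.3-Distilled-Diffusers"


def resolve_settings(env=None):
    """Read demo settings from a mapping (os.environ when none is given)."""
    env = os.environ if env is None else env
    # Assumption: the FastVideo-specific key wins when both keys are set.
    api_key = env.get("FASTVIDEO_PROMPT_API_KEY") or env.get("CEREBRAS_API_KEY")
    return {
        "model_path": env.get("LTX2_3_MODEL_PATH", DEFAULT_MODEL_ID),
        "prompt_model": env.get("LTX2_PROMPT_MODEL", "gpt-oss-120b"),
        "prompt_temperature": float(env.get("LTX2_PROMPT_TEMPERATURE", "1.0")),
        "api_key": api_key,  # None means the enhancer returns the raw prompt
    }


# Pass an explicit mapping to keep the example independent of the real environment.
settings = resolve_settings({"CEREBRAS_API_KEY": "example-key"})
```

Taking the environment as a parameter keeps the function easy to test; calling `resolve_settings()` with no argument reads the real process environment.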

Fetching the safety classifiers¶

python examples/inference/gradio/local/gradio_local_demo_ltx2_3/download_fasttext_classifiers.py

The classifiers come from allenai/dolma-jigsaw-fasttext-bigrams-{nsfw,hatespeech} on Hugging Face Hub.

gradio_local_demo_ltx2_3/__init__.py

"""FastLTX-2.3 Gradio local demo package.

Split from the monolithic gradio_local_demo_ltx2_3.py for maintainability.
Runtime entrypoint is `main` in .app.
"""

from .app import main

__all__ = ["main"]

gradio_local_demo_ltx2_3/__main__.py

from .app import main

if __name__ == "__main__":
    main()

gradio_local_demo_ltx2_3/app.py

import argparse
import os
from pathlib import Path

import gradio as gr

from fastvideo.configs.pipelines.base import PipelineConfig
from fastvideo.configs.sample.base import SamplingParam
from fastvideo.entrypoints.video_generator import VideoGenerator
from fastvideo.layers.quantization.fp4_config import FP4Config
from fastvideo.utils import maybe_download_model

from .config import (
    GENERATED_CLIP_ROOT,
    MODEL_ID,
    apply_ltx2_defaults,
    resolve_model_path,
    resolve_refine_upsampler_path,
    setup_model_environment,
)
from .ui import create_gradio_interface

def main():
    parser = argparse.ArgumentParser(description="FastVideo Gradio Local Demo")
    parser.add_argument("--t2v_model_paths", type=str,
                        default=MODEL_ID,
                        help="Comma separated list of paths to the T2V model(s)")
    parser.add_argument("--host", type=str, default="0.0.0.0",
                        help="Host to bind to")
    parser.add_argument("--port", type=int, default=7860,
                        help="Port to bind to")
    args = parser.parse_args()
    gradio_temp_dir = os.path.abspath("outputs/gradio_tmp")
    os.makedirs(gradio_temp_dir, exist_ok=True)
    os.environ["GRADIO_TEMP_DIR"] = gradio_temp_dir
    generators = {}
    default_params = {}
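    # Tolerate spaces and empty entries in the comma-separated list.
    model_paths = [
        path.strip()
        for path in args.t2v_model_paths.split(",")
        if path.strip()
    ]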
    for model_path in model_paths:
        print(f"Loading model: {model_path}")
        setup_model_environment(model_path)
        resolved_model_input = str(resolve_model_path(model_path))
        model_root = maybe_download_model(resolved_model_input)
        resolved_model_path = Path(model_root)

        pipeline_config = PipelineConfig.from_pretrained(str(resolved_model_path))
        pipeline_config.dit_config.quant_config = FP4Config()
        refine_upsampler_path = resolve_refine_upsampler_path(resolved_model_path)
        print(f"Using refine upsampler: {refine_upsampler_path}")

        generators[model_path] = VideoGenerator.from_pretrained(
            str(resolved_model_path),
            num_gpus=1,
            ltx2_refine_enabled=True,
            ltx2_refine_upsampler_path=str(refine_upsampler_path),
            ltx2_refine_lora_path="",  # disable refine LoRA for distilled model
            ltx2_refine_num_inference_steps=2,
            ltx2_refine_guidance_scale=1.0,
            ltx2_refine_add_noise=True,
            pipeline_config=pipeline_config,
            enable_torch_compile=True,
            enable_torch_compile_text_encoder=True,
            torch_compile_kwargs={
                "backend": "inductor",
                "fullgraph": True,
                "mode": "max-autotune-no-cudagraphs",
                "dynamic": False,
            },
            dit_cpu_offload=False,
            vae_cpu_offload=False,
            text_encoder_cpu_offload=False,
            ltx2_vae_tiling=False,
        )
        default_params[model_path] = apply_ltx2_defaults(
            SamplingParam.from_pretrained(str(resolved_model_path))
        )
    demo = create_gradio_interface(default_params, generators)
    print(f"Starting Gradio frontend at http://{args.host}:{args.port}")
    print(f"T2V Models: {args.t2v_model_paths}")

    from fastapi import FastAPI, Request, HTTPException
    from fastapi.responses import HTMLResponse, FileResponse
    import uvicorn

    app = FastAPI()

    @app.get("/logo.png")
    def get_logo():
        return FileResponse(
            "assets/full.svg",
            media_type="image/svg+xml",
            headers={
                "Cache-Control": "public, max-age=3600",
                "Access-Control-Allow-Origin": "*"
            }
        )

    @app.get("/nvidia.png")
    def get_nvidia_logo():
        return FileResponse(
            "assets/nv.png",
            media_type="image/png",
            headers={
                "Cache-Control": "public, max-age=3600",
                "Access-Control-Allow-Origin": "*"
            }
        )

    @app.get("/favicon.ico")
    def get_favicon():
        favicon_path = "assets/icon-simple.svg"

        if os.path.exists(favicon_path):
            return FileResponse(
                favicon_path, 
                media_type="image/svg+xml",
                headers={
                    "Cache-Control": "public, max-age=3600",
                    "Access-Control-Allow-Origin": "*"
                }
            )
        else:
            raise HTTPException(status_code=404, detail="Favicon not found")

    @app.get("/generated-clips/{clip_path:path}")
    def get_generated_clip(clip_path: str):
        root = GENERATED_CLIP_ROOT.resolve()
        resolved_path = (root / clip_path).resolve()

        if root not in resolved_path.parents or not resolved_path.is_file():
            raise HTTPException(status_code=404, detail="Clip not found")

        return FileResponse(
            resolved_path,
            media_type="video/mp4",
            headers={
                "Cache-Control": "no-store",
                "Access-Control-Allow-Origin": "*",
            },
        )

    @app.get("/", response_class=HTMLResponse)
    def index(request: Request):
        base_url = str(request.base_url).rstrip('/')
        return f"""
        <!DOCTYPE html>
        <html lang="en">
        <head>
            <meta charset="UTF-8" />
            <meta name="viewport" content="width=device-width, initial-scale=1.0" />

            <title>FastLTX-2.3</title>
            <meta name="title" content="FastLTX-2.3">
            <meta name="description" content="Make video generation go blurrrrrrr">
            <meta name="keywords" content="FastVideo, video generation, AI, machine learning, FastLTX-2.3">

            <meta property="og:type" content="website">
            <meta property="og:url" content="{base_url}/">
            <meta property="og:title" content="FastLTX-2.3">
            <meta property="og:description" content="Make video generation go blurrrrrrr">
            <meta property="og:image" content="{base_url}/logo.png">
            <meta property="og:image:width" content="1200">
            <meta property="og:image:height" content="630">
            <meta property="og:site_name" content="FastLTX-2.3">

            <meta property="twitter:card" content="summary_large_image">
            <meta property="twitter:url" content="{base_url}/">
            <meta property="twitter:title" content="FastLTX-2.3">
            <meta property="twitter:description" content="Make video generation go blurrrrrrr">
            <meta property="twitter:image" content="{base_url}/logo.png">
            <link rel="icon" type="image/svg+xml" sizes="32x32" href="/favicon.ico">
            <link rel="icon" type="image/svg+xml" sizes="16x16" href="/favicon.ico">
            <link rel="apple-touch-icon" href="/favicon.ico">
            <style>
                body, html {{
                    margin: 0;
                    padding: 0;
                    min-height: 100%;
                    width: 100%;
                    background: #000;
                    background-color: #000;
                    background-image: none;
                    overscroll-behavior-y: auto;
                    scroll-behavior: smooth;
                }}
                body {{
                    position: relative;
                }}
                body::before {{
                    content: "";
                    position: fixed;
                    inset: 0;
                    background: #000;
                    pointer-events: none;
                    z-index: -1;
                }}
                iframe {{
                    display: block;
                    width: 100%;
                    height: 100vh;
                    background: #000;
                    background-color: #000;
                    background-image: none;
                    border: none;
                }}
            </style>
        </head>
        <body>
            <iframe src="/gradio" width="100%" height="100%" style="border: none;"></iframe>
        </body>
        </html>
        """

    app = gr.mount_gradio_app(
        app, 
        demo, 
        path="/gradio",
        allowed_paths=[
            os.path.abspath("outputs"),
            os.path.abspath("outputs_video"),
            os.path.abspath("fastvideo-logos"),
        ]
    )

    uvicorn.run(app, host=args.host, port=args.port)
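
The `/generated-clips/` route rejects any path that resolves outside `GENERATED_CLIP_ROOT`. The standalone sketch below isolates that check so it can be exercised without FastAPI; `is_servable_clip` is a hypothetical name, and its logic mirrors the `parents` test in the handler.

```python
from pathlib import Path


def is_servable_clip(root: Path, clip_path: str) -> bool:
    """Return True only for an existing file strictly inside `root`."""
    root = root.resolve()
    # resolve() collapses ".." segments and follows symlinks before the check,
    # so "../secret.txt" ends up outside `root` and is rejected.
    resolved = (root / clip_path).resolve()
    return root in resolved.parents and resolved.is_file()
```

Checking `resolved.parents` rather than comparing string prefixes avoids accepting a sibling directory whose name merely starts with the root's name.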

gradio_local_demo_ltx2_3/config.py

import os
from pathlib import Path

import torch
import torch._inductor.config

from fastvideo.configs.sample.base import SamplingParam

LOCAL_DEMO_DIR = Path(__file__).resolve().parent
CLASSIFIER_DIR = Path(
    os.path.expandvars(
        os.path.expanduser(
            os.getenv("LTX2_CLASSIFIER_DIR", str(LOCAL_DEMO_DIR))
        )
    )
)

MODEL_ID = os.path.expandvars(
    os.path.expanduser(
        os.getenv("LTX2_3_MODEL_PATH", "FastVideo/LTX-2.3-Distilled-Diffusers")
    )
)
MODEL_PATH_MAPPING = {
    "FastLTX-2.3": MODEL_ID,
}

DEFAULT_HEIGHT = 1088
DEFAULT_WIDTH = 1920
DEFAULT_NUM_FRAMES = 121
DEFAULT_FPS = 24
DEFAULT_GUIDANCE_SCALE = 1.0
DEFAULT_NUM_INFERENCE_STEPS = 5
DEFAULT_SEED = 10
DEFAULT_NEGATIVE_PROMPT = ""
REFINE_UPSAMPLER_PATH = "converted/ltx2_spatial_upscaler"
REPO_ROOT = Path(__file__).resolve().parents[5]
OUTPUT_DIR = REPO_ROOT / "outputs_video" / "ltx2_basic_new"
GENERATED_CLIP_ROOT = REPO_ROOT / "outputs_video"
MAX_SESSION_CLIPS = 24

config = torch._inductor.config
config.conv_1x1_as_mm = True
config.coordinate_descent_tuning = True
config.coordinate_descent_check_all_directions = True
config.epilogue_fusion = False

def apply_ltx2_defaults(params: SamplingParam) -> SamplingParam:
    params.height = DEFAULT_HEIGHT
    params.width = DEFAULT_WIDTH
    params.num_frames = DEFAULT_NUM_FRAMES
    params.fps = DEFAULT_FPS
    params.guidance_scale = DEFAULT_GUIDANCE_SCALE
    params.num_inference_steps = DEFAULT_NUM_INFERENCE_STEPS
    params.seed = DEFAULT_SEED
    params.negative_prompt = DEFAULT_NEGATIVE_PROMPT
    return params

def resolve_model_path(model_path: str) -> Path:
    return Path(os.path.expandvars(os.path.expanduser(model_path)))

def resolve_refine_upsampler_path(model_path: Path) -> Path:
    candidates = [
        model_path / "spatial_upscaler",
        model_path / "spatial_upsampler",
        Path(os.path.expandvars(os.path.expanduser(REFINE_UPSAMPLER_PATH))),
        REPO_ROOT / REFINE_UPSAMPLER_PATH,
    ]

    env_path = os.getenv("LTX2_REFINE_UPSAMPLER_PATH")
    if env_path:
        candidates.insert(
            0, Path(os.path.expandvars(os.path.expanduser(env_path)))
        )

    for candidate in candidates:
        if (candidate / "config.json").is_file():
            return candidate

    checked = "\n".join(f"  - {candidate}" for candidate in candidates)
    raise FileNotFoundError(
        "Could not find an LTX2 refine upsampler directory.\n"
        "Checked:\n"
        f"{checked}\n"
        "Set LTX2_REFINE_UPSAMPLER_PATH or update REFINE_UPSAMPLER_PATH."
    )

def setup_model_environment(model_path: str) -> None:
    _ = model_path
    os.environ["FASTVIDEO_ATTENTION_BACKEND"] = "FLASH_ATTN"
    os.environ["FASTVIDEO_STAGE_LOGGING"] = "1"
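
`resolve_refine_upsampler_path` returns the first candidate directory that contains a `config.json`, and checks `LTX2_REFINE_UPSAMPLER_PATH` first when it is set. The sketch below reproduces that ordering with throwaway directories; `first_valid_dir` is a hypothetical helper, not the demo's function.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def first_valid_dir(candidates: list[Path], env_path: str | None = None) -> Path:
    """Return the first directory holding a config.json; env_path goes first."""
    ordered = list(candidates)
    if env_path:
        ordered.insert(0, Path(env_path))
    for candidate in ordered:
        if (candidate / "config.json").is_file():
            return candidate
    checked = "\n".join(f"  - {candidate}" for candidate in ordered)
    raise FileNotFoundError(f"No upsampler directory found. Checked:\n{checked}")


with tempfile.TemporaryDirectory() as tmp:
    model_dir = Path(tmp) / "model"
    bundled = model_dir / "spatial_upscaler"
    override = Path(tmp) / "override"
    for directory in (bundled, override):
        directory.mkdir(parents=True)
        (directory / "config.json").write_text("{}")

    candidates = [bundled, model_dir / "spatial_upsampler"]
    # Without an override, the directory bundled with the model wins.
    picked_default = first_valid_dir(candidates)
    # With an override, the environment-supplied directory is checked first.
    picked_override = first_valid_dir(candidates, env_path=str(override))
```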

gradio_local_demo_ltx2_3/download_fasttext_classifiers.py

from __future__ import annotations

import argparse
import os
import shutil
from dataclasses import dataclass
from pathlib import Path

from huggingface_hub import hf_hub_download

LOCAL_DEMO_DIR = Path(__file__).resolve().parent


@dataclass(frozen=True)
class ClassifierSpec:
    name: str
    repo_id: str
    source_filename: str
    target_filename: str
    env_var: str


CLASSIFIERS = {
    "nsfw": ClassifierSpec(
        name="nsfw",
        repo_id="allenai/dolma-jigsaw-fasttext-bigrams-nsfw",
        source_filename="model.bin",
        target_filename="jigsaw_fasttext_bigrams_nsfw_final.bin",
        env_var="LTX2_NSFW_CLASSIFIER_PATH",
    ),
    "hatespeech": ClassifierSpec(
        name="hatespeech",
        repo_id="allenai/dolma-jigsaw-fasttext-bigrams-hatespeech",
        source_filename="model.bin",
        target_filename="jigsaw_fasttext_bigrams_hatespeech_final.bin",
        env_var="LTX2_HATESPEECH_CLASSIFIER_PATH",
    ),
}


def expand_path(path: str | Path) -> Path:
    return Path(os.path.expandvars(os.path.expanduser(str(path))))


def default_output_dir() -> Path:
    raw_value = os.getenv("LTX2_CLASSIFIER_DIR", str(LOCAL_DEMO_DIR))
    return expand_path(raw_value)


def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(
        description=(
            "Download the fastText safety classifiers used by "
            "the gradio_local_demo_ltx2_3 package."
        )
    )
    parser.add_argument(
        "--output-dir",
        type=Path,
        default=default_output_dir(),
        help=(
            "Directory where the demo looks for classifiers. Defaults to "
            "LTX2_CLASSIFIER_DIR or the local Gradio demo directory."
        ),
    )
    parser.add_argument(
        "--classifier",
        choices=sorted(CLASSIFIERS),
        nargs="+",
        default=sorted(CLASSIFIERS),
        help="Subset of classifiers to download. Defaults to both.",
    )
    parser.add_argument(
        "--cache-dir",
        type=Path,
        default=None,
        help="Optional Hugging Face cache directory.",
    )
    parser.add_argument(
        "--force",
        action="store_true",
        help="Redownload and overwrite existing classifier files.",
    )
    parser.add_argument(
        "--token",
        default=None,
        help="Optional Hugging Face token for authenticated downloads.",
    )
    return parser.parse_args()


def materialize_download(source: Path, destination: Path) -> None:
    if destination.exists() or destination.is_symlink():
        destination.unlink()

    shutil.copy2(source, destination)


def download_classifier(
    spec: ClassifierSpec,
    output_dir: Path,
    cache_dir: Path | None,
    force: bool,
    token: str | None,
) -> Path:
    destination = output_dir / spec.target_filename
    if destination.is_file() and not force:
        print(f"[skip] {spec.name}: {destination}")
        return destination

    print(
        f"[download] {spec.name}: "
        f"{spec.repo_id}/{spec.source_filename}"
    )
    cached_path = Path(
        hf_hub_download(
            repo_id=spec.repo_id,
            filename=spec.source_filename,
            cache_dir=cache_dir,
            force_download=force,
            token=token,
        )
    )
    materialize_download(cached_path, destination)
    print(f"[ready] {spec.name}: {destination}")
    return destination


def print_summary(
    selected_specs: list[ClassifierSpec],
    output_dir: Path,
    downloaded_paths: list[Path],
) -> None:
    print("\nDownloaded classifier paths:")
    for path in downloaded_paths:
        print(f"  - {path}")

    print(
        "\nThese filenames are auto-discovered by the "
        "gradio_local_demo_ltx2_3 package."
    )

    # output_dir was resolved in main(); resolve the default before comparing.
    if output_dir != default_output_dir().resolve():
        print(
            "\nBecause you used a custom output directory, point the demo to it:"
        )
        print(f'  export LTX2_CLASSIFIER_DIR="{output_dir}"')

    for spec, path in zip(selected_specs, downloaded_paths, strict=True):
        print(f'  export {spec.env_var}="{path}"')


def main() -> None:
    args = parse_args()
    output_dir = expand_path(args.output_dir).resolve()
    cache_dir = None
    if args.cache_dir is not None:
        cache_dir = expand_path(args.cache_dir).resolve()

    output_dir.mkdir(parents=True, exist_ok=True)

    selected_specs = [CLASSIFIERS[name] for name in args.classifier]
    downloaded_paths = [
        download_classifier(
            spec=spec,
            output_dir=output_dir,
            cache_dir=cache_dir,
            force=args.force,
            token=args.token,
        )
        for spec in selected_specs
    ]
    print_summary(selected_specs, output_dir, downloaded_paths)


if __name__ == "__main__":
    main()

gradio_local_demo_ltx2_3/examples.py

import json
import os

def load_example_prompts():
    examples: list[str] = []
    example_labels: list[str] = []
    prompts_path = os.path.join(
        os.path.dirname(__file__),
        "selected_ltx2_prompts.jsonl",
    )

    try:
        with open(prompts_path, encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                entry = json.loads(line)
                prompt = entry.get("video_prompt", "")
                if not isinstance(prompt, str):
                    continue
                prompt = prompt.strip()
                if not prompt:
                    continue
                examples.append(prompt)
                example_labels.append(
                    prompt[:100] + "..." if len(prompt) > 100 else prompt
                )
    except Exception as e:
        print(f"Warning: Could not read {prompts_path}: {e}")

    if not examples:
        # Backward-compatible fallback to validation captions.
        validation_path = os.path.join(
            os.path.dirname(__file__),
            "..",
            "..",
            "..",
            "..",
            "distill",
            "LTX2",
            "validation.json",
        )
        try:
            with open(validation_path, encoding="utf-8") as f:
                data = json.load(f)
            for entry in data.get("data", []):
                caption = entry.get("caption", "")
                if not isinstance(caption, str):
                    continue
                caption = caption.strip()
                if not caption:
                    continue
                examples.append(caption)
                example_labels.append(
                    caption[:100] + "..." if len(caption) > 100 else caption
                )
        except Exception as e:
            print(f"Warning: Could not read {validation_path}: {e}")

    if not examples:
        examples = [
            "A crowded rooftop bar buzzes with energy, the city skyline twinkling like a field of stars in the background."
        ]
        example_labels = ["Crowded rooftop bar at night"]

    return examples, example_labels
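
`load_example_prompts` reads one JSON object per line, keeps the `video_prompt` field, and truncates labels to 100 characters. The sketch below shows that parsing on an in-memory string; `parse_prompt_lines` is a hypothetical helper, and unlike the demo it skips a malformed line instead of stopping at it.

```python
import json


def make_label(prompt: str, limit: int = 100) -> str:
    # Same truncation rule as the demo: cut at `limit` and append an ellipsis.
    return prompt[:limit] + "..." if len(prompt) > limit else prompt


def parse_prompt_lines(text: str) -> tuple[list[str], list[str]]:
    prompts: list[str] = []
    labels: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # assumption: tolerate a bad line rather than stop reading
        prompt = entry.get("video_prompt", "") if isinstance(entry, dict) else ""
        if isinstance(prompt, str) and prompt.strip():
            prompts.append(prompt.strip())
            labels.append(make_label(prompt.strip()))
    return prompts, labels


sample = "\n".join([
    json.dumps({"video_prompt": "  A red kite over dunes.  "}),
    "",
    "not json",
    json.dumps({"video_prompt": 42}),
    json.dumps({"video_prompt": "x" * 120}),
])
prompts, labels = parse_prompt_lines(sample)
```

Only the first and last lines of `sample` survive: the blank line, the malformed line, and the non-string value are all dropped.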

gradio_local_demo_ltx2_3/prompt_enhancer.py

from __future__ import annotations

import json
import os
import re
import time
from dataclasses import dataclass
from pathlib import Path
from typing import Any

try:
    from cerebras.cloud.sdk import Cerebras
except ImportError:  # pragma: no cover - optional dependency
    Cerebras = None  # type: ignore[assignment]

LOCAL_DEMO_DIR = Path(__file__).resolve().parent
DEFAULT_SYSTEM_PROMPT_PATH = (
    LOCAL_DEMO_DIR / "prompts" / "prompt_extension_system_prompt.md"
)
DEFAULT_PROVIDER = "cerebras"
DEFAULT_MODEL = "gpt-oss-120b"
DEFAULT_TEMPERATURE = 1.0


def _enhance_print(level: str, message: str) -> None:
    print(f"[ENHANCE][{level}] {message}", flush=True)


@dataclass
class EnhanceResult:
    prompt: str
    fallback_used: bool
    error: str | None
    provider: str
    model: str
    latency_ms: float


def _env_value(*names: str) -> str | None:
    for name in names:
        value = os.getenv(name)
        if not isinstance(value, str):
            continue
        normalized = value.strip()
        if normalized:
            return normalized
    return None


def _env_float(name: str, default: float) -> float:
    value = os.getenv(name)
    if value is None:
        return default
    try:
        return float(value)
    except ValueError:
        return default


def _preview_text(text: str, *, limit: int = 160) -> str:
    normalized = text.replace("\n", "\\n")
    if len(normalized) <= limit:
        return normalized
    return normalized[:limit] + "..."


def _expand_prompt_candidate_paths(path: str) -> list[Path]:
    prompt_path = Path(path)
    candidate_paths = [prompt_path]
    if prompt_path.suffix == ".txt":
        candidate_paths.append(prompt_path.with_suffix(".md"))
    elif prompt_path.suffix == ".md":
        candidate_paths.append(prompt_path.with_suffix(".txt"))
    return candidate_paths


def _load_prompt_required(path: str, prompt_name: str) -> str:
    candidate_paths = _expand_prompt_candidate_paths(path)

    for candidate in candidate_paths:
        if not candidate.is_file():
            continue
        try:
            text = candidate.read_text(encoding="utf-8").strip()
        except OSError as exc:
            raise RuntimeError(
                f"Failed to read {prompt_name} system prompt: {candidate}"
            ) from exc
        if not text:
            raise RuntimeError(
                f"{prompt_name} system prompt file is empty: {candidate}"
            )
        return text

    tried = ", ".join(str(candidate) for candidate in candidate_paths)
    raise RuntimeError(
        f"{prompt_name} system prompt file not found. Tried: {tried}"
    )


def _extract_assistant_content(response_json: dict[str, Any]) -> str:
    choices = response_json.get("choices")
    if not isinstance(choices, list) or not choices:
        raise ValueError("Missing choices in chat completion response.")

    message = choices[0].get("message", {})
    content = message.get("content")
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        chunks: list[str] = []
        for item in content:
            if not isinstance(item, dict):
                continue
            text = item.get("text")
            if isinstance(text, str):
                chunks.append(text)
                continue
            if isinstance(text, dict):
                value = text.get("value")
                if isinstance(value, str):
                    chunks.append(value)
                    continue
            alt_text = item.get("output_text")
            if isinstance(alt_text, str):
                chunks.append(alt_text)
        if chunks:
            return "".join(chunks)

    finish_reason = choices[0].get("finish_reason")
    refusal = message.get("refusal")
    raise ValueError(
        "Missing assistant content in chat completion response. "
        f"finish_reason={finish_reason!r}, refusal={refusal!r}"
    )


def _dump_response_json(response: Any) -> dict[str, Any]:
    if hasattr(response, "model_dump"):
        payload = response.model_dump(mode="json")
    elif isinstance(response, dict):
        payload = response
    elif hasattr(response, "dict"):
        payload = response.dict()
    else:
        raise TypeError(
            "Unsupported chat completion response type. "
            f"type={type(response)!r}"
        )

    if not isinstance(payload, dict):
        raise TypeError(
            "Chat completion response did not serialize to a JSON object."
        )
    return payload


def _parse_json_response(content: str) -> dict[str, Any]:
    text = content.strip()
    if not text:
        raise ValueError("Assistant response is empty.")

    try:
        parsed = json.loads(text)
        if isinstance(parsed, dict):
            return parsed
    except json.JSONDecodeError:
        pass

    fence_pattern = re.compile(
        r"```(?:json)?\s*([\s\S]*?)```",
        flags=re.IGNORECASE,
    )
    for match in fence_pattern.finditer(text):
        block = match.group(1).strip()
        if not block:
            continue
        try:
            parsed = json.loads(block)
            if isinstance(parsed, dict):
                return parsed
        except json.JSONDecodeError:
            continue

    decoder = json.JSONDecoder()
    for idx, char in enumerate(text):
        if char != "{":
            continue
        try:
            parsed, _ = decoder.raw_decode(text[idx:])
        except json.JSONDecodeError:
            continue
        if isinstance(parsed, dict):
            return parsed

    raise ValueError("No JSON object found in assistant response.")


def _normalize_prompt(value: Any) -> str:
    if not isinstance(value, str):
        return ""
    return value.strip()


class PromptEnhancer:
    def __init__(self) -> None:
        self.provider = DEFAULT_PROVIDER
        self.provider_label = DEFAULT_PROVIDER
        self.api_key = _env_value(
            "FASTVIDEO_PROMPT_API_KEY",
            "CEREBRAS_API_KEY",
        )
        self.model = _env_value("LTX2_PROMPT_MODEL") or DEFAULT_MODEL
        self.temperature = _env_float(
            "LTX2_PROMPT_TEMPERATURE",
            DEFAULT_TEMPERATURE,
        )
        self.system_prompt_path = _env_value(
            "LTX2_PROMPT_EXTENSION_SYSTEM_PROMPT_PATH"
        ) or str(DEFAULT_SYSTEM_PROMPT_PATH)
        self.client: Any | None = None
        self.system_prompt: str | None = None
        self.unavailable_reason: str | None = None

        try:
            self.client = self._build_client()
            self.system_prompt = _load_prompt_required(
                self.system_prompt_path,
                "prompt-extension",
            )
        except Exception as exc:
            self.unavailable_reason = str(exc)
            _enhance_print(
                "WARN",
                f"Prompt enhancement unavailable: {self.unavailable_reason}",
            )

    def _build_client(self) -> Any:
        if Cerebras is None:
            raise RuntimeError(
                "Cerebras SDK is not installed. Install "
                "'cerebras-cloud-sdk' to use prompt enhancement."
            )
        if not self.api_key:
            raise RuntimeError(
                "Missing FASTVIDEO_PROMPT_API_KEY or CEREBRAS_API_KEY."
            )
        return Cerebras(api_key=self.api_key)

    def _build_body(
        self,
        *,
        system_prompt: str,
        user_prompt: str,
        model: str | None = None,
    ) -> dict[str, Any]:
        user_payload = {
            "request": (
                "Expand the user prompt into one detailed prompt for a "
                "single 5-second LTX-2.3 video. Respond with valid JSON "
                'only as {"prompt": "..."}.'  # noqa: E501
            ),
            "user_prompt": user_prompt,
        }
        return {
            "model": model or self.model,
            "temperature": self.temperature,
            "messages": [
                {
                    "role": "system",
                    "content": system_prompt,
                },
                {
                    "role": "user",
                    "content": json.dumps(user_payload, ensure_ascii=False),
                },
            ],
        }

    def _request_content(
        self,
        *,
        system_prompt: str,
        user_prompt: str,
        model: str | None = None,
    ) -> tuple[dict[str, Any], str]:
        if self.client is None:
            raise RuntimeError(
                self.unavailable_reason
                or "Prompt enhancement client is not initialized."
            )

        body = self._build_body(
            system_prompt=system_prompt,
            user_prompt=user_prompt,
            model=model,
        )
        response = self.client.chat.completions.create(
            model=body["model"],
            messages=body["messages"],
            temperature=body["temperature"],
        )
        response_json = _dump_response_json(response)
        return response_json, _extract_assistant_content(response_json)

    def _require_prompt_field(
        self,
        parsed: dict[str, Any],
        field_name: str,
    ) -> str:
        value = parsed.get(field_name)
        if not isinstance(value, str):
            raise ValueError(f"Missing {field_name} string.")
        prompt = value.strip()
        if not prompt:
            raise ValueError(f"{field_name} is empty.")
        return prompt

    def enhance_prompt(self, prompt: str) -> EnhanceResult:
        cleaned = _normalize_prompt(prompt)
        if not cleaned:
            return EnhanceResult(
                prompt="",
                fallback_used=True,
                error="No valid prompt provided.",
                provider=self.provider_label,
                model=self.model,
                latency_ms=0.0,
            )

        if self.system_prompt is None:
            return EnhanceResult(
                prompt="",
                fallback_used=True,
                error=(
                    self.unavailable_reason
                    or "Prompt enhancement system prompt is unavailable."
                ),
                provider=self.provider_label,
                model=self.model,
                latency_ms=0.0,
            )

        t0 = time.perf_counter()
        response_content: str | None = None
        try:
            _enhance_print("INFO", f"Enhancing prompt: {cleaned}")
            _, response_content = self._request_content(
                system_prompt=self.system_prompt,
                user_prompt=cleaned,
            )
            parsed = _parse_json_response(response_content)
            enhanced_prompt = self._require_prompt_field(parsed, "prompt")
            latency_ms = (time.perf_counter() - t0) * 1000.0
            return EnhanceResult(
                prompt=enhanced_prompt,
                fallback_used=False,
                error=None,
                provider=self.provider_label,
                model=self.model,
                latency_ms=latency_ms,
            )
        except Exception as exc:
            error_detail = str(exc)
            if isinstance(response_content, str) and response_content.strip():
                error_detail = (
                    f"{error_detail} | assistant_response="
                    f"{_preview_text(response_content, limit=240)}"
                )
            latency_ms = (time.perf_counter() - t0) * 1000.0
            return EnhanceResult(
                prompt="",
                fallback_used=True,
                error=error_detail,
                provider=self.provider_label,
                model=self.model,
                latency_ms=latency_ms,
            )
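
`_parse_json_response` tries three strategies in order: the whole reply as JSON, then any fenced block, then the first decodable `{...}` object embedded in prose. A condensed sketch of the same idea follows; `extract_json_object` is a hypothetical name, and the fence marker is built by string multiplication only to keep literal backticks out of this listing.

```python
from __future__ import annotations

import json
import re
from typing import Any

FENCE = "`" * 3
FENCE_PATTERN = re.compile(
    FENCE + r"(?:json)?\s*([\s\S]*?)" + FENCE,
    flags=re.IGNORECASE,
)


def _as_dict(candidate: str) -> dict[str, Any] | None:
    try:
        parsed = json.loads(candidate)
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None


def extract_json_object(content: str) -> dict[str, Any]:
    text = content.strip()
    if not text:
        raise ValueError("Assistant response is empty.")

    # 1. The whole reply is a JSON object.
    parsed = _as_dict(text)
    if parsed is not None:
        return parsed

    # 2. A fenced block inside the reply is a JSON object.
    for match in FENCE_PATTERN.finditer(text):
        parsed = _as_dict(match.group(1).strip())
        if parsed is not None:
            return parsed

    # 3. Scan for the first "{" from which a JSON object can be decoded.
    decoder = json.JSONDecoder()
    for idx, char in enumerate(text):
        if char != "{":
            continue
        try:
            obj, _ = decoder.raw_decode(text[idx:])
        except json.JSONDecodeError:
            continue
        if isinstance(obj, dict):
            return obj

    raise ValueError("No JSON object found in assistant response.")
```

Ordering the strategies from strictest to loosest means a well-formed reply never reaches the scanning step, which is the one most likely to pick up an unintended object.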

gradio_local_demo_ltx2_3/prompt_rewrite.py

from functools import lru_cache

from .prompt_enhancer import PromptEnhancer

@lru_cache(maxsize=1)
def get_prompt_enhancer() -> PromptEnhancer:
    return PromptEnhancer()

def maybe_enhance_prompt(
    prompt: str,
    curated_prompts: set[str],
) -> str:
    normalized_prompt = prompt.strip()
    if not normalized_prompt:
        return normalized_prompt

    if normalized_prompt in curated_prompts:
        print(
            "[ENHANCE][INFO] Skipping prompt enhancement for curated prompt."
        )
        return normalized_prompt

    enhancer = get_prompt_enhancer()
    if not enhancer.api_key:
        raise RuntimeError(
            "Prompt enhancement is enabled for custom prompts, but "
            "FASTVIDEO_PROMPT_API_KEY or CEREBRAS_API_KEY is not set."
        )

    result = enhancer.enhance_prompt(normalized_prompt)
    if result.fallback_used or not result.prompt.strip():
        print(
            "[ENHANCE][WARN] Falling back to raw prompt "
            f"error={result.error}"
        )
        return normalized_prompt

    print(
        "[ENHANCE][INFO] Prompt enhanced "
        f"latency={result.latency_ms:.2f}ms model={result.model}"
    )
    return result.prompt.strip()
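
The control flow of `maybe_enhance_prompt` is easy to exercise with a stand-in for the enhancer. The sketch below is a simplified illustration: `choose_prompt` and the `enhance` callable are hypothetical, the API-key check is omitted, and the callable returns `None` where the demo inspects `EnhanceResult.fallback_used`.

```python
from __future__ import annotations

from typing import Callable


def choose_prompt(
    prompt: str,
    curated: set[str],
    enhance: Callable[[str], str | None],
) -> str:
    normalized = prompt.strip()
    # Empty and curated prompts are returned as-is, without calling the enhancer.
    if not normalized or normalized in curated:
        return normalized

    enhanced = enhance(normalized)
    # A failed or blank enhancement falls back to the raw prompt.
    if not enhanced or not enhanced.strip():
        return normalized
    return enhanced.strip()
```

For example, `choose_prompt("cat", set(), lambda p: None)` returns the raw prompt, matching the fallback branch in the demo.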

gradio_local_demo_ltx2_3/prompts/prompt_extension_system_prompt.md

SYSTEM_PROMPT = """ You are a prompt extender for LTX-2.3 video generation.

Your job is to expand a short user idea into a detailed, production-ready prompt for a single 5-second bidirectional video clip.

LTX-2.3 responds strongly to detailed prompting. It performs best when prompts clearly specify:
- the subject
- the action
- the environment
- spatial layout
- lighting
- camera behavior
- audio

LTX-2.3 is more faithful to prompt details than earlier versions. It can follow specific acting beats, pauses, physical reactions, camera directions, and environmental details more reliably.

For a 5-second clip, the prompt should still feel like one short, continuous cinematic moment, but it should be richly described.

Given a short user prompt, expand it into a detailed cinematic prompt optimized for a single 5-second LTX-2.3 video.

You must preserve the user’s subject, intent, and core action. You may enrich the scene, acting, environment, audio, and camera work, but you must not change the core premise.

1. Be specific and descriptive
   - Add concrete visual details rather than vague summaries.
   - Include age, clothing, hair, material texture, lighting, atmosphere, and setting when relevant.
2. Direct the scene
   - Be explicit about spatial layout and orientation when useful: left, right, foreground, background, near, far, facing toward, facing away.
3. Use cinematic language
   - Use camera and film language naturally: medium shot, close-up, wide shot, low angle, over-the-shoulder, slow push in, pans across, tracks, shallow depth of field, handheld, golden hour, cold fluorescent, etc.
4. Use verbs for motion
   - Clearly describe who moves, what moves, how they move, and what the camera does.
   - Motion must be visible and physically plausible.
5. Describe audio clearly
   - If audio is relevant, describe ambient sound, dialogue tone, acoustic texture, and synced sounds.
6. Show emotion through physical performance
   - Prefer visible cues over abstract labels.
   - Use pauses, glances, small gestures, posture shifts, jaw tension, blinking, hand movement, breath, or voice quality.
7. Keep internal consistency
   - Do not introduce contradictory lighting, tone, or action.
   - Do not overload the shot with too many unrelated events.

Write one flowing paragraph in natural English.

The prompt should usually include:
1. Shot type and subject
2. Environment and spatial layout
3. Lighting, palette, and texture
4. Main action
5. Small follow-up beat or reaction
6. Camera movement if useful
7. Audio and dialogue if relevant
8. A stable ending image

For 5-second clips, the scene should feel like:
- one continuous shot
- one main action beat
- one smaller reaction or follow-up beat
- a stable visual hold at the end

1. Single continuous shot
   - Do not describe cuts or multiple scenes.
   - Treat the prompt as one short cinematic take.
2. Rich detail is encouraged
   - LTX-2.3 benefits from longer, more descriptive prompts.
   - Add enough detail to fully specify the 5-second clip.
3. Dialogue handling
   - If dialogue is present, put spoken words in quotation marks.
   - Break dialogue into short phrases when appropriate.
   - Insert visible acting directions between spoken phrases when useful.
   - Example pattern: He looks to the side and says, "I thought this was handled." He pauses, tightens his jaw, then adds, "Apparently not."
   - Keep dialogue natural and synchronized with visible action.
4. Physical acting
   - Prefer visible acting beats: pauses, eye shifts, hand adjustments, posture changes, small reactions.
   - Do not rely on internal thoughts or abstract emotional labels.
5. Camera movement
   - If camera movement is used, describe it clearly relative to the subject.
   - Use natural camera language, not technical numeric instructions.
   - For a 5-second clip, keep camera movement controlled and readable.
6. Texture and material
   - When useful, describe material qualities: glossy metal, worn fabric, fine hair strands, rough stone, wet pavement, polished floor, matte plastic, brushed steel.
7. Lighting
   - Use one coherent lighting logic: warm tungsten, cool fluorescent, golden hour sunlight, neon glow, moonlight, etc.
   - Avoid conflicting light descriptions.
8. Audio
   - Tie sound to visible action.
   - Keep audio specific: console beeps, chair creak, rain on glass, fluorescent hum, footsteps on tile, fabric rustle, distant chatter.
   - If dialogue is present, describe voice tone when useful.
9. Avoid
   - vague prompts
   - still-photo descriptions with no action
   - overloaded scenes with too many simultaneous actions
   - conflicting instructions
   - abstract emotional summaries
   - unreadable text/logo dependence
   - overly numerical constraints
10. Ending stability
   - End on a stable, readable frame.
   - The final image should feel visually settled rather than abruptly cut off.

Return only the final extended prompt as a single paragraph in natural English.

Do not include headings, explanations, bullet points, or commentary. """
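
The output rules at the end of the system prompt (a single paragraph, no headings, no bullet points) lend themselves to a cheap post-check on whatever the language model returns. The sketch below is a hypothetical helper, not part of the demo; `is_single_paragraph` and its regular expression are assumptions chosen for illustration.

```python
import re


def is_single_paragraph(text: str) -> bool:
    # Hypothetical post-check for the output rules in the system prompt:
    # one paragraph, no Markdown headings, no bullet or numbered list items.
    stripped = text.strip()
    if not stripped or "\n\n" in stripped:
        return False
    for line in stripped.splitlines():
        if re.match(r"\s*(#{1,6}\s|[-*]\s|\d+\.\s)", line):
            return False
    return True


print(is_single_paragraph("A medium shot of a fox trotting through fresh snow."))
print(is_single_paragraph("## Prompt\nA fox in snow."))
print(is_single_paragraph("- a fox\n- some snow"))
```

A caller could treat a failed check the same way as an empty response and fall back to the raw user prompt.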

gradio_local_demo_ltx2_3/rendering.py

import html
from pathlib import Path

from .config import DEFAULT_FPS, GENERATED_CLIP_ROOT, MAX_SESSION_CLIPS

def create_timing_display(inference_time, total_time, stage_execution_times, num_frames):
    timing_html = f"""
    <div class="timing-shell">
        <div style="display: grid; grid-template-columns: repeat(2, 1fr); gap: 10px; margin-bottom: 10px;">
            <div class="timing-card">
                <div style="font-size: 20px;">🎬</div>
                <div style="font-weight: bold; margin: 3px 0; font-size: 14px;">Video Generation Time</div>
                <div style="font-size: 18px; color: #2563eb;">{inference_time:.1f}s</div>
            </div>
            <div class="timing-card timing-card-highlight">
                <div style="font-size: 20px;">📊</div>
                <div style="font-weight: bold; margin: 3px 0; font-size: 14px;">E2E Latency</div>
                <div style="font-size: 18px; color: #0277bd;">{total_time:.1f}s</div>
            </div>
        </div>"""

    if inference_time > 0:
        fps = num_frames / inference_time
        timing_html += f"""
        <div class="performance-card" style="margin-top: 15px;">
            <span style="font-weight: bold;">Generation Speed: </span>
            <span style="font-size: 18px; color: #6366f1; font-weight: bold;">{fps:.1f} frames/second</span>
        </div>"""

    return timing_html + "</div>"

def create_timing_placeholder() -> str:
    return """
    <div class="timing-shell">
        <div style="display: grid; grid-template-columns: repeat(2, 1fr); gap: 10px; margin-bottom: 10px;">
            <div class="timing-card">
                <div style="font-weight: bold; margin: 3px 0; font-size: 14px;">Video Generation Time</div>
                <div style="font-size: 18px; color: #4f8cff;">--</div>
            </div>
            <div class="timing-card timing-card-highlight">
                <div style="font-weight: bold; margin: 3px 0; font-size: 14px;">E2E Latency</div>
                <div style="font-size: 18px; color: #4f8cff;">--</div>
            </div>
        </div>
        <div class="performance-card">
            <span style="font-weight: bold;">Generation Speed: </span>
            <span style="font-size: 18px; color: #4f8cff; font-weight: bold;">--</span>
        </div>
    </div>
    """

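def _truncate_text(value: str, max_chars: int) -> str:
    if len(value) <= max_chars:
        return value
    # Reserve room for the three-character ellipsis so the result stays
    # within max_chars (for max_chars >= 3).
    return value[: max(max_chars - 3, 0)].rstrip() + "..."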

def _clip_duration_seconds(num_frames: int, fps: int) -> int:
    return max(1, round(num_frames / max(fps, 1)))

def _make_clip_public_path(output_path: str) -> str:
    resolved_path = Path(output_path).resolve()
    relative_path = resolved_path.relative_to(GENERATED_CLIP_ROOT.resolve())
    return f"/generated-clips/{relative_path.as_posix()}"

def _record_session_clip(
    session_clips: list[dict[str, str | int | float]],
    *,
    output_path: str,
    prompt: str,
    model_name: str,
    num_frames: int,
    generation_time: float,
) -> list[dict[str, str | int | float]]:
    clip_entry = {
        "video_url": _make_clip_public_path(output_path),
        "prompt": prompt.strip(),
        "prompt_preview": _truncate_text(prompt.strip(), 120),
        "model_name": model_name,
        "duration_label": f"{_clip_duration_seconds(num_frames, DEFAULT_FPS)} Sec",
        "num_frames": num_frames,
        "generation_time": generation_time,
    }
    updated_clips = list(session_clips)
    updated_clips.append(clip_entry)
    if len(updated_clips) > MAX_SESSION_CLIPS:
        updated_clips = updated_clips[-MAX_SESSION_CLIPS:]
    return updated_clips

def render_completed_clips(clips: list[dict[str, str | int | float]]) -> str:
    if not clips:
        return """
        <div class="completed-clips-empty">
            <div class="completed-clips-empty-title">Nothing in gallery yet</div>
            <div class="completed-clips-empty-copy">
                Build your personal gallery for this browser session by creating videos.
            </div>
        </div>
        """

    cards: list[str] = []
    for clip in reversed(clips):
        video_url = html.escape(str(clip["video_url"]), quote=True)
        prompt = str(clip["prompt"])
        prompt_preview = html.escape(str(clip["prompt_preview"]))
        model_name = html.escape(str(clip["model_name"]))
        duration_label = html.escape(str(clip["duration_label"]))
        cards.append(
            f"""
            <article class="completed-clip-card">
                <div class="completed-clip-video-shell">
                    <video class="completed-clip-video" src="{video_url}" controls preload="metadata" playsinline></video>
                </div>
                <div class="completed-clip-body">
                    <div class="completed-clip-title" title="{html.escape(prompt, quote=True)}">{prompt_preview}</div>
                    <div class="completed-clip-meta">
                        <span class="completed-clip-badge">{model_name}</span>
                        <span class="completed-clip-badge completed-clip-duration">{duration_label}</span>
                    </div>
                    <details class="completed-clip-prompt">
                        <summary>Prompt</summary>
                        <div>{html.escape(prompt)}</div>
                    </details>
                </div>
            </article>
            """
        )

    return f"""
    <div class="completed-clips-grid">
        {''.join(cards)}
    </div>
    """

def render_error_message(message: str) -> str:
    return f"""
    <div class="stage-error-card">
        <div class="stage-error-title">Error</div>
        <div class="stage-error-copy">{html.escape(message)}</div>
    </div>
    """

def render_prompt_blocked_message(
    message: str,
    category: str | None = None,
) -> str:
    details = ""
    if category:
        details = (
            '<div class="stage-error-copy">'
            f"Policy: {html.escape(category)}"
            "</div>"
        )
    return f"""
    <div class="stage-error-card">
        <div class="stage-error-title">Prompt Blocked</div>
        {details}
        <div class="stage-error-copy">{html.escape(message)}</div>
    </div>
    """

def render_input_image_status(input_image: str | None) -> str:
    if not input_image:
        return ""

    image_name = html.escape(Path(str(input_image)).name)
    return (
        "<div class='image-upload-status'>"
        f"Image ready: {image_name}"
        "</div>"
    )
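
The clip bookkeeping in this module comes down to three small operations: rounding a frame count to a whole-second label, keeping only the newest entries, and escaping user text before it is interpolated into HTML. The sketch below illustrates them in isolation. The values `fps = 24` and `max_clips = 3` are assumptions for illustration only, since `DEFAULT_FPS` and `MAX_SESSION_CLIPS` are defined in `config.py` and not shown here; `keep_newest` is a hypothetical helper name.

```python
import html


def clip_duration_seconds(num_frames: int, fps: int) -> int:
    # Same arithmetic as _clip_duration_seconds: never below one second,
    # and a non-positive fps is clamped to 1 to avoid division by zero.
    return max(1, round(num_frames / max(fps, 1)))


def keep_newest(clips: list[dict], max_clips: int) -> list[dict]:
    # Mirrors the trimming step in _record_session_clip: drop the oldest entries.
    return clips[-max_clips:] if len(clips) > max_clips else list(clips)


# Assumed values for illustration only.
fps, max_clips = 24, 3

print(clip_duration_seconds(121, fps))  # 121 / 24 is about 5.04
print(clip_duration_seconds(5, fps))    # very short clips still report 1 second

clips = [{"prompt": f"clip {i}"} for i in range(5)]
print([c["prompt"] for c in keep_newest(clips, max_clips)])

# User text must be escaped before it lands in an attribute or element body.
raw = '<b onmouseover="alert(1)">cat & dog</b>'
print(html.escape(raw, quote=True))
```

Passing `quote=True` matters for values placed inside attributes such as `title="..."`, where an unescaped double quote would end the attribute early.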

gradio_local_demo_ltx2_3/safety.py

import os
import time
from dataclasses import dataclass
from functools import lru_cache
from pathlib import Path

import fasttext

from .config import CLASSIFIER_DIR

def resolve_classifier_path(
    classifier_kind: str,
    env_var: str,
    filename: str,
    legacy_path: str,
    shared_filename: str,
) -> str:
    candidates: list[Path] = []
    env_path = os.getenv(env_var)
    if env_path:
        candidates.append(
            Path(os.path.expandvars(os.path.expanduser(env_path)))
        )
    candidates.extend(
        [
            CLASSIFIER_DIR / filename,
            Path(f"/home/shared/{shared_filename}"),
            Path(legacy_path),
        ]
    )

    for candidate in candidates:
        if candidate.is_file():
            return str(candidate)

    checked = "\n".join(f"  - {candidate}" for candidate in candidates)
    raise FileNotFoundError(
        f"Could not find the {classifier_kind} classifier.\n"
        f"Checked:\n{checked}\n"
        "Run "
        "`python examples/inference/gradio/local/download_fasttext_classifiers.py` "
        "or set the appropriate classifier path environment variable."
    )

def fasttext_predict(
    model_path: str,
    text: str,
    classifier_name: str,
) -> tuple[str, float]:
    model = load_fasttext_model(model_path)
    text = text.replace('\n', ' ')
    start_time = time.perf_counter()
    try:
        labels, probs = model.predict(text)
    except ValueError as error:
        if "Unable to avoid copy while creating an array" not in str(error):
            raise
        predictions = model.f.predict(f"{text}\n", 1, 0.0, "strict")
        if not predictions:
            raise ValueError("fastText returned no predictions") from error
        probs, labels = zip(*predictions)
    latency_ms = (time.perf_counter() - start_time) * 1000.0
    identifier = labels[0].replace('__label__', '')
    confidence = probs[0]
    print(
        "[safety] "
        f"{classifier_name} fastText latency={latency_ms:.2f}ms "
        f"label={identifier} confidence={float(confidence):.4f}"
    )
    return identifier, confidence

@lru_cache(maxsize=None)
def load_fasttext_model(model_path: str):
    return fasttext.load_model(model_path)

def classify_nsfw(text: str) -> tuple[str, float]:
    return fasttext_predict(
        resolve_classifier_path(
            "NSFW",
            "LTX2_NSFW_CLASSIFIER_PATH",
            "jigsaw_fasttext_bigrams_nsfw_final.bin",
            "/data/classifiers/dolma_fasttext_nsfw_jigsaw_model.bin",
            "dolma-jigsaw-fasttext-bigrams-nsfw-final.bin",
        ),
        text,
        "nsfw",
    )

def classify_toxic_speech(text: str) -> tuple[str, float]:
    return fasttext_predict(
        resolve_classifier_path(
            "hate speech",
            "LTX2_HATESPEECH_CLASSIFIER_PATH",
            "jigsaw_fasttext_bigrams_hatespeech_final.bin",
            "/data/classifiers/dolma_fasttext_hatespeech_jigsaw_model.bin",
            "dolma-jigsaw-fasttext-bigrams-hatespeech-final.bin",
        ),
        text,
        "hate_speech",
    )

def _normalize_classifier_label(identifier: str) -> str:
    return identifier.strip().lower().replace("-", "_").replace(" ", "_")

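def _label_matches(
    identifier: str,
    blocked_markers: tuple[str, ...],
    safe_markers: tuple[str, ...],
) -> bool:
    normalized = _normalize_classifier_label(identifier)
    # An exact blocked label must win before the substring checks below:
    # "sfw" is a substring of "nsfw", so testing safe markers first would
    # classify the label "nsfw" as safe.
    if normalized in blocked_markers:
        return True
    if any(marker in normalized for marker in safe_markers):
        return False
    return any(marker in normalized for marker in blocked_markers)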

@dataclass(frozen=True)
class PromptSafetyCheck:
    blocked: bool
    category: str | None = None
    message: str | None = None

def get_prompt_safety_check(prompt: str) -> PromptSafetyCheck:
    normalized_prompt = prompt.strip()
    if not normalized_prompt:
        return PromptSafetyCheck(blocked=False)

    nsfw_label, _ = classify_nsfw(normalized_prompt)
    if _label_matches(
        nsfw_label,
        blocked_markers=("nsfw",),
        safe_markers=("sfw", "safe"),
    ):
        return PromptSafetyCheck(
            blocked=True,
            category="NSFW",
            message=(
                "This request was blocked by the safety filter because it "
                "appears to contain NSFW content. Please revise the prompt "
                "and try again."
            ),
        )

    hate_label, _ = classify_toxic_speech(normalized_prompt)
    if _label_matches(
        hate_label,
        blocked_markers=("hatespeech", "hate", "toxic", "offensive", "abusive"),
        safe_markers=(
            "non_hatespeech",
            "not_hatespeech",
            "non_toxic",
            "not_toxic",
            "safe",
            "clean",
        ),
    ):
        return PromptSafetyCheck(
            blocked=True,
            category="Hate Speech",
            message=(
                "This request was blocked by the safety filter because it "
                "appears to contain hate speech or abusive content. Please "
                "revise the prompt and try again."
            ),
        )

    return PromptSafetyCheck(blocked=False)
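
Matching classifier labels by substring is easy to get subtly wrong, because a "safe" marker such as `"sfw"` is itself a substring of `"nsfw"`. The sketch below shows one possible token-based alternative. It is not the demo's implementation: `normalize_label` and `is_blocked` are hypothetical helpers, and the label strings are illustrative examples, since the real label set depends on the classifier files in use.

```python
def normalize_label(raw_label: str) -> str:
    # Strip fastText's "__label__" prefix, then unify case and separators.
    label = raw_label.replace("__label__", "")
    return label.strip().lower().replace("-", "_").replace(" ", "_")


def is_blocked(raw_label: str, blocked: set[str], negations: set[str]) -> bool:
    # Compare whole tokens instead of substrings.
    tokens = normalize_label(raw_label).split("_")
    if any(token in negations for token in tokens):
        return False
    return any(token in blocked for token in tokens)


# Substring pitfall: a "safe" marker can hide inside a blocked label.
print("sfw" in "nsfw")

blocked = {"nsfw", "toxic", "hatespeech"}
negations = {"non", "not", "sfw", "safe", "clean"}
for label in ["__label__nsfw", "__label__sfw", "__label__non-toxic", "__label__Toxic"]:
    print(label, is_blocked(label, blocked, negations))
```

Splitting on underscores means `non_toxic` is read as a negated label while a bare `nsfw` or `toxic` is blocked, regardless of which markers happen to overlap as substrings.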

gradio_local_demo_ltx2_3/selected_ltx2_prompts.jsonl

{"id": "vidprom_semantic_unique_gpt-5-mini-2025-08-07_14", "video_prompt": "An old man in his late 70s with a short grey beard and a worn brown coat is sitting with his back against the wide, rough bark of a big oak tree, legs stretched on the cool grass as three children—an 8-year-old girl with braided hair in a denim jacket, a 10-year-old boy in a red hoodie, and a 6-year-old girl in a yellow dress—are clustered close, leaning in and looking up at him under a clear, starry sky. Soft moonlight is washing the scene in pale blue while dozens of fireflies are flickering around them, casting small points of warm yellow light that pulse in time with a gentle breeze rustling the oak leaves; the breeze causes a soft, continuous rustle and occasional low creaks from the branches. The old man is speaking slowly and warmly, his weathered hand gesturing toward a knot in the tree as he tells the story; beneath his voice a quiet night soundscape is present—distant cricket chirps, mild wind through grass, and the subtle shuffle of the children as they shift closer. Old Man (deep, slow, warm): \"Long ago this tree used to hold a lantern that guided lost travelers...\" he says, pausing to tap the knot and smile, his voice carrying low and steady over the soft night sounds. Boy (bright, quick, curious): \"Did they ever find their way without the lantern?\" the boy asks, eyes wide, leaning forward and brushing a blade of grass as the fireflies flicker near his hand. Old Man (soft, amused, steady): \"Sometimes they did, sometimes they learned to follow the stars,\" he replies, chuckling softly and nodding, his words blending with the rustle of leaves; the children exhale in a small, collective whisper of wonder and a brief, delighted giggle rises as a firefly glows nearby. Throughout, the audio remains intimate and natural—clear, close-up dialogue with the old man's voice dominant, layered over gentle ambient night SFX (wind through leaves, distant insects) and occasional tiny, bright pops of light from the fireflies visually accenting beats in the conversation."}
{"id": "vidprom_semantic_unique_gpt-5-mini-2025-08-07_16", "video_prompt": "A slender alien with pale green skin, sparse white tendrils for hair, and large dark eyes is crouched before an old cathode-ray TV sitting on a scratched metal crate in a dim, cluttered observation alcove; they wear a simple gray tunic and lean forward with a focused, curious expression, fingertips hovering over the TV's worn knobs as a soft amber glow from the curved screen washes across their face. The CRT displays a grainy, monochrome rotating Earth with visible scanlines and intermittent static; as the alien turns a dial the globe sharpens briefly then jitters with horizontal rolling lines, while the TV cabinet emits a low electrical hum, a steady mechanical whine, and sharp, brief pops from the speaker. The alien tilts their head, squints, and taps the side of the set, then speaks in a soft, slow, curious voice, \"That look like home?\" A crackly, low-pitched announcer voice from the TV replies in a distorted, monotone cadence, \"Signal detected: human transmissions—faint, local.\" The alien exhales a small, puzzled sound and murmurs back in a quieter, puzzled tone, \"Listen close… what are they saying?\" Tiny ambient sounds layer under the exchange: distant ship machinery humming, the gentle rattle of tools in the alcove, the TV's hiss and white noise filling pauses, and a faint, muffled snippet of Earth's city ambience—distant car horns and a passing voice—bleeding through the static as the alien leans in while the scene holds for a brief, attentive moment."}
{"id": "vidprom_semantic_unique_gpt-5-mini-2025-08-07_29", "video_prompt": "A young girl with shoulder-length brown hair tied in a bouncing ponytail is running down a sunlit suburban street, wearing a knee-length pink skirt and a fitted blue t-shirt; her white sneakers slam in quick, rhythmic strikes against rough asphalt as the skirt flutters and her ponytail snaps with each stride. Soft late-afternoon light casts long, gentle shadows across the concrete curb and the texture of small pebbles in the road is visible; a few parked cars line the sidewalk and a stray leaf skitters along at her feet. Footstep SFX: rapid sneaker impacts, light skirt swish, and a short intake of breath on each stride; ambient sound: distant traffic hum, a passing car whoosh and faint bird calls. Mid-run she glances over her shoulder, pausing her pace just enough to call out in a breathy, urgent, slightly high-pitched voice, “Wait up!” (spoken with quick cadence), then exhales with a soft pant and pushes forward again, her arms pumping and shoes kicking up tiny dust puffs as the street ambience continues—occasional distant horn and a muted dog bark—while the sound of her footsteps and breath remain prominent and in sync with her motion."}
{"id": "vidprom_semantic_unique_gpt-5-mini-2025-08-07_37", "video_prompt": "A medium close-up of a middle-aged moderator at a wooden podium, wearing a dark navy suit and thin-rim glasses, short salt-and-pepper hair slightly tousled, smiling with bright eyes as he gestures with one hand and leans slightly forward; soft overhead conference lighting casts even, neutral illumination and a large LED screen behind him shows a pulsing schematic of interconnected nodes labeled Intelligent Neural Net in plain white text. Moderator — excited, slightly breathless, fast pace, mid-high pitch: \"Moderator: (Excitedly) Finally, we have succeeded in building the most advanced super AI system, the 'Intelligent Neural Net'! It will be the most powerful AI system in human history!\" He speaks the first sentence with a rising inflection while raising both hands, then taps the side of the podium on the last phrase, causing a brief synthesized chime and the LED schematic to glow brighter; a soft microphone pop precedes his voice, and immediately as he finishes the line a swell of applause and cheers rises from the off-screen audience, mixed with a low, steady server-rack hum and the faint clicking of camera shutters. Subtle ambient conference sounds (murmur, chair rustle) sit under the action while the projection pulses in time with the chime, and the moderator holds his smile, breathing slightly faster, as the applause continues."}
{"id": "vidprom_semantic_unique_gpt-5-mini-2025-08-07_38", "video_prompt": "A small village of stone cottages with a mix of thatched and slate roofs is nestled in a shallow valley between low, misty hills under a serene moonlit night; pale moonlight is casting a cool silver wash over dewy grass and slate tiles while soft, warm light is spilling from leaded glass windows onto narrow cobblestone lanes. Thin wisps of mist are drifting down the hill slopes and curling through the streets as thin streams of smoke are rising from chimneys and curling up into the moonlit air; lanterns hanging from wrought-iron brackets are gently swinging and candle flames behind shutters are flickering, casting subtle shadows across textured stone walls and wooden doors. In the background, a narrow brook is murmuring over stones and a distant church bell is tolling once, while crickets are chirping steadily and an occasional owl is calling from the dark hillside; a low breeze is rustling the leaves of a lone elm and causing reeds by the water to whisper. The overall palette is cool silvers and soft blues from the moon, contrasted with warm amber glows from windows and lanterns, with visible details like moss on stones, chipped plaster, and wet cobbles reflecting scattered light, creating a calm, intimate nighttime scene."}
{"id": "vidprom_semantic_unique_gpt-5-mini-2025-08-07_51", "video_prompt": "A static medium shot of a woman in her late 20s with shoulder-length dark hair, wearing a plain green shirt and a flowery midi skirt, standing against a bright white background; she is holding a white ceramic pot at chest level that contains a large plant with oversized round leaves in alternating orange and green, the leaves showing a smooth, slightly waxy texture and gentle midrib veins. Soft studio lights create even, shadow-free illumination and crisp color separation; she is lifting the pot slightly and tilting it toward the camera as the leaves shift a little from the motion. Ambient audio is a quiet studio room tone with a soft, unobtrusive acoustic guitar loop underlining the moment; as she moves there is a faint rustle of fabric and the light sound of her hands adjusting on the pot. Woman (warm, medium pace, mid pitch): \"Look at these leaves—aren't they lovely?\" she smiles and holds the tilt, then pauses to glance down and smooth a leaf with her fingertip. Woman (brightening, slightly quicker): \"The orange really pops against the green,\" she says while rotating the pot a quarter turn so the leaf edges catch the light; a subtle short reverb on her voice and a small, gentle exhale sync with her final nod as she offers the plant to the viewer."}
{"id": "vidprom_semantic_unique_gpt-5-mini-2025-08-07_52", "video_prompt": "Goku in his Super Saiyan 5 form stands on a cracked rocky plain under soft overcast light; a muscular adult male with long white-silver spiky hair falling past his shoulders, teal-green eyes, and faint red fur along his forearms and shoulders, wearing a torn orange gi with a blue undershirt and blue wristbands. A static medium full shot frames him chest-up as he is powering up, fists clenched at his sides, shoulders rising and falling with heavy breaths, hair lifting and the white-silver aura with subtle purple edges pulsing and crackling outward while small rocks and dust swirl and lift from the ground. Ambient audio begins with a low earth rumble and distant wind whoosh; as the aura intensifies a rising electrical crackle and bright synth sweep build in pitch, small stones clatter and a thin metallic ringing emerges. He inhales sharply, then exhales and speaks aloud: Goku (raw, strained, rising pitch): \"Haa...!\"—he tightens his grip and the aura spikes; immediately he releases a forceful shout that syncs to a sharp air snap and percussion hit: Goku (forceful, loud, high pitch): \"Kaaah!\"—the shout launches a brief sonic burst that bends nearby dust and leaves a brief shimmer in the air. After the shout the electrical crackle falls into a sustained metallic hum and a distant thunder roll as the energy settles, leaving faint floating embers and a quiet wind rustle while the static camera holds the charged pose for the final beat."}
{"id": "vidprom_semantic_unique_gpt-5-mini-2025-08-07_60001", "video_prompt": "In a tight close-up, a tall clear glass filled with crushed ice, a pineapple wedge on the rim and a small paper umbrella is sitting on a weathered wooden bar top under soft late-afternoon light; a clear glass carafe is tilting above the rim and is pouring bright orange juice in a steady stream that is cascading into the glass, splashing against the ice and creating a swirling motion that lifts tiny bubbles toward the surface as the liquid level rises. As the carafe is pulled back, condensation beads are forming and slowly sliding down the glass while the umbrella trembles slightly and the pineapple wedge leans inward. The audio starts with the mid-range SFX of a liquid pour and crisp clinks of ice, layered under a low-volume tropical soundscape of distant ocean waves and soft steel-drum chords; when the juice hits the ice there is a short hollow ring of the glass and a faint fizz of bubbles, then a delicate tap as the carafe is set down and the ambient waves continue quietly in the background."}
{"id": "vidprom_semantic_unique_gpt-5-mini-2025-08-07_60024", "video_prompt": "In a low, first-person view hovering just above the grainy ocean floor, soft blue-green shafts of light are filtering down through rippling surface patterns, revealing ridged sand, scattered broken shells, and jagged coral; fine silt is drifting upward and tiny bioluminescent plankton are pulsing like faint motes. A thin trail of disturbed sand is marking where something has been crawling, and a small crab is scuttling left to right across the frame, its legs kicking up micro-puffs of sand while a pale starfish clings to a nearby rock. Low, muffled water thrum fills the background, distant whale calls are resonant and slow, and measured regulator breaths are audible in steady intervals with occasional single bubbles popping as they rise. A companion voice, low and amused over faint radio static, says, \"You've been crawling around the ocean floor all day,\" timed with a slight ripple of light from above; after a brief pause and a tiny sift of sand, your voice, breathy and tired, replies, \"Yeah... can't tell if it's the cold or the tide,\" followed by a soft exhale and a larger bubble that drifts up through the plankton. Nearby, the crab's shell clicks softly against shell fragments and a faint metallic beep from a dive instrument punctuates the soundscape as the plankton pulse continues to drift."}
{"id": "vidprom_semantic_unique_gpt-5-mini-2025-08-07_60025", "video_prompt": "Style: anime. A beautiful girl in a flowing white dress is standing in a sunlit forest glade, her long dark hair gently swaying as a soft breeze lifts the sheer skirt and lace hem; cel-shaded anime rendering with clean linework, soft pastel greens and warm brown trunks, and subtle rim lighting that makes the foliage and scattered wildflowers look slightly magical — the background feels fabulous with drifting pollen motes, pale blue bokeh orbs, and faint shafts of dappled sunlight filtering through leaves. She is standing center-frame with hands loosely clasped at her waist, large expressive eyes gazing upward and a small, wistful smile on her face, while nearby ferns glisten with tiny dewdrops. Ambient audio is layered: light leaves rustling, distant multi-voice birdsong that resolves into a single clear chirp, a faint stream trickle and a gentle wind whoosh; a delicate piano arpeggio with soft bell tones is playing quietly to lift the mood. She inhales audibly (soft breath SFX) and then speaks in a soft, reflective, slow voice: \"It's so quiet here...\" as she tilts her head and closes her eyes; after a brief pause she murmurs in a light, hopeful, slightly higher voice, \"I never thought I'd find this place,\" timed with a small smile and the dress whispering on the next breeze (fabric rustle SFX). A bright bird chirp answers in the soundscape just after her second line, and she gives a soft, airy laugh (gentle laugh SFX) as the piano bell lingers and the forest ambience continues."}
{"id": "vidprom_semantic_unique_gpt-5-mini-2025-08-07_60080", "video_prompt": "A medium shot of a young female brown bear named Joy standing in a small sun-dappled forest clearing, her dense brown fur catching soft afternoon light and a simple red scarf tied around her neck; she is stepping forward with gentle, deliberate paws on a leaf-strewn floor (soft padding on dry leaves), head tilted and bright eyes focused as she listens, then she leans slightly forward and extends one paw in a friendly, open gesture (scarf rustle). Background ambience is warm forest sound—distant birdsong, a faint brook burble, and a light breeze moving leaves—synchronized so the footsteps occur as she enters and the paw stretch aligns with a quiet rustle. Joy speaks in a warm, low, friendly voice, clear and patient: \"Hi, I'm Joy. I can understand you, and I'm here to help,\" followed by a soft, amused chuckle; her mouth moves in time with the words, then she nods once and offers a small reassuring smile while lowering her paw as the forest ambience continues under a final gentle exhale. "}
{"id": "vidprom_semantic_unique_gpt-5-mini-2025-08-07_60113", "video_prompt": "A wide view over rolling green Welsh hills is bathed in low soft sunlight from the west, the sun is shining through a thin veil of high clouds and skimming the rounded ridges; short grass and scattered heather are bending and rippling under a steady wind that is sweeping across the slopes, and small puffs of dust are lifting from thin paths as the breeze moves; a low grey drystone wall with lichen-streaked stones is tracing the contour of a slope, its rough texture catching side light while patches of gorse with small yellow flowers are trembling and shedding a few dry petals; thin white clouds are drifting across the pale blue sky and momentarily sliding cool shadows down the hillsides as the sun reappears; audio: a constant low whoosh of wind is filling the scene as the grasses bend, layered with the near rustle of stems and the soft creak and tumble of a loose stone shifting in the wall, and a clear distant skylark is trilling a sustained phrase overhead, its high note rising and fading while the wind continues; the frame is steady, showing only natural small motions—grass leaning, flowers quivering, light moving across the slopes—conveying a calm, windy sunlit moment on the Welsh hills."}
{"id": "vidprom_semantic_unique_gpt-5-mini-2025-08-07_60133", "video_prompt": "In a static medium-wide shot from slightly behind and to the boy's right, a boy of about 12 with short dark hair and a navy hoodie is sitting cross-legged on a wool blanket on a low grassy knoll, watching the night sky as the Milky Way is arching overhead as a pale white band with a faint purple haze that is slowly drifting westward; soft starlight and a thin crescent moon are casting a cool, diffuse glow across his face and the textured knit of his hoodie. He is tilting his head back and tracking the shifting band with steady eyes, fingers tapping once on his knee, then shifting his weight so the blanket rustles. Ambient night sounds fill the scene: a gentle wind whispering through tall grass, steady cricket chirps, a distant owl hoot, and the soft rustle of fabric; his breath is audible as a small inhale. Boy (soft, awed, low voice): \"It's... actually moving.\" He exhales, smiles slightly, then leans forward and points with one hand toward a brighter patch of stars. Boy (quiet, slow, wonder): \"Look at that—like a river in the sky.\" A subtle, warm ambient pad underlies the natural soundscape, supporting the moment without overpowering the night ambience as the Milky Way continues its slow motion and the shot holds the quiet tableau."}
{"id": "vidprom_semantic_unique_gpt-5-mini-2025-08-07_60194", "video_prompt": "A tight close-up on a young person in their early 20s with short dark hair and subtle freckles, eyes wide and open to a cool wind that is ruffling short hair and moving a loose strand across the cheek; soft side light from a low sun casts gentle highlights on the skin and a faint rim light on the hair, while the out-of-focus background shows a pale grey sky and blurred treetops. Their gaze is slightly upward, pupils slowly widening as air moves across the face, eyelashes trembling and then blinking; they inhale quietly, shoulders shifting in a small, visible breath, and exhale as the wind lifts finer hairs. They speak twice in a close, intimate delivery synchronized with the breaths: They (soft, breathy, low-pitched) say, \"It's okay...\" while looking upward and letting a slow blink hold, then after a brief pause They (quieter, steady, low-pitched) say, \"I'm here,\" as the lips part and the wind pushes the lashes. Audio layers: foreground wind whooshes that vary with the hair movement, small crisp rustles of distant leaves, a subtle distant urban hum under the wind, close-mic'd inhalation and exhalation, and a single sustaining cello tone that rises gently under the second line and fades with the wind; all sound is timed to the visible breaths, hair movement, and spoken lines. Textures and details are visible in the five-second moment: moist eyes reflecting the sky, fine skin pores, soft sheen on the lips, and individual hair fibers moving against the cheek."}
{"id": "vidprom_semantic_unique_gpt-5-mini-2025-08-07_60198", "video_prompt": "Rendered in ultra-detailed 8k, a LEGO Super Mario minifigure (red hat with an M, blue overalls, white gloves, brown mustache, printed cheerful face) is standing on a glossy green baseplate with visible studs; soft overhead studio light and a cool rim light are creating small specular highlights on the molded plastic. He is raising his right arm and pressing a small red action button on the baseplate; precise plastic clicks and a short electronic beep occur as he moves. As he presses, a circular portal is opening behind him — a translucent ring of shifting teal and purple energy with swirling particle motes and a subtle grid-like shimmer that casts colored reflections across Mario's face; a deep harmonic drone is rising and a whoosh of wind-like synths layers with a brief chiptune arpeggio. The single camera is zooming in slowly from a medium shot toward an extreme close-up on Mario's face and the rim of the portal, tightening on the reflection of the swirling colors in his printed eyes while a quiet mechanical whirr from the zoom accompanies the sound. Mario (bright, energetic, mid-pitched, quick) says as he presses the button and leans forward, \"Let's-a go!\" — his voice is clear and playful and matches the moment the first light blooms. The portal answers with a low, resonant, echoing whisper (slow, hollow), \"Come...\" timed as tendrils of energy unfurl, and Mario blinks and inhales sharply, then exclaims (surprised, higher, short), \"Whoa!\" as the camera tightens on his expression; throughout, small plastic clacks track his movements, the portal rings with sporadic crystalline chimes, and the underlying electronic drone settles into a lower, distant hum that suggests another world beyond."}
{"id": "vidprom_semantic_unique_gpt-5-mini-2025-08-07_60209", "video_prompt": "In a quiet medium shot, a woman in her late 20s with chestnut hair tied in a loose bun is sitting alone on a flat rock at the edge of a calm, glassy lake, wearing a light-gray sweater, dark jeans, and scuffed brown boots; soft late-afternoon light is falling across her face and the water while the surrounding trees are full green and gently rustling. She is sitting cross-legged and leaning slightly forward, shoulders relaxed, eyes fixed on her reflection, then reaches a hand down and is lightly tracing the water surface with her fingertips, sending small concentric ripples as she breathes slowly and lets out a soft sigh. Ambient audio layers are detailed and timed to motion: steady, gentle lapping of water against stone, leaves rustling in a light breeze, distant single bird calls and a low insect hum, all underscored by a sparse piano motif—slow, single notes at low volume—that swells softly as she moves her hand; when her fingertip touches the water a delicate splash and the faint rustle of her sweater are audible. In a low, steady voice she says, \"I don't know who I'll be next,\" pausing to look at the ripples and glance up toward the treeline (a short, thoughtful silence follows), then in a softer, accepting tone she adds, \"But maybe that's okay,\" as she tilts her head, exhales audibly, and allows a small, contemplative smile to form while her gaze returns to the lake."}
{"id": "vidprom_semantic_unique_gpt-5-mini-2025-08-07_60225", "video_prompt": "In a medium close-up, a skeleton is seated on a low white cloud in a bright heaven, wearing a simple white linen sash over one shoulder; its bone surfaces are smooth with faint cracks and the jaw is articulated so the expression reads relaxed. Soft golden backlight and diffuse white cloud light wash the scene while distant pearly spires and floating islands sit out of focus behind. The skeleton is lifting a polished silver fork in its right hand and a carving knife in its left, slicing a medium-rare steak on a porcelain plate balanced on its lap; the steak shows a seared brown crust and a warm pink center that gives a quiet sizzle as the knife passes. As it brings a forkful to its jaw, a dry, gentle clack of bone is audible, followed by a muted, contented chewing sound; it speaks in a low, amused voice, slow and warm, \"Well, this is unexpected,\" then pauses, glancing down at the plate and answers itself in a softer, wry tone, deliberate and mid-pitch, \"Heaven could use better menus,\" while a thin harp arpeggio and a distant choral pad swell beneath, soft wind through clouds and faint bell chimes punctuating the air; knife-on-plate clink, fork lift, and the final swallow are audible as the skeleton settles back slightly, a small puff of cloud compressing under its weight."}
{"id": "vidprom_semantic_unique_gpt-5-mini-2025-08-07_60254", "video_prompt": "A static medium close-up of a cottage window at night, pale full moonlight is washing a cool rectangle across the weathered wooden sill and slightly wavy glass; the right-hand casement is slowly swinging outward on aged iron hinges, the chipped white paint and rough grain are visible in the moon glow. As the pane is swinging open, a low creak from the hinges is audible, then a soft scrape as the old latch releases; a thin linen curtain is fluttering inward and its edge is brushing the sill with a quiet rustle. Outside, steady cricket chirps are underscoring the moment while a distant owl hoots once and a light breeze is whispering through nearby leaves so branches are sighing faintly. The widening opening is letting more moonlight spill into the dark interior, highlighting dust motes drifting in slow arcs and casting a pale band that is sliding across the floor."}
{"id": "vidprom_semantic_unique_gpt-5-mini-2025-08-07_60265", "video_prompt": "In an extreme close-up, a shallow pile of whole almonds and hazelnuts rests on a matte dark slate surface under soft overhead light that picks out subtle ridges on the almond skins and the rounded texture of hazelnut shells; a thick stream of melted dark chocolate is pouring in from above and is splashing down onto the pile, coating some nuts and sending tiny droplets outward. As the chocolate is striking the nuts, a few almonds and hazelnuts are nudged and spin slightly while a small piece of hazelnut shell is flicked aside; droplets arc and hang briefly in slow motion, catching highlights on their glossy surfaces, then fall and merge into a spreading pool around the nuts. Visual emphasis is on the contrast between the glossy chocolate and the matte nuts, with one hazelnut showing a cracked interior as chocolate runs over it. Audio begins with the deep, viscous pour of chocolate—a low glug and a soft, sticky slap on contact—immediately joined by crisp, dry clacks as nuts bump one another and a faint high-frequency tinkle as tiny droplets hit the slate; a muted kitchen ambience (distant HVAC hum and soft background murmur) sits underneath, then the sound settles into a gentle wet spread and quiet drips as chocolate spreads around the almonds and hazelnuts."}
{"id": "vidprom_semantic_unique_gpt-5-mini-2025-08-07_60278", "video_prompt": "Dong Yuhui is standing in a vast open plain under a pale sky, soft late-afternoon light falling across the scene; he is a man in his early 30s with short black hair, the breeze is blowing through his hair and ruffling the edges of a light gray jacket over a white shirt, and he is holding an open hardcover book in his left hand with the pages slightly fluttering. His face is turned toward the horizon, eyes firm and full of hope, the corners of his mouth slightly raised in a positive, energetic smile; his posture is upright, chest lifted and shoulders back, his whole body language is full of vitality and vigor, conveying clear confidence and optimism. While the wind makes a low whoosh through the grasses and the book emits a soft paper rustle, he breathes in, glances down at the open page, then looks up and speaks in a steady, warm, confident voice, \"We can do this,\" the words timed with a small, assured nod. Underneath, a gentle single-note piano chord swells as distant bird calls punctuate the air, the wind continues to whisper, and the final sound is the quiet flutter of pages as his smile holds, leaving a calm, positive, upward feeling."}
{"id": "vidprom_semantic_unique_gpt-5-mini-2025-08-07_60291", "video_prompt": "A medium close-up of a man in his late 50s with short graying hair and a neat trimmed beard, seated behind a low glass console in a futuristic presidential chamber that blends dark wood, brushed steel and frosted glass; he wears a tailored dark gray suit with a high-collar shirt and a small stylized tricolor lapel pin, his expression calm and purposeful as translucent holographic maps and data widgets float a foot above the console. He reaches out and taps a hovering map, which ripples into sharper focus while a thin blue route highlights across Eurasia; his fingers move with deliberate, practiced gestures and his eyes track the route as if reading several layers of data at once. Ambient sound is a low, steady hum of climate systems and distant city traffic through thick glass, punctuated by soft glassy chimes on each tap and a brief electronic confirmation beep when he activates a layer; a sparse, subdued string motif plays quietly under the scene, rising slightly when the map refocuses. President (voice: measured, low, paced): \"Activate strategic overlay, scale to national,\" he says, then pauses, glances briefly to the side as if checking a monitor, fingers hovering above the interface. President (voice: softer, deliberate): \"Keep civilian channels open — alert level stable,\" he adds, and presses his palm once, sending a succinct confirming tone as the hologram contracts; his face tightens for a beat, then relaxes as the room returns to the steady ambient hum."}
{"id": "vidprom_semantic_unique_gpt-5-mini-2025-08-07_60317", "video_prompt": "A medium close-up static shot of an androgynous angelic being is hovering a few feet above a smooth pale floor, wearing a flowing white robe with thin silver embroidery and long silver-white hair falling over the shoulders, face composed and gently focused. Soft cool backlight and a warm rim light are outlining the figure as large luminescent wings are slowly unfurling behind them, each feather translucent with an inner soft glow that pulses in pale gold and cool blue while tiny motes of light are drifting off the wing tips and catching on the silk texture of the robe. As they are extending both hands forward, the aura around them is shimmering in slow, concentric waves and their robe is fluttering slightly from a light upward lift; they are tilting their head and allowing a quiet, serene smile to form. A sustained, gentle choral pad is filling the air underneath the scene, with a single bell-like glissando punctuating the moment the wings open; soft rustle of fabric and whisper of feathers are synchronized to the wing motion, a faint whoosh of displaced air accompanies the hover, and subtle harmonic overtones are rising and falling with the aura's pulse. (voice: calm, warm, mezzo) \"I am here,\" they say, voice steady and low as their palms open, the words timed to the wings' full spread; a brief pause lets the motes scatter. (voice: soft, distant, reverent) Then they add, voice trailing as they incline their head and close their eyes briefly, \"Stay near the light,\" while the glow around them is gently pulsing one last time."}
{"id": "vidprom_semantic_unique_gpt-5-mini-2025-08-07_65", "video_prompt": "A young woman in her early 20s is standing in a medium full shot on a narrow stone alley lit by paper lanterns, wearing a fitted knee-length red silk qipao with pale pink peony embroidery and smooth white stockings; soft warm lantern light grazes the silk while cool evening blue fills the alleyside shadows, the stone underfoot showing a faint sheen as if damp. She has long black hair gathered into a loose chignon with a simple silver hairpin, subtle makeup, and a calm, attentive expression as she shifts her weight onto one foot and smooths the fabric at her hip, then reaches out briefly to brush a lantern's fringe with two fingers while her skirt whispers against her thigh. Ambient alley sound settles beneath—low murmur of distant conversation, a far-off bicycle bell, a small trickle of water from a nearby drain—while a single-note erhu phrase plays softly and periodically, matching her gentle movements; each step she takes produces a soft footstep on wet stone and a quiet rustle of silk. She breathes out, then speaks in a soft, warm voice, \"The lanterns look quiet tonight,\" as she glances down and then up toward the lights, a small, thoughtful smile forming; fabric rustle and a light creak from the lantern sway punctuate the moment, and the erhu holds a final quiet phrase as she tilts her head and the scene lingers on her composed, serene posture."}
{"id": "vidprom_semantic_unique_gpt-5-mini-2025-08-07_70", "video_prompt": "A medium close-up of Lucas, early 30s, clean-shaven with short dark hair slightly tousled, wearing a navy blazer over a white shirt, standing on a small stage in a dim conference hall under a soft spotlight with a faint technical schematic projected on a screen behind him; he is leaning slightly forward, eyes bright, smiling, right hand lifting in an open, inviting gesture while his left holds a handheld microphone. Lucas (eager, breathy, mid-tempo voice): \"This is it, my friends. Humanity's next step.\" As he finishes the line he steps a half pace forward and lifts both hands briefly, adding in a quick, confident tone, \"We're ready to begin,\" (confident, slightly faster) while a low audience murmur swells into a single short cheer; ambient room sound includes a soft HVAC hum, distant chair scrape, and a subtle microphone rustle when he moves the mic. The stage light casts soft shadows across his jaw and the projector's faint flicker glints on the nearest rows; his brows lift and a slight smile tightens at the corners of his mouth, the fabric of his jacket shifting as he breathes and the crowd reacts, all within a compact, energetic five-second moment."}
{"id": "vidprom_semantic_unique_gpt-5-mini-2025-08-07_74", "video_prompt": "A medium wide shot of a low, pebbled shoreline in soft late-afternoon light, a boy about eight with short brown hair and a faint smudge on his cheek is walking barefoot toward the water wearing a damp navy windbreaker and tan shorts, taking small, careful steps across wet stones while his jacket sleeves brush his arms. Gentle lap of small waves provides the ambient sound, with distant gull calls and a light breeze whispering through nearby reeds; as he draws within two meters the gravel crunches under his feet, a soft fabric rustle from his jacket is audible. He slows, eyes on the moving water, inhales audibly, and in a quiet, hesitant voice says, \"Okay... here goes,\" then leans forward and steps so his toes skim the shallow edge, producing a soft splash and a brief spray; the water makes a delicate fizzing sound against his foot and he lets out a small relieved breath while the wave withdraws and pebbles settle back into place."}
{"id": "vidprom_semantic_unique_gpt-5-mini-2025-08-07_76", "video_prompt": "A stationary overhead camera looks straight down through clear, shallow blue water as a large adult whale shark is gliding slowly from left to right across the frame, its broad head, pale gray-blue skin patterned with white spots and faint stripes, and a tall dorsal fin visible beneath a gently rippling surface; soft sunlight is casting moving caustic bands across its back and the sandy seafloor below, the shark's rough, ridged skin showing subtle barnacles and a few old scars. As it swims, the shark is opening its wide mouth slightly to filter-feed while its tail is undulating in smooth, powerful strokes, a dorsal fin brushing the surface and sending tiny concentric ripples; a small group of pilot fish is trailing close behind, then a nearby cluster of baitfish is scattering in a quick burst, darting away in sharp, synchronized motions. Ambient audio begins with a low, muffled ocean hum and a distant, slow whale-like call underlining the scene; as the shark passes a soft whoosh of displaced water and a deep, low rumble follow each tail flick, faint bubble pings are audible near its mouth, and the baitfish scattering produces brief, high-frequency splashes and quick metallic plinks. The overall mood is calm and observant, the overhead view holding steady as the whale shark continues gliding out of frame."}
{"id": "vidprom_semantic_unique_gpt-5-mini-2025-08-07_86", "video_prompt": "A medium close-up of a young man in his mid-20s, based directly on image:sketch (1).jpeg as visual reference, standing slightly angled toward camera with his shoulders relaxed; his dark, slightly long hair is moving as a steady wind is sweeping across the scene, ruffling strands and lifting the collar of his loose grey shirt. A soft blue-white glow is surrounding him like a faint halo, the glow is pulsing gently and casting a cool rim light on the edges of his face and hair while the rest of the background remains a muted charcoal sketch texture; small ink-like particles are drifting through the air and catching the light as they float. He is breathing out slowly, exhaling as a gust of wind pushes his hair back, and his expression is calm, eyes narrowing slightly as if listening. Soundscape: a clear, close wind whoosh is present and is rising and falling with each gust, soft rustling of fabric and hair on each breath, and a low, steady electrical hum is synchronizing with the glow pulses; beneath that, a faint sketching scratch like pencil on paper is barely audible to tie to the reference image. Dialogue synced to actions: He (voice: low, steady, medium pace) says, \"I can feel it,\" as he exhales and lets his chin lift; an inner voice (voice: soft, breathy, slow) replies, \"It's starting,\" as the glow brightens for one pulse; he (voice: low, steady) answers again with a softer tone, \"Then stay with it,\" as his hair settles slightly and his shoulders shift; the inner voice (voice: airy, distant) whispers, \"I am,\" timed with the final, small gust that makes the particles drift away. The overall color palette is cool greys and blue-white light, textures are drawn-paper and soft fabric, and all motion is continuous and subtle to fit within a brief five-second moment."}
{"id": "vidprom_semantic_unique_gpt-5-mini-2025-08-07_93", "video_prompt": "In a wide shot, a small round bunny with soft white fur and faint gray patches is sitting near the crest of a gently sloping hill blanketed in short green grass and scattered wildflowers—daisies, buttercups, and small bluebells—bathed in soft midday sunlight; the bunny twitches its nose and lifts slightly, then springs forward in a quick, joyful bound, ears tilting back and hind legs tucking under as it clears a patch of flowers, petals fluttering down and a few stems bending under its passage, then it lands lightly with forefeet touching first and hind feet following, sending a muted thump and a soft rustle through the grass. Ambient sound begins with a light breeze through the grass and distant birdsong, joined by a steady, close bee buzz around the blossoms; as the bunny pushes off there is a small puff of displaced air and the faint crunch of stems, and on landing the bunny lets out a short, bright chirp. Bunny (soft, high-pitched, quick): \"peep,\" synchronized with the landing and a brief head tilt; it then sniffs a daisy, nose wrinkling and whiskers brushing petals, ears flicking, while the breeze and bees continue softly in the background and a few petals settle back onto the hill."}
{"id": "vidprom_semantic_unique_gpt-5-mini-2025-08-07_60343", "video_prompt": "Wide shot of a small village under late afternoon light: clusters of low stone cottages with red tile roofs, a low church steeple, and a narrow cobbled lane between them; warm, soft light casts long, thin shadows and brings out the rough texture of stone and the matte clay tiles. Several groups of birds are flying across the pale sky, each flock is shifting shape as birds are soaring, banking, wheeling, splitting off, then rejoining—some individuals are gliding on outstretched wings while others are beating rapidly to gain altitude; their moving shadows skim across rooftops and the lane below as a group arcs together over the steeple. Soundscape: close, soft flapping of wings layered with overlapping bird calls—higher, quick chirps interspersed with occasional low caws—while a single distant church bell tolls slowly once, a light wind is rustling through poplar leaves, a loose wooden shutter creaks and a clay chimney pot clicks as the breeze passes, all synchronized so the wingbeats and calls rise as the flocks bank and briefly swell, then fall away as they split and move out of frame."}
{"id": "vidprom_semantic_unique_gpt-5-mini-2025-08-07_60350", "video_prompt": "In a medium close-up, a man in his late 20s with short dark hair and light stubble is cupping a 3-week-old black and white tuxedo kitten with blue eyes in both hands, wearing a light gray cotton shirt; soft window light is falling across their faces and a warm indoor tone is bathing the scene. He is holding the kitten gently, palms cradling its tiny body as he is stroking the soft downy fur with his thumbs; the kitten is calm and relaxed, blinking slowly, paws tucked against his palms and its tiny pink nose twitching. The man smiles and speaks in a low, warm voice, slow pace, “Hey little one…” (he leans in slightly, eyes soft, fabric rustle audible), then the kitten replies with a soft, high mew (quiet, brief) while blinking and beginning a steady, quiet purr that grows slightly louder as it settles against his hands. The man responds in a tender hush, gentle pace, “You’re so small, aren’t you?” as he exhales and brushes his thumb along its back; the kitten emits another faint mew and nuzzles his thumb, purring more audibly. Ambient audio layers are intimate and close: a muted distant street hum through the window, a faint clock tick, soft breathing from the man, light fabric rustle when he shifts, the kitten’s small mews and continuous low purr present and clear, and a subtle room reverb to convey a small, quiet indoor space."}
{"id": "vidprom_semantic_unique_gpt-5-mini-2025-08-07_60360", "video_prompt": "Style: Tim Burton Style Characters. In a dim, low-lit room a lanky, slightly elongated Santa is standing beside a tall, narrow window, peering at a smartphone held in his gloved hand; he has a narrow pale face, wide dark eyes, a scraggly white beard, and a worn red coat with thin fur trim and long sleeves that brush his knuckles. Soft moonlight is filtering through the window, casting thin shadows across his coat while a small warm desk lamp on the sill gives a muted rim light; a cinematic lens with shallow depth of field is rendering the distant streetlights outside as soft bokeh and subtle film grain. The smartphone screen is glowing with a map full of clustered red dots and tiny car icons showing a traffic jam; he is zooming and tapping the screen, brow furrowing as the map shifts. Room audio is quiet: a faint, steady traffic rumble and occasional car horn from the street below, a soft creak as he shifts his weight, and a delicate notification chime from the phone. In a low, gravelly, slow voice he mutters, \"Forty-two minutes? That's too long,\" then he is tapping to try an alternate route as a neutral, clipped phone assistant voice says, \"Delay ahead: 42 minutes, suggested detour adds ten minutes,\" synchronized with the map redrawing under his thumb; he glances up through the window at the stalled tail lights, exhales, and in a short, resigned tone replies, \"All right, take the detour,\" then tucks the phone into his coat as the distant traffic hum persists and the lamp casts a soft, low shadow across his face."}
{"id": "vidprom_semantic_unique_gpt-5-mini-2025-08-07_60390", "video_prompt": "A wide static view of a coastal landscape on a sunny day: the sun is high in the pale sky, casting soft overhead light and subtle shadows across smooth wet sand and scattered pale pebbles while small waves are lapping at the shore and thin white foam is retreating with a gentle hiss as sunlight glints on the ripple tops. As a light breeze is bending the dune grass and a few palm fronds, several seagulls are circling low and calling with short sharp cries, and a distant sailboat is drifting near the horizon with its canvas lightly billowing. The soundscape matches the motion—soft rhythmic surf washing onto sand, the thin hiss of foam receding, wind whispering through the grass, intermittent gull calls, and an occasional soft metallic clink from the sailboat rigging, all underscored by a faint, slow acoustic guitar picking a quiet two-bar motif to add calm warmth to the scene."}
{"id": "vidprom_semantic_unique_gpt-5-mini-2025-08-07_60404", "video_prompt": "A medium close-up shows Eveline, a young sorceress in her early 20s with waist-length dark brown hair braided loosely over one shoulder, wearing a moss-green cloak over a simple linen dress and scuffed leather boots, kneeling on soft emerald moss in a serene sun-dappled forest; soft overhead light filters through tall oaks, casting moving leaf shadows and the scene is textured with damp bark, tiny wildflowers, and dewy spiderwebs. Ambient forest sounds—distant birdsong, a gentle breeze rustling leaves, a nearby creek's quiet trickle—form a calm background; her soft footsteps on moss and the quiet inhale of her breath punctuate the space as she is experimenting with magic, tracing a small sigil in the air with a trembling finger while murmuring an incantation, \"Focus... steady,\" (soft, hesitant, low) and tiny golden motes are gathering at her fingertip with a faint bell-like chime and gentle electrical crackle. She is tilting her head, squinting in concentration, then is pushing her palm forward and a warm pale orb of light is lifting from her hand, spinning slowly as leaves and dust begin orbiting it; the orb is emitting a subtle harmonic drone and delicate tinkling SFX as it is stabilizing. Eveline is exhaling sharply, \"It's... it's working,\" (breathless, surprised, slightly high) and she is leaning closer, eyes widening and mouth parting in visible awe while small vines at her feet are curling toward the light. She is straightening with a small, stunned smile, whispering, \"I can do this,\" (quiet, resolute, calm) as the forest ambience swells—a soft wind gust, a brief chorus of birds—and the orb is pulsing once with a warm luminous beat that is bathing her face in soft gold, capturing the brief, clear moment of realization and possibility."}
{"id": "vidprom_semantic_unique_gpt-5-mini-2025-08-07_60416", "video_prompt": "Style: cartoon. In a static medium-wide shot, a cartoon girl of about 10 sits on a smooth gray stone ledge at the base of a narrow waterfall, legs dangling and hands resting on the rock; she wears a teal hoodie and denim shorts, brown shoulder-length hair with a small side braid, and a soft, contented expression as she looks out at the view of a tree-lined valley beyond. Soft midday light filters through green leaves, the waterfall is drawn with clear blue-green water and white froth, and a fine mist makes tiny droplets bead on her knees; a light breeze moves her braid and the hem of her hoodie. The water is continuously audible as a steady, low roar; higher, crisp splashes hit the stones and a few bright bird calls and a faint rustle of leaves sit beneath the falls. She shifts her weight, tucks her braid behind her ear, blinks, and exhales slowly as the mist lands on her face (subtle wet SFX). Girl (soft, reflective, slow): \"It's so peaceful here...\" (her voice is intimate, slightly breathy; waterfall audio ducks gently while she speaks). She smiles, presses her palms to the stone, and says with a small laugh, Girl (light, quick): \"Maybe I should come back with a sketchbook.\" After her last word the waterfall volume returns to full, droplets patter, and the scene holds with distant birds and steady water ambience."}
{"id": "vidprom_semantic_unique_gpt-5-mini-2025-08-07_60454", "video_prompt": "In a static medium-wide shot, a woman in her late 20s with wind-tousled dark hair is sitting on a smooth driftwood log at the sandy shore, wearing a light gray knit sweater and worn jeans, facing the sea as soft late-afternoon light washes the scene and low sun glances off the wet sand. Small waves are lapping and pulling at the shoreline, seafoam is pulsing around scattered pebbles, and a cool breeze is ruffling her hair and sweater; distant gull calls punctuate the air and a low foghorn is sounding far off. She is watching the horizon, fingers lightly tapping the log, then draws a slow breath and says in a quiet, steady voice, \"Calmer today, huh?\" She glances down at a phone in her lap; an off-screen voice, warm and slightly amused, replies quickly, \"Yeah — you found the quiet spot.\" She lets out a small smile, tucks hair behind her ear, answers softly, \"I needed this,\" and turns her gaze back to the sea while the rhythmic hiss of waves and gentle wind continue under the brief exchange, with occasional seabird calls and a faint distant dog bark adding texture to the seaside ambience."}
{"id": "vidprom_semantic_unique_gpt-5-mini-2025-08-07_60489", "video_prompt": "In a medium close-up, Alexander Lukashenko, a white man in his early 70s with short gray hair and a clean-shaven face, wearing a dark navy suit and a striped tie, sits at a low podium against an out-of-focus backdrop showing a red-and-green flag; soft overhead light and a subtle rim light outline his shoulders as the scene focuses on his face. He is smiling, then leans forward slightly, then tilts his head back and throws it back as he bursts into a short, throaty laugh—his shoulders shaking and eyes crinkling—while one hand comes up briefly to his mouth and then drops to the podium. The audio layer presents a close, gravelly laugh in the foreground, layered with light applause and murmured reactions from an off-screen audience, the faint rustle of papers, and a soft microphone rustle; as the laugh begins he utters one dry, measured line in a low, even tone, \"Well, that's unexpected,\" (low, dry, slow), then follows with a quick, breathy chuckle (deep, warm) that overlaps the room noise. The camera remains steady in the medium close-up, capturing the sequence of smile, voiced remark, and sudden laugh while ambient sounds — soft clapping, hushed voices, a single camera shutter click — sit behind the laugh and then settle as the laughter fades."}
{"id": "vidprom_semantic_unique_gpt-5-mini-2025-08-07_60498", "video_prompt": "A static close-up frames a Spider-Man action figure standing on a worn wooden shelf, painted red and blue suit with glossy plastic highlights and visible articulated joints; soft overhead light casts mild reflections across the molded webbing and a shallow depth of field blurs the background books. He is slowly tilting his head to the left as his white eye lenses curve upward into a clear smiling expression, then he raises his right hand in a short, friendly wave while a crisp plastic joint click and a faint squeak register with each motion. Room ambience carries a distant city hum and soft indoor air movement; as his lenses tighten into the smile a brief, gentle synth twinkle accents the moment. Spider-Man (playful, slightly high-pitched, quick) says, \"Hey—ready?\" as he lifts his hand, the voice timed to the click of the wrist joint; he pauses, leans forward a hair, and the painted smile seems to deepen with a tiny plastic rasp. Spider-Man (warm, amused, medium pace) follows, \"You always are, right?\" while he tilts his head back and his shoulder rotates with a soft mechanical click. Spider-Man (confident, brisk, low) finishes, \"Okay—on three,\" as he nods once and the light catches a small scuff on the painted chest; a short, bright chime punctuates the final grin and the ambient city hum continues under the closing moment."}
{"id": "vidprom_semantic_unique_gpt-5-mini-2025-08-07_60508", "video_prompt": "A matte black Porsche 911 is cruising along a narrow, leaf-strewn forest road in the evening, its low headlights cutting through dim blue-gray dusk while long cool shadows from pine and birch stretch across the pavement; soft, warm red taillights smear over wet leaves as the car moves, and small sprays of leaves lift and scatter behind the rear tires. The engine is emitting a low, measured purr that rises briefly as the driver downshifts while negotiating a gentle curve, tires softly crunching on damp leaves and loose gravel beneath the chassis; occasional low branches brush the hood and send a faint metallic whisper along the bodywork. Ambient sound is a quiet forest layer—wind rustling needles and leaves in the canopy, a distant bird call, and the echo of the engine bouncing between tree trunks—then the subtle click of the turn signal and the whisper of airflow as the Porsche arcs past, remaining alone in the calm evening woods."}
{"id": "vidprom_semantic_unique_gpt-5-mini-2025-08-07_60528", "video_prompt": "A quaint village nestled among rolling hills and surrounded by peaceful trees is bathed in a sky of orange and pink as the sun begins to set, soft late-afternoon light warming stone cottages and small gardens while gentle rustling of leaves and the sweet melodies of birds fill the air. After two seconds, the screen slowly zooms in, then over the next three seconds the camera slowly zooms in and pans, offering a picturesque view of the clustered homes before gracefully pointing toward a nearby pond that is shimmering in the warm sunlight; the pond's surface mirrors the colored sky and surrounding trees with glinting highlights moving across the water. As the focus narrows on the pond, three vibrant orange fishes are swimming in graceful arcs just below the surface, their scales glistening as sunlight catches them while they glide in small, coordinated loops and send soft ripples outward. Audio is layered to match the motion: continuous leaf rustle and clear bird song in the foreground, then gentle water lapping and faint, synchronized splashes as each fish breaks the surface of a ripple, keeping the overall soundscape calm and harmonious with the village remaining quiet in the background."}
{"id": "vidprom_semantic_unique_gpt-5-mini-2025-08-07_60532", "video_prompt": "A medium close-up of a young woman in her late 20s with short black hair tucked behind her ear, wearing a light gray blouse, sitting at a small wooden table at a street-side food stall as soft late-afternoon light falls across a glossy ceramic bowl of mie ayam—yellow noodles piled with shredded braised chicken, chopped scallions and a few green vegetables, dark sauce pooling at the bottom; she is lifting a tangle of noodles with wooden chopsticks, twirling them once, then slurping them into her mouth while her free hand is scooping broth and chicken with a metal spoon, her eyes closing briefly and a small smile forming. Ambient street audio is present—distant traffic hum, a motorbike idling, low vendor chatter and occasional clink of dishes—while close SFX highlight wooden chopsticks tapping porcelain, a wet noodle slurp, the soft clink of spoon on bowl and a quiet breath; a gentle instrumental with soft percussive tones plays under the ambient mix. As she sets the chopsticks down she says in a warm, measured voice, \"Looks comforting,\" pauses and glances around, then in a soft, content voice adds, \"Just what I needed,\" and reaches for another bite."}
{"id": "vidprom_semantic_unique_gpt-5-mini-2025-08-07_60538", "video_prompt": "A static medium-wide shot frames a low, moss-covered rock ledge and the cool pool below, soft afternoon light filtering through a leafy canopy and casting dappled patches across dark green water; the rock is slick with thin rivulets and small, pale leaves cluster at the edge. A small, smooth pebble is teetering on the lip, then tumbles and plunges into the cool pool below, breaking the mirror surface and sending concentric ripples outward while tiny droplets arc upward and scatter. Under the surface a brief ring of bubbles is rising and dissolving, and as the ripple reaches floating leaves they bob and shift. Audio: an immediate, sharp splash on impact, followed by spreading, gentle slaps of water against stone, a soft underwater gurgle as bubbles release, faint steady trickle from the rock face, distant birds calling and a low insect hum; after the initial splash the water calms and small droplets patter on leaves while the echo of the impact fades."}
{"id": "vidprom_semantic_unique_gpt-5-mini-2025-08-07_60546", "video_prompt": "A happy child of about eight years old with short curly hair and subtle freckles is sitting cross-legged on a small red picnic blanket in a sunlit backyard, wearing a striped t-shirt and denim shorts, holding a small soprano ukulele with a light wood grain and glossy finish; warm late-afternoon light is casting soft shadows and a few leaves are drifting in a light breeze. The child is smiling and is strumming a bright, bouncy chord progression with the right hand while the left hand is forming a simple C-to-G chord change, the ukulele strings ringing with clear plucked tones; the child is gently swaying and bobbing their head as they sing one short, cheerful line in a light, singing voice, Kid (singing, bright, slightly high): \"Sunshine on my day, play along, hey!\"—they finish the phrase with a quick, playful flourish across the strings, then laugh and look up, Kid (delighted, breathy): \"Again!\" as they tap the uke body with their thumb and launch into another upbeat strum. Audio includes close, intimate ukulele sound with crisp string attack and soft resonance, the child's voice layered slightly forward, faint backyard ambience—distant birdsong, a soft breeze through leaves, a muted lawnmower far off—and the small SFX of fingers on wood and a quick giggle; everything is synchronized so the vocal lines match the visible strums and the final tap coincides with the child's delighted exclamation."}
{"id": "vidprom_semantic_unique_gpt-5-mini-2025-08-07_60571", "video_prompt": "In a wide, low-angle shot from the highway shoulder, three super-fast sports cars are racing side-by-side along a wet night highway, a red low-slung coupe with a white racing stripe on the left is surging forward, a matte black aggressive supercar in the center is pulling slightly ahead with sharp LED accents cutting through light mist, and a metallic blue targa on the right is darting forward; glossy body panels catch and reflect neon from the big city glowing in the distance, soft sodium streetlamps pool on the slick asphalt, and heat shimmer ripples above each exhaust. As they accelerate, engine roars dominate the mix— the red coupe emits a high, razor-edge rev, the black car answers with a deep, throaty growl, and the blue car adds a fast turbo whistle—interleaved with rapid upshifts and short throttle blips; tires hiss and briefly squeal during tight lane corrections, small pebbles crack against wheel wells, and a sharp whoosh of wind whistles past side mirrors. Ahead, the distant city skyline of clustered towers and billboard neon provides a low, constant urban hum with faint car horns and a muted subway rumble beneath, while the Doppler shift bends each engine pitch as the cars edge forward and then tuck in, motion blur streaking headlights and building lights to emphasize raw speed for the five-second burst."}
{"id": "vidprom_semantic_unique_gpt-5-mini-2025-08-07_60572", "video_prompt": "Style: hyper realism 4k. A static tight close-up of a single candle sitting in an aged brass candlestick on a dark wooden surface, the creamy white wax is melting into a glossy pool that is slowly running down the stem while the blackened wick is holding a steady amber flame; 4k detail reveals the fine char texture of the wick, tiny beads of molten wax, and subtle fingerprints and patina on the brass. Soft warm light from the flame is casting gentle reflections on the candlestick and subtle, narrow shadows on the wood as a faint draft moves through the frame, making the flame flicker and lean; the wick crackles quietly and a thin metallic spark flicks at the tip as the flame thins, then a short airy whoosh is heard exactly as the flame collapses and goes out. A slender plume of blue-gray smoke is rising in a slow spiral from the still-glowing ember, the glowing tip dimming to a dull black nub while a few droplets of settling wax produce soft, muted clicks; ambient room hush underlies the moment with a low, unobtrusive room tone and a distant faint creak, and the residual sound of the extinguish — a soft breath-like hiss and the last faint pop — lingers as the scene holds on the quiet candlestick and the smoke disperses."}
{"id": "vidprom_semantic_unique_gpt-5-mini-2025-08-07_60589", "video_prompt": "A tight close-up of a brown-gray tabby cat with distinct black stripes and a faint white chin, lying on a soft cream knit blanket under soft side light from a nearby window; the cat blinks slowly, then leans in and presses its cheek and forehead into the blanket in a series of gentle, deliberate nuzzles while its whiskers brush the fabric and its ears flick slightly. Its green eyes remain half-closed and its shoulders shift forward as it rubs, then tilts its head and repeats another short, purposeful rub; the fur has a soft, slightly glossy texture and small tufts move with each motion. Ambient indoor room noise is quiet—a low, steady HVAC hum and distant muted street sound—then a low, steady purr begins as the cat first nuzzles, rising into a warm, continuous rumble synchronized with the rubbing. Each contact is accompanied by a soft fabric rustle and a faint breathy exhale; after the final nuzzle the cat releases and emits a brief, contented trill before settling, the purr continuing gently under the room ambience."}
{"id": "vidprom_semantic_unique_gpt-5-mini-2025-08-07_60601", "video_prompt": "In a medium close-up, a couple in their late 20s is sitting across a small round wooden table in a cozy cafe, the woman with shoulder-length dark hair and a cream sweater holding a white ceramic cup, the man with short brown hair, light stubble, and a navy button-down resting his forearms on the table; soft warm yellow pendant lights overhead are reflecting gently on their faces and creating small highlights on the cup and the glass vase between them, while the background is softly blurred with other patrons and warm wood tones. Low indie acoustic music is playing quietly from the cafe speakers as steady ambient murmur, occasional silverware clinks, and a distant espresso machine hiss form a continuous soundbed. She is smiling and laughing, tucks a strand of hair behind her ear, and says in a warm, laugh-tinged voice, \"I'm really glad we came out tonight,\" with a short pause and a quick smile toward him; a soft clink occurs as he sets his cup down. He is smiling back, leaning in slightly, and replies in a soft, even voice, \"Me too—it's been nice,\" pausing to meet her eyes while a chair scrapes gently in the background. She is glancing down at her cup, then looking up with a brighter smile and says in a quiet, upbeat voice, \"Let's do this again,\" as the music swells a fraction and the cafe ambience continues beneath their voices, leaving them both smiling and holding eye contact."}
{"id": "vidprom_semantic_unique_gpt-5-mini-2025-08-07_60632", "video_prompt": "A wide view of a lush forest of mixed hardwoods and fruit-bearing trees rising beside a calm lake, soft late-afternoon light filtering through green leaves and dappling mossy trunks. Low branches are heavy with red apples and yellow pears, their glossy skins catching the light as a gentle breeze is moving through the canopy—leaves are rustling and small clusters of fruit are swaying while a few water droplets bead on leaf edges. At the lake edge, reeds are bending slightly and the grey-blue surface is forming narrow ripples where the wind touches it; a dragonfly is skimming the water and a small fish is breaking the surface with a soft plop, sending a brief circular ripple outward. Ambient audio is layered and synced with action: the wind is creating a steady soft rustle through leaves, clear birdsong (quick sparrow-like chirps and a distant thrush) is punctuating the treetops, a low insect buzz is underscoring the scene, and gentle water lapping is meeting the pebbled shore; when a fruit brushes a branch a muted thud and the faint crunch of leaf litter register. The scene feels tranquil but active, natural motions continuing throughout the short clip."}
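
Each line of the prompt file above is a standalone JSON object with an `id` and a `video_prompt` field (the JSON Lines layout). As a minimal sketch of how such a file could be turned into prompts and short dropdown labels, the helper below reads it line by line. The function name `load_prompts_from_jsonl`, the `label_chars` parameter, and the truncation rule for labels are illustrative assumptions; the demo's own `load_example_prompts` may work differently.

```python
import json
from pathlib import Path


def load_prompts_from_jsonl(path, label_chars=60):
    """Read JSON Lines records and return (prompts, labels).

    Hypothetical helper for illustration only; it is not part of FastVideo.
    """
    prompts, labels = [], []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        prompt = record.get("video_prompt", "").strip()
        if not prompt:
            continue  # skip records without a usable prompt
        prompts.append(prompt)
        # Assumed labelling rule: show the start of the prompt, truncated.
        if len(prompt) <= label_chars:
            labels.append(prompt)
        else:
            labels.append(prompt[:label_chars].rstrip() + "...")
    return prompts, labels
```

The two returned lists line up index by index, so a label chosen in a dropdown can be mapped back to its full prompt by position.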

gradio_local_demo_ltx2_3/ui.py

import os
import re
import time
from copy import deepcopy

import gradio as gr

from fastvideo.configs.sample.base import SamplingParam
from fastvideo.entrypoints.video_generator import VideoGenerator

from .config import (
    DEFAULT_FPS,
    DEFAULT_HEIGHT,
    DEFAULT_NEGATIVE_PROMPT,
    DEFAULT_NUM_FRAMES,
    DEFAULT_NUM_INFERENCE_STEPS,
    DEFAULT_WIDTH,
    MODEL_ID,
    MODEL_PATH_MAPPING,
    OUTPUT_DIR,
    setup_model_environment,
)
from .examples import load_example_prompts
from .prompt_rewrite import maybe_enhance_prompt
from .rendering import (
    create_timing_display,
    create_timing_placeholder,
    render_completed_clips,
    render_error_message,
    render_input_image_status,
    render_prompt_blocked_message,
    _record_session_clip,
)
from .safety import get_prompt_safety_check

def create_gradio_interface(default_params: dict[str, SamplingParam], generators: dict[str, VideoGenerator]):
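    def _sanitize_filename_component(name: str) -> str:
        """Make free text safe to use as a filename stem."""
        # Drop characters that are reserved in Windows or POSIX filenames.
        sanitized = re.sub(r'[\\/:*?"<>|]', "", name)
        sanitized = sanitized.strip().strip(".")
        # Collapse whitespace runs so the name contains no spaces.
        sanitized = re.sub(r"\s+", "_", sanitized)
        return sanitized or "video"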

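    def generate_video(prompt, model_selection, input_image=None):
        """Run one generation and return (video_path, seed, num_frames, generation_time, e2e_latency).

        On failure, returns (None, error_message, 0, 0.0, 0.0).
        """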
        model_path = MODEL_PATH_MAPPING.get(model_selection, MODEL_ID)
        setup_model_environment(model_path)
        try:
            generator = generators[model_path]
            params = deepcopy(default_params[model_path])

            params.prompt = prompt
            # This function receives no sampling inputs from the UI, so seed,
            # guidance scale, frame count, and resolution use the model defaults.
            params.seed = default_params[model_path].seed
            params.guidance_scale = default_params[model_path].guidance_scale
            params.num_frames = int(default_params[model_path].num_frames)
            params.height = int(default_params[model_path].height)
            params.width = int(default_params[model_path].width)
            params.fps = DEFAULT_FPS
            params.num_inference_steps = DEFAULT_NUM_INFERENCE_STEPS
            params.save_video = True
            params.return_frames = False
            # Placeholder; the real path is assigned once the filename is built.
            params.output_path = ""
            params.negative_prompt = default_params[model_path].negative_prompt
            # Optional first-frame image; an empty upload becomes None.
            params.image_path = input_image or None

            OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
            safe_prompt = _sanitize_filename_component(prompt[:80])
            video_filename = f"{safe_prompt}_{int(time.time() * 1000)}.mp4"
            output_path = str(OUTPUT_DIR / video_filename)
            params.output_path = output_path
            start_time = time.perf_counter()
            result = generator.generate_video(
                prompt=prompt,
                output_path=output_path,
                fps=DEFAULT_FPS,
                seed=int(params.seed),
                save_video=True,
                return_frames=False,
                guidance_scale=float(params.guidance_scale),
                height=int(params.height),
                width=int(params.width),
                num_frames=int(params.num_frames),
                num_inference_steps=DEFAULT_NUM_INFERENCE_STEPS,
                negative_prompt=params.negative_prompt,
                image_path=params.image_path,
                ltx2_image_crf=0.0
            )
            wall_time = time.perf_counter() - start_time
            generation_time = (
                result.get("generation_time")
                if isinstance(result, dict) else None
            )
            e2e_latency = (
                result.get("e2e_latency")
                if isinstance(result, dict) else None
            )
            if generation_time is None:
                generation_time = wall_time
            if e2e_latency is None:
                e2e_latency = wall_time
            resolved_output_path = (
                result.get("output_path", output_path)
                if isinstance(result, dict) else output_path
            )
            logging_info = (
                result.get("logging_info")
                if isinstance(result, dict) else None
            )
            # Per-stage timings are collected here but are not part of the
            # return value below.
            if logging_info:
                stage_names = logging_info.get_execution_order()
                stage_execution_times = [
                    logging_info.get_stage_info(stage_name).get("execution_time", 0.0)
                    for stage_name in stage_names
                ]
            else:
                stage_names = []
                stage_execution_times = []

            return (
                resolved_output_path,
                params.seed,
                params.num_frames,
                float(generation_time),
                float(e2e_latency),
            )

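        except Exception as e:
            # Keep the 5-tuple shape of the success path; the error message
            # occupies the slot that normally holds the seed.
            print(f"An error occurred during local generation: {e}")
            return None, f"Generation failed: {e}", 0, 0.0, 0.0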

    examples, example_labels = load_example_prompts()
    curated_prompts = {
        prompt.strip() for prompt in examples if prompt.strip()
    }
    initial_example_label = None

    theme = gr.themes.Base().set(
        button_primary_background_fill="#2563eb",
        button_primary_background_fill_hover="#1d4ed8",
        button_primary_text_color="white",
        slider_color="#2563eb",
        checkbox_background_color_selected="#2563eb",
    )

    def get_default_values(model_name: str):
        model_path = MODEL_PATH_MAPPING.get(model_name)
        if model_path and model_path in default_params:
            params = default_params[model_path]
            return params.height, params.width, params.num_frames
        return DEFAULT_HEIGHT, DEFAULT_WIDTH, DEFAULT_NUM_FRAMES

    init_height, init_width, init_num_frames = get_default_values("FastLTX-2.3")

    def render_generation_badges(model_name: str) -> str:
        # The badges are static; model_name is accepted but not used.
        _ = model_name
        return """
        <div class="generation-badges">
            <span class="generation-badge">FastLTX-2.3</span>
            <span class="generation-badge">5 sec</span>
            <span class="generation-badge">1080p</span>
            <span class="generation-badge">9:16</span>
        </div>
        """

    with gr.Blocks(title="FastLTX-2.3", theme=theme) as demo:
        completed_clips_state = gr.State([])
        gr.HTML("""
        <div id="hero-shell" class="hero-shell">
            <div id="hero-brand" class="hero-brand">
                <img src="/logo.png" alt="FastVideo logo" id="hero-fastvideo-logo" class="hero-fastvideo-logo" />
                <img src="/nvidia.png" alt="NVIDIA logo" id="hero-nvidia-logo" class="hero-nvidia-logo" />
            </div>
            <div id="hero-title" class="hero-title">Real-Time 1080p Video Generation with FastLTX-2.3 on a single B200</div>
        </div>
        """, elem_id="hero-wrapper")

        with gr.Column(elem_id="app-shell", elem_classes="app-shell"):
            timing_title = gr.HTML(
                "<div class='timing-section-title'>TIMING BREAKDOWN</div>",
                visible=False,
                elem_id="timing-title",
            )
            timing_display = gr.Markdown(
                value=create_timing_placeholder(),
                visible=False,
                elem_id="timing-display",
                elem_classes="timing-display-block",
            )

            with gr.Group(elem_id="stage-card", elem_classes="stage-card"):
                with gr.Row(elem_id="stage-card-header", elem_classes="stage-card-header"):
                    gr.HTML(
                        "<div class='stage-title'>🏎️ Make Video Generation Go Blurrrrrrr 💨</div>"
                    )
                    stage_badges = gr.HTML(
                        render_generation_badges("FastLTX-2.3"),
                        elem_id="stage-badges",
                        elem_classes="stage-badges-wrap",
                    )

                result = gr.Video(
                    label="Generated Video",
                    show_label=False,
                    container=True,
                    visible=True,
                    elem_id="stage-video",
                    elem_classes="stage-video",
                )
                error_output = gr.HTML(visible=False, elem_id="error-output")

            with gr.Group(elem_id="control-card", elem_classes="control-card"):
                gr.HTML(
                    "<div class='control-field-label'>Select an example prompt below or create your own (and optionally add an input image as your first frame)</div>",
                    elem_id="example-dropdown-label",
                )
                example_dropdown = gr.Dropdown(
                    choices=example_labels,
                    show_label=False,
                    value=initial_example_label,
                    interactive=True,
                    allow_custom_value=False,
                    container=False,
                    elem_id="example-dropdown",
                )
                prompt_textbox = gr.Textbox(
                    show_label=False,
                    value="",
                    placeholder="Describe your scene...",
                    max_lines=3,
                    container=False,
                    lines=3,
                    autofocus=True,
                    elem_id="prompt-textbox",
                )

                model_selection = gr.Dropdown(
                    choices=list(MODEL_PATH_MAPPING.keys()),
                    value="FastLTX-2.3",
                    label="Model",
                    interactive=True,
                    visible=len(MODEL_PATH_MAPPING) > 1,
                    elem_id="model-selection",
                )
                input_image = gr.File(
                    show_label=False,
                    file_types=["image"],
                    type="filepath",
                    container=False,
                    elem_id="input-image",
                )
                with gr.Row(
                    elem_id="image-upload-status-row",
                    elem_classes="image-upload-status-row",
                ):
                    image_upload_status = gr.HTML(
                        value=render_input_image_status(None),
                        elem_id="image-upload-status",
                        elem_classes="image-upload-status-wrap",
                    )
                    clear_image_button = gr.Button(
                        "x",
                        variant="secondary",
                        size="sm",
                        visible=False,
                        elem_id="clear-image-button",
                    )

                with gr.Row(elem_id="control-footer-row", elem_classes="control-footer-row"):
                    with gr.Row(elem_id="control-actions-row", elem_classes="control-actions-row"):
                        gr.HTML(
                            """
                            <button
                                type="button"
                                class="upload-image-trigger"
                                aria-label="Upload image"
                                title="Upload image"
                                onclick="(() => { const input = document.querySelector('#input-image input[type=file]'); if (input) { input.value = ''; input.click(); } })()"
                            >
                                <svg
                                    viewBox="0 0 24 24"
                                    fill="none"
                                    xmlns="http://www.w3.org/2000/svg"
                                    aria-hidden="true"
                                >
                                    <path
                                        d="M5 6.5C5 5.67157 5.67157 5 6.5 5H9.2C9.59783 5 9.97936 4.84196 10.2607 4.56066L10.9393 3.88204C11.2206 3.60074 11.6022 3.4427 12 3.4427H13.8C14.1978 3.4427 14.5794 3.60074 14.8607 3.88204L15.5393 4.56066C15.8206 4.84196 16.2022 5 16.6 5H17.5C18.3284 5 19 5.67157 19 6.5V17.5C19 18.3284 18.3284 19 17.5 19H6.5C5.67157 19 5 18.3284 5 17.5V6.5Z"
                                        stroke="currentColor"
                                        stroke-width="1.7"
                                        stroke-linejoin="round"
                                    />
                                    <circle
                                        cx="12"
                                        cy="11.5"
                                        r="3"
                                        stroke="currentColor"
                                        stroke-width="1.7"
                                    />
                                    <path
                                        d="M7.25 16L9.9 13.35C10.2125 13.0375 10.7192 13.0375 11.0317 13.35L12.1 14.4183C12.4125 14.7308 12.9192 14.7308 13.2317 14.4183L14.95 12.7C15.2625 12.3875 15.7692 12.3875 16.0817 12.7L16.75 13.3683"
                                        stroke="currentColor"
                                        stroke-width="1.7"
                                        stroke-linecap="round"
                                        stroke-linejoin="round"
                                    />
                                </svg>
                            </button>
                            """,
                            elem_id="upload-image-trigger",
                        )
                        run_button = gr.Button(
                            "Create",
                            variant="primary",
                            size="lg",
                            elem_id="run-button",
                        )

                with gr.Row(visible=False):
                    height_display = gr.Number(
                        label="Height",
                        value=init_height,
                        interactive=False,
                        container=True,
                    )
                    width_display = gr.Number(
                        label="Width",
                        value=init_width,
                        interactive=False,
                        container=True,
                    )
                    num_frames_display = gr.Number(
                        label="Number of Frames",
                        value=init_num_frames,
                        interactive=False,
                        container=True,
                    )

        with gr.Row(elem_id="completed-clips-header-row", elem_classes="completed-clips-header-row"):
            with gr.Column(scale=4):
                gr.Markdown("## Gallery")
            with gr.Column(scale=1, min_width=170, elem_id="completed-clips-button-column", elem_classes="completed-clips-button-column"):
                clear_clips_button = gr.Button(
                    "Clear My Gallery",
                    variant="secondary",
                    size="sm",
                    min_width=140,
                    elem_id="clear-clips-button",
                )
        completed_clips_status = gr.Markdown(
            "Your completed clips for this browser session will appear here.",
            elem_id="completed-clips-status",
        )
        completed_clips_html = gr.HTML(
            value=render_completed_clips([]),
            elem_id="completed-clips-section",
            elem_classes="completed-clips-section",
        )

        gr.HTML("""
        <style>
        :root {
            --fv-bg: #000000;
            --fv-text: #f5f7fb;
            --fv-muted: #d3daea;
            --fv-border: rgba(68, 88, 128, 0.82);
            --fv-panel: #000000;
            --fv-panel-soft: #000000;
            --fv-chip: linear-gradient(180deg, rgba(10, 19, 38, 0.92), rgba(5, 10, 20, 0.92));
            --fv-surface: rgba(6, 11, 22, 0.94);
            --fv-surface-soft: rgba(8, 12, 24, 0.92);
            --fv-shadow: rgba(0, 0, 0, 0.28);
            --fv-overscroll-shift: 0px;
            --body-background-fill: #000000;
            --body-background-fill-subdued: #000000;
            --background-fill-primary: #000000;
            --background-fill-secondary: #000000;
            --block-background-fill: #000000;
            --block-background-fill-dark: #000000;
            --panel-background-fill: #000000;
            --panel-background-fill-dark: #000000;
            --input-background-fill: #020713;
            --input-background-fill-focus: #020713;
            --fv-hero-side-width: 220px;
            --body-text-color: var(--fv-text) !important;
            --body-text-color-subdued: var(--fv-muted) !important;
        }

        html,
        body,
        #root,
        .gradio-container,
        .main,
        .app {
            background: var(--fv-panel) !important;
            background-color: var(--fv-panel) !important;
            background-image: none !important;
        }

        html {
            background: #000000 !important;
            background-color: #000000 !important;
            background-image: none !important;
            scroll-behavior: smooth;
            overscroll-behavior-y: auto;
        }

        body {
            min-height: 100%;
            width: 100vw !important;
            overflow-x: hidden !important;
            overflow-y: auto !important;
            -webkit-overflow-scrolling: touch;
            background: #000000 !important;
            background-color: #000000 !important;
            background-image: none !important;
            position: relative;
        }

        body::before {
            content: "";
            position: fixed;
            inset: 0;
            background: #000000;
            pointer-events: none;
            z-index: -1;
        }

        .gradio-container {
            width: 100% !important;
            max-width: 100% !important;
            margin: 0 auto !important;
            padding: 14px 18px 32px !important;
            background: var(--fv-panel) !important;
            position: relative;
            isolation: isolate;
        }

        /* Shift the main sections by --fv-overscroll-shift (0px by default) with an eased transition. */
        #hero-wrapper,
        #hero-shell,
        #timing-title,
        #timing-display,
        .completed-clips-header-row,
        #completed-clips-status,
        #completed-clips-section {
            transform: translate3d(0, var(--fv-overscroll-shift), 0);
            transition: transform 240ms cubic-bezier(0.22, 1, 0.36, 1);
            will-change: transform;
        }

        .main {
            width: 100% !important;
            max-width: 100% !important;
            margin: 0 auto !important;
            background: var(--fv-panel) !important;
        }

        footer {
            display: none !important;
        }

        .gradio-container::before,
        .gradio-container::after {
            display: none !important;
        }

        .gr-block,
        .gr-form,
        .gr-box,
        .gr-group,
        .gr-panel,
        .block {
            background: transparent !important;
            box-shadow: none !important;
        }

        #root,
        #root > .app,
        #root .main,
        .gradio-container > .main,
        .gradio-container > .main > div,
        .gradio-container > div,
        .contain {
            width: 100% !important;
            max-width: 100% !important;
            background: var(--fv-panel) !important;
            background-color: var(--fv-panel) !important;
            background-image: none !important;
        }

        #hero-wrapper {
            width: 100% !important;
            background: transparent !important;
        }

        #hero-shell {
            display: grid;
            grid-template-columns: var(--fv-hero-side-width) minmax(0, 1fr) var(--fv-hero-side-width);
            align-items: center;
            gap: 18px;
            width: min(1320px, calc(100vw - 72px));
            max-width: 100%;
            min-height: 76px;
            margin: 0 auto 10px;
            padding: 12px 24px;
            border-radius: 22px;
            border: 1px solid var(--fv-border);
            background:
                radial-gradient(circle at center, rgba(24, 81, 200, 0.14), transparent 42%),
                linear-gradient(180deg, rgba(9, 14, 24, 0.94), rgba(4, 7, 14, 0.9));
            box-shadow:
                inset 0 1px 0 rgba(255, 255, 255, 0.04),
                0 16px 38px var(--fv-shadow);
        }

        #hero-shell::after {
            content: "";
            width: var(--fv-hero-side-width);
        }

        #hero-brand {
            display: flex;
            align-items: center;
            justify-content: flex-start;
            width: var(--fv-hero-side-width);
        }

        #hero-fastvideo-logo {
            height: 44px;
            width: auto;
        }

        #hero-nvidia-logo {
            height: 34px;
            width: auto;
        }

        #hero-title {
            grid-column: 2;
            color: var(--fv-text) !important;
            text-align: center;
            font-size: 1rem;
            font-weight: 850;
            line-height: 1.15;
            letter-spacing: 0.01em;
        }

        #app-shell {
            max-width: 900px;
            margin: 0 auto;
            gap: 12px;
            position: relative;
            z-index: 1;
        }

        #timing-title .timing-section-title {
            color: var(--fv-text) !important;
            text-align: center;
            font-size: 0.9rem;
            font-weight: 900;
            letter-spacing: 0.08em;
            margin-bottom: 8px;
        }

        #timing-display {
            margin-bottom: 14px !important;
        }

        #stage-card,
        #control-card {
            border-radius: 22px !important;
            border: 1px solid var(--fv-border) !important;
            box-shadow:
                inset 0 1px 0 rgba(255, 255, 255, 0.03),
                0 16px 38px var(--fv-shadow) !important;
            background: #000000 !important;
        }

        #stage-card {
            padding: 14px !important;
            margin-bottom: 0 !important;
        }

        #control-card {
            padding: 10px !important;
            margin-top: 12px !important;
            margin-bottom: 34px !important;
            position: relative !important;
            z-index: 20 !important;
            overflow: visible !important;
        }

        #stage-card > div,
        #stage-card > .gap,
        #stage-card .gap,
        #stage-card .gr-form,
        #stage-card .gr-box,
        #stage-card .gr-group,
        #stage-card .gr-panel,
        #stage-card .block,
        #stage-card .wrap,
        #control-card > div,
        #control-card > .gap,
        #control-card .gap,
        #control-card .gr-form,
        #control-card .gr-box,
        #control-card .gr-group,
        #control-card .gr-panel,
        #control-card .block,
        #control-card .wrap {
            background: transparent !important;
            box-shadow: none !important;
        }

        #stage-card-header {
            align-items: center !important;
            justify-content: space-between !important;
            gap: 12px !important;
            margin-bottom: 10px !important;
        }

        #stage-card .stage-title {
            color: var(--fv-text) !important;
            font-size: 1.02rem;
            font-weight: 900;
            line-height: 1.25;
        }

        #stage-badges {
            margin: 0 !important;
        }

        .generation-badges {
            display: flex;
            flex-wrap: wrap;
            justify-content: flex-end;
            gap: 8px;
        }

        .generation-badge {
            display: inline-flex;
            align-items: center;
            padding: 6px 10px;
            border-radius: 999px;
            border: 1px solid rgba(102, 122, 160, 0.48);
            background: var(--fv-chip);
            color: var(--fv-muted);
            font-size: 0.78rem;
            font-weight: 700;
            box-shadow: inset 0 1px 0 rgba(255, 255, 255, 0.04);
        }

        #stage-video,
        #stage-video .wrap,
        #stage-video video {
            width: 100% !important;
        }

        #stage-video {
            margin: 0 !important;
            padding: 14px !important;
            border-radius: 22px !important;
            overflow: hidden !important;
            background: #04070d !important;
            box-shadow: inset 0 0 0 1px rgba(37, 99, 235, 0.12) !important;
        }

        #stage-video .wrap {
            position: relative !important;
            border-radius: 18px !important;
            overflow: hidden !important;
            margin: 0 !important;
            padding: 0 !important;
            line-height: 0 !important;
            box-shadow: inset 0 0 0 1px rgba(37, 99, 235, 0.1) !important;
            background:
                radial-gradient(circle at center, rgba(18, 57, 140, 0.16), transparent 32%),
                #050913 !important;
        }

        #stage-video .wrap .progress-text,
        #stage-video .wrap .meta-text {
            line-height: 1.2 !important;
            height: auto !important;
        }

        #stage-video video {
            display: block !important;
            width: 100% !important;
            height: auto !important;
            max-height: none !important;
            border-radius: 18px !important;
            object-fit: contain !important;
            overflow: hidden !important;
            background:
                radial-gradient(circle at center, rgba(18, 57, 140, 0.16), transparent 32%),
                #050913 !important;
        }

        /* Hide the download controls inside the stage video component. */
        #stage-video .download-link,
        #stage-video .download-button,
        #stage-video a[download],
        #stage-video button[aria-label*="download" i],
        #stage-video [title*="download" i] {
            display: none !important;
        }

        #error-output {
            margin-top: 10px !important;
        }

        .stage-error-card {
            border-radius: 18px;
            border: 1px solid rgba(248, 113, 113, 0.35);
            background:
                linear-gradient(180deg, rgba(45, 12, 18, 0.96), rgba(24, 8, 13, 0.96));
            box-shadow:
                inset 0 1px 0 rgba(255, 255, 255, 0.03),
                0 10px 28px rgba(0, 0, 0, 0.18);
            padding: 14px 16px;
            color: #fecaca;
        }

        .stage-error-title {
            color: #fee2e2;
            font-size: 0.82rem;
            font-weight: 800;
            letter-spacing: 0.06em;
            text-transform: uppercase;
            margin-bottom: 8px;
        }

        .stage-error-copy {
            color: #fca5a5;
            font-size: 0.94rem;
            line-height: 1.55;
        }

        #control-card-title {
            margin-bottom: 8px !important;
        }

        #example-dropdown-label {
            margin-bottom: 6px !important;
            background: transparent !important;
        }

        .control-field-label {
            color: var(--fv-text) !important;
            font-size: 0.92rem;
            font-weight: 700;
            line-height: 1.3;
            padding: 0 2px;
        }

        #control-card .gr-form,
        #control-card .gr-group {
            background: transparent !important;
            box-shadow: none !important;
        }

        #example-dropdown,
        #model-selection,
        #input-image {
            margin-bottom: 10px !important;
            background: transparent !important;
        }

        #example-dropdown,
        #model-selection {
            position: relative !important;
            z-index: 21 !important;
        }

        #example-dropdown .block,
        #example-dropdown .gr-form,
        #example-dropdown .gr-box,
        #example-dropdown .gr-group,
        #example-dropdown .gr-panel {
            background: transparent !important;
            box-shadow: none !important;
        }

        #example-dropdown .wrap,
        #model-selection .wrap {
            background:
                linear-gradient(180deg, rgba(10, 16, 31, 0.94), rgba(5, 9, 18, 0.94)) !important;
            border: 1px solid rgba(62, 78, 108, 0.72) !important;
            border-radius: 14px !important;
        }

        #example-dropdown [role="listbox"],
        #example-dropdown ul,
        #example-dropdown .options {
            max-height: 220px !important;
            overflow-y: auto !important;
            overscroll-behavior: contain !important;
            background: #050913 !important;
        }

        #prompt-textbox {
            margin-bottom: 10px !important;
        }

        #prompt-textbox,
        #prompt-textbox > div,
        #prompt-textbox .block,
        #prompt-textbox .gr-form,
        #prompt-textbox .gr-box,
        #prompt-textbox .gr-group,
        #prompt-textbox .gr-panel,
        #prompt-textbox .wrap {
            background: transparent !important;
            box-shadow: none !important;
        }

        #prompt-textbox textarea {
            min-height: 72px !important;
            max-height: 72px !important;
            padding: 12px 14px !important;
            border: 1px solid rgba(62, 78, 108, 0.72) !important;
            border-radius: 14px !important;
            background:
                linear-gradient(180deg, rgba(8, 14, 28, 0.94), rgba(5, 9, 18, 0.94)) !important;
            color: var(--fv-text) !important;
            font-size: 0.92rem !important;
            font-weight: 400 !important;
            line-height: 1.45 !important;
            resize: none !important;
            overflow-y: auto !important;
            scrollbar-width: thin;
            scrollbar-color: rgba(91, 112, 154, 0.8) rgba(8, 14, 28, 0.3);
        }

        #prompt-textbox textarea::placeholder {
            color: rgba(211, 218, 234, 0.58) !important;
            font-weight: 500 !important;
        }

        #prompt-textbox textarea:focus {
            border-color: rgba(93, 124, 188, 0.86) !important;
            box-shadow: inset 0 0 0 1px rgba(46, 102, 255, 0.18) !important;
        }

        #prompt-textbox textarea::-webkit-scrollbar {
            width: 8px;
        }

        #prompt-textbox textarea::-webkit-scrollbar-track {
            background: rgba(8, 14, 28, 0.32);
            border-radius: 999px;
        }

        #prompt-textbox textarea::-webkit-scrollbar-thumb {
            background: rgba(91, 112, 154, 0.8);
            border-radius: 999px;
        }

        #control-card label {
            font-weight: 700 !important;
            color: var(--fv-text) !important;
        }

        /* Keep the image input in the DOM but visually hidden and non-interactive. */
        #input-image {
            position: absolute !important;
            width: 1px !important;
            height: 1px !important;
            min-height: 0 !important;
            margin: 0 !important;
            padding: 0 !important;
            opacity: 0 !important;
            overflow: hidden !important;
            pointer-events: none !important;
        }

        #input-image .wrap,
        #input-image label {
            margin: 0 !important;
            padding: 0 !important;
            border: 0 !important;
            min-height: 0 !important;
            background: transparent !important;
            box-shadow: none !important;
        }

        #control-footer-row {
            align-items: center !important;
            justify-content: center !important;
            gap: 12px !important;
            margin-top: 2px !important;
        }

        #control-actions-row {
            display: flex !important;
            flex-wrap: nowrap !important;
            align-items: center !important;
            justify-content: center !important;
            gap: 8px !important;
            margin: 0 auto !important;
            width: fit-content !important;
            max-width: 100% !important;
        }

        #control-actions-row > * {
            flex: 0 0 auto !important;
            display: flex !important;
            align-items: center !important;
            align-self: center !important;
            width: auto !important;
            max-width: none !important;
            margin: 0 !important;
        }

        #upload-image-trigger {
            flex: 0 0 auto !important;
            display: flex !important;
            align-items: center !important;
            align-self: center !important;
            width: auto !important;
            margin: 0 !important;
        }

        #upload-image-trigger .upload-image-trigger {
            width: 52px;
            min-width: 52px;
            height: 46px;
            padding: 0;
            display: inline-flex;
            align-items: center;
            justify-content: center;
            border-radius: 10px;
            border: 1px solid rgba(86, 105, 142, 0.6);
            background:
                linear-gradient(180deg, rgba(15, 24, 42, 0.96), rgba(10, 16, 29, 0.96));
            color: transparent !important;
            font-size: 0 !important;
            line-height: 0 !important;
            text-indent: -9999px;
            overflow: hidden;
            position: relative;
            cursor: pointer;
            transform: translateY(2px);
            transition: border-color 160ms ease, transform 160ms ease, background 160ms ease;
        }

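        /* Draw the upload icon from an inline SVG data URI; the trigger's own text is hidden by the rule above. */
        #upload-image-trigger .upload-image-trigger::before {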
            content: "";
            width: 24px;
            height: 24px;
            position: absolute;
            left: 50%;
            top: 50%;
            transform: translate(-50%, -50%);
            background-image: url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' viewBox='0 0 24 24' fill='none'%3E%3Cpath d='M4.75 5.25C4.75 4.69772 5.19772 4.25 5.75 4.25H10.1C10.3652 4.25 10.6196 4.35536 10.8071 4.54289L11.4571 5.19289C11.6446 5.38043 11.899 5.48579 12.1642 5.48579H18.25C18.8023 5.48579 19.25 5.9335 19.25 6.48579V18.25C19.25 18.8023 18.8023 19.25 18.25 19.25H5.75C5.19771 19.25 4.75 18.8023 4.75 18.25V5.25Z' stroke='%23F5F7FB' stroke-width='1.7' stroke-linejoin='round'/%3E%3Ccircle cx='9.25' cy='9.1' r='1.55' stroke='%23F5F7FB' stroke-width='1.7'/%3E%3Cpath d='M6.5 16.5L10.15 12.85C10.4625 12.5375 10.9692 12.5375 11.2817 12.85L12.15 13.7183C12.4625 14.0308 12.9692 14.0308 13.2817 13.7183L17.5 9.5' stroke='%23F5F7FB' stroke-width='1.7' stroke-linecap='round' stroke-linejoin='round'/%3E%3Cpath d='M16.75 4.5V8.25' stroke='%23F5F7FB' stroke-width='1.7' stroke-linecap='round'/%3E%3Cpath d='M14.875 6.375H18.625' stroke='%23F5F7FB' stroke-width='1.7' stroke-linecap='round'/%3E%3C/svg%3E");
            background-repeat: no-repeat;
            background-position: center;
            background-size: 24px 24px;
        }

        #upload-image-trigger .upload-image-trigger svg {
            display: none !important;
        }

        #upload-image-trigger .upload-image-trigger:hover {
            border-color: rgba(120, 144, 188, 0.78);
            transform: translateY(1px);
            background:
                linear-gradient(180deg, rgba(18, 30, 54, 0.98), rgba(11, 18, 34, 0.98));
        }

        #image-upload-status-row {
            display: flex !important;
            justify-content: center !important;
            align-items: center !important;
            width: 100% !important;
            gap: 3px !important;
            margin: 0 0 10px !important;
        }

        #image-upload-status {
            min-height: 0 !important;
            display: flex !important;
            justify-content: center !important;
            align-items: center !important;
            flex: 0 0 auto !important;
            width: auto !important;
        }

        .image-upload-status {
            display: inline-flex;
            align-items: center;
            padding: 6px 10px;
            border-radius: 999px;
            border: 1px solid rgba(74, 94, 134, 0.52);
            background: rgba(7, 13, 24, 0.92);
            color: var(--fv-muted);
            font-size: 0.82rem;
            font-weight: 700;
            line-height: 1.2;
        }

        #clear-image-button {
            --button-secondary-background-fill: rgba(88, 16, 28, 0.24) !important;
            --button-secondary-background-fill-hover: rgba(120, 22, 38, 0.32) !important;
            --button-secondary-border-color: rgba(248, 113, 113, 0.94) !important;
            --button-secondary-border-color-hover: rgba(252, 165, 165, 1) !important;
            --button-secondary-text-color: #f87171 !important;
            flex: 0 0 auto !important;
            width: auto !important;
            min-width: 0 !important;
            margin-left: -1px !important;
            background: transparent !important;
        }

        #clear-image-button > div {
            background: transparent !important;
            border-radius: 999px !important;
        }

        #clear-image-button button::before,
        #clear-image-button button::after {
            display: none !important;
        }

        #clear-image-button button {
            min-width: 28px !important;
            width: 28px !important;
            height: 28px !important;
            min-height: 28px !important;
            padding: 0 !important;
            border-radius: 999px !important;
            border: 1.5px solid rgba(248, 113, 113, 0.94) !important;
            background-color: rgba(88, 16, 28, 0.24) !important;
            background:
                linear-gradient(180deg, rgba(120, 22, 38, 0.18), rgba(72, 12, 18, 0.28)) !important;
            color: #f87171 !important;
            font-size: 0.95rem !important;
            font-weight: 900 !important;
            line-height: 1 !important;
            text-transform: none !important;
            text-shadow: 0 0 10px rgba(248, 113, 113, 0.18) !important;
            box-shadow:
                inset 0 0 0 1px rgba(248, 113, 113, 0.22),
                inset 0 1px 0 rgba(255, 255, 255, 0.08),
                0 0 0 1px rgba(127, 29, 29, 0.18),
                0 8px 20px rgba(48, 7, 12, 0.18) !important;
            backdrop-filter: blur(18px) saturate(150%);
            -webkit-backdrop-filter: blur(18px) saturate(150%);
            transition:
                border-color 160ms ease,
                background 160ms ease,
                transform 160ms ease,
                color 160ms ease,
                box-shadow 160ms ease !important;
        }

        #clear-image-button button:hover {
            border-color: rgba(252, 165, 165, 1) !important;
            background-color: rgba(120, 22, 38, 0.32) !important;
            background:
                linear-gradient(180deg, rgba(148, 29, 50, 0.24), rgba(88, 16, 28, 0.36)) !important;
            color: #fca5a5 !important;
            transform: translateY(-1px);
            box-shadow:
                inset 0 0 0 1px rgba(252, 165, 165, 0.24),
                inset 0 1px 0 rgba(255, 255, 255, 0.1),
                0 10px 24px rgba(66, 9, 18, 0.24) !important;
        }

        #clear-image-button button:active {
            transform: translateY(0);
        }

        #run-button {
            --button-primary-background-fill: rgba(10, 16, 29, 0.96) !important;
            --button-primary-background-fill-hover: rgba(11, 18, 34, 0.98) !important;
            --button-primary-border-color: rgba(86, 105, 142, 0.6) !important;
            --button-primary-border-color-hover: rgba(120, 144, 188, 0.78) !important;
            --button-primary-text-color: #f5f7fb !important;
            flex: 0 0 auto !important;
            display: flex !important;
            align-items: center !important;
            align-self: center !important;
            width: 220px !important;
            max-width: 220px !important;
            min-width: 132px !important;
            height: 46px !important;
            border-radius: 10px !important;
            background: transparent !important;
            box-shadow: none !important;
            border: 0 !important;
            overflow: visible !important;
        }

        #run-button > div {
            background: transparent !important;
            border-radius: 10px !important;
            box-shadow: none !important;
        }

        #run-button::before,
        #run-button::after,
        #run-button button::before,
        #run-button button::after {
            display: none !important;
        }

        #run-button,
        #run-button button {
            width: 100% !important;
            min-width: 220px !important;
            max-width: 220px !important;
            height: 46px !important;
            min-height: 46px !important;
            padding: 0 18px !important;
            display: inline-flex !important;
            align-items: center !important;
            justify-content: center !important;
            border-radius: 10px !important;
            border: 1px solid rgba(86, 105, 142, 0.6) !important;
            background-color: rgba(10, 16, 29, 0.96) !important;
            background:
                linear-gradient(180deg, rgba(15, 24, 42, 0.96), rgba(10, 16, 29, 0.96)) !important;
            color: #f5f7fb !important;
            font-weight: 700 !important;
            line-height: 1 !important;
            letter-spacing: 0 !important;
            cursor: pointer !important;
            transform: translateY(2px);
            box-shadow:
                inset 0 1px 0 rgba(255, 255, 255, 0.04),
                0 10px 24px rgba(0, 0, 0, 0.18) !important;
            backdrop-filter: blur(18px) saturate(155%);
            -webkit-backdrop-filter: blur(18px) saturate(155%);
            transition:
                border-color 160ms ease,
                transform 160ms ease,
                background 160ms ease,
                box-shadow 160ms ease !important;
        }

        #run-button:hover,
        #run-button button:hover {
            border-color: rgba(120, 144, 188, 0.78) !important;
            background-color: rgba(11, 18, 34, 0.98) !important;
            transform: translateY(1px);
            background:
                linear-gradient(180deg, rgba(18, 30, 54, 0.98), rgba(11, 18, 34, 0.98)) !important;
            box-shadow:
                inset 0 1px 0 rgba(255, 255, 255, 0.08),
                0 12px 28px rgba(8, 17, 38, 0.28) !important;
        }

        #run-button:active,
        #run-button button:active {
            transform: translateY(2px);
        }

        .timing-shell {
            margin: 0 0 8px !important;
        }

        .timing-card {
            background: rgba(5, 11, 18, 0.92) !important;
            border: 1px solid rgba(66, 83, 116, 0.68) !important;
            color: var(--fv-text) !important;
            padding: 14px 10px;
            border-radius: 18px;
            text-align: center;
            min-height: 80px;
            display: flex;
            flex-direction: column;
            justify-content: center;
        }

        .timing-card-highlight {
            background: rgba(5, 11, 18, 0.98) !important;
            border: 1px solid rgba(21, 108, 255, 0.9) !important;
            box-shadow: inset 0 0 0 1px rgba(21, 108, 255, 0.24) !important;
        }

        .performance-card {
            background: rgba(5, 11, 18, 0.92) !important;
            border: 1px solid rgba(66, 83, 116, 0.68) !important;
            color: var(--fv-text) !important;
            padding: 14px 10px;
            border-radius: 18px;
            text-align: center;
        }

        .completed-clips-header-row {
            max-width: 900px !important;
            margin: 8px auto 6px !important;
            align-items: center !important;
        }

        #completed-clips-button-column {
            display: flex !important;
            justify-content: flex-end !important;
            align-items: center !important;
        }

        #clear-clips-button {
            --button-secondary-background-fill: rgba(10, 16, 29, 0.96) !important;
            --button-secondary-background-fill-hover: rgba(11, 18, 34, 0.98) !important;
            --button-secondary-border-color: rgba(86, 105, 142, 0.78) !important;
            --button-secondary-border-color-hover: rgba(120, 144, 188, 0.86) !important;
            --button-secondary-text-color: #f5f7fb !important;
            width: auto !important;
            background: transparent !important;
            border: 0 !important;
            box-shadow: none !important;
        }

        #completed-clips-button-column > *,
        #clear-clips-button > div {
            background: transparent !important;
            border-radius: 12px !important;
            box-shadow: none !important;
        }

        #clear-clips-button::before,
        #clear-clips-button::after,
        #clear-clips-button button::before,
        #clear-clips-button button::after {
            display: none !important;
        }

        #clear-clips-button,
        #clear-clips-button button,
        #completed-clips-button-column button {
            width: auto !important;
            min-width: 140px !important;
            height: 36px !important;
            min-height: 36px !important;
            padding: 0 14px !important;
            display: inline-flex !important;
            align-items: center !important;
            justify-content: center !important;
            border-radius: 10px !important;
            border: 1px solid rgba(86, 105, 142, 0.6) !important;
            background-color: rgba(10, 16, 29, 0.96) !important;
            background:
                linear-gradient(180deg, rgba(15, 24, 42, 0.96), rgba(10, 16, 29, 0.96)) !important;
            color: #f5f7fb !important;
            font-weight: 700 !important;
            line-height: 1 !important;
            cursor: pointer !important;
            transform: translateY(2px);
            box-shadow:
                inset 0 1px 0 rgba(255, 255, 255, 0.04),
                0 10px 24px rgba(0, 0, 0, 0.18) !important;
            backdrop-filter: blur(18px) saturate(155%);
            -webkit-backdrop-filter: blur(18px) saturate(155%);
            transition:
                border-color 160ms ease,
                background 160ms ease,
                transform 160ms ease,
                box-shadow 160ms ease !important;
        }

        #clear-clips-button:hover,
        #clear-clips-button button:hover,
        #completed-clips-button-column button:hover {
            border-color: rgba(120, 144, 188, 0.78) !important;
            background-color: rgba(11, 18, 34, 0.98) !important;
            background:
                linear-gradient(180deg, rgba(18, 30, 54, 0.98), rgba(11, 18, 34, 0.98)) !important;
            transform: translateY(1px);
            box-shadow:
                inset 0 1px 0 rgba(255, 255, 255, 0.08),
                0 12px 28px rgba(8, 17, 38, 0.28) !important;
        }

        #clear-clips-button:active,
        #clear-clips-button button:active,
        #completed-clips-button-column button:active {
            transform: translateY(2px);
        }

        .completed-clips-header-row h2 {
            margin: 0 !important;
            color: var(--fv-text) !important;
            letter-spacing: 0.08em;
            text-transform: uppercase;
            font-size: 0.95rem !important;
        }

        #completed-clips-status,
        #completed-clips-section {
            max-width: 900px;
            margin-left: auto;
            margin-right: auto;
        }

        #completed-clips-status {
            color: var(--fv-text) !important;
        }

        #completed-clips-section {
            border-radius: 20px;
            overflow-x: auto;
            overflow-y: hidden;
            padding-bottom: 10px;
            scrollbar-width: thin;
            scrollbar-color: rgba(91, 112, 154, 0.8) rgba(8, 14, 28, 0.3);
        }

        .completed-clips-grid {
            display: flex;
            flex-wrap: nowrap;
            align-items: stretch;
            gap: 18px;
            margin-top: 4px;
            width: max-content;
            min-width: 100%;
            padding-bottom: 2px;
        }

        #completed-clips-section::-webkit-scrollbar {
            height: 10px;
        }

        #completed-clips-section::-webkit-scrollbar-track {
            background: rgba(8, 14, 28, 0.32);
            border-radius: 999px;
        }

        #completed-clips-section::-webkit-scrollbar-thumb {
            background: rgba(91, 112, 154, 0.8);
            border-radius: 999px;
        }

        .completed-clip-card {
            flex: 0 0 340px;
            width: 340px;
            background:
                radial-gradient(circle at top, rgba(37, 99, 235, 0.16), transparent 38%),
                linear-gradient(180deg, rgba(7, 16, 31, 0.98), rgba(5, 10, 20, 0.98));
            border: 1px solid rgba(96, 165, 250, 0.28);
            border-radius: 22px;
            box-shadow: 0 20px 40px rgba(0, 0, 0, 0.2);
            overflow: hidden;
            padding: 14px;
        }

        .completed-clip-video-shell {
            background: rgba(15, 23, 42, 0.82);
            border: 1px solid rgba(148, 163, 184, 0.24);
            border-radius: 18px;
            overflow: hidden;
        }

        .completed-clip-video {
            display: block;
            width: 100%;
            aspect-ratio: 16 / 9;
            object-fit: cover;
            background: #020617;
        }

        .completed-clip-body {
            padding: 14px 2px 2px;
        }

        .completed-clip-title {
            color: #f8fafc;
            font-size: 1.02rem;
            font-weight: 700;
            line-height: 1.35;
            margin-bottom: 12px;
        }

        .completed-clip-meta {
            display: flex;
            flex-wrap: wrap;
            align-items: center;
            gap: 8px;
            margin-bottom: 12px;
        }

        .completed-clip-badge {
            display: inline-flex;
            align-items: center;
            padding: 6px 10px;
            border-radius: 999px;
            border: 1px solid rgba(148, 163, 184, 0.26);
            background: rgba(15, 23, 42, 0.9);
            color: #dbeafe;
            font-size: 0.82rem;
            font-weight: 600;
        }

        .completed-clip-duration {
            background: rgba(30, 41, 59, 0.94);
            color: #e2e8f0;
        }

        .completed-clip-prompt {
            border: 1px solid rgba(148, 163, 184, 0.18);
            border-radius: 14px;
            background: rgba(8, 15, 29, 0.96);
            overflow: hidden;
        }

        .completed-clip-prompt summary {
            cursor: pointer;
            list-style: none;
            padding: 10px 14px;
            color: #f8fafc;
            font-weight: 600;
        }

        .completed-clip-prompt summary::-webkit-details-marker {
            display: none;
        }

        .completed-clip-prompt div {
            padding: 0 14px 14px;
            color: #cbd5e1;
            line-height: 1.5;
        }

        .completed-clips-empty {
            border: 1px dashed rgba(148, 163, 184, 0.28);
            border-radius: 18px;
            padding: 28px;
            text-align: center;
            background: rgba(15, 23, 42, 0.4);
        }

        .completed-clips-empty-title {
            color: #e2e8f0 !important;
            font-size: 1rem;
            font-weight: 700;
            margin-bottom: 8px;
        }

        .completed-clips-empty-copy {
            color: #94a3b8 !important;
            line-height: 1.5;
        }

        @media (max-width: 980px) {
            #hero-shell {
                width: calc(100vw - 28px);
                grid-template-columns: 1fr;
                justify-items: center;
            }

            #hero-title {
                grid-column: auto;
                font-size: 1rem;
                text-align: center;
            }

            #stage-card-header,
            #control-footer-row {
                flex-direction: column !important;
                align-items: center !important;
            }

            #control-actions-row {
                width: 100% !important;
                justify-content: center !important;
            }

            #upload-image-trigger,
            #upload-image-trigger .upload-image-trigger,
            #run-button {
                width: 100% !important;
            }

            .generation-badges {
                justify-content: flex-start;
            }

        }
        </style>
        """)
        gr.HTML("""
        <script>
        (() => {
            const marker = "data-fv-overscroll-init";
            const root = document.documentElement;
            if (root.getAttribute(marker) === "1") return;
            root.setAttribute(marker, "1");

            const focusPromptTextbox = () => {
                const textarea = document.querySelector("#prompt-textbox textarea");
                if (!textarea) return false;
                if (document.activeElement && document.activeElement !== document.body) return true;
                textarea.focus();
                textarea.setSelectionRange(0, 0);
                return true;
            };

            const focusPromptTextboxWithRetry = (attempts = 10) => {
                if (focusPromptTextbox() || attempts <= 0) return;
                window.setTimeout(() => focusPromptTextboxWithRetry(attempts - 1), 120);
            };

            let releaseTimer = null;

            const applyOverscrollShift = (value) => {
                root.style.setProperty("--fv-overscroll-shift", `${value}px`);
                if (releaseTimer) {
                    window.clearTimeout(releaseTimer);
                }
                releaseTimer = window.setTimeout(() => {
                    root.style.setProperty("--fv-overscroll-shift", "0px");
                }, 160);
            };

            window.addEventListener("wheel", (event) => {
                const scroller = document.scrollingElement || document.documentElement;
                const maxScroll = scroller.scrollHeight - window.innerHeight;
                const atTop = scroller.scrollTop <= 0 && event.deltaY < 0;
                const atBottom = scroller.scrollTop >= maxScroll - 1 && event.deltaY > 0;

                if (!atTop && !atBottom) {
                    return;
                }

                const shift = Math.max(-22, Math.min(22, -event.deltaY * 0.12));
                applyOverscrollShift(shift);
            }, { passive: true });

            window.setTimeout(() => focusPromptTextboxWithRetry(), 120);
        })();
        </script>
        """)

        def on_example_select(example_label):
            if example_label and example_label in example_labels:
                index = example_labels.index(example_label)
                return examples[index]
            return gr.update()

        example_dropdown.change(
            fn=on_example_select,
            inputs=example_dropdown,
            outputs=prompt_textbox,
        )

        def on_input_image_change(input_image):
            has_image = bool(input_image)
            return (
                render_input_image_status(input_image),
                gr.update(visible=has_image),
            )

        input_image.change(
            fn=on_input_image_change,
            inputs=input_image,
            outputs=[image_upload_status, clear_image_button],
        )

        def clear_selected_image():
            return (
                gr.update(value=None),
                render_input_image_status(None),
                gr.update(visible=False),
            )

        clear_image_button.click(
            fn=clear_selected_image,
            inputs=None,
            outputs=[input_image, image_upload_status, clear_image_button],
        )

        def on_model_selection_change(selected_model):
            height, width, num_frames = get_default_values(selected_model)
            return (
                gr.update(value=height),
                gr.update(value=width),
                gr.update(value=num_frames),
                render_generation_badges(selected_model),
            )

        model_selection.change(
            fn=on_model_selection_change,
            inputs=model_selection,
            outputs=[
                height_display,
                width_display,
                num_frames_display,
                stage_badges,
            ],
        )

        def summarize_clip_status(session_clips):
            session_clips = session_clips or []
            count = len(session_clips)
            if count == 0:
                status = "Your creations for this browser session will appear here."
            elif count == 1:
                status = "1 creation saved for this browser session."
            else:
                status = f"{count} creations saved for this browser session."
            return status

        def load_session_gallery(session_clips=None):
            session_clips = session_clips or []
            return (
                render_completed_clips(session_clips),
                summarize_clip_status(session_clips),
            )

        def clear_session_gallery():
            return (
                render_completed_clips([]),
                "Your creations for this browser session were cleared.",
                [],
            )

        def handle_generation(
            model_selection,
            prompt,
            input_image,
            session_clips=None,
        ):
            session_clips = session_clips or []
            # Guard against a None prompt (e.g. a cleared textbox) before stripping.
            normalized_prompt = (prompt or "").strip()
            if not normalized_prompt:
                message = "Prompt is empty."
                gr.Warning(message)
                return (
                    gr.update(value=None, visible=True),
                    gr.update(
                        visible=True,
                        value=render_error_message(message),
                    ),
                    gr.update(visible=False, value=create_timing_placeholder()),
                    gr.update(visible=False),
                    render_completed_clips(session_clips),
                    summarize_clip_status(session_clips),
                    session_clips,
                )

            safety_check = get_prompt_safety_check(normalized_prompt)
            if safety_check.blocked:
                message = safety_check.message or "Prompt was blocked."
                gr.Warning(message)
                return (
                    gr.update(value=None, visible=True),
                    gr.update(
                        visible=True,
                        value=render_prompt_blocked_message(
                            message,
                            safety_check.category,
                        ),
                    ),
                    gr.update(visible=False, value=create_timing_placeholder()),
                    gr.update(visible=False),
                    render_completed_clips(session_clips),
                    summarize_clip_status(session_clips),
                    session_clips,
                )

            try:
                prompt_for_generation = maybe_enhance_prompt(
                    normalized_prompt,
                    curated_prompts,
                )
            except RuntimeError as error:
                message = str(error)
                gr.Warning(message)
                return (
                    gr.update(value=None, visible=True),
                    gr.update(
                        visible=True,
                        value=render_error_message(message),
                    ),
                    gr.update(visible=False, value=create_timing_placeholder()),
                    gr.update(visible=False),
                    render_completed_clips(session_clips),
                    summarize_clip_status(session_clips),
                    session_clips,
                )

            if prompt_for_generation != normalized_prompt:
                enhanced_safety_check = get_prompt_safety_check(
                    prompt_for_generation
                )
                if enhanced_safety_check.blocked:
                    message = (
                        "Prompt enhancement produced text that was blocked by "
                        "the safety filter. Please revise the prompt and try "
                        "again."
                    )
                    gr.Warning(message)
                    return (
                        gr.update(value=None, visible=True),
                        gr.update(
                            visible=True,
                            value=render_prompt_blocked_message(
                                message,
                                enhanced_safety_check.category,
                            ),
                        ),
                        gr.update(
                            visible=False,
                            value=create_timing_placeholder(),
                        ),
                        gr.update(visible=False),
                        render_completed_clips(session_clips),
                        summarize_clip_status(session_clips),
                        session_clips,
                    )

            result_path, seed_or_error, num_frames, generation_time, e2e_latency = generate_video(
                prompt_for_generation, model_selection, input_image
            )
            timing_details = create_timing_display(
                inference_time=generation_time,
                total_time=e2e_latency,
                stage_execution_times=[],
                num_frames=num_frames,
            )
            if result_path and os.path.exists(result_path):
                session_clips = _record_session_clip(
                    session_clips,
                    output_path=result_path,
                    prompt=prompt_for_generation,
                    model_name=model_selection,
                    num_frames=num_frames,
                    generation_time=generation_time,
                )
                return (
                    gr.update(value=result_path, visible=True),
                    gr.update(visible=False),
                    gr.update(visible=True, value=timing_details),
                    gr.update(visible=True),
                    render_completed_clips(session_clips),
                    summarize_clip_status(session_clips),
                    session_clips,
                )
            else:
                return (
                    gr.update(value=None, visible=True),
                    gr.update(
                        visible=True,
                        value=render_error_message(str(seed_or_error)),
                    ),
                    gr.update(visible=False, value=create_timing_placeholder()),
                    gr.update(visible=False),
                    render_completed_clips(session_clips),
                    summarize_clip_status(session_clips),
                    session_clips,
                )

        demo.load(
            fn=load_session_gallery,
            outputs=[completed_clips_html, completed_clips_status],
        )

        clear_clips_button.click(
            fn=clear_session_gallery,
            outputs=[
                completed_clips_html,
                completed_clips_status,
                completed_clips_state,
            ],
            queue=False,
        )

        run_button.click(
            fn=handle_generation,
            inputs=[
                model_selection,
                prompt_textbox,
                input_image,
                completed_clips_state,
            ],
            outputs=[
                result,
                error_output,
                timing_display,
                timing_title,
                completed_clips_html,
                completed_clips_status,
                completed_clips_state,
            ],
            concurrency_limit=1,
            show_progress_on=result,
            queue=False,
        )

    return demo

gradio_local_demo_matrixgame2.py

import argparse
import asyncio
import os
import time

import gradio as gr
import torch
import uvicorn
from fastapi import FastAPI, Request, HTTPException
from fastapi.responses import HTMLResponse, FileResponse

from fastvideo.entrypoints.streaming_generator import StreamingVideoGenerator
from fastvideo.models.dits.matrixgame2.utils import expand_action_to_frames


VARIANT_CONFIG = {
    "Matrix-Game-2.0-Base": {
        "model_path": "FastVideo/Matrix-Game-2.0-Base-Distilled-Diffusers",
        "keyboard_dim": 4,
        "mode": "universal",
        "image_url": "https://raw.githubusercontent.com/SkyworkAI/Matrix-Game/main/Matrix-Game-2/demo_images/universal/0000.png",
    },
    "Matrix-Game-2.0-GTA": {
        "model_path": "FastVideo/Matrix-Game-2.0-GTA-Distilled-Diffusers",
        "keyboard_dim": 2,
        "mode": "gta_drive",
        "image_url": "https://raw.githubusercontent.com/SkyworkAI/Matrix-Game/main/Matrix-Game-2/demo_images/gta_drive/0000.png",
    },
    "Matrix-Game-2.0-TempleRun": {
        "model_path": "FastVideo/Matrix-Game-2.0-TempleRun-Distilled-Diffusers",
        "keyboard_dim": 7,
        "mode": "templerun",
        "image_url": "https://raw.githubusercontent.com/SkyworkAI/Matrix-Game/main/Matrix-Game-2/demo_images/temple_run/0000.png",
    },
}

MODEL_PATH_MAPPING = {
    name: config["model_path"] for name, config in VARIANT_CONFIG.items()
}


CAM_VALUE = 0.1
KEYBOARD_MAP_UNIVERSAL = {
    "W (Forward)": [1, 0, 0, 0],
    "S (Back)": [0, 1, 0, 0],
    "A (Left)": [0, 0, 1, 0],
    "D (Right)": [0, 0, 0, 1],
    "Q (Stop)": [0, 0, 0, 0],
}
KEYBOARD_MAP_GTA = {
    "W (Forward)": [1, 0],
    "S (Back)": [0, 1],
    "Q (Stop)": [0, 0],
}
KEYBOARD_MAP_TEMPLERUN = {
    "Q (Run)": [1, 0, 0, 0, 0, 0, 0],
    "W (Jump)": [0, 1, 0, 0, 0, 0, 0],
    "S (Slide)": [0, 0, 1, 0, 0, 0, 0],
    "Z (Turn Left)": [0, 0, 0, 1, 0, 0, 0],
    "C (Turn Right)": [0, 0, 0, 0, 1, 0, 0],
    "A (Left)": [0, 0, 0, 0, 0, 1, 0],
    "D (Right)": [0, 0, 0, 0, 0, 0, 1],
}


CAMERA_MAP_UNIVERSAL = {
    "U (Center)": [0, 0],
    "I (Up)": [CAM_VALUE, 0],
    "K (Down)": [-CAM_VALUE, 0],
    "J (Left)": [0, -CAM_VALUE],
    "L (Right)": [0, CAM_VALUE],
}
CAMERA_MAP_GTA = {
    "Q (Straight)": [0, 0],
    "A (Steer Left)": [0, -CAM_VALUE],
    "D (Steer Right)": [0, CAM_VALUE],
}

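def setup_model_environment(model_path: str) -> None:
    """Set the FastVideo environment variables this demo relies on.

    `model_path` is currently unused: the attention backend is always FLASH_ATTN.
    """
    # Disabled alternative that picked the backend from the model name:
    # if "fullattn" in model_path.lower():
    #     os.environ["FASTVIDEO_ATTENTION_BACKEND"] = "FLASH_ATTN"
    # else:
    #     os.environ["FASTVIDEO_ATTENTION_BACKEND"] = "VIDEO_SPARSE_ATTN"
    os.environ["FASTVIDEO_ATTENTION_BACKEND"] = "FLASH_ATTN"
    os.environ["FASTVIDEO_STAGE_LOGGING"] = "1"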

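def create_timing_display(inference_time, total_time, stage_execution_times, num_frames):
    """Render the timing breakdown as an HTML fragment; times are in seconds."""
    # Index 5 is treated as the DiT denoising stage; shorter lists show "N/A".
    dit_denoising_time = f"{stage_execution_times[5]:.2f}s" if len(stage_execution_times) > 5 else "N/A"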

    timing_html = f"""
    <div style="margin: 10px 0;">
        <h3 style="text-align: center; margin-bottom: 10px;">⏱️ Timing Breakdown</h3>
        <div style="display: grid; grid-template-columns: repeat(5, 1fr); gap: 10px; margin-bottom: 10px;">
            <div class="timing-card timing-card-highlight">
                <div style="font-size: 20px;">🚀</div>
                <div style="font-weight: bold; margin: 3px 0; font-size: 14px;">DiT Denoising</div>
                <div style="font-size: 16px; color: #ffa200; font-weight: bold;">{dit_denoising_time}</div>
            </div>
            <div class="timing-card">
                <div style="font-size: 20px;">🧠</div>
                <div style="font-weight: bold; margin: 3px 0; font-size: 14px;">E2E (w. vae/text encoder)</div>
                <div style="font-size: 16px; color: #2563eb;">{inference_time:.2f}s</div>
            </div>
            <div class="timing-card">
                <div style="font-size: 20px;">🎬</div>
                <div style="font-weight: bold; margin: 3px 0; font-size: 14px;">Video Encoding</div>
                <div style="font-size: 16px; color: #dc2626;">N/A</div>
            </div>
            <div class="timing-card">
                <div style="font-size: 20px;">🌐</div>
                <div style="font-weight: bold; margin: 3px 0; font-size: 14px;">Network Transfer</div>
                <div style="font-size: 16px; color: #059669;">N/A</div>
            </div>
            <div class="timing-card">
                <div style="font-size: 20px;">📊</div>
                <div style="font-weight: bold; margin: 3px 0; font-size: 14px;">Total Processing</div>
                <div style="font-size: 18px; color: #0277bd;">{total_time:.2f}s</div>
            </div>
        </div>"""

    if inference_time > 0:
        fps = num_frames / inference_time
        timing_html += f"""
        <div class="performance-card" style="margin-top: 15px;">
            <span style="font-weight: bold;">Generation Speed: </span>
            <span style="font-size: 18px; color: #6366f1; font-weight: bold;">{fps:.1f} frames/second</span>
        </div>"""

    return timing_html + "</div>"

def get_action_tensors(mode: str, keyboard_key: str, mouse_key: str | None):
    # Map UI labels to CUDA action tensors; unknown labels fall back to the mode's default.
    if mode == "universal":
        keyboard = torch.tensor(KEYBOARD_MAP_UNIVERSAL.get(keyboard_key, [0, 0, 0, 0])).cuda()
        mouse = torch.tensor(CAMERA_MAP_UNIVERSAL.get(mouse_key, [0, 0])).cuda()
    elif mode == "gta_drive":
        keyboard = torch.tensor(KEYBOARD_MAP_GTA.get(keyboard_key, [0, 0])).cuda()
        mouse = torch.tensor(CAMERA_MAP_GTA.get(mouse_key, [0, 0])).cuda()
    elif mode == "templerun":
        keyboard = torch.tensor(KEYBOARD_MAP_TEMPLERUN.get(keyboard_key, [1, 0, 0, 0, 0, 0, 0])).cuda()
        mouse = None
    else:
        raise ValueError(f"Unknown mode: {mode}")

    return {"keyboard": keyboard, "mouse": mouse}

def create_gradio_interface(generators: dict[str, StreamingVideoGenerator], loaded_model_name: str):
    initial_config = VARIANT_CONFIG.get(loaded_model_name, VARIANT_CONFIG["Matrix-Game-2.0-Base"])
    initial_mode = initial_config["mode"]

    if initial_mode == "universal":
        initial_kb_choices = list(KEYBOARD_MAP_UNIVERSAL.keys())
        initial_mouse_choices = list(CAMERA_MAP_UNIVERSAL.keys())
        initial_mouse_visible = True
    elif initial_mode == "gta_drive":
        initial_kb_choices = list(KEYBOARD_MAP_GTA.keys())
        initial_mouse_choices = list(CAMERA_MAP_GTA.keys())
        initial_mouse_visible = True
    else:  # templerun
        initial_kb_choices = list(KEYBOARD_MAP_TEMPLERUN.keys())
        initial_mouse_choices = []
        initial_mouse_visible = False

    theme = gr.themes.Base().set(
        button_primary_background_fill="#2563eb",
        button_primary_background_fill_hover="#1d4ed8",
        button_primary_text_color="white",
        slider_color="#2563eb",
        checkbox_background_color_selected="#2563eb",
    )

    with gr.Blocks(title="FastVideo - Matrix Game 2.0", theme=theme) as demo:
        game_state = gr.State({
            "initialized": False,
            "current_model": None,
            "block_idx": 0,
            "max_blocks": 50,
        })

        # Header
        gr.Image("assets/full.svg", show_label=False, container=False, height=80)

        gr.HTML("""
        <div style="text-align: center; margin-bottom: 10px;">
            <p style="font-size: 18px;"> Make Video Generation Go Blurrrrrrr </p>
            <p style="font-size: 18px;"> <a href="https://github.com/hao-ai-lab/FastVideo/tree/main" target="_blank">Code</a> | <a href="https://hao-ai-lab.github.io/blogs/fastvideo_post_training/" target="_blank">Blog</a> | <a href="https://hao-ai-lab.github.io/FastVideo/" target="_blank">Docs</a>  </p>
        </div>
        """)

        with gr.Accordion("🎥 What Is FastVideo?", open=False):
            gr.HTML("""
            <div style="padding: 20px; line-height: 1.6;">
                <p style="font-size: 16px; margin-bottom: 15px;">
                    FastVideo is an inference and post-training framework for diffusion models. It features an end-to-end unified pipeline for accelerating diffusion models, starting from data preprocessing to model training, finetuning, distillation, and inference. FastVideo is designed to be modular and extensible, allowing users to easily add new optimizations and techniques. Whether it is training-free optimizations or post-training optimizations, FastVideo has you covered.
                </p>
            </div>
            """)

        # Model Selection
        with gr.Row():
            model_selection = gr.Dropdown(
                choices=[loaded_model_name],
                value=loaded_model_name,
                label="Select Model",
                interactive=False
            )


        # Main Layout
        with gr.Row(equal_height=True, elem_classes="main-content-row"):
            with gr.Column(scale=1, elem_classes="advanced-options-column"):
                with gr.Group():
                    gr.HTML("<div style='margin: 0 0 15px 0; text-align: center; font-size: 16px;'>Game Controls</div>")

                    with gr.Group():
                        gr.HTML("<div style='font-size: 14px; margin-bottom: 5px; font-weight: bold;'>🎮 Keyboard Control</div>")
                        keyboard_action = gr.Radio(
                            choices=initial_kb_choices,
                            value=initial_kb_choices[0] if initial_kb_choices else None,
                            label="Movement",
                            show_label=False,
                            interactive=True
                        )

                    with gr.Group(visible=initial_mouse_visible) as mouse_group:
                        gr.HTML("<div style='font-size: 14px; margin-bottom: 5px; font-weight: bold;'>🖱️ Mouse/Camera Control</div>")
                        mouse_action = gr.Radio(
                            choices=initial_mouse_choices if initial_mouse_visible else [],
                            value=initial_mouse_choices[0] if initial_mouse_choices else None,
                            label="Camera",
                            show_label=False,
                            interactive=True
                        )

                    with gr.Row():
                        action_btn = gr.Button("Start", variant="primary")
                        stop_btn = gr.Button("Stop", variant="stop")

                    gr.HTML("<div style='margin-top: 15px;'></div>")

                    seed = gr.Slider(
                        label="Seed",
                        minimum=0,
                        maximum=1000000,
                        step=1,
                        value=1024,
                    )
                    randomize_seed = gr.Checkbox(label="Randomize seed", value=False)
                    seed_output = gr.Number(label="Used Seed")

                    block_counter = gr.Textbox(label="Progress", value="Block: 0 / 50", interactive=False, lines=1)


            # Right Column: Video Output
            with gr.Column(scale=1, elem_classes="video-column"):
                video_output = gr.Video(
                    label="Generated Video",
                    show_label=True,
                    height=466,
                    width=600,
                    container=True,
                    elem_classes="video-component",
                    autoplay=True
                )

        # Styles
        gr.HTML("""
        <style>
        .center-button {
            display: flex !important;
            justify-content: center !important;
            height: 100% !important;
            padding-top: 1.4em !important;
        }

        .gradio-container {
            max-width: 1200px !important;
            margin: 0 auto !important;
        }

        .main {
            max-width: 1200px !important;
            margin: 0 auto !important;
        }

        .gr-form, .gr-box, .gr-group {
            max-width: 1200px !important;
        }

        .gr-video {
            max-width: 500px !important;
            margin: 0 auto !important;
        }

        .main-content-row {
            display: flex !important;
            align-items: flex-start !important;
            min-height: 500px !important;
            gap: 20px !important;
        }

        .advanced-options-column,
        .video-column {
            display: flex !important;
            flex-direction: column !important;
            flex: 1 !important;
            min-height: 400px !important;
            align-items: stretch !important;
        }

        .video-column > * {
            margin-top: 0 !important;
        }

        .video-column .gr-video,
        .video-component {
            margin-top: 0 !important;
            padding-top: 0 !important;
        }

        .video-column .gr-video .gr-form {
            margin-top: 0 !important;
        }

        .advanced-options-column .gr-group,
        .video-column .gr-video {
            margin-top: 0 !important;
            vertical-align: top !important;
        }

        .advanced-options-column > *:last-child,
        .video-column > *:last-child {
            flex-grow: 0 !important;
        }

        @media (max-width: 1400px) {
            .main-content-row {
                min-height: 600px !important;
            }

            .advanced-options-column,
            .video-column {
                min-height: 600px !important;
            }
        }

        @media (max-width: 1200px) {
            .main-content-row {
                flex-direction: column !important;
                align-items: stretch !important;
            }

            .advanced-options-column,
            .video-column {
                min-height: auto !important;
                width: 100% !important;
            }
        }

        .timing-card {
            background: var(--background-fill-secondary) !important;
            border: 1px solid var(--border-color-primary) !important;
            color: var(--body-text-color) !important;
            padding: 10px;
            border-radius: 8px;
            text-align: center;
            min-height: 80px;
            display: flex;
            flex-direction: column;
            justify-content: center;
        }

        .timing-card-highlight {
            background: var(--background-fill-primary) !important;
            border: 2px solid var(--color-accent) !important;
        }

        .performance-card {
            background: var(--background-fill-secondary) !important;
            border: 1px solid var(--border-color-primary) !important;
            color: var(--body-text-color) !important;
            padding: 10px;
            border-radius: 6px;
            text-align: center;
        }

        .gr-number input[readonly] {
            background-color: var(--background-fill-secondary) !important;
            border: 1px solid var(--border-color-primary) !important;
            color: var(--body-text-color-subdued) !important;
            cursor: default !important;
            text-align: center !important;
            font-weight: 500 !important;
        }
        </style>
        """)

        # UI update based on model selection
        def on_model_change(model_name):
            config = VARIANT_CONFIG.get(model_name, VARIANT_CONFIG["Matrix-Game-2.0-Base"])
            mode = config["mode"]

            if mode == "universal":
                kb_choices = list(KEYBOARD_MAP_UNIVERSAL.keys())
                mouse_choices = list(CAMERA_MAP_UNIVERSAL.keys())
                mouse_visible = True
            elif mode == "gta_drive":
                kb_choices = list(KEYBOARD_MAP_GTA.keys())
                mouse_choices = list(CAMERA_MAP_GTA.keys())
                mouse_visible = True
            else:  # templerun
                kb_choices = list(KEYBOARD_MAP_TEMPLERUN.keys())
                mouse_choices = []
                mouse_visible = False

            return (
                gr.update(choices=kb_choices, value=kb_choices[0] if kb_choices else None),
                gr.update(choices=mouse_choices, value=mouse_choices[0] if mouse_choices else None, visible=mouse_visible),
                gr.update(visible=mouse_visible),
            )

        model_selection.change(
            fn=on_model_change,
            inputs=model_selection,
            outputs=[keyboard_action, mouse_action, mouse_group]
        )

        def start_game(model_name, seed_val, randomize, state):
            if randomize:
                seed_val = torch.randint(0, 1000000, (1,)).item()

            config = VARIANT_CONFIG.get(model_name)
            if not config:
                # Unknown model: leave the UI unchanged (one value per output component)
                return state, seed_val, "Block: 0 / 50", None, gr.update(), gr.update()

            generator = generators.get(config["model_path"])
            if not generator:
                # Model not loaded in this process: leave the UI unchanged
                return state, seed_val, "Block: 0 / 50", None, gr.update(), gr.update()

            # If already initialized, clean up first
            if state.get("initialized"):
                try:
                    # Clear accumulated frames without saving
                    generator.accumulated_frames = []
                    generator.executor.execute_streaming_clear()
                except Exception as e:
                    print(f"Warning: cleanup error: {e}")

            # Streaming parameters
            num_latent_frames_per_block = 3
            max_blocks = 50
            total_latent_frames = num_latent_frames_per_block * max_blocks
            num_frames = (total_latent_frames - 1) * 4 + 1

            actions = {
                "keyboard": torch.zeros((num_frames, config["keyboard_dim"])),
                "mouse": torch.zeros((num_frames, 2))
            }
            grid_sizes = torch.tensor([150, 44, 80])

            output_dir = os.path.abspath("outputs/matrixgame2")
            os.makedirs(output_dir, exist_ok=True)
            video_path = os.path.join(output_dir, f"video_{int(time.time())}.mp4")

            generator.reset(
                prompt="",
                image_path=config["image_url"],
                mouse_cond=actions["mouse"].unsqueeze(0),
                keyboard_cond=actions["keyboard"].unsqueeze(0),
                grid_sizes=grid_sizes,
                num_frames=num_frames,
                height=352,
                width=640,
                num_inference_steps=50,
                output_path=video_path,
            )

            new_state = {
                "initialized": True,
                "current_model": model_name,
                "block_idx": 0,
                "max_blocks": max_blocks,
                "video_path": video_path,
                "frames_per_block": num_latent_frames_per_block * 4,
                "mode": config["mode"],
                "seed": seed_val,
            }

            return new_state, seed_val, "Block: 0 / 50", None, gr.update(value="Step"), gr.update(interactive=True)

        async def step_game(keyboard_key, mouse_key, model_name, state):
            if not state.get("initialized"):
                return state, state.get("seed", 0), "Block: 0 / 50", None, gr.update(), gr.update()

            # total_start_time = time.time()
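            config = VARIANT_CONFIG.get(model_name)
            generator = generators.get(config["model_path"]) if config else None
            if generator is None:
                # The selected model is not loaded in this process: leave the UI unchanged
                block_str = f"Block: {state.get('block_idx', 0)} / {state.get('max_blocks', 50)}"
                return state, state.get("seed", 0), block_str, None, gr.update(), gr.update()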
            mode = state["mode"]
            frames_per_block = state["frames_per_block"]

            # Parse inputs to tensors
            action = get_action_tensors(mode, keyboard_key, mouse_key)
            keyboard_cond, mouse_cond = expand_action_to_frames(action, frames_per_block)

            # run step async
            # inference_start_time = time.time()
            frames, block_future = await generator.step_async(keyboard_cond, mouse_cond)
            # inference_time = time.time() - inference_start_time

            # wait for block file to be written
            block_path = await asyncio.to_thread(block_future.result) if block_future else None
            state["block_idx"] = generator.block_idx
            block_str = f"Block: {state['block_idx']} / {state['max_blocks']}"

            # total_time = time.time() - total_start_time

            # Timing breakdown
            # timing_html = create_timing_display(inference_time, total_time, [], frames_per_block)

            return state, state.get("seed", 0), block_str, block_path, gr.update(), gr.update()

        def stop_game(model_name, state):
            if not state.get("initialized"):
                return {"initialized": False}, 0, "Block: 0 / 50", None, gr.update(value="Start"), gr.update(interactive=False)

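            config = VARIANT_CONFIG.get(model_name)
            generator = generators.get(config["model_path"]) if config else None

            final_path = state.get("video_path")
            if generator is not None:
                generator.finalize(final_path)
            else:
                # The selected model is not loaded, so there is no video to finalize
                final_path = None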

            return {"initialized": False}, state.get("seed", 0), "Block: 0 / 50", final_path, gr.update(value="Start"), gr.update(interactive=False)

        async def handle_action(keyboard_key, mouse_key, model_name, seed_val, randomize, state):
            if not state.get("initialized"):
                return start_game(model_name, seed_val, randomize, state)
            else:
                return await step_game(keyboard_key, mouse_key, model_name, state)

        action_btn.click(
            fn=handle_action,
            inputs=[keyboard_action, mouse_action, model_selection, seed, randomize_seed, game_state],
            outputs=[game_state, seed_output, block_counter, video_output, action_btn, stop_btn]
        )

        stop_btn.click(
            fn=stop_game,
            inputs=[model_selection, game_state],
            outputs=[game_state, seed_output, block_counter, video_output, action_btn, stop_btn]
        )

        gr.HTML("""
        <div style="text-align: center; margin-top: 10px; margin-bottom: 15px;">
            <p style="font-size: 16px; margin: 0;">Note: this demo is meant to showcase Matrix Game's quality. Generation speed may drop when the server is handling a large number of requests.</p>
        </div>
        """)

    return demo


def main():
    parser = argparse.ArgumentParser(description="Matrix Game Gradio Demo")
    parser.add_argument("--model", type=str, default="Matrix-Game-2.0-Base",
                        choices=list(VARIANT_CONFIG.keys()),
                        help="Model variant to load")
    parser.add_argument("--host", type=str, default="0.0.0.0")
    parser.add_argument("--port", type=int, default=7860)
    args = parser.parse_args()

    # Load the selected model
    config = VARIANT_CONFIG[args.model]
    model_path = config["model_path"]

    print(f"Loading model: {model_path}")
    setup_model_environment(model_path)
    generator = StreamingVideoGenerator.from_pretrained(
        model_path,
        num_gpus=1,
        use_fsdp_inference=True,
        dit_cpu_offload=True,
        vae_cpu_offload=False,
        text_encoder_cpu_offload=True,
        pin_cpu_memory=True,
    )

    generators = {model_path: generator}

    demo = create_gradio_interface(generators, args.model)

    print(f"Starting Gradio at http://{args.host}:{args.port}")

    # FastAPI Wrapper
    app = FastAPI()

    @app.get("/logo.png")
    def get_logo():
        return FileResponse(
            "assets/full.svg",
            media_type="image/svg+xml",
            headers={
                "Cache-Control": "public, max-age=3600",
                "Access-Control-Allow-Origin": "*"
            }
        )

    @app.get("/favicon.ico")
    def get_favicon():
        favicon_path = "assets/icon-simple.svg"

        if os.path.exists(favicon_path):
            return FileResponse(
                favicon_path,
                media_type="image/svg+xml",
                headers={
                    "Cache-Control": "public, max-age=3600",
                    "Access-Control-Allow-Origin": "*"
                }
            )
        else:
            raise HTTPException(status_code=404, detail="Favicon not found")

    @app.get("/", response_class=HTMLResponse)
    def index(request: Request):
        base_url = str(request.base_url).rstrip('/')
        return f"""
        <!DOCTYPE html>
        <html lang="en">
        <head>
            <meta charset="UTF-8" />
            <meta name="viewport" content="width=device-width, initial-scale=1.0" />

            <title>FastVideo - Matrix Game 2.0</title>
            <meta name="title" content="MatrixGame2.0">
            <meta name="description" content="Make video generation go blurrrrrrr">
            <meta name="keywords" content="FastVideo, video generation, AI, machine learning, Matrix Game 2.0">

            <meta property="og:type" content="website">
            <meta property="og:url" content="{base_url}/">
            <meta property="og:title" content="FastVideo - Matrix Game 2.0">
            <meta property="og:description" content="Make video generation go blurrrrrrr">
            <meta property="og:image" content="{base_url}/logo.png">
            <meta property="og:image:width" content="1200">
            <meta property="og:image:height" content="630">
            <meta property="og:site_name" content="MatrixGame2.0">

            <meta property="twitter:card" content="summary_large_image">
            <meta property="twitter:url" content="{base_url}/">
            <meta property="twitter:title" content="MatrixGame2.0">
            <meta property="twitter:description" content="Make video generation go blurrrrrrr">
            <meta property="twitter:image" content="{base_url}/logo.png">
            <link rel="icon" type="image/svg+xml" sizes="32x32" href="/favicon.ico">
            <link rel="icon" type="image/svg+xml" sizes="16x16" href="/favicon.ico">
            <link rel="apple-touch-icon" href="/favicon.ico">
            <style>
                body, html {{
                    margin: 0;
                    padding: 0;
                    height: 100%;
                    overflow: hidden;
                }}
                iframe {{
                    width: 100%;
                    height: 100vh;
                    border: none;
                }}
            </style>
        </head>
        <body>
            <iframe src="/gradio" width="100%" height="100%" style="border: none;"></iframe>
        </body>
        </html>
        """

    app = gr.mount_gradio_app(
        app,
        demo,
        path="/gradio",
        allowed_paths=[os.path.abspath("outputs"), os.path.abspath("fastvideo-logos")]
    )

    uvicorn.run(app, host=args.host, port=args.port)


if __name__ == "__main__":
    main()
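
The streaming parameters in `start_game` fix the length of a session up front. The following is a minimal sketch of that arithmetic, not part of the demo itself. The helper names `session_frame_count` and `frames_per_block` are hypothetical, and the temporal factor of 4 is an assumption read off the expression `(total_latent_frames - 1) * 4 + 1` in the listing above.

```python
def session_frame_count(latent_frames_per_block: int,
                        max_blocks: int,
                        temporal_factor: int = 4) -> int:
    """Total output frames for a session, mirroring the formula in start_game."""
    total_latent_frames = latent_frames_per_block * max_blocks
    # The first latent frame maps to one output frame; every later latent
    # frame maps to `temporal_factor` output frames (assumed from the listing).
    return (total_latent_frames - 1) * temporal_factor + 1


def frames_per_block(latent_frames_per_block: int,
                     temporal_factor: int = 4) -> int:
    """Action frames supplied per step, as stored in the session state."""
    return latent_frames_per_block * temporal_factor


if __name__ == "__main__":
    print(session_frame_count(3, 50))  # 597
    print(frames_per_block(3))         # 12
```

With the demo's values (3 latent frames per block, 50 blocks), this gives 597 frames for the whole session and 12 action frames per step.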

prompts_final.txt

A dynamic shot of a sleek black motorcycle accelerating down an empty highway at sunset. The bike's engine roars as it gains speed, smoke trailing from the tires. The rider, wearing a black leather jacket and helmet, leans forward with determination, gripping the handlebars tightly. The camera follows the motorcycle from a distance, capturing the dust kicked up behind it, then zooms in to show the intense focus on the rider's face. The background showcases the endless road stretching into the horizon with vibrant orange and pink hues of the setting sun. Medium shot transitioning to close-up.
A Jedi Master Yoda, recognizable by his green skin, large ears, and wise wrinkles, is performing on a small stage, strumming a guitar with great concentration. Yoda wears a casual robe and sits on a stool, his eyes closed as he plays, fully immersed in the music. The stage is dimly lit with spotlights highlighting Yoda, creating a mystical atmosphere. The background shows a live audience watching intently. Medium close-up shot focusing on Yoda's expressive face and hands moving gracefully over the guitar strings.
A cute, fluffy panda bear is preparing a meal in a cozy, modern kitchen. The panda is standing at a wooden countertop, wearing a white chef’s hat and apron. It skillfully stirs a pot on the stove with one hand while holding a spatula in the other. The kitchen is well-lit, with appliances and cabinets in pastel colors, creating a warm and inviting atmosphere. The panda moves gracefully, with a focused and determined expression, as steam rises from the pot. Medium shot focusing on the panda’s actions at the stove.
In a futuristic Tokyo rooftop during a heavy rainstorm, a robotic DJ stands behind a turntable, spinning vinyl records in a cyberpunk night setting. The robot has metallic, sleek body parts with glowing blue LED lights, and it moves gracefully with the beat. Raindrops create a shimmering effect as they hit the ground and the DJ. The surrounding environment features neon signs, towering skyscrapers, and a dark, misty atmosphere. The camera starts with a wide shot of the city skyline before zooming in on the DJ performing. Sci-fi, fantasy.
A realistic animated scene featuring a polar bear playing a guitar. The polar bear is standing upright, wearing a cozy fur vest and fingerless gloves. It holds the guitar with both hands, strumming the strings with one hand while plucking them with the other, showcasing natural, fluid motions. The polar bear's expressive face shows concentration and joy as it plays. The background is a snowy Arctic landscape with icebergs and a clear blue sky. The scene captures the bear from a mid-shot angle, focusing on its interaction with the guitar.
The scene opens to a breathtaking view of a tranquil ocean horizon at dusk, displaying a vibrant tapestry of oranges, pinks, and purples as the sun sets. In the foreground, tall, swaying palm trees frame the scene, their silhouettes stark against the colorful sky. The ocean itself shimmers with reflections of the sunset, creating a peaceful, almost ethereal atmosphere. A small boat can be seen in the distance, centered on the horizon, adding a sense of scale and solitude to the scene. The waves gently lap the shore, creating faint patterns on the sandy beach, which stretches across the foreground. Above, the sky is dotted with scattered clouds that catch the last light of the day, enhancing the drama and beauty of the scene. The overall mood is serene and contemplative, capturing a perfect moment of nature’s grandeur.
A large, modern semi-truck accelerating down an empty highway, gaining speed with each second. The truck's powerful engine roars as it moves forward, smoke billowing from the tires. The camera starts from a wide shot, capturing the truck in the distance, then smoothly zooms in to follow the vehicle as it speeds up. The truck's headlights illuminate the road ahead, casting a bright glow. The truck driver can be seen through the windshield, focused and determined. The background shows the vast openness of the highway stretching into the horizon under a clear blue sky. Medium to close-up shots of the truck as it accelerates.
Soft blue light pulses from the blade’s rune-etched hilt, illuminating nearby moss-covered roots and ferns. The surrounding trees are tall and gnarled, their branches curling like claws overhead. Fog swirls gently at ground level, parting slightly as a figure in a cloak approaches from the distance. Medium shot slowly zooming toward the sword, emphasizing its mystical aura.
The video opens with a tranquil scene in the heart of a dense forest, emphasizing two large, textured tree trunks in the foreground framing the view. Sunlight filters through the canopy above, casting intricate patterns of light and shadow on the trees and the ground. Between the tree trunks, a clear view of a calm, muddy river unfolds, its surface shimmering under the gentle sunlight. The riverbank is decorated with a variety of small bushes and vibrant foliage, subtly transitioning into the deep greens of tall, leafy plants. In the background, the dense forest looms, filled with dark, towering trees, their branches intertwining to form an intricate canopy. The scene is bathed in the soft glow of the sun, creating a serene and picturesque setting. Occasional sunbeams pierce through the foliage, adding a magical aura to the landscape. The vibrant reds and oranges of the smaller plants add contrast, bringing warmth to the earthy tones of the scenery. Overall, this harmonious blend of natural elements creates a peaceful and idyllic forest setting.
A lone figure stands on a large, moss-covered rock, surrounded by the soft rush of a nearby stream. The figure is wearing white sneakers and shorts, with a plaid shirt that hangs loosely in the breeze. The lighting creates dramatic shadows, enhancing the textures of the rock and the subtle movement of the water below. In the background, a waterfall cascades into the stream, completing this tranquil and serene nature scene.
In an industrial setting, a person leans casually against a railing, exuding a sense of confidence and composure. They are wearing a striking outfit, consisting of a vibrant, patterned jacket over a simple white crop top, creating a bold contrast. The atmosphere is infused with warm, ambient lighting that casts soft shadows on the concrete walls and metallic surfaces. Intricate wiring and pipes form an intricate backdrop, enhancing the urban aesthetic. Their relaxed posture and direct, engaging gaze suggest a sense of ease in this industrial environment. This scene encapsulates a blend of modern fashion and gritty, urban architecture, creating a visually compelling narrative.