TL;DR: Our new real-time inference stack in FastVideo enables Dreamverse, a prototype for a new interface where users can vibe direct their own “multiverse” of videos.
[TODO: Insert demo video]
AI video generation is already good enough to make a convincing clip. But real creative work is not about getting a clip in one shot. It’s about iteration. An idea appears and you test it: keep the subject, change the camera angle, continue the scene, and try again. The problem is that ideas move faster than generations. If every attempt takes minutes, the creative loop breaks; your imagination moves on before the video does.
We think there is a better interface for AI video generation, which is why we created Dreamverse, an interface that enables a new workflow called vibe directing.
Vibe directing is to video what vibe coding is to software. Instead of rewriting giant prompts from scratch, you talk to the system in natural language and steer the video through fast revisions: keep the subject, change the background, slow the camera, and more. Rather than jamming everything into a single prompt, you iterate with a series of simple ones.
This kind of workflow is only possible when video generation runs in real time. Current video generation models like Sora take 1-2 minutes to generate a 5-second 1080p clip; our inference stack in FastVideo can do it in ~4.55 seconds. In other words, we can generate a clip faster than you can watch it. That completely changes the feel of video generation: it stops being a passive experience and starts feeling like directing your own scenes. It also lets us build a longer 30-second scene that unfolds as a chain of these 5-second clips, while keeping a chat window open so you can keep directing in real time.
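To make the chaining concrete, here is a minimal sketch of the directing loop, assuming a hypothetical `generate_clip` call standing in for the real model invocation (in Dreamverse each call would be a ~4.55-second FastVideo inference conditioned on the previous segment; the actual API may differ):

```python
from dataclasses import dataclass

CLIP_SECONDS = 5    # length of each generated segment
SCENE_SECONDS = 30  # target scene length

@dataclass
class Clip:
    prompt: str
    start: int  # offset into the scene, in seconds

def generate_clip(prompt: str, start: int) -> Clip:
    # Placeholder for the real model call; in Dreamverse this would be
    # a fast inference conditioned on the previous clip's final frames.
    return Clip(prompt=prompt, start=start)

def direct_scene(instructions):
    """Build a 30-second scene as a chain of 5-second clips.

    Each chat instruction steers the next segment; if the user stays
    silent, the previous prompt carries forward so the scene continues.
    """
    clips, prompt = [], "establishing shot"
    for i in range(SCENE_SECONDS // CLIP_SECONDS):
        if i < len(instructions) and instructions[i]:
            prompt = instructions[i]  # live direction from the chat window
        clips.append(generate_clip(prompt, start=i * CLIP_SECONDS))
    return clips

# Empty strings model moments where the user lets the scene run:
# those segments reuse the most recent instruction.
scene = direct_scene(["a fox in a snowy forest", "", "slow the camera", ""])
```

Because each segment generates faster than it plays, the loop stays ahead of playback, which is what makes the chat-driven revision feel interactive rather than batch-like.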
This matters because serious video creation is almost never right on the first try. A shot may look off. Motion may break halfway through. Characters may drift between frames. Creators may also have multiple versions of a scene and want to play them out to see which is better. In practice, creators are constantly making small adjustments and trying again. When revisions are slow, exploration stalls; when the next result comes back almost immediately, you can try many ideas rather than settling for one. Better creative work comes from a faster feedback loop, not just a better model.
We think this is where video generation is going: a way to direct the video as it unfolds. The best systems will not just generate impressive clips. They will let people explore ideas at the speed of their imagination.
That is what vibe directing is all about. Step into the Dreamverse today with our demo.
The Team
Core contributors: Will Lin*, Matthew Noto*, Junda Su*, Yechen Xu*, Peiyuan Zhang* (* equal contribution)
Contributors: Shao Duan, Minshen Zhang, Loay Rashid, Kevin Lin
UI: Tina Mai
Tech leads: Will Lin, Hao Zhang
Advisors: Hao Zhang (corresponding), Danyang Zhuo, Eric Xing, Zhengzhong Liu
Learn More
Note: Dreamverse is not yet pushed to the public branch of FastVideo as we are still cleaning up the code.