Try Bernini AI Now
Enter a text prompt or upload an image to generate AI video in your browser. Switch between Bernini AI generation, editing, and other engines from the same interface.
Made with Bernini AI
Browse video clips, edits, and stills produced with Bernini AI and the other engines on this platform. See the range of styles and tasks before you start your own.
What Is Bernini AI?
Bernini AI runs Bernini, ByteDance's open framework for video generation and editing, released under the Apache 2.0 license (paper: "Bernini: Latent Semantic Planning for Video Diffusion"). Rather than a single-purpose generator, Bernini handles a full range of tasks in one model — text-to-video, reference-to-video, prompt-based video editing, reference-guided editing, and content insertion, plus image generation and editing. It is built from an MLLM-based semantic planner and a DiT-based renderer, a design that lets the same system both reason about a scene and render it at the pixel level.
What makes Bernini AI distinctive is its division of labor. The MLLM planner, built on Qwen2.5-VL, predicts what the result should look like as a semantic representation and uses chain-of-thought reasoning for complex instructions; the DiT renderer, built on Wan2.2, turns that plan into frames and — for editing tasks — draws on the source video's own features to keep fine detail intact. A Segment-Aware 3D positional encoding (SA-3D RoPE) helps the model keep multiple inputs, such as a source clip and several reference images, cleanly separated. In ByteDance's own benchmarks, Bernini reaches the first tier of leading closed-source models on video editing and leads on subject consistency, though the team notes that raw text-to-video visual quality still trails the strongest closed systems.
Running Bernini yourself takes Hopper-class GPUs such as the H100 and a multi-GPU setup — out of reach for most creators. Bernini AI removes that barrier by bringing the workflow online: generate video from a text prompt, edit an existing clip by describing the change, or drive a new clip from up to five reference images, all in the browser. Alongside Bernini, the platform offers other video and image engines so you can match the right tool to each task — but Bernini AI keeps generation and editing in one place, with no GPU to manage and nothing to install.
Other AI Engines Available Online
Bernini AI leads the workspace for video generation and editing. These additional engines cover formats and tasks beyond Bernini — extra video models, high-resolution image generation, and post-production editing.
Seedance
Video: ByteDance's commercial video engine, available here for text-to-video and image-to-video with synchronized audio. A strong alternative when you want a polished, ready-to-publish clip from a single prompt.
Kling
Video: Kuaishou's Kling generates multi-shot video across standard and pro modes, handling scene transitions in one prompt. It also powers Motion Control, transferring full-body motion from a reference clip onto a character image.
Veo
Video: Google DeepMind's Veo produces short, cinema-grade clips with built-in audio and strong environmental realism. It supports first-and-last-frame control for precise scene bookending.
GPT Image
Image: OpenAI's image model, tuned for accurate text inside the image. The pick when your prompt includes readable labels, logos, or signage that must stay legible in the output.
Flux Pro
Image: Black Forest Labs' image engine, built for speed and throughput across multiple aspect ratios and suited to product shots, social content, and fast iteration.
Nano Banana
Image: Google's character-consistency image engine. It accepts multiple reference images to hold a face, outfit, or brand mark steady across a whole series.
Seedream
Image: ByteDance's native 4K image engine, producing ultra-high-resolution stills across wide aspect ratios with step-by-step visual reasoning for coherent, detailed scenes.
Runway Gen-4
Video: Runway Gen-4 Aleph handles video-to-video editing. Supply footage and a prompt to restyle, recolor, or alter objects while keeping the original motion. It is built for post-production.
Generation and Editing in One Model
Bernini AI unifies text-to-video, video editing, and reference-driven generation in a single model. Dedicated image engines round out the workspace for design, typography, and high-resolution stills.
AI Video Generator
Bernini AI generates video from a text prompt and edits existing footage from the same workspace — restyle clips, swap or insert objects, and change weather or style. Reference-to-video drives fresh clips from up to five images, and Kling and Veo are on hand for multi-shot sequences and cinema-grade output.
AI Image Generator
The workspace offers high-resolution image generation and editing alongside Bernini AI video. Use GPT Image for accurate in-image text, Seedream for native 4K across wide aspect ratios, Flux for fast iteration, and Nano Banana for consistent characters across a series. Text-to-image and image-to-image sit side by side.
Why Use Bernini AI
Bernini AI brings ByteDance's video generation and editing model online — one place to create, edit, and reference-drive video without a multi-GPU rig.
Generation and Editing in One Model
Bernini, the model behind Bernini AI, is a unified framework that handles text-to-video, reference-to-video, and prompt-based video editing in a single model, so there are no separate tools for creating versus editing. The same Bernini AI workspace takes you from a blank prompt to a finished, edited clip.
Multimodal Reference Control
Bernini AI reasons over text, source images, and source video together. Reference-to-video accepts up to five reference images to anchor a subject, object, or style, while reference-guided editing can swap garments, replace objects, or change materials, weather, and overall look across an existing clip.
Semantic-Planning Architecture
Bernini AI pairs an MLLM-based semantic planner with a DiT-based renderer. The planner, built on Qwen2.5-VL, reasons about what the edit or scene should become in embedding space; the renderer, built on Wan2.2, synthesizes the pixels and uses source-video features to preserve fine detail during editing.
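The planner-then-renderer split described above can be pictured as a two-stage data flow. The sketch below is purely illustrative: `SemanticPlan`, `plan_target`, and `render_frames` are hypothetical names, strings stand in for embeddings and frames, and none of this is Bernini's actual code or API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SemanticPlan:
    """Hypothetical stand-in for the planner's output.

    The real planner works in embedding space; a string is used here
    only to make the data flow visible.
    """
    target_description: str


def plan_target(prompt: str, source_video: Optional[str] = None) -> SemanticPlan:
    # Stage 1 (planner): decide what the result should look like.
    context = f"edit of {source_video}" if source_video else "new scene"
    return SemanticPlan(target_description=f"{context}: {prompt}")


def render_frames(plan: SemanticPlan,
                  source_video: Optional[str] = None,
                  num_frames: int = 3) -> List[str]:
    # Stage 2 (renderer): turn the plan into frames. For edits, the
    # renderer also receives the source video so detail can be preserved.
    mode = "edit" if source_video else "generate"
    return [f"[{mode}] frame {i}: {plan.target_description}"
            for i in range(num_frames)]


plan = plan_target("make it snow", source_video="street.mp4")
frames = render_frames(plan, source_video="street.mp4")
```

The point of the split is that the renderer never interprets the instruction itself; it only consumes the plan, plus the source video when the task is an edit.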
Strong Editing Consistency
In ByteDance's own evaluation, Bernini reaches the first tier of leading closed-source models on video editing, with particular strength in keeping unedited regions stable and preserving subject identity. That makes Bernini AI well suited to targeted edits where everything outside the change has to stay intact.
Run Bernini AI Online — No H100
Bernini is open source under Apache 2.0, but self-hosting calls for Hopper-class GPUs like the H100. Bernini AI runs the workflow in your browser instead — no GPU to rent, nothing to install — so you can generate and edit video from any machine.
How to Use Bernini AI in 3 Steps
From prompt to finished clip in three steps — no GPU, no installation, no prior experience.
Write a prompt or upload reference media
Describe the video you want, or upload what you want to work from — a source clip to edit, or up to five reference images to drive a subject or style. For pure text-to-video, a prompt alone is enough. Bernini AI reads text, image, and video inputs together.
Generate or edit with Bernini AI
Choose your task — text-to-video, reference-to-video, or prompt-based editing of an existing clip. Bernini AI's semantic planner works out the target, then the renderer produces the frames. Want a different look? Adjust the prompt or references and run it again.
Download your video
Generation runs in the cloud and typically finishes within minutes, depending on clip length and complexity. Download the result when it is ready. Paid plans include commercial usage, so the clip can go straight into social media, advertising, branded content, and client work.
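As a rough illustration of how the inputs from step one map to the tasks in step two, the sketch below picks a task name from what was supplied and enforces the five-image reference limit. `GenerationInputs` and `choose_task` are hypothetical names invented for this example, and the selection rules are an assumption about sensible behavior, not the platform's documented logic.

```python
from dataclasses import dataclass, field
from typing import List, Optional

MAX_REFERENCE_IMAGES = 5  # "up to five reference images" per the steps above


@dataclass
class GenerationInputs:
    """Hypothetical container for what a user supplies in step one."""
    prompt: str = ""
    source_clip: Optional[str] = None
    reference_images: List[str] = field(default_factory=list)


def choose_task(inputs: GenerationInputs) -> str:
    """Map the supplied inputs to one of the task names used in the text."""
    has_prompt = bool(inputs.prompt.strip())
    if len(inputs.reference_images) > MAX_REFERENCE_IMAGES:
        raise ValueError(f"at most {MAX_REFERENCE_IMAGES} reference images")
    if not has_prompt and inputs.source_clip is None and not inputs.reference_images:
        raise ValueError("provide a prompt or upload reference media")
    if inputs.source_clip is not None:
        if inputs.reference_images:
            return "reference-guided editing"
        if not has_prompt:
            raise ValueError("editing a clip needs a prompt or reference images")
        return "prompt-based video editing"
    if inputs.reference_images:
        return "reference-to-video"
    return "text-to-video"
```

For example, `choose_task(GenerationInputs(prompt="a fox running through snow"))` selects text-to-video, while adding a `source_clip` switches the same prompt to prompt-based video editing.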
Frequently Asked Questions About Bernini AI
What Bernini AI is, how to use it online, what it can generate and edit, and how it compares to other AI video models.
Start Creating with Bernini AI
Bernini AI brings ByteDance's video generation and editing model online. Generate video from a prompt, edit existing footage, and drive clips from reference images.