Google introduces Lumiere

Google Research has developed a space-time diffusion model called Lumiere for realistic video generation.

Lumiere is a new artificial intelligence video model developed by Google Research. It is a space-time text-to-video diffusion model that generates videos from a text prompt or from a single reference image. The model can also animate the content of an image within a user-specified region, and it can generate videos in a target style by using fine-tuned text-to-image model weights.

Lumiere was trained on a dataset of 30 million videos paired with text captions. It can create consistent, smooth, and realistic movement across a full video clip, addressing the inconsistent motion that many existing AI video models struggle with.

The model accepts several kinds of input. Text-to-video works like a regular image generator, producing a video from a text prompt, while image-to-video animates a supplied image. Lumiere can also animate specific regions of an image and offers inpainting capabilities, such as changing the style of clothing or the type of animal featured.
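To make the region-based animation more concrete, here is a minimal sketch of how an image and a user-selected region could be turned into a conditioning signal for a video model. It assumes the conditioning is a partly filled video plus a binary mask, concatenated with the noisy video along the channel axis, which is how we understand the approach from the Lumiere paper. The function name `build_conditioning`, the array sizes, and the region are hypothetical, and this is not Google's code.

```python
import numpy as np

def build_conditioning(image, num_frames, region=None):
    """Build a toy conditioning video and mask from one image.

    Returns (cond, mask) with shapes (T, H, W, 3) and (T, H, W, 1).
    mask == 1 marks pixels that are given; mask == 0 marks pixels
    the model is expected to generate. (Assumed convention.)
    """
    H, W, _ = image.shape
    cond = np.zeros((num_frames, H, W, 3), dtype=image.dtype)
    mask = np.zeros((num_frames, H, W, 1), dtype=image.dtype)

    # Image-to-video: the first frame is the input image.
    cond[0] = image
    mask[0] = 1.0

    if region is not None:
        # Region animation: outside the region, every later frame is
        # pinned to the still image; inside it, the model is free.
        keep = ~region
        cond[1:, keep] = image[keep]
        mask[1:, keep] = 1.0
    return cond, mask

rng = np.random.default_rng(0)
image = rng.random((8, 8, 3))
region = np.zeros((8, 8), dtype=bool)
region[2:6, 2:6] = True  # hypothetical user-selected area to animate

cond, mask = build_conditioning(image, num_frames=4, region=region)
noisy = rng.standard_normal((4, 8, 8, 3))

# Assumed input layout: noisy video + conditioning video + mask.
model_input = np.concatenate([noisy, cond, mask], axis=-1)
print(model_input.shape)  # (4, 8, 8, 7)
```

Calling `build_conditioning` without a `region` gives the plain image-to-video case, where only the first frame is fixed and everything after it is left to the model.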


Lumiere uses a "Space-Time U-Net" architecture that generates the entire temporal duration of the video at once, in a single pass through the model. This differs from existing video models, which synthesize distant keyframes and then fill in the frames between them with temporal super-resolution, an approach that makes consistency across the whole video challenging to achieve.
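As a rough illustration of what "space-time" means here, the sketch below treats a clip as a single array with a time axis and shrinks it along time as well as height and width, then expands it back. Downsampling in both space and time is our reading of the Lumiere paper rather than something this article states, and the plain average pooling, the function names, and the sizes are stand-ins for the learned layers of a real network.

```python
import numpy as np

def downsample_space_time(video, t_factor=2, s_factor=2):
    """Average-pool a (T, H, W, C) clip over time and space."""
    T, H, W, C = video.shape
    assert T % t_factor == 0 and H % s_factor == 0 and W % s_factor == 0
    blocks = video.reshape(T // t_factor, t_factor,
                           H // s_factor, s_factor,
                           W // s_factor, s_factor, C)
    # Average each (time, height, width) block into one value per channel.
    return blocks.mean(axis=(1, 3, 5))

def upsample_space_time(video, t_factor=2, s_factor=2):
    """Nearest-neighbour upsampling along time, height, and width."""
    video = np.repeat(video, t_factor, axis=0)
    video = np.repeat(video, s_factor, axis=1)
    return np.repeat(video, s_factor, axis=2)

rng = np.random.default_rng(0)
clip = rng.random((16, 32, 32, 3))       # hypothetical: 16 frames of 32x32 RGB

coarse = downsample_space_time(clip)     # whole clip handled as one tensor
restored = upsample_space_time(coarse)

print(coarse.shape)    # (8, 16, 16, 3)
print(restored.shape)  # (16, 32, 32, 3)
```

The point of the toy is only that every frame of the clip passes through the same operation together, so the coarse representation summarizes motion across the full duration instead of a handful of isolated keyframes.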

Despite its advanced capabilities, Lumiere is not currently available to the public. However, the underlying technology may eventually be incorporated into Google's branded products, as with the company's previous AI tools.

