Wan AI

Open and Advanced Large-Scale Video & Image Generative Models by Alibaba Cloud
Wan 2.1 by Wan AI
Wan_AI Creative Drawing_AI Painting_Artificial Intelligence_Large Model
Wan is an AI creative drawing platform under Alibaba, offering capabilities such as text-to-image, image editing, text-to-video, and image-to-video for AI-powered artistic creation.
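Tongyi Wanxiang (通义万相)_AI Creative Drawing_AI Painting_Artificial Intelligence-Alibaba Cloud
Tongyi Wanxiang is an AI creative drawing platform under Alibaba Cloud's Tongyi brand. It supports AI art creation across multiple image scenarios, including text-to-image, image-to-image, doodle-to-painting, virtual models, and personal portraits.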
Wan-AI (Wan-AI)
Org profile for Wan-AI on Hugging Face, the AI community building the future.

Wan 2.1 is an open-source AI model developed by Alibaba for video and image generation. It stands out in the field because it can render both Chinese and English text inside the content it generates, which makes it a versatile option for global use.

Features of Wan 2.1

Multilingual Text Support

Wan 2.1 can generate legible text in both Chinese and English within its output, which broadens its applicability across language markets.

Advanced Video Generation

The model supports multiple multimedia tasks such as text-to-video, image-to-video, video editing, text-to-image, and video-to-audio. This makes it a comprehensive tool for creating and editing media content.

Model Specifications

Wan 2.1 comes in multiple versions tailored for various uses:

  • T2V-1.3B: Requires 8.19 GB of VRAM and can generate a 5-second 480P video in about 4 minutes on an RTX 4090, which makes it suitable for consumer-grade GPUs.
  • T2V-14B: Has 14 billion parameters and produces higher-quality results. It supports video resolutions of 480P and 720P.
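To make the trade-off between the two variants concrete, here is a minimal Python sketch that encodes only the figures quoted in the list above. The names `Variant`, `candidates`, and `slowdown_vs_realtime` are hypothetical helpers written for this illustration and are not part of any Wan tooling. The VRAM requirement of T2V-14B is not stated here, so the sketch marks it as unknown rather than guessing.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Variant:
    name: str
    resolutions: Tuple[str, ...]
    min_vram_gb: Optional[float]  # None means: not stated in this article


# Figures copied from the list above; nothing is queried from Wan itself.
VARIANTS = (
    Variant("T2V-1.3B", ("480P",), 8.19),
    Variant("T2V-14B", ("480P", "720P"), None),
)


def candidates(resolution: str, vram_gb: float) -> List[Tuple[str, str]]:
    """Return (name, status) pairs for variants that list the resolution."""
    result = []
    for v in VARIANTS:
        if resolution not in v.resolutions:
            continue
        if v.min_vram_gb is None:
            result.append((v.name, "VRAM requirement unknown"))
        elif vram_gb >= v.min_vram_gb:
            result.append((v.name, "fits"))
        else:
            result.append((v.name, "insufficient VRAM"))
    return result


def slowdown_vs_realtime(render_seconds: float, clip_seconds: float) -> float:
    """How many seconds of compute each second of video costs."""
    return render_seconds / clip_seconds


if __name__ == "__main__":
    print(candidates("480P", vram_gb=12.0))
    # About 4 minutes of rendering for a 5-second clip.
    print(slowdown_vs_realtime(4 * 60, 5))
```

With the quoted numbers, a 5-second clip that takes about 4 minutes to render costs roughly 48 seconds of compute per second of video on the small model.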

Technical Architecture

Wan 2.1 is built around a 3D causal variational autoencoder (VAE), which can encode and decode 1080P videos of any length while preserving historical temporal information. This is complemented by a space-time attention mechanism that helps produce realistic motion at 1080P resolution and 30 FPS.
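The "causal" property and the ability to handle videos of any length can be illustrated with a toy example. The Python sketch below is not Wan's implementation: it reduces each frame to a single number and applies a one-dimensional causal convolution along the time axis, with hypothetical names `causal_conv` and `StreamingCausalConv`. It shows two things under those assumptions: each output depends only on the current and earlier frames, and a long sequence can be processed chunk by chunk with a small cache of past frames while producing the same result as processing it all at once.

```python
from typing import List, Sequence


def causal_conv(frames: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Temporal convolution where output t only sees frames t-k+1 .. t."""
    k = len(kernel)
    # Pad on the left only, so no future frame leaks into an output.
    padded = [0.0] * (k - 1) + list(frames)
    return [
        sum(kernel[j] * padded[t + j] for j in range(k))
        for t in range(len(frames))
    ]


class StreamingCausalConv:
    """Same operation, fed chunk by chunk with a cache of the last k-1 frames."""

    def __init__(self, kernel: Sequence[float]) -> None:
        self.kernel = list(kernel)
        self.cache = [0.0] * (len(self.kernel) - 1)

    def push(self, chunk: Sequence[float]) -> List[float]:
        k = len(self.kernel)
        padded = self.cache + list(chunk)
        out = [
            sum(self.kernel[j] * padded[t + j] for j in range(k))
            for t in range(len(chunk))
        ]
        # Keep the most recent k-1 frames as history for the next chunk.
        if k > 1:
            self.cache = padded[-(k - 1):]
        return out


if __name__ == "__main__":
    frames = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
    kernel = [0.25, 0.25, 0.5]

    full = causal_conv(frames, kernel)

    stream = StreamingCausalConv(kernel)
    chunked = stream.push(frames[:3]) + stream.push(frames[3:4]) + stream.push(frames[4:])
    print(chunked == full)

    # Changing a later frame leaves all earlier outputs untouched.
    changed = list(frames)
    changed[5] = 100.0
    print(causal_conv(changed, kernel)[:5] == full[:5])
```

Because memory use depends on the chunk size and the cache rather than on the total number of frames, this kind of design is one plausible way an encoder can accept arbitrarily long inputs.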

Performance

Wan 2.1 scores 84.7% on the VBench industry benchmark, ahead of other models such as OpenAI's Sora and Google's Veo 2. The result reflects its ability to handle complex motion and maintain spatial relationships across video sequences.

Open-Source Release

Its open-source release is a significant milestone, comparable to the impact Stable Diffusion had on image generation. This accessibility encourages a community of developers to innovate and expand its applications, potentially lowering costs for users and contributing to broader AI-driven creativity.

About the author
Shinji

Evangelist

AI Pill
