ByteDance Launches Lance, A Compact 3B Multimodal AI Model That Generates, Edits, And Understands Images And Video In One Framework
Summary
ByteDance has launched Lance, a compact multimodal AI model with 3B active parameters that handles image and video understanding, generation, and editing in a single unified framework. It delivers competitive benchmark performance against larger models and is available as open source on Hugging Face.
Key Points
- ByteDance releases Lance, a 3B-active-parameter unified multimodal model capable of handling image and video understanding, generation, and editing within a single framework.
- Lance is trained from scratch on a budget of 128 A100 GPUs using a staged multi-task recipe, and it delivers competitive benchmark performance against larger models across image generation, image editing, and video generation tasks.
- The model is publicly available on Hugging Face with open-source inference scripts, Gradio demo support, and ready-to-run benchmark tools covering GenEval, DPG, GEdit, and VBench evaluations.
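
To put "compact" in rough numbers, the sketch below estimates the memory needed just to hold 3B parameters at a few common numeric precisions. It is back-of-the-envelope arithmetic, not a measurement of Lance: the helper `weight_memory_gib` and the precision list are illustrative assumptions, the figure covers only the 3B active parameters cited above (the full checkpoint may be larger), and it ignores activations, caches, and framework overhead.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Approximate memory needed to store the weights alone, in GiB."""
    return num_params * bytes_per_param / 2**30


# From the article: 3B active parameters. Total parameter count is not stated.
ACTIVE_PARAMS = 3e9

# Assumed storage precisions; the article does not say which one Lance ships in.
PRECISIONS = [("fp32", 4), ("bf16/fp16", 2), ("int8", 1)]

for name, nbytes in PRECISIONS:
    print(f"{name}: {weight_memory_gib(ACTIVE_PARAMS, nbytes):.1f} GiB")
```

At 16-bit precision this comes to roughly 5.6 GiB for the active weights, which is the kind of footprint that makes a model of this size practical on a single GPU.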