ByteDance Launches Lance, A Compact 3B Multimodal AI Model That Generates, Edits, And Understands Images And Video In One Framework

May 21, 2026
GitHub

Summary

ByteDance has launched Lance, a compact multimodal AI model with 3B active parameters that handles image and video understanding, generation, and editing in a single unified framework. It delivers competitive benchmark performance against larger models and is publicly available on Hugging Face with open-source inference scripts.

Key Points

  • ByteDance releases Lance, a unified multimodal model with 3B active parameters that handles image and video understanding, generation, and editing within a single framework.
  • Lance is trained from scratch on a budget of 128 A100 GPUs using a staged multi-task recipe, and delivers competitive benchmark performance against larger models on image generation, image editing, and video generation tasks.
  • The model is publicly available on Hugging Face with open-source inference scripts, Gradio demo support, and ready-to-run benchmark tools covering GenEval, DPG, GEdit, and VBench evaluations.
