Ai2 Releases Molmo2: Open-Source Vision-Language Model Capable of Video Understanding and Object Tracking
Ai2 has released Molmo2, an open-source vision-language model that handles video understanding, object tracking, and pointing across single-image, multi-image, and video tasks. The models range from 4B to 8B parameters and support fast inference through Hugging Face Transformers and vLLM.