Facebook Research Unveils Tuna-2: A Unified Multimodal AI Model That Ditches Traditional Vision Encoders for Direct Pixel Processing
Tuna-2 replaces the traditional vision encoder with direct processing of pixel patches. The model supports both image understanding and image generation, comes in 7B and 2B parameter sizes, and is reported to outperform its predecessors across a range of benchmarks.
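To make "direct pixel patch processing" concrete, here is a minimal sketch of the general idea: cut an image into fixed-size tiles, flatten each tile's raw pixel values into a vector, and apply a single linear projection to produce tokens, with no separate pretrained vision encoder in between. This is a generic illustration, not Tuna-2's actual code; the function names `patchify` and `project`, the patch size, and the toy weights are all assumptions made for the example.

```python
from typing import List

Image = List[List[List[float]]]  # height x width x channels


def patchify(image: Image, patch: int) -> List[List[float]]:
    """Split an image into non-overlapping patch x patch tiles and
    flatten each tile into one vector of raw pixel values."""
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image sides must be divisible by the patch size")
    tokens = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            vec: List[float] = []
            for y in range(top, top + patch):
                for x in range(left, left + patch):
                    vec.extend(image[y][x])  # append all channels of one pixel
            tokens.append(vec)
    return tokens


def project(tokens: List[List[float]], weights: List[List[float]]) -> List[List[float]]:
    """Linear projection: each row of `weights` yields one output feature."""
    return [[sum(p * w for p, w in zip(tok, row)) for row in weights]
            for tok in tokens]


# Toy 4x4 RGB image; every channel of pixel (y, x) holds the value y * 4 + x.
image = [[[float(y * 4 + x)] * 3 for x in range(4)] for y in range(4)]
tokens = patchify(image, 2)          # 4 tokens, each of length 2 * 2 * 3 = 12

# Hypothetical projection to a width of 2; a real model would learn these weights.
weights = [[1.0] * 12, [0.5] * 12]
embeddings = project(tokens, weights)
```

In a real model the projected tokens would be fed to the transformer alongside text tokens; the point of the sketch is only that the input path from pixels to tokens can be a reshape plus one matrix multiply.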