New Open-Source Toolkit Brings Full Multimodal AI Fine-Tuning to Apple Silicon Macs Without NVIDIA Hardware

Apr 08, 2026
GitHub

Summary

An open-source toolkit called Gemma Multimodal Fine-Tuner lets developers fine-tune Google's Gemma AI models on text, images, and audio directly on Apple Silicon Macs: no NVIDIA hardware is required, terabyte-scale cloud datasets are supported, and data stays fully private on-device.

Key Points

  • A new open-source toolkit called Gemma Multimodal Fine-Tuner enables developers to fine-tune Google's Gemma 4 and 3n models on text, images, and audio natively on Apple Silicon Macs using PyTorch and Metal Performance Shaders, with no NVIDIA GPU required.
  • The toolkit is currently the only solution supporting all three modalities — text, image, and audio — on Apple Silicon, while also offering cloud data streaming from Google Cloud Storage and BigQuery, allowing training on terabyte-scale datasets without filling local storage.
  • Built on Hugging Face Transformers and PEFT LoRA, the project supports use cases including domain-specific speech recognition, visual question answering, multimodal assistants, and fully private on-device pipelines where data never leaves the user's machine.
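The PEFT LoRA approach the toolkit builds on makes fine-tuning feasible on consumer hardware by training only small low-rank adapter matrices rather than the full model weights. As a rough illustration (not the toolkit's actual API), a rank-r adapter replaces an update to a full d_out × d_in weight matrix W with two much smaller matrices B and A, applied as W + (alpha / r) · B·A:

```python
# Illustration of the LoRA (Low-Rank Adaptation) math that PEFT implements.
# Instead of updating a full weight matrix W (d_out x d_in), training only
# touches B (d_out x r) and A (r x d_in) with r << min(d_out, d_in); the
# effective weight is W + (alpha / r) * (B @ A). Plain-Python toy, no deps.

def matmul(X, Y):
    """Naive matrix multiply for nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

def lora_effective_weight(W, A, B, alpha, r):
    """Return W + (alpha / r) * B @ A, the LoRA-adapted weight."""
    delta = matmul(B, A)
    scale = alpha / r
    return [[W[i][j] + scale * delta[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]

# Toy example: 2x2 base weight with a rank-1 adapter.
# Storage for the adapter is 2*r*d values instead of d*d -- the reason
# LoRA fine-tuning fits in a Mac's unified memory.
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [2.0]]          # d_out x r
A = [[0.5, 0.5]]            # r x d_in
W_eff = lora_effective_weight(W, A, B, alpha=2.0, r=1)
# -> [[2.0, 1.0], [2.0, 3.0]]
```

In practice the toolkit delegates this to the PEFT library, with the matmuls running on the GPU via PyTorch's Metal Performance Shaders backend; only the small A and B matrices receive gradients, which is what keeps memory use low enough for on-device training.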
