New Open-Source Toolkit Brings Full Multimodal AI Fine-Tuning to Apple Silicon Macs Without NVIDIA Hardware

Apr 08, 2026
GitHub

Summary

An open-source toolkit called Gemma Multimodal Fine-Tuner lets developers fine-tune Google's Gemma AI models on text, images, and audio directly on Apple Silicon Macs: no NVIDIA hardware is required, terabyte-scale cloud datasets are supported, and data stays fully private on-device.

Key Points

  • A new open-source toolkit called Gemma Multimodal Fine-Tuner enables developers to fine-tune Google's Gemma 4 and 3n models on text, images, and audio natively on Apple Silicon Macs using PyTorch and Metal Performance Shaders, with no NVIDIA GPU required.
  • The toolkit is currently the only solution supporting all three modalities — text, image, and audio — on Apple Silicon, while also offering cloud data streaming from Google Cloud Storage and BigQuery, allowing training on terabyte-scale datasets without filling local storage.
  • Built on Hugging Face Transformers and PEFT LoRA, the project supports use cases including domain-specific speech recognition, visual question answering, multimodal assistants, and fully private on-device pipelines where data never leaves the user's machine.
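The PEFT LoRA approach the toolkit builds on makes fine-tuning feasible on consumer hardware by training only small low-rank adapter matrices rather than the full model weights. As a rough illustration (not the toolkit's actual API), a rank-r adapter replaces an update to a full d_out × d_in weight matrix W with two much smaller matrices B and A, applied as W + (alpha / r) · B·A:

```python
# Illustration of the LoRA (Low-Rank Adaptation) math that PEFT implements.
# Instead of updating a full weight matrix W (d_out x d_in), training only
# touches B (d_out x r) and A (r x d_in) with r << min(d_out, d_in); the
# effective weight is W + (alpha / r) * (B @ A). Plain-Python toy, no deps.

def matmul(X, Y):
    """Naive matrix multiply for nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

def lora_effective_weight(W, A, B, alpha, r):
    """Return W + (alpha / r) * B @ A, the LoRA-adapted weight."""
    delta = matmul(B, A)
    scale = alpha / r
    return [[W[i][j] + scale * delta[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]

# Toy example: 2x2 base weight with a rank-1 adapter.
# Storage for the adapter is 2*r*d values instead of d*d -- the reason
# LoRA fine-tuning fits in a Mac's unified memory.
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [2.0]]          # d_out x r
A = [[0.5, 0.5]]            # r x d_in
W_eff = lora_effective_weight(W, A, B, alpha=2.0, r=1)
# -> [[2.0, 1.0], [2.0, 3.0]]
```

In practice the toolkit delegates this to the PEFT library, with the matmuls running on the GPU via PyTorch's Metal Performance Shaders backend; only the small A and B matrices receive gradients, which is what keeps memory use low enough for on-device training.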
