From Ollama to MLX - My Journey with Apple's Game-Changing AI Framework
Today I want to share my journey exploring MLX, Apple’s open source machine learning framework that’s making waves in the AI development community. After spending months building Ollama and helping developers run AI models on all platforms, I’ve been fascinated by how MLX is changing the game for fine-tuning on Macs.
Let me back up a bit. When we were building Ollama, we always had this dream of making fine-tuning as simple as running models. While we never quite got there (at least while I was on the team), MLX has emerged as this incredibly powerful solution that actually delivers on that vision.
What’s particularly interesting about MLX is that it represents a dramatic shift in Apple’s approach to open source. I remember, about 6-7 years ago, having an informal chat at AWS re:Invent about a potential evangelist role at Apple. The person I spoke with – one of Siri’s original creators – was pretty blunt: Apple’s strict communication policies meant I wouldn’t be able to talk publicly about… well, anything. So seeing Apple release MLX on GitHub? That’s kind of mind-blowing.
The beauty of MLX lies in its simplicity. Installation is just a pip install mlx-lm away – though I have to add a caveat here. Python environments can be incredibly fragile (honestly, it’s weird how the AI community has latched onto such a brittle language), so your mileage may vary with the installation process.
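If you want to contain some of that fragility, it helps to give MLX its own virtual environment. Something along these lines should do it – treat it as a minimal sketch rather than a guaranteed recipe, since every Python setup has its own quirks:

# keep MLX isolated from the rest of your Python installs
python3 -m venv mlx-env
source mlx-env/bin/activate
pip install mlx-lm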
A quick note on finding the right resources: while MLX’s main repository (github.com/ml-explore/mlx) has some examples, the real treasure trove for fine-tuning is a separate repository, mlx-examples. Even there, you need to be careful: skip the root-level LoRA folder – it’s outdated – and head straight to the LLM-specific LoRA guide in the llms/mlx_lm directory. That guide provides working examples, but think of it as a teaching tool rather than a ready-to-use solution: it’s designed to help you understand the fundamentals so you can build your own fine-tuning pipeline, and it’s the best starting point you’ll find.
One thing I’ve learned while working with MLX is that data preparation is crucial. The framework expects JSONL files with very specific formatting – in the simplest case just ‘prompt’ and ‘completion’ fields with nothing else, though a few other formats are supported as well (https://tvl.st/mlxloradata). I actually hit a wall with this when I first tried using a HuggingFace dataset that had all these extra fields like correctness and helpfulness. MLX just silently failed, which was… fun (?!?!) to debug.
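To make the format concrete, each line of your train.jsonl and valid.jsonl should end up looking something like this (the content here is invented – the point is just the two fields):

{"prompt": "What does LoRA stand for?", "completion": "LoRA stands for Low-Rank Adaptation, a way to fine-tune a small set of extra parameters instead of the whole model."}
{"prompt": "Why train on Apple Silicon?", "completion": "Unified memory lets the CPU and GPU work on the same data without copying it back and forth."}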
Here’s a pro tip I learned the hard way: use jq for JSON manipulation. When your dataset is too large for VS Code to handle (yeah, that happened), jq becomes your best friend for transforming those files into exactly what MLX needs. For example, to strip out unwanted fields from your JSONL, you can use:
jq -c 'del(.correctness, .helpfulness, .coherence, .complexity, .verbosity)' oldvalid.jsonl > valid.jsonl
And if you need to rename fields to match MLX’s expected format (like changing ‘response’ to ‘completion’), this command does the trick:
jq -c 'if has("response") then {prompt: .prompt, completion: .response} else . end' train.jsonl > train_fixed.jsonl
These commands saved me hours of manual data wrangling.
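One more jq habit worth picking up: before you kick off training, confirm that every line really contains only the fields MLX expects. A quick sketch that prints each distinct set of keys in the file, along with how many lines have it:

jq -c 'keys' train.jsonl | sort | uniq -c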
The actual fine-tuning process in MLX uses something called LoRA (Low-Rank Adaptation). Think of it like this: instead of retraining an entire chef’s knowledge of cooking, you’re just teaching them a few specific recipes. It’s way more efficient and works beautifully on Apple Silicon. When you start training, you’ll see something like “Trainable parameters: 0.071%” – that’s LoRA in action, only adjusting a tiny fraction of the model’s parameters.
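To make that concrete, here’s roughly what kicking off a LoRA run looks like with the mlx_lm command-line tools. I’m assuming a data directory containing your train.jsonl and valid.jsonl, and the exact flags may have shifted between releases, so double-check the LoRA guide before copying this verbatim:

# fine-tune LoRA adapters on top of a base model
python -m mlx_lm.lora \
  --model mistralai/Mistral-7B-Instruct-v0.2 \
  --train \
  --data ./data \
  --iters 1000 \
  --batch-size 4

Swap in whichever base model you’re actually tuning. The LoRA adapters get saved separately from the base weights, which is exactly the “few specific recipes” idea in practice.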
I’ve found that MLX’s lazy evaluation approach is particularly clever. It only processes what it needs when it needs it, which means you can handle larger batch sizes and longer sequences without hitting memory walls. It’s the kind of optimization that just works without you having to think about it.
For anyone wanting to try this out, start small. Use --iters 100 instead of the default 1000 while you’re experimenting. Watch your validation loss (it should decrease over time), and don’t be afraid to lower your batch size if you’re running into memory issues.
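Once a short run finishes, spot-check the adapter before committing to a longer one. Assuming the adapters landed in the default ./adapters directory (again, verify the flags against the current docs), something like this lets you compare the tuned model against the base model:

# generate with the freshly trained LoRA adapters applied
python -m mlx_lm.generate \
  --model mistralai/Mistral-7B-Instruct-v0.2 \
  --adapter-path ./adapters \
  --prompt "One of the prompts from your validation set"

Drop the --adapter-path flag to see how the base model answers the same prompt.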
Looking back at my time with Ollama and now working with MLX, it’s fascinating to see how Mac-based AI development has evolved. While I still hope Ollama adds easy cross-platform fine-tuning someday, MLX gives us a powerful solution right now for anyone working with Apple Silicon.
I’m still exploring all the possibilities with MLX, and I’d love to hear about your experiences if you try it out. This feels like just the beginning of what’s possible with AI development on Macs, and I’m excited to see where it goes from here.