Is MLX the Best Fine-Tuning Framework?


Here is the script:

Ever tried fine-tuning an AI model on your Mac and hit a wall?

Hey, I’m Matt, and after helping build Ollama, I’ve spent countless hours experimenting with AI development on Apple Silicon. Today, I’m diving into MLX - Apple’s native machine learning framework that’s changing how we approach fine-tuning on Macs.

If you’ve been trying to do AI development on your Mac, you know it’s not always straightforward. Maybe you’ve hit memory limits, struggled with compatibility issues, or spent hours trying to get models running efficiently. I’ve been there - it’s actually why we built Ollama in the first place.

In this video, I’ll show you how to use MLX to fine-tune models right on your Mac - no cloud services needed. If you’re interested in practical, no-nonsense AI development across all the platforms, hit subscribe and turn on notifications. I share weekly insights from my experience on the team that built Ollama, focusing on real solutions that actually work.

Drop a comment with your biggest Mac AI development challenge - whether it’s fine-tuning, deployment, or anything else. I read every comment and use them to plan future videos.

When we built Ollama, we made sure it ran beautifully on Apple Silicon from day one. But fine-tuning models? That’s been a different story - until MLX came along.

Fine-tuning was actually something we had thought about including in Ollama - to make it as simple as running models. While the Ollama team hasn’t gotten to it yet, MLX now offers exactly the kind of straightforward approach we envisioned.

Let me show you what I mean. The commands are clean and intuitive, making fine-tuning accessible to anyone with a recent Mac. While I’m still hoping Ollama adds easy cross-platform fine-tuning in the future, MLX gives us a powerful solution today.

The existence of MLX itself is quite remarkable. Apple releasing an internal project on GitHub represents a significant shift in their approach to open source. This hits close to home for me - about 6-7 years ago, I had an informal chat at AWS re:Invent about an evangelist role at Apple. The conversation was with one of the original creators of Siri, who joined Apple through an acquisition.

What stuck with me was his candid warning: despite the public-facing nature of the role, Apple’s strict communication policies would prevent me from speaking publicly about anything - work-related or not. This level of secrecy was deeply ingrained in Apple’s culture.

That’s why MLX’s public release is so fascinating. I’d love to know what internal discussions and shifts in thinking led Apple to share this technology with the world. It suggests an interesting evolution in how Apple approaches developer tools and community engagement.

Before we dive into fine-tuning, let’s get our environment set up. MLX’s installation is surprisingly simple:

pip install mlx-lm

That’s it. No complex dependencies, no hour-long compilation process. This simplicity is one of MLX’s strongest features. That said, MLX is a Python package, so while the install usually is that quick, it can take a lot longer depending on your setup. It’s remarkably easy to wreck a Python environment, and I can’t really help you untangle yours if that happens. I still find it strange that the AI community has standardized on a language as brittle and problematic as Python, but there you go.
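If you want to keep this experiment from touching the rest of your Python setup, a throwaway virtual environment is the safest route. This is just one way to do it, assuming a recent Python 3 on your Mac:

# create and activate an isolated environment, then install mlx-lm into it
python3 -m venv mlx-env
source mlx-env/bin/activate
pip install mlx-lm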

Before we download our model, let’s talk about how we’re going to fine-tune it. Traditional fine-tuning requires updating all of a model’s parameters - that’s billions of values that need to be adjusted. This is expensive and memory-intensive, especially on consumer hardware like our Macs.

This is where LoRA comes in. Think of your base model like a professional chef who’s great at cooking in general. Instead of completely retraining the chef (which would be expensive and time-consuming), LoRA is like giving them a small set of special recipes and techniques for a specific cuisine. The chef keeps all their core skills but learns to adapt them for your specific needs. This approach uses far less memory and trains much faster, making it perfect for fine-tuning on Apple Silicon.
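For anyone who wants the rough math behind the metaphor: LoRA freezes the original weight matrix W and learns two small matrices, so the effective weights become

W' = W + B · A

where B is d-by-r and A is r-by-k, with the rank r tiny compared to d and k. Those two small matrices are the only things being trained, which is why the trainable parameter count drops so dramatically.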

For this tutorial, I’m using Llama 3.2 3B Instruct as our base model. I chose this model because it’s a good balance of size and capability, and it works well with LoRA fine-tuning on Apple Silicon.

You can find it on Hugging Face. I wanted to be able to use Ollama’s models, but I couldn’t get that to work, so I stuck with the original models on Hugging Face. I really like using the command-line tool you can find here for the download. Put the model in the model directory below wherever you’re going to run MLX from. I ran into problems later on with the download as-is, so I also had to delete the subdirectory called ‘original’.
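For reference, here’s roughly what that download looks like with the Hugging Face CLI. The repo id and local paths are just the ones I’d use - adjust them for your setup - and note that the official Llama repos are gated, so you’ll need to accept the license and log in first:

# the CLI ships with the huggingface_hub package; the Llama repos require a login
pip install huggingface_hub
huggingface-cli login
# pull the model into a local directory next to where you'll run MLX
huggingface-cli download meta-llama/Llama-3.2-3B-Instruct --local-dir model
# the extra 'original' weights directory caused me problems later, so remove it
rm -rf model/original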

Now let’s talk about preparing our training data. MLX requires two JSONL files in your data directory: train.jsonl and valid.jsonl. Each line must contain a JSON object with exactly two fields: ‘prompt’ and ‘completion’ - no more, no less.
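To make that concrete, here are a couple of made-up lines showing the shape MLX expects - each line is its own complete JSON object with only those two fields:

{"prompt": "What is the capital of France?", "completion": "The capital of France is Paris."}
{"prompt": "Summarize: The meeting moved from Tuesday to Thursday.", "completion": "The meeting is now on Thursday."}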

I learned this requirement the hard way. When I first tried fine-tuning with MLX, I used a HuggingFace dataset that had additional fields like correctness, helpfulness, coherence, and verbosity. MLX silently failed without any helpful error messages. My first attempt to fix this using VS Code to rename fields failed because the file was too large.

Let me show you exactly what worked instead…

What saved me was using jq, a command-line JSON tool. One quick command renamed all the original fields labelled “response” to “completion”, which is what MLX wants. My second issue? All those extra fields (correctness, helpfulness, etc.) that MLX didn’t want. Again, jq came to the rescue, letting me strip out all the extra metadata. The lesson here? Always use the right tool for JSON and JSONL manipulation - it’ll save you hours of debugging cryptic errors.
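Your dataset’s field names will differ, but assuming the answers live in a field called ‘response’ and the downloaded file is named original.jsonl (a placeholder name), a single jq invocation can do both jobs at once - rename that field to ‘completion’ and drop everything else:

# keep only prompt and completion (renamed from response), one compact JSON object per line
jq -c '{prompt: .prompt, completion: .response}' original.jsonl > data/train.jsonl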

Now that we’ve got our data formatted correctly and MLX installed, we’re ready to start fine-tuning. Remember how we talked about LoRA earlier? This is where that efficient approach really shines. MLX handles all the complexity of LoRA behind the scenes - when things work, they just work, and we don’t have to mindlessly tap through pages of notebook blocks.

Now we’re ready for the actual fine-tuning. You’ll need your formatted train.jsonl and valid.jsonl files - remember those data formatting issues we just fixed? This is where that careful preparation pays off.

What makes MLX particularly powerful is how it’s built specifically for Apple Silicon. The framework uses something called lazy evaluation - this means it only processes what it needs, when it needs it. Think of it as being really efficient with your Mac’s resources.

Here’s what this means in practice: When you’re fine-tuning a model, MLX can handle larger batch sizes and longer sequences without running into memory issues. The real advantage is that MLX handles all this optimization automatically. You don’t need to worry about memory management or moving things between CPU and GPU - it just works.

When you’re ready to start fine-tuning, the command is straightforward. Remember how we installed mlx-lm earlier? Here’s how we use it:

mlx_lm.lora --model path-to-model --train

By default, this will run for 1000 iterations, which can take a while. Here’s a pro tip: when you’re first testing things out, add --iters 100 to your command. This will run a shorter training session, perfect for making sure everything’s set up correctly.

mlx_lm.lora --model path-to-model --train --iters 100

If you run into memory issues, you might need to adjust the batch size. You can find all these parameters by looking at the source code - MLX keeps things transparent and accessible.

When you start training with LoRA fine-tuning, the first thing you’ll see is this line:

Trainable parameters: 0.071% (2.294M/3212.750M)

This is showing us one of LoRA’s key features - instead of training the entire model, we’re only adjusting a small subset of parameters. In this case, just 0.071% of them. This is what makes LoRA fine-tuning so efficient.

During training, you’ll see validation messages that look like this:

Iter 1: Val loss 2.671, Val took 90.039s

The validation loss is like giving your chef a pop quiz with new ingredients they haven’t practiced with. If they can make great dishes with these new ingredients, you know they’ve learned the principles, not just memorized specific recipes. You want to see this number gradually decrease. If it starts going up while your training loss keeps going down, your model might be memorizing the training data instead of learning.

You’ll also see more detailed training updates:

Iter 10: Train loss 2.713, Learning Rate 1.000e-05, It/sec 0.091, Tokens/sec 207.285, Trained Tokens 22736, Peak mem 34.508 GB

The train loss tells you how well your model is learning from your training data. The learning rate controls how much the model adjusts its parameters with each iteration. You can see your training speed in both iterations per second and tokens per second. The trained tokens number shows your total progress, and the peak memory usage helps you monitor if you’re getting close to your Mac’s limits. If that memory number gets too high, you might need to adjust your batch size down.

When training finishes, MLX saves your fine-tuned model. You can start testing it right away to see how it performs. If the results aren’t what you expected, you can always go back and adjust your training data or try different parameters like the learning rate or number of iterations.
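A quick way to sanity-check the result is mlx_lm’s generate command, pointing it at the base model and the adapter directory the training run produced (adapters by default in my runs). The paths here are just my local layout - adjust them to match yours:

mlx_lm.generate --model path-to-model --adapter-path adapters --prompt "Your test prompt here" --max-tokens 100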

Now that we understand how the training process works, let’s talk about training data. While building your own high-quality dataset is usually the best approach for real applications, sometimes you just want to see the fine-tuning process in action first.

That’s where Hugging Face comes in. It’s got thousands of datasets we can use to experiment with fine-tuning. Let me show you how to find datasets that will work well with MLX and get them into the right format.

When you open Hugging Face’s dataset page, you’ll see thousands of options. Don’t worry - we can narrow this down quickly. The search bar at the top is your friend. If you’re interested in something specific, like sentiment analysis, just type that in.

On the left, you’ll see filters for language, size, and task type. These help you find datasets that are actually useful for your project. For example, if we’re fine-tuning for English text completion, we can filter for English language datasets with that specific format.

Once you’ve found a dataset that looks promising, click into it and check out the preview. This lets you see exactly what kind of data you’re working with before downloading anything. Pay attention to the format - remember how specific MLX is about its JSONL structure. But also know that tools like JQ let you tweak it.

Don’t just go for the biggest dataset you can find. I’ve often gotten better results from smaller, cleaner datasets than from huge, messy ones. What matters is having good examples that match what you’re trying to teach the model.

Before downloading anything, scroll down to check the dataset’s license. This is crucial - some datasets are open for any use, while others have restrictions, especially for commercial projects. You don’t want to spend time fine-tuning with data you can’t actually use.

Once you’ve found a dataset you can use, download the files to your machine.

Once you’ve downloaded your dataset, you’ll probably need to convert it to the format MLX expects. Remember those JSONL files we talked about earlier? This is where we need to transform our Hugging Face dataset into that format.

The key is making sure each line is a complete JSON object with just the fields MLX needs. No extra metadata, no nested structures - keep it simple.

Before we jump back into training, take a minute to verify your converted data. Load up a few examples and make sure they look right. This quick check can save you from training issues later.
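Here are a couple of quick checks I like to run before kicking off training, again using jq, since it will fail loudly on any malformed line:

# jq exits with an error if any line isn't valid JSON
jq empty data/train.jsonl && echo "train.jsonl parses cleanly"
# eyeball the first couple of examples to confirm the field names
head -n 2 data/train.jsonl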

Now let’s look at how to use mlx_lm with our prepared dataset. The command structure is pretty straightforward - we’ve already seen the basic version earlier:

mlx_lm.lora --model path-to-model --train

But now that we have our dataset ready, let’s look at some additional parameters we might want to use. Remember how we talked about setting iterations to 100 for testing? We can also adjust things like batch size and learning rate depending on our needs.

The main parameters you’ll want to know about are:

--fine-tune-type lets you choose between different fine-tuning methods. We’re using LoRA in this example, but MLX also supports QLoRA, DoRA, and full model fine-tuning.

--num-layers controls how many layers of the model you’re fine-tuning. It defaults to 16, but you can adjust this depending on your needs.

When choosing your base model, think about what you’re trying to achieve. Different models are trained for different tasks - make sure you’re starting with one that’s relevant to what you want to do.

Let’s talk about memory management. The batch size is like choosing how many dishes your chef tries to cook simultaneously. A more experienced chef with a bigger kitchen (more memory) can handle more dishes at once, but sometimes it’s better to start small and work your way up. It defaults to 4, but if you run into memory issues, try setting it to 1 or 2.

Another useful parameter is --grad-checkpoint. This trades some speed for lower memory usage, which can be really helpful when working with larger models or datasets.

If you’re still running into memory issues even after reducing batch size, you can try decreasing the number of layers you’re fine-tuning using the --num-layers parameter. This will use less memory but might affect the quality of the fine-tuning.
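Putting those together, a memory-friendly trial run might look something like this. The flag names should match mlx_lm’s own help output, but double-check with mlx_lm.lora --help on your version, and the model and data paths are just my local layout:

# a conservative trial run: short, tiny batches, fewer layers, gradient checkpointing on
mlx_lm.lora --model model --train --data data --fine-tune-type lora --iters 100 --batch-size 1 --num-layers 8 --grad-checkpoint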

Remember that MLX needs your data files named specifically as train.jsonl and valid.jsonl in your data directory. You can also add a test.jsonl if you want to evaluate your model after training.

What I appreciate about MLX is how it simplifies the fine-tuning process without sacrificing control over the important parameters. You get the essential features without having to write a lot of extra code.

Now that we’ve covered the basics of fine-tuning with MLX, let’s talk about troubleshooting. In my experience building with MLX, I’ve noticed a few common patterns where things tend to go wrong. Understanding these can save you hours of debugging.

The most frequent issue I see is data formatting inconsistencies. Even one malformed example in your dataset can cause silent failures, and MLX isn’t always great at telling you what went wrong. Next up is mismatched training parameters - what works perfectly on an M2 Ultra might crash immediately on an M1. And finally, there’s the subtle but important problem of validation loss that won’t improve, which usually means something’s wrong with either your data or your training setup.

First, verify your training data format is consistent across all examples. Then check if your validation loss was actually decreasing during training. Sometimes you might need to try different combinations of batch size and learning rate before finding what works best for your specific case.

If you’re ready to try MLX fine-tuning yourself, start with a small dataset and the parameters we’ve discussed. The documentation has more details about advanced options when you need them.

Now that we have our fine-tuned model, let’s use it in Ollama. We’ll need to create a Modelfile - this tells Ollama how to use our model and its adapter.

The Modelfile is simple. We just need two key lines: FROM ./model, which is the model we downloaded first, and ADAPTER ./adapters, which points to the adapter we created with MLX.

The rest of the Modelfile comes from the model’s documentation page on Ollama. Then create the model using the command ‘ollama create myorg/mymodel’. As long as it’s a supported model architecture, it should just work. If not, you may have better success converting to GGUF with MLX. You can find more information on this in the MLX documentation.
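Here’s the kind of minimal Modelfile I mean - the FROM and ADAPTER paths are relative to wherever you run the create command, and any TEMPLATE or PARAMETER lines would be copied from the model’s page on Ollama:

# Modelfile
FROM ./model
ADAPTER ./adapters

With that saved as Modelfile, creating and running the model looks like:

ollama create myorg/mymodel -f Modelfile
ollama run myorg/mymodel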

We’ve covered a lot in this video: how to prepare your data, set up MLX, and run fine-tuning with different parameters. The process is straightforward: get your data in the right format, choose your parameters, and let MLX handle the heavy lifting. If any part of fine-tuning in general is still confusing, take a look at this video next, which explains how fine-tuning works.

If you try this out yourself, let me know in the comments how it goes. I’d love to hear about what you’re building with MLX. And if you want to see more content about AI development on Apple Silicon, hit subscribe and turn on notifications.

Thanks for watching, and I’ll see you in the next video.