Fine-Tuning with MLX
Rough notes about MLX in preparation for a video.
MLX
Apple Silicon only
I asked in the last video which fine-tuning framework to tackle first. Results: 24 votes for MLX, 6 for Unsloth, 4 for Axolotl.
You have to use Hugging Face-format models; fine-tuning starting from a GGUF model won't work. GGUF models can be used for inference, but not for fine-tuning.
mlx-lm started out as a sample app, but it's the best way to get started with MLX for both generation and fine-tuning. It supports QLoRA, though not with GGUF models.
https://github.com/ml-explore/mlx-examples/blob/main/llms/mlx_lm/LORA.md
The command to use is mlx_lm.lora, implemented in https://github.com/ml-explore/mlx-examples/blob/main/llms/mlx_lm/lora.py. It calls into a separate file for each supported model type.
It supports:
- Mistral
- Llama
- Phi2
- Mixtral
- Qwen2
- Gemma
- OLMo
- MiniCPM
- InternLM2
Clean up the data with jq. The first command strips the rating fields from the validation set; the second renames the response key to completion in the training set, so each line ends up as {"prompt": ..., "completion": ...}:
❯ jq -c 'del(.correctness, .helpfulness, .coherence, .complexity, .verbosity)' oldvalid.jsonl > valid.jsonl
❯ jq -c 'if has("response") then {prompt: .prompt, completion: .response} else . end' train.jsonl > train_fixed.jsonl
mlx_lm.lora \
--model <path_to_model> \
--train \
--data <path_to_data> \
--iters 600
--fine-tune-type defaults to lora, but can be set to full or dora.
--data is the path to the training data. It must be a folder containing train.jsonl and valid.jsonl, or the name of a Hugging Face dataset that is already formatted correctly. Defaults to a folder named data.
--model is the path to the local model directory, or the org/model name on Hugging Face.
❯ mlx_lm.lora --help
usage: mlx_lm.lora [-h] [--model MODEL] [--train] [--data DATA]
[--fine-tune-type {lora,dora,full}]
[--num-layers NUM_LAYERS]
[--batch-size BATCH_SIZE] [--iters ITERS]
[--val-batches VAL_BATCHES]
[--learning-rate LEARNING_RATE]
[--steps-per-report STEPS_PER_REPORT]
[--steps-per-eval STEPS_PER_EVAL]
[--resume-adapter-file RESUME_ADAPTER_FILE]
[--adapter-path ADAPTER_PATH]
[--save-every SAVE_EVERY] [--test]
[--test-batches TEST_BATCHES]
[--max-seq-length MAX_SEQ_LENGTH]
[-c CONFIG] [--grad-checkpoint] [--seed SEED]
LoRA or QLoRA finetuning.
options:
-h, --help show this help message and exit
--model MODEL The path to the local model directory or
Hugging Face repo.
--train Do training
--data DATA Directory with {train, valid, test}.jsonl files
or the name of a Hugging Face dataset
(e.g., 'mlx-community/wikisql')
--fine-tune-type {lora,dora,full}
Type of fine-tuning to perform: lora, dora,
or full.
--num-layers NUM_LAYERS
Number of layers to fine-tune.
Default is 16, use -1 for all.
--batch-size BATCH_SIZE
Minibatch size.
--iters ITERS Iterations to train for.
--val-batches VAL_BATCHES
Number of validation batches,
-1 uses the entire validation set.
--learning-rate LEARNING_RATE
Adam learning rate.
--steps-per-report STEPS_PER_REPORT
Number of training steps between loss reporting.
--steps-per-eval STEPS_PER_EVAL
Number of training steps between validations.
--resume-adapter-file RESUME_ADAPTER_FILE
Load path to resume training from the given
fine-tuned weights.
--adapter-path ADAPTER_PATH
Save/load path for the fine-tuned weights.
--save-every SAVE_EVERY
Save the model every N iterations.
--test Evaluate on the test set after training
--test-batches TEST_BATCHES
Number of test set batches,
-1 uses the entire test set.
--max-seq-length MAX_SEQ_LENGTH
Maximum sequence length.
-c CONFIG, --config CONFIG
A YAML configuration file with the training
options
--grad-checkpoint Use gradient checkpointing to reduce memory use.
--seed SEED The PRNG seed
The ml-explore org on GitHub: Ronan Collobert (andresy), Angelos Katharopoulos (angeloskath), Awni Hannun (awni), Jagrit Digani (jagrit06).
Three repos are updated often: mlx-swift, mlx, and mlx-examples.
MLX is the main framework for Apple Silicon. MIT licensed.
Vectors are just arrays.
MLX is called an array framework because it is fundamentally built around array operations and computations, similar to NumPy, with several key characteristics:
Core Array Features
Array-Centric Design
- Arrays are the primary data structure for performing mathematical and computational operations
- Operations follow a NumPy-like syntax for array manipulation
- Arrays live in shared memory, allowing seamless operations across CPU and GPU
Array Operations
- Supports standard array computations and mathematical operations
- Provides lazy evaluation of array operations until results are needed
- Enables composable transformations on arrays for tasks like automatic differentiation and vectorization
Framework Purpose
The "array framework" designation emphasizes MLX's primary focus on:
- Efficient array computations optimized for Apple Silicon
- Array-based machine learning operations and model building
- High-performance numerical computing through array operations
This array-centric architecture makes MLX particularly suited for machine learning tasks, which heavily rely on array-based mathematical operations and tensor computations.
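To make that concrete, here is a minimal sketch of the NumPy-like array API (assuming MLX is installed via pip install mlx; the calls are from mlx.core):

```python
import mlx.core as mx

# Arrays are the core data structure, created much like NumPy arrays.
a = mx.array([1.0, 2.0, 3.0])
b = mx.arange(3, dtype=mx.float32)

# Standard element-wise math and linear algebra.
c = a * b + 1.0
d = mx.matmul(a.reshape(1, 3), b.reshape(3, 1))

# Composable transformations: mx.grad returns a new function that
# computes the gradient of the original one.
def loss(x):
    return mx.sum(x * x)

grad_fn = mx.grad(loss)
print(grad_fn(a))  # d/dx sum(x^2) = 2x -> array([2, 4, 6])
```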
Lazy computation in MLX provides several significant benefits for users (a short demo follows these lists):
Performance Optimization
Memory Efficiency
- Arrays are only materialized when results are actually needed, reducing memory usage[1]
- Users can instantiate large models without immediately consuming memory until weights are actually used[1]
Computational Efficiency
- Unnecessary calculations are avoided if outputs are never used[1]
- The framework can optimize the computation graph before execution[2]
- Operations can be reordered and reduced when possible through logical plan optimization[4]
Development Benefits
Simplified Programming
- Users don’t need to worry about computing outputs that won’t be used[1]
- Function transformations like automatic differentiation and vectorization become easier to implement[1]
- Computation graphs are built dynamically, avoiding slow recompilations when shapes change[5]
Flexible Device Usage
- Operations can be performed on either CPU or GPU without explicit data transfers[2]
- Users can choose the execution device (CPU/GPU) at runtime rather than during array creation[5]
Code Maintainability
Clean Implementation
- Allows for consistent function signatures while keeping implementations simple[7]
- Supports building complex array transformations through composable functions[5]
- Enables graph optimizations without complicating the user code[1]
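A sketch of what laziness looks like in practice, using mx.eval and per-operation streams from the MLX docs:

```python
import mlx.core as mx

a = mx.random.normal((4096, 4096))
b = mx.random.normal((4096, 4096))

# Nothing is computed yet: c is just a node in the computation graph.
c = mx.matmul(a, b)

# Work happens only when a result is needed, either explicitly...
mx.eval(c)

# ...or implicitly, e.g. when printing or converting an array.
print(c[0, 0])

# Devices are chosen per operation at run time; arrays live in unified
# memory, so no explicit CPU<->GPU transfers are needed.
d = mx.add(a, b, stream=mx.cpu)
e = mx.add(a, b, stream=mx.gpu)
mx.eval(d, e)
```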
Sources
[1] Lazy Evaluation — MLX 0.22.0 documentation: https://ml-explore.github.io/mlx/build/html/usage/lazy_evaluation.html
[2] Apple MLX: Python Framework for Apple Silicon - Dev-kit: https://dev-kit.io/blog/machine-learning/apple-mlx-python
[3] What are the advantages of Lazy Evaluation? - Stack Overflow: https://stackoverflow.com/questions/2151226/what-are-the-advantages-of-lazy-evaluation
[4] What are the benefits of lazy evaluation? - r/haskell, Reddit: https://www.reddit.com/r/haskell/comments/19b0q7d/what_are_the_benefits_of_lazy_evaluation/
[5] Apple Open-sources Apple Silicon-Optimized Machine Learning … - InfoQ: https://www.infoq.com/news/2023/12/apple-silicon-machine-learning/
[6] Lazy Evaluation: Why is it faster, advantages vs disadvantages … - Stack Overflow: https://stackoverflow.com/questions/24704503/lazy-evaluation-why-is-it-faster-advantages-vs-disadvantages-mechanics-why-i
[7] Awni Hannun on X: "A nice ergonomic benefit of lazy computation in …": https://twitter.com/awnihannun/status/1800526893080531151
The mlx-examples repository (github.com/ml-explore/mlx-examples) is the primary and more comprehensive source for MLX examples, while the examples in the main MLX repo are more basic. Here’s why they serve different purposes:
Main MLX Repository Examples
- Contains basic examples to demonstrate core MLX functionality
- Focuses on fundamental concepts like array operations and basic model implementations
- Serves as a quick start guide for new users learning the framework basics
MLX Examples Repository
Comprehensive Coverage
- Houses a wide variety of standalone, production-ready examples[4]
- Includes complex implementations like:
- Text models (LLaMA, Mistral, Mixtral)
- Image generation models
- Speech recognition with Whisper
- Audio compression with EnCodec
Active Development
- Regularly updated with new examples and improvements[2]
- Contains complete applications and tools like MNISTTrainer and LLMEval
- Provides practical implementations that showcase real-world usage
The separation allows the main MLX repository to remain focused on core framework functionality while the examples repository can grow independently with more complex and diverse implementations[5]. This structure makes it easier for users to find relevant examples without cluttering the main framework repository.
Sources
[1] Buffers, Streams and Samples — MLX Data 0.0.2 documentation: https://ml-explore.github.io/mlx-data/build/html/buffers_streams_samples.html
[2] The Dispatch Demo - ml-explore/mlx-swift-examples: https://thedispatch.ai/reports/826/
[3] Part 3: Fine-tuning your LLM using the MLX framework - Andy Peatling: https://apeatling.com/articles/part-3-fine-tuning-your-llm-using-the-mlx-framework/
[4] ml-explore/mlx-examples - GitHub: https://github.com/ml-explore/mlx-examples
[5] ml-explore/mlx: MLX: An array framework for Apple silicon - GitHub: https://github.com/ml-explore/mlx?tab=readme-ov-file
[6] CLIP with MLX — Adopting the advancement in multimodal learning: https://unfoldai.com/clip-with-mlx-multimodal/
[7] X-mas Special: Getting started with MLX for Apple Silicon - LinkedIn: https://www.linkedin.com/pulse/x-mas-special-getting-started-mlx-apple-silicon-won-bae-suh-87gnc
[8] Getting started with Apple MLX | ML_NEWS3 – Weights & Biases: https://wandb.ai/byyoung3/ML_NEWS3/reports/Getting-started-with-Apple-MLX--Vmlldzo5Njk5MTk1
[9] mlx-examples/llms/README.md at main - GitHub: https://github.com/ml-explore/mlx-examples/blob/main/llms/README.md
[10] Day 3: Running the MLX sample app on macOS, iOS and visionOS: https://antran.app/2024/mlx_swift_examples/
[11] mlx-examples/llms/mlx_lm/MANAGE.md at main - GitHub: https://github.com/ml-explore/mlx-examples/blob/main/llms/mlx_lm/MANAGE.md
[12] Fine-tuning LLMs with Apple MLX locally - Niklas Heidloff: https://heidloff.net/article/apple-mlx-fine-tuning/
[13] Curtis Poe on LinkedIn: GitHub - ml-explore/mlx-examples: https://www.linkedin.com/posts/curtispoe_github-ml-exploremlx-examples-examples-activity-7180520608777003008-OSt4
[14] Troubleshooting | Documentation - Swift Package Index: https://swiftpackageindex.com/ml-explore/mlx-swift/0.21.2/documentation/mlx/troubleshooting
[15] Part 4: Testing and interacting with your fine-tuned LLM - Andy Peatling: https://apeatling.com/articles/part-4-testing-and-interacting-with-your-fine-tuned-llm/
[16] MLX - Day 4: Evaluate LLM Models from CLI with MLX on macOS: https://antran.app/2024/mlx_swift_example_cli/
[17] Quick Start Guide — MLX Data 0.0.2 documentation: https://ml-explore.github.io/mlx-data/build/html/quick_start.html
[18] X-mas Special: Getting started with MLX for Apple Silicon - LinkedIn: https://www.linkedin.com/pulse/x-mas-special-getting-started-mlx-apple-silicon-won-bae-suh-87gnc
[19] Compilation — MLX 0.22.0 documentation: https://ml-explore.github.io/mlx/build/html/usage/compile.html
[20] Getting started with Apple MLX | ML_NEWS3 – Weights & Biases: https://wandb.ai/byyoung3/ML_NEWS3/reports/Getting-started-with-Apple-MLX--Vmlldzo5Njk5MTk1
[21] Part 3: Fine-tuning your LLM using the MLX framework - Andy Peatling: https://apeatling.com/articles/part-3-fine-tuning-your-llm-using-the-mlx-framework/
[22] Articles – Andy Peatling: https://apeatling.com/articles/
[23] CLIP with MLX — Adopting the advancement in multimodal learning: https://unfoldai.com/clip-with-mlx-multimodal/
[24] ml-explore/mlx: MLX: An array framework for Apple silicon - GitHub: https://github.com/ml-explore/mlx?tab=readme-ov-file
[25] MLX 0.22.0 documentation: https://ml-explore.github.io/mlx/build/html/index.html
[26] Day 2: Getting started with MLX Swift Examples - An Tran: https://antran.app/2024/mlx_swift_getting_started/
[27] Apple Releases 'MLX' - ML Framework for Apple Silicon - Reddit: https://www.reddit.com/r/LocalLLaMA/comments/18bwd1y/apple_releases_mlx_ml_framework_for_apple_silicon/
[28] Patrick Loeber on X: "One of my current favorite repos is mlx …": https://twitter.com/patloeber/status/1745378383662342472
MLX provides several fine-tuning examples and capabilities, particularly through the mlx-lm package:
Supported Models
MLX supports fine-tuning for multiple model families:
- Mistral
- LLaMA
- Phi2
- Mixtral
- Qwen2
- Gemma
- OLMo
- MiniCPM
- InternLM2[1]
Fine-tuning Methods
Available Approaches
- LoRA (Low Rank Adaptation) - default method
- QLoRA (Quantized LoRA)
- DoRA
- Full model fine-tuning[1]
Implementation Example
mlx_lm.lora \
--model mistralai/Mistral-7B-v0.1 \
--train \
--batch-size 1 \
--num-layers 4 \
--data wikisql
Performance Considerations
Resource Management
- Gradient checkpointing allows trading memory for computation
- On M1 Max with 32GB RAM, achieves about 250 tokens-per-second using the WikiSQL dataset[1]
Data Requirements
- Requires train.jsonl, valid.jsonl for training
- test.jsonl for evaluation
- Data should be properly formatted in JSONL format[1][2]; a sketch of the record shapes follows this list
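For reference, a hedged sketch of the record shapes LORA.md describes (completions, chat, and plain text); the example strings here are made up, and each file should stick to a single shape:

```python
import json

# Completions style: the prompt/completion keys match what the jq
# cleanup earlier in these notes produces.
completions = {"prompt": "What is the capital of France?",
               "completion": "Paris."}

# Chat style.
chat = {"messages": [
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "Paris."},
]}

# Plain-text style.
text = {"text": "Q: What is the capital of France?\nA: Paris."}

# One JSON object per line; in practice, use only one shape per file.
for record in (completions, chat, text):
    print(json.dumps(record))
```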
The examples demonstrate practical implementations ranging from basic model adaptation to complex fine-tuning scenarios, making it accessible for users to customize models for specific tasks on Apple Silicon hardware.
Sources
[1] Fine-Tuning with LoRA or QLoRA - ml-explore/mlx-examples - GitHub: https://github.com/ml-explore/mlx-examples/blob/main/llms/mlx_lm/LORA.md
[2] A simple guide to local LLM fine-tuning on a Mac with MLX - Andy Peatling: https://apeatling.com/articles/simple-guide-to-local-llm-fine-tuning-on-a-mac-with-mlx/
[3] ml-explore/mlx-examples - GitHub: https://github.com/ml-explore/mlx-examples
[4] Fine Tune a model with MLX for Ollama - YouTube: https://www.youtube.com/watch?v=3UQ7GY9hNwk
[5] Using MLX at Hugging Face: https://huggingface.co/docs/hub/en/mlx
[6] Part 3: Fine-tuning your LLM using the MLX framework - Andy Peatling: https://apeatling.com/articles/part-3-fine-tuning-your-llm-using-the-mlx-framework/
[7] Patrick Loeber on X: "One of my current favorite repos is mlx …": https://twitter.com/patloeber/status/1745378383662342472
[8] Apple MLX Fine Tuning: Complete Crash Course for Beginners - YouTube: https://www.youtube.com/watch?v=sI1uKhagm7c
[9] Doh! Let's clear up fine tuning - YouTube: https://www.youtube.com/watch?v=qfaK5mYzc4E
[10] apeatling/simple-guide-to-mlx-finetuning - GitHub: https://github.com/apeatling/simple-guide-to-mlx-finetuning
[11] A simple guide to local LLM fine-tuning on a Mac with MLX - Reddit: https://www.reddit.com/r/LocalLLaMA/comments/191s7x3/a_simple_guide_to_local_llm_finetuning_on_a_mac/
[12] Fine-tuning LLMs with Apple MLX locally - Niklas Heidloff: https://heidloff.net/article/apple-mlx-fine-tuning/
Basic Setup
First install the required package:
pip install mlx-lm
Fine-tuning Process
Data Preparation
- Create three JSONL files in your data directory (a split-script sketch follows this list):
  - train.jsonl
  - valid.jsonl
  - test.jsonl (for evaluation)
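A sketch of one way to produce those three files from a single cleaned JSONL file; the 80/10/10 split and the train_fixed.jsonl input name (from the jq step above) are my choices, not MLX requirements:

```python
import json
import random

# Split one cleaned JSONL dataset into the train/valid/test files
# that mlx_lm.lora expects inside the --data directory.
with open("train_fixed.jsonl") as f:
    records = [json.loads(line) for line in f if line.strip()]

random.seed(0)
random.shuffle(records)

n = len(records)
splits = {
    "train.jsonl": records[:int(0.8 * n)],
    "valid.jsonl": records[int(0.8 * n):int(0.9 * n)],
    "test.jsonl": records[int(0.9 * n):],
}

for name, rows in splits.items():
    with open(name, "w") as out:
        for row in rows:
            out.write(json.dumps(row) + "\n")
```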
Basic Fine-tuning Command
mlx_lm.lora \
--model mistralai/Mistral-7B-v0.1 \
--train \
--data <path_to_data> \
--iters 600
Configuration Options
Memory Optimization
- Reduce batch size: --batch-size 1 (default is 4)
- Decrease number of layers: --num-layers 4 (default is 16)
- Enable gradient checkpointing: --grad-checkpoint
Training Parameters
- Choose fine-tuning type: --fine-tune-type [lora|dora|full]
- Specify adapter save location: --adapter-path <path>
- Resume training: --resume-adapter-file <path_to_adapters.safetensors>
Evaluation and Generation
Test Model
mlx_lm.lora \
--model <path_to_model> \
--adapter-path <path_to_adapters> \
--data <path_to_data> \
--test
Generate Text
mlx_lm.generate \
--model <path_to_model> \
--adapter-path <path_to_adapters> \
--prompt "<your_prompt>"
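The same thing works from Python via mlx-lm's load/generate helpers. A minimal sketch; the model name and prompt are placeholders, and exact keyword arguments can vary between mlx-lm versions:

```python
from mlx_lm import load, generate

# Load the base model and apply the fine-tuned LoRA adapters on top.
# "adapters" is the default --adapter-path used by mlx_lm.lora.
model, tokenizer = load("mistralai/Mistral-7B-v0.1",
                        adapter_path="adapters")

# Generate with the adapted model.
text = generate(model, tokenizer,
                prompt="Write a SQL query that counts users by country.",
                max_tokens=256, verbose=True)
```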
Supported Models
LoRA fine-tuning works with the same model families listed earlier (Mistral, LLaMA, Phi2, Mixtral, Qwen2, Gemma, OLMo, MiniCPM, InternLM2).
Here are the key issues faced during fine-tuning with MLX and their solutions:
Memory Issues
Out of Memory Errors
- Reduce batch size to 1 or 2 to minimize memory usage[1]
- Keep training data lines under 3500 tokens[1]
- Close memory-intensive applications like browsers during training[1]
- Use gradient checkpointing with the --grad-checkpoint flag to trade memory for computation[4]
Model Size Constraints
- Reduce the number of LoRA layers using --num-layers 4 (default is 16)[1]
- Full model fine-tuning in fp32 requires significant memory; even 64GB may not be enough[5]
- Using fp16 requires less memory but can cause stability issues[5]
Numerical Stability
NaN Values in Training
- Fine-tuning in fp16 can lead to numerical overflow issues[5]
- Consider using fp32 for more stable training, though it’s slower and more memory intensive[5]
- Adjust learning rate to a smaller value (e.g., 1e-6) to improve stability[5]
Model Compatibility
Adapter Issues
- Quantized models need to be in MLX’s specific format[7]
- Standard GGUF files from Ollama aren’t directly compatible[7]
- Some models require Hugging Face login to access[7]
Performance Optimization
Training Speed
- Larger batch sizes improve speed but require more memory[4]
- fp32 training is approximately 10x slower than fp16[5]
- On M1 Max with 32GB RAM, expect around 250 tokens-per-second with recommended settings[4]
Data Preparation
Format Requirements
- Data must be split into train.jsonl, valid.jsonl, and test.jsonl[8]
- Proper formatting of instruction-response pairs is crucial[1]
- Maintain consistent formatting across all dataset files[1] (a quick pre-flight check sketch follows this list)
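A rough pre-flight check for these requirements, a sketch only: the 3500-token figure is the rule of thumb from [1], and whitespace splitting only approximates real tokenizer counts:

```python
import json

# Accepted record shapes (completions, plain text, chat).
SHAPES = ({"prompt", "completion"}, {"text"}, {"messages"})

for name in ("train.jsonl", "valid.jsonl", "test.jsonl"):
    with open(name) as f:
        for i, line in enumerate(f, 1):
            try:
                record = json.loads(line)
            except json.JSONDecodeError as e:
                print(f"{name}:{i}: invalid JSON ({e})")
                continue
            # Each line should match one of the accepted shapes.
            if not any(keys <= record.keys() for keys in SHAPES):
                print(f"{name}:{i}: unexpected keys {sorted(record)}")
            # Crude length check; the real limit depends on the tokenizer.
            if len(line.split()) > 3500:
                print(f"{name}:{i}: probably over the ~3500-token guideline")
```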
Sources
[1] Lessons learned so far: LoRA Fine tuning on Macbook Air with MLX - Reddit: https://www.reddit.com/r/LocalLLaMA/comments/18wabkc/lessons_learned_so_far_lora_fine_tuning_on/
[2] How to correctly save a fine tuned model using apple MLX framework - Stack Overflow: https://stackoverflow.com/questions/78815544/how-to-correctly-save-a-fine-tuned-model-using-apple-mlx-framework
[3] A simple guide to local LLM fine-tuning on a Mac with MLX - Andy Peatling: https://apeatling.com/articles/simple-guide-to-local-llm-fine-tuning-on-a-mac-with-mlx/
[4] Fine-Tuning with LoRA or QLoRA - ml-explore/mlx-examples - GitHub: https://github.com/ml-explore/mlx-examples/blob/main/llms/mlx_lm/LORA.md
[5] [Feature Request] Full-Tuning Example · Issue · ml-explore/mlx …: https://github.com/ml-explore/mlx-examples/issues/297
[6] Part 3: Fine-tuning your LLM using the MLX framework - Andy Peatling: https://apeatling.com/articles/part-3-fine-tuning-your-llm-using-the-mlx-framework/
[7] Fine Tune a model with MLX for Ollama - YouTube: https://www.youtube.com/watch?v=3UQ7GY9hNwk
[8] Doh! Let's clear up fine tuning - YouTube: https://www.youtube.com/watch?v=qfaK5mYzc4E
[9] Dawei Wang - MLX Framework FAQ Explained - LinkedIn: https://www.linkedin.com/posts/win-wang_mlx-framework-faq-explained-model-support-activity-7244708476793323520-3U56
[10] Fine-tuning LLMs with Apple MLX locally - Niklas Heidloff: https://heidloff.net/article/apple-mlx-fine-tuning/
[11] A simple guide to local LLM fine-tuning on a Mac with MLX - Reddit: https://www.reddit.com/r/LocalLLaMA/comments/191s7x3/a_simple_guide_to_local_llm_finetuning_on_a_mac/
[12] Fine Tune a model with MLX for Ollama - YouTube: https://www.youtube.com/watch?v=3UQ7GY9hNwk
[13] Apple MLX Fine Tuning: Complete Crash Course for Beginners - YouTube: https://www.youtube.com/watch?v=sI1uKhagm7c
[14] apeatling/simple-guide-to-mlx-finetuning - GitHub: https://github.com/apeatling/simple-guide-to-mlx-finetuning
[15] Issues · apeatling/simple-guide-to-mlx-finetuning - GitHub: https://github.com/apeatling/simple-guide-to-mlx-finetuning/issues
[16] Beginner's guide to fine-tuning models using MLX on Apple Silicon - LinkedIn: https://www.linkedin.com/posts/win-wang_beginners-guide-to-fine-tuning-models-using-activity-7247245192183853056-BhgX