Fine-Tuning with MLX
Rough notes about MLX in preparation for a video.
MLX
Apple Silicon only
I asked in the last video which fine-tuning framework to tackle first. Results: 24 votes for MLX, 6 for Unsloth, 4 for Axolotl.
You have to use Hugging Face-format models; fine-tuning starting from a GGUF model won't work. GGUF models can be used for inference, but not for fine-tuning.
mlx-lm started out as a sample app, but it's the best way to get started with MLX for both generation and fine-tuning. It supports QLoRA, though not with GGUF models.
https://github.com/ml-explore/mlx-examples/blob/main/llms/mlx_lm/LORA.md
The command to use is mlx_lm.lora, implemented in https://github.com/ml-explore/mlx-examples/blob/main/llms/mlx_lm/lora.py. It calls into a separate file for each supported model type.
It supports:
- Mistral
- Llama
- Phi2
- Mixtral
- Qwen2
- Gemma
- OLMo
- MiniCPM
- InternLM2
Clean up the data with jq. The first command strips the rating fields from the validation set; the second renames the response key to completion in the training set, so each line ends up as {"prompt": ..., "completion": ...}:
❯ jq -c 'del(.correctness, .helpfulness, .coherence, .complexity, .verbosity)' oldvalid.jsonl > valid.jsonl
❯ jq -c 'if has("response") then {prompt: .prompt, completion: .response} else . end' train.jsonl > train_fixed.jsonl
mlx_lm.lora \
--model <path_to_model> \
--train \
--data <path_to_data> \
--iters 600
--fine-tune-type defaults to lora, but can be set to full or dora.
--data is the path to the training data. It must be a folder containing train.jsonl and valid.jsonl, or the name of a Hugging Face dataset that is already formatted correctly. Defaults to a folder named data.
--model is the path to the local model directory, or the org/model name on Hugging Face.
❯ mlx_lm.lora --help
usage: mlx_lm.lora [-h] [--model MODEL] [--train] [--data DATA]
[--fine-tune-type {lora,dora,full}]
[--num-layers NUM_LAYERS]
[--batch-size BATCH_SIZE] [--iters ITERS]
[--val-batches VAL_BATCHES]
[--learning-rate LEARNING_RATE]
[--steps-per-report STEPS_PER_REPORT]
[--steps-per-eval STEPS_PER_EVAL]
[--resume-adapter-file RESUME_ADAPTER_FILE]
[--adapter-path ADAPTER_PATH]
[--save-every SAVE_EVERY] [--test]
[--test-batches TEST_BATCHES]
[--max-seq-length MAX_SEQ_LENGTH]
[-c CONFIG] [--grad-checkpoint] [--seed SEED]
LoRA or QLoRA finetuning.
options:
-h, --help show this help message and exit
--model MODEL The path to the local model directory or
Hugging Face repo.
--train Do training
--data DATA Directory with {train, valid, test}.jsonl files
or the name of a Hugging Face dataset
(e.g., 'mlx-community/wikisql')
--fine-tune-type {lora,dora,full}
Type of fine-tuning to perform: lora, dora,
or full.
--num-layers NUM_LAYERS
Number of layers to fine-tune.
Default is 16, use -1 for all.
--batch-size BATCH_SIZE
Minibatch size.
--iters ITERS Iterations to train for.
--val-batches VAL_BATCHES
Number of validation batches,
-1 uses the entire validation set.
--learning-rate LEARNING_RATE
Adam learning rate.
--steps-per-report STEPS_PER_REPORT
Number of training steps between loss reporting.
--steps-per-eval STEPS_PER_EVAL
Number of training steps between validations.
--resume-adapter-file RESUME_ADAPTER_FILE
Load path to resume training from the given
fine-tuned weights.
--adapter-path ADAPTER_PATH
Save/load path for the fine-tuned weights.
--save-every SAVE_EVERY
Save the model every N iterations.
--test Evaluate on the test set after training
--test-batches TEST_BATCHES
Number of test set batches,
-1 uses the entire test set.
--max-seq-length MAX_SEQ_LENGTH
Maximum sequence length.
-c CONFIG, --config CONFIG
A YAML configuration file with the training
options
--grad-checkpoint Use gradient checkpointing to reduce memory use.
--seed SEED The PRNG seed
The ml-explore org on GitHub: Ronan Collobert (andresy), Angelos Katharopoulos (angeloskath), Awni Hannun (awni), Jagrit Digani (jagrit06).
Three repos are updated often: mlx-swift, mlx, and mlx-examples.
MLX is the main framework for Apple Silicon. MIT licensed.
Vectors are just arrays.
MLX is called an array framework because it is fundamentally built around array operations and computations, similar to NumPy, with several key characteristics:
Core Array Features
Array-Centric Design
- Arrays are the primary data structure for performing mathematical and computational operations
- Operations follow a NumPy-like syntax for array manipulation
- Arrays live in shared memory, allowing seamless operations across CPU and GPU
Array Operations
- Supports standard array computations and mathematical operations
- Provides lazy evaluation of array operations until results are needed
- Enables composable transformations on arrays for tasks like automatic differentiation and vectorization
Framework Purpose
The "array framework" designation emphasizes MLX's primary focus on:
- Efficient array computations optimized for Apple Silicon
- Array-based machine learning operations and model building
- High-performance numerical computing through array operations
This array-centric architecture makes MLX particularly suited for machine learning tasks, which heavily rely on array-based mathematical operations and tensor computations.
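To make that concrete, here is a minimal sketch of the NumPy-like array API (assuming MLX is installed via pip install mlx; the calls are from mlx.core):

```python
import mlx.core as mx

# Arrays are the core data structure, created much like NumPy arrays.
a = mx.array([1.0, 2.0, 3.0])
b = mx.arange(3, dtype=mx.float32)

# Standard element-wise math and linear algebra.
c = a * b + 1.0
d = mx.matmul(a.reshape(1, 3), b.reshape(3, 1))

# Composable transformations: mx.grad returns a new function that
# computes the gradient of the original one.
def loss(x):
    return mx.sum(x * x)

grad_fn = mx.grad(loss)
print(grad_fn(a))  # d/dx sum(x^2) = 2x -> array([2, 4, 6])
```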
Lazy computation in MLX provides several significant benefits for users (a short demo follows these lists):
Performance Optimization
Memory Efficiency
- Arrays are only materialized when results are actually needed, reducing memory usage[1]
- Users can instantiate large models without immediately consuming memory until weights are actually used[1]
Computational Efficiency
- Unnecessary calculations are avoided if outputs are never used[1]
- The framework can optimize the computation graph before execution[2]
- Operations can be reordered and reduced when possible through logical plan optimization[4]
Development Benefits
Simplified Programming
- Users don’t need to worry about computing outputs that won’t be used[1]
- Function transformations like automatic differentiation and vectorization become easier to implement[1]
- Computation graphs are built dynamically, avoiding slow recompilations when shapes change[5]
Flexible Device Usage
- Operations can be performed on either CPU or GPU without explicit data transfers[2]
- Users can choose the execution device (CPU/GPU) at runtime rather than during array creation[5]
Code Maintainability
Clean Implementation
- Allows for consistent function signatures while keeping implementations simple[7]
- Supports building complex array transformations through composable functions[5]
- Enables graph optimizations without complicating the user code[1]
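A sketch of what laziness looks like in practice, using mx.eval and per-operation streams from the MLX docs:

```python
import mlx.core as mx

a = mx.random.normal((4096, 4096))
b = mx.random.normal((4096, 4096))

# Nothing is computed yet: c is just a node in the computation graph.
c = mx.matmul(a, b)

# Work happens only when a result is needed, either explicitly...
mx.eval(c)

# ...or implicitly, e.g. when printing or converting an array.
print(c[0, 0])

# Devices are chosen per operation at run time; arrays live in unified
# memory, so no explicit CPU<->GPU transfers are needed.
d = mx.add(a, b, stream=mx.cpu)
e = mx.add(a, b, stream=mx.gpu)
mx.eval(d, e)
```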
Sources
[1] Lazy Evaluation — MLX 0.22.0 documentation: https://ml-explore.github.io/mlx/build/html/usage/lazy_evaluation.html
[2] Apple MLX: Python Framework for Apple Silicon - Dev-kit: https://dev-kit.io/blog/machine-learning/apple-mlx-python
[3] What are the advantages of Lazy Evaluation? - Stack Overflow: https://stackoverflow.com/questions/2151226/what-are-the-advantages-of-lazy-evaluation
[4] What are the benefits of lazy evaluation? - r/haskell, Reddit: https://www.reddit.com/r/haskell/comments/19b0q7d/what_are_the_benefits_of_lazy_evaluation/
[5] Apple Open-sources Apple Silicon-Optimized Machine Learning … - InfoQ: https://www.infoq.com/news/2023/12/apple-silicon-machine-learning/
[6] Lazy Evaluation: Why is it faster, advantages vs disadvantages … - Stack Overflow: https://stackoverflow.com/questions/24704503/lazy-evaluation-why-is-it-faster-advantages-vs-disadvantages-mechanics-why-i
[7] Awni Hannun on X: "A nice ergonomic benefit of lazy computation in …": https://twitter.com/awnihannun/status/1800526893080531151
The mlx-examples repository (github.com/ml-explore/mlx-examples) is the primary and more comprehensive source for MLX examples, while the examples in the main MLX repo are more basic. Here’s why they serve different purposes:
Main MLX Repository Examples
- Contains basic examples to demonstrate core MLX functionality
- Focuses on fundamental concepts like array operations and basic model implementations
- Serves as a quick start guide for new users learning the framework basics
MLX Examples Repository
Comprehensive Coverage
- Houses a wide variety of standalone, production-ready examples[4]
- Includes complex implementations like:
- Text models (LLaMA, Mistral, Mixtral)
- Image generation models
- Speech recognition with Whisper
- Audio compression with EnCodec
Active Development
- Regularly updated with new examples and improvements[2]
- Contains complete applications and tools like MNISTTrainer and LLMEval
- Provides practical implementations that showcase real-world usage
The separation allows the main MLX repository to remain focused on core framework functionality while the examples repository can grow independently with more complex and diverse implementations[5]. This structure makes it easier for users to find relevant examples without cluttering the main framework repository.
Sources
[1] Buffers, Streams and Samples — MLX Data 0.0.2 documentation: https://ml-explore.github.io/mlx-data/build/html/buffers_streams_samples.html
[2] The Dispatch Demo - ml-explore/mlx-swift-examples: https://thedispatch.ai/reports/826/
[3] Part 3: Fine-tuning your LLM using the MLX framework - Andy Peatling: https://apeatling.com/articles/part-3-fine-tuning-your-llm-using-the-mlx-framework/
[4] ml-explore/mlx-examples - GitHub: https://github.com/ml-explore/mlx-examples
[5] ml-explore/mlx: MLX: An array framework for Apple silicon - GitHub: https://github.com/ml-explore/mlx?tab=readme-ov-file
[6] CLIP with MLX — Adopting the advancement in multimodal learning: https://unfoldai.com/clip-with-mlx-multimodal/
[7] X-mas Special: Getting started with MLX for Apple Silicon - LinkedIn: https://www.linkedin.com/pulse/x-mas-special-getting-started-mlx-apple-silicon-won-bae-suh-87gnc
[8] Getting started with Apple MLX | ML_NEWS3 – Weights & Biases: https://wandb.ai/byyoung3/ML_NEWS3/reports/Getting-started-with-Apple-MLX--Vmlldzo5Njk5MTk1
[9] mlx-examples/llms/README.md at main - GitHub: https://github.com/ml-explore/mlx-examples/blob/main/llms/README.md
[10] Day 3: Running the MLX sample app on macOS, iOS and visionOS: https://antran.app/2024/mlx_swift_examples/
[11] mlx-examples/llms/mlx_lm/MANAGE.md at main - GitHub: https://github.com/ml-explore/mlx-examples/blob/main/llms/mlx_lm/MANAGE.md
[12] Fine-tuning LLMs with Apple MLX locally - Niklas Heidloff: https://heidloff.net/article/apple-mlx-fine-tuning/
[13] Curtis Poe on LinkedIn: GitHub - ml-explore/mlx-examples: https://www.linkedin.com/posts/curtispoe_github-ml-exploremlx-examples-examples-activity-7180520608777003008-OSt4
[14] Troubleshooting | Documentation - Swift Package Index: https://swiftpackageindex.com/ml-explore/mlx-swift/0.21.2/documentation/mlx/troubleshooting
[15] Part 4: Testing and interacting with your fine-tuned LLM - Andy Peatling: https://apeatling.com/articles/part-4-testing-and-interacting-with-your-fine-tuned-llm/
[16] MLX - Day 4: Evaluate LLM Models from CLI with MLX on macOS: https://antran.app/2024/mlx_swift_example_cli/
[17] Quick Start Guide — MLX Data 0.0.2 documentation: https://ml-explore.github.io/mlx-data/build/html/quick_start.html
[18] X-mas Special: Getting started with MLX for Apple Silicon - LinkedIn: https://www.linkedin.com/pulse/x-mas-special-getting-started-mlx-apple-silicon-won-bae-suh-87gnc
[19] Compilation — MLX 0.22.0 documentation: https://ml-explore.github.io/mlx/build/html/usage/compile.html
[20] Getting started with Apple MLX | ML_NEWS3 – Weights & Biases: https://wandb.ai/byyoung3/ML_NEWS3/reports/Getting-started-with-Apple-MLX--Vmlldzo5Njk5MTk1
[21] Part 3: Fine-tuning your LLM using the MLX framework - Andy Peatling: https://apeatling.com/articles/part-3-fine-tuning-your-llm-using-the-mlx-framework/
[22] Articles – Andy Peatling: https://apeatling.com/articles/
[23] CLIP with MLX — Adopting the advancement in multimodal learning: https://unfoldai.com/clip-with-mlx-multimodal/
[24] ml-explore/mlx: MLX: An array framework for Apple silicon - GitHub: https://github.com/ml-explore/mlx?tab=readme-ov-file
[25] MLX 0.22.0 documentation: https://ml-explore.github.io/mlx/build/html/index.html
[26] Day 2: Getting started with MLX Swift Examples - An Tran: https://antran.app/2024/mlx_swift_getting_started/
[27] Apple Releases 'MLX' - ML Framework for Apple Silicon - Reddit: https://www.reddit.com/r/LocalLLaMA/comments/18bwd1y/apple_releases_mlx_ml_framework_for_apple_silicon/
[28] Patrick Loeber on X: "One of my current favorite repos is mlx …": https://twitter.com/patloeber/status/1745378383662342472
MLX provides several fine-tuning examples and capabilities, particularly through the mlx-lm package:
Supported Models
MLX supports fine-tuning for multiple model families:
- Mistral
- LLaMA
- Phi2
- Mixtral
- Qwen2
- Gemma
- OLMo
- MiniCPM
- InternLM2[1]
Fine-tuning Methods
Available Approaches
- LoRA (Low Rank Adaptation) - default method
- QLoRA (Quantized LoRA)
- DoRA
- Full model fine-tuning[1]
Implementation Example
mlx_lm.lora \
--model mistralai/Mistral-7B-v0.1 \
--train \
--batch-size 1 \
--num-layers 4 \
--data wikisql
Performance Considerations
Resource Management
- Gradient checkpointing allows trading memory for computation
- On M1 Max with 32GB RAM, achieves about 250 tokens-per-second using the WikiSQL dataset[1]
Data Requirements
- Requires train.jsonl, valid.jsonl for training
- test.jsonl for evaluation
- Data should be properly formatted in JSONL format[1][2]; a sketch of the record shapes follows this list
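For reference, a hedged sketch of the record shapes LORA.md describes (completions, chat, and plain text); the example strings here are made up, and each file should stick to a single shape:

```python
import json

# Completions style: the prompt/completion keys match what the jq
# cleanup earlier in these notes produces.
completions = {"prompt": "What is the capital of France?",
               "completion": "Paris."}

# Chat style.
chat = {"messages": [
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "Paris."},
]}

# Plain-text style.
text = {"text": "Q: What is the capital of France?\nA: Paris."}

# One JSON object per line; in practice, use only one shape per file.
for record in (completions, chat, text):
    print(json.dumps(record))
```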
The examples demonstrate practical implementations ranging from basic model adaptation to complex fine-tuning scenarios, making it accessible for users to customize models for specific tasks on Apple Silicon hardware.
Sources
[1] Fine-Tuning with LoRA or QLoRA - ml-explore/mlx-examples - GitHub: https://github.com/ml-explore/mlx-examples/blob/main/llms/mlx_lm/LORA.md
[2] A simple guide to local LLM fine-tuning on a Mac with MLX - Andy Peatling: https://apeatling.com/articles/simple-guide-to-local-llm-fine-tuning-on-a-mac-with-mlx/
[3] ml-explore/mlx-examples - GitHub: https://github.com/ml-explore/mlx-examples
[4] Fine Tune a model with MLX for Ollama - YouTube: https://www.youtube.com/watch?v=3UQ7GY9hNwk
[5] Using MLX at Hugging Face: https://huggingface.co/docs/hub/en/mlx
[6] Part 3: Fine-tuning your LLM using the MLX framework - Andy Peatling: https://apeatling.com/articles/part-3-fine-tuning-your-llm-using-the-mlx-framework/
[7] Patrick Loeber on X: "One of my current favorite repos is mlx …": https://twitter.com/patloeber/status/1745378383662342472
[8] Apple MLX Fine Tuning: Complete Crash Course for Beginners - YouTube: https://www.youtube.com/watch?v=sI1uKhagm7c
[9] Doh! Let's clear up fine tuning - YouTube: https://www.youtube.com/watch?v=qfaK5mYzc4E
[10] apeatling/simple-guide-to-mlx-finetuning - GitHub: https://github.com/apeatling/simple-guide-to-mlx-finetuning
[11] A simple guide to local LLM fine-tuning on a Mac with MLX - Reddit: https://www.reddit.com/r/LocalLLaMA/comments/191s7x3/a_simple_guide_to_local_llm_finetuning_on_a_mac/
[12] Fine-tuning LLMs with Apple MLX locally - Niklas Heidloff: https://heidloff.net/article/apple-mlx-fine-tuning/
Basic Setup
First install the required package:
pip install mlx-lm
Fine-tuning Process
Data Preparation
- Create three JSONL files in your data directory (a split-script sketch follows this list):
  - train.jsonl
  - valid.jsonl
  - test.jsonl (for evaluation)
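A sketch of one way to produce those three files from a single cleaned JSONL file; the 80/10/10 split and the train_fixed.jsonl input name (from the jq step above) are my choices, not MLX requirements:

```python
import json
import random

# Split one cleaned JSONL dataset into the train/valid/test files
# that mlx_lm.lora expects inside the --data directory.
with open("train_fixed.jsonl") as f:
    records = [json.loads(line) for line in f if line.strip()]

random.seed(0)
random.shuffle(records)

n = len(records)
splits = {
    "train.jsonl": records[:int(0.8 * n)],
    "valid.jsonl": records[int(0.8 * n):int(0.9 * n)],
    "test.jsonl": records[int(0.9 * n):],
}

for name, rows in splits.items():
    with open(name, "w") as out:
        for row in rows:
            out.write(json.dumps(row) + "\n")
```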
Basic Fine-tuning Command
mlx_lm.lora \
--model mistralai/Mistral-7B-v0.1 \
--train \
--data <path_to_data> \
--iters 600
Configuration Options
Memory Optimization
- Reduce batch size: --batch-size 1 (default is 4)
- Decrease number of layers: --num-layers 4 (default is 16)
- Enable gradient checkpointing: --grad-checkpoint
Training Parameters
- Choose fine-tuning type: --fine-tune-type [lora|dora|full]
- Specify adapter save location: --adapter-path <path>
- Resume training: --resume-adapter-file <path_to_adapters.safetensors>
Evaluation and Generation
Test Model
mlx_lm.lora \
--model <path_to_model> \
--adapter-path <path_to_adapters> \
--data <path_to_data> \
--test
Generate Text
mlx_lm.generate \
--model <path_to_model> \
--adapter-path <path_to_adapters> \
--prompt "<your_prompt>"
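The same thing works from Python via mlx-lm's load/generate helpers. A minimal sketch; the model name and prompt are placeholders, and exact keyword arguments can vary between mlx-lm versions:

```python
from mlx_lm import load, generate

# Load the base model and apply the fine-tuned LoRA adapters on top.
# "adapters" is the default --adapter-path used by mlx_lm.lora.
model, tokenizer = load("mistralai/Mistral-7B-v0.1",
                        adapter_path="adapters")

# Generate with the adapted model.
text = generate(model, tokenizer,
                prompt="Write a SQL query that counts users by country.",
                max_tokens=256, verbose=True)
```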
Supported Models
LoRA fine-tuning works with the same model families listed earlier (Mistral, LLaMA, Phi2, Mixtral, Qwen2, Gemma, OLMo, MiniCPM, InternLM2).
Here are the key issues faced during fine-tuning with MLX and their solutions:
Memory Issues
Out of Memory Errors
- Reduce batch size to 1 or 2 to minimize memory usage[1]
- Keep training data lines under 3500 tokens[1]
- Close memory-intensive applications like browsers during training[1]
- Use gradient checkpointing with the --grad-checkpoint flag to trade memory for computation[4]
Model Size Constraints
- Reduce the number of LoRA layers using --num-layers 4 (default is 16)[1]
- Full model fine-tuning in fp32 requires significant memory; even 64GB may not be enough[5]
- Using fp16 requires less memory but can cause stability issues[5]
Numerical Stability
NaN Values in Training
- Fine-tuning in fp16 can lead to numerical overflow issues[5]
- Consider using fp32 for more stable training, though it’s slower and more memory intensive[5]
- Adjust learning rate to a smaller value (e.g., 1e-6) to improve stability[5]
Model Compatibility
Adapter Issues
- Quantized models need to be in MLX’s specific format[7]
- Standard GGUF files from Ollama aren’t directly compatible[7]
- Some models require Hugging Face login to access[7]
Performance Optimization
Training Speed
- Larger batch sizes improve speed but require more memory[4]
- fp32 training is approximately 10x slower than fp16[5]
- On M1 Max with 32GB RAM, expect around 250 tokens-per-second with recommended settings[4]
Data Preparation
Format Requirements
- Data must be split into train.jsonl, valid.jsonl, and test.jsonl[8]
- Proper formatting of instruction-response pairs is crucial[1]
- Maintain consistent formatting across all dataset files[1] (a quick pre-flight check sketch follows this list)
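A rough pre-flight check for these requirements, a sketch only: the 3500-token figure is the rule of thumb from [1], and whitespace splitting only approximates real tokenizer counts:

```python
import json

# Accepted record shapes (completions, plain text, chat).
SHAPES = ({"prompt", "completion"}, {"text"}, {"messages"})

for name in ("train.jsonl", "valid.jsonl", "test.jsonl"):
    with open(name) as f:
        for i, line in enumerate(f, 1):
            try:
                record = json.loads(line)
            except json.JSONDecodeError as e:
                print(f"{name}:{i}: invalid JSON ({e})")
                continue
            # Each line should match one of the accepted shapes.
            if not any(keys <= record.keys() for keys in SHAPES):
                print(f"{name}:{i}: unexpected keys {sorted(record)}")
            # Crude length check; the real limit depends on the tokenizer.
            if len(line.split()) > 3500:
                print(f"{name}:{i}: probably over the ~3500-token guideline")
```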
Sources
[1] Lessons learned so far: LoRA Fine tuning on Macbook Air with MLX - Reddit: https://www.reddit.com/r/LocalLLaMA/comments/18wabkc/lessons_learned_so_far_lora_fine_tuning_on/
[2] How to correctly save a fine tuned model using apple MLX framework - Stack Overflow: https://stackoverflow.com/questions/78815544/how-to-correctly-save-a-fine-tuned-model-using-apple-mlx-framework
[3] A simple guide to local LLM fine-tuning on a Mac with MLX - Andy Peatling: https://apeatling.com/articles/simple-guide-to-local-llm-fine-tuning-on-a-mac-with-mlx/
[4] Fine-Tuning with LoRA or QLoRA - ml-explore/mlx-examples - GitHub: https://github.com/ml-explore/mlx-examples/blob/main/llms/mlx_lm/LORA.md
[5] [Feature Request] Full-Tuning Example · Issue · ml-explore/mlx …: https://github.com/ml-explore/mlx-examples/issues/297
[6] Part 3: Fine-tuning your LLM using the MLX framework - Andy Peatling: https://apeatling.com/articles/part-3-fine-tuning-your-llm-using-the-mlx-framework/
[7] Fine Tune a model with MLX for Ollama - YouTube: https://www.youtube.com/watch?v=3UQ7GY9hNwk
[8] Doh! Let's clear up fine tuning - YouTube: https://www.youtube.com/watch?v=qfaK5mYzc4E
[9] Dawei Wang - MLX Framework FAQ Explained - LinkedIn: https://www.linkedin.com/posts/win-wang_mlx-framework-faq-explained-model-support-activity-7244708476793323520-3U56
[10] Fine-tuning LLMs with Apple MLX locally - Niklas Heidloff: https://heidloff.net/article/apple-mlx-fine-tuning/
[11] A simple guide to local LLM fine-tuning on a Mac with MLX - Reddit: https://www.reddit.com/r/LocalLLaMA/comments/191s7x3/a_simple_guide_to_local_llm_finetuning_on_a_mac/
[12] Fine Tune a model with MLX for Ollama - YouTube: https://www.youtube.com/watch?v=3UQ7GY9hNwk
[13] Apple MLX Fine Tuning: Complete Crash Course for Beginners - YouTube: https://www.youtube.com/watch?v=sI1uKhagm7c
[14] apeatling/simple-guide-to-mlx-finetuning - GitHub: https://github.com/apeatling/simple-guide-to-mlx-finetuning
[15] Issues · apeatling/simple-guide-to-mlx-finetuning - GitHub: https://github.com/apeatling/simple-guide-to-mlx-finetuning/issues
[16] Beginner's guide to fine-tuning models using MLX on Apple Silicon - LinkedIn: https://www.linkedin.com/posts/win-wang_beginners-guide-to-fine-tuning-models-using-activity-7247245192183853056-BhgX