# What is Model Merging?
Model Merging is a technique for combining multiple fine-tuned models into a single model without additional training. It enables creating specialized models by blending capabilities from different sources.
## Why Merge Models?
- Combine Strengths: Merge a code model with a chat model
- No Training Required: Works with just the weights
- Reduce Costs: Skip expensive training runs
- Community Innovation: Build on others' work
## Merging Methods

### 1. Linear Interpolation (LERP)

A simple weighted average of the two models' weights:

merged = α * model_a + (1 - α) * model_b
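The formula above can be sketched over whole state dicts. Here NumPy arrays stand in for real weight tensors, and `lerp_merge` is an illustrative name, not the API of any merging library:

```python
import numpy as np

def lerp_merge(state_a, state_b, alpha=0.5):
    """Weighted average of two matching state dicts: alpha*A + (1-alpha)*B."""
    assert state_a.keys() == state_b.keys(), "models must share an architecture"
    return {k: alpha * state_a[k] + (1 - alpha) * state_b[k] for k in state_a}

# Toy "state dicts" standing in for real model weights.
a = {"w": np.array([1.0, 2.0])}
b = {"w": np.array([3.0, 4.0])}
merged = lerp_merge(a, b, alpha=0.5)  # -> {"w": [2.0, 3.0]}
```

Note that LERP only makes sense when both models share the exact same architecture and parameter names.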
### 2. SLERP (Spherical Linear Interpolation)

Interpolates along an arc on the hypersphere rather than a straight line, preserving the angular geometry of the weights.
- Better for models that are far apart in weight space
- Smoother transitions between capabilities
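A minimal SLERP sketch on flattened weight vectors, again with NumPy arrays standing in for real tensors; the fall-back to plain LERP for nearly parallel inputs is a common guard, not part of the formula itself:

```python
import numpy as np

def slerp(a, b, t=0.5, eps=1e-8):
    """Spherical interpolation between two flattened weight vectors."""
    a_n = a / (np.linalg.norm(a) + eps)
    b_n = b / (np.linalg.norm(b) + eps)
    dot = np.clip(np.dot(a_n, b_n), -1.0, 1.0)
    omega = np.arccos(dot)            # angle between the two directions
    if omega < 1e-6:                  # nearly parallel: fall back to LERP
        return (1 - t) * a + t * b
    so = np.sin(omega)
    return (np.sin((1 - t) * omega) / so) * a + (np.sin(t * omega) / so) * b
```

At `t = 0.5` with orthogonal unit vectors, this returns the point halfway along the quarter-circle between them, whereas LERP would cut through the sphere's interior and shrink the result's norm.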
### 3. TIES (TrIm, Elect Sign & Merge)

Resolves parameter conflicts between the models being merged in three steps:
- Trim away small-magnitude parameter changes
- Elect a dominant sign per parameter to resolve conflicts
- Merge only the values that agree with the elected sign
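The three steps can be sketched on flat delta vectors (fine-tuned weights minus base weights). `ties_merge` and its `density` parameter are illustrative names, and electing the sign from the summed deltas is a simplification of the paper's magnitude-based election:

```python
import numpy as np

def ties_merge(base, deltas, density=0.5):
    """Sketch of TIES on flat vectors: trim, elect sign, merge agreeing values."""
    trimmed = []
    for d in deltas:
        k = max(1, int(np.ceil(density * d.size)))  # keep top-k magnitudes
        thresh = np.sort(np.abs(d))[-k]
        trimmed.append(np.where(np.abs(d) >= thresh, d, 0.0))
    stacked = np.stack(trimmed)
    elected = np.sign(stacked.sum(axis=0))          # dominant sign per parameter
    agree = (np.sign(stacked) == elected) & (stacked != 0)
    counts = np.maximum(agree.sum(axis=0), 1)       # avoid division by zero
    return base + (stacked * agree).sum(axis=0) / counts
```

Parameters whose deltas disagree with the elected sign simply drop out of the average, which is what suppresses interference between the merged models.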
### 4. DARE (Drop And REscale)

Randomly drops delta parameters (fine-tuned weights minus base weights) and rescales the survivors before merging:
- Reduces interference between models
- Often combined with TIES
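A sketch of the drop-and-rescale step on a single delta vector; dividing the survivors by 1 − p keeps the expected update unchanged. The function name and defaults are illustrative:

```python
import numpy as np

def dare(delta, drop_rate=0.9, rng=None):
    """Drop each delta entry with probability drop_rate, rescale the rest."""
    rng = rng if rng is not None else np.random.default_rng(0)
    mask = rng.random(delta.shape) >= drop_rate    # survivors (prob 1 - p)
    return (delta * mask) / (1.0 - drop_rate)      # rescaling preserves E[delta]
```

Even at a 90% drop rate, the rescaled sparse delta approximates the original update in expectation, which is why high drop rates often work surprisingly well.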
### 5. Passthrough / Frankenmerge

Stacks layers from different models, for example:

Layers 0-15: Model A
Layers 16-31: Model B
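Layer stacking can be illustrated with plain lists standing in for transformer layer modules; `frankenmerge` is a made-up helper name, not a real tool:

```python
def frankenmerge(layers_a, layers_b, split=16):
    """Stack the first `split` layers of model A onto the rest of model B."""
    return layers_a[:split] + layers_b[split:]

# Toy 32-"layer" models: strings stand in for transformer layer modules.
model_a = [f"A{i}" for i in range(32)]
model_b = [f"B{i}" for i in range(32)]
stacked = frankenmerge(model_a, model_b, split=16)  # A0..A15 then B16..B31
```

Unlike the interpolation methods, a frankenmerge can also change the layer count (e.g. repeating layers to grow the model), which is why the results are harder to predict.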
## Popular Merging Tools
| Tool | Features |
|---|---|
| mergekit | Most popular, supports all methods |
| PEFT merge | Merge LoRA adapters |
| LazyMergekit | Colab notebook wrapper around mergekit |
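For the PEFT row, merging a LoRA adapter means folding the low-rank update back into the frozen base weight, W' = W + (α/r)·BA. The sketch below shows that arithmetic with random NumPy matrices rather than PEFT's actual `merge_and_unload` machinery; all the sizes are made up:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, lora_alpha = 8, 2, 16          # hidden size, LoRA rank, scaling alpha

W = rng.standard_normal((d, d))      # frozen base weight
A = rng.standard_normal((r, d))      # LoRA down-projection
B = rng.standard_normal((d, r))      # LoRA up-projection

# Folding the low-rank update into the base weight removes the adapter
# entirely, so inference needs no extra matmuls: W' = W + (alpha/r) * B @ A
W_merged = W + (lora_alpha / r) * (B @ A)
```

After the fold, `W_merged @ x` produces exactly the same output as running the base layer plus the adapter path, which is why merged LoRA models need no PEFT code at inference time.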
## Merge Recipes
Common successful patterns:
- Base + Instruct: Merge a base model with an instruction-tuned variant
- Chat + Code: Combine conversational and coding abilities
- DPO + SFT: Layer preference alignment on top of instruction tuning
## Evaluating Merged Models
Always test merged models on:
- General benchmarks (MMLU, HellaSwag)
- Task-specific tests
- Real-world use cases
## Limitations

- ❌ No guarantee of success
- ❌ Can lose capabilities from either model
- ❌ May introduce inconsistencies
- ❌ Hard to predict which method works best
## Related Concepts
- LoRA - Efficient fine-tuning for merge sources
- Fine-tuning - Creating models to merge
- Quantization - Compressing merged models