In the previous episode we learned how to give a model extra information with RAG. But what if we want to change the model itself: make it write better in a specific language, adopt a particular style, or perform better in a specialized domain? This is where fine-tuning comes in.
Three Stages of a Language Model’s Life
Stage 1: Pre-training
The heaviest stage. Companies like Meta train models on trillions of words for months on thousands of GPUs, at a cost of roughly $30M for a large model. You will not do this yourself; you use ready pre-trained models.
Stage 2: SFT (Supervised Fine-Tuning)
Teaching the model how to hold a conversation by showing it thousands of good conversation examples.
Stage 3: RLHF
Humans rank the model's responses, and that feedback is used to steer the model toward more helpful, higher-quality answers.
When to Fine-tune vs When Not To
Fine-tune when:
- You need a specific style/tone
- You need better performance in a specific language
- You need consistent output format (e.g., always JSON)
- API costs are too high — a small fine-tuned model can replace a large one
Do NOT fine-tune when:
- You just need additional information — use RAG instead
- Good prompting is sufficient
- You lack quality training data
- Information changes frequently — RAG is better
Golden rule: Try Prompt Engineering first. Then RAG. If still insufficient, then fine-tune.
LoRA — The Fine-tuning Revolution
Instead of updating all 7 billion parameters of a 7B model, LoRA freezes the original weights and trains only a small set of added low-rank matrices. Like adding a room to a building instead of rebuilding it entirely. Benefits: roughly 16 GB of GPU memory instead of 80 GB, faster training, and saved adapter files of a few megabytes instead of a multi-gigabyte checkpoint.
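The low-rank trick can be sketched in a few lines of NumPy. The matrix size and the rank r=8 below are illustrative assumptions, not values from any particular model:

```python
import numpy as np

rng = np.random.default_rng(0)

# One 512x512 weight matrix stands in for a transformer layer;
# a real 7B model has thousands of much larger matrices.
d = 512
W = rng.normal(size=(d, d))        # frozen pre-trained weights

# LoRA trains two small matrices A and B whose product is a
# low-rank update to W; W itself never changes.
r = 8                              # LoRA rank (common values: 4-64)
A = rng.normal(size=(r, d)) * 0.01
B = np.zeros((d, r))               # B starts at zero: the update starts at zero

def forward(x):
    # Effective weight is W + B @ A; only A and B would receive gradients.
    return x @ (W + B @ A).T

x = rng.normal(size=(1, d))
print(forward(x).shape)            # (1, 512)

full_params = W.size               # 262,144
lora_params = A.size + B.size      # 8,192 -> about 3% of the full matrix
print(f"trainable: {lora_params:,} of {full_params:,}")
```

Because B starts at zero, the model's behavior is unchanged at the start of training, and only the small A and B matrices ever need to be saved.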
QLoRA — LoRA Squared
Combines quantization with LoRA: the frozen base model is loaded in 4-bit precision while the small LoRA adapters train in higher precision. This lets you fine-tune a 7B model on just 8-10 GB of VRAM; even an RTX 3060 works!
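To see why 4-bit storage saves so much memory, here is a deliberately simplified quantization sketch. QLoRA actually uses NF4, a non-uniform 4-bit code; plain symmetric int4 is shown only to illustrate the quantize/dequantize round trip:

```python
import numpy as np

rng = np.random.default_rng(1)
w = rng.normal(size=1024).astype(np.float32)   # one block of frozen weights

# Map the block into the range [-7, 7] and round to integers;
# 16 integer levels fit in 4 bits instead of 32.
scale = np.abs(w).max() / 7
q = np.round(w / scale).astype(np.int8)

# During each forward pass the weights are dequantized on the fly;
# the LoRA adapters themselves stay in higher precision.
w_hat = q.astype(np.float32) * scale

print(f"storage: {q.size * 4 // 8} bytes (4-bit) vs {w.nbytes} bytes (fp32)")
print(f"max rounding error: {np.abs(w - w_hat).max():.4f}")
```

The storage drops 8x (512 bytes vs 4,096 for this block), at the cost of a small rounding error bounded by half the scale, which is why the base model can stay frozen while only the adapters learn.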
Preparing Your Dataset
Quality over quantity: 1,000 high-quality examples beat 100,000 low-quality ones. Use a conversation format with system/user/assistant roles.
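A minimal sketch of what such a dataset looks like on disk. The file name `train.jsonl` and the example content are invented; JSONL with a `messages` list is a common convention, but check your fine-tuning framework's documentation for its exact schema:

```python
import json

# One invented training example in the system/user/assistant chat format.
examples = [
    {
        "messages": [
            {"role": "system", "content": "You are a concise legal assistant."},
            {"role": "user", "content": "Summarize this clause in one sentence."},
            {"role": "assistant", "content": "The clause limits liability to direct damages."},
        ]
    },
]

# Most fine-tuning tools expect one JSON object per line (JSONL).
with open("train.jsonl", "w", encoding="utf-8") as f:
    for ex in examples:
        f.write(json.dumps(ex, ensure_ascii=False) + "\n")

# Sanity check: every line parses and carries the roles in the expected order.
with open("train.jsonl", encoding="utf-8") as f:
    for line in f:
        roles = [m["role"] for m in json.loads(line)["messages"]]
        assert roles == ["system", "user", "assistant"], roles
```

A validation pass like the final loop is worth running on any real dataset: one malformed line or a missing role can silently degrade an entire training run.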
Common Mistakes
- Overfitting: Model memorizes training data. Solution: diverse data, fewer epochs.
- Catastrophic Forgetting: Model forgets previous knowledge. Solution: use LoRA, low learning rate.
- Bad data quality: Garbage In, Garbage Out.
- No evaluation: Always test after fine-tuning, on held-out examples and benchmarks.
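The evaluation point can be made concrete with a held-out split and an exact-match score. Everything here is a toy sketch: the data is synthetic, the 90/10 split is an assumed choice, and `model_answer` is a hypothetical placeholder for your fine-tuned model's inference call.

```python
import random

# Toy Q/A pairs; in practice this is your real fine-tuning dataset.
data = [{"question": f"q{i}", "answer": f"a{i}"} for i in range(100)]

random.seed(42)
random.shuffle(data)
split = int(len(data) * 0.9)
train, heldout = data[:split], data[split:]   # never train on the held-out set

def model_answer(question):
    # Hypothetical placeholder: a real version would call the fine-tuned model.
    return "a" + question[1:]

correct = sum(model_answer(ex["question"]) == ex["answer"] for ex in heldout)
print(f"exact match: {correct}/{len(heldout)}")   # 10/10 for this toy model
```

Comparing this score before and after fine-tuning, and watching it diverge from training accuracy, is the simplest way to catch both overfitting and catastrophic forgetting early.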