In the previous episode we learned how to give a model extra information with RAG. But what if we want to change the model itself: make it write better in a specific language, adopt a particular style, or perform better in a specialized domain? This is where fine-tuning comes in.
Three Stages of a Language Model’s Life
Stage 1: Pre-training
The heaviest stage. Companies like Meta train models on trillions of words for months on thousands of GPUs, at a cost of roughly $30M for a large model. You will not do this yourself; you use ready pre-trained models.
Stage 2: SFT (Supervised Fine-Tuning)
Teaching the model how to hold a conversation by showing it thousands of good conversation examples.
Stage 3: RLHF
Humans rank the model's responses, and that feedback is used to steer the model toward more helpful, higher-quality answers.
When to Fine-tune vs When Not To
Fine-tune when:
- You need a specific style/tone
- You need better performance in a specific language
- You need consistent output format (e.g., always JSON)
- API costs are too high — a small fine-tuned model can replace a large one
Do NOT fine-tune when:
- You just need additional information — use RAG instead
- Good prompting is sufficient
- You lack quality training data
- Information changes frequently — RAG is better
Golden rule: Try Prompt Engineering first. Then RAG. If still insufficient, then fine-tune.
LoRA — The Fine-tuning Revolution
Instead of updating all 7 billion parameters of a 7B model, LoRA freezes the original weights and trains only a small set of added low-rank matrices. Like adding a room to a building instead of rebuilding it entirely. Benefits: roughly 16 GB of GPU memory instead of 80 GB, faster training, and saved adapter files of a few megabytes instead of a multi-gigabyte checkpoint.
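The low-rank trick can be sketched in a few lines of NumPy. The matrix size and the rank r=8 below are illustrative assumptions, not values from any particular model:

```python
import numpy as np

rng = np.random.default_rng(0)

# One 512x512 weight matrix stands in for a transformer layer;
# a real 7B model has thousands of much larger matrices.
d = 512
W = rng.normal(size=(d, d))        # frozen pre-trained weights

# LoRA trains two small matrices A and B whose product is a
# low-rank update to W; W itself never changes.
r = 8                              # LoRA rank (common values: 4-64)
A = rng.normal(size=(r, d)) * 0.01
B = np.zeros((d, r))               # B starts at zero: the update starts at zero

def forward(x):
    # Effective weight is W + B @ A; only A and B would receive gradients.
    return x @ (W + B @ A).T

x = rng.normal(size=(1, d))
print(forward(x).shape)            # (1, 512)

full_params = W.size               # 262,144
lora_params = A.size + B.size      # 8,192 -> about 3% of the full matrix
print(f"trainable: {lora_params:,} of {full_params:,}")
```

Because B starts at zero, the model's behavior is unchanged at the start of training, and only the small A and B matrices ever need to be saved.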
QLoRA — LoRA Squared
Combines quantization with LoRA: the frozen base model is loaded in 4-bit precision while the small LoRA adapters train in higher precision. This lets you fine-tune a 7B model on just 8-10 GB of VRAM; even an RTX 3060 works!
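To see why 4-bit storage saves so much memory, here is a deliberately simplified quantization sketch. QLoRA actually uses NF4, a non-uniform 4-bit code; plain symmetric int4 is shown only to illustrate the quantize/dequantize round trip:

```python
import numpy as np

rng = np.random.default_rng(1)
w = rng.normal(size=1024).astype(np.float32)   # one block of frozen weights

# Map the block into the range [-7, 7] and round to integers;
# 16 integer levels fit in 4 bits instead of 32.
scale = np.abs(w).max() / 7
q = np.round(w / scale).astype(np.int8)

# During each forward pass the weights are dequantized on the fly;
# the LoRA adapters themselves stay in higher precision.
w_hat = q.astype(np.float32) * scale

print(f"storage: {q.size * 4 // 8} bytes (4-bit) vs {w.nbytes} bytes (fp32)")
print(f"max rounding error: {np.abs(w - w_hat).max():.4f}")
```

The storage drops 8x (512 bytes vs 4,096 for this block), at the cost of a small rounding error bounded by half the scale, which is why the base model can stay frozen while only the adapters learn.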
Preparing Your Dataset
Quality over quantity: 1,000 high-quality examples beat 100,000 low-quality ones. Use a conversation format with system/user/assistant roles.
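A minimal sketch of what such a dataset looks like on disk. The file name `train.jsonl` and the example content are invented; JSONL with a `messages` list is a common convention, but check your fine-tuning framework's documentation for its exact schema:

```python
import json

# One invented training example in the system/user/assistant chat format.
examples = [
    {
        "messages": [
            {"role": "system", "content": "You are a concise legal assistant."},
            {"role": "user", "content": "Summarize this clause in one sentence."},
            {"role": "assistant", "content": "The clause limits liability to direct damages."},
        ]
    },
]

# Most fine-tuning tools expect one JSON object per line (JSONL).
with open("train.jsonl", "w", encoding="utf-8") as f:
    for ex in examples:
        f.write(json.dumps(ex, ensure_ascii=False) + "\n")

# Sanity check: every line parses and carries the roles in the expected order.
with open("train.jsonl", encoding="utf-8") as f:
    for line in f:
        roles = [m["role"] for m in json.loads(line)["messages"]]
        assert roles == ["system", "user", "assistant"], roles
```

A validation pass like the final loop is worth running on any real dataset: one malformed line or a missing role can silently degrade an entire training run.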
Common Mistakes
- Overfitting: Model memorizes training data. Solution: diverse data, fewer epochs.
- Catastrophic Forgetting: Model forgets previous knowledge. Solution: use LoRA, low learning rate.
- Bad data quality: Garbage In, Garbage Out.
- No evaluation: Always test after fine-tuning, on held-out examples and benchmarks.
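The evaluation point can be made concrete with a held-out split and an exact-match score. Everything here is a toy sketch: the data is synthetic, the 90/10 split is an assumed choice, and `model_answer` is a hypothetical placeholder for your fine-tuned model's inference call.

```python
import random

# Toy Q/A pairs; in practice this is your real fine-tuning dataset.
data = [{"question": f"q{i}", "answer": f"a{i}"} for i in range(100)]

random.seed(42)
random.shuffle(data)
split = int(len(data) * 0.9)
train, heldout = data[:split], data[split:]   # never train on the held-out set

def model_answer(question):
    # Hypothetical placeholder: a real version would call the fine-tuned model.
    return "a" + question[1:]

correct = sum(model_answer(ex["question"]) == ex["answer"] for ex in heldout)
print(f"exact match: {correct}/{len(heldout)}")   # 10/10 for this toy model
```

Comparing this score before and after fine-tuning, and watching it diverge from training accuracy, is the simplest way to catch both overfitting and catastrophic forgetting early.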