Table of Contents
- When to Fine-tune? — Decision framework: prompt vs RAG vs fine-tune
- LoRA — Low-Rank Adaptation — train 0.05% of the parameters
- QLoRA — LoRA + 4-bit = runs on a 24GB GPU
- Data Preparation — Format, quality, quantity
- Code: LoRA Fine-tuning — Hugging Face + PEFT
- Training Tips — Hyperparameters, overfitting, evaluation
- Tools — Axolotl, Unsloth, LLaMA-Factory
- Summary
🎯
1. When to Fine-tune vs Prompt vs RAG?
A decision framework based on your needs:

| Need | Best Solution | Effort | Cost | Example |
|---|---|---|---|---|
| LLM needs up-to-date data | RAG | Low | $ | Chatbot over internal docs |
| LLM needs a specific output format | Prompt Engineering | Low | Free | JSON extraction, specific tone |
| LLM needs a particular tone/style | Fine-tuning | Medium | $$ | Brand voice, domain jargon |
| LLM needs deep domain expertise | Fine-tune + RAG | High | $$$ | Medical AI, legal assistant |
| Custom model from scratch | Pre-training | Very high | $$$$ | Rare languages, niche domains |
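As a rough sketch, the table above can be encoded as a simple lookup. The need labels and recommendation strings below are this article's own wording, not from any library:

```python
# Illustrative encoding of the decision framework above.
# Keys and recommendation strings mirror the table; they are not a real API.
DECISION = {
    "fresh_data": "RAG",
    "output_format": "Prompt Engineering",
    "tone_style": "Fine-tuning",
    "domain_expertise": "Fine-tune + RAG",
    "custom_from_scratch": "Pre-training",
}

def recommend(need: str) -> str:
    """Return the suggested approach for a given need label."""
    return DECISION.get(need, "Start with Prompt Engineering, escalate as needed")

print(recommend("tone_style"))  # Fine-tuning
print(recommend("fresh_data"))  # RAG
```

The default branch reflects the usual advice: try the cheapest option first and only escalate when it demonstrably falls short.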
🔧
2. LoRA — Low-Rank Adaptation
Train only 0.05–1% of the parameters: efficient, cheap, and effective. Full fine-tuning of an 8B-parameter model needs roughly 8× 80GB GPUs (hundreds of thousands of dollars). LoRA (Hu et al., 2021) adds small adapter layers (low-rank matrices) alongside the existing attention weights. Only these adapters are trained; the original weights stay frozen. The result: you train only about 4 million of the 8 billion parameters (0.05%), need just one consumer GPU, and get results close to a full fine-tune.
```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")

# LoRA: add small adapters to the attention layers
lora_config = LoraConfig(
    r=16,                  # rank (adapter dimension)
    lora_alpha=32,         # scaling factor
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# trainable: 4,194,304 / 8,030,261,248 (0.05%)
# Only 4M of 8B! Fits on a single 24GB consumer GPU
```
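The savings follow directly from the low-rank shapes: an adapter for a `d_out × d_in` weight matrix adds only `r · (d_in + d_out)` parameters. A quick back-of-the-envelope check (the layer sizes here are illustrative, assuming a 4096-dim model):

```python
# Back-of-the-envelope LoRA parameter count.
# d and the layer shape are assumptions for a 4096-dim model, not measured values.
d = 4096                # hidden size (assumption)
r = 16                  # LoRA rank, as in the config above

full = d * d            # one full square attention projection matrix
lora = r * (d + d)      # adapter: B (d x r) plus A (r x d)

print(f"full matrix : {full:,} params")    # 16,777,216
print(f"LoRA adapter: {lora:,} params")    # 131,072
print(f"ratio       : {lora / full:.2%}")  # 0.78% per matrix
```

Summed over all targeted projections and layers, this is how the trainable share ends up in the 0.05–1% range.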
```python
# QLoRA: LoRA + 4-bit quantization = even more memory savings
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # NormalFloat4 quantization
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B", quantization_config=bnb
)
# VRAM usage: ~16GB in fp16 → ~6GB with QLoRA!
```
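Because the update `ΔW = (α/r) · B @ A` has the same shape as the frozen weight `W`, the adapter can be merged back into the base matrix for inference. A toy sketch with tiny matrices (not real model dimensions) also shows why LoRA starts as a no-op: `B` is initialized to zero.

```python
# Toy LoRA merge demo. Dimensions are illustrative; real models use d=4096, r=16.
import numpy as np

rng = np.random.default_rng(0)
d, r, alpha = 8, 2, 4

W = rng.standard_normal((d, d))   # frozen base weight
A = rng.standard_normal((r, d))   # trained LoRA factor A (random init)
B = np.zeros((d, r))              # LoRA factor B starts at zero

x = rng.standard_normal(d)
merged = W + (alpha / r) * (B @ A)  # merge the adapter into the base weight

# At init the adapter contributes nothing, so the merged weight equals W exactly
assert np.allclose(merged @ x, W @ x)
print("merged output matches base output at init")
```

This merge step is why a LoRA-tuned model adds zero inference latency: after training, you fold `B @ A` into `W` once and serve a plain dense model.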
📊
3. Fine-tuning Tools
Frameworks that make fine-tuning easier:

| Tool | Focus | Ease | Features |
|---|---|---|---|
| Hugging Face + PEFT | General purpose LoRA | Medium | Most flexible, full control |
| Unsloth | Speed-optimized fine-tuning | Easy | 2x faster, auto-optimization |
| Axolotl | Config-driven fine-tuning | Easy | YAML config, many formats |
| LLaMA-Factory | Chinese/Asian models | Easy | GUI, 100+ models supported |
| OpenAI Fine-tuning | GPT models via API | Very easy | No GPU needed, API-based |
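To illustrate the config-driven style, an Axolotl run is described in a single YAML file roughly like the one below. The key names follow Axolotl's published examples, but verify them against the current docs before use; the dataset path is a hypothetical placeholder:

```yaml
# Illustrative Axolotl QLoRA config (check key names against the Axolotl docs)
base_model: meta-llama/Llama-3.1-8B

load_in_4bit: true
adapter: qlora
lora_r: 16
lora_alpha: 32
lora_dropout: 0.05
lora_target_modules:
  - q_proj
  - v_proj

datasets:
  - path: my_dataset.jsonl   # hypothetical local file
    type: alpaca

micro_batch_size: 2
num_epochs: 3
learning_rate: 0.0002
output_dir: ./outputs
```

The appeal is that the whole experiment fits in version control: changing rank, dataset, or precision is a one-line diff instead of edited training code.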
Next: Part 8 — Evaluation & Benchmarks
MMLU, HumanEval, LMSys Arena. How to measure LLMs objectively.
Tech Review Desk — LLM Learning Series
Sources: Sebastian Raschka, Anthropic, OpenAI, Hugging Face, LLMOrbit, DeepSeek technical reports.