Seri Belajar LLM — Part 7

Fine-tuning Praktis — LoRA, QLoRA, PEFT

Adapt an LLM to your domain on a consumer GPU. LoRA trains only ~0.05% of the parameters — from a medical assistant to a code reviewer. Part 7 covers when to fine-tune vs prompt vs RAG, LoRA/QLoRA implementation, data preparation, and evaluating fine-tuning results.

March 2026 • 30 min read • LoRA • QLoRA • PEFT • Fine-tuning • Hugging Face

Table of Contents

  1. When to Fine-tune? — Decision framework: prompt vs RAG vs fine-tune
  2. LoRA — Low-Rank Adaptation — train 0.05% of the params
  3. QLoRA — LoRA + 4-bit = runs on a 24GB GPU
  4. Data Preparation — Format, quality, quantity
  5. Code: LoRA Fine-tuning — Hugging Face + PEFT
  6. Training Tips — Hyperparameters, overfitting, evaluation
  7. Tools — Axolotl, Unsloth, LLaMA-Factory
  8. Summary
🎯

1. When to Fine-tune vs Prompt vs RAG?

A decision framework based on your needs:

| Need | Best Solution | Effort | Cost | Example |
|---|---|---|---|---|
| LLM needs up-to-date data | RAG | Low | $ | Chatbot over internal docs |
| LLM needs a specific output format | Prompt engineering | Low | Free | JSON extraction, specific tone |
| LLM needs a particular tone/style | Fine-tuning | Medium | $$ | Brand voice, domain jargon |
| LLM needs deep domain expertise | Fine-tune + RAG | High | $$$ | Medical AI, legal assistant |
| Custom model from scratch | Pre-training | Very high | $$$$ | Rare languages, niche domains |
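The table above can be sketched as a small decision helper. This is a hypothetical illustration of the framework, not a library API — the function name and rule ordering (check the most expensive need first, fall back to the cheapest approach) are my own:

```python
# Hypothetical helper mirroring the decision table above.
# The labels and rule order are illustrative, not a real API.
def choose_adaptation(needs_fresh_data: bool,
                      needs_format_only: bool,
                      needs_style: bool,
                      needs_deep_expertise: bool) -> str:
    """Map a requirement to the cheapest adequate approach."""
    if needs_deep_expertise:
        return "fine-tune + RAG"      # medical AI, legal assistant
    if needs_style:
        return "fine-tuning"          # brand voice, domain jargon
    if needs_fresh_data:
        return "RAG"                  # chatbot over internal docs
    if needs_format_only:
        return "prompt engineering"   # JSON extraction, specific tone
    return "prompt engineering"       # default: start with the cheapest option

print(choose_adaptation(needs_fresh_data=True, needs_format_only=False,
                        needs_style=False, needs_deep_expertise=False))  # RAG
```

The ordering encodes the table's cost gradient: only escalate to training when prompting and retrieval genuinely cannot meet the need.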
🔧

2. LoRA — Low-Rank Adaptation

Train only 0.05-1% of the parameters. Efficient, cheap, effective.

Full fine-tuning of an 8B-parameter model requires on the order of 8× 80GB GPUs (hundreds of thousands of dollars of hardware). LoRA (Hu et al., 2021) adds small adapter layers (pairs of low-rank matrices) alongside the existing attention weights. Only these adapters are trained — the original weights stay frozen. The result: only about 4 million of 8 billion parameters are trained (0.05%), a single consumer GPU suffices, and quality comes close to a full fine-tune.
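The savings follow from a simple formula: for a frozen weight of shape (d_out, d_in), LoRA trains two matrices A (r × d_in) and B (d_out × r), i.e. r × (d_in + d_out) parameters per adapted module. A minimal sketch of the arithmetic, with assumed shapes loosely modeled on an 8B Llama-style model (32 layers, 4096 hidden size, grouped-query attention shrinking the k/v projections) — exact totals vary with the model's attention layout and which modules are adapted, which is why published counts differ:

```python
# LoRA trainable-parameter count per adapted module: r * (d_in + d_out),
# the sizes of the low-rank factors A (r x d_in) and B (d_out x r).
def lora_params(d_in: int, d_out: int, r: int) -> int:
    return r * (d_in + d_out)

# Assumed shapes (illustrative, Llama-style 8B with grouped-query attention):
# q_proj/o_proj are 4096x4096, k_proj/v_proj are 4096x1024.
layers, r = 32, 16
per_layer = (lora_params(4096, 4096, r)     # q_proj
             + lora_params(4096, 1024, r)   # k_proj
             + lora_params(4096, 1024, r)   # v_proj
             + lora_params(4096, 4096, r))  # o_proj
total = layers * per_layer
print(f"{total:,} trainable LoRA params")   # 13,631,488
print(f"{total / 8_030_261_248:.3%} of 8B") # 0.170%
```

Either way the fraction is a tiny sliver of the model: the frozen MLP and embedding weights dominate the 8B total, so optimizer state and gradients shrink by orders of magnitude.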

11_lora_finetune.py
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")

# LoRA: add small adapters to the attention layers
lora_config = LoraConfig(
    r=16,                    # rank (adapter dimension)
    lora_alpha=32,           # scaling factor
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# trainable: 4,194,304 / 8,030,261,248 (0.05%)
# Only ~4M of 8B! Fits on one consumer GPU (24GB)

# QLoRA: + 4-bit quantization = even more memory savings
from transformers import BitsAndBytesConfig
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype="bfloat16",
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B",
    quantization_config=bnb,
)
# Weight memory: ~16GB in fp16 → ~6GB with QLoRA!
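Before any of this training runs, the examples themselves must be serialized consistently — the TOC's Data Preparation step. A minimal sketch of converting raw (instruction, answer) pairs into chat-format JSONL; the "messages" schema below follows the common OpenAI-style convention, but check what your trainer (TRL, Axolotl, etc.) actually expects:

```python
import json

# Convert raw (instruction, answer) pairs into chat-format JSONL records.
# The "messages" field layout is the common OpenAI-style convention;
# verify it against your trainer's expected schema.
def to_chat_record(instruction: str, answer: str,
                   system: str = "You are a helpful assistant.") -> dict:
    return {"messages": [
        {"role": "system", "content": system},
        {"role": "user", "content": instruction},
        {"role": "assistant", "content": answer},
    ]}

pairs = [
    ("What is LoRA?",
     "A parameter-efficient fine-tuning method using low-rank adapters."),
]
# One JSON object per line = JSONL, the format most trainers ingest.
jsonl = "\n".join(json.dumps(to_chat_record(q, a)) for q, a in pairs)
print(jsonl)
```

Quality matters more than quantity here: a few hundred clean, consistent examples typically beat thousands of noisy ones.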
📊

3. Fine-tuning Tools

Frameworks that make fine-tuning easier:

| Tool | Focus | Ease | Features |
|---|---|---|---|
| Hugging Face + PEFT | General-purpose LoRA | Medium | Most flexible, full control |
| Unsloth | Speed-optimized fine-tuning | Easy | 2x faster, auto-optimization |
| Axolotl | Config-driven fine-tuning | Easy | YAML config, many formats |
| LLaMA-Factory | Chinese/Asian models | Easy | GUI, 100+ models supported |
| OpenAI Fine-tuning | GPT models via API | Very easy | No GPU needed, API-based |
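To show what "config-driven" means in practice, here is an illustrative Axolotl-style YAML fragment for the QLoRA setup from earlier in this article. The key names follow Axolotl's published examples, but treat this as a sketch and verify it against the current schema before use:

```yaml
# Illustrative Axolotl-style config (verify key names against the
# current Axolotl schema and example configs before running).
base_model: meta-llama/Llama-3.1-8B
load_in_4bit: true            # QLoRA: 4-bit base weights
adapter: qlora
lora_r: 16
lora_alpha: 32
lora_dropout: 0.05
lora_target_modules: [q_proj, k_proj, v_proj, o_proj]
datasets:
  - path: data/train.jsonl
    type: chat_template
sequence_len: 2048
micro_batch_size: 2
gradient_accumulation_steps: 8
num_epochs: 3
learning_rate: 0.0002
```

The appeal of this style is reproducibility: the entire run is captured in one versionable file instead of a script full of flags.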
Tech Review Desk — Seri Belajar LLM
Sources: Sebastian Raschka, Anthropic, OpenAI, Hugging Face, LLMOrbit, DeepSeek technical reports.
rominur@gmail.com  •  t.me/Jekardah_AI