📑 Table of Contents — Page 9
- Why Transfer Learning? — Don't start from zero
- Feature Extraction — Use model as feature extractor
- Fine-Tuning — Adapt the entire model to a new task
- Fine-Tuning Strategies — Freeze layers, differential LR, gradual unfreezing
- Transfer Learning for Vision — ResNet, VGG for image classification
- Transfer Learning for NLP — BERT, GPT for text tasks
- Summary & Page 10 Preview
1. Why Transfer Learning?
Transfer Learning = taking a model pre-trained on a large dataset (ImageNet, Wikipedia, internet) and adapting it for your specific task. This drastically saves time, data, and compute — from weeks of training to minutes.
2. Feature Extraction — Freeze & Extract
```python
import numpy as np

class TransferModel:
    """Transfer Learning with frozen backbone"""

    def __init__(self, pretrained_weights, num_classes):
        # Frozen backbone (pre-trained) — DO NOT update!
        self.backbone = pretrained_weights  # list of {'W', 'b'} dicts, one per layer
        self.frozen = True
        # New classification head — TRAIN this!
        feat_dim = 512  # output dim of backbone
        self.W_head = np.random.randn(feat_dim, num_classes) * 0.01
        self.b_head = np.zeros((1, num_classes))

    def extract_features(self, X):
        """Run X through frozen backbone"""
        h = X
        for layer in self.backbone:
            h = np.maximum(0, h @ layer['W'] + layer['b'])  # ReLU layer
        return h  # features (not trained!)

    def forward(self, X):
        features = self.extract_features(X)            # frozen
        logits = features @ self.W_head + self.b_head  # trainable
        exp_l = np.exp(logits - np.max(logits, axis=1, keepdims=True))
        return exp_l / exp_l.sum(axis=1, keepdims=True)  # softmax probabilities

    def train_head(self, X, y_oh, lr=0.01):
        """Only update classification head"""
        features = self.extract_features(X)  # no grad here
        probs = self.forward(X)
        m = X.shape[0]
        dz = (probs - y_oh) / m
        # Only update head weights
        self.W_head -= lr * features.T @ dz
        self.b_head -= lr * np.sum(dz, axis=0, keepdims=True)

# With just 100 labeled samples + pre-trained backbone → 90%+ accuracy!
```
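For reference, here is a minimal usage sketch of the class above. The two-layer backbone below is randomly initialized as a stand-in for real pre-trained weights, and the labels are random, so the printed number only demonstrates the API, not the accuracy claim.

```python
rng = np.random.default_rng(0)

# Hypothetical 2-layer "pre-trained" backbone: 784 -> 512 -> 512
backbone = [
    {'W': rng.standard_normal((784, 512)) * 0.01, 'b': np.zeros((1, 512))},
    {'W': rng.standard_normal((512, 512)) * 0.01, 'b': np.zeros((1, 512))},
]

model = TransferModel(backbone, num_classes=10)

X = rng.standard_normal((100, 784))   # 100 "labeled" samples
y = rng.integers(0, 10, size=100)     # fake labels for the demo
y_oh = np.eye(10)[y]                  # one-hot targets

for epoch in range(50):               # only the head is updated
    model.train_head(X, y_oh, lr=0.1)

preds = model.forward(X).argmax(axis=1)
print("train accuracy:", (preds == y).mean())
```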
3. Fine-Tuning Strategies
Effective fine-tuning uses differential learning rates: early layers (general features) use small LR, later layers (specific features) use larger LR. Gradual unfreezing: start training only the head, then gradually unfreeze layers from end to beginning.
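As a rough illustration of both ideas, the sketch below assigns each layer a learning rate that shrinks toward the input and computes which layers are trainable at a given epoch. The function names, decay factor, and stage length are illustrative assumptions, not part of the original material.

```python
import numpy as np

def layer_lrs(n_layers, base_lr=1e-3, decay=2.5):
    """Differential LR: early layers get a smaller LR, layers near the head get base_lr."""
    return [base_lr / decay ** (n_layers - 1 - i) for i in range(n_layers)]

def unfreeze_schedule(n_layers, epoch, epochs_per_stage=2):
    """Gradual unfreezing: one more layer (from the end toward the start) per stage."""
    n_unfrozen = min(n_layers, 1 + epoch // epochs_per_stage)
    return [i >= n_layers - n_unfrozen for i in range(n_layers)]

# Example with a hypothetical 4-layer backbone:
lrs = layer_lrs(4)            # [6.4e-05, 1.6e-04, 4.0e-04, 1.0e-03]
for epoch in range(8):
    trainable = unfreeze_schedule(4, epoch)
    # In a training loop, layer i would only be updated when
    # trainable[i] is True, using learning rate lrs[i].
    print(epoch, trainable)
```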
🎓 Fine-Tuning Rule of Thumb:
Small data + similar domain → Feature extraction (freeze all, train head only).
Small data + different domain → Feature extraction + data augmentation.
Large data + similar domain → Fine-tune all with small LR.
Large data + different domain → Fine-tune, might need to train from scratch.
4. Page 9 Summary
| Concept | What It Is | When to Use |
|---|---|---|
| Transfer Learning | Use pre-trained model for new task | Almost always! |
| Feature Extraction | Freeze backbone, train head only | Small data |
| Fine-Tuning | Unfreeze & train entire model | Enough data available |
| Differential LR | Different LR per layer group | Advanced fine-tuning |
| Gradual Unfreezing | Unfreeze from end to start progressively | Avoid catastrophic forgetting |
| ResNet/VGG | Pre-trained vision backbones | Image classification |
| BERT/GPT | Pre-trained language models | NLP tasks |
Final: Page 10 — Capstone Project & What's Next
Combine EVERYTHING we've learned in one final project: building a mini-GPT (language model) from scratch. Plus a roadmap for what to learn next. The grand finale!