
🔄 Neural Network Tutorial — Page 9

Transfer Learning & Fine-Tuning

No need to train from scratch! Use giant pre-trained models (ResNet, BERT, GPT) and adapt them for your task. Page 9 covers: feature extraction, fine-tuning strategies, differential learning rates, domain adaptation, and when to use each approach.

📅 March 2026 · 26 min read
🏷 Transfer Learning · Fine-Tuning · ResNet · BERT · Feature Extraction · Domain Adaptation
📚 Neural Network Tutorial Series:

📑 Table of Contents — Page 9

  1. Why Transfer Learning? — Don't start from zero
  2. Feature Extraction — Use model as feature extractor
  3. Fine-Tuning — Adapt the entire model to a new task
  4. Fine-Tuning Strategies — Freeze layers, differential LR, gradual unfreezing
  5. Transfer Learning for Vision — ResNet, VGG for image classification
  6. Transfer Learning for NLP — BERT, GPT for text tasks
  7. Summary & Page 10 Preview
💡

1. Why Transfer Learning?

Large models have already learned universal representations — leverage them!

Transfer Learning = taking a model pre-trained on a large dataset (ImageNet, Wikipedia, internet) and adapting it for your specific task. This drastically saves time, data, and compute — from weeks of training to minutes.

Transfer Learning: Two Approaches

1. Feature Extraction                  2. Fine-Tuning
┌──────────────────┐                   ┌──────────────────┐
│  Pre-trained     │ ❄️ FROZEN         │  Pre-trained     │ 🔥 TRAINABLE
│  Layers          │                   │  Layers          │    (small LR)
│  (ResNet/BERT)   │                   │  (ResNet/BERT)   │
└────────┬─────────┘                   └────────┬─────────┘
         │                                      │
┌────────▼─────────┐                   ┌────────▼─────────┐
│  New FC Layer    │ 🔥 TRAIN          │  New FC Layer    │ 🔥 TRAIN
│  (your task)     │                   │  (your task)     │
└──────────────────┘                   └──────────────────┘
Good: small dataset                    Good: medium-large dataset
Fast, no overfitting risk              Better accuracy, slower
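
To make the two columns concrete, here is a minimal PyTorch/torchvision sketch of both approaches (torchvision is not used elsewhere in this tutorial, and the 10-class head is a hypothetical example for your task):

import torch.nn as nn
import torchvision.models as models

# Load a backbone pre-trained on ImageNet
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Approach 1: Feature extraction — freeze every pre-trained parameter
for param in model.parameters():
    param.requires_grad = False

# Replace the classification head; only these new weights receive gradients
model.fc = nn.Linear(model.fc.in_features, 10)  # 10 = your number of classes

# Approach 2: Fine-tuning — skip the freezing loop above and train the
# whole network with a small learning rate instead.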
🧊

2. Feature Extraction — Freeze & Extract

Use pre-trained layers as a fixed feature extractor
36_feature_extraction.py — Transfer Learning: Feature Extraction (Python)
import numpy as np

class TransferModel:
    """Transfer Learning with frozen backbone"""

    def __init__(self, pretrained_weights, num_classes):
        # Frozen backbone (pre-trained) — DO NOT update!
        self.backbone = pretrained_weights  # list of {'W', 'b'} dicts, one per layer
        self.frozen = True

        # New classification head — TRAIN this!
        feat_dim = 512  # output dim of backbone
        self.W_head = np.random.randn(feat_dim, num_classes) * 0.01
        self.b_head = np.zeros((1, num_classes))

    def extract_features(self, X):
        """Run X through frozen backbone"""
        h = X
        for layer in self.backbone:
            h = np.maximum(0, h @ layer['W'] + layer['b'])
        return h  # features (not trained!)

    def forward(self, X):
        features = self.extract_features(X)  # frozen
        logits = features @ self.W_head + self.b_head  # trainable
        exp_l = np.exp(logits - np.max(logits, axis=1, keepdims=True))
        return exp_l / exp_l.sum(axis=1, keepdims=True)

    def train_head(self, X, y_oh, lr=0.01):
        """Only update the classification head — the backbone stays frozen"""
        features = self.extract_features(X)  # no gradient through backbone
        logits = features @ self.W_head + self.b_head
        exp_l = np.exp(logits - np.max(logits, axis=1, keepdims=True))
        probs = exp_l / exp_l.sum(axis=1, keepdims=True)
        m = X.shape[0]
        dz = (probs - y_oh) / m  # softmax + cross-entropy gradient
        # Only the head weights are updated
        self.W_head -= lr * features.T @ dz
        self.b_head -= lr * np.sum(dz, axis=0, keepdims=True)

# With just 100 labeled samples + pre-trained backbone → 90%+ accuracy!
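
A quick hypothetical smoke test of the class above — random weights stand in for a real pre-trained backbone, so the shapes (784-dim input, two 512-unit layers, 10 classes) are illustrative only:

# Hypothetical usage: random weights stand in for a real pre-trained backbone
rng = np.random.default_rng(0)
backbone = [
    {'W': rng.normal(size=(784, 512)) * 0.01, 'b': np.zeros((1, 512))},
    {'W': rng.normal(size=(512, 512)) * 0.01, 'b': np.zeros((1, 512))},
]
model = TransferModel(backbone, num_classes=10)

X = rng.normal(size=(100, 784))               # 100 "labeled samples"
y_oh = np.eye(10)[rng.integers(0, 10, 100)]   # one-hot labels
for _ in range(50):
    model.train_head(X, y_oh, lr=0.1)
print(model.forward(X).shape)                 # (100, 10) class probabilities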
🔥

3. Fine-Tuning Strategies

Gradual unfreezing, differential LR, and best practices

Effective fine-tuning uses differential learning rates: early layers (general features) get a small LR, later layers (task-specific features) get a larger LR. Gradual unfreezing: start by training only the head, then unfreeze layers step by step from the end toward the beginning — see the sketch below.
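
A minimal NumPy sketch of both ideas, reusing the list-of-layers backbone format from 36_feature_extraction.py — the unfreezing schedule and the 10× LR ratio are illustrative assumptions, not fixed rules:

def finetune_step(backbone, grads, epoch, base_lr=0.001):
    """One update over backbone layers: gradual unfreezing + differential LR.

    backbone : list of {'W', 'b'} dicts, index 0 = earliest layer
    grads    : matching list of {'W', 'b'} gradient dicts
    epoch    : current epoch, used to decide which layers are unfrozen
    """
    n = len(backbone)
    for i, (layer, g) in enumerate(zip(backbone, grads)):
        # Gradual unfreezing: the last layer opens at epoch 1, the one
        # before it at epoch 2, and so on (illustrative schedule)
        if epoch < n - i:
            continue  # this layer is still frozen
        # Differential LR: each step toward the input shrinks the LR 10x,
        # so general early-layer features change more slowly
        lr = base_lr * (0.1 ** (n - 1 - i))
        layer['W'] -= lr * g['W']
        layer['b'] -= lr * g['b']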

🎓 Fine-Tuning Rule of Thumb:
Small data + similar domain → Feature extraction (freeze all, train head only).
Small data + different domain → Feature extraction + data augmentation.
Large data + similar domain → Fine-tune all with small LR.
Large data + different domain → Fine-tune, might need to train from scratch.

📝

4. Page 9 Summary

What we learned
Concept             | What It Is                                 | When to Use
--------------------|--------------------------------------------|-------------------------------
Transfer Learning   | Use a pre-trained model for a new task     | Almost always!
Feature Extraction  | Freeze backbone, train head only           | Small data
Fine-Tuning         | Unfreeze & train the entire model          | Enough data available
Differential LR     | Different LR per layer group               | Advanced fine-tuning
Gradual Unfreezing  | Unfreeze from end to start, step by step   | Avoids catastrophic forgetting
ResNet/VGG          | Pre-trained vision backbones               | Image classification
BERT/GPT            | Pre-trained language models                | NLP tasks
← Previous Page

Page 8 — Transformer & Attention Mechanism

🏆

Final: Page 10 — Capstone Project & What's Next

Combine EVERYTHING we've learned in one final project: building a mini-GPT (language model) from scratch. Plus a roadmap of what to learn next. The grand finale!