Table of Contents - Page 3
- CNN Review from NN Series - Convolution, pooling, and the transition to TensorFlow
- Conv2D Deep Dive - Parameters, output shape, and why CNNs are efficient
- MaxPooling2D & GlobalAveragePooling2D - Two downsampling strategies
- First CNN: CIFAR-10 - 10-class classifier from scratch, ~88% accuracy
- Data Augmentation - Keras layers vs tf.image, a free +4% boost
- Transfer Learning Phase 1 - Frozen backbone + custom head
- Transfer Learning Phase 2 - Fine-tuning: unfreeze top layers
- Custom Dataset - image_dataset_from_directory + pipeline optimization
- CNN Visualization - Feature maps, filters, and Grad-CAM
- Project: Production Image Classifier 95%+ - End-to-end template
- Summary & Page 4 Preview
1. CNN Review - From Neural Network Series to TensorFlow
In the Neural Network series (Page 3), we implemented convolution (nested for-loops), padding (np.pad), stride, and max pooling from scratch in NumPy: hundreds of lines that were painful to debug. With TensorFlow Keras, all of those operations become one line: layers.Conv2D(32, (3,3), padding='same'). The concepts from the NN series remain fundamental, however. Without understanding convolution, you are just an API user without comprehension.
Here's a quick recap of CNN concepts we covered in the Neural Network series:
| Concept | What It Is | NN Series | TensorFlow |
|---|---|---|---|
| Convolution | 3×3 filter slides over the image | Nested for-loops | Conv2D(32, (3,3)) |
| Padding | Add pixels at the edges | np.pad() | padding='same' |
| Stride | How far the filter moves per step | Manual indexing | strides=(2,2) |
| Max Pooling | Take the max of each 2×2 window | Nested for-loops | MaxPooling2D((2,2)) |
| CNN Backprop | Convolution gradients | 100+ lines by hand | Automatic (GradientTape) |
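The one-line Conv2D call hides the loop we wrote by hand in the NN series. As a sanity check on the shape formula output = (input - kernel + 2*pad) / stride + 1, here is a minimal NumPy sketch of that hand-written convolution (illustrative only: single channel, square inputs, and the helper name `conv2d_naive` is my own):

```python
import numpy as np

def conv2d_naive(image, kernel, stride=1, pad=0):
    """Single-channel 2D convolution with nested loops (NN-series style)."""
    image = np.pad(image, pad)                      # zero padding on all sides
    k = kernel.shape[0]
    out_size = (image.shape[0] - k) // stride + 1   # the shape formula
    out = np.zeros((out_size, out_size))
    for i in range(out_size):
        for j in range(out_size):
            window = image[i*stride:i*stride+k, j*stride:j*stride+k]
            out[i, j] = np.sum(window * kernel)     # weighted sum = one output pixel
    return out

img = np.arange(32 * 32, dtype=float).reshape(32, 32)
kernel = np.ones((3, 3)) / 9.0                      # 3x3 mean filter

print(conv2d_naive(img, kernel, stride=1, pad=1).shape)  # 'same' padding -> (32, 32)
print(conv2d_naive(img, kernel, stride=1, pad=0).shape)  # 'valid' -> (30, 30)
print(conv2d_naive(img, kernel, stride=2, pad=1).shape)  # strided -> (16, 16)
```

The three calls match the three Conv2D output-shape cases worked out in the next section.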
2. Conv2D Deep Dive - Every Parameter Explained
```python
import tensorflow as tf
from tensorflow.keras import layers

# ===========================
# Conv2D - EVERY parameter explained
# ===========================
conv = layers.Conv2D(
    filters=32,              # Number of output filters (= output channels)
                             # More filters = more features
                             # Pattern: 32 -> 64 -> 128 -> 256
    kernel_size=(3, 3),      # Filter size: 3x3 (most common)
                             # Alternatives: (1,1) for channel mixing,
                             # (5,5) for a large receptive field,
                             # (7,7) only in the first layer
    strides=(1, 1),          # Slide 1 pixel per step (default)
                             # strides=(2,2) -> 2x downsample (MaxPool replacement)
    padding='same',          # 'same': output size = input size (adds padding)
                             # 'valid': no padding -> output shrinks
                             # Formula: output = (input - kernel + 2*pad) / stride + 1
    activation='relu',       # Activation function (or None + a separate layer)
    use_bias=True,           # Add a bias per filter (default True)
    kernel_initializer='glorot_uniform',
                             # He init is better for ReLU:
                             # kernel_initializer='he_normal' is recommended
    input_shape=(32, 32, 3)  # H, W, C - ONLY in the first layer!
)

# ===========================
# Output Shape Calculation
# ===========================
# Input: (batch, 32, 32, 3) - a 32x32 RGB image
# Conv2D(32, (3,3), padding='same', strides=(1,1))
#   Output: (batch, 32, 32, 32) - 32 feature maps, same spatial size
# Conv2D(64, (3,3), padding='valid', strides=(1,1))
#   Output: (batch, 30, 30, 64) - shrinks by (kernel-1) = 2 per side
# Conv2D(64, (3,3), padding='same', strides=(2,2))
#   Output: (batch, 16, 16, 64) - halved spatial size (like MaxPool!)

# ===========================
# Parameter Count - WHY CNN IS EFFICIENT
# ===========================
# Dense layer: 32x32x3 -> 128 neurons
#   Params = 3072 * 128 + 128 = 393,344 - HUGE!
# Conv2D: 32 filters, 3x3, 3 input channels
#   Params = (3 * 3 * 3 + 1) * 32 = 896 - TINY!
# That's ~438x fewer parameters!
# Secret: WEIGHT SHARING - the same filter is applied everywhere

model_demo = tf.keras.Sequential([
    layers.Conv2D(32, (3,3), padding='same', input_shape=(32,32,3)),
    layers.Conv2D(64, (3,3), padding='same'),
    layers.Conv2D(128, (3,3), padding='same'),
])
model_demo.summary()
# Layer 1: (3*3*3+1)*32   =    896 params
# Layer 2: (3*3*32+1)*64  = 18,496 params
# Layer 3: (3*3*64+1)*128 = 73,856 params
# Total: 93,248 - the Dense equivalent would be MILLIONS
```
Why Do 3×3 Filters Dominate?
Two stacked 3×3 Conv2D layers have an effective receptive field of 5×5, but with fewer parameters and more non-linearity (2× ReLU vs 1×). Three 3×3 layers give a 7×7 receptive field even more efficiently. This insight comes from the VGGNet paper (2014) and still holds today.
Rule of thumb: default to 3×3. Use 1×1 for channel mixing. Use 5×5 or 7×7 only in the first layer (for large images).
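The arithmetic behind this rule of thumb is easy to check. A quick sketch, assuming C channels both in and out for every layer (the helper name `conv_params` is my own):

```python
def conv_params(k, c_in, c_out):
    """Parameters in one Conv2D layer: (k*k*c_in + 1) * c_out (weights + biases)."""
    return (k * k * c_in + 1) * c_out

C = 64
stacked_3x3 = 2 * conv_params(3, C, C)  # two 3x3 layers, effective receptive field 5x5
single_5x5 = conv_params(5, C, C)       # one 5x5 layer, same receptive field

print(stacked_3x3)  # 73,856
print(single_5x5)   # 102,464
```

Same receptive field, roughly 28% fewer parameters, and one extra ReLU in between.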
3. MaxPooling2D & GlobalAveragePooling2D
```python
import tensorflow as tf
from tensorflow.keras import layers

# ===========================
# 1. MaxPooling2D - classic downsampling
# ===========================
pool = layers.MaxPooling2D(
    pool_size=(2, 2),   # 2x2 window
    strides=(2, 2),     # non-overlapping (default = pool_size)
    padding='valid'     # no padding (default)
)
# Input:  (batch, 32, 32, 64)
# Output: (batch, 16, 16, 64) - spatial size halved, channels unchanged
# Params: 0 (no learnable parameters!)
# Takes the MAX of each 2x2 window - keeps the strongest activation

# Visual example:
# [[1, 3],     MaxPool 2x2
#  [2, 4]]  -> output = 4 (the maximum)

# ===========================
# 2. AveragePooling2D - alternative
# ===========================
avg_pool = layers.AveragePooling2D((2, 2))
# Same as MaxPool but takes the AVERAGE instead of the MAX
# Less commonly used - MaxPool usually works better

# ===========================
# 3. GlobalAveragePooling2D - MODERN replacement for Flatten
# ===========================
gap = layers.GlobalAveragePooling2D()
# Input:  (batch, 8, 8, 128) - 128 feature maps, each 8x8
# Output: (batch, 128) - average each map down to ONE number
# Params: 0

# Compare with Flatten + Dense:
# Flatten: (8,8,128) -> (8192,) -> Dense(256): 8192*256 + 256 = 2,097,408 params!
# GAP:     (8,8,128) -> (128,)  -> Dense(256): 128*256 + 256  =    32,896 params!
# That's ~63x fewer parameters - less overfitting, faster training!

# ===========================
# 4. Modern pattern: strided Conv instead of MaxPool
# ===========================
# Instead of: Conv -> MaxPool
# Modern:     Conv(strides=2) -> learnable downsampling!
strided = layers.Conv2D(64, (3, 3), strides=(2, 2), padding='same')
# Output is halved, but the downsampling is LEARNED (it has params)
# Used in ResNet, EfficientNet, and other modern architectures
```
Flatten vs GlobalAveragePooling2D:
Flatten: fine for small images (28×28 MNIST). For large images (224×224), the classifier's parameter count explodes.
GAP: far more efficient, far fewer parameters, and it acts as a regularizer. Prefer GAP for images larger than 64×64. All modern models (ResNet, EfficientNet, MobileNet) use GAP.
4. First CNN: CIFAR-10 Classifier from Scratch
CIFAR-10 is a standard computer-vision benchmark: 60,000 color images at 32×32 pixels across 10 categories (airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck). It is much harder than MNIST: the images are color, low-resolution, and highly varied.
```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import numpy as np

# ===========================
# 1. LOAD & PREPROCESS
# ===========================
(X_train, y_train), (X_test, y_test) = keras.datasets.cifar10.load_data()
X_train = X_train.astype('float32') / 255.0   # normalize to [0, 1]
X_test = X_test.astype('float32') / 255.0

print(f"Train: {X_train.shape}, Test: {X_test.shape}")
# Train: (50000, 32, 32, 3), Test: (10000, 32, 32, 3)

class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']

# ===========================
# 2. BUILD CNN - 3 Conv Blocks
# Pattern: Conv -> BN -> ReLU -> Conv -> BN -> ReLU -> MaxPool -> Dropout
# ===========================
model = keras.Sequential([
    # -- Block 1: 32 filters --
    layers.Conv2D(32, (3,3), padding='same',
                  kernel_initializer='he_normal', input_shape=(32,32,3)),
    layers.BatchNormalization(),
    layers.Activation('relu'),
    layers.Conv2D(32, (3,3), padding='same', kernel_initializer='he_normal'),
    layers.BatchNormalization(),
    layers.Activation('relu'),
    layers.MaxPooling2D((2,2)),   # 32x32 -> 16x16
    layers.Dropout(0.25),

    # -- Block 2: 64 filters --
    layers.Conv2D(64, (3,3), padding='same', kernel_initializer='he_normal'),
    layers.BatchNormalization(),
    layers.Activation('relu'),
    layers.Conv2D(64, (3,3), padding='same', kernel_initializer='he_normal'),
    layers.BatchNormalization(),
    layers.Activation('relu'),
    layers.MaxPooling2D((2,2)),   # 16x16 -> 8x8
    layers.Dropout(0.25),

    # -- Block 3: 128 filters --
    layers.Conv2D(128, (3,3), padding='same', kernel_initializer='he_normal'),
    layers.BatchNormalization(),
    layers.Activation('relu'),
    layers.Conv2D(128, (3,3), padding='same', kernel_initializer='he_normal'),
    layers.BatchNormalization(),
    layers.Activation('relu'),
    layers.MaxPooling2D((2,2)),   # 8x8 -> 4x4
    layers.Dropout(0.25),

    # -- Classifier Head --
    layers.GlobalAveragePooling2D(),   # (4,4,128) -> (128,)
    layers.Dense(256, activation='relu', kernel_initializer='he_normal'),
    layers.BatchNormalization(),
    layers.Dropout(0.5),
    layers.Dense(10, activation='softmax')
])

model.summary()
# Total params: ~430k (compare: VGG16 = 138M, ResNet50 = 25.6M)

# ===========================
# 3. COMPILE
# ===========================
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-3),
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

# ===========================
# 4. TRAIN with callbacks
# ===========================
callbacks = [
    keras.callbacks.EarlyStopping(
        monitor='val_loss', patience=10, restore_best_weights=True),
    keras.callbacks.ReduceLROnPlateau(
        monitor='val_loss', factor=0.5, patience=5, min_lr=1e-6),
]

history = model.fit(
    X_train, y_train,
    epochs=100,             # EarlyStopping will stop earlier
    batch_size=64,
    validation_split=0.1,   # 10% for validation
    callbacks=callbacks,
    verbose=1
)

# ===========================
# 5. EVALUATE
# ===========================
test_loss, test_acc = model.evaluate(X_test, y_test, verbose=0)
print(f"\nTest Accuracy: {test_acc:.1%}")
# Test Accuracy: 88.3%
# With augmentation (Section 5): ~92%
# With transfer learning (Section 6): ~95%+

# ===========================
# 6. Per-class accuracy
# ===========================
y_pred = np.argmax(model.predict(X_test), axis=1)
for i, name in enumerate(class_names):
    mask = y_test.flatten() == i
    acc = (y_pred[mask] == i).mean()
    print(f"  {name:12s}: {acc:.1%}")
# airplane    : 90.2%
# cat         : 76.1%  <- hardest! (looks like dog/deer)
# truck       : 93.4%  <- easiest
```
Why the Conv → BN → ReLU Pattern?
Conv2D: extracts features (a weighted sum).
BatchNorm: normalizes activations, so training is more stable and tolerates a higher learning rate.
ReLU: non-linearity. Without it, the entire network is just one linear transformation.
Why BN before ReLU? Still debated; some papers report BN after ReLU works better. In practice the difference is small, and consistency matters more.
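One way to keep this pattern tidy and consistent is to wrap it in a small factory function (a sketch; the helper name `conv_block` is mine, not from the text):

```python
import tensorflow as tf
from tensorflow.keras import layers

def conv_block(filters):
    """A reusable Conv -> BatchNorm -> ReLU block (hypothetical helper)."""
    return tf.keras.Sequential([
        layers.Conv2D(filters, (3, 3), padding='same',
                      kernel_initializer='he_normal'),
        layers.BatchNormalization(),
        layers.Activation('relu'),
    ])

model = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 3)),
    conv_block(32),
    conv_block(32),
    layers.MaxPooling2D((2, 2)),   # 32x32 -> 16x16
])
print(model.output_shape)  # (None, 16, 16, 32)
```

Keeping the order inside one helper guarantees every block uses the same convention.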
5. Data Augmentation - A Free +4% Boost
Data augmentation applies random transformations to training images: flips, rotations, zooms, crops, brightness changes. This forces the model to learn general patterns (a cat from any angle is still a cat) instead of memorizing specific images. The effect: strong regularization that improves generalization.
```python
import tensorflow as tf
from tensorflow.keras import layers

# ===========================
# METHOD 1: Keras preprocessing layers (RECOMMENDED)
# Augmentation INSIDE the model - portable & clean!
# ===========================
data_augmentation = tf.keras.Sequential([
    layers.RandomFlip("horizontal"),        # left-right flip (50% chance)
    # layers.RandomFlip("horizontal_and_vertical"),  # also top-bottom
    layers.RandomRotation(0.1),             # rotate +/-10% (+/-36 degrees)
    layers.RandomZoom(0.1),                 # zoom +/-10%
    layers.RandomTranslation(0.1, 0.1),     # shift +/-10% in H and W
    layers.RandomContrast(0.1),             # contrast +/-10%
    layers.RandomBrightness(0.1),           # brightness +/-10%
    # layers.RandomCrop(28, 28),            # random crop (if input is larger)
], name="augmentation")

# Integrate into the model (BEST approach)
model_with_aug = tf.keras.Sequential([
    data_augmentation,   # <- augmentation first!
    layers.Conv2D(32, 3, padding='same', activation='relu',
                  input_shape=(32,32,3)),
    layers.BatchNormalization(),
    layers.MaxPooling2D(),
    # ... rest of the CNN
])

# IMPORTANT: augmentation ONLY runs when training=True!
# During model.predict() and model.evaluate() it is skipped automatically.

# ===========================
# METHOD 2: tf.image (in a tf.data pipeline)
# More control, custom augmentations
# ===========================
def augment_image(image, label):
    # Random flip
    image = tf.image.random_flip_left_right(image)
    # Random color jitter
    image = tf.image.random_brightness(image, max_delta=0.1)
    image = tf.image.random_contrast(image, lower=0.9, upper=1.1)
    image = tf.image.random_saturation(image, lower=0.9, upper=1.1)
    image = tf.image.random_hue(image, max_delta=0.05)
    # Random crop (resize slightly larger, then crop back)
    image = tf.image.resize(image, [36, 36])
    image = tf.image.random_crop(image, size=[32, 32, 3])
    # Clip back to the valid range
    image = tf.clip_by_value(image, 0.0, 1.0)
    return image, label

# Apply in a tf.data pipeline:
# train_ds = (dataset
#             .map(augment_image, num_parallel_calls=AUTOTUNE)
#             .batch(64).prefetch(AUTOTUNE))

# ===========================
# METHOD 3: CutMix & MixUp (advanced)
# State-of-the-art augmentation techniques
# ===========================
def cutmix(images, labels, alpha=1.0):
    """CutMix: cut a rectangle from one image, paste it into another."""
    batch_size = tf.shape(images)[0]
    indices = tf.random.shuffle(tf.range(batch_size))
    shuffled_images = tf.gather(images, indices)
    shuffled_labels = tf.gather(labels, indices)
    lam = tf.random.uniform([], 0, alpha)
    # ... (calculate a random box, blend images and labels)
    return mixed_images, mixed_labels

# CutMix + MixUp can add another 1-2% accuracy boost!
```
Augmentation Impact on CIFAR-10:
Without augmentation: ~88% accuracy.
With basic augmentation (flip, rotate, zoom): ~92% accuracy.
That is a free +4% boost with no change to the model architecture. Data augmentation is the easiest and most effective regularization technique in computer vision. Always use it.
6. Transfer Learning Phase 1 - Frozen Backbone
Transfer learning = taking a model already trained on a large dataset (ImageNet: 14 million images, 1000 classes) and adapting it to your task. This is the most powerful technique in computer vision: even with very little data (100 images per class!), you can achieve very high accuracy.
Phase 1: Freeze the backbone (don't update its weights), only train the new classifier head. This is fast and effective because ImageNet features are already great for most vision tasks.
```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# ===========================
# 1. Load the pre-trained backbone (WITHOUT the classifier head)
# ===========================
base_model = keras.applications.MobileNetV2(
    input_shape=(96, 96, 3),   # MobileNet needs >= 32x32
    include_top=False,         # REMOVE the ImageNet classifier (1000 classes)
    weights='imagenet'         # load the pre-trained weights!
)

# FREEZE all layers - don't update the weights!
base_model.trainable = False

print(f"Backbone: {base_model.name}")
print(f"Params: {base_model.count_params():,} (ALL frozen)")
print(f"Output shape: {base_model.output_shape}")
# Backbone: mobilenetv2_1.00_96
# Params: 2,257,984 (ALL frozen)
# Output shape: (None, 3, 3, 1280) - feature maps

# ===========================
# 2. Add preprocessing + a custom classifier
# ===========================
model = keras.Sequential([
    # Preprocessing
    layers.Resizing(96, 96),                 # resize 32 -> 96
    layers.Rescaling(1./127.5, offset=-1),   # normalize to [-1, 1]
    # Each model family has its own preprocessing!
    #   MobileNet: [-1, 1]
    #   EfficientNet: [0, 255] (built-in preprocessing)
    #   ResNet: caffe-style (BGR, mean subtraction)

    # Frozen feature extractor
    base_model,

    # New classifier head (THIS is what we train)
    layers.GlobalAveragePooling2D(),   # (3,3,1280) -> (1280,)
    layers.Dropout(0.3),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.2),
    layers.Dense(10, activation='softmax')   # 10 CIFAR classes
])

# ===========================
# 3. Compile with a HIGHER learning rate
# (backbone frozen -> only the head learns -> LR can be aggressive)
# ===========================
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-3),
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

# ===========================
# 4. Train Phase 1
# ===========================
history_p1 = model.fit(
    X_train, y_train,
    epochs=10,
    batch_size=64,
    validation_split=0.1
)
# ~90% accuracy in just 10 epochs! The backbone features are powerful.

print(f"Phase 1 Val Accuracy: {history_p1.history['val_accuracy'][-1]:.1%}")
```
| Pre-Trained Model | Params | Top-1 ImageNet | File Size | Best For |
|---|---|---|---|---|
| MobileNetV2 | 3.4M | 71.3% | 14 MB | Mobile, edge, prototyping |
| MobileNetV3Large | 5.4M | 75.6% | 22 MB | Better MobileNet |
| EfficientNetB0 | 5.3M | 77.1% | 29 MB | Best accuracy/size ratio |
| EfficientNetB3 | 12.2M | 81.6% | 48 MB | Higher accuracy, still fast |
| ResNet50 | 25.6M | 76.0% | 98 MB | Classic, well-studied |
| ResNet152 | 60.2M | 78.3% | 232 MB | Higher-accuracy ResNet |
| EfficientNetB7 | 66.3M | 84.3% | 256 MB | Maximum accuracy (large) |
| ConvNeXtTiny | 28.6M | 82.1% | 110 MB | Modern CNN (2022+) |
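Each keras.applications family also ships its own preprocess_input, and mixing them up is a common silent bug. A quick sketch of the differences (the input ranges in the comments reflect my understanding of current tf.keras behavior, so treat them as assumptions and check the docs for your version):

```python
import numpy as np
from tensorflow.keras.applications import mobilenet_v2, resnet50, efficientnet

x = np.array([[[[0.0, 127.5, 255.0]]]])  # one pixel, RGB values in [0, 255]

print(mobilenet_v2.preprocess_input(x.copy()))  # scales to [-1, 1]
print(resnet50.preprocess_input(x.copy()))      # caffe-style: BGR + mean subtraction
print(efficientnet.preprocess_input(x.copy()))  # pass-through: the model rescales internally
```

When in doubt, use the preprocess_input that matches your backbone rather than a hand-written Rescaling layer.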
7. Transfer Learning Phase 2 - Fine-Tuning
```python
# ===========================
# Phase 2: Unfreeze the top layers of the backbone
# ===========================

# Unfreeze the backbone
base_model.trainable = True

# But freeze everything EXCEPT the last 30 layers
print(f"Total layers: {len(base_model.layers)}")   # ~155 layers
for layer in base_model.layers[:-30]:
    layer.trainable = False

# Count trainable vs frozen layers
trainable = sum(1 for l in base_model.layers if l.trainable)
frozen = sum(1 for l in base_model.layers if not l.trainable)
print(f"Trainable: {trainable}, Frozen: {frozen}")

# CRITICAL: recompile with a MUCH LOWER learning rate!
# If the LR is too high, it destroys the pre-trained weights.
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-5),   # 100x lower!
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

# Train Phase 2
history_p2 = model.fit(
    X_train, y_train,
    epochs=20,
    batch_size=64,
    validation_split=0.1,
    callbacks=[
        keras.callbacks.EarlyStopping(patience=5, restore_best_weights=True),
        keras.callbacks.ReduceLROnPlateau(factor=0.5, patience=3, min_lr=1e-7)
    ]
)
# ~95%+ accuracy! Fine-tuning adapts the backbone features to your domain.

# ===========================
# Alternative: gradual unfreezing (even better for small datasets)
# ===========================
# Step 1: Train head (Phase 1)     - 10 epochs, LR=1e-3
# Step 2: Unfreeze last 10 layers  -  5 epochs, LR=1e-5
# Step 3: Unfreeze last 30 layers  -  5 epochs, LR=5e-6
# Step 4: Unfreeze all             -  5 epochs, LR=1e-6
# This gives the model time to adapt gradually, with less risk of forgetting.
```
Why 2-Phase Training?
Phase 1 (frozen, LR=1e-3): the new head starts from random weights and needs a high LR to learn quickly. The backbone is already good, so leave it alone for now.
Phase 2 (fine-tune, LR=1e-5): the head has converged; now adapt the backbone to your specific domain. Keep the LR low so you do not destroy features learned from 14 million images.
If you fine-tune without Phase 1: a random head plus backbone updates equals chaos. Large gradients from the random head destroy the backbone weights. Always train the head first!
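It is worth verifying that the freeze actually took effect before training. A small sketch with a toy stand-in backbone (three Dense layers instead of a real CNN, purely for illustration):

```python
import tensorflow as tf
from tensorflow.keras import layers

d1, d2, d3 = layers.Dense(16), layers.Dense(16), layers.Dense(16)
backbone = tf.keras.Sequential([tf.keras.Input(shape=(8,)), d1, d2, d3])

# Freeze all but the last layer (the Phase 2 pattern in miniature)
for layer in (d1, d2):
    layer.trainable = False

n_trainable = len(backbone.trainable_weights)   # kernel + bias of d3
n_frozen = len(backbone.non_trainable_weights)  # kernel + bias of d1 and d2
print(n_trainable, n_frozen)  # 2 4
```

Remember that trainability is captured at compile time: always recompile after changing any trainable flag, as the fine-tuning code above does.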
8. Custom Dataset - Load Your Own Images
```python
import tensorflow as tf

# ===========================
# Folder structure (IMPORTANT!):
# data/
#   train/
#     cats/        <- folder name = class label
#       cat001.jpg
#       cat002.jpg
#     dogs/
#       dog001.jpg
#     birds/
#       bird001.jpg
#   test/
#     cats/
#     dogs/
#     birds/
# ===========================

# Load with an automatic train/val split
train_ds = tf.keras.utils.image_dataset_from_directory(
    "data/train",
    image_size=(224, 224),   # auto-resize ALL images to this size
    batch_size=32,
    label_mode='int',        # integer labels: 0=birds, 1=cats, 2=dogs
    # label_mode='categorical'  # one-hot: [1,0,0], [0,1,0], [0,0,1]
    shuffle=True,
    seed=42,
    validation_split=0.2,    # 80% train, 20% val
    subset='training'
)

val_ds = tf.keras.utils.image_dataset_from_directory(
    "data/train",
    image_size=(224, 224),
    batch_size=32,
    label_mode='int',
    seed=42,                 # SAME seed so the two subsets don't overlap
    validation_split=0.2,
    subset='validation'
)

print(f"Classes: {train_ds.class_names}")   # ['birds', 'cats', 'dogs']
print(f"Num classes: {len(train_ds.class_names)}")

# ===========================
# Optimize the pipeline (CRITICAL for performance!)
# See Page 4 for a deep dive on tf.data
# ===========================
AUTOTUNE = tf.data.AUTOTUNE
train_ds = train_ds.cache().prefetch(buffer_size=AUTOTUNE)
val_ds = val_ds.cache().prefetch(buffer_size=AUTOTUNE)
# cache()    - store in RAM after the first epoch
# prefetch() - prepare the next batch while the GPU trains on the current one

# Now use it exactly like a built-in dataset:
# model.fit(train_ds, validation_data=val_ds, epochs=20)

# ===========================
# Handle imbalanced classes
# ===========================
# Counts per class:
# cats: 5000, dogs: 5000, birds: 500 -> imbalanced!
class_weights = {0: 10.0, 1: 1.0, 2: 1.0}   # upweight the rare class (0 = birds)
# model.fit(train_ds, class_weight=class_weights)
```
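Rather than hard-coding class weights, you can derive them from the class counts. A sketch using the common "balanced" weighting formula total / (n_classes * count) (the counts below are the hypothetical birds/cats/dogs numbers from the comments above, with labels in alphabetical order):

```python
def balanced_class_weights(counts):
    """class_weight dict: total / (n_classes * count) per class."""
    total = sum(counts.values())
    n = len(counts)
    return {cls: total / (n * c) for cls, c in counts.items()}

counts = {0: 500, 1: 5000, 2: 5000}  # 0=birds (rare), 1=cats, 2=dogs
print(balanced_class_weights(counts))  # {0: 7.0, 1: 0.7, 2: 0.7}
# model.fit(train_ds, class_weight=balanced_class_weights(counts))
```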
9. CNN Visualization - Feature Maps & Grad-CAM
```python
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

# ===========================
# 1. Extract Feature Maps
# ===========================
# Build a model that outputs the intermediate layer activations
conv_outputs = [l.output for l in model.layers if 'conv2d' in l.name]
feature_model = tf.keras.Model(inputs=model.input, outputs=conv_outputs)

# Pass one image through
img = X_test[0:1]   # shape: (1, 32, 32, 3)
all_features = feature_model.predict(img)

# Visualize the feature maps from each conv layer
for layer_idx, features in enumerate(all_features[:3]):
    fig, axes = plt.subplots(4, 8, figsize=(16, 8))
    for i, ax in enumerate(axes.flat):
        if i < features.shape[-1]:
            ax.imshow(features[0, :, :, i], cmap='viridis')
        ax.axis('off')
    plt.suptitle(f'Feature Maps - Conv Layer {layer_idx+1}')
    plt.tight_layout()
    plt.show()

# ===========================
# 2. Visualize Learned Filters
# ===========================
filters, biases = model.layers[0].get_weights()
print(f"Filter shape: {filters.shape}")   # (3, 3, 3, 32)

# Normalize the filters for visualization
f_min, f_max = filters.min(), filters.max()
filters_norm = (filters - f_min) / (f_max - f_min)

fig, axes = plt.subplots(4, 8, figsize=(12, 6))
for i, ax in enumerate(axes.flat):
    ax.imshow(filters_norm[:, :, :, i])   # one 3x3x3 RGB filter
    ax.axis('off')
plt.suptitle('Learned 3x3 Filters (Conv Layer 1)')
plt.show()

# ===========================
# 3. Grad-CAM - where is the model looking?
# ===========================
def grad_cam(model, image, class_idx, last_conv_layer_name):
    """Generate a Grad-CAM heatmap."""
    grad_model = tf.keras.Model(
        model.inputs,
        [model.get_layer(last_conv_layer_name).output, model.output]
    )
    with tf.GradientTape() as tape:
        conv_output, predictions = grad_model(image)
        loss = predictions[:, class_idx]
    grads = tape.gradient(loss, conv_output)
    weights = tf.reduce_mean(grads, axis=(1, 2))   # global-average-pool the gradients
    cam = tf.reduce_sum(
        weights[:, tf.newaxis, tf.newaxis, :] * conv_output, axis=-1)
    cam = tf.nn.relu(cam)                          # keep only positive contributions
    cam = cam / (tf.reduce_max(cam) + 1e-8)        # normalize to [0, 1]
    return cam.numpy()[0]

# Usage:
# heatmap = grad_cam(model, img, class_idx=3, last_conv_layer_name='conv2d_5')
# plt.imshow(img[0]); plt.imshow(heatmap, alpha=0.4, cmap='jet')
# -> Shows WHERE the model focuses to make its prediction!
```
10. Project: Production Image Classifier 95%+
```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# =======================================
# PRODUCTION IMAGE CLASSIFIER TEMPLATE
# Point it at your folder - done!
# =======================================

# 1. CONFIG
IMG_SIZE = (224, 224)
BATCH_SIZE = 32

# 2. LOAD DATA
train_ds = keras.utils.image_dataset_from_directory(
    "data/train", image_size=IMG_SIZE, batch_size=BATCH_SIZE,
    validation_split=0.2, subset="training", seed=42)
val_ds = keras.utils.image_dataset_from_directory(
    "data/train", image_size=IMG_SIZE, batch_size=BATCH_SIZE,
    validation_split=0.2, subset="validation", seed=42)

NUM_CLASSES = len(train_ds.class_names)
print(f"Classes: {train_ds.class_names} ({NUM_CLASSES})")

# 3. OPTIMIZE PIPELINE
AUTOTUNE = tf.data.AUTOTUNE
train_ds = train_ds.cache().prefetch(AUTOTUNE)
val_ds = val_ds.cache().prefetch(AUTOTUNE)

# 4. AUGMENTATION
augmentation = keras.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.15),
    layers.RandomZoom(0.1),
    layers.RandomContrast(0.1),
])

# 5. MODEL: EfficientNetB0 backbone
base = keras.applications.EfficientNetB0(
    input_shape=(*IMG_SIZE, 3), include_top=False, weights="imagenet")
base.trainable = False

model = keras.Sequential([
    augmentation,
    # NOTE: EfficientNet rescales internally - feed it raw [0, 255] pixels.
    # Do NOT add layers.Rescaling(1./255) here; it would break the
    # backbone's built-in preprocessing (see Section 6).
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dropout(0.3),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.2),
    layers.Dense(NUM_CLASSES, activation='softmax')
])

# 6. PHASE 1: Train the head
model.compile(
    optimizer=keras.optimizers.Adam(1e-3),
    loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(train_ds, validation_data=val_ds, epochs=15,
          callbacks=[keras.callbacks.EarlyStopping(
              patience=5, restore_best_weights=True)])

# 7. PHASE 2: Fine-tune
base.trainable = True
for layer in base.layers[:-20]:
    layer.trainable = False
model.compile(
    optimizer=keras.optimizers.Adam(1e-5),
    loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(train_ds, validation_data=val_ds, epochs=10,
          callbacks=[
              keras.callbacks.EarlyStopping(patience=3, restore_best_weights=True),
              keras.callbacks.ReduceLROnPlateau(factor=0.5, patience=2)])

# 8. SAVE
model.save("production_classifier.keras")
print("Production classifier ready! Typically 95%+ accuracy.")
```
A Universal Template
The script above works as a template for almost any image-classification task. The process:
1. Prepare folders: data/train/class_a/, data/train/class_b/, etc.
2. Point the script at your folder; class names are detected automatically.
3. Run the script - typically 95%+ accuracy with 500-1000 images per class.
Transfer learning + augmentation + 2-phase training is a consistently winning formula.
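Once the template has saved its model, serving a single prediction is a few lines. A hedged sketch (the helper `predict_image` is mine; it assumes a model that takes raw pixels at IMG_SIZE, as the template above does):

```python
import numpy as np
import tensorflow as tf

def predict_image(model, image_array, class_names):
    """Run one image (H, W, 3) through the model; return (label, confidence)."""
    batch = np.expand_dims(image_array, axis=0)   # add the batch dimension
    probs = model.predict(batch, verbose=0)[0]    # softmax probabilities
    idx = int(np.argmax(probs))
    return class_names[idx], float(probs[idx])

# Usage with the saved template model (paths and classes are examples):
# model = tf.keras.models.load_model("production_classifier.keras")
# img = tf.keras.utils.load_img("some_image.jpg", target_size=(224, 224))
# label, conf = predict_image(model, np.array(img), ["birds", "cats", "dogs"])
# print(f"{label} ({conf:.1%})")
```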
11. Page 3 Summary
| Concept | What It Is | Key Code |
|---|---|---|
| Conv2D | Convolution layer: extracts features | Conv2D(32, (3,3), padding='same') |
| MaxPooling2D | 2× spatial downsampling | MaxPooling2D((2,2)) |
| GlobalAvgPool2D | Feature maps → 1 number per channel | GlobalAveragePooling2D() |
| BatchNormalization | Normalize activations per batch | BatchNormalization() |
| Data Augmentation | Random training-time variations | RandomFlip, RandomRotation, RandomZoom |
| Transfer Learning | Use a pre-trained backbone | MobileNetV2(weights='imagenet') |
| Phase 1 | Freeze backbone, train head (LR=1e-3) | base.trainable = False |
| Phase 2 | Unfreeze top layers (LR=1e-5) | for l in base.layers[:-30]: l.trainable = False |
| Custom Dataset | Load from a local folder | image_dataset_from_directory() |
| Feature Maps | Visualize what the CNN sees | Model(inputs, [conv.output]) |
| Grad-CAM | Where the model focuses | GradientTape + heatmap |
Previous: Page 2 - Keras API & Model Building
Coming Next: Page 4 - tf.data Pipeline & Performance
Slow data loading means your GPU sits idle 50-90% of the time! Page 4 covers in depth: the tf.data.Dataset API, the golden pattern (shuffle → map → batch → prefetch), cache for small datasets, the TFRecord format for large datasets, parallel data loading with num_parallel_calls, mixed-precision training (float16 for a 2× speedup), and profiling with the TF Profiler. Make your training up to 10× faster!
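As a teaser, the golden pattern looks like this in code. A sketch on an in-memory toy dataset (the preprocessing lambda is just a placeholder; Page 4 covers the real thing):

```python
import numpy as np
import tensorflow as tf

AUTOTUNE = tf.data.AUTOTUNE
images = np.random.rand(256, 32, 32, 3).astype('float32')
labels = np.random.randint(0, 10, size=(256,))

ds = (tf.data.Dataset.from_tensor_slices((images, labels))
      .shuffle(buffer_size=256)                  # 1. shuffle raw examples
      .map(lambda x, y: (x * 2.0 - 1.0, y),      # 2. map: preprocess in parallel
           num_parallel_calls=AUTOTUNE)
      .batch(64)                                 # 3. batch after mapping
      .prefetch(AUTOTUNE))                       # 4. overlap prep with training

for x, y in ds.take(1):
    print(x.shape, y.shape)  # (64, 32, 32, 3) (64,)
```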