πŸ“ Artikel ini ditulis dalam Bahasa Indonesia & English
πŸ“ This article is available in English & Bahasa Indonesia

πŸ‘οΈ Belajar TensorFlow β€” Page 3Learn TensorFlow β€” Page 3

CNN &
Image Classification

Computer vision with TensorFlow. Page 3 covers in depth: how Conv2D works and its parameters, MaxPooling2D and GlobalAveragePooling2D, building a CIFAR-10 CNN from scratch (3 conv blocks + BatchNorm + Dropout), data augmentation using Keras Preprocessing Layers and tf.image, 2-phase transfer learning with MobileNetV2 and EfficientNetB0, processing custom datasets with image_dataset_from_directory, visualizing filters and feature maps, and an end-to-end 95%+ image classifier project.

πŸ“… March 2026 · ⏱ 35 min read
🏷 CNN · Conv2D · MaxPooling · Augmentation · Transfer Learning · MobileNet · EfficientNet · CIFAR-10
πŸ“š Learn TensorFlow Series:

πŸ“‘ Table of Contents β€” Page 3

  1. CNN Review from NN Series β€” Convolution, pooling, and transition to TensorFlow
  2. Conv2D Deep Dive β€” Parameters, output shape, and why CNN is efficient
  3. MaxPooling2D & GlobalAveragePooling2D β€” Two downsampling strategies
  4. First CNN: CIFAR-10 β€” 10-class classifier from scratch, ~88% accuracy
  5. Data Augmentation β€” Keras Layers vs tf.image, free +4% boost
  6. Transfer Learning Phase 1 β€” Frozen backbone + custom head
  7. Transfer Learning Phase 2 β€” Fine-tuning: unfreeze top layers
  8. Custom Dataset β€” image_dataset_from_directory + pipeline optimization
  9. CNN Visualization β€” Feature maps, filters, and Grad-CAM
  10. Project: Production Image Classifier 95%+ β€” End-to-end template
  11. Summary & Page 4 Preview
πŸ”„

1. CNN Review β€” From Neural Network Series to TensorFlow

In the NN series (Page 3), we built a CNN manually in 200+ lines of NumPy. Now: 3 lines of Keras.

In the Neural Network series (Page 3), we implemented convolution (nested for-loops), padding (np.pad), stride, and max pooling from scratch using NumPy β€” hundreds of lines that were painful to debug. Now, with TensorFlow Keras, all of those operations become a single line: layers.Conv2D(32, (3,3), padding='same'). However, the concepts we learned in the NN series remain fundamental β€” without understanding convolution, you're just an API user without comprehension.

CNN Architecture β€” Same Concept, Keras Syntax

Input Image (32Γ—32Γ—3 RGB pixels)
β†’ Feature Extraction: Conv β†’ BN β†’ ReLU β†’ Conv β†’ BN β†’ ReLU β†’ MaxPool β†’ Dropout
  (repeat 2-4 blocks, filters: 32 β†’ 64 β†’ 128)
β†’ Classification: Flatten/GAP β†’ Dense(256) β†’ Dropout(0.5) β†’ Dense(10) β†’ Softmax
β†’ [0.01, 0.02, 0.87, ...] β†’ predicted class: bird

NN Series (manual): for i in range(H): for j in range(W): ... (~200 lines)
TensorFlow (Keras): layers.Conv2D(32, (3,3), padding='same') (1 line)

Key Insight: CNN = Feature Extractor (conv layers) + Classifier (dense layers).
Transfer Learning = keep the extractor, swap the classifier!

Here's a quick recap of CNN concepts we covered in the Neural Network series:

Concept      | What It Is                   | NN Series         | TensorFlow
Convolution  | 3Γ—3 filter slides over image | Nested for-loop   | Conv2D(32, (3,3))
Padding      | Add pixels at edges          | np.pad()          | padding='same'
Stride       | Filter jump per step         | Manual indexing   | strides=(2,2)
Max Pooling  | Take max from 2Γ—2 window     | Nested for-loop   | MaxPooling2D((2,2))
CNN Backprop | Convolution gradients        | 100+ lines manual | Automatic (GradientTape)
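The table's first row is literally what the NN series implemented. As a reminder of what the single Conv2D call replaces, here is a minimal single-channel NumPy sketch of 'same' convolution with stride 1 (the helper name conv2d_same is ours, not a Keras API):

```python
import numpy as np

def conv2d_same(image, kernel):
    """'Same' convolution via nested loops, as in the NN series.
    image: (H, W), kernel: (kH, kW) β€” single channel for brevity."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(image, ((ph, ph), (pw, pw)))   # padding='same'
    out = np.zeros(image.shape, dtype=float)
    for i in range(image.shape[0]):                # stride = 1
        for j in range(image.shape[1]):
            out[i, j] = np.sum(padded[i:i+kh, j:j+kw] * kernel)
    return out

img = np.random.rand(32, 32)
edge = np.array([[1, 0, -1]] * 3, dtype=float)     # simple 3Γ—3 edge filter
print(conv2d_same(img, edge).shape)  # (32, 32) β€” same spatial size as input
```

Conv2D does exactly this, per filter and per channel β€” vectorized on the GPU instead of looped in Python.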
🧱

2. Conv2D Deep Dive β€” Every Parameter Explained

Understanding filters, kernel_size, strides, padding, and output shape
13_conv2d_deep_dive.py β€” Conv2D Parameters Explained
import tensorflow as tf
from tensorflow.keras import layers

# ===========================
# Conv2D β€” EVERY parameter explained
# ===========================
conv = layers.Conv2D(
    filters=32,           # Number of output filters (= output channels)
                          # More filters = more features learned
                          # Typical pattern: 32 β†’ 64 β†’ 128 β†’ 256

    kernel_size=(3, 3),   # Filter size: 3Γ—3 (most common)
                          # Alternatives: (1,1) for channel mixing
                          #               (5,5) for a large receptive field
                          #               (7,7) only in the first layer

    strides=(1, 1),       # Slide 1 pixel per step (default)
                          # strides=(2,2) β†’ downsample 2Γ— (MaxPool replacement)

    padding='same',       # 'same': output size = input size (adds padding)
                          # 'valid': no padding β†’ output shrinks
                          # Formula: output = (input - kernel + 2*pad) / stride + 1

    activation='relu',    # Activation function (or None + separate layer)

    use_bias=True,        # Add a bias per filter (default True)

    kernel_initializer='glorot_uniform',  # the default
    # kernel_initializer='he_normal',     # He init β€” recommended with ReLU

    input_shape=(32, 32, 3)  # H, W, C β€” ONLY in the first layer!
)

# ===========================
# Output Shape Calculation
# ===========================
# Input:  (batch, 32, 32, 3)    β†’ 32Γ—32 RGB image
# Conv2D(32, (3,3), padding='same', strides=(1,1))
# Output: (batch, 32, 32, 32)   β†’ 32 feature maps, same spatial size

# Conv2D(64, (3,3), padding='valid', strides=(1,1))
# Output: (batch, 30, 30, 64)   β†’ shrinks by kernel-1 = 2 pixels per dimension

# Conv2D(64, (3,3), padding='same', strides=(2,2))
# Output: (batch, 16, 16, 64)   β†’ halved spatial size (like MaxPool!)

# ===========================
# Parameter Count β€” WHY CNN IS EFFICIENT
# ===========================
# Dense layer: 32Γ—32Γ—3 β†’ 128 neurons
# Params = 3072 Γ— 128 + 128 = 393,344 ← HUGE!

# Conv2D: 32 filters, 3Γ—3, 3 input channels
# Params = (3 Γ— 3 Γ— 3 + 1) Γ— 32 = 896 ← TINY!

# That's 439Γ— fewer parameters!
# Secret: WEIGHT SHARING β€” same filter applied everywhere

model_demo = tf.keras.Sequential([
    layers.Conv2D(32, (3,3), padding='same', input_shape=(32,32,3)),
    layers.Conv2D(64, (3,3), padding='same'),
    layers.Conv2D(128, (3,3), padding='same'),
])
model_demo.summary()
# Layer 1: (3Γ—3Γ—3+1)Γ—32   =     896 params
# Layer 2: (3Γ—3Γ—32+1)Γ—64  =  18,496 params
# Layer 3: (3Γ—3Γ—64+1)Γ—128 =  73,856 params
# Total: 93,248 β€” compare Dense equivalent: MILLIONS


πŸŽ“ Why 3Γ—3 Filters Dominate?
Two 3Γ—3 Conv2D layers have an effective receptive field of 5Γ—5, but with fewer parameters and more non-linearity (2Γ— ReLU vs 1Γ—). Three 3Γ—3 layers = 7Γ—7 receptive field but far more efficient. This is the insight from the VGGNet paper (2014) that still holds today.

Rule of thumb: Always use 3Γ—3. Use 1Γ—1 for channel mixing. Use 5Γ—5 or 7Γ—7 only in the first layer (if images are large).
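The VGG claim is checkable with plain arithmetic. For a block with C input and C output channels (biases ignored), two stacked 3Γ—3 layers cover the same 5Γ—5 receptive field as one 5Γ—5 layer, yet cost less:

```python
C = 64  # channels in = channels out (a typical mid-network block)

two_3x3 = 2 * (3 * 3 * C * C)   # two stacked 3Γ—3 layers (β‰ˆ5Γ—5 receptive field)
one_5x5 = 5 * 5 * C * C         # one 5Γ—5 layer, same receptive field

print(two_3x3, one_5x5)          # 73728 102400
print(one_5x5 / two_3x3)         # β‰ˆ1.39Γ— more params β€” and one fewer ReLU
```

So the stacked 3Γ—3 version is both cheaper and more expressive, which is why it became the default.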

πŸ“

3. MaxPooling2D & GlobalAveragePooling2D

Two downsampling strategies: classic vs modern
14_pooling_layers.py β€” Pooling Strategies
from tensorflow.keras import layers
import tensorflow as tf

# ===========================
# 1. MaxPooling2D β€” classic downsampling
# ===========================
pool = layers.MaxPooling2D(
    pool_size=(2, 2),     # 2Γ—2 window
    strides=(2, 2),       # non-overlapping (default = pool_size)
    padding='valid'        # no padding (default)
)
# Input:  (batch, 32, 32, 64)
# Output: (batch, 16, 16, 64)  β†’ spatial halved, channels same
# Params: 0 (no learnable parameters!)
# Takes MAX value from each 2Γ—2 window β†’ keeps strongest activation

# Visual example:
# [[1, 3],    MaxPool 2Γ—2
#  [2, 4]] β†’ output = 4 (maximum)

# ===========================
# 2. AveragePooling2D β€” alternative
# ===========================
avg_pool = layers.AveragePooling2D((2, 2))
# Same as MaxPool but takes AVERAGE instead of MAX
# Less commonly used β€” MaxPool usually works better

# ===========================
# 3. GlobalAveragePooling2D β€” MODERN replacement for Flatten
# ===========================
gap = layers.GlobalAveragePooling2D()
# Input:  (batch, 8, 8, 128)  β†’ 128 feature maps, each 8Γ—8
# Output: (batch, 128)         β†’ average each map to ONE number
# Params: 0

# Compare with Flatten + Dense:
# Flatten: (8,8,128) β†’ (8192,)  β†’ Dense(256): 8192Γ—256 + 256 = 2,097,408 params!
# GAP:     (8,8,128) β†’ (128,)   β†’ Dense(256): 128Γ—256 + 256  = 33,024 params!
# That's ~63Γ— fewer parameters β†’ less overfitting, faster training!

# ===========================
# 4. Modern pattern: Strided Conv instead of MaxPool
# ===========================
# Instead of: Conv β†’ MaxPool
# Modern:     Conv(strides=2)  ← learnable downsampling!
strided = layers.Conv2D(64, (3,3), strides=(2,2), padding='same')
# Output halved, but the downsampling is LEARNED (has params)
# Used in ResNet, EfficientNet, modern architectures

πŸ’‘ Flatten vs GlobalAveragePooling2D:
Flatten: fine for small images (28Γ—28 MNIST). But for large images (224Γ—224), classifier parameter count explodes.
GAP: far more efficient, fewer parameters, acts as regularization. Always use GAP for images > 64Γ—64. All modern models (ResNet, EfficientNet, MobileNet) use GAP.
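GAP itself is nothing more than a spatial mean, so a NumPy sketch makes the shape change and the parameter comparison concrete:

```python
import numpy as np

features = np.random.rand(4, 8, 8, 128)   # (batch, H, W, channels)

gap = features.mean(axis=(1, 2))          # what GlobalAveragePooling2D computes
flat = features.reshape(4, -1)            # what Flatten computes

print(gap.shape)    # (4, 128)
print(flat.shape)   # (4, 8192)

# Parameters of a following Dense(256) β€” weights + biases:
print(128 * 256 + 256)    # 33,024 after GAP
print(8192 * 256 + 256)   # 2,097,408 after Flatten β€” ~63Γ— more
```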

πŸ–ΌοΈ

4. First CNN: CIFAR-10 Classifier from Scratch

50,000 training images, 10 categories, 32Γ—32 pixels β€” target: 88%+ accuracy

CIFAR-10 is a benchmark computer vision dataset: 60,000 color images at 32Γ—32 pixels divided into 10 categories (airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck). This is much more challenging than MNIST because images are colored, low-resolution, and highly varied.

15_cifar10_full.py β€” Complete CIFAR-10 CNN πŸ”₯
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import numpy as np

# ===========================
# 1. LOAD & PREPROCESS
# ===========================
(X_train, y_train), (X_test, y_test) = keras.datasets.cifar10.load_data()
X_train = X_train.astype('float32') / 255.0  # normalize [0,1]
X_test = X_test.astype('float32') / 255.0
print(f"Train: {X_train.shape}, Test: {X_test.shape}")
# Train: (50000, 32, 32, 3), Test: (10000, 32, 32, 3)

class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']

# ===========================
# 2. BUILD CNN β€” 3 Conv Blocks
# Pattern: Conv β†’ BN β†’ ReLU β†’ Conv β†’ BN β†’ ReLU β†’ MaxPool β†’ Dropout
# ===========================
model = keras.Sequential([
    # ── Block 1: 32 filters ──
    layers.Conv2D(32, (3,3), padding='same', kernel_initializer='he_normal',
                  input_shape=(32,32,3)),
    layers.BatchNormalization(),
    layers.Activation('relu'),
    layers.Conv2D(32, (3,3), padding='same', kernel_initializer='he_normal'),
    layers.BatchNormalization(),
    layers.Activation('relu'),
    layers.MaxPooling2D((2,2)),      # 32Γ—32 β†’ 16Γ—16
    layers.Dropout(0.25),

    # ── Block 2: 64 filters ──
    layers.Conv2D(64, (3,3), padding='same', kernel_initializer='he_normal'),
    layers.BatchNormalization(),
    layers.Activation('relu'),
    layers.Conv2D(64, (3,3), padding='same', kernel_initializer='he_normal'),
    layers.BatchNormalization(),
    layers.Activation('relu'),
    layers.MaxPooling2D((2,2)),      # 16Γ—16 β†’ 8Γ—8
    layers.Dropout(0.25),

    # ── Block 3: 128 filters ──
    layers.Conv2D(128, (3,3), padding='same', kernel_initializer='he_normal'),
    layers.BatchNormalization(),
    layers.Activation('relu'),
    layers.Conv2D(128, (3,3), padding='same', kernel_initializer='he_normal'),
    layers.BatchNormalization(),
    layers.Activation('relu'),
    layers.MaxPooling2D((2,2)),      # 8Γ—8 β†’ 4Γ—4
    layers.Dropout(0.25),

    # ── Classifier Head ──
    layers.GlobalAveragePooling2D(), # (4,4,128) β†’ (128)
    layers.Dense(256, activation='relu', kernel_initializer='he_normal'),
    layers.BatchNormalization(),
    layers.Dropout(0.5),
    layers.Dense(10, activation='softmax')
])

model.summary()
# Total params: ~430k (compare: VGG16 = 138M, ResNet50 = 25.6M)

# ===========================
# 3. COMPILE
# ===========================
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-3),
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

# ===========================
# 4. TRAIN with callbacks
# ===========================
callbacks = [
    keras.callbacks.EarlyStopping(
        monitor='val_loss', patience=10, restore_best_weights=True),
    keras.callbacks.ReduceLROnPlateau(
        monitor='val_loss', factor=0.5, patience=5, min_lr=1e-6),
]

history = model.fit(
    X_train, y_train,
    epochs=100,              # EarlyStopping will stop early
    batch_size=64,
    validation_split=0.1,   # 10% for validation
    callbacks=callbacks,
    verbose=1
)

# ===========================
# 5. EVALUATE
# ===========================
test_loss, test_acc = model.evaluate(X_test, y_test, verbose=0)
print(f"\n🎯 Test Accuracy: {test_acc:.1%}")
# 🎯 Test Accuracy: 88.3%
# With augmentation (Section 5): ~92%
# With transfer learning (Section 6): ~95%+

# ===========================
# 6. Per-class accuracy
# ===========================
y_pred = np.argmax(model.predict(X_test), axis=1)
for i, name in enumerate(class_names):
    mask = y_test.flatten() == i
    acc = (y_pred[mask] == i).mean()
    print(f"  {name:12s}: {acc:.1%}")
# airplane    : 90.2%
# cat         : 76.1%  ← hardest! (looks like dog/deer)
# truck       : 93.4%  ← easiest

πŸŽ“ Why the Conv β†’ BN β†’ ReLU Pattern?
Conv2D: extract features (weighted sum).
BatchNorm: normalize activations β†’ more stable training, can use higher LR.
ReLU: non-linearity β†’ without this, the entire network is just a linear transformation.
Why BN before ReLU? Still debated. Some papers say BN after ReLU is better. In practice, the difference is small β€” consistency matters more.
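The repeated Conv β†’ BN β†’ ReLU pattern invites a small helper. Here is one way the CIFAR-10 model above could be factored with the functional API β€” a sketch, and the conv_block name is ours:

```python
import tensorflow as tf
from tensorflow.keras import layers

def conv_block(x, filters, dropout=0.25):
    """One CIFAR-10 block: (Conv β†’ BN β†’ ReLU) Γ—2 β†’ MaxPool β†’ Dropout."""
    for _ in range(2):
        x = layers.Conv2D(filters, (3, 3), padding='same',
                          kernel_initializer='he_normal')(x)
        x = layers.BatchNormalization()(x)
        x = layers.Activation('relu')(x)
    x = layers.MaxPooling2D((2, 2))(x)   # halves the spatial size
    return layers.Dropout(dropout)(x)

inputs = tf.keras.Input(shape=(32, 32, 3))
x = inputs
for f in (32, 64, 128):                  # 32Γ—32 β†’ 16Γ—16 β†’ 8Γ—8 β†’ 4Γ—4
    x = conv_block(x, f)
print(x.shape)  # (None, 4, 4, 128) β€” ready for the GAP classifier head
```

Changing the architecture (more blocks, wider filters) then becomes a one-line edit to the loop.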

πŸ”„

5. Data Augmentation β€” Free +4% Boost

Make the model "see" new variations of the same data β€” no additional data needed

Data augmentation = random transformations on training images: flip, rotate, zoom, crop, brightness changes. This forces the model to learn general patterns (a cat from any angle = cat) instead of memorizing specific images. Effect: strong regularization that improves generalization.

16_augmentation_complete.py β€” All Augmentation Methods
import tensorflow as tf
from tensorflow.keras import layers

# ===========================
# METHOD 1: Keras Preprocessing Layers (RECOMMENDED)
# Augmentation INSIDE the model β€” portable & clean!
# ===========================
data_augmentation = tf.keras.Sequential([
    layers.RandomFlip("horizontal"),            # left-right flip (50% chance)
    # layers.RandomFlip("horizontal_and_vertical"),  # also flips top-bottom
    layers.RandomRotation(0.1),                # rotate Β±10% (Β±36Β°)
    layers.RandomZoom(0.1),                    # zoom Β±10%
    layers.RandomTranslation(0.1, 0.1),        # shift Β±10% H and W
    layers.RandomContrast(0.1),                # contrast Β±10%
    layers.RandomBrightness(0.1),              # brightness Β±10%
    # layers.RandomCrop(28, 28),                # random crop (if larger)
], name="augmentation")

# Integrate into model (BEST approach)
model_with_aug = tf.keras.Sequential([
    data_augmentation,                          # ← augmentation first!
    layers.Conv2D(32, 3, padding='same', activation='relu',
                  input_shape=(32,32,3)),
    layers.BatchNormalization(),
    layers.MaxPooling2D(),
    # ... rest of CNN
])
# IMPORTANT: Augmentation ONLY runs during training=True!
# During model.predict() and model.evaluate(): skipped automatically.

# ===========================
# METHOD 2: tf.image (in tf.data pipeline)
# More control, custom augmentations
# ===========================
def augment_image(image, label):
    # Random flip
    image = tf.image.random_flip_left_right(image)

    # Random brightness & contrast
    image = tf.image.random_brightness(image, max_delta=0.1)
    image = tf.image.random_contrast(image, lower=0.9, upper=1.1)
    image = tf.image.random_saturation(image, lower=0.9, upper=1.1)
    image = tf.image.random_hue(image, max_delta=0.05)

    # Random crop (resize slightly larger, then crop back)
    image = tf.image.resize(image, [36, 36])
    image = tf.image.random_crop(image, size=[32, 32, 3])

    # Clip to valid range
    image = tf.clip_by_value(image, 0.0, 1.0)
    return image, label

# Apply in tf.data pipeline:
# train_ds = (dataset
#     .map(augment_image, num_parallel_calls=AUTOTUNE)
#     .batch(64).prefetch(AUTOTUNE))

# ===========================
# METHOD 3: CutMix & MixUp (advanced)
# State-of-the-art augmentation techniques
# ===========================
def cutmix(images, labels, alpha=1.0):
    """CutMix: cut a rectangle from one image, paste into another.
    labels must be ONE-HOT so they can be blended. Simplified sketch:
    the paper samples lam ~ Beta(alpha, alpha); we use Uniform(0, 1)."""
    batch_size = tf.shape(images)[0]
    h, w = tf.shape(images)[1], tf.shape(images)[2]
    indices = tf.random.shuffle(tf.range(batch_size))
    shuffled_images = tf.gather(images, indices)
    shuffled_labels = tf.gather(labels, indices)

    lam = tf.random.uniform([], 0, 1)
    cut_h = tf.cast(tf.cast(h, tf.float32) * tf.sqrt(1. - lam), tf.int32)
    cut_w = tf.cast(tf.cast(w, tf.float32) * tf.sqrt(1. - lam), tf.int32)
    cy = tf.random.uniform([], 0, h - cut_h + 1, dtype=tf.int32)
    cx = tf.random.uniform([], 0, w - cut_w + 1, dtype=tf.int32)

    # Mask = 1 inside the pasted box; blend labels by the box's area ratio
    mask = tf.pad(tf.ones([cut_h, cut_w]),
                  [[cy, h - cy - cut_h], [cx, w - cx - cut_w]])[None, :, :, None]
    mixed_images = images * (1. - mask) + shuffled_images * mask
    lam_adj = 1. - tf.cast(cut_h * cut_w, tf.float32) / tf.cast(h * w, tf.float32)
    mixed_labels = lam_adj * labels + (1. - lam_adj) * shuffled_labels
    return mixed_images, mixed_labels

# CutMix + MixUp can add another 1-2% accuracy boost!

πŸŽ‰ Augmentation Impact on CIFAR-10:
Without augmentation: ~88% accuracy.
With basic augmentation (flip, rotate, zoom): ~92% accuracy.
That's a FREE +4% boost without changing the model architecture! Data augmentation is the most effective and easiest regularization technique for computer vision. Always use it.
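One property of Method 1 worth verifying yourself: preprocessing layers are only random when called with training=True; at inference time they are the identity, which is why model.predict() skips them automatically. A quick check (the specific layers are illustrative):

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

aug = tf.keras.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.1),
])

x = tf.random.uniform((8, 32, 32, 3))

y_infer = aug(x, training=False)   # inference: images pass through unchanged
y_train = aug(x, training=True)    # training: randomly transformed

print(np.allclose(x.numpy(), y_infer.numpy()))  # True
```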

🧊

6. Transfer Learning Phase 1 β€” Frozen Backbone

Leverage models that already learned from 14 million ImageNet images

Transfer learning = taking a model already trained on a large dataset (ImageNet: 14 million images, 1000 classes) and adapting it for your task. This is the most powerful technique in computer vision β€” even with very little data (100 images per class!), you can achieve very high accuracy.

Phase 1: Freeze the backbone (don't update its weights), only train the new classifier head. This is fast and effective because ImageNet features are already great for most vision tasks.

17_transfer_phase1.py β€” Frozen Backbone + Custom Head
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# ===========================
# 1. Load pre-trained backbone (WITHOUT classifier head)
# ===========================
base_model = keras.applications.MobileNetV2(
    input_shape=(96, 96, 3),   # MobileNet needs >= 32Γ—32
    include_top=False,          # REMOVE ImageNet classifier (1000 classes)
    weights='imagenet'           # load pre-trained weights!
)

# FREEZE all layers β€” don't update weights!
base_model.trainable = False

print(f"Backbone: {base_model.name}")
print(f"Params: {base_model.count_params():,} (ALL frozen)")
print(f"Output shape: {base_model.output_shape}")
# Backbone: mobilenetv2_1.00_96
# Params: 2,257,984 (ALL frozen)
# Output shape: (None, 3, 3, 1280) β€” feature maps

# ===========================
# 2. Add preprocessing + custom classifier
# ===========================
model = keras.Sequential([
    # Preprocessing
    layers.Resizing(96, 96),                     # resize 32β†’96
    layers.Rescaling(1./127.5, offset=-1),       # normalize to [-1, 1]
    # Each model has its own preprocessing!
    # MobileNet: [-1, 1]
    # EfficientNet: [0, 255] (built-in preprocessing)
    # ResNet: caffe-style (BGR, mean subtraction)

    # Frozen feature extractor
    base_model,

    # New classifier head (THIS is what we train)
    layers.GlobalAveragePooling2D(),          # (3,3,1280) β†’ (1280)
    layers.Dropout(0.3),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.2),
    layers.Dense(10, activation='softmax')     # 10 CIFAR classes
])

# ===========================
# 3. Compile with HIGHER learning rate
# (backbone frozen β†’ only head learns β†’ LR can be aggressive)
# ===========================
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-3),
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

# ===========================
# 4. Train Phase 1
# ===========================
history_p1 = model.fit(
    X_train, y_train,
    epochs=10,
    batch_size=64,
    validation_split=0.1
)
# β‰ˆ90% accuracy in just 10 epochs! Backbone features are powerful.

print(f"Phase 1 Val Accuracy: {history_p1.history['val_accuracy'][-1]:.1%}")
Pre-Trained Model | Params | Top-1 ImageNet | File Size | Best For
MobileNetV2       | 3.4M   | 71.3%          | 14 MB     | Mobile, edge, prototyping
MobileNetV3Large  | 5.4M   | 75.6%          | 22 MB     | Better MobileNet
EfficientNetB0    | 5.3M   | 77.1%          | 29 MB     | Best accuracy/size ratio ⭐
EfficientNetB3    | 12.2M  | 81.6%          | 48 MB     | Higher accuracy, still fast
ResNet50          | 25.6M  | 76.0%          | 98 MB     | Classic, well-studied
ResNet152         | 60.2M  | 78.3%          | 232 MB    | Higher accuracy ResNet
EfficientNetB7    | 66.3M  | 84.3%          | 256 MB    | Maximum accuracy (large)
ConvNeXtTiny      | 28.6M  | 82.1%          | 110 MB    | Modern CNN (2022+)
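Rather than hand-writing Rescaling for each backbone, every keras.applications family exposes a matching preprocess_input. For MobileNetV2 it maps [0, 255] to [-1, 1]:

```python
import numpy as np
from tensorflow.keras.applications import mobilenet_v2

pixels = np.array([0.0, 127.5, 255.0])
print(mobilenet_v2.preprocess_input(pixels))   # [-1.  0.  1.]

# Each family has its own:
# efficientnet.preprocess_input β†’ pass-through (rescaling lives inside the model)
# resnet50.preprocess_input     β†’ caffe-style BGR + mean subtraction
```

Using the family's own preprocess_input avoids the most common transfer-learning bug: feeding a backbone pixels in the wrong range.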
πŸ”₯

7. Transfer Learning Phase 2 β€” Fine-Tuning

Unfreeze top backbone layers, train with very low LR β†’ +5% boost
18_transfer_phase2.py β€” Fine-Tuning Top Layers
# ===========================
# Phase 2: Unfreeze top layers of backbone
# ===========================

# Unfreeze backbone
base_model.trainable = True

# But freeze all EXCEPT the last 30 layers
print(f"Total layers: {len(base_model.layers)}")  # ~155 layers
for layer in base_model.layers[:-30]:
    layer.trainable = False

# Count trainable vs frozen
trainable = sum(1 for l in base_model.layers if l.trainable)
frozen = sum(1 for l in base_model.layers if not l.trainable)
print(f"Trainable: {trainable}, Frozen: {frozen}")

# CRITICAL: Recompile with MUCH LOWER learning rate!
# If LR is too high β†’ destroys pre-trained weights
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-5),  # 100Γ— lower!
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

# Train Phase 2
history_p2 = model.fit(
    X_train, y_train,
    epochs=20,
    batch_size=64,
    validation_split=0.1,
    callbacks=[
        keras.callbacks.EarlyStopping(patience=5, restore_best_weights=True),
        keras.callbacks.ReduceLROnPlateau(factor=0.5, patience=3, min_lr=1e-7)
    ]
)
# β‰ˆ95%+ accuracy! Fine-tuning adapts backbone features to your domain.

# ===========================
# Alternative: Gradual Unfreezing (even better for small datasets)
# ===========================
# Step 1: Train head (Phase 1) β€” 10 epochs, LR=1e-3
# Step 2: Unfreeze last 10 layers β€” 5 epochs, LR=1e-5
# Step 3: Unfreeze last 30 layers β€” 5 epochs, LR=5e-6
# Step 4: Unfreeze all β€” 5 epochs, LR=1e-6
# This gives the model time to adapt gradually, less risk of forgetting.

πŸŽ“ Why 2-Phase Training?
Phase 1 (frozen, LR=1e-3): New head starts from random weights β†’ needs high LR to learn fast. Backbone is already good β†’ no need to change it yet.
Phase 2 (fine-tune, LR=1e-5): Head has converged β†’ now adapt backbone to your specific domain. Low LR to not destroy features learned from 14 million images.

If you fine-tune without Phase 1: Random head + backbone updates = chaos. Large gradients from random head destroy backbone weights. Always train the head first!
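The gradual-unfreezing schedule can be driven by a small loop. Below is a sketch with a toy stand-in backbone so the mechanics are visible; the unfreeze_top helper name is ours, and with the real base_model you would recompile at a lower LR after each stage:

```python
import tensorflow as tf
from tensorflow.keras import layers

def unfreeze_top(backbone, n_layers):
    """Make only the last n_layers of the backbone trainable."""
    backbone.trainable = True
    for layer in backbone.layers[:-n_layers]:
        layer.trainable = False

# Toy stand-in backbone (in this section it would be base_model):
backbone = tf.keras.Sequential(
    [layers.Dense(8, activation='relu') for _ in range(6)])
backbone.build((None, 4))

unfreeze_top(backbone, 2)
print(sum(l.trainable for l in backbone.layers))  # 2 of 6 layers trainable
```

With the real model, call unfreeze_top(base_model, n) for n = 10, 30, then all layers, and re-run model.compile with the lower LR (1e-5 β†’ 5e-6 β†’ 1e-6) before each fit β€” trainable changes take effect only after recompiling.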

πŸ“‚

8. Custom Dataset β€” Load Your Own Images

From local folder straight to model β€” as easy as one function call
19_custom_dataset.py β€” Load & Optimize Custom Images
import tensorflow as tf

# ===========================
# Folder Structure (IMPORTANT!):
# data/
#   train/
#     cats/       ← folder name = class label
#       cat001.jpg
#       cat002.jpg
#     dogs/
#       dog001.jpg
#     birds/
#       bird001.jpg
#   test/
#     cats/
#     dogs/
#     birds/
# ===========================

# Load with automatic train/val split
train_ds = tf.keras.utils.image_dataset_from_directory(
    "data/train",
    image_size=(224, 224),     # auto-resize ALL images to this
    batch_size=32,
    label_mode='int',           # integer labels: 0=birds, 1=cats, 2=dogs
    # label_mode='categorical'  # one-hot: [1,0,0], [0,1,0], [0,0,1]
    shuffle=True,
    seed=42,
    validation_split=0.2,       # 80% train, 20% val
    subset='training'
)

val_ds = tf.keras.utils.image_dataset_from_directory(
    "data/train",
    image_size=(224, 224),
    batch_size=32,
    label_mode='int',
    seed=42,
    validation_split=0.2,
    subset='validation'
)

print(f"Classes: {train_ds.class_names}")  # ['birds', 'cats', 'dogs']
print(f"Num classes: {len(train_ds.class_names)}")

# ===========================
# Optimize pipeline (CRITICAL for performance!)
# See Page 4 for deep dive on tf.data
# ===========================
AUTOTUNE = tf.data.AUTOTUNE

train_ds = train_ds.cache().prefetch(buffer_size=AUTOTUNE)
val_ds = val_ds.cache().prefetch(buffer_size=AUTOTUNE)
# cache() β†’ store in RAM after first epoch
# prefetch() β†’ prepare next batch while GPU trains current

# Now use exactly like built-in dataset:
# model.fit(train_ds, validation_data=val_ds, epochs=20)

# ===========================
# Handle imbalanced classes
# ===========================
import numpy as np
# Count per class:
# birds: 500, cats: 5000, dogs: 5000 ← imbalanced!
class_weights = {0: 10.0, 1: 1.0, 2: 1.0}  # upweight the rare class (0 = birds)
# model.fit(train_ds, class_weight=class_weights)
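Instead of hardcoding class_weight, a common recipe derives it from the label counts as total / (num_classes Γ— count). Using the hypothetical per-class counts from the comments above:

```python
counts = {0: 500, 1: 5000, 2: 5000}   # birds, cats, dogs (hypothetical counts)
total = sum(counts.values())
num_classes = len(counts)

# weight_i = total / (num_classes * count_i): rare classes get large weights
class_weights = {c: total / (num_classes * n) for c, n in counts.items()}
print(class_weights)  # {0: 7.0, 1: 0.7, 2: 0.7}

# model.fit(train_ds, class_weight=class_weights, ...)
```

This keeps the average sample weight near 1.0, so the effective learning rate is unchanged while rare-class errors count proportionally more.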
πŸ”

9. CNN Visualization β€” Feature Maps & Grad-CAM

What does the CNN actually "see" at each layer?
20_visualize_cnn.py β€” Feature Maps & Grad-CAM
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

# ===========================
# 1. Extract Feature Maps
# ===========================
# Build a model that outputs intermediate layer activations
conv_outputs = [l.output for l in model.layers if 'conv2d' in l.name]
feature_model = tf.keras.Model(inputs=model.input, outputs=conv_outputs)

# Pass one image through
img = X_test[0:1]  # shape: (1, 32, 32, 3)
all_features = feature_model.predict(img)

# Visualize feature maps from each conv layer
for layer_idx, features in enumerate(all_features[:3]):
    fig, axes = plt.subplots(4, 8, figsize=(16, 8))
    for i, ax in enumerate(axes.flat):
        if i < features.shape[-1]:
            ax.imshow(features[0, :, :, i], cmap='viridis')
        ax.axis('off')
    plt.suptitle(f'Feature Maps — Conv Layer {layer_idx+1}')
    plt.tight_layout()
    plt.show()

# ===========================
# 2. Visualize Learned Filters
# ===========================
filters, biases = model.layers[0].get_weights()
print(f"Filter shape: {filters.shape}")  # (3, 3, 3, 32)

# Normalize filters for visualization
f_min, f_max = filters.min(), filters.max()
filters_norm = (filters - f_min) / (f_max - f_min)

fig, axes = plt.subplots(4, 8, figsize=(12, 6))
for i, ax in enumerate(axes.flat):
    ax.imshow(filters_norm[:, :, :, i])  # 3×3×3 RGB filter
    ax.axis('off')
plt.suptitle('Learned 3×3 Filters (Conv Layer 1)')
plt.show()

# ===========================
# 3. Grad-CAM — where is the model looking?
# ===========================
def grad_cam(model, image, class_idx, last_conv_layer_name):
    """Generate Grad-CAM heatmap"""
    grad_model = tf.keras.Model(
        model.inputs,
        [model.get_layer(last_conv_layer_name).output, model.output]
    )

    with tf.GradientTape() as tape:
        conv_output, predictions = grad_model(image)
        loss = predictions[:, class_idx]

    grads = tape.gradient(loss, conv_output)
    weights = tf.reduce_mean(grads, axis=(1, 2))  # global avg pool gradients
    cam = tf.reduce_sum(weights[:, tf.newaxis, tf.newaxis, :] * conv_output, axis=-1)
    cam = tf.nn.relu(cam)  # only positive contributions
    cam = cam / (tf.reduce_max(cam) + 1e-8)  # normalize
    return cam.numpy()[0]

# Usage:
# heatmap = grad_cam(model, img, class_idx=3, last_conv_layer_name='conv2d_5')
# plt.imshow(img[0]); plt.imshow(heatmap, alpha=0.4, cmap='jet')
# → Shows WHERE the model focuses to make its prediction!
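One practical detail for the overlay above: `grad_cam()` returns a tiny spatial map (e.g. 4×4 after several pooling stages), so it must be upscaled to the image size first. A numpy-only sketch (`upscale_heatmap` is a hypothetical helper, not part of the tutorial's code):

```python
import numpy as np

def upscale_heatmap(cam, target_size):
    """Nearest-neighbor upscale of an (h, w) Grad-CAM map to (H, W).
    Assumes H and W are integer multiples of h and w, which holds for
    power-of-2 pooling chains (e.g. 4x4 -> 32x32)."""
    h, w = cam.shape
    H, W = target_size
    return np.repeat(np.repeat(cam, H // h, axis=0), W // w, axis=1)

cam = np.array([[0.0, 1.0],
                [0.5, 0.25]])
big = upscale_heatmap(cam, (32, 32))
print(big.shape)  # (32, 32)
# Overlay: plt.imshow(img[0]); plt.imshow(big, alpha=0.4, cmap='jet')
```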
What the CNN "Sees" at Each Depth

  Layer 1-2 (shallow)    Layer 3-4 (middle)    Layer 5+ (deep)
  ───────────────────    ──────────────────    ────────────────
  edges                  shapes                objects
  textures               parts                 categories
  gradients              components            full scenes

  Low-level features  →  Mid-level features  →  High-level features

This hierarchy is why Transfer Learning works! Early layers (edges, textures) are UNIVERSAL — the same for all images. Only the deep layers need to be fine-tuned for your specific task.
πŸ†

10. Proyek: Production Image Classifier 95%+

10. Project: Production Image Classifier 95%+

Template end-to-end: data → augmentation → transfer learning → evaluate → save
End-to-end template: data → augmentation → transfer learning → evaluate → save
21_production_classifier.py — Complete Template 🔥 (python)
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# ═══════════════════════════════════════
# 🏆 PRODUCTION IMAGE CLASSIFIER TEMPLATE
# Point it at your folder → done! (NUM_CLASSES is auto-detected)
# ═══════════════════════════════════════

# 1. CONFIG
IMG_SIZE = (224, 224)
BATCH_SIZE = 32

# 2. LOAD DATA
train_ds = keras.utils.image_dataset_from_directory(
    "data/train", image_size=IMG_SIZE, batch_size=BATCH_SIZE,
    validation_split=0.2, subset="training", seed=42)
val_ds = keras.utils.image_dataset_from_directory(
    "data/train", image_size=IMG_SIZE, batch_size=BATCH_SIZE,
    validation_split=0.2, subset="validation", seed=42)

NUM_CLASSES = len(train_ds.class_names)
print(f"🏷 Classes: {train_ds.class_names} ({NUM_CLASSES})")

# 3. OPTIMIZE PIPELINE
AUTOTUNE = tf.data.AUTOTUNE
train_ds = train_ds.cache().prefetch(AUTOTUNE)
val_ds = val_ds.cache().prefetch(AUTOTUNE)

# 4. AUGMENTATION
augmentation = keras.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.15),
    layers.RandomZoom(0.1),
    layers.RandomContrast(0.1),
])

# 5. MODEL: EfficientNetB0 backbone
base = keras.applications.EfficientNetB0(
    input_shape=(*IMG_SIZE, 3), include_top=False, weights="imagenet")
base.trainable = False

model = keras.Sequential([
    augmentation,
    base,   # NOTE: no Rescaling layer here; Keras EfficientNet models include
            # their own input preprocessing and expect raw pixels in [0, 255]
    layers.GlobalAveragePooling2D(),
    layers.Dropout(0.3),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.2),
    layers.Dense(NUM_CLASSES, activation='softmax')
])

# 6. PHASE 1: Train head
model.compile(
    optimizer=keras.optimizers.Adam(1e-3),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"])

model.fit(train_ds, validation_data=val_ds, epochs=15,
          callbacks=[keras.callbacks.EarlyStopping(patience=5,
                     restore_best_weights=True)])

# 7. PHASE 2: Fine-tune the top of the backbone
base.trainable = True
for layer in base.layers[:-20]:
    layer.trainable = False
# (A common extra precaution: also keep BatchNormalization layers frozen
# so their statistics stay stable during fine-tuning)

model.compile(
    optimizer=keras.optimizers.Adam(1e-5),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"])

model.fit(train_ds, validation_data=val_ds, epochs=10,
          callbacks=[
              keras.callbacks.EarlyStopping(patience=3, restore_best_weights=True),
              keras.callbacks.ReduceLROnPlateau(factor=0.5, patience=2)])

# 8. SAVE
model.save("production_classifier.keras")
print("πŸ† Production classifier ready! Typically 95%+ accuracy.")

🎉 Template Universal!
Script di atas adalah template yang bisa Anda pakai untuk hampir semua tugas image classification. Prosesnya:
1. Siapkan folder: data/train/class_a/, data/train/class_b/, dst.
2. Ganti path folder; nama kelas akan otomatis terdeteksi.
3. Jalankan script → 95%+ akurasi dengan 500-1000 gambar per kelas!
Transfer learning + augmentation + 2-phase training = formula winning yang konsisten.

🎉 Universal Template!
The script above is a template you can use for almost any image classification task. The process:
1. Prepare folders: data/train/class_a/, data/train/class_b/, etc.
2. Point the script at your folder path; class names are auto-detected.
3. Run the script → 95%+ accuracy with 500-1000 images per class!
Transfer learning + augmentation + 2-phase training = a consistently winning formula.
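Before running the template, it is worth sanity-checking the folder tree, since empty or badly skewed class folders are a common cause of disappointing accuracy. A small sketch (`count_images` is a hypothetical helper; the extensions match the formats `image_dataset_from_directory` accepts: jpeg, png, bmp, gif):

```python
import os

IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp", ".gif"}

def count_images(root):
    """Return {class_name: image_count} for a data/train-style folder tree."""
    counts = {}
    for cls in sorted(os.listdir(root)):
        cls_dir = os.path.join(root, cls)
        if os.path.isdir(cls_dir):
            counts[cls] = sum(1 for f in os.listdir(cls_dir)
                              if os.path.splitext(f)[1].lower() in IMAGE_EXTS)
    return counts

# Usage with the layout from step 1:
# for name, n in count_images("data/train").items():
#     print(f"{name}: {n} images")
```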

📝

11. Ringkasan Page 3

11. Page 3 Summary

Semua yang sudah kita pelajari
Everything we've learned
Konsep | Apa Itu | Kode Kunci
Conv2D | Convolution layer — extract features | Conv2D(32, (3,3), padding='same')
MaxPooling2D | Downsampling spatial 2× | MaxPooling2D((2,2))
GlobalAvgPool2D | Feature maps → 1 angka/channel | GlobalAveragePooling2D()
BatchNormalization | Normalisasi aktivasi per batch | BatchNormalization()
Data Augmentation | Variasi random saat training | RandomFlip, RandomRotation, RandomZoom
Transfer Learning | Pakai backbone pre-trained | MobileNetV2(weights='imagenet')
Phase 1 | Freeze backbone, train head (LR=1e-3) | base.trainable = False
Phase 2 | Unfreeze top layers (LR=1e-5) | for l in base.layers[:-30]: l.trainable = False
Custom Dataset | Load dari folder lokal | image_dataset_from_directory()
Feature Maps | Visualisasi apa yang CNN lihat | Model(inputs, [conv.output])
Grad-CAM | Di mana model fokus | GradientTape + heatmap
Concept | What It Is | Key Code
Conv2D | Convolution layer — extract features | Conv2D(32, (3,3), padding='same')
MaxPooling2D | 2× spatial downsampling | MaxPooling2D((2,2))
GlobalAvgPool2D | Feature maps → 1 number/channel | GlobalAveragePooling2D()
BatchNormalization | Normalize activations per batch | BatchNormalization()
Data Augmentation | Random training variations | RandomFlip, RandomRotation, RandomZoom
Transfer Learning | Use a pre-trained backbone | MobileNetV2(weights='imagenet')
Phase 1 | Freeze backbone, train head (LR=1e-3) | base.trainable = False
Phase 2 | Unfreeze top layers (LR=1e-5) | for l in base.layers[:-30]: l.trainable = False
Custom Dataset | Load from a local folder | image_dataset_from_directory()
Feature Maps | Visualize what the CNN sees | Model(inputs, [conv.output])
Grad-CAM | Where the model focuses | GradientTape + heatmap
← Page Sebelumnya / ← Previous Page

Page 2 β€” Keras API & Model Building

📘

Coming Next: Page 4 — tf.data Pipeline & Performance

Data loading yang lambat = GPU menganggur 50-90% waktu! Page 4 membahas secara mendalam: tf.data.Dataset API, the golden pattern (shuffle → map → batch → prefetch), cache untuk dataset kecil, TFRecord format untuk dataset besar, parallel data loading dengan num_parallel_calls, mixed precision training (float16 untuk 2× speedup), dan profiling dengan TF Profiler. Optimasi training Anda sampai 10× lebih cepat!

📘

Coming Next: Page 4 — tf.data Pipeline & Performance

Slow data loading = GPU sitting idle 50-90% of the time! Page 4 covers in depth: the tf.data.Dataset API, the golden pattern (shuffle → map → batch → prefetch), cache for small datasets, the TFRecord format for large datasets, parallel data loading with num_parallel_calls, mixed precision training (float16 for a 2× speedup), and profiling with the TF Profiler. Optimize your training to be up to 10× faster!
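As a preview, the golden pattern mentioned above can be sketched on synthetic data (the shapes and preprocessing here are illustrative only; Page 4 covers the details):

```python
import numpy as np
import tensorflow as tf

# Synthetic stand-in for an image dataset: 100 "images" of 8x8x3
images = np.random.rand(100, 8, 8, 3).astype("float32")
labels = np.random.randint(0, 3, size=100)

ds = (tf.data.Dataset.from_tensor_slices((images, labels))
      .shuffle(100)                          # 1. shuffle first (full buffer)
      .map(lambda x, y: (x * 2.0 - 1.0, y),  # 2. then map (per-element preprocessing)
           num_parallel_calls=tf.data.AUTOTUNE)
      .batch(32)                             # 3. then batch
      .prefetch(tf.data.AUTOTUNE))           # 4. finally prefetch

for xb, yb in ds.take(1):
    print(xb.shape)  # (32, 8, 8, 3)
```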