Table of Contents - Page 3
- CNN Review from NN Series - Convolution, pooling, and the transition to TensorFlow
- Conv2D Deep Dive - Parameters, output shape, and why CNNs are efficient
- MaxPooling2D & GlobalAveragePooling2D - Two downsampling strategies
- First CNN: CIFAR-10 - 10-class classifier from scratch, ~88% accuracy
- Data Augmentation - Keras layers vs tf.image, a free +4% boost
- Transfer Learning Phase 1 - Frozen backbone + custom head
- Transfer Learning Phase 2 - Fine-tuning: unfreeze top layers
- Custom Dataset - image_dataset_from_directory + pipeline optimization
- CNN Visualization - Feature maps, filters, and Grad-CAM
- Project: Production Image Classifier 95%+ - End-to-end template
- Summary & Page 4 Preview
1. CNN Review - From Neural Network Series to TensorFlow
In the Neural Network series (Page 3), we implemented convolution (nested for-loops), padding (np.pad), stride, and max pooling from scratch in NumPy: hundreds of lines that were painful to debug. With TensorFlow Keras, all of those operations become one line: layers.Conv2D(32, (3,3), padding='same'). The concepts from the NN series remain fundamental, however. Without understanding convolution, you are just an API user without comprehension.
Here's a quick recap of CNN concepts we covered in the Neural Network series:
| Concept | What It Is | NN Series | TensorFlow |
|---|---|---|---|
| Convolution | 3×3 filter slides over the image | Nested for-loops | Conv2D(32, (3,3)) |
| Padding | Add pixels at the edges | np.pad() | padding='same' |
| Stride | How far the filter moves per step | Manual indexing | strides=(2,2) |
| Max Pooling | Take the max of each 2×2 window | Nested for-loops | MaxPooling2D((2,2)) |
| CNN Backprop | Convolution gradients | 100+ lines by hand | Automatic (GradientTape) |
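The one-line Conv2D call hides the loop we wrote by hand in the NN series. As a sanity check on the shape formula output = (input - kernel + 2*pad) / stride + 1, here is a minimal NumPy sketch of that hand-written convolution (illustrative only: single channel, square inputs, and the helper name `conv2d_naive` is my own):

```python
import numpy as np

def conv2d_naive(image, kernel, stride=1, pad=0):
    """Single-channel 2D convolution with nested loops (NN-series style)."""
    image = np.pad(image, pad)                      # zero padding on all sides
    k = kernel.shape[0]
    out_size = (image.shape[0] - k) // stride + 1   # the shape formula
    out = np.zeros((out_size, out_size))
    for i in range(out_size):
        for j in range(out_size):
            window = image[i*stride:i*stride+k, j*stride:j*stride+k]
            out[i, j] = np.sum(window * kernel)     # weighted sum = one output pixel
    return out

img = np.arange(32 * 32, dtype=float).reshape(32, 32)
kernel = np.ones((3, 3)) / 9.0                      # 3x3 mean filter

print(conv2d_naive(img, kernel, stride=1, pad=1).shape)  # 'same' padding -> (32, 32)
print(conv2d_naive(img, kernel, stride=1, pad=0).shape)  # 'valid' -> (30, 30)
print(conv2d_naive(img, kernel, stride=2, pad=1).shape)  # strided -> (16, 16)
```

The three calls match the three Conv2D output-shape cases worked out in the next section.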
2. Conv2D Deep Dive - Every Parameter Explained
```python
import tensorflow as tf
from tensorflow.keras import layers

# ===========================
# Conv2D - EVERY parameter explained
# ===========================
conv = layers.Conv2D(
    filters=32,              # Number of output filters (= output channels)
                             # More filters = more features
                             # Pattern: 32 -> 64 -> 128 -> 256
    kernel_size=(3, 3),      # Filter size: 3x3 (most common)
                             # Alternatives: (1,1) for channel mixing,
                             # (5,5) for a large receptive field,
                             # (7,7) only in the first layer
    strides=(1, 1),          # Slide 1 pixel per step (default)
                             # strides=(2,2) -> 2x downsample (MaxPool replacement)
    padding='same',          # 'same': output size = input size (adds padding)
                             # 'valid': no padding -> output shrinks
                             # Formula: output = (input - kernel + 2*pad) / stride + 1
    activation='relu',       # Activation function (or None + a separate layer)
    use_bias=True,           # Add a bias per filter (default True)
    kernel_initializer='glorot_uniform',
                             # He init is better for ReLU:
                             # kernel_initializer='he_normal' is recommended
    input_shape=(32, 32, 3)  # H, W, C - ONLY in the first layer!
)

# ===========================
# Output Shape Calculation
# ===========================
# Input: (batch, 32, 32, 3) - a 32x32 RGB image
# Conv2D(32, (3,3), padding='same', strides=(1,1))
#   Output: (batch, 32, 32, 32) - 32 feature maps, same spatial size
# Conv2D(64, (3,3), padding='valid', strides=(1,1))
#   Output: (batch, 30, 30, 64) - shrinks by (kernel-1) = 2 per side
# Conv2D(64, (3,3), padding='same', strides=(2,2))
#   Output: (batch, 16, 16, 64) - halved spatial size (like MaxPool!)

# ===========================
# Parameter Count - WHY CNN IS EFFICIENT
# ===========================
# Dense layer: 32x32x3 -> 128 neurons
#   Params = 3072 * 128 + 128 = 393,344 - HUGE!
# Conv2D: 32 filters, 3x3, 3 input channels
#   Params = (3 * 3 * 3 + 1) * 32 = 896 - TINY!
# That's ~438x fewer parameters!
# Secret: WEIGHT SHARING - the same filter is applied everywhere

model_demo = tf.keras.Sequential([
    layers.Conv2D(32, (3,3), padding='same', input_shape=(32,32,3)),
    layers.Conv2D(64, (3,3), padding='same'),
    layers.Conv2D(128, (3,3), padding='same'),
])
model_demo.summary()
# Layer 1: (3*3*3+1)*32   =    896 params
# Layer 2: (3*3*32+1)*64  = 18,496 params
# Layer 3: (3*3*64+1)*128 = 73,856 params
# Total: 93,248 - the Dense equivalent would be MILLIONS
```
Why Do 3×3 Filters Dominate?
Two stacked 3×3 Conv2D layers have an effective receptive field of 5×5, but with fewer parameters and more non-linearity (2× ReLU vs 1×). Three 3×3 layers give a 7×7 receptive field even more efficiently. This insight comes from the VGGNet paper (2014) and still holds today.
Rule of thumb: default to 3×3. Use 1×1 for channel mixing. Use 5×5 or 7×7 only in the first layer (for large images).
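The arithmetic behind this rule of thumb is easy to check. A quick sketch, assuming C channels both in and out for every layer (the helper name `conv_params` is my own):

```python
def conv_params(k, c_in, c_out):
    """Parameters in one Conv2D layer: (k*k*c_in + 1) * c_out (weights + biases)."""
    return (k * k * c_in + 1) * c_out

C = 64
stacked_3x3 = 2 * conv_params(3, C, C)  # two 3x3 layers, effective receptive field 5x5
single_5x5 = conv_params(5, C, C)       # one 5x5 layer, same receptive field

print(stacked_3x3)  # 73,856
print(single_5x5)   # 102,464
```

Same receptive field, roughly 28% fewer parameters, and one extra ReLU in between.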
3. MaxPooling2D & GlobalAveragePooling2D
```python
import tensorflow as tf
from tensorflow.keras import layers

# ===========================
# 1. MaxPooling2D - classic downsampling
# ===========================
pool = layers.MaxPooling2D(
    pool_size=(2, 2),   # 2x2 window
    strides=(2, 2),     # non-overlapping (default = pool_size)
    padding='valid'     # no padding (default)
)
# Input:  (batch, 32, 32, 64)
# Output: (batch, 16, 16, 64) - spatial size halved, channels unchanged
# Params: 0 (no learnable parameters!)
# Takes the MAX of each 2x2 window - keeps the strongest activation

# Visual example:
# [[1, 3],     MaxPool 2x2
#  [2, 4]]  -> output = 4 (the maximum)

# ===========================
# 2. AveragePooling2D - alternative
# ===========================
avg_pool = layers.AveragePooling2D((2, 2))
# Same as MaxPool but takes the AVERAGE instead of the MAX
# Less commonly used - MaxPool usually works better

# ===========================
# 3. GlobalAveragePooling2D - MODERN replacement for Flatten
# ===========================
gap = layers.GlobalAveragePooling2D()
# Input:  (batch, 8, 8, 128) - 128 feature maps, each 8x8
# Output: (batch, 128) - average each map down to ONE number
# Params: 0

# Compare with Flatten + Dense:
# Flatten: (8,8,128) -> (8192,) -> Dense(256): 8192*256 + 256 = 2,097,408 params!
# GAP:     (8,8,128) -> (128,)  -> Dense(256): 128*256 + 256  =    32,896 params!
# That's ~63x fewer parameters - less overfitting, faster training!

# ===========================
# 4. Modern pattern: strided Conv instead of MaxPool
# ===========================
# Instead of: Conv -> MaxPool
# Modern:     Conv(strides=2) -> learnable downsampling!
strided = layers.Conv2D(64, (3, 3), strides=(2, 2), padding='same')
# Output is halved, but the downsampling is LEARNED (it has params)
# Used in ResNet, EfficientNet, and other modern architectures
```
Flatten vs GlobalAveragePooling2D:
Flatten: fine for small images (28×28 MNIST). For large images (224×224), the classifier's parameter count explodes.
GAP: far more efficient, far fewer parameters, and it acts as a regularizer. Prefer GAP for images larger than 64×64. All modern models (ResNet, EfficientNet, MobileNet) use GAP.
4. First CNN: CIFAR-10 Classifier from Scratch
CIFAR-10 is a standard computer-vision benchmark: 60,000 color images at 32×32 pixels across 10 categories (airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck). It is much harder than MNIST: the images are color, low-resolution, and highly varied.
```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import numpy as np

# ===========================
# 1. LOAD & PREPROCESS
# ===========================
(X_train, y_train), (X_test, y_test) = keras.datasets.cifar10.load_data()
X_train = X_train.astype('float32') / 255.0   # normalize to [0, 1]
X_test = X_test.astype('float32') / 255.0

print(f"Train: {X_train.shape}, Test: {X_test.shape}")
# Train: (50000, 32, 32, 3), Test: (10000, 32, 32, 3)

class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']

# ===========================
# 2. BUILD CNN - 3 Conv Blocks
# Pattern: Conv -> BN -> ReLU -> Conv -> BN -> ReLU -> MaxPool -> Dropout
# ===========================
model = keras.Sequential([
    # -- Block 1: 32 filters --
    layers.Conv2D(32, (3,3), padding='same',
                  kernel_initializer='he_normal', input_shape=(32,32,3)),
    layers.BatchNormalization(),
    layers.Activation('relu'),
    layers.Conv2D(32, (3,3), padding='same', kernel_initializer='he_normal'),
    layers.BatchNormalization(),
    layers.Activation('relu'),
    layers.MaxPooling2D((2,2)),   # 32x32 -> 16x16
    layers.Dropout(0.25),

    # -- Block 2: 64 filters --
    layers.Conv2D(64, (3,3), padding='same', kernel_initializer='he_normal'),
    layers.BatchNormalization(),
    layers.Activation('relu'),
    layers.Conv2D(64, (3,3), padding='same', kernel_initializer='he_normal'),
    layers.BatchNormalization(),
    layers.Activation('relu'),
    layers.MaxPooling2D((2,2)),   # 16x16 -> 8x8
    layers.Dropout(0.25),

    # -- Block 3: 128 filters --
    layers.Conv2D(128, (3,3), padding='same', kernel_initializer='he_normal'),
    layers.BatchNormalization(),
    layers.Activation('relu'),
    layers.Conv2D(128, (3,3), padding='same', kernel_initializer='he_normal'),
    layers.BatchNormalization(),
    layers.Activation('relu'),
    layers.MaxPooling2D((2,2)),   # 8x8 -> 4x4
    layers.Dropout(0.25),

    # -- Classifier Head --
    layers.GlobalAveragePooling2D(),   # (4,4,128) -> (128,)
    layers.Dense(256, activation='relu', kernel_initializer='he_normal'),
    layers.BatchNormalization(),
    layers.Dropout(0.5),
    layers.Dense(10, activation='softmax')
])

model.summary()
# Total params: ~430k (compare: VGG16 = 138M, ResNet50 = 25.6M)

# ===========================
# 3. COMPILE
# ===========================
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-3),
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

# ===========================
# 4. TRAIN with callbacks
# ===========================
callbacks = [
    keras.callbacks.EarlyStopping(
        monitor='val_loss', patience=10, restore_best_weights=True),
    keras.callbacks.ReduceLROnPlateau(
        monitor='val_loss', factor=0.5, patience=5, min_lr=1e-6),
]

history = model.fit(
    X_train, y_train,
    epochs=100,             # EarlyStopping will stop earlier
    batch_size=64,
    validation_split=0.1,   # 10% for validation
    callbacks=callbacks,
    verbose=1
)

# ===========================
# 5. EVALUATE
# ===========================
test_loss, test_acc = model.evaluate(X_test, y_test, verbose=0)
print(f"\nTest Accuracy: {test_acc:.1%}")
# Test Accuracy: 88.3%
# With augmentation (Section 5): ~92%
# With transfer learning (Section 6): ~95%+

# ===========================
# 6. Per-class accuracy
# ===========================
y_pred = np.argmax(model.predict(X_test), axis=1)
for i, name in enumerate(class_names):
    mask = y_test.flatten() == i
    acc = (y_pred[mask] == i).mean()
    print(f"  {name:12s}: {acc:.1%}")
# airplane    : 90.2%
# cat         : 76.1%  <- hardest! (looks like dog/deer)
# truck       : 93.4%  <- easiest
```
Why the Conv → BN → ReLU Pattern?
Conv2D: extracts features (a weighted sum).
BatchNorm: normalizes activations, so training is more stable and tolerates a higher learning rate.
ReLU: non-linearity. Without it, the entire network is just one linear transformation.
Why BN before ReLU? Still debated; some papers report BN after ReLU works better. In practice the difference is small, and consistency matters more.
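One way to keep this pattern tidy and consistent is to wrap it in a small factory function (a sketch; the helper name `conv_block` is mine, not from the text):

```python
import tensorflow as tf
from tensorflow.keras import layers

def conv_block(filters):
    """A reusable Conv -> BatchNorm -> ReLU block (hypothetical helper)."""
    return tf.keras.Sequential([
        layers.Conv2D(filters, (3, 3), padding='same',
                      kernel_initializer='he_normal'),
        layers.BatchNormalization(),
        layers.Activation('relu'),
    ])

model = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 3)),
    conv_block(32),
    conv_block(32),
    layers.MaxPooling2D((2, 2)),   # 32x32 -> 16x16
])
print(model.output_shape)  # (None, 16, 16, 32)
```

Keeping the order inside one helper guarantees every block uses the same convention.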
5. Data Augmentation - A Free +4% Boost
Data augmentation applies random transformations to training images: flips, rotations, zooms, crops, brightness changes. This forces the model to learn general patterns (a cat from any angle is still a cat) instead of memorizing specific images. The effect: strong regularization that improves generalization.
```python
import tensorflow as tf
from tensorflow.keras import layers

# ===========================
# METHOD 1: Keras preprocessing layers (RECOMMENDED)
# Augmentation INSIDE the model - portable & clean!
# ===========================
data_augmentation = tf.keras.Sequential([
    layers.RandomFlip("horizontal"),        # left-right flip (50% chance)
    # layers.RandomFlip("horizontal_and_vertical"),  # also top-bottom
    layers.RandomRotation(0.1),             # rotate +/-10% (+/-36 degrees)
    layers.RandomZoom(0.1),                 # zoom +/-10%
    layers.RandomTranslation(0.1, 0.1),     # shift +/-10% in H and W
    layers.RandomContrast(0.1),             # contrast +/-10%
    layers.RandomBrightness(0.1),           # brightness +/-10%
    # layers.RandomCrop(28, 28),            # random crop (if input is larger)
], name="augmentation")

# Integrate into the model (BEST approach)
model_with_aug = tf.keras.Sequential([
    data_augmentation,   # <- augmentation first!
    layers.Conv2D(32, 3, padding='same', activation='relu',
                  input_shape=(32,32,3)),
    layers.BatchNormalization(),
    layers.MaxPooling2D(),
    # ... rest of the CNN
])

# IMPORTANT: augmentation ONLY runs when training=True!
# During model.predict() and model.evaluate() it is skipped automatically.

# ===========================
# METHOD 2: tf.image (in a tf.data pipeline)
# More control, custom augmentations
# ===========================
def augment_image(image, label):
    # Random flip
    image = tf.image.random_flip_left_right(image)
    # Random color jitter
    image = tf.image.random_brightness(image, max_delta=0.1)
    image = tf.image.random_contrast(image, lower=0.9, upper=1.1)
    image = tf.image.random_saturation(image, lower=0.9, upper=1.1)
    image = tf.image.random_hue(image, max_delta=0.05)
    # Random crop (resize slightly larger, then crop back)
    image = tf.image.resize(image, [36, 36])
    image = tf.image.random_crop(image, size=[32, 32, 3])
    # Clip back to the valid range
    image = tf.clip_by_value(image, 0.0, 1.0)
    return image, label

# Apply in a tf.data pipeline:
# train_ds = (dataset
#             .map(augment_image, num_parallel_calls=AUTOTUNE)
#             .batch(64).prefetch(AUTOTUNE))

# ===========================
# METHOD 3: CutMix & MixUp (advanced)
# State-of-the-art augmentation techniques
# ===========================
def cutmix(images, labels, alpha=1.0):
    """CutMix: cut a rectangle from one image, paste it into another."""
    batch_size = tf.shape(images)[0]
    indices = tf.random.shuffle(tf.range(batch_size))
    shuffled_images = tf.gather(images, indices)
    shuffled_labels = tf.gather(labels, indices)
    lam = tf.random.uniform([], 0, alpha)
    # ... (calculate a random box, blend images and labels)
    return mixed_images, mixed_labels

# CutMix + MixUp can add another 1-2% accuracy boost!
```
Augmentation Impact on CIFAR-10:
Without augmentation: ~88% accuracy.
With basic augmentation (flip, rotate, zoom): ~92% accuracy.
That is a free +4% boost with no change to the model architecture. Data augmentation is the easiest and most effective regularization technique in computer vision. Always use it.
6. Transfer Learning Phase 1 - Frozen Backbone
Transfer learning = taking a model already trained on a large dataset (ImageNet: 14 million images, 1000 classes) and adapting it to your task. This is the most powerful technique in computer vision: even with very little data (100 images per class!), you can achieve very high accuracy.
Phase 1: Freeze the backbone (don't update its weights), only train the new classifier head. This is fast and effective because ImageNet features are already great for most vision tasks.
```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# ===========================
# 1. Load the pre-trained backbone (WITHOUT the classifier head)
# ===========================
base_model = keras.applications.MobileNetV2(
    input_shape=(96, 96, 3),   # MobileNet needs >= 32x32
    include_top=False,         # REMOVE the ImageNet classifier (1000 classes)
    weights='imagenet'         # load the pre-trained weights!
)

# FREEZE all layers - don't update the weights!
base_model.trainable = False

print(f"Backbone: {base_model.name}")
print(f"Params: {base_model.count_params():,} (ALL frozen)")
print(f"Output shape: {base_model.output_shape}")
# Backbone: mobilenetv2_1.00_96
# Params: 2,257,984 (ALL frozen)
# Output shape: (None, 3, 3, 1280) - feature maps

# ===========================
# 2. Add preprocessing + a custom classifier
# ===========================
model = keras.Sequential([
    # Preprocessing
    layers.Resizing(96, 96),                 # resize 32 -> 96
    layers.Rescaling(1./127.5, offset=-1),   # normalize to [-1, 1]
    # Each model family has its own preprocessing!
    #   MobileNet: [-1, 1]
    #   EfficientNet: [0, 255] (built-in preprocessing)
    #   ResNet: caffe-style (BGR, mean subtraction)

    # Frozen feature extractor
    base_model,

    # New classifier head (THIS is what we train)
    layers.GlobalAveragePooling2D(),   # (3,3,1280) -> (1280,)
    layers.Dropout(0.3),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.2),
    layers.Dense(10, activation='softmax')   # 10 CIFAR classes
])

# ===========================
# 3. Compile with a HIGHER learning rate
# (backbone frozen -> only the head learns -> LR can be aggressive)
# ===========================
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-3),
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

# ===========================
# 4. Train Phase 1
# ===========================
history_p1 = model.fit(
    X_train, y_train,
    epochs=10,
    batch_size=64,
    validation_split=0.1
)
# ~90% accuracy in just 10 epochs! The backbone features are powerful.

print(f"Phase 1 Val Accuracy: {history_p1.history['val_accuracy'][-1]:.1%}")
```
| Pre-Trained Model | Params | Top-1 ImageNet | File Size | Best For |
|---|---|---|---|---|
| MobileNetV2 | 3.4M | 71.3% | 14 MB | Mobile, edge, prototyping |
| MobileNetV3Large | 5.4M | 75.6% | 22 MB | Better MobileNet |
| EfficientNetB0 | 5.3M | 77.1% | 29 MB | Best accuracy/size ratio |
| EfficientNetB3 | 12.2M | 81.6% | 48 MB | Higher accuracy, still fast |
| ResNet50 | 25.6M | 76.0% | 98 MB | Classic, well-studied |
| ResNet152 | 60.2M | 78.3% | 232 MB | Higher-accuracy ResNet |
| EfficientNetB7 | 66.3M | 84.3% | 256 MB | Maximum accuracy (large) |
| ConvNeXtTiny | 28.6M | 82.1% | 110 MB | Modern CNN (2022+) |
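Each keras.applications family also ships its own preprocess_input, and mixing them up is a common silent bug. A quick sketch of the differences (the input ranges in the comments reflect my understanding of current tf.keras behavior, so treat them as assumptions and check the docs for your version):

```python
import numpy as np
from tensorflow.keras.applications import mobilenet_v2, resnet50, efficientnet

x = np.array([[[[0.0, 127.5, 255.0]]]])  # one pixel, RGB values in [0, 255]

print(mobilenet_v2.preprocess_input(x.copy()))  # scales to [-1, 1]
print(resnet50.preprocess_input(x.copy()))      # caffe-style: BGR + mean subtraction
print(efficientnet.preprocess_input(x.copy()))  # pass-through: the model rescales internally
```

When in doubt, use the preprocess_input that matches your backbone rather than a hand-written Rescaling layer.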
7. Transfer Learning Phase 2 - Fine-Tuning
```python
# ===========================
# Phase 2: Unfreeze the top layers of the backbone
# ===========================

# Unfreeze the backbone
base_model.trainable = True

# But freeze everything EXCEPT the last 30 layers
print(f"Total layers: {len(base_model.layers)}")   # ~155 layers
for layer in base_model.layers[:-30]:
    layer.trainable = False

# Count trainable vs frozen layers
trainable = sum(1 for l in base_model.layers if l.trainable)
frozen = sum(1 for l in base_model.layers if not l.trainable)
print(f"Trainable: {trainable}, Frozen: {frozen}")

# CRITICAL: recompile with a MUCH LOWER learning rate!
# If the LR is too high, it destroys the pre-trained weights.
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-5),   # 100x lower!
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

# Train Phase 2
history_p2 = model.fit(
    X_train, y_train,
    epochs=20,
    batch_size=64,
    validation_split=0.1,
    callbacks=[
        keras.callbacks.EarlyStopping(patience=5, restore_best_weights=True),
        keras.callbacks.ReduceLROnPlateau(factor=0.5, patience=3, min_lr=1e-7)
    ]
)
# ~95%+ accuracy! Fine-tuning adapts the backbone features to your domain.

# ===========================
# Alternative: gradual unfreezing (even better for small datasets)
# ===========================
# Step 1: Train head (Phase 1)     - 10 epochs, LR=1e-3
# Step 2: Unfreeze last 10 layers  -  5 epochs, LR=1e-5
# Step 3: Unfreeze last 30 layers  -  5 epochs, LR=5e-6
# Step 4: Unfreeze all             -  5 epochs, LR=1e-6
# This gives the model time to adapt gradually, with less risk of forgetting.
```
Why 2-Phase Training?
Phase 1 (frozen, LR=1e-3): the new head starts from random weights and needs a high LR to learn quickly. The backbone is already good, so leave it alone for now.
Phase 2 (fine-tune, LR=1e-5): the head has converged; now adapt the backbone to your specific domain. Keep the LR low so you do not destroy features learned from 14 million images.
If you fine-tune without Phase 1: a random head plus backbone updates equals chaos. Large gradients from the random head destroy the backbone weights. Always train the head first!
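It is worth verifying that the freeze actually took effect before training. A small sketch with a toy stand-in backbone (three Dense layers instead of a real CNN, purely for illustration):

```python
import tensorflow as tf
from tensorflow.keras import layers

d1, d2, d3 = layers.Dense(16), layers.Dense(16), layers.Dense(16)
backbone = tf.keras.Sequential([tf.keras.Input(shape=(8,)), d1, d2, d3])

# Freeze all but the last layer (the Phase 2 pattern in miniature)
for layer in (d1, d2):
    layer.trainable = False

n_trainable = len(backbone.trainable_weights)   # kernel + bias of d3
n_frozen = len(backbone.non_trainable_weights)  # kernel + bias of d1 and d2
print(n_trainable, n_frozen)  # 2 4
```

Remember that trainability is captured at compile time: always recompile after changing any trainable flag, as the fine-tuning code above does.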
8. Custom Dataset - Load Your Own Images
```python
import tensorflow as tf

# ===========================
# Folder structure (IMPORTANT!):
# data/
#   train/
#     cats/        <- folder name = class label
#       cat001.jpg
#       cat002.jpg
#     dogs/
#       dog001.jpg
#     birds/
#       bird001.jpg
#   test/
#     cats/
#     dogs/
#     birds/
# ===========================

# Load with an automatic train/val split
train_ds = tf.keras.utils.image_dataset_from_directory(
    "data/train",
    image_size=(224, 224),   # auto-resize ALL images to this size
    batch_size=32,
    label_mode='int',        # integer labels: 0=birds, 1=cats, 2=dogs
    # label_mode='categorical'  # one-hot: [1,0,0], [0,1,0], [0,0,1]
    shuffle=True,
    seed=42,
    validation_split=0.2,    # 80% train, 20% val
    subset='training'
)

val_ds = tf.keras.utils.image_dataset_from_directory(
    "data/train",
    image_size=(224, 224),
    batch_size=32,
    label_mode='int',
    seed=42,                 # SAME seed so the two subsets don't overlap
    validation_split=0.2,
    subset='validation'
)

print(f"Classes: {train_ds.class_names}")   # ['birds', 'cats', 'dogs']
print(f"Num classes: {len(train_ds.class_names)}")

# ===========================
# Optimize the pipeline (CRITICAL for performance!)
# See Page 4 for a deep dive on tf.data
# ===========================
AUTOTUNE = tf.data.AUTOTUNE
train_ds = train_ds.cache().prefetch(buffer_size=AUTOTUNE)
val_ds = val_ds.cache().prefetch(buffer_size=AUTOTUNE)
# cache()    - store in RAM after the first epoch
# prefetch() - prepare the next batch while the GPU trains on the current one

# Now use it exactly like a built-in dataset:
# model.fit(train_ds, validation_data=val_ds, epochs=20)

# ===========================
# Handle imbalanced classes
# ===========================
# Counts per class:
# cats: 5000, dogs: 5000, birds: 500 -> imbalanced!
class_weights = {0: 10.0, 1: 1.0, 2: 1.0}   # upweight the rare class (0 = birds)
# model.fit(train_ds, class_weight=class_weights)
```
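Rather than hard-coding class weights, you can derive them from the class counts. A sketch using the common "balanced" weighting formula total / (n_classes * count) (the counts below are the hypothetical birds/cats/dogs numbers from the comments above, with labels in alphabetical order):

```python
def balanced_class_weights(counts):
    """class_weight dict: total / (n_classes * count) per class."""
    total = sum(counts.values())
    n = len(counts)
    return {cls: total / (n * c) for cls, c in counts.items()}

counts = {0: 500, 1: 5000, 2: 5000}  # 0=birds (rare), 1=cats, 2=dogs
print(balanced_class_weights(counts))  # {0: 7.0, 1: 0.7, 2: 0.7}
# model.fit(train_ds, class_weight=balanced_class_weights(counts))
```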
9. CNN Visualization - Feature Maps & Grad-CAM
```python
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

# ===========================
# 1. Extract Feature Maps
# ===========================
# Build a model that outputs the intermediate layer activations
conv_outputs = [l.output for l in model.layers if 'conv2d' in l.name]
feature_model = tf.keras.Model(inputs=model.input, outputs=conv_outputs)

# Pass one image through
img = X_test[0:1]   # shape: (1, 32, 32, 3)
all_features = feature_model.predict(img)

# Visualize the feature maps from each conv layer
for layer_idx, features in enumerate(all_features[:3]):
    fig, axes = plt.subplots(4, 8, figsize=(16, 8))
    for i, ax in enumerate(axes.flat):
        if i < features.shape[-1]:
            ax.imshow(features[0, :, :, i], cmap='viridis')
        ax.axis('off')
    plt.suptitle(f'Feature Maps - Conv Layer {layer_idx+1}')
    plt.tight_layout()
    plt.show()

# ===========================
# 2. Visualize Learned Filters
# ===========================
filters, biases = model.layers[0].get_weights()
print(f"Filter shape: {filters.shape}")   # (3, 3, 3, 32)

# Normalize the filters for visualization
f_min, f_max = filters.min(), filters.max()
filters_norm = (filters - f_min) / (f_max - f_min)

fig, axes = plt.subplots(4, 8, figsize=(12, 6))
for i, ax in enumerate(axes.flat):
    ax.imshow(filters_norm[:, :, :, i])   # one 3x3x3 RGB filter
    ax.axis('off')
plt.suptitle('Learned 3x3 Filters (Conv Layer 1)')
plt.show()

# ===========================
# 3. Grad-CAM - where is the model looking?
# ===========================
def grad_cam(model, image, class_idx, last_conv_layer_name):
    """Generate a Grad-CAM heatmap."""
    grad_model = tf.keras.Model(
        model.inputs,
        [model.get_layer(last_conv_layer_name).output, model.output]
    )
    with tf.GradientTape() as tape:
        conv_output, predictions = grad_model(image)
        loss = predictions[:, class_idx]
    grads = tape.gradient(loss, conv_output)
    weights = tf.reduce_mean(grads, axis=(1, 2))   # global-average-pool the gradients
    cam = tf.reduce_sum(
        weights[:, tf.newaxis, tf.newaxis, :] * conv_output, axis=-1)
    cam = tf.nn.relu(cam)                          # keep only positive contributions
    cam = cam / (tf.reduce_max(cam) + 1e-8)        # normalize to [0, 1]
    return cam.numpy()[0]

# Usage:
# heatmap = grad_cam(model, img, class_idx=3, last_conv_layer_name='conv2d_5')
# plt.imshow(img[0]); plt.imshow(heatmap, alpha=0.4, cmap='jet')
# -> Shows WHERE the model focuses to make its prediction!
```
10. Project: Production Image Classifier 95%+
```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# =======================================
# PRODUCTION IMAGE CLASSIFIER TEMPLATE
# Point it at your folder - done!
# =======================================

# 1. CONFIG
IMG_SIZE = (224, 224)
BATCH_SIZE = 32

# 2. LOAD DATA
train_ds = keras.utils.image_dataset_from_directory(
    "data/train", image_size=IMG_SIZE, batch_size=BATCH_SIZE,
    validation_split=0.2, subset="training", seed=42)
val_ds = keras.utils.image_dataset_from_directory(
    "data/train", image_size=IMG_SIZE, batch_size=BATCH_SIZE,
    validation_split=0.2, subset="validation", seed=42)

NUM_CLASSES = len(train_ds.class_names)
print(f"Classes: {train_ds.class_names} ({NUM_CLASSES})")

# 3. OPTIMIZE PIPELINE
AUTOTUNE = tf.data.AUTOTUNE
train_ds = train_ds.cache().prefetch(AUTOTUNE)
val_ds = val_ds.cache().prefetch(AUTOTUNE)

# 4. AUGMENTATION
augmentation = keras.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.15),
    layers.RandomZoom(0.1),
    layers.RandomContrast(0.1),
])

# 5. MODEL: EfficientNetB0 backbone
base = keras.applications.EfficientNetB0(
    input_shape=(*IMG_SIZE, 3), include_top=False, weights="imagenet")
base.trainable = False

model = keras.Sequential([
    augmentation,
    # NOTE: EfficientNet rescales internally - feed it raw [0, 255] pixels.
    # Do NOT add layers.Rescaling(1./255) here; it would break the
    # backbone's built-in preprocessing (see Section 6).
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dropout(0.3),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.2),
    layers.Dense(NUM_CLASSES, activation='softmax')
])

# 6. PHASE 1: Train the head
model.compile(
    optimizer=keras.optimizers.Adam(1e-3),
    loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(train_ds, validation_data=val_ds, epochs=15,
          callbacks=[keras.callbacks.EarlyStopping(
              patience=5, restore_best_weights=True)])

# 7. PHASE 2: Fine-tune
base.trainable = True
for layer in base.layers[:-20]:
    layer.trainable = False
model.compile(
    optimizer=keras.optimizers.Adam(1e-5),
    loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(train_ds, validation_data=val_ds, epochs=10,
          callbacks=[
              keras.callbacks.EarlyStopping(patience=3, restore_best_weights=True),
              keras.callbacks.ReduceLROnPlateau(factor=0.5, patience=2)])

# 8. SAVE
model.save("production_classifier.keras")
print("Production classifier ready! Typically 95%+ accuracy.")
```
A Universal Template
The script above works as a template for almost any image-classification task. The process:
1. Prepare folders: data/train/class_a/, data/train/class_b/, etc.
2. Point the script at your folder; class names are detected automatically.
3. Run the script - typically 95%+ accuracy with 500-1000 images per class.
Transfer learning + augmentation + 2-phase training is a consistently winning formula.
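Once the template has saved its model, serving a single prediction is a few lines. A hedged sketch (the helper `predict_image` is mine; it assumes a model that takes raw pixels at IMG_SIZE, as the template above does):

```python
import numpy as np
import tensorflow as tf

def predict_image(model, image_array, class_names):
    """Run one image (H, W, 3) through the model; return (label, confidence)."""
    batch = np.expand_dims(image_array, axis=0)   # add the batch dimension
    probs = model.predict(batch, verbose=0)[0]    # softmax probabilities
    idx = int(np.argmax(probs))
    return class_names[idx], float(probs[idx])

# Usage with the saved template model (paths and classes are examples):
# model = tf.keras.models.load_model("production_classifier.keras")
# img = tf.keras.utils.load_img("some_image.jpg", target_size=(224, 224))
# label, conf = predict_image(model, np.array(img), ["birds", "cats", "dogs"])
# print(f"{label} ({conf:.1%})")
```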
11. Page 3 Summary
| Concept | What It Is | Key Code |
|---|---|---|
| Conv2D | Convolution layer: extracts features | Conv2D(32, (3,3), padding='same') |
| MaxPooling2D | 2× spatial downsampling | MaxPooling2D((2,2)) |
| GlobalAvgPool2D | Feature maps → 1 number per channel | GlobalAveragePooling2D() |
| BatchNormalization | Normalize activations per batch | BatchNormalization() |
| Data Augmentation | Random training-time variations | RandomFlip, RandomRotation, RandomZoom |
| Transfer Learning | Use a pre-trained backbone | MobileNetV2(weights='imagenet') |
| Phase 1 | Freeze backbone, train head (LR=1e-3) | base.trainable = False |
| Phase 2 | Unfreeze top layers (LR=1e-5) | for l in base.layers[:-30]: l.trainable = False |
| Custom Dataset | Load from a local folder | image_dataset_from_directory() |
| Feature Maps | Visualize what the CNN sees | Model(inputs, [conv.output]) |
| Grad-CAM | Where the model focuses | GradientTape + heatmap |
Previous: Page 2 - Keras API & Model Building
Coming Next: Page 4 - tf.data Pipeline & Performance
Slow data loading means your GPU sits idle 50-90% of the time! Page 4 covers in depth: the tf.data.Dataset API, the golden pattern (shuffle → map → batch → prefetch), cache for small datasets, the TFRecord format for large datasets, parallel data loading with num_parallel_calls, mixed-precision training (float16 for a 2× speedup), and profiling with the TF Profiler. Make your training up to 10× faster!
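As a teaser, the golden pattern looks like this in code. A sketch on an in-memory toy dataset (the preprocessing lambda is just a placeholder; Page 4 covers the real thing):

```python
import numpy as np
import tensorflow as tf

AUTOTUNE = tf.data.AUTOTUNE
images = np.random.rand(256, 32, 32, 3).astype('float32')
labels = np.random.randint(0, 10, size=(256,))

ds = (tf.data.Dataset.from_tensor_slices((images, labels))
      .shuffle(buffer_size=256)                  # 1. shuffle raw examples
      .map(lambda x, y: (x * 2.0 - 1.0, y),      # 2. map: preprocess in parallel
           num_parallel_calls=AUTOTUNE)
      .batch(64)                                 # 3. batch after mapping
      .prefetch(AUTOTUNE))                       # 4. overlap prep with training

for x, y in ds.take(1):
    print(x.shape, y.shape)  # (64, 32, 32, 3) (64,)
```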