šŸ“ Artikel ini ditulis dalam Bahasa Indonesia & English
šŸ“ This article is available in English & Bahasa Indonesia

šŸ”§ Learn TensorFlow — Page 7

Custom Training &
Advanced Keras

Going beyond model.fit(). Page 7 covers in depth: when and why you need custom training loops, complete implementation with GradientTape and @tf.function, custom loss functions (Focal Loss, Contrastive Loss, Triplet Loss), custom metrics (F1 Score, Matthews Correlation), Model subclassing for research architectures, multi-GPU training with tf.distribute.MirroredStrategy and TPUStrategy, gradient accumulation for large batch sizes on small GPUs, and gradient clipping for stability.

šŸ“… March 2026 Ā· ā± 30 min read
šŸ· GradientTapeCustom LossCustom MetricsSubclassingMulti-GPUtf.distributeTPU
šŸ“š Learn TensorFlow Series:

šŸ“‘ Table of Contents — Page 7

  1. When Do You Need Custom Training? — model.fit() vs GradientTape
  2. Custom Training Loop — Complete GradientTape + @tf.function
  3. Custom Loss Functions — Focal Loss, Contrastive, Triplet
  4. Custom Metrics — F1 Score, Matthews Correlation
  5. Model Subclassing — tf.keras.Model inheritance
  6. tf.distribute — Multi-GPU — MirroredStrategy & TPUStrategy
  7. Gradient Accumulation — Large batches on small GPUs
  8. Gradient Clipping — Training stability
  9. Project: Custom GAN Training Loop
  10. Summary & Page 8 Preview
šŸ¤”

1. When Do You Need Custom Training?

model.fit() is very powerful — but some situations need full control

model.fit() handles 90% of training needs. But there are situations where you need full control over every aspect of the training loop:

model.fit() vs Custom Training Loop — When to Use Which?

āœ… model.fit() — use for:
• Standard classification/regression
• Transfer learning (Page 3)
• Any model with 1 input → 1 output → 1 loss
• Fast prototyping and iteration
• 90% of all deep learning projects!

šŸ”§ Custom Training Loop — use for:
• GAN training: alternating Generator & Discriminator updates
• Reinforcement Learning: reward-based gradient updates
• Multiple loss functions with custom weighting
• Gradient accumulation: simulate large batches on a small GPU
• Research experiments: custom gradient modifications
• Meta-learning: learning to learn (MAML, etc.)
• Curriculum learning: change data difficulty over time

Rule: Start with model.fit(). Switch to a custom loop ONLY when needed.
⚔

2. Custom Training Loop — Complete GradientTape

Full control: forward pass → loss → gradients → optimizer → metrics → logging
45_custom_training.py — Production-Grade Custom Loop šŸ”„ (Python)
import tensorflow as tf
from tensorflow import keras
import time

# ===========================
# 1. Setup
# ===========================
model = keras.Sequential([
    keras.layers.Dense(128, activation='relu', input_shape=(784,)),
    keras.layers.BatchNormalization(),
    keras.layers.Dropout(0.3),
    keras.layers.Dense(10, activation='softmax')
])

optimizer = keras.optimizers.Adam(learning_rate=1e-3)
loss_fn = keras.losses.SparseCategoricalCrossentropy()
train_acc_metric = keras.metrics.SparseCategoricalAccuracy()
val_acc_metric = keras.metrics.SparseCategoricalAccuracy()
train_loss_metric = keras.metrics.Mean()

# Load data
(X_train, y_train), (X_test, y_test) = keras.datasets.mnist.load_data()
X_train = X_train.reshape(-1, 784).astype('float32') / 255.0
X_test = X_test.reshape(-1, 784).astype('float32') / 255.0

train_ds = tf.data.Dataset.from_tensor_slices((X_train, y_train))
train_ds = train_ds.shuffle(60000).batch(64).prefetch(tf.data.AUTOTUNE)
val_ds = tf.data.Dataset.from_tensor_slices((X_test, y_test)).batch(64)

# ===========================
# 2. Train step (compiled with @tf.function for speed!)
# ===========================
@tf.function
def train_step(x_batch, y_batch):
    with tf.GradientTape() as tape:
        # Forward pass (training=True for Dropout/BatchNorm)
        predictions = model(x_batch, training=True)
        loss = loss_fn(y_batch, predictions)

    # Compute gradients
    gradients = tape.gradient(loss, model.trainable_variables)

    # Apply gradients (update weights)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))

    # Update metrics
    train_acc_metric.update_state(y_batch, predictions)
    train_loss_metric.update_state(loss)
    return loss

# ===========================
# 3. Validation step
# ===========================
@tf.function
def val_step(x_batch, y_batch):
    predictions = model(x_batch, training=False)  # training=False!
    val_acc_metric.update_state(y_batch, predictions)

# ===========================
# 4. Training loop — FULL CONTROL
# ===========================
EPOCHS = 10
best_val_acc = 0

for epoch in range(EPOCHS):
    start_time = time.time()

    # Training
    for x_batch, y_batch in train_ds:
        train_step(x_batch, y_batch)

    train_acc = train_acc_metric.result()
    train_loss = train_loss_metric.result()

    # Validation
    for x_batch, y_batch in val_ds:
        val_step(x_batch, y_batch)

    val_acc = val_acc_metric.result()
    elapsed = time.time() - start_time

    # Logging
    print(f"Epoch {epoch+1}/{EPOCHS} ({elapsed:.1f}s) — "
          f"loss: {train_loss:.4f} — acc: {train_acc:.1%} — "
          f"val_acc: {val_acc:.1%}")

    # Save best model
    if val_acc > best_val_acc:
        best_val_acc = val_acc
        model.save_weights('best_weights.weights.h5')
        print(f"  ↑ New best: {val_acc:.1%}")

    # Reset metrics for next epoch
    train_acc_metric.reset_state()
    train_loss_metric.reset_state()
    val_acc_metric.reset_state()

print(f"\nšŸŽÆ Best Val Accuracy: {best_val_acc:.1%}")
# šŸŽÆ Best Val Accuracy: 98.2%

šŸŽ“ @tf.function — Why It Matters
Without @tf.function: Python eager mode — each op executes immediately. Easy to debug, but slow.
With @tf.function: TF compiles function into graph — ops are fused, memory optimized, can run on GPU/TPU. 2-10Ɨ faster!

Tip: Develop and debug without @tf.function. Once confirmed working, add @tf.function for production speed.
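One convenient way to follow this tip without editing decorators everywhere is TensorFlow's global eager switch; a minimal sketch:

```python
import tensorflow as tf

# Debug mode: force every @tf.function to run eagerly (print/pdb work as usual)
tf.config.run_functions_eagerly(True)

@tf.function
def square(x):
    return x * x

assert float(square(tf.constant(3.0))) == 9.0  # executed eagerly

# Production mode: restore graph compilation for speed
tf.config.run_functions_eagerly(False)
assert float(square(tf.constant(4.0))) == 16.0  # executed as a compiled graph
```

Flip the flag from a config variable so the same training script can run in either mode.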

šŸ“

3. Custom Loss Functions — Beyond CrossEntropy

Focal Loss for imbalanced data, Contrastive for similarity, Triplet for embeddings
46_custom_losses.py — 3 Custom Loss Functions (Python)
import tensorflow as tf

# ===========================
# 1. Focal Loss — for IMBALANCED classes
# Down-weights easy examples, focuses on hard ones
# Paper: "Focal Loss for Dense Object Detection" (Lin et al.)
# ===========================
class FocalLoss(tf.keras.losses.Loss):
    """Focal Loss: reduces loss for well-classified examples.
    Great for imbalanced datasets (e.g., 99% negative, 1% positive).
    gamma=0 → standard cross-entropy. gamma=2 → recommended default.
    """

    def __init__(self, gamma=2.0, alpha=0.25, **kwargs):
        super().__init__(**kwargs)
        self.gamma = gamma  # focusing parameter
        self.alpha = alpha  # class weight

    def call(self, y_true, y_pred):
        y_pred = tf.clip_by_value(y_pred, 1e-7, 1 - 1e-7)
        bce = -y_true * tf.math.log(y_pred) - (1 - y_true) * tf.math.log(1 - y_pred)
        p_t = y_true * y_pred + (1 - y_true) * (1 - y_pred)
        focal_weight = self.alpha * ((1 - p_t) ** self.gamma)
        return tf.reduce_mean(focal_weight * bce)

# Use: model.compile(loss=FocalLoss(gamma=2.0))
# When: fraud detection, medical diagnosis, rare event prediction

# ===========================
# 2. Contrastive Loss — for SIMILARITY learning
# Brings similar pairs closer, pushes dissimilar pairs apart
# ===========================
class ContrastiveLoss(tf.keras.losses.Loss):
    """Contrastive Loss for Siamese networks.
    y=1: same class → minimize distance.
    y=0: different class → maximize distance (up to margin).
    """

    def __init__(self, margin=1.0, **kwargs):
        super().__init__(**kwargs)
        self.margin = margin

    def call(self, y_true, distance):
        y_true = tf.cast(y_true, tf.float32)
        loss_positive = y_true * tf.square(distance)
        loss_negative = (1 - y_true) * tf.square(
            tf.maximum(self.margin - distance, 0))
        return tf.reduce_mean(0.5 * (loss_positive + loss_negative))

# When: face verification, signature matching, duplicate detection

# ===========================
# 3. Simple function-based loss (easiest approach)
# ===========================
def weighted_mse(y_true, y_pred):
    """MSE with higher weight for large errors"""
    error = y_true - y_pred
    weight = tf.where(tf.abs(error) > 1.0, 2.0, 1.0)
    return tf.reduce_mean(weight * tf.square(error))

# Use: model.compile(loss=weighted_mse)
# Any function(y_true, y_pred) → scalar works as a loss!

# ===========================
# 4. Label Smoothing (built-in, but useful to know)
# ===========================
loss = tf.keras.losses.CategoricalCrossentropy(label_smoothing=0.1)
# Instead of [0, 1, 0]: uses [0.033, 0.933, 0.033]
# Prevents overconfidence → better generalization
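The section intro also lists Triplet Loss, which the listing above stops short of. Here is a hedged sketch of one common formulation; it assumes y_pred carries the anchor, positive, and negative embeddings concatenated along the feature axis, so adapt the splitting to your model's actual outputs:

```python
import tensorflow as tf

class TripletLoss(tf.keras.losses.Loss):
    """Triplet Loss: pull anchor toward positive, push negative away by margin.
    Assumes y_pred = [anchor | positive | negative] embeddings, concatenated
    along axis 1 (y_true is unused, as in the Siamese setups above)."""

    def __init__(self, margin=0.5, **kwargs):
        super().__init__(**kwargs)
        self.margin = margin

    def call(self, y_true, y_pred):
        # Split the stacked embeddings into the three parts of the triplet
        anchor, positive, negative = tf.split(y_pred, 3, axis=1)
        pos_dist = tf.reduce_sum(tf.square(anchor - positive), axis=1)
        neg_dist = tf.reduce_sum(tf.square(anchor - negative), axis=1)
        # Hinge: loss is zero once negatives are margin farther than positives
        return tf.reduce_mean(tf.maximum(pos_dist - neg_dist + self.margin, 0.0))

# Use: model.compile(loss=TripletLoss(margin=0.5))
# When: face recognition (FaceNet-style), image retrieval, metric learning
```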
šŸ“Š

4. Custom Metrics — F1 Score & Beyond

Accuracy alone isn't enough — F1, MCC, and domain-specific metrics
47_custom_metrics.py — F1 Score & MCC (Python)
import tensorflow as tf

# ===========================
# 1. F1 Score — harmonic mean of precision & recall
# ===========================
class F1Score(tf.keras.metrics.Metric):
    """F1 Score: 2 Ɨ (precision Ɨ recall) / (precision + recall)
    Better than accuracy for imbalanced datasets.
    """

    def __init__(self, name='f1_score', threshold=0.5, **kwargs):
        super().__init__(name=name, **kwargs)
        self.precision = tf.keras.metrics.Precision(thresholds=threshold)
        self.recall = tf.keras.metrics.Recall(thresholds=threshold)

    def update_state(self, y_true, y_pred, sample_weight=None):
        self.precision.update_state(y_true, y_pred, sample_weight)
        self.recall.update_state(y_true, y_pred, sample_weight)

    def result(self):
        p = self.precision.result()
        r = self.recall.result()
        return 2 * ((p * r) / (p + r + tf.keras.backend.epsilon()))

    def reset_state(self):
        self.precision.reset_state()
        self.recall.reset_state()

# Use: model.compile(metrics=['accuracy', F1Score()])
# Output: "f1_score: 0.8723"

# ===========================
# 2. Matthews Correlation Coefficient (MCC)
# Best single metric for binary classification!
# ===========================
class MCC(tf.keras.metrics.Metric):
    """Matthews Correlation Coefficient.
    Range: [-1, +1]. 0 = random, 1 = perfect, -1 = inverse.
    Better than F1 for imbalanced datasets.
    """

    def __init__(self, name='mcc', threshold=0.5, **kwargs):
        super().__init__(name=name, **kwargs)
        self.tp = self.add_weight(name='tp', initializer='zeros')
        self.tn = self.add_weight(name='tn', initializer='zeros')
        self.fp = self.add_weight(name='fp', initializer='zeros')
        self.fn = self.add_weight(name='fn', initializer='zeros')
        self.threshold = threshold

    def update_state(self, y_true, y_pred, sample_weight=None):
        y_pred = tf.cast(y_pred >= self.threshold, tf.float32)
        y_true = tf.cast(y_true, tf.float32)
        self.tp.assign_add(tf.reduce_sum(y_true * y_pred))
        self.tn.assign_add(tf.reduce_sum((1-y_true) * (1-y_pred)))
        self.fp.assign_add(tf.reduce_sum((1-y_true) * y_pred))
        self.fn.assign_add(tf.reduce_sum(y_true * (1-y_pred)))

    def result(self):
        num = self.tp * self.tn - self.fp * self.fn
        den = tf.sqrt((self.tp+self.fp) * (self.tp+self.fn) *
                      (self.tn+self.fp) * (self.tn+self.fn) + 1e-7)
        return num / den

    def reset_state(self):
        self.tp.assign(0); self.tn.assign(0)
        self.fp.assign(0); self.fn.assign(0)

šŸŽ“ When to Use Which Metric?
Accuracy: Balanced classes, simple tasks.
F1 Score: Imbalanced classes, care about both precision & recall.
MCC: Best single metric for binary — balanced even with extreme imbalance.
AUC-ROC: When you need threshold-independent evaluation.
Domain-specific: BLEU (translation), IoU (segmentation), mAP (detection).

šŸ—ļø

5. Model Subclassing — Full Custom Architecture

tf.keras.Model inheritance for research architectures with dynamic logic
48_model_subclassing.py — Custom Model Architecture (Python)
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# ===========================
# 1. Custom Residual Block
# ===========================
class ResidualBlock(layers.Layer):
    """Residual block: output = ReLU(x + F(x))"""

    def __init__(self, units, dropout=0.1, **kwargs):
        super().__init__(**kwargs)
        self.dense1 = layers.Dense(units, activation='relu',
                                   kernel_initializer='he_normal')
        self.dense2 = layers.Dense(units, kernel_initializer='he_normal')
        self.bn1 = layers.BatchNormalization()
        self.bn2 = layers.BatchNormalization()
        self.dropout = layers.Dropout(dropout)
        self.add = layers.Add()

    def call(self, inputs, training=False):
        x = self.dense1(inputs)
        x = self.bn1(x, training=training)
        x = self.dropout(x, training=training)
        x = self.dense2(x)
        x = self.bn2(x, training=training)
        return tf.nn.relu(self.add([x, inputs]))  # residual!

# ===========================
# 2. Full Custom Model with Dynamic Logic
# ===========================
class AdaptiveClassifier(keras.Model):
    """Model with dynamic routing based on input complexity."""

    def __init__(self, num_classes, num_blocks=3):
        super().__init__()
        self.flatten = layers.Flatten()
        self.project = layers.Dense(128, activation='relu')

        # Stack of residual blocks
        self.blocks = [ResidualBlock(128) for _ in range(num_blocks)]

        # Complexity estimator (dynamic routing!)
        self.complexity = layers.Dense(1, activation='sigmoid')

        self.dropout = layers.Dropout(0.3)
        self.classifier = layers.Dense(num_classes, activation='softmax')

    def call(self, inputs, training=False):
        x = self.flatten(inputs)
        x = self.project(x)

        # Estimate complexity — decide how many blocks to use
        complexity_score = self.complexity(tf.stop_gradient(x))

        # Dynamic depth! (not possible with Sequential/Functional)
        for i, block in enumerate(self.blocks):
            x = block(x, training=training)
            # Could add early exit logic here based on complexity_score

        x = self.dropout(x, training=training)
        return self.classifier(x)

# Use exactly like any Keras model!
model = AdaptiveClassifier(num_classes=10, num_blocks=4)
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy', F1Score()])
# model.fit(X_train, y_train, ...) → works!

šŸŽ“ 3 Ways to Build Models — Quick Decision:
Sequential: Linear stack. 80% of cases. model = Sequential([...])
Functional: Branching, skip connections, multi-input. 15%. Model(inputs, outputs)
Subclassing: Dynamic logic (if/else, loops), research. 5%. class MyModel(Model)
Start simple. Upgrade complexity only when needed.
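To make the decision concrete, here is the same tiny two-layer network written all three ways (a toy sketch; the layer sizes are arbitrary):

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# 1. Sequential — a plain linear stack
seq = keras.Sequential([
    layers.Dense(16, activation='relu', input_shape=(8,)),
    layers.Dense(1)
])

# 2. Functional — explicit tensor graph (allows branching/multi-input later)
inp = keras.Input(shape=(8,))
h = layers.Dense(16, activation='relu')(inp)
func = keras.Model(inp, layers.Dense(1)(h))

# 3. Subclassing — arbitrary Python logic inside call()
class Tiny(keras.Model):
    def __init__(self):
        super().__init__()
        self.d1 = layers.Dense(16, activation='relu')
        self.d2 = layers.Dense(1)

    def call(self, x):
        return self.d2(self.d1(x))

# All three produce the same output shape for the same input
x = np.zeros((2, 8), dtype='float32')
for m in (seq, func, Tiny()):
    assert tuple(m(x).shape) == (2, 1)
```

For this architecture the three are interchangeable; the differences only matter once you need branching (Functional) or data-dependent control flow (Subclassing).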

šŸ–„ļø

6. tf.distribute — Multi-GPU & TPU Training

Scale training from 1 GPU to many GPUs/TPUs — minimal code changes
49_multi_gpu.py — tf.distribute Strategies (Python)
import tensorflow as tf

# ===========================
# 1. MirroredStrategy — multi-GPU on ONE machine
# ===========================
strategy = tf.distribute.MirroredStrategy()
print(f"Number of devices: {strategy.num_replicas_in_sync}")
# → "Number of devices: 4" (if 4 GPUs)

# Build model INSIDE strategy scope!
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation='relu', input_shape=(784,)),
        tf.keras.layers.Dense(10, activation='softmax')
    ])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])

# Training — AUTOMATIC data distribution!
model.fit(train_ds, epochs=10)
# Each GPU processes batch_size/num_gpus samples
# Gradients are synchronized across GPUs (all-reduce)
# 2 GPUs ā‰ˆ 1.8Ɨ throughput, 4 GPUs ā‰ˆ 3.5Ɨ throughput

# ===========================
# 2. TPUStrategy — Google TPU (Colab/Cloud)
# ===========================
# resolver = tf.distribute.cluster_resolver.TPUClusterResolver()
# tf.config.experimental_connect_to_cluster(resolver)
# tf.tpu.experimental.initialize_tpu_system(resolver)
# strategy = tf.distribute.TPUStrategy(resolver)
# 
# with strategy.scope():
#     model = build_model()
#     model.compile(...)
# model.fit(train_ds, epochs=10)
# TPU v3-8: ~8Ɨ faster than single GPU!

# ===========================
# 3. MultiWorkerMirroredStrategy — multi-MACHINE
# For distributed training across multiple servers
# ===========================
# strategy = tf.distribute.MultiWorkerMirroredStrategy()
# Same API, but runs across network-connected machines

# ===========================
# Tips for multi-GPU training
# ===========================
# 1. Scale batch_size Ɨ num_GPUs (e.g., 64 Ɨ 4 = 256)
# 2. Scale learning_rate Ɨ num_GPUs (linear scaling rule)
# 3. Use tf.data pipeline (prefetch!) — GPUs are HUNGRY
# 4. Use mixed precision — even faster on multiple GPUs
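Tip 4 (mixed precision) is a one-line global switch; a minimal sketch — note the output layer is pinned to float32 for numerical stability:

```python
import tensorflow as tf

# Compute in float16, keep variables in float32 (combines with any strategy scope)
tf.keras.mixed_precision.set_global_policy('mixed_float16')

model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation='relu', input_shape=(784,)),
    # Softmax output pinned to float32 — avoids float16 overflow/underflow
    tf.keras.layers.Dense(10, activation='softmax', dtype='float32')
])

assert model.layers[0].compute_dtype == 'float16'  # activations in half precision
assert model.layers[0].dtype == 'float32'          # weights stay full precision

# Reset if the rest of your script expects full precision everywhere
tf.keras.mixed_precision.set_global_policy('float32')
```

Mixed precision roughly halves activation memory and is fastest on GPUs with tensor cores (Volta and newer) and on TPUs.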
šŸ“¦

7. Gradient Accumulation — Large Batches on Small GPUs

Simulate batch size 256 on a GPU that only fits batch 32
50_gradient_accumulation.py (Python)
import tensorflow as tf

# Problem: BERT fine-tuning needs batch_size=32+
# But GPU only fits batch_size=8 (OOM with 32!)
# Solution: accumulate gradients over 4 mini-batches → effective batch=32

ACCUM_STEPS = 4  # accumulate 4 mini-batches
MINI_BATCH = 8   # each mini-batch
# Effective batch = 4 Ɨ 8 = 32

optimizer = tf.keras.optimizers.Adam(1e-3)

# Accumulator variables (same shape as model weights)
# (`model` and `loss_fn` are defined as in 45_custom_training.py)
accum_gradients = [tf.Variable(tf.zeros_like(v), trainable=False)
                   for v in model.trainable_variables]

@tf.function
def train_step_accumulate(x, y, step):
    # Pass `step` as a tensor (e.g. tf.constant(step)) — a raw Python int
    # would force @tf.function to retrace the graph on every call
    with tf.GradientTape() as tape:
        preds = model(x, training=True)
        loss = loss_fn(y, preds) / ACCUM_STEPS  # scale loss!

    grads = tape.gradient(loss, model.trainable_variables)

    # Accumulate
    for accum, grad in zip(accum_gradients, grads):
        accum.assign_add(grad)

    # Apply when accumulated enough
    if (step + 1) % ACCUM_STEPS == 0:
        optimizer.apply_gradients(
            zip(accum_gradients, model.trainable_variables))
        for accum in accum_gradients:
            accum.assign(tf.zeros_like(accum))  # reset!

    return loss * ACCUM_STEPS  # return unscaled loss for logging
āœ‚ļø

8. Gradient Clipping — Training Stability

Preventing exploding gradients — especially important for RNNs and Transformers
51_gradient_clipping.py (Python)
import tensorflow as tf

# ===========================
# Method 1: In optimizer (easiest!)
# ===========================
optimizer = tf.keras.optimizers.Adam(
    learning_rate=1e-3,
    clipnorm=1.0,      # clip gradients with L2 norm > 1.0
    # clipvalue=0.5,   # clip each gradient value to [-0.5, 0.5]
)
# clipnorm: scales all gradients so total norm ≤ 1.0
# clipvalue: clips each individual gradient value
# clipnorm is generally preferred (preserves direction)

# ===========================
# Method 2: Manual clipping in custom loop
# ===========================
@tf.function
def train_step_clipped(x, y):
    with tf.GradientTape() as tape:
        preds = model(x, training=True)
        loss = loss_fn(y, preds)

    grads = tape.gradient(loss, model.trainable_variables)

    # Clip gradients
    grads, global_norm = tf.clip_by_global_norm(grads, clip_norm=1.0)
    # global_norm = original gradient norm (useful for monitoring)

    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss, global_norm

# Monitor gradient norm — if consistently > 10, you have a problem!
# Healthy range: 0.1 — 5.0
# > 10: exploding gradients → reduce LR or add clipping
# < 0.001: vanishing gradients → use residual connections

šŸ’” Best Practice: Always use clipnorm=1.0 in the optimizer for RNN, LSTM, GRU, and Transformer training. This prevents gradient explosion without sacrificing convergence. For very stable training: combine gradient clipping + learning rate warmup + cosine decay.
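The warmup + cosine decay combo mentioned above can be wired into any optimizer via a LearningRateSchedule; a hedged sketch (the class name WarmupCosine and the step counts are illustrative):

```python
import tensorflow as tf

class WarmupCosine(tf.keras.optimizers.schedules.LearningRateSchedule):
    """Linear warmup to peak_lr, then cosine decay toward zero."""

    def __init__(self, peak_lr=1e-3, warmup_steps=1000, total_steps=10000):
        self.peak_lr = peak_lr
        self.warmup_steps = warmup_steps
        self.cosine = tf.keras.optimizers.schedules.CosineDecay(
            peak_lr, total_steps - warmup_steps)

    def __call__(self, step):
        step = tf.cast(step, tf.float32)
        return tf.cond(
            step < self.warmup_steps,
            lambda: self.peak_lr * step / self.warmup_steps,  # linear warmup
            lambda: self.cosine(step - self.warmup_steps))    # cosine decay

# Combine with gradient clipping, as recommended above
optimizer = tf.keras.optimizers.Adam(
    learning_rate=WarmupCosine(), clipnorm=1.0)
```

Warmup keeps early updates small while BatchNorm statistics and Adam moments are still noisy; cosine decay then anneals the learning rate smoothly to the end of training.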

šŸŽØ

9. Project: Custom GAN Training Loop

Real example of why custom loops are needed — alternating 2 optimizers
52_gan_training_loop.py — GAN Custom Training šŸ”„ (Python)
import tensorflow as tf

# This is IMPOSSIBLE with model.fit()!
# GAN needs to alternate between training D and G

generator = build_generator()      # noise → fake image
discriminator = build_discriminator()  # image → real/fake

g_optimizer = tf.keras.optimizers.Adam(2e-4, beta_1=0.5)
d_optimizer = tf.keras.optimizers.Adam(2e-4, beta_1=0.5)
bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)

NOISE_DIM = 100
BATCH_SIZE = 64

@tf.function
def train_step(real_images):
    noise = tf.random.normal([BATCH_SIZE, NOISE_DIM])

    with tf.GradientTape() as gen_tape, tf.GradientTape() as disc_tape:
        # Forward pass
        fake_images = generator(noise, training=True)
        real_output = discriminator(real_images, training=True)
        fake_output = discriminator(fake_images, training=True)

        # Losses
        d_loss_real = bce(tf.ones_like(real_output), real_output)
        d_loss_fake = bce(tf.zeros_like(fake_output), fake_output)
        d_loss = d_loss_real + d_loss_fake

        g_loss = bce(tf.ones_like(fake_output), fake_output)

    # Separate gradient updates — THIS is why we need custom loop!
    d_grads = disc_tape.gradient(d_loss, discriminator.trainable_variables)
    g_grads = gen_tape.gradient(g_loss, generator.trainable_variables)

    d_optimizer.apply_gradients(
        zip(d_grads, discriminator.trainable_variables))
    g_optimizer.apply_gradients(
        zip(g_grads, generator.trainable_variables))

    return d_loss, g_loss

# Training loop
for epoch in range(100):
    for real_batch in train_dataset:
        d_loss, g_loss = train_step(real_batch)

    print(f"Epoch {epoch+1} | D_loss: {d_loss:.4f} | G_loss: {g_loss:.4f}")

    # Generate sample images every 10 epochs
    if (epoch + 1) % 10 == 0:
        noise = tf.random.normal([16, NOISE_DIM])
        generated = generator(noise, training=False)
        # save_images(generated, f"gen_epoch_{epoch+1}.png")

šŸŽ“ Why GANs Need a Custom Loop
model.fit() optimizes one model with one loss and one optimizer. A GAN has:
• 2 models (Generator + Discriminator)
• 2 losses (G_loss + D_loss — opposing directions!)
• 2 optimizers (one per model)
• Alternating updates: train D → freeze D → train G → repeat
This cannot be done with model.fit(). Custom loop = the only way.

šŸ“

10. Page 7 Summary

Everything we've learned
Concept | What It Is | Key Code
Custom Training | Full control with GradientTape | tape.gradient(loss, vars)
@tf.function | Compile to graph (2-10Ɨ faster) | @tf.function
Focal Loss | Loss for imbalanced data | class FocalLoss(Loss)
Contrastive Loss | Loss for similarity learning | class ContrastiveLoss(Loss)
F1 Score | Custom metric balancing precision & recall | class F1Score(Metric)
Subclassing | Model with dynamic logic | class MyModel(keras.Model)
MirroredStrategy | Multi-GPU on one machine | tf.distribute.MirroredStrategy()
Gradient Accum | Large batches on small GPUs | accum.assign_add(grad)
Gradient Clipping | Prevent gradient explosion | clipnorm=1.0
← Previous Page

Page 6 — Transformer & BERT in TensorFlow

šŸ“˜

Coming Next: Page 8 — GAN & Generative Models

Creating images from scratch! Page 8 covers: complete DCGAN architecture (Generator Conv2DTranspose + Discriminator Conv2D), adversarial training loop, Variational Autoencoder (VAE) and reparameterization trick, conditional GAN for specific class generation, Wasserstein GAN for stable training, latent space interpolation, and production GAN training tips.