Table of Contents – Page 7
- When Do You Need Custom Training? – model.fit() vs GradientTape
- Custom Training Loop – Complete GradientTape + @tf.function
- Custom Loss Functions – Focal Loss, Contrastive, Triplet
- Custom Metrics – F1 Score, Matthews Correlation
- Model Subclassing – tf.keras.Model inheritance
- tf.distribute – Multi-GPU – MirroredStrategy & TPUStrategy
- Gradient Accumulation – Large batches on small GPUs
- Gradient Clipping – Training stability
- Project: Custom GAN Training Loop
- Summary & Page 8 Preview
1. When Do You Need a Custom Training Loop?
model.fit() handles 90% of training needs. But some situations demand full control over every aspect of the training loop: alternating updates between multiple models (as in GANs), gradient accumulation, multiple optimizers, or research-grade custom logic.
2. Custom Training Loop – Complete GradientTape
import tensorflow as tf
from tensorflow import keras
import time

# ===========================
# 1. Setup
# ===========================
model = keras.Sequential([
    keras.layers.Dense(128, activation='relu', input_shape=(784,)),
    keras.layers.BatchNormalization(),
    keras.layers.Dropout(0.3),
    keras.layers.Dense(10, activation='softmax')
])

optimizer = keras.optimizers.Adam(learning_rate=1e-3)
loss_fn = keras.losses.SparseCategoricalCrossentropy()

train_acc_metric = keras.metrics.SparseCategoricalAccuracy()
val_acc_metric = keras.metrics.SparseCategoricalAccuracy()
train_loss_metric = keras.metrics.Mean()

# Load data
(X_train, y_train), (X_test, y_test) = keras.datasets.mnist.load_data()
X_train = X_train.reshape(-1, 784).astype('float32') / 255.0
X_test = X_test.reshape(-1, 784).astype('float32') / 255.0

train_ds = tf.data.Dataset.from_tensor_slices((X_train, y_train))
train_ds = train_ds.shuffle(60000).batch(64).prefetch(tf.data.AUTOTUNE)
val_ds = tf.data.Dataset.from_tensor_slices((X_test, y_test)).batch(64)

# ===========================
# 2. Train step (compiled with @tf.function for speed!)
# ===========================
@tf.function
def train_step(x_batch, y_batch):
    with tf.GradientTape() as tape:
        # Forward pass (training=True for Dropout/BatchNorm)
        predictions = model(x_batch, training=True)
        loss = loss_fn(y_batch, predictions)

    # Compute gradients
    gradients = tape.gradient(loss, model.trainable_variables)
    # Apply gradients (update weights)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))

    # Update metrics
    train_acc_metric.update_state(y_batch, predictions)
    train_loss_metric.update_state(loss)
    return loss

# ===========================
# 3. Validation step
# ===========================
@tf.function
def val_step(x_batch, y_batch):
    predictions = model(x_batch, training=False)  # training=False!
    val_acc_metric.update_state(y_batch, predictions)

# ===========================
# 4. Training loop -- FULL CONTROL
# ===========================
EPOCHS = 10
best_val_acc = 0.0

for epoch in range(EPOCHS):
    start_time = time.time()

    # Training
    for x_batch, y_batch in train_ds:
        train_step(x_batch, y_batch)
    train_acc = train_acc_metric.result()
    train_loss = train_loss_metric.result()

    # Validation
    for x_batch, y_batch in val_ds:
        val_step(x_batch, y_batch)
    val_acc = val_acc_metric.result()

    elapsed = time.time() - start_time

    # Logging
    print(f"Epoch {epoch+1}/{EPOCHS} ({elapsed:.1f}s) | "
          f"loss: {train_loss:.4f} | acc: {train_acc:.1%} | "
          f"val_acc: {val_acc:.1%}")

    # Save best model
    if val_acc > best_val_acc:
        best_val_acc = val_acc
        model.save_weights('best_weights.weights.h5')
        print(f"  New best: {val_acc:.1%}")

    # Reset metrics for next epoch
    train_acc_metric.reset_state()
    train_loss_metric.reset_state()
    val_acc_metric.reset_state()

print(f"\nBest Val Accuracy: {best_val_acc:.1%}")
# Best Val Accuracy: 98.2%
@tf.function – Why It Matters
Without @tf.function: Python eager mode – each op executes immediately. Easy to debug, but slow.
With @tf.function: TF compiles the function into a graph – ops are fused, memory is optimized, and it can run on GPU/TPU. 2-10× faster!
Tip: Develop and debug without @tf.function. Once it's confirmed working, add @tf.function for production speed.
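The speedup is easy to check on your own machine. A minimal timing sketch (the exact numbers vary by hardware and TF version, so no specific speedup is promised):

```python
import time
import tensorflow as tf

# The same computation, eager vs. compiled into a graph.
def chain(x):
    for _ in range(20):
        x = tf.matmul(x, x) * 0.1
    return x

compiled_chain = tf.function(chain)  # same logic, traced into one graph

x = tf.random.normal((256, 256))
compiled_chain(x)  # first call pays a one-time tracing cost

t0 = time.perf_counter(); chain(x);          eager_s = time.perf_counter() - t0
t0 = time.perf_counter(); compiled_chain(x); graph_s = time.perf_counter() - t0
print(f"eager: {eager_s*1e3:.2f} ms, graph: {graph_s*1e3:.2f} ms")
```

Note that the first compiled call includes tracing overhead, which is why it is excluded from the timing; repeated calls with the same input signature reuse the cached graph.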
3. Custom Loss Functions – Beyond CrossEntropy
import tensorflow as tf

# ===========================
# 1. Focal Loss -- for IMBALANCED classes
#    Down-weights easy examples, focuses on hard ones
#    Paper: "Focal Loss for Dense Object Detection" (Lin et al.)
# ===========================
class FocalLoss(tf.keras.losses.Loss):
    """Focal Loss: reduces loss for well-classified examples.

    Great for imbalanced datasets (e.g., 99% negative, 1% positive).
    gamma=0 -> standard cross-entropy. gamma=2 -> recommended default.
    """
    def __init__(self, gamma=2.0, alpha=0.25, **kwargs):
        super().__init__(**kwargs)
        self.gamma = gamma  # focusing parameter
        self.alpha = alpha  # class weight

    def call(self, y_true, y_pred):
        y_pred = tf.clip_by_value(y_pred, 1e-7, 1 - 1e-7)
        bce = -y_true * tf.math.log(y_pred) - (1 - y_true) * tf.math.log(1 - y_pred)
        p_t = y_true * y_pred + (1 - y_true) * (1 - y_pred)
        focal_weight = self.alpha * ((1 - p_t) ** self.gamma)
        return tf.reduce_mean(focal_weight * bce)

# Use: model.compile(loss=FocalLoss(gamma=2.0))
# When: fraud detection, medical diagnosis, rare event prediction

# ===========================
# 2. Contrastive Loss -- for SIMILARITY learning
#    Brings similar pairs closer, pushes dissimilar pairs apart
# ===========================
class ContrastiveLoss(tf.keras.losses.Loss):
    """Contrastive Loss for Siamese networks.

    y=1: same class -> minimize distance.
    y=0: different class -> maximize distance (up to margin).
    """
    def __init__(self, margin=1.0, **kwargs):
        super().__init__(**kwargs)
        self.margin = margin

    def call(self, y_true, distance):
        y_true = tf.cast(y_true, tf.float32)
        loss_positive = y_true * tf.square(distance)
        loss_negative = (1 - y_true) * tf.square(
            tf.maximum(self.margin - distance, 0))
        return tf.reduce_mean(0.5 * (loss_positive + loss_negative))

# When: face verification, signature matching, duplicate detection

# ===========================
# 3. Simple function-based loss (easiest approach)
# ===========================
def weighted_mse(y_true, y_pred):
    """MSE with higher weight for large errors."""
    error = y_true - y_pred
    weight = tf.where(tf.abs(error) > 1.0, 2.0, 1.0)
    return tf.reduce_mean(weight * tf.square(error))

# Use: model.compile(loss=weighted_mse)
# Any function(y_true, y_pred) -> scalar works as a loss!

# ===========================
# 4. Label Smoothing (built-in, but useful to know)
# ===========================
loss = tf.keras.losses.CategoricalCrossentropy(label_smoothing=0.1)
# Instead of [0, 1, 0]: uses [0.033, 0.933, 0.033]
# Prevents overconfidence -> better generalization
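A quick numeric sanity check makes the focal-loss behavior concrete. This is a standalone restatement of the `call` math above (gamma=2, alpha=0.25): a confident correct prediction should contribute almost nothing, while a hard example keeps most of its cross-entropy.

```python
import tensorflow as tf

# Standalone restatement of the FocalLoss math above, for a sanity check.
def focal_loss(y_true, y_pred, gamma=2.0, alpha=0.25):
    y_pred = tf.clip_by_value(y_pred, 1e-7, 1 - 1e-7)
    bce = -y_true * tf.math.log(y_pred) - (1 - y_true) * tf.math.log(1 - y_pred)
    p_t = y_true * y_pred + (1 - y_true) * (1 - y_pred)
    return tf.reduce_mean(alpha * (1 - p_t) ** gamma * bce)

easy = focal_loss(tf.constant([1.0]), tf.constant([0.95]))  # well classified
hard = focal_loss(tf.constant([1.0]), tf.constant([0.30]))  # badly classified
print(float(easy), float(hard))  # the hard example dominates by orders of magnitude
```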
4. Custom Metrics – F1 Score & Beyond
import tensorflow as tf

# ===========================
# 1. F1 Score -- harmonic mean of precision & recall
# ===========================
class F1Score(tf.keras.metrics.Metric):
    """F1 Score: 2 * (precision * recall) / (precision + recall).

    Better than accuracy for imbalanced datasets.
    """
    def __init__(self, name='f1_score', threshold=0.5, **kwargs):
        super().__init__(name=name, **kwargs)
        self.precision = tf.keras.metrics.Precision(thresholds=threshold)
        self.recall = tf.keras.metrics.Recall(thresholds=threshold)

    def update_state(self, y_true, y_pred, sample_weight=None):
        self.precision.update_state(y_true, y_pred, sample_weight)
        self.recall.update_state(y_true, y_pred, sample_weight)

    def result(self):
        p = self.precision.result()
        r = self.recall.result()
        return 2 * ((p * r) / (p + r + tf.keras.backend.epsilon()))

    def reset_state(self):
        self.precision.reset_state()
        self.recall.reset_state()

# Use: model.compile(metrics=['accuracy', F1Score()])
# Output: "f1_score: 0.8723"

# ===========================
# 2. Matthews Correlation Coefficient (MCC)
#    Best single metric for binary classification!
# ===========================
class MCC(tf.keras.metrics.Metric):
    """Matthews Correlation Coefficient.

    Range: [-1, +1]. 0 = random, 1 = perfect, -1 = inverse.
    Better than F1 for imbalanced datasets.
    """
    def __init__(self, name='mcc', threshold=0.5, **kwargs):
        super().__init__(name=name, **kwargs)
        self.tp = self.add_weight(name='tp', initializer='zeros')
        self.tn = self.add_weight(name='tn', initializer='zeros')
        self.fp = self.add_weight(name='fp', initializer='zeros')
        self.fn = self.add_weight(name='fn', initializer='zeros')
        self.threshold = threshold

    def update_state(self, y_true, y_pred, sample_weight=None):
        y_pred = tf.cast(y_pred >= self.threshold, tf.float32)
        y_true = tf.cast(y_true, tf.float32)
        self.tp.assign_add(tf.reduce_sum(y_true * y_pred))
        self.tn.assign_add(tf.reduce_sum((1 - y_true) * (1 - y_pred)))
        self.fp.assign_add(tf.reduce_sum((1 - y_true) * y_pred))
        self.fn.assign_add(tf.reduce_sum(y_true * (1 - y_pred)))

    def result(self):
        num = self.tp * self.tn - self.fp * self.fn
        den = tf.sqrt((self.tp + self.fp) * (self.tp + self.fn) *
                      (self.tn + self.fp) * (self.tn + self.fn) + 1e-7)
        return num / den

    def reset_state(self):
        self.tp.assign(0); self.tn.assign(0)
        self.fp.assign(0); self.fn.assign(0)
When to Use Which Metric?
Accuracy: balanced classes, simple tasks.
F1 Score: imbalanced classes, when you care about both precision & recall.
MCC: best single metric for binary classification – stays meaningful even under extreme imbalance.
AUC-ROC: when you need threshold-independent evaluation.
Domain-specific: BLEU (translation), IoU (segmentation), mAP (detection).
5. Model Subclassing – Full Custom Architecture
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# ===========================
# 1. Custom Residual Block
# ===========================
class ResidualBlock(layers.Layer):
    """Residual block: output = ReLU(x + F(x))."""
    def __init__(self, units, dropout=0.1, **kwargs):
        super().__init__(**kwargs)
        self.dense1 = layers.Dense(units, activation='relu',
                                   kernel_initializer='he_normal')
        self.dense2 = layers.Dense(units, kernel_initializer='he_normal')
        self.bn1 = layers.BatchNormalization()
        self.bn2 = layers.BatchNormalization()
        self.dropout = layers.Dropout(dropout)
        self.add = layers.Add()

    def call(self, inputs, training=False):
        x = self.dense1(inputs)
        x = self.bn1(x, training=training)
        x = self.dropout(x, training=training)
        x = self.dense2(x)
        x = self.bn2(x, training=training)
        return tf.nn.relu(self.add([x, inputs]))  # residual!

# ===========================
# 2. Full Custom Model with Dynamic Logic
# ===========================
class AdaptiveClassifier(keras.Model):
    """Model with dynamic routing based on input complexity."""
    def __init__(self, num_classes, num_blocks=3):
        super().__init__()
        self.flatten = layers.Flatten()
        self.project = layers.Dense(128, activation='relu')
        # Stack of residual blocks
        self.blocks = [ResidualBlock(128) for _ in range(num_blocks)]
        # Complexity estimator (dynamic routing!)
        self.complexity = layers.Dense(1, activation='sigmoid')
        self.dropout = layers.Dropout(0.3)
        self.classifier = layers.Dense(num_classes, activation='softmax')

    def call(self, inputs, training=False):
        x = self.flatten(inputs)
        x = self.project(x)
        # Estimate complexity -> decide how many blocks to use
        complexity_score = self.complexity(tf.stop_gradient(x))
        # Dynamic depth! (not possible with Sequential/Functional)
        for i, block in enumerate(self.blocks):
            x = block(x, training=training)
            # Could add early-exit logic here based on complexity_score
        x = self.dropout(x, training=training)
        return self.classifier(x)

# Use exactly like any Keras model!
model = AdaptiveClassifier(num_classes=10, num_blocks=4)
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy', F1Score()])
# model.fit(X_train, y_train, ...) -- works!
3 Ways to Build Models – Quick Decision:
Sequential: linear stack. 80% of cases. model = Sequential([...])
Functional: branching, skip connections, multi-input. 15%. Model(inputs, outputs)
Subclassing: dynamic logic (if/else, loops), research. 5%. class MyModel(Model)
Start simple. Add complexity only when needed.
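For comparison, here is a minimal Functional-API sketch (a hypothetical toy model, not from this course): branching with a skip connection is exactly the case a plain Sequential stack cannot express.

```python
import tensorflow as tf
from tensorflow import keras

# Hypothetical toy model: two parallel branches merged by a skip connection.
inputs = keras.Input(shape=(784,))
x = keras.layers.Dense(128, activation='relu')(inputs)
skip = keras.layers.Dense(128)(inputs)        # parallel branch from the input
merged = keras.layers.Add()([x, skip])        # merge the two branches
outputs = keras.layers.Dense(10, activation='softmax')(merged)
model = keras.Model(inputs, outputs)

print(model.output_shape)  # (None, 10)
```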
6. tf.distribute – Multi-GPU & TPU Training
import tensorflow as tf

# ===========================
# 1. MirroredStrategy -- multi-GPU on ONE machine
# ===========================
strategy = tf.distribute.MirroredStrategy()
print(f"Number of devices: {strategy.num_replicas_in_sync}")
# -> "Number of devices: 4" (if 4 GPUs)

# Build the model INSIDE the strategy scope!
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation='relu', input_shape=(784,)),
        tf.keras.layers.Dense(10, activation='softmax')
    ])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])

# Training -- AUTOMATIC data distribution!
model.fit(train_ds, epochs=10)
# Each GPU processes batch_size/num_gpus samples
# Gradients are synchronized across GPUs (all-reduce)
# 2 GPUs -> ~1.8x throughput, 4 GPUs -> ~3.5x throughput

# ===========================
# 2. TPUStrategy -- Google TPU (Colab/Cloud)
# ===========================
# resolver = tf.distribute.cluster_resolver.TPUClusterResolver()
# tf.config.experimental_connect_to_cluster(resolver)
# tf.tpu.experimental.initialize_tpu_system(resolver)
# strategy = tf.distribute.TPUStrategy(resolver)
#
# with strategy.scope():
#     model = build_model()
#     model.compile(...)
# model.fit(train_ds, epochs=10)
# TPU v3-8: ~8x faster than a single GPU!

# ===========================
# 3. MultiWorkerMirroredStrategy -- multi-MACHINE
#    For distributed training across multiple servers
# ===========================
# strategy = tf.distribute.MultiWorkerMirroredStrategy()
# Same API, but runs across network-connected machines

# ===========================
# Tips for multi-GPU training
# ===========================
# 1. Scale batch_size x num_GPUs (e.g., 64 x 4 = 256)
# 2. Scale learning_rate x num_GPUs (linear scaling rule)
# 3. Use a tf.data pipeline (prefetch!) -- GPUs are HUNGRY
# 4. Use mixed precision -- even faster on multiple GPUs
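Tips 1 and 2 above (the linear scaling rule) amount to two lines of arithmetic. A minimal sketch; `num_replicas_in_sync` reports 1 on a CPU-only machine, so this runs anywhere:

```python
import tensorflow as tf

# Linear-scaling rule: scale batch size and learning rate by the replica count.
strategy = tf.distribute.MirroredStrategy()  # falls back to one device on CPU
BASE_BATCH, BASE_LR = 64, 1e-3
n = strategy.num_replicas_in_sync
global_batch = BASE_BATCH * n   # e.g., 64 * 4 = 256 on 4 GPUs
scaled_lr = BASE_LR * n         # e.g., 1e-3 * 4 = 4e-3
print(global_batch, scaled_lr)
```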
7. Gradient Accumulation – Large Batches on Small GPUs
import tensorflow as tf

# Problem: BERT fine-tuning needs batch_size=32+
# But the GPU only fits batch_size=8 (OOM with 32!)
# Solution: accumulate gradients over 4 mini-batches -> effective batch=32

ACCUM_STEPS = 4   # accumulate 4 mini-batches
MINI_BATCH = 8    # each mini-batch
# Effective batch = 4 x 8 = 32

optimizer = tf.keras.optimizers.Adam(1e-3)

# Accumulator variables (same shape as the model weights)
accum_gradients = [tf.Variable(tf.zeros_like(v), trainable=False)
                   for v in model.trainable_variables]

@tf.function
def train_step_accumulate(x, y, step):
    with tf.GradientTape() as tape:
        preds = model(x, training=True)
        loss = loss_fn(y, preds) / ACCUM_STEPS  # scale the loss!
    grads = tape.gradient(loss, model.trainable_variables)

    # Accumulate
    for accum, grad in zip(accum_gradients, grads):
        accum.assign_add(grad)

    # Apply once enough has accumulated
    if (step + 1) % ACCUM_STEPS == 0:
        optimizer.apply_gradients(
            zip(accum_gradients, model.trainable_variables))
        for accum in accum_gradients:
            accum.assign(tf.zeros_like(accum))  # reset!

    return loss * ACCUM_STEPS  # return the unscaled loss for logging
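The trick can be verified numerically: summing the gradients of (loss / ACCUM_STEPS) over equal-sized mini-batches reproduces the gradient of one big batch exactly. A standalone check on a one-parameter model:

```python
import tensorflow as tf

# Equivalence check for gradient accumulation on a tiny linear model.
w = tf.Variable(2.0)
data = tf.constant([1.0, 2.0, 3.0, 4.0])

def loss_on(batch):
    return tf.reduce_mean(tf.square(w * batch - 1.0))

# Gradient of the full batch
with tf.GradientTape() as tape:
    big_loss = loss_on(data)
g_big = tape.gradient(big_loss, w)

# Accumulated gradient: 2 mini-batches of 2, each loss scaled by 1/2
g_accum = tf.constant(0.0)
for mini in (data[:2], data[2:]):
    with tf.GradientTape() as tape:
        mini_loss = loss_on(mini) / 2.0
    g_accum += tape.gradient(mini_loss, w)

print(float(g_big), float(g_accum))  # identical: both 25.0 here
```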
8. Gradient Clipping – Training Stability
import tensorflow as tf

# ===========================
# Method 1: In the optimizer (easiest!)
# ===========================
optimizer = tf.keras.optimizers.Adam(
    learning_rate=1e-3,
    clipnorm=1.0,      # clip gradients with L2 norm > 1.0
    # clipvalue=0.5,   # clip each gradient value to [-0.5, 0.5]
)
# clipnorm: scales all gradients so the total norm <= 1.0
# clipvalue: clips each individual gradient value
# clipnorm is generally preferred (preserves direction)

# ===========================
# Method 2: Manual clipping in a custom loop
# ===========================
@tf.function
def train_step_clipped(x, y):
    with tf.GradientTape() as tape:
        preds = model(x, training=True)
        loss = loss_fn(y, preds)
    grads = tape.gradient(loss, model.trainable_variables)

    # Clip gradients
    grads, global_norm = tf.clip_by_global_norm(grads, clip_norm=1.0)
    # global_norm = the original gradient norm (useful for monitoring)

    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss, global_norm

# Monitor the gradient norm -- if it is consistently > 10, you have a problem!
# Healthy range: 0.1 - 5.0
# > 10: exploding gradients -> reduce LR or add clipping
# < 0.001: vanishing gradients -> use residual connections
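The "preserves direction" point is worth seeing concretely. A gradient of [3, 4] has global L2 norm 5: norm-based clipping rescales the whole vector, while value-based clipping caps each entry independently and changes the direction.

```python
import tensorflow as tf

# clipnorm-style: rescale the whole gradient to norm 1, direction unchanged.
grads = [tf.constant([3.0, 4.0])]                 # global L2 norm = 5.0
clipped, norm = tf.clip_by_global_norm(grads, clip_norm=1.0)
print(float(norm))          # 5.0 -- the pre-clip global norm
print(clipped[0].numpy())   # [0.6 0.8] -- same direction, norm 1

# clipvalue-style: cap each entry independently -- direction changes.
print(tf.clip_by_value(grads[0], -0.5, 0.5).numpy())  # [0.5 0.5]
```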
Best Practice: Always use clipnorm=1.0 in the optimizer when training RNNs, LSTMs, GRUs, and Transformers. It prevents gradient explosions without sacrificing convergence. For very stable training, combine gradient clipping with learning rate warmup and cosine decay.
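That combination can be sketched in a few lines. A minimal sketch: note that `CosineDecay` gained built-in `warmup_target`/`warmup_steps` arguments only in newer TF releases, so on older versions warmup needs a custom schedule.

```python
import tensorflow as tf

# Cosine LR decay + clipnorm in one optimizer (warmup omitted; it is
# version-dependent -- see the note above).
schedule = tf.keras.optimizers.schedules.CosineDecay(
    initial_learning_rate=1e-3,  # starting LR
    decay_steps=10_000)          # LR reaches ~0 here (alpha=0 by default)
optimizer = tf.keras.optimizers.Adam(learning_rate=schedule, clipnorm=1.0)

print(float(schedule(0)), float(schedule(10_000)))  # 1e-3 at start, ~0 at the end
```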
9. Project: Custom GAN Training Loop
import tensorflow as tf

# This is IMPOSSIBLE with model.fit()!
# A GAN needs to alternate between training D and G.

generator = build_generator()          # noise -> fake image
discriminator = build_discriminator()  # image -> real/fake

g_optimizer = tf.keras.optimizers.Adam(2e-4, beta_1=0.5)
d_optimizer = tf.keras.optimizers.Adam(2e-4, beta_1=0.5)
bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)

NOISE_DIM = 100
BATCH_SIZE = 64

@tf.function
def train_step(real_images):
    noise = tf.random.normal([BATCH_SIZE, NOISE_DIM])

    with tf.GradientTape() as gen_tape, tf.GradientTape() as disc_tape:
        # Forward pass
        fake_images = generator(noise, training=True)
        real_output = discriminator(real_images, training=True)
        fake_output = discriminator(fake_images, training=True)

        # Losses
        d_loss_real = bce(tf.ones_like(real_output), real_output)
        d_loss_fake = bce(tf.zeros_like(fake_output), fake_output)
        d_loss = d_loss_real + d_loss_fake
        g_loss = bce(tf.ones_like(fake_output), fake_output)

    # Separate gradient updates -- THIS is why we need a custom loop!
    d_grads = disc_tape.gradient(d_loss, discriminator.trainable_variables)
    g_grads = gen_tape.gradient(g_loss, generator.trainable_variables)

    d_optimizer.apply_gradients(
        zip(d_grads, discriminator.trainable_variables))
    g_optimizer.apply_gradients(
        zip(g_grads, generator.trainable_variables))

    return d_loss, g_loss

# Training loop
for epoch in range(100):
    for real_batch in train_dataset:
        d_loss, g_loss = train_step(real_batch)

    print(f"Epoch {epoch+1} | D_loss: {d_loss:.4f} | G_loss: {g_loss:.4f}")

    # Generate sample images every 10 epochs
    if (epoch + 1) % 10 == 0:
        noise = tf.random.normal([16, NOISE_DIM])
        generated = generator(noise, training=False)
        # save_images(generated, f"gen_epoch_{epoch+1}.png")
Why Does a GAN Need a Custom Loop?
model.fit() optimizes one model with one loss and one optimizer. A GAN has:
• 2 models (Generator + Discriminator)
• 2 losses (G_loss + D_loss – pulling in opposite directions!)
• 2 optimizers (one per model)
• Alternating updates: train D → freeze D → train G → repeat
This cannot be done with model.fit(); a custom loop is the only way.
10. Page 7 Summary
| Concept | What It Is | Key Code |
|---|---|---|
| Custom Training | Full control via GradientTape | tape.gradient(loss, vars) |
| @tf.function | Compile to graph (2-10× faster) | @tf.function |
| Focal Loss | Loss for imbalanced data | class FocalLoss(Loss) |
| Contrastive Loss | Loss for similarity learning | class ContrastiveLoss(Loss) |
| F1 Score | Custom metric balancing precision & recall | class F1Score(Metric) |
| Subclassing | Model with dynamic logic | class MyModel(keras.Model) |
| MirroredStrategy | Multi-GPU on one machine | tf.distribute.MirroredStrategy() |
| Gradient Accum | Large batch, small GPU | accum.assign_add(grad) |
| Gradient Clipping | Prevents explosion | clipnorm=1.0 |
Previous: Page 6 – Transformer & BERT in TensorFlow
Coming Next: Page 8 – GAN & Generative Models
Creating images from scratch! Page 8 covers: complete DCGAN architecture (Generator Conv2DTranspose + Discriminator Conv2D), adversarial training loop, Variational Autoencoder (VAE) and reparameterization trick, conditional GAN for specific class generation, Wasserstein GAN for stable training, latent space interpolation, and production GAN training tips.