Table of Contents — Page 8
- GAN Concept — Generator vs Discriminator: adversarial game
- DCGAN Generator — Conv2DTranspose: noise → image
- DCGAN Discriminator — Conv2D: image → real/fake
- GAN Training Loop — Alternating D & G updates
- VAE — Variational Autoencoder & reparameterization
- Conditional GAN — Generate specific classes
- WGAN — Wasserstein loss for stability
- Latent Space — Interpolation and exploration
- GAN Training Tips — Avoiding mode collapse
- Summary & Page 9 Preview
1. GAN Concept — The Adversarial Game
In Page 7 of the Neural Network series, we discussed GANs from scratch. Now we implement one in TensorFlow with a production-grade architecture. The concept remains the same: the Generator (G) creates fake images from random noise, and the Discriminator (D) tries to distinguish real images from fakes. The two compete — G gets better at making realistic images, D gets better at detecting fakes.
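To make the competing objectives concrete, here is a toy pure-Python sketch of the two losses on made-up logit values (the logits here are illustrative numbers, not outputs of a real model; the actual TensorFlow training step appears in section 4):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def bce_from_logit(label, logit):
    # Binary cross-entropy for a single logit
    p = sigmoid(logit)
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))

# Toy scores: D outputs one logit per image (positive = "looks real")
real_logit = 2.0   # D is fairly sure this real image is real
fake_logit = -1.5  # D is fairly sure this fake image is fake

# D's loss: push real toward 1 and fake toward 0 (low here: D is winning)
d_loss = bce_from_logit(1.0, real_logit) + bce_from_logit(0.0, fake_logit)

# G's loss: make D call the fake real (high here: G is losing)
g_loss = bce_from_logit(1.0, fake_logit)

print(round(d_loss, 3), round(g_loss, 3))  # 0.328 1.701
```

When G improves, `fake_logit` rises, `g_loss` falls and `d_loss` rises — the see-saw that drives adversarial training.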
2. DCGAN Generator — Noise → Image
```python
import tensorflow as tf
from tensorflow.keras import layers

# ===========================
# DCGAN Generator: noise (100,) → image (28, 28, 1)
# Uses Conv2DTranspose for learned upsampling
# ===========================
def make_generator(noise_dim=100):
    model = tf.keras.Sequential([
        # Step 1: Dense projection — noise → feature volume
        layers.Dense(7 * 7 * 256, use_bias=False, input_shape=(noise_dim,)),
        layers.BatchNormalization(),
        layers.LeakyReLU(0.2),
        layers.Reshape((7, 7, 256)),  # → (7, 7, 256)

        # Step 2: Conv2DTranspose — 7→7 (strides=1, no upsampling yet)
        layers.Conv2DTranspose(128, (5, 5), strides=(1, 1),
                               padding='same', use_bias=False),
        layers.BatchNormalization(),
        layers.LeakyReLU(0.2),  # → (7, 7, 128)

        # Step 3: Conv2DTranspose — upsample 7→14
        layers.Conv2DTranspose(64, (5, 5), strides=(2, 2),
                               padding='same', use_bias=False),
        layers.BatchNormalization(),
        layers.LeakyReLU(0.2),  # → (14, 14, 64)

        # Step 4: Conv2DTranspose — upsample 14→28 (output!)
        layers.Conv2DTranspose(1, (5, 5), strides=(2, 2), padding='same',
                               use_bias=False, activation='tanh'),
        # → (28, 28, 1)
    ], name="generator")
    return model

G = make_generator()
G.summary()  # Total params: ~1.8M

# Input:  (batch, 100)       → random noise
# Output: (batch, 28, 28, 1) → generated MNIST-sized image
# Activation tanh → output range [-1, 1] (normalize real data to match!)

# Test
noise = tf.random.normal([1, 100])
fake_image = G(noise, training=False)
print(fake_image.shape)  # (1, 28, 28, 1) ✅
```
Conv2D vs Conv2DTranspose:
Conv2D: downsampling — the image gets smaller (32×32 → 16×16 with stride=2).
Conv2DTranspose: upsampling — the image gets larger (7×7 → 14×14 with stride=2).
Conv2DTranspose is the "reverse" of Conv2D — but with learnable weights (not just simple interpolation). This is what allows the Generator to create detailed images from noise.
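These shape rules can be sanity-checked without TensorFlow. With `padding='same'`, Keras gives `ceil(in / stride)` for Conv2D and `in * stride` for Conv2DTranspose; a minimal pure-Python sketch (helper names are ours):

```python
import math

def conv2d_same_out(size, stride):
    # Conv2D with padding='same': output = ceil(input / stride)
    return math.ceil(size / stride)

def conv2d_transpose_same_out(size, stride):
    # Conv2DTranspose with padding='same': output = input * stride
    return size * stride

# Discriminator path (downsampling): 28 → 14 → 7
assert conv2d_same_out(28, 2) == 14
assert conv2d_same_out(14, 2) == 7

# Generator path (upsampling): 7 → 7 → 14 → 28
sizes = [7]
for stride in (1, 2, 2):  # strides of the three Conv2DTranspose layers
    sizes.append(conv2d_transpose_same_out(sizes[-1], stride))
print(sizes)  # [7, 7, 14, 28]
```

This is why the Dense layer projects to 7×7×256: two stride-2 transposed convolutions take 7 exactly to 28, the MNIST image size.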
3. DCGAN Discriminator — Image → Real/Fake
```python
def make_discriminator():
    model = tf.keras.Sequential([
        # Conv Block 1: 28×28 → 14×14
        layers.Conv2D(64, (5, 5), strides=(2, 2), padding='same',
                      input_shape=(28, 28, 1)),
        layers.LeakyReLU(0.2),
        layers.Dropout(0.3),

        # Conv Block 2: 14×14 → 7×7
        layers.Conv2D(128, (5, 5), strides=(2, 2), padding='same'),
        layers.LeakyReLU(0.2),
        layers.Dropout(0.3),

        # Flatten & classify
        layers.Flatten(),  # → (7*7*128) = 6272
        layers.Dense(1),   # → logit (NO sigmoid!)
    ], name="discriminator")
    return model

D = make_discriminator()
D.summary()  # Total params: ~400k

# Input:  (batch, 28, 28, 1) → real or fake image
# Output: (batch, 1)         → logit (use from_logits=True in loss!)

# IMPORTANT design choices:
# 1. LeakyReLU (not ReLU)  → prevents dead neurons in D
# 2. NO BatchNorm in D (or use SpectralNorm) → BN can destabilize GANs
# 3. NO sigmoid at output  → use from_logits=True (numerically stable)
# 4. Dropout in D          → prevents D from getting too strong too fast
```
4. GAN Training Loop — Adversarial Training
```python
import tensorflow as tf
from tensorflow import keras
import numpy as np
import time

# ===========================
# 1. Setup
# ===========================
NOISE_DIM = 100
BATCH_SIZE = 256
EPOCHS = 50

G = make_generator()
D = make_discriminator()

g_opt = keras.optimizers.Adam(2e-4, beta_1=0.5)  # beta_1=0.5 for GANs!
d_opt = keras.optimizers.Adam(2e-4, beta_1=0.5)
bce = keras.losses.BinaryCrossentropy(from_logits=True)

# Load MNIST
(train_images, _), (_, _) = keras.datasets.mnist.load_data()
train_images = train_images.reshape(-1, 28, 28, 1).astype('float32')
train_images = (train_images - 127.5) / 127.5  # normalize to [-1, 1]!

train_ds = (tf.data.Dataset.from_tensor_slices(train_images)
            .shuffle(60000).batch(BATCH_SIZE).prefetch(tf.data.AUTOTUNE))

# ===========================
# 2. Training step
# ===========================
@tf.function
def train_step(real_images):
    noise = tf.random.normal([BATCH_SIZE, NOISE_DIM])

    with tf.GradientTape() as gen_tape, tf.GradientTape() as disc_tape:
        # Generate fake images
        fake_images = G(noise, training=True)

        # Discriminator predictions
        real_output = D(real_images, training=True)
        fake_output = D(fake_images, training=True)

        # Discriminator loss: real→1, fake→0
        d_loss_real = bce(tf.ones_like(real_output), real_output)
        d_loss_fake = bce(tf.zeros_like(fake_output), fake_output)
        d_loss = d_loss_real + d_loss_fake

        # Generator loss: fool D → make D think fake is real
        g_loss = bce(tf.ones_like(fake_output), fake_output)

    # Update D
    d_grads = disc_tape.gradient(d_loss, D.trainable_variables)
    d_opt.apply_gradients(zip(d_grads, D.trainable_variables))

    # Update G
    g_grads = gen_tape.gradient(g_loss, G.trainable_variables)
    g_opt.apply_gradients(zip(g_grads, G.trainable_variables))

    return d_loss, g_loss

# ===========================
# 3. Training loop
# ===========================
seed_noise = tf.random.normal([16, NOISE_DIM])  # fixed noise for visualization

for epoch in range(EPOCHS):
    start = time.time()
    for batch in train_ds:
        d_loss, g_loss = train_step(batch)
    elapsed = time.time() - start

    print(f"Epoch {epoch+1:3d}/{EPOCHS} ({elapsed:.1f}s) | "
          f"D_loss: {d_loss:.4f} | G_loss: {g_loss:.4f}")

    # Generate sample images every 5 epochs
    if (epoch + 1) % 5 == 0:
        generated = G(seed_noise, training=False)
        # save_grid(generated, f"gen_epoch_{epoch+1}.png")
        print("  → Sample images saved")

# Save models
G.save("generator.keras")
D.save("discriminator.keras")
print("🎨 GAN training complete!")
```
5. Variational Autoencoder (VAE)
A VAE differs from a GAN: instead of adversarial training, a VAE learns to compress data into a latent distribution (encoding) and then reconstruct it (decoding). Loss = reconstruction loss + KL divergence (which keeps the latent distribution close to a Gaussian).
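The KL term has a closed form per latent dimension, KL(N(μ, σ²) ‖ N(0, 1)) = −0.5·(1 + log σ² − μ² − σ²), which is exactly the `kl_loss` expression in the Keras code below. A small pure-Python check of its behavior (the example values are ours):

```python
import math

def kl_closed_form(mu, log_var):
    # KL( N(mu, sigma^2) || N(0, 1) ) for one latent dimension
    return -0.5 * (1.0 + log_var - mu**2 - math.exp(log_var))

# KL is zero exactly when the latent matches the prior N(0, 1)
assert kl_closed_form(0.0, 0.0) == 0.0

# Shifting the mean away from 0 costs mu^2 / 2
assert abs(kl_closed_form(2.0, 0.0) - 2.0) < 1e-12

# Shrinking the variance (log_var < 0) also increases KL
assert kl_closed_form(0.0, -2.0) > 0.0

print(kl_closed_form(1.0, 0.0))  # 0.5
```

This is the regularizer that keeps the latent space smooth: the encoder is penalized for drifting away from the standard normal prior, which is what later makes sampling `z ~ N(0, I)` produce sensible images.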
```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# ===========================
# 1. Sampling layer (reparameterization trick)
# ===========================
class Sampling(layers.Layer):
    """z = mu + sigma * epsilon, where epsilon ~ N(0,1)

    This trick allows gradients to flow through the sampling step!
    """
    def call(self, inputs):
        z_mean, z_log_var = inputs
        epsilon = tf.random.normal(shape=tf.shape(z_mean))
        return z_mean + tf.exp(0.5 * z_log_var) * epsilon

# ===========================
# 2. Encoder: image → z_mean, z_log_var, z
# ===========================
LATENT_DIM = 16

encoder_input = keras.Input(shape=(28, 28, 1))
x = layers.Conv2D(32, 3, strides=2, padding='same', activation='relu')(encoder_input)
x = layers.Conv2D(64, 3, strides=2, padding='same', activation='relu')(x)
x = layers.Flatten()(x)
x = layers.Dense(256, activation='relu')(x)
z_mean = layers.Dense(LATENT_DIM, name='z_mean')(x)
z_log_var = layers.Dense(LATENT_DIM, name='z_log_var')(x)
z = Sampling()([z_mean, z_log_var])
encoder = keras.Model(encoder_input, [z_mean, z_log_var, z], name='encoder')

# ===========================
# 3. Decoder: z → image
# ===========================
decoder_input = keras.Input(shape=(LATENT_DIM,))
x = layers.Dense(7 * 7 * 64, activation='relu')(decoder_input)
x = layers.Reshape((7, 7, 64))(x)
x = layers.Conv2DTranspose(64, 3, strides=2, padding='same', activation='relu')(x)
x = layers.Conv2DTranspose(32, 3, strides=2, padding='same', activation='relu')(x)
decoder_output = layers.Conv2DTranspose(1, 3, padding='same', activation='sigmoid')(x)
decoder = keras.Model(decoder_input, decoder_output, name='decoder')

# ===========================
# 4. VAE model with custom train_step
# ===========================
class VAE(keras.Model):
    def __init__(self, encoder, decoder, **kwargs):
        super().__init__(**kwargs)
        self.encoder = encoder
        self.decoder = decoder

    def train_step(self, data):
        with tf.GradientTape() as tape:
            z_mean, z_log_var, z = self.encoder(data)
            reconstruction = self.decoder(z)

            # Reconstruction loss (how good is the reconstruction?)
            recon_loss = tf.reduce_mean(
                keras.losses.binary_crossentropy(data, reconstruction)
            ) * 28 * 28

            # KL divergence (how close is the latent to N(0,1)?)
            kl_loss = -0.5 * tf.reduce_mean(
                1 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var))

            total_loss = recon_loss + kl_loss

        grads = tape.gradient(total_loss, self.trainable_variables)
        self.optimizer.apply_gradients(zip(grads, self.trainable_variables))
        return {"loss": total_loss, "recon": recon_loss, "kl": kl_loss}

# Train (train_images_01 = MNIST images normalized to [0, 1])
vae = VAE(encoder, decoder)
vae.compile(optimizer=keras.optimizers.Adam(1e-3))
vae.fit(train_images_01, epochs=30, batch_size=128)

# Generate new images from random latent vectors!
z_sample = tf.random.normal([16, LATENT_DIM])
generated = decoder(z_sample)  # → 16 new MNIST-like images! 🎨
```
| Aspect | GAN | VAE |
|---|---|---|
| Training | Adversarial (2 models) | Reconstruction + KL (1 model) |
| Image Quality | Sharper, more realistic | Blurrier, smoother |
| Training Stability | Difficult (mode collapse) | Stable and consistent |
| Latent Space | Unstructured | Smooth, interpolatable |
| Likelihood | Intractable | Tractable lower bound (ELBO) |
| Best For | Realistic image generation | Reconstruction, interpolation |
6. Conditional GAN — Generate Specific Classes
```python
import tensorflow as tf
from tensorflow.keras import layers

# ===========================
# Conditional GAN: noise + label → specific class image
# ===========================

# Generator: (noise, label) → image
def make_cgan_generator(noise_dim=100, num_classes=10):
    # Noise input
    noise_in = layers.Input(shape=(noise_dim,))
    # Label input (one-hot or embedding)
    label_in = layers.Input(shape=(num_classes,))

    # Concatenate noise + label
    combined = layers.Concatenate()([noise_in, label_in])  # (110,)

    x = layers.Dense(7 * 7 * 128, activation='relu')(combined)
    x = layers.Reshape((7, 7, 128))(x)
    x = layers.Conv2DTranspose(64, 5, strides=2, padding='same',
                               activation='relu')(x)
    x = layers.BatchNormalization()(x)
    img = layers.Conv2DTranspose(1, 5, strides=2, padding='same',
                                 activation='tanh')(x)

    return tf.keras.Model([noise_in, label_in], img, name='cgan_gen')

# Generate a specific digit:
# noise = tf.random.normal([1, 100])
# label_7 = tf.one_hot([7], 10)          # "generate a 7"
# fake_7 = generator([noise, label_7])
# → Generates an image that looks like the digit 7! 🎯
```
7. Wasserstein GAN (WGAN) — Stable Training
```python
import tensorflow as tf

# ===========================
# WGAN: replace BCE with Wasserstein distance
# Critic (not "discriminator") outputs an unbounded score
# ===========================

# Critic loss: maximize E[C(real)] - E[C(fake)]
def critic_loss(real_output, fake_output):
    return tf.reduce_mean(fake_output) - tf.reduce_mean(real_output)

# Generator loss: minimize -E[C(fake)]
def generator_loss(fake_output):
    return -tf.reduce_mean(fake_output)

# WGAN-GP: gradient penalty (replaces weight clipping)
def gradient_penalty(critic, real, fake, batch_size):
    alpha = tf.random.uniform([batch_size, 1, 1, 1])
    interpolated = alpha * real + (1 - alpha) * fake
    with tf.GradientTape() as tape:
        tape.watch(interpolated)
        pred = critic(interpolated, training=True)
    grads = tape.gradient(pred, interpolated)
    norm = tf.sqrt(tf.reduce_sum(tf.square(grads), axis=[1, 2, 3]) + 1e-8)
    penalty = tf.reduce_mean((norm - 1.0) ** 2)
    return penalty

# Training: update the Critic 5× per Generator update
# for each batch:
#     for _ in range(5):  # train the critic more!
#         c_loss = critic_loss + 10 * gradient_penalty
#     g_loss = generator_loss

# WGAN advantages over a standard GAN:
# ✅ Meaningful loss curve (correlates with image quality!)
# ✅ Far less mode collapse
# ✅ More stable training
# ✅ No need to carefully balance G and D
```
8. Latent Space — Interpolation and Exploration
```python
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

# Assumes a trained `generator` (e.g., G from the DCGAN section above)

# ===========================
# 1. Linear interpolation between two points
# ===========================
z1 = tf.random.normal([1, 100])  # point A in latent space
z2 = tf.random.normal([1, 100])  # point B in latent space

# Generate images along the path from A to B
steps = 10
images = []
for alpha in np.linspace(0, 1, steps):
    z = z1 * (1 - alpha) + z2 * alpha  # linear interpolation (LERP)
    img = generator(z, training=False)
    images.append(img[0])

# Show: smooth transition from image A to image B!
fig, axes = plt.subplots(1, steps, figsize=(20, 2))
for i, ax in enumerate(axes):
    ax.imshow(images[i][:, :, 0], cmap='gray')
    ax.axis('off')
plt.suptitle('Latent Space Interpolation')
plt.show()

# ===========================
# 2. Spherical interpolation (SLERP — better for high dimensions)
# ===========================
def slerp(z1, z2, alpha):
    """Spherical linear interpolation — better than LERP for latent space"""
    z1_norm = z1 / (tf.norm(z1) + 1e-8)
    z2_norm = z2 / (tf.norm(z2) + 1e-8)
    omega = tf.acos(tf.clip_by_value(
        tf.reduce_sum(z1_norm * z2_norm), -1.0, 1.0))
    so = tf.sin(omega)
    return (tf.sin((1 - alpha) * omega) / so * z1
            + tf.sin(alpha * omega) / so * z2)

# ===========================
# 3. Latent space arithmetic (like Word2Vec for images!)
# ===========================
# With a face GAN:
# z_man_glasses = encoder(man_with_glasses)
# z_man = encoder(man_without_glasses)
# z_woman = encoder(woman_without_glasses)
# z_woman_glasses = z_woman + (z_man_glasses - z_man)
# result = decoder(z_woman_glasses)  # → woman with glasses! ✨

# ===========================
# 4. 2D latent grid visualization (for a VAE with latent_dim=2)
# ===========================
# grid_x = np.linspace(-3, 3, 20)
# grid_y = np.linspace(-3, 3, 20)
# for i, xi in enumerate(grid_x):
#     for j, yi in enumerate(grid_y):
#         z = np.array([[xi, yi]])
#         img = decoder.predict(z)  # → shows the entire manifold of digits!
```
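Why SLERP over LERP? Random high-dimensional Gaussian vectors concentrate near a shell of radius √dim, and a straight-line midpoint falls inside that shell, off the data manifold. A pure-NumPy version of the same SLERP (hypothetical helper `slerp_np`, runnable without a trained model) makes this checkable:

```python
import numpy as np

def slerp_np(z1, z2, alpha):
    # NumPy SLERP, mirroring the TensorFlow version above
    z1u = z1 / (np.linalg.norm(z1) + 1e-8)
    z2u = z2 / (np.linalg.norm(z2) + 1e-8)
    omega = np.arccos(np.clip(np.dot(z1u, z2u), -1.0, 1.0))
    so = np.sin(omega)
    if so < 1e-8:  # nearly parallel → fall back to LERP
        return (1 - alpha) * z1 + alpha * z2
    return (np.sin((1 - alpha) * omega) / so) * z1 \
         + (np.sin(alpha * omega) / so) * z2

rng = np.random.default_rng(0)
z1 = rng.standard_normal(100)
z2 = rng.standard_normal(100)

# Endpoints are reproduced (alpha=0 gives z1, alpha=1 gives z2)
assert np.allclose(slerp_np(z1, z2, 0.0), z1)
assert np.allclose(slerp_np(z1, z2, 1.0), z2)

# The SLERP midpoint keeps a larger norm than the LERP midpoint,
# staying closer to the shell where N(0, I) samples actually live
mid_slerp = np.linalg.norm(slerp_np(z1, z2, 0.5))
mid_lerp = np.linalg.norm(0.5 * z1 + 0.5 * z2)
assert mid_slerp > mid_lerp
```

In practice this means SLERP paths tend to produce cleaner intermediate images than LERP when the generator was trained on Gaussian noise.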
9. GAN Training Tips — Avoiding Mode Collapse
10 Tips for Stable GAN Training:
1. Normalize images to [-1, 1] and use tanh in the Generator output
2. Use LeakyReLU (0.2) in the Discriminator, not ReLU
3. No BatchNorm in D (or use SpectralNorm)
4. from_logits=True in the loss → numerically more stable
5. Label smoothing: real labels = 0.9, not 1.0
6. Adam beta_1=0.5 (not the default 0.9) → empirically better
7. LR = 2e-4 for both models → don't make them too different
8. Train D more often if G's loss collapses (WGAN: 5×)
9. Monitor both losses → if D_loss → 0, D is too strong
10. Use WGAN-GP if the standard GAN is unstable
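Tip 5 can be seen numerically: with a hard target of 1.0, the discriminator is rewarded for pushing logits toward infinity (overconfidence, which starves G of gradients), while a 0.9 target puts the optimum at a finite logit. A small pure-Python sketch with illustrative values:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def bce(target, logit):
    # Binary cross-entropy against a (possibly smoothed) target
    p = sigmoid(logit)
    return -(target * math.log(p) + (1 - target) * math.log(1 - p))

# With a smoothed label 0.9, loss is minimized at a FINITE logit:
# sigmoid(logit) = 0.9  →  logit = log(0.9 / 0.1) ≈ 2.197
best_logit = math.log(0.9 / 0.1)

# A huge logit (overconfident D) is now penalized...
assert bce(0.9, best_logit) < bce(0.9, 10.0)

# ...whereas a hard label 1.0 still rewards unbounded confidence
assert bce(1.0, 10.0) < bce(1.0, best_logit)
```

Keras supports this directly via the `label_smoothing` argument of `keras.losses.BinaryCrossentropy`, or by passing `0.9 * tf.ones_like(real_output)` as the real targets in the training step.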
10. Page 8 Summary
| Concept | What It Is | Key Code |
|---|---|---|
| Generator | Noise → fake image | Conv2DTranspose, tanh output |
| Discriminator | Image → real/fake score | Conv2D, LeakyReLU, no sigmoid |
| DCGAN | CNN-based GAN | ConvT(strides=2) + Conv(strides=2) |
| VAE | Encoder + Decoder + KL | Sampling(z_mean, z_log_var) |
| cGAN | GAN + class conditioning | Concatenate([noise, label]) |
| WGAN-GP | Wasserstein distance + gradient penalty | critic_loss + 10 * gp |
| Interpolation | Smooth morphing between images | z = z1*(1-α) + z2*α |
Page 7 — Custom Training & Advanced Keras
Coming Next: Page 9 — TF Serving & Deployment
A model in a notebook is useless until deployed! Page 9 covers: SavedModel format, TF Serving REST & gRPC API, TFLite for Android/iOS, TF.js for browser, Docker containerization, model versioning & A/B testing, and production model monitoring.