🔥 PyTorch Learning Series, Part 7

Generative AI: GANs & Autoencoders

From classifying to creating! Part 7 covers models that can generate brand-new images from noise: the Autoencoder for compression & reconstruction, the Variational Autoencoder (VAE) for controlled generation, and DCGAN, the architecture that showed a convolutional GAN could turn random noise into convincing images.

📅 March 2026 ⏱ 28 min read 🏷 GAN • DCGAN • Autoencoder • VAE • Generative

📑 Table of Contents — Part 7

  1. Discriminative vs Generative — Classify vs create
  2. Autoencoder — Compression & reconstruction
  3. VAE — Variational Autoencoder for generation
  4. GAN: The Concept — Generator vs Discriminator
  5. Code: DCGAN — Generating MNIST digits from noise
  6. Training Tips — GANs are notoriously hard to train
  7. The Evolution of Generative Models
  8. Summary & Part 8 Preview
🎨

1. Discriminative vs Generative

Classifying a cat photo vs CREATING a brand-new cat photo

📋 Discriminative (Parts 1-5)

Image in → label out. "Is this a cat or a dog?" The model JUDGES. It learns P(y|x). Examples: CNN, LSTM, BERT.

🎨 Generative (Part 7)

Noise in → a NEW image out. "Create a cat photo that has never existed." The model CREATES. It learns P(x). Examples: GAN, VAE, Diffusion.
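The contrast fits in a few lines of PyTorch. This is a toy sketch with untrained single-layer networks (all names and sizes here are illustrative, not from the series code):

```python
import torch
import torch.nn as nn

# Discriminative: image → P(y|x), a probability over labels (cat vs dog)
clf = nn.Sequential(nn.Flatten(), nn.Linear(784, 2), nn.Softmax(dim=1))
x = torch.rand(1, 1, 28, 28)        # an input "photo"
probs = clf(x)                      # e.g. [P(cat), P(dog)], sums to 1

# Generative: noise → x, a brand-new image that never existed
gen = nn.Sequential(nn.Linear(100, 784), nn.Sigmoid())
z = torch.randn(1, 100)             # random noise
fake = gen(z).view(1, 1, 28, 28)    # a new 28×28 "image"
```

The discriminative model needs an image to say anything; the generative model needs only noise to produce one.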

🔄

2. Autoencoder — Compression & Reconstruction

Compress an image into a small vector, then reconstruct it


🔄 Autoencoder — Encoder (Compress) → Latent Space → Decoder (Reconstruct)

[Diagram] Input 28×28 = 784 → 🔒 ENCODER 784 → 256 → 64 → 16 (compress: discard noise, keep the essence) → Latent z = 16 dims, the 🧬 "DNA" of the image → 🔓 DECODER 16 → 64 → 256 → 784 (reconstruct the image) → Output 28×28 ≈ input. Loss = MSE(input, output), so the model learns to reconstruct images through the bottleneck.
25_autoencoder.py
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(784, 256), nn.ReLU(),
            nn.Linear(256, 64), nn.ReLU(),
            nn.Linear(64, 16),              # → latent space (16 dims)
        )
        self.decoder = nn.Sequential(
            nn.Linear(16, 64), nn.ReLU(),
            nn.Linear(64, 256), nn.ReLU(),
            nn.Linear(256, 784), nn.Sigmoid(),  # → pixels in [0, 1]
        )

    def forward(self, x):
        z = self.encoder(x)        # compress
        return self.decoder(z)     # reconstruct

# Loss: how close the output is to the input
loss_fn = nn.MSELoss()
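The VAE (section 3 in the table of contents) extends this autoencoder: instead of a single latent vector, the encoder predicts a mean μ and a log-variance, and z is sampled via the reparameterization trick so gradients can still flow through the sampling step. A minimal sketch, with layer sizes following the autoencoder above (the unweighted loss sum is an illustrative simplification):

```python
import torch
import torch.nn as nn

class VAE(nn.Module):
    """Minimal VAE sketch: the encoder outputs mu and log-variance
    instead of one latent vector; z is sampled with reparameterization."""
    def __init__(self, latent_dim=16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(784, 256), nn.ReLU())
        self.fc_mu = nn.Linear(256, latent_dim)      # mean of q(z|x)
        self.fc_logvar = nn.Linear(256, latent_dim)  # log-variance of q(z|x)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, 784), nn.Sigmoid(),
        )

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        # Reparameterization trick: z = mu + sigma * eps stays differentiable
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return self.decoder(z), mu, logvar

def vae_loss(recon, x, mu, logvar):
    # Reconstruction term + KL divergence pulling q(z|x) toward N(0, I)
    recon_loss = nn.functional.binary_cross_entropy(recon, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon_loss + kl
```

The KL term is what makes the latent space smooth enough to sample from: feed the decoder `torch.randn(1, 16)` and it produces a new image, which a plain autoencoder cannot reliably do.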
⚔️

4. GAN — Generator vs Discriminator

Counterfeiter vs detective: two networks locked in a "fight"

⚔️ GAN — Adversarial Training: Generator vs Discriminator

[Diagram] z ~ N(0,1) random noise → 🎨 GENERATOR ("the counterfeiter": noise → fake image; goal: fool the Discriminator!) → fake and real digits → 🔍 DISCRIMINATOR ("the counterfeit detective": image → real or fake?; goal: tell real from fake!) → P(real) ∈ [0, 1], e.g. real: 0.92, fake: 0.15. Over the course of training the Generator gets better at faking and the Discriminator gets better at detecting → both improve!

💡 Analogy: Counterfeiter vs Detective

The Generator is a counterfeiter learning to make fake bills look ever more like the real thing. The Discriminator is a detective learning to tell real bills from fakes. The two keep "fighting" each other, and over time both get better. Eventually the Generator produces images that cannot be told apart from real ones, even by the Discriminator.
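Formally, this fight is the minimax game from Goodfellow et al. 2014: the Discriminator maximizes the value function while the Generator minimizes it,

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\![\log D(x)] +
  \mathbb{E}_{z \sim \mathcal{N}(0, I)}\![\log(1 - D(G(z)))]
```

The first term rewards D for scoring real images high; the second rewards D for scoring fakes low, which is exactly what G tries to prevent.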

💻

5. Code: DCGAN — Generating MNIST

Deep Convolutional GAN: generating digits from random noise
26_dcgan.py — Generator & Discriminator
import torch
import torch.nn as nn

# ===== GENERATOR: noise → image =====
class Generator(nn.Module):
    def __init__(self, latent_dim=100):
        super().__init__()
        self.net = nn.Sequential(
            # z: [batch, 100] → reshaped to [batch, 100, 1, 1]
            nn.ConvTranspose2d(100, 256, 7, 1, 0),  # → 7×7
            nn.BatchNorm2d(256), nn.ReLU(),
            nn.ConvTranspose2d(256, 128, 4, 2, 1),  # → 14×14
            nn.BatchNorm2d(128), nn.ReLU(),
            nn.ConvTranspose2d(128, 1, 4, 2, 1),    # → 28×28
            nn.Tanh()  # output pixels in [-1, 1]
        )

    def forward(self, z):
        return self.net(z.view(-1, 100, 1, 1))

# ===== DISCRIMINATOR: image → real/fake =====
class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 64, 4, 2, 1),    # 28 → 14
            nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, 2, 1),  # 14 → 7
            nn.BatchNorm2d(128), nn.LeakyReLU(0.2),
            nn.Flatten(),
            nn.Linear(128 * 7 * 7, 1),
            nn.Sigmoid()  # → P(real) ∈ [0, 1]
        )

    def forward(self, x):
        return self.net(x)

# ===== TRAINING LOOP =====
# `dataloader` yields MNIST batches normalized to [-1, 1] to match Tanh
G = Generator()
D = Discriminator()
loss_fn = nn.BCELoss()
opt_G = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))

for epoch in range(50):
    for real_imgs, _ in dataloader:
        batch = real_imgs.size(0)
        real_labels = torch.ones(batch, 1)
        fake_labels = torch.zeros(batch, 1)

        # --- Train Discriminator ---
        z = torch.randn(batch, 100)
        fake_imgs = G(z).detach()  # detach: don't update G on D's step
        loss_D = loss_fn(D(real_imgs), real_labels) + \
                 loss_fn(D(fake_imgs), fake_labels)
        opt_D.zero_grad(); loss_D.backward(); opt_D.step()

        # --- Train Generator ---
        z = torch.randn(batch, 100)
        fake_imgs = G(z)
        loss_G = loss_fn(D(fake_imgs), real_labels)  # trick D!
        opt_G.zero_grad(); loss_G.backward(); opt_G.step()

# Epoch 1:  noisy blobs
# Epoch 10: blurry shapes resembling digits
# Epoch 30: clear, readable handwritten digits! ✅
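The spatial sizes in the Generator comments (1 → 7 → 14 → 28) follow the ConvTranspose2d formula out = (in − 1)·stride − 2·padding + kernel (ignoring dilation and output_padding). A quick sanity check:

```python
import torch
import torch.nn as nn

def deconv_out(size, kernel, stride, padding):
    # ConvTranspose2d output size (no output_padding, no dilation)
    return (size - 1) * stride - 2 * padding + kernel

# The three layers of the Generator above: 1 → 7 → 14 → 28
print(deconv_out(1, 7, 1, 0))   # 7
print(deconv_out(7, 4, 2, 1))   # 14
print(deconv_out(14, 4, 2, 1))  # 28

# Confirm with a real forward pass through the same (unnormalized) layers
net = nn.Sequential(
    nn.ConvTranspose2d(100, 256, 7, 1, 0),
    nn.ConvTranspose2d(256, 128, 4, 2, 1),
    nn.ConvTranspose2d(128, 1, 4, 2, 1),
)
print(net(torch.randn(1, 100, 1, 1)).shape)  # torch.Size([1, 1, 28, 28])
```

This arithmetic is why the first layer uses kernel 7 with no padding: it is the only step that can jump from 1×1 straight to 7×7.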
⚠️

6. GAN Training Tips

GANs are notoriously hard and unstable to train. Here is how to deal with the common failure modes.

Problem | Symptom | Solution
Mode collapse | Generator only produces 1-2 kinds of images | Label smoothing, minibatch discrimination
Training oscillation | Loss keeps bouncing up and down, never converges | Two-Timescale Update Rule (TTUR)
Vanishing gradients | G stops learning | Wasserstein loss (WGAN)
D too strong | D always wins, G cannot improve | Train D less often, lower learning rate
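One-sided label smoothing is the cheapest of these fixes: train the Discriminator against 0.9 instead of 1.0 for real images so it never becomes fully confident. A minimal sketch to drop into a GAN training loop (the 0.9 target and 5% flip rate are common illustrative defaults, not values from this series):

```python
import torch

# One-sided label smoothing: real targets become 0.9 instead of 1.0,
# keeping the Discriminator less overconfident (a mode-collapse mitigation).
batch = 64
real_labels = torch.full((batch, 1), 0.9)  # instead of torch.ones(batch, 1)
fake_labels = torch.zeros(batch, 1)

# Noisy labels are another cheap variant: flip a few real labels per batch
flip = torch.rand(batch, 1) < 0.05         # ~5% of labels flipped
noisy_real = torch.where(flip, fake_labels, real_labels)
```

Note the smoothing is one-sided on purpose: fake labels stay at 0, since smoothing them as well would reward the Generator for producing samples D already rejects.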
📈

7. The Evolution of Generative Models

From GANs → Diffusion → Foundation Models

Year | Model | Breakthrough | Quality
2014 | GAN (Goodfellow) | Adversarial training concept | Blurry
2016 | DCGAN | CNN-based GAN, stable training | Decent
2018 | StyleGAN (NVIDIA) | Photorealistic faces | Excellent
2020 | DDPM | Diffusion models | Excellent
2022 | Stable Diffusion | Text-to-image, open source | Amazing
2024 | FLUX, SD3 | Consistency, quality, speed | Near-perfect
2025-26 | Video gen (Sora, Kling) | Text-to-video generation | Revolutionary
📝

8. Part 7 Summary

Generative AI fundamentals

Concept | What It Is | Key Code
Autoencoder | Compress → latent → reconstruct | Encoder + Decoder + MSELoss
VAE | AE + sampling from a distribution | μ, σ → reparameterization trick
Generator | Noise → fake image | ConvTranspose2d (upsampling)
Discriminator | Image → real/fake? | Conv2d → Sigmoid
BCELoss | Binary cross-entropy | nn.BCELoss()
Adversarial | G and D "fighting" each other | Alternating optimization
🔥
Tech Review Desk — PyTorch Learning Series
Sources: pytorch.org; papers: Goodfellow et al. 2014 (GAN), Radford et al. 2016 (DCGAN), Kingma & Welling 2014 (VAE).
📧 rominur@gmail.com  •  ✈️ t.me/Jekardah_AI