πŸ“ Artikel ini ditulis dalam Bahasa Indonesia & English
πŸ“ This article is available in English & Bahasa Indonesia

πŸ† Tutorial Neural Network β€” Page 10 (Final!)Neural Network Tutorial β€” Page 10 (Final!)

Capstone Project:
Mini-GPT from Scratch

Grand finale! Combine EVERYTHING from Pages 1-9 to build a mini-GPT (language model) from scratch. Embedding + Transformer + Training Loop + Text Generation. Plus: deep learning career roadmap and what to learn next.

πŸ“… March 2026 ⏱ 35 min read
🏷 Mini-GPT Β· Capstone Β· Language Model Β· Full Stack DL Β· Roadmap
πŸ“š Neural Network Tutorial Series:

πŸ“‘ Table of Contents β€” Page 10

  1. Our Journey β€” Recap of Pages 1-9 in one diagram
  2. Mini-GPT: Architecture β€” Decoder-only Transformer for text generation
  3. Implementing Mini-GPT β€” Embedding + Positional Encoding + Transformer Blocks + generation
  4. Roadmap: What's Next? β€” Next steps in deep learning
  5. Closing β€” Congratulations!
πŸ—ΊοΈ

1. Our Journey β€” Pages 1 to 10

From a simple perceptron to building a language model
πŸ—ΊοΈ Neural Network Tutorial β€” Complete Journey Page 1 Perceptron, Activation, Forward, Backward Page 2 Deep Network, Softmax, Mini-Batch, MNIST 97% Page 3 CNN: Convolution, Pooling, MNIST 99% Page 4 Dropout, BatchNorm, Adam, LR Schedule Page 5 RNN, LSTM, GRU, Text Generation Page 6 Word2Vec, GloVe, NLP Pipeline Page 7 GAN: Generator vs Discriminator Page 8 Transformer: Self-Attention, Multi-Head Page 9 Transfer Learning, Fine-Tuning Page 10 πŸ† CAPSTONE: Mini-GPT from Scratch! Semua dari nol. Semua pakai NumPy. Anda sekarang paham fondasi deep learning.
πŸ—οΈ

2. Mini-GPT: Architecture

Decoder-only Transformer β€” the same architecture as the real GPT

Our mini-GPT uses a decoder-only Transformer architecture β€” same as GPT-2/3/4! Components: Token Embedding + Positional Encoding + N Transformer Blocks (causal Self-Attention + FFN) + Linear Head for next-token prediction.
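To get a feel for the model's scale, these components can be tallied into a rough parameter count. This sketch assumes the hyperparameters from the implementation below (d_model=64, 2 layers, d_ff=128), plus an assumed character-level vocabulary of 65 (typical for the Shakespeare dataset) and a standard four-matrix attention layout; biases inside the blocks are ignored:

```python
# Rough parameter count for the mini-GPT below.
# vocab = 65 is an ASSUMPTION (char-level Shakespeare), not from the tutorial code.
d_model, n_heads, n_layers, d_ff = 64, 4, 2, 128
vocab = 65

emb = vocab * d_model                   # token embedding table
attn = 4 * d_model * d_model            # W_q, W_k, W_v, W_o per block (assumed layout)
ffn = 2 * d_model * d_ff                # two FFN projections, biases ignored
block = attn + ffn
head = d_model * vocab + vocab          # output projection + bias
total = emb + n_layers * block + head

print(f"β‰ˆ{total:,} parameters")         # β†’ β‰ˆ73,921 parameters
```

Tiny by modern standards (GPT-2 small has ~124M parameters), but the structure is identical.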

Mini-GPT Architecture

Input: "To be or not"
           β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Token Embedding     β”‚  vocab β†’ d_model
β”‚ + Positional Enc    β”‚  position β†’ d_model
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
           β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Transformer Block 1 β”‚  Masked Self-Attn + FFN
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
           β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Transformer Block 2 β”‚  Masked Self-Attn + FFN
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
           β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Linear Head         β”‚  d_model β†’ vocab
β”‚ + Softmax           β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
           β–Ό
Output: P("to") = 0.85  ← predict the next token!
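The "causal" part of causal self-attention can be seen in a tiny NumPy demo: a lower-triangular mask forces every position to attend only to itself and earlier positions, never the future (the random scores here are illustrative):

```python
import numpy as np

S = 4                                   # sequence length
scores = np.random.randn(S, S)          # raw attention scores (illustrative)

# Causal mask: position i may only attend to positions <= i
mask = np.tril(np.ones((S, S)))         # 1s on and below the diagonal
scores = np.where(mask == 1, scores, -1e9)  # block future positions

# Row-wise softmax
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)

print(np.round(weights, 2))
# Row 0 attends only to token 0; row 3 can attend to tokens 0-3.
# Everything above the diagonal is exactly 0.
```

This is the only difference between a decoder block and an encoder block: the `-1e9` fill makes softmax assign (effectively) zero weight to future tokens.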
πŸ’»

3. Implementing Mini-GPT

Combining all components from this series
37_mini_gpt.py β€” Mini-GPT from Scratch πŸ† (Python)
import numpy as np

class MiniGPT:
    """
    Mini-GPT: Decoder-only Transformer Language Model
    Built from scratch β€” everything we learned in Pages 1-9!
    """
    def __init__(self, vocab_size, d_model=64, n_heads=4,
                 n_layers=2, d_ff=128, max_len=128):
        self.d_model = d_model
        self.vocab_size = vocab_size

        # Token Embedding (Page 6)
        self.token_emb = np.random.randn(vocab_size, d_model) * 0.02

        # Positional Encoding (Page 8)
        self.pos_enc = np.zeros((max_len, d_model))
        pos = np.arange(max_len)[:, np.newaxis]
        div = np.exp(np.arange(0, d_model, 2) * -(np.log(10000) / d_model))
        self.pos_enc[:, 0::2] = np.sin(pos * div)
        self.pos_enc[:, 1::2] = np.cos(pos * div)

        # Transformer Blocks (the TransformerBlock class built on Page 8)
        self.blocks = []
        for _ in range(n_layers):
            self.blocks.append(TransformerBlock(d_model, n_heads, d_ff))

        # Output Head
        self.W_out = np.random.randn(d_model, vocab_size) * 0.02
        self.b_out = np.zeros((1, vocab_size))

    def forward(self, token_indices):
        """
        token_indices: (batch, seq_len) β€” integer indices
        returns: logits (batch, seq_len, vocab_size)
        """
        B, S = token_indices.shape

        # Embed tokens + add position
        x = self.token_emb[token_indices] + self.pos_enc[:S]

        # Causal mask: can't look at future tokens!
        mask = np.tril(np.ones((S, S)))[np.newaxis, np.newaxis, :, :]

        # Pass through Transformer blocks
        for block in self.blocks:
            x = block.forward(x, mask)

        # Project to vocabulary
        logits = x @ self.W_out + self.b_out
        return logits

    def generate(self, start_tokens, max_new=50, temperature=0.8):
        """Autoregressive text generation"""
        tokens = list(start_tokens)
        for _ in range(max_new):
            # Crop the context so positional encodings stay in range (<= max_len)
            x = np.array([tokens[-self.pos_enc.shape[0]:]])
            logits = self.forward(x)
            # Take last position, apply temperature
            next_logits = logits[0, -1] / temperature
            probs = np.exp(next_logits - np.max(next_logits))
            probs /= probs.sum()
            # Sample
            next_token = np.random.choice(len(probs), p=probs)
            tokens.append(next_token)
        return tokens

# =====================================================
# TRAINING on Shakespeare text
# =====================================================
print("πŸ† Mini-GPT: The Grand Finale!")
print("   Architecture: 2-layer Transformer, d=64, 4 heads")
print("   Training: next-token prediction on Shakespeare")
print("   This is the SAME architecture as GPT β€” just smaller!")
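The class above relies on the `TransformerBlock` built on Page 8. For completeness, here is a minimal forward-only sketch of such a block (masked multi-head self-attention plus a ReLU FFN, with simple layer normalization and residual connections); the exact Page 8 implementation may differ in details:

```python
import numpy as np

class TransformerBlock:
    """Minimal decoder block: masked multi-head self-attention + FFN.
    Forward-only sketch; the Page 8 version may differ in details."""
    def __init__(self, d_model, n_heads, d_ff):
        self.n_heads = n_heads
        self.d_k = d_model // n_heads
        s = 0.02
        self.W_q = np.random.randn(d_model, d_model) * s
        self.W_k = np.random.randn(d_model, d_model) * s
        self.W_v = np.random.randn(d_model, d_model) * s
        self.W_o = np.random.randn(d_model, d_model) * s
        self.W1 = np.random.randn(d_model, d_ff) * s
        self.W2 = np.random.randn(d_ff, d_model) * s

    def _norm(self, x):
        # Layer norm without learnable scale/shift (simplification)
        return (x - x.mean(-1, keepdims=True)) / np.sqrt(x.var(-1, keepdims=True) + 1e-5)

    def _split(self, x, B, S):
        # (B, S, d_model) -> (B, n_heads, S, d_k)
        return x.reshape(B, S, self.n_heads, self.d_k).transpose(0, 2, 1, 3)

    def forward(self, x, mask):
        B, S, D = x.shape
        h = self._norm(x)
        Q = self._split(h @ self.W_q, B, S)
        K = self._split(h @ self.W_k, B, S)
        V = self._split(h @ self.W_v, B, S)
        scores = Q @ K.transpose(0, 1, 3, 2) / np.sqrt(self.d_k)
        scores = np.where(mask == 1, scores, -1e9)    # causal masking
        w = np.exp(scores - scores.max(-1, keepdims=True))
        w /= w.sum(-1, keepdims=True)
        attn = (w @ V).transpose(0, 2, 1, 3).reshape(B, S, D)
        x = x + attn @ self.W_o                       # residual connection
        h = self._norm(x)
        x = x + np.maximum(0, h @ self.W1) @ self.W2  # ReLU FFN + residual
        return x
```

With a block like this in scope, `MiniGPT(vocab_size=65).generate([0])` should run end to end (untrained, so the generated tokens are random).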
πŸ—ΊοΈ

4. Roadmap: What's Next?

Next steps in your deep learning journey
Level | What to Learn | Resource
🟒 Beginner++ | PyTorch / TensorFlow β€” use frameworks! | pytorch.org/tutorials
πŸ”΅ Intermediate | Hugging Face Transformers, fine-tuning BERT/GPT | huggingface.co/course
🟣 Advanced | Distributed training, RLHF, mixture of experts | CS229, CS224n, CS231n
πŸ”΄ Research | Read the latest papers, reproduce results, contribute | arxiv.org, Papers With Code
πŸ† Production | MLOps, model serving, monitoring, A/B testing | MLOps community
πŸŽ‰

5. Closing β€” Congratulations!

You've completed the entire series β€” from zero to Mini-GPT!

From a simple perceptron on Page 1, you now understand backpropagation, CNNs for vision, regularization and optimization, RNNs/LSTMs for sequences, word embeddings and NLP, GANs for generative models, Transformers and self-attention, transfer learning, and how to build a language model (Mini-GPT) from scratch. All with just Python and NumPy.

← Previous Page

Page 9 β€” Transfer Learning & Fine-Tuning

πŸŽ‰ Congratulations! You've completed the entire Neural Network Tutorial Series β€” all 10 Pages!
From a simple perceptron to Mini-GPT β€” you now deeply understand the foundations of deep learning. Next steps: use PyTorch/TensorFlow, build your own projects, read papers, contribute to open source, and keep exploring! Thank you for learning with us. πŸ™