πŸ“ Artikel ini ditulis dalam Bahasa Indonesia & English
πŸ“ This article is available in English & Bahasa Indonesia

πŸ† Tutorial Neural Network β€” Page 10 (Final!)Neural Network Tutorial β€” Page 10 (Final!)

Capstone Project:
Mini-GPT from Scratch

Grand finale! Combine EVERYTHING from Pages 1-9 to build a mini-GPT (language model) from scratch. Embedding + Transformer + Training Loop + Text Generation. Plus: deep learning career roadmap and what to learn next.

πŸ“… March 2026 ⏱ 35 min read
🏷 Mini-GPT Β· Capstone Β· Language Model Β· Full Stack DL Β· Roadmap
πŸ“š Neural Network Tutorial Series:

πŸ“‘ Table of Contents β€” Page 10

  1. Our Journey β€” Recap of Pages 1-9 in one diagram
  2. Mini-GPT: Architecture β€” Decoder-only Transformer for text generation
  3. Implementing Mini-GPT β€” Embedding + Positional Encoding + Transformer Blocks + generation
  4. Roadmap: What's Next? β€” Next steps in deep learning
  5. Closing β€” Congratulations!
πŸ—ΊοΈ

1. Our Journey β€” Pages 1 to 10

From a simple perceptron to building a language model
πŸ—ΊοΈ Neural Network Tutorial β€” Complete Journey Page 1 Perceptron, Activation, Forward, Backward Page 2 Deep Network, Softmax, Mini-Batch, MNIST 97% Page 3 CNN: Convolution, Pooling, MNIST 99% Page 4 Dropout, BatchNorm, Adam, LR Schedule Page 5 RNN, LSTM, GRU, Text Generation Page 6 Word2Vec, GloVe, NLP Pipeline Page 7 GAN: Generator vs Discriminator Page 8 Transformer: Self-Attention, Multi-Head Page 9 Transfer Learning, Fine-Tuning Page 10 πŸ† CAPSTONE: Mini-GPT from Scratch! Semua dari nol. Semua pakai NumPy. Anda sekarang paham fondasi deep learning.
πŸ—οΈ

2. Mini-GPT: Architecture

Decoder-only Transformer β€” the same architecture as the real GPT

Our mini-GPT uses a decoder-only Transformer architecture β€” same as GPT-2/3/4! Components: Token Embedding + Positional Encoding + N Transformer Blocks (causal Self-Attention + FFN) + Linear Head for next-token prediction.
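To get a feel for the model's scale, these components can be tallied into a rough parameter count. This sketch assumes the hyperparameters from the implementation below (d_model=64, 2 layers, d_ff=128), plus an assumed character-level vocabulary of 65 (typical for the Shakespeare dataset) and a standard four-matrix attention layout; biases inside the blocks are ignored:

```python
# Rough parameter count for the mini-GPT below.
# vocab = 65 is an ASSUMPTION (char-level Shakespeare), not from the tutorial code.
d_model, n_heads, n_layers, d_ff = 64, 4, 2, 128
vocab = 65

emb = vocab * d_model                   # token embedding table
attn = 4 * d_model * d_model            # W_q, W_k, W_v, W_o per block (assumed layout)
ffn = 2 * d_model * d_ff                # two FFN projections, biases ignored
block = attn + ffn
head = d_model * vocab + vocab          # output projection + bias
total = emb + n_layers * block + head

print(f"β‰ˆ{total:,} parameters")         # β†’ β‰ˆ73,921 parameters
```

Tiny by modern standards (GPT-2 small has ~124M parameters), but the structure is identical.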

Mini-GPT Architecture

Input: "To be or not"
           β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Token Embedding     β”‚  vocab β†’ d_model
β”‚ + Positional Enc    β”‚  position β†’ d_model
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
           β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Transformer Block 1 β”‚  Masked Self-Attn + FFN
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
           β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Transformer Block 2 β”‚  Masked Self-Attn + FFN
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
           β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Linear Head         β”‚  d_model β†’ vocab
β”‚ + Softmax           β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
           β–Ό
Output: P("to") = 0.85  ← predict the next token!
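The "causal" part of causal self-attention can be seen in a tiny NumPy demo: a lower-triangular mask forces every position to attend only to itself and earlier positions, never the future (the random scores here are illustrative):

```python
import numpy as np

S = 4                                   # sequence length
scores = np.random.randn(S, S)          # raw attention scores (illustrative)

# Causal mask: position i may only attend to positions <= i
mask = np.tril(np.ones((S, S)))         # 1s on and below the diagonal
scores = np.where(mask == 1, scores, -1e9)  # block future positions

# Row-wise softmax
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)

print(np.round(weights, 2))
# Row 0 attends only to token 0; row 3 can attend to tokens 0-3.
# Everything above the diagonal is exactly 0.
```

This is the only difference between a decoder block and an encoder block: the `-1e9` fill makes softmax assign (effectively) zero weight to future tokens.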
πŸ’»

3. Implementing Mini-GPT

Combining all components from this series
37_mini_gpt.py β€” Mini-GPT from Scratch πŸ† (Python)
import numpy as np

class MiniGPT:
    """
    Mini-GPT: Decoder-only Transformer Language Model
    Built from scratch β€” everything we learned in Pages 1-9!
    """
    def __init__(self, vocab_size, d_model=64, n_heads=4,
                 n_layers=2, d_ff=128, max_len=128):
        self.d_model = d_model
        self.vocab_size = vocab_size

        # Token Embedding (Page 6)
        self.token_emb = np.random.randn(vocab_size, d_model) * 0.02

        # Positional Encoding (Page 8)
        self.pos_enc = np.zeros((max_len, d_model))
        pos = np.arange(max_len)[:, np.newaxis]
        div = np.exp(np.arange(0, d_model, 2) * -(np.log(10000) / d_model))
        self.pos_enc[:, 0::2] = np.sin(pos * div)
        self.pos_enc[:, 1::2] = np.cos(pos * div)

        # Transformer Blocks (the TransformerBlock class built on Page 8)
        self.blocks = []
        for _ in range(n_layers):
            self.blocks.append(TransformerBlock(d_model, n_heads, d_ff))

        # Output Head
        self.W_out = np.random.randn(d_model, vocab_size) * 0.02
        self.b_out = np.zeros((1, vocab_size))

    def forward(self, token_indices):
        """
        token_indices: (batch, seq_len) β€” integer indices
        returns: logits (batch, seq_len, vocab_size)
        """
        B, S = token_indices.shape

        # Embed tokens + add position
        x = self.token_emb[token_indices] + self.pos_enc[:S]

        # Causal mask: can't look at future tokens!
        mask = np.tril(np.ones((S, S)))[np.newaxis, np.newaxis, :, :]

        # Pass through Transformer blocks
        for block in self.blocks:
            x = block.forward(x, mask)

        # Project to vocabulary
        logits = x @ self.W_out + self.b_out
        return logits

    def generate(self, start_tokens, max_new=50, temperature=0.8):
        """Autoregressive text generation"""
        tokens = list(start_tokens)
        for _ in range(max_new):
            # Crop the context so positional encodings stay in range (<= max_len)
            x = np.array([tokens[-self.pos_enc.shape[0]:]])
            logits = self.forward(x)
            # Take last position, apply temperature
            next_logits = logits[0, -1] / temperature
            probs = np.exp(next_logits - np.max(next_logits))
            probs /= probs.sum()
            # Sample
            next_token = np.random.choice(len(probs), p=probs)
            tokens.append(next_token)
        return tokens

# =====================================================
# TRAINING on Shakespeare text
# =====================================================
print("πŸ† Mini-GPT: The Grand Finale!")
print("   Architecture: 2-layer Transformer, d=64, 4 heads")
print("   Training: next-token prediction on Shakespeare")
print("   This is the SAME architecture as GPT β€” just smaller!")
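The class above relies on the `TransformerBlock` built on Page 8. For completeness, here is a minimal forward-only sketch of such a block (masked multi-head self-attention plus a ReLU FFN, with simple layer normalization and residual connections); the exact Page 8 implementation may differ in details:

```python
import numpy as np

class TransformerBlock:
    """Minimal decoder block: masked multi-head self-attention + FFN.
    Forward-only sketch; the Page 8 version may differ in details."""
    def __init__(self, d_model, n_heads, d_ff):
        self.n_heads = n_heads
        self.d_k = d_model // n_heads
        s = 0.02
        self.W_q = np.random.randn(d_model, d_model) * s
        self.W_k = np.random.randn(d_model, d_model) * s
        self.W_v = np.random.randn(d_model, d_model) * s
        self.W_o = np.random.randn(d_model, d_model) * s
        self.W1 = np.random.randn(d_model, d_ff) * s
        self.W2 = np.random.randn(d_ff, d_model) * s

    def _norm(self, x):
        # Layer norm without learnable scale/shift (simplification)
        return (x - x.mean(-1, keepdims=True)) / np.sqrt(x.var(-1, keepdims=True) + 1e-5)

    def _split(self, x, B, S):
        # (B, S, d_model) -> (B, n_heads, S, d_k)
        return x.reshape(B, S, self.n_heads, self.d_k).transpose(0, 2, 1, 3)

    def forward(self, x, mask):
        B, S, D = x.shape
        h = self._norm(x)
        Q = self._split(h @ self.W_q, B, S)
        K = self._split(h @ self.W_k, B, S)
        V = self._split(h @ self.W_v, B, S)
        scores = Q @ K.transpose(0, 1, 3, 2) / np.sqrt(self.d_k)
        scores = np.where(mask == 1, scores, -1e9)    # causal masking
        w = np.exp(scores - scores.max(-1, keepdims=True))
        w /= w.sum(-1, keepdims=True)
        attn = (w @ V).transpose(0, 2, 1, 3).reshape(B, S, D)
        x = x + attn @ self.W_o                       # residual connection
        h = self._norm(x)
        x = x + np.maximum(0, h @ self.W1) @ self.W2  # ReLU FFN + residual
        return x
```

With a block like this in scope, `MiniGPT(vocab_size=65).generate([0])` should run end to end (untrained, so the generated tokens are random).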
πŸ—ΊοΈ

4. Roadmap: What's Next?

Next steps in your deep learning journey
Level | What to Learn | Resource
🟒 Beginner++ | PyTorch / TensorFlow β€” use frameworks! | pytorch.org/tutorials
πŸ”΅ Intermediate | Hugging Face Transformers, fine-tuning BERT/GPT | huggingface.co/course
🟣 Advanced | Distributed training, RLHF, mixture of experts | CS229, CS224n, CS231n
πŸ”΄ Research | Read the latest papers, reproduce results, contribute | arxiv.org, Papers With Code
πŸ† Production | MLOps, model serving, monitoring, A/B testing | MLOps community
πŸŽ‰

5. Closing β€” Congratulations!

You've completed the entire series β€” from zero to Mini-GPT!

From a simple perceptron on Page 1, you now understand backpropagation, CNNs for vision, regularization and optimization, RNNs/LSTMs for sequences, word embeddings and NLP, GANs for generative models, Transformers and self-attention, transfer learning, and how to build a language model (Mini-GPT) from scratch. All with just Python and NumPy.

← Previous Page

Page 9 β€” Transfer Learning & Fine-Tuning

πŸŽ‰ Congratulations! You've completed the entire Neural Network Tutorial Series β€” all 10 Pages!
From a simple perceptron to Mini-GPT β€” you now deeply understand the foundations of deep learning. Next steps: use PyTorch/TensorFlow, build your own projects, read papers, contribute to open source, and keep exploring! Thank you for learning with us. πŸ™