Table of Contents – Page 10
- Our Journey – Recap of Pages 1-9 in one diagram
- Mini-GPT: Architecture – Decoder-only Transformer for text generation
- Implementing Mini-GPT – Embedding + Positional Encoding + Transformer Blocks
- Training Mini-GPT – Next-token prediction on Shakespeare text
- Generate Text! – The model writes on its own
- Evaluation & Perplexity – Measuring language model quality
- Roadmap: What's Next? – Next steps in deep learning
- Closing – Congratulations!
1. Our Journey – Pages 1 to 10
2. Mini-GPT: Architecture
Our Mini-GPT uses a decoder-only Transformer architecture – the same as GPT-2/3/4! Components: Token Embedding + Positional Encoding + N Transformer Blocks (causal Self-Attention + FFN) + a Linear Head for next-token prediction.
3. Implementing Mini-GPT
```python
import numpy as np

class MiniGPT:
    """
    Mini-GPT: Decoder-only Transformer Language Model
    Built from scratch – everything we learned in Pages 1-9!
    """
    def __init__(self, vocab_size, d_model=64, n_heads=4,
                 n_layers=2, d_ff=128, max_len=128):
        self.d_model = d_model
        self.vocab_size = vocab_size
        self.max_len = max_len

        # Token Embedding (Page 6)
        self.token_emb = np.random.randn(vocab_size, d_model) * 0.02

        # Sinusoidal Positional Encoding (Page 8)
        self.pos_enc = np.zeros((max_len, d_model))
        pos = np.arange(max_len)[:, np.newaxis]
        div = np.exp(np.arange(0, d_model, 2) * -(np.log(10000) / d_model))
        self.pos_enc[:, 0::2] = np.sin(pos * div)
        self.pos_enc[:, 1::2] = np.cos(pos * div)

        # Transformer Blocks (TransformerBlock as built on Page 8)
        self.blocks = [TransformerBlock(d_model, n_heads, d_ff)
                       for _ in range(n_layers)]

        # Output Head: project back to the vocabulary
        self.W_out = np.random.randn(d_model, vocab_size) * 0.02
        self.b_out = np.zeros((1, vocab_size))

    def forward(self, token_indices):
        """
        token_indices: (batch, seq_len) – integer indices
        returns: logits (batch, seq_len, vocab_size)
        """
        B, S = token_indices.shape

        # Embed tokens + add positions
        x = self.token_emb[token_indices] + self.pos_enc[:S]

        # Causal mask: can't look at future tokens!
        mask = np.tril(np.ones((S, S)))[np.newaxis, np.newaxis, :, :]

        # Pass through the Transformer blocks
        for block in self.blocks:
            x = block.forward(x, mask)

        # Project to the vocabulary
        logits = x @ self.W_out + self.b_out
        return logits

    def generate(self, start_tokens, max_new=50, temperature=0.8):
        """Autoregressive text generation."""
        tokens = list(start_tokens)
        for _ in range(max_new):
            # Keep only the last max_len tokens: the positional
            # encoding is only defined up to max_len positions
            x = np.array([tokens[-self.max_len:]])
            logits = self.forward(x)

            # Take the last position, apply temperature
            next_logits = logits[0, -1] / temperature
            probs = np.exp(next_logits - np.max(next_logits))
            probs /= probs.sum()

            # Sample the next token
            next_token = np.random.choice(len(probs), p=probs)
            tokens.append(next_token)
        return tokens


# =====================================================
# TRAINING on Shakespeare text
# =====================================================
print("Mini-GPT: The Grand Finale!")
print("  Architecture: 2-layer Transformer, d=64, 4 heads")
print("  Training: next-token prediction on Shakespeare")
print("  This is the SAME architecture as GPT – just smaller!")
```
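The model above relies on the `TransformerBlock` class built on Page 8, which is not reproduced on this page. For reference, a minimal NumPy sketch consistent with the recap above (multi-head causal self-attention followed by a feed-forward network, each with a residual connection; the post-LayerNorm placement and the ReLU activation are assumptions about Page 8's version) could look like:

```python
import numpy as np

class TransformerBlock:
    """One decoder block: causal multi-head self-attention + FFN,
    each wrapped in a residual connection followed by LayerNorm."""
    def __init__(self, d_model, n_heads, d_ff):
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        s = 0.02
        # Attention projections
        self.W_q = np.random.randn(d_model, d_model) * s
        self.W_k = np.random.randn(d_model, d_model) * s
        self.W_v = np.random.randn(d_model, d_model) * s
        self.W_o = np.random.randn(d_model, d_model) * s
        # Feed-forward network
        self.W1 = np.random.randn(d_model, d_ff) * s
        self.b1 = np.zeros(d_ff)
        self.W2 = np.random.randn(d_ff, d_model) * s
        self.b2 = np.zeros(d_model)

    def _layer_norm(self, x, eps=1e-5):
        mu = x.mean(-1, keepdims=True)
        var = x.var(-1, keepdims=True)
        return (x - mu) / np.sqrt(var + eps)

    def _split_heads(self, x):
        B, S, D = x.shape
        return x.reshape(B, S, self.n_heads, self.d_head).transpose(0, 2, 1, 3)

    def forward(self, x, mask):
        B, S, D = x.shape
        # Causal multi-head self-attention
        q = self._split_heads(x @ self.W_q)          # (B, H, S, d_head)
        k = self._split_heads(x @ self.W_k)
        v = self._split_heads(x @ self.W_v)
        scores = q @ k.transpose(0, 1, 3, 2) / np.sqrt(self.d_head)
        scores = np.where(mask == 0, -1e9, scores)   # hide future positions
        attn = np.exp(scores - scores.max(-1, keepdims=True))
        attn /= attn.sum(-1, keepdims=True)
        out = (attn @ v).transpose(0, 2, 1, 3).reshape(B, S, D) @ self.W_o
        x = self._layer_norm(x + out)                # residual + norm
        # Position-wise feed-forward network (ReLU)
        ff = np.maximum(0, x @ self.W1 + self.b1) @ self.W2 + self.b2
        return self._layer_norm(x + ff)              # residual + norm
```

The mask of shape `(1, 1, S, S)` produced by `MiniGPT.forward` broadcasts over batch and heads, so masked positions get near-zero attention weight after the softmax.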
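The `temperature` parameter in `generate()` controls how adventurous sampling is: logits are divided by it before the softmax, so values below 1 sharpen the distribution (more conservative text) and values above 1 flatten it (more varied text). A standalone sketch of that sampling step (the function name and the `rng` argument are illustrative, not from the model above):

```python
import numpy as np

def sample_with_temperature(logits, temperature=0.8, rng=None):
    """Sample one token index from logits after temperature scaling."""
    if rng is None:
        rng = np.random.default_rng()
    z = logits / temperature                 # T < 1 sharpens, T > 1 flattens
    p = np.exp(z - z.max())                  # stable softmax
    p /= p.sum()
    return rng.choice(len(p), p=p)
```

As `temperature` approaches 0 this degenerates to greedy decoding (always the argmax token), which is why very low temperatures make the model repetitive.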
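The table of contents mentions evaluation with perplexity. For reference: training minimizes the average next-token cross-entropy, and perplexity is simply `exp` of that loss (a model that guesses uniformly over V tokens has perplexity V). A minimal sketch of computing this loss from the model's logits – the alignment of `targets` (inputs shifted left by one token) is assumed to be handled by the caller:

```python
import numpy as np

def next_token_loss(logits, targets):
    """Average cross-entropy of next-token predictions.
    logits:  (batch, seq_len, vocab_size) raw scores
    targets: (batch, seq_len) integer token indices"""
    shifted = logits - logits.max(-1, keepdims=True)   # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(-1, keepdims=True))
    B, S = targets.shape
    # Pick out the log-probability of each correct next token
    nll = -log_probs[np.arange(B)[:, None], np.arange(S), targets]
    return nll.mean()
```

Perplexity is then `np.exp(next_token_loss(logits, targets))`; watching it fall toward values far below the vocabulary size is the standard sanity check during training.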
4. Roadmap: What's Next?
| Level | What to Learn | Resource |
|---|---|---|
| Beginner++ | PyTorch / TensorFlow – use frameworks! | pytorch.org/tutorials |
| Intermediate | Hugging Face Transformers, fine-tuning BERT/GPT | huggingface.co/course |
| Advanced | Distributed training, RLHF, mixture of experts | CS229, CS224n, CS231n |
| Research | Read latest papers, reproduce results, contribute | arxiv.org, Papers With Code |
| Production | MLOps, model serving, monitoring, A/B testing | MLOps community |
5. Closing – Congratulations!
From a simple perceptron on Page 1, you now understand: backpropagation, CNNs for vision, regularization & optimization, RNNs/LSTMs for sequences, word embeddings & NLP, GANs for generative models, Transformers & Self-Attention, transfer learning, and building a language model (Mini-GPT) from scratch. All with just Python and NumPy.
Congratulations! You've completed the entire Neural Network Tutorial Series – all 10 Pages!

From a simple perceptron to Mini-GPT – you now deeply understand the foundations of deep learning. Next steps: use PyTorch/TensorFlow, build your own projects, read papers, contribute to open source, and keep exploring! Thank you for learning with us.