๐Ÿ“ Artikel ini ditulis dalam Bahasa Indonesia & English
๐Ÿ“ This article is available in English & Bahasa Indonesia

📘 Neural Network Tutorial – Page 2

Multi-Layer Networks &
Real-World Datasets

From a simple neural network to a real deep network. Page 2 covers: adding multiple hidden layers, loading and processing real datasets (Iris & MNIST), Mini-Batch Gradient Descent, data normalization, and building a digit classifier that recognizes handwritten numbers – 97%+ accuracy.

📅 March 2026 ⏱ 25 min read
๐Ÿท Deep NetworkIrisMNISTMini-BatchSoftmaxClassification
📚 Neural Network Tutorial Series:
1 2 3 4 5 6 7 8 9 10

📑 Table of Contents – Page 2

  1. Page 1 Recap – The foundation we've built
  2. Deep Neural Network – Adding multiple hidden layers
  3. Softmax & Cross-Entropy – Multi-class classification
  4. Data Preprocessing – Normalization & One-Hot Encoding
  5. Mini-Batch Gradient Descent – Faster & more stable training
  6. Iris Classification – Your first real-world dataset
  7. MNIST Classification – Recognizing handwritten digits
  8. Summary & Page 3 Preview
🔄

1. Page 1 Recap – Our Foundation

Perceptron → Activation → Forward → Backward → Training Loop

On Page 1, we built a neural network from scratch that solved XOR and learned the sin(x) pattern, using only Python + NumPy. We covered: the Perceptron, Sigmoid/ReLU, Forward Propagation, Backpropagation, and Gradient Descent.

The limitation? Our network had only 1 hidden layer and could only do binary classification (0 or 1). In the real world we need many layers (deep!), multi-class classification (cat/dog/bird), and large datasets. Page 2 solves all of that.

Page 1 (simple)              Page 2 (real-world)
Input → [Hidden] → Output    Input → [H1] → [H2] → [H3] → Output
Binary (0/1)                 Multi-class (0,1,2,...,9)
Toy data (4 samples)         Real data (60,000 images)
Full-batch GD                Mini-batch GD
Sigmoid only                 Sigmoid + ReLU + Softmax
๐Ÿ—๏ธ

2. Deep Neural Network – Multiple Hidden Layers

Deeper = better at capturing complex patterns

Adding hidden layers makes the network "deeper" – this is where "deep learning" gets its name. Each layer captures increasingly higher-level abstractions. For face recognition: layer 1 detects edges, layer 2 detects shapes, layer 3 detects eyes/noses, layer 4 recognizes faces.

06_deep_network.py – Flexible Deep Network Class (Python)
import numpy as np

class DeepNeuralNetwork:
    """
    Deep Neural Network – any number of layers!
    Example: DeepNeuralNetwork([784, 128, 64, 10])
             → 784 input, 128 hidden, 64 hidden, 10 output
    """
    def __init__(self, layer_sizes):
        self.L = len(layer_sizes) - 1  # number of layers (excl. input)
        self.sizes = layer_sizes
        self.params = {}

        # Initialize weights (He initialization for ReLU)
        for l in range(1, self.L + 1):
            self.params[f'W{l}'] = np.random.randn(
                layer_sizes[l-1], layer_sizes[l]
            ) * np.sqrt(2.0 / layer_sizes[l-1])
            self.params[f'b{l}'] = np.zeros((1, layer_sizes[l]))

    def relu(self, z):
        return np.maximum(0, z)

    def relu_deriv(self, z):
        return (z > 0).astype(float)

    def softmax(self, z):
        """Numerically stable softmax"""
        exp_z = np.exp(z - np.max(z, axis=1, keepdims=True))
        return exp_z / np.sum(exp_z, axis=1, keepdims=True)

    def forward(self, X):
        """Forward pass through ALL layers"""
        self.cache = {'a0': X}

        for l in range(1, self.L + 1):
            z = self.cache[f'a{l-1}'] @ self.params[f'W{l}'] + self.params[f'b{l}']
            self.cache[f'z{l}'] = z

            if l == self.L:
                # Last layer: softmax (multi-class)
                self.cache[f'a{l}'] = self.softmax(z)
            else:
                # Hidden layers: ReLU
                self.cache[f'a{l}'] = self.relu(z)

        return self.cache[f'a{self.L}']

    def backward(self, y_onehot, lr=0.01):
        """Backprop through ALL layers"""
        m = y_onehot.shape[0]

        # Output layer: softmax + cross-entropy shortcut
        dz = self.cache[f'a{self.L}'] - y_onehot  # elegant!

        for l in range(self.L, 0, -1):
            dW = (1/m) * self.cache[f'a{l-1}'].T @ dz
            db = (1/m) * np.sum(dz, axis=0, keepdims=True)

            if l > 1:
                da = dz @ self.params[f'W{l}'].T
                dz = da * self.relu_deriv(self.cache[f'z{l-1}'])

            # Update
            self.params[f'W{l}'] -= lr * dW
            self.params[f'b{l}'] -= lr * db

    def predict(self, X):
        probs = self.forward(X)
        return np.argmax(probs, axis=1)

    def accuracy(self, X, y):
        return np.mean(self.predict(X) == y) * 100

🎓 What Changed from Page 1?
Flexible: Now supports N layers – just pass a list like [784, 128, 64, 10].
ReLU: Hidden layers use ReLU (converges faster than sigmoid).
Softmax: Output layer uses softmax (multi-class classification).
He Init: Proper weight initialization for ReLU – prevents vanishing gradients.
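
A quick smoke test helps confirm the class wires up correctly before training on real data. This sketch uses made-up shapes (the [4, 16, 8, 3] architecture and random samples are purely illustrative):

# smoke_test.py – sanity-check DeepNeuralNetwork (illustrative shapes)
import numpy as np

model = DeepNeuralNetwork([4, 16, 8, 3])  # 4 features in, 3 classes out
X_dummy = np.random.randn(5, 4)           # 5 random "samples"
probs = model.forward(X_dummy)

print(probs.shape)         # (5, 3) – one probability row per sample
print(probs.sum(axis=1))   # each row sums to 1.0, thanks to softmax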

🎯

3. Softmax & Cross-Entropy Loss

From binary to multi-class – probability for each class

On Page 1, we used sigmoid, whose single output handles binary decisions (0 or 1). For multi-class classification (e.g., digits 0-9), we need softmax – it converts the raw output scores into a probability distribution that sums to 1.

07_softmax_crossentropy.py (Python)
import numpy as np

# ===========================
# Softmax โ€” turns scores into probabilities
# ===========================
def softmax(z):
    exp_z = np.exp(z - np.max(z, axis=1, keepdims=True))
    return exp_z / np.sum(exp_z, axis=1, keepdims=True)

# Example: raw scores from output layer
scores = np.array([[2.0, 1.0, 0.1]])  # 3 classes
probs = softmax(scores)
print(probs)
# [[0.659, 0.242, 0.099]] ← probabilities! Sum = 1.0
print(f"Predicted class: {np.argmax(probs)}")  # 0

# ===========================
# Cross-Entropy Loss
# Better than MSE for classification!
# ===========================
def cross_entropy_loss(y_pred, y_true_onehot):
    """
    y_pred: softmax output (probabilities)
    y_true_onehot: one-hot encoded labels
    """
    m = y_pred.shape[0]
    # Clip to avoid log(0)
    y_pred = np.clip(y_pred, 1e-12, 1 - 1e-12)
    loss = -np.sum(y_true_onehot * np.log(y_pred)) / m
    return loss

# Example
y_true = np.array([[1, 0, 0]])  # true: class 0
y_pred = np.array([[0.7, 0.2, 0.1]])  # predicted probs
print(f"Loss: {cross_entropy_loss(y_pred, y_true):.4f}")  # 0.3567

# ===========================
# One-Hot Encoding helper
# ===========================
def one_hot(labels, num_classes):
    """Convert [0, 2, 1] โ†’ [[1,0,0], [0,0,1], [0,1,0]]"""
    m = labels.shape[0]
    encoded = np.zeros((m, num_classes))
    encoded[np.arange(m), labels] = 1
    return encoded

print(one_hot(np.array([0, 2, 1]), 3))
# [[1. 0. 0.]   ← class 0
#  [0. 0. 1.]   ← class 2
#  [0. 1. 0.]]  ← class 1

🎓 Why Cross-Entropy, not MSE?
Cross-entropy produces larger gradients when predictions are very wrong – the model learns faster from big mistakes. MSE can be "lazy" because its gradients are small at sigmoid's extremes. Bonus: the softmax + cross-entropy gradient is beautifully simple: dz = ŷ - y.
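
For the curious, the dz = ŷ - y shortcut falls out in two lines. This is the standard derivation, using the softmax and cross-entropy definitions from the code above (y is one-hot, so its components sum to 1):

\[
L = -\sum_k y_k \log \hat y_k, \qquad \hat y_k = \frac{e^{z_k}}{\sum_j e^{z_j}}
\]
\[
\frac{\partial L}{\partial z_i} = \sum_k \left(-\frac{y_k}{\hat y_k}\right) \hat y_k \,(\delta_{ki} - \hat y_i) = -y_i + \hat y_i \sum_k y_k = \hat y_i - y_i
\]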

🧹

4. Data Preprocessing – Normalization & Encoding

Raw data → training-ready data

Neural networks are sensitive to data scale. A feature ranging 0-1000 will dominate one ranging 0-1. Normalization puts all features on an equal footing – training becomes faster and more stable.

08_preprocessing.py (Python)
import numpy as np

# ===========================
# 1. Min-Max Normalization → scale to [0, 1]
# ===========================
def normalize(X):
    X_min = X.min(axis=0)
    X_max = X.max(axis=0)
    return (X - X_min) / (X_max - X_min + 1e-8)

# ===========================
# 2. Z-Score Standardization → mean=0, std=1
# ===========================
def standardize(X):
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    return (X - mean) / (std + 1e-8)

# ===========================
# 3. Train/Test Split
# ===========================
def train_test_split(X, y, test_ratio=0.2, seed=42):
    np.random.seed(seed)
    indices = np.random.permutation(len(X))
    split = int(len(X) * (1 - test_ratio))
    train_idx, test_idx = indices[:split], indices[split:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# Example
data = np.array([[150, 0.5], [200, 0.8], [100, 0.2]])
print("Before:", data[0])         # [150, 0.5]
print("After: ", normalize(data)[0]) # [0.5, 0.5] โ† same scale!
⚡

5. Mini-Batch Gradient Descent

The best compromise: fast and stable

On Page 1, we used full-batch GD – the entire dataset computed at once. That's slow for large data. The alternatives:

3 Types of Gradient Descent

                  Full-Batch GD         Mini-Batch GD          Stochastic GD (SGD)
Data per update   All data at once      Chunks of 32/64/128    One sample at a time
Speed             Slow, stable          ⭐ Fast & stable       Very fast, noisy
Updates           1 update/epoch        N updates/epoch        M updates/epoch
Best for          Good for small data   Best for most cases    Good for online learning
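Concretely, with MNIST's 60,000 training samples and batch_size=64, one epoch performs ceil(60000 / 64) = 938 weight updates, versus exactly 1 for full-batch GD – nearly a thousand updates' worth of learning signal per pass through the data.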
09_minibatch.py – Mini-Batch Training (Python)
import numpy as np

def create_minibatches(X, y, batch_size=32):
    """Split data into mini-batches"""
    m = X.shape[0]
    indices = np.random.permutation(m)
    X_shuffled = X[indices]
    y_shuffled = y[indices]

    batches = []
    for i in range(0, m, batch_size):
        X_batch = X_shuffled[i:i+batch_size]
        y_batch = y_shuffled[i:i+batch_size]
        batches.append((X_batch, y_batch))
    return batches

# ===========================
# Training with mini-batches
# ===========================
def train_minibatch(model, X, y_onehot, epochs=20, lr=0.1, batch_size=32):
    for epoch in range(epochs):
        batches = create_minibatches(X, y_onehot, batch_size)
        epoch_loss = 0

        for X_batch, y_batch in batches:
            # Forward
            pred = model.forward(X_batch)

            # Loss
            pred_clipped = np.clip(pred, 1e-12, 1-1e-12)
            batch_loss = -np.sum(y_batch * np.log(pred_clipped)) / len(X_batch)
            epoch_loss += batch_loss

            # Backward + Update
            model.backward(y_batch, lr)

        if (epoch+1) % 5 == 0:
            avg_loss = epoch_loss / len(batches)
            print(f"  Epoch {epoch+1:>3} โ”‚ Loss: {avg_loss:.4f}")

💡 Tip: Batch Size
32 – a good default for most cases.
64-128 – if you have a GPU, larger sizes leverage parallelism.
Powers of 2 (32, 64, 128, 256) – optimal for modern hardware.
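
To see all the pieces working together before touching a real dataset, here is a tiny end-to-end run. The data is synthetic (two linearly separable classes made up for this sketch); DeepNeuralNetwork and train_minibatch are the ones defined above:

# Tiny end-to-end check on synthetic data (illustrative)
np.random.seed(0)
X = np.random.randn(300, 4)
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # easy 2-class problem
y_oh = np.zeros((300, 2))
y_oh[np.arange(300), y] = 1

model = DeepNeuralNetwork([4, 8, 2])
train_minibatch(model, X, y_oh, epochs=20, lr=0.1, batch_size=32)
print(f"Accuracy: {model.accuracy(X, y):.1f}%")  # should climb well above 90%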

🌸

6. Iris Classification – Your First Real Dataset

150 flowers, 4 features, 3 species – a machine learning classic

The Iris dataset contains measurements of 150 iris flowers with 4 features (sepal/petal length and width) and 3 classes (Setosa, Versicolor, Virginica). It's the "Hello World" of machine learning – small, but enough to prove our model works on real data.

10_iris_classifier.py – Iris Classification 🌸 (Python)
import numpy as np
from sklearn.datasets import load_iris  # just for loading data

# ===========================
# 1. Load & Prepare Data
# ===========================
iris = load_iris()
X = iris.data.astype(np.float64)     # (150, 4)
y = iris.target                       # (150,) → values: 0, 1, 2

# Normalize
X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-8)

# One-hot encode labels
def one_hot(labels, num_classes):
    enc = np.zeros((len(labels), num_classes))
    enc[np.arange(len(labels)), labels] = 1
    return enc

y_oh = one_hot(y, 3)  # (150, 3)

# Train/test split
np.random.seed(42)
idx = np.random.permutation(150)
X_train, X_test = X[idx[:120]], X[idx[120:]]
y_train, y_test = y_oh[idx[:120]], y_oh[idx[120:]]
y_test_labels = y[idx[120:]]

# ===========================
# 2. Create & Train Network
# ===========================
# Architecture: 4 input → 16 hidden → 8 hidden → 3 output
model = DeepNeuralNetwork([4, 16, 8, 3])  # class from 06_deep_network.py

print("๐ŸŒธ Training on Iris dataset...")
for epoch in range(200):
    pred = model.forward(X_train)
    model.backward(y_train, lr=0.1)

    if (epoch+1) % 50 == 0:
        loss = -np.sum(y_train * np.log(np.clip(pred,1e-12,1))) / len(X_train)
        acc = model.accuracy(X_test, y_test_labels)
        print(f"  Epoch {epoch+1:>3} โ”‚ Loss: {loss:.4f} โ”‚ Test Acc: {acc:.1f}%")

# ===========================
# 3. Final Results
# ===========================
print(f"\nโœ… Final Test Accuracy: {model.accuracy(X_test, y_test_labels):.1f}%")
# Output: โœ… Final Test Accuracy: 96.7%+ ๐ŸŽ‰

🎉 96%+ Accuracy! Our neural network – built from scratch without any framework – successfully classifies iris flowers with high accuracy. The architecture is just [4, 16, 8, 3] with 200 epochs of training.
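
To see which species get confused with which, a confusion matrix takes only a few lines. This minimal sketch reuses model, X_test, and y_test_labels from the script above:

# Confusion matrix: rows = actual class, cols = predicted class
preds = model.predict(X_test)
cm = np.zeros((3, 3), dtype=int)
for actual, predicted in zip(y_test_labels, preds):
    cm[actual, predicted] += 1
print(cm)  # off-diagonal counts are the misclassifications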

🔢

7. MNIST Classification – Handwritten Digits

60,000 images, 10 digits, a deep network from scratch – 97%+ accuracy

MNIST is a legendary machine learning dataset – 70,000 handwritten digit images (28×28 pixels). Each image = 784 pixels = 784 input features. Task: classify into digits 0-9.

MNIST Dataset

Image: 28 × 28 pixels  →  Flattened: 784 values, e.g. [0, 0, 0.5, 0.9, 0.9, 0.1, ...]
                                  ▼
                      [784] → [128] → [64] → [10]
                                  ▼
                   Prediction: "4" (confidence: 98.2%)
11_mnist_classifier.py – MNIST from Scratch! 🔥 (Python)
import numpy as np
from sklearn.datasets import fetch_openml

# =====================================================
# 1. LOAD MNIST DATA
# =====================================================
print("๐Ÿ“ฅ Loading MNIST dataset...")
mnist = fetch_openml('mnist_784', version=1, as_frame=False)
X, y = mnist.data.astype(np.float64), mnist.target.astype(int)

# Normalize pixels: [0, 255] โ†’ [0, 1]
X = X / 255.0

# Split: 60k train, 10k test
X_train, X_test = X[:60000], X[60000:]
y_train, y_test = y[:60000], y[60000:]

# One-hot encode
def one_hot(labels, nc):
    enc = np.zeros((len(labels), nc))
    enc[np.arange(len(labels)), labels] = 1
    return enc

y_train_oh = one_hot(y_train, 10)

print(f"Train: {X_train.shape}, Test: {X_test.shape}")
# Train: (60000, 784), Test: (10000, 784)

# =====================================================
# 2. CREATE DEEP NETWORK
# Architecture: 784 โ†’ 128 โ†’ 64 โ†’ 10
# =====================================================
model = DeepNeuralNetwork([784, 128, 64, 10])  # class from 06_deep_network.py
print("🧠 Network: [784] → [128] → [64] → [10]")

# =====================================================
# 3. TRAIN WITH MINI-BATCHES
# =====================================================
epochs = 20
batch_size = 64
lr = 0.1

print(f"\n๐Ÿ”ฅ Training for {epochs} epochs (batch={batch_size}, lr={lr})")
print("โ”€" * 50)

for epoch in range(epochs):
    # Shuffle
    idx = np.random.permutation(60000)
    X_shuf = X_train[idx]
    y_shuf = y_train_oh[idx]

    # Mini-batch loop
    for i in range(0, 60000, batch_size):
        Xb = X_shuf[i:i+batch_size]
        yb = y_shuf[i:i+batch_size]
        model.forward(Xb)
        model.backward(yb, lr)

    # Evaluate every 5 epochs
    if (epoch+1) % 5 == 0:
        train_acc = model.accuracy(X_train[:5000], y_train[:5000])
        test_acc = model.accuracy(X_test, y_test)
        print(f"  Epoch {epoch+1:>2} โ”‚ Train: {train_acc:.1f}% โ”‚ Test: {test_acc:.1f}%")

# =====================================================
# 4. FINAL RESULTS
# =====================================================
final_acc = model.accuracy(X_test, y_test)
print(f"\n๐ŸŽฏ Final Test Accuracy: {final_acc:.1f}%")
# Output: ๐ŸŽฏ Final Test Accuracy: 97.2%+ ๐ŸŽ‰

# =====================================================
# 5. DEMO: Predict single digit
# =====================================================
sample_idx = 42
pred = model.predict(X_test[sample_idx:sample_idx+1])
print(f"\nSample #{sample_idx}: Predicted={pred[0]}, Actual={y_test[sample_idx]}")

🎉 97%+ Accuracy on MNIST!
Our handcrafted neural network – no framework – can recognize handwritten digits with 97%+ accuracy. Out of 10,000 test images, only ~300 are wrong. And this is with just 2 hidden layers and 20 epochs! Imagine what's possible with larger architectures (CNNs, covered in Page 3).
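
You can check the "~300 wrong" figure directly. This two-line snippet reuses model, X_test, and y_test from the script above (the exact count varies with random initialization):

wrong = np.sum(model.predict(X_test) != y_test)
print(f"Misclassified: {wrong} / {len(y_test)} images")  # roughly 300 at ~97% accuracy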

📝

8. Page 2 Summary

What we've learned
Concept            What It Is                                                Key Code
Deep Network       Multiple hidden layers – increasingly abstract patterns   DeepNN([784,128,64,10])
ReLU               Hidden layer activation – fast, simple                    np.maximum(0, z)
Softmax            Multi-class output → probabilities (sum=1)                exp(z) / sum(exp(z))
Cross-Entropy      Loss function for classification                          -sum(y * log(ŷ))
One-Hot Encoding   Label → binary vector [0,0,1,0...]                        enc[i, label] = 1
Normalization      All features on the same scale                            X / 255.0
Mini-Batch GD      Update per data chunk (32/64/128)                         for i in range(0,m,bs)
He Initialization  Optimal weight init for ReLU                              randn() * sqrt(2/n)
MNIST              60k digit images → 97%+ accuracy                          [784,128,64,10]
โ† Page Sebelumnyaโ† Previous Page

Page 1 โ€” Neural Network dari Nol

📘

Coming Next: Page 3 – Convolutional Neural Network (CNN)

Understanding convolution, pooling, and feature maps. Building a CNN from scratch for image classification, then comparing results with a regular network. MNIST → 99%+ accuracy. Stay tuned!