๐Ÿ“ Artikel ini ditulis dalam Bahasa Indonesia & English
๐Ÿ“ This article is available in English & Bahasa Indonesia

๐Ÿ” Tutorial Neural Network โ€” Page 5Neural Network Tutorial โ€” Page 5

Recurrent Neural Network
LSTM & Sequence Data


Data isn't always images โ€” much of it is sequential: text, stock prices, music, weather. Page 5 covers: RNNs and the concept of memory, the vanishing gradient problem, LSTM and GRU as solutions, building a character-level text generator, and sentiment analysis โ€” all from scratch.

📅 March 2026 · ⏱ 30 min read
๐Ÿท RNNLSTMGRUText GenerationSentimentTime Series
📚 Neural Network Tutorial Series:

📑 Table of Contents - Page 5

  1. Why RNN? - Sequential data needs "memory"
  2. Vanilla RNN - Basic architecture and forward pass
  3. Backpropagation Through Time - BPTT and vanishing gradients
  4. LSTM - Long Short-Term Memory: solving vanishing gradients
  5. GRU - Gated Recurrent Unit: a leaner LSTM
  6. Character-Level Text Generator - Writing text one character at a time
  7. Sentiment Analysis - Classifying emotions from text
  8. Summary & Page 6 Preview
🤔

1. Why RNN? - Sequential Data Needs Memory

Feedforward networks have no concept of "order" or "time"

CNNs and Dense networks treat each input independently. But much real-world data is sequential: the meaning of a word depends on previous words, and today's stock price depends on yesterday's. We need a network with memory.

Sequential Data Examples
- Text: "I like [???]" → "eating" (context matters!)
- Time series: 📈 100 → 105 → 103 → [???] (predict the next value)
- Music: 🎵 C → E → G → [???] (predict the next note)
- DNA: ATCG → GCTA → [???] (predict the next sequence)
Feedforward: every input independent → ❌ no memory
RNN: output depends on the CURRENT input + the PREVIOUS state → ✅


💡 Analogy: Watching a Movie
Feedforward = seeing one frame without knowing the previous frames. You can't understand the story.
RNN = watching the movie in order: you remember what happened before, and it shapes your understanding of the current scene.

🔁

2. Vanilla RNN - Basic Architecture

Each timestep: new input + previous hidden state → output

An RNN has a hidden state h(t) that acts as "memory". At each timestep, the hidden state is updated from the new input and the previous hidden state: h(t) = tanh(W_h · h(t-1) + W_x · x(t) + b).

RNN Unrolled Through Time

  h(0) ────▶ h(1) ────▶ h(2) ────▶ h(3) ──▶ output
   ↑          ↑          ↑          ↑
  x(0)       x(1)       x(2)       x(3)
  "I"       "like"    "eating"   "rice"

The same weights W_h and W_x are SHARED across all timesteps!
23_vanilla_rnn.py - RNN from Scratch
import numpy as np

class VanillaRNN:
    """Simple RNN cell โ€” from scratch"""

    def __init__(self, input_size, hidden_size, output_size):
        self.hidden_size = hidden_size
        # Weight matrices
        scale = 0.01
        self.Wxh = np.random.randn(input_size, hidden_size) * scale
        self.Whh = np.random.randn(hidden_size, hidden_size) * scale
        self.Why = np.random.randn(hidden_size, output_size) * scale
        self.bh = np.zeros((1, hidden_size))
        self.by = np.zeros((1, output_size))

    def forward(self, inputs, h_prev=None):
        """
        inputs: list of input vectors (one per timestep)
        h_prev: initial hidden state
        returns: outputs, hidden_states
        """
        if h_prev is None:
            h_prev = np.zeros((1, self.hidden_size))

        self.inputs = inputs
        self.hs = {-1: h_prev}  # hidden states per timestep
        self.outputs = []

        for t, x in enumerate(inputs):
            # Core RNN equation!
            self.hs[t] = np.tanh(
                x @ self.Wxh +            # input contribution
                self.hs[t-1] @ self.Whh + # memory contribution
                self.bh                    # bias
            )
            # Output at this timestep
            y = self.hs[t] @ self.Why + self.by
            self.outputs.append(y)

        return self.outputs, self.hs

# Demo: process a sequence of 4 timesteps
rnn = VanillaRNN(input_size=10, hidden_size=32, output_size=10)
inputs = [np.random.randn(1, 10) for _ in range(4)]
outputs, hidden_states = rnn.forward(inputs)
print(f"Processed {len(inputs)} timesteps")
print(f"Last hidden state shape: {hidden_states[3].shape}")
print(f"Last output shape: {outputs[-1].shape}")
💀

3. Backprop Through Time & Vanishing Gradient

Critical problem: gradients vanish in long sequences

BPTT (Backpropagation Through Time) = regular backprop, but "unrolled" through time. The problem: as gradients flow backward through many timesteps, they are repeatedly multiplied by W_hh. If the largest singular value of W_hh is below 1, gradients vanish; if it is above 1, they explode.

Vanishing Gradient Problem

Gradient flows backward through time:

  t=100    t=99     t=98               t=1      t=0
    ◀────────◀────────◀──── ... ────◀────────◀
     ×W_hh    ×W_hh    ×W_hh            ×W_hh

If |W_hh| = 0.9, then after 100 steps: 0.9¹⁰⁰ ≈ 0.0000265 ← gradient almost ZERO!
Result: the RNN "forgets" early inputs in long sequences.
Solution: LSTM and GRU (special gates that control memory).
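To see this arithmetic in action, here is a small sketch (my own illustration, not one of the numbered tutorial files; the helper name `gradient_norm_after` is hypothetical). It pushes a unit-norm gradient backward through 100 timesteps; using a scaled orthogonal W_hh makes every singular value exactly `scale`, so the surviving norm is scale**steps:

```python
import numpy as np

# Toy illustration of BPTT's repeated multiplication by W_hh^T.
# A scaled orthogonal matrix has all singular values equal to `scale`,
# so the gradient norm after `steps` multiplications is scale**steps.
def gradient_norm_after(steps, scale, hidden=32):
    rng = np.random.default_rng(0)
    Q, _ = np.linalg.qr(rng.standard_normal((hidden, hidden)))
    W_hh = scale * Q                        # all singular values == scale
    g = np.ones(hidden) / np.sqrt(hidden)   # unit-norm "gradient"
    for _ in range(steps):
        g = W_hh.T @ g                      # one BPTT step (tanh' omitted)
    return np.linalg.norm(g)

print(f"scale=0.9, 100 steps: {gradient_norm_after(100, 0.9):.2e}")  # ~2.7e-05, vanishes
print(f"scale=1.1, 100 steps: {gradient_norm_after(100, 1.1):.2e}")  # ~1.4e+04, explodes
```

Real W_hh matrices are not orthogonal, and tanh' < 1 shrinks gradients further, but the geometric growth/decay is the same.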
24_gradient_clipping.py - Quick Fix: Gradient Clipping
import numpy as np

def clip_gradients(gradients, max_norm=5.0):
    """Clip gradients to prevent exploding gradient"""
    total_norm = np.sqrt(
        sum(np.sum(g ** 2) for g in gradients)
    )
    if total_norm > max_norm:
        scale = max_norm / total_norm
        gradients = [g * scale for g in gradients]
    return gradients

# This fixes EXPLODING gradients.
# VANISHING gradients need a different solution → LSTM!
🧠

4. LSTM - Long Short-Term Memory

3 gates + cell state = controlled long-term memory

LSTM solves vanishing gradients by adding a cell state (an information "highway") and 3 gates that control what to remember, forget, and output. Gates are sigmoids (0-1) that act like valves.

LSTM Cell - 3 Gates + Cell State

  Forget gate:  f(t) = σ(W_f·[h,x] + b_f)     "What do we forget?"
  Input gate:   i(t) = σ(W_i·[h,x] + b_i)     "What do we add?"
  Candidate:    c̃(t) = tanh(W_c·[h,x] + b_c)
  Output gate:  o(t) = σ(W_o·[h,x] + b_o)     "What do we output?"

  Cell state:   c(t) = f(t) × c(t-1) + i(t) × c̃(t)
  Hidden state: h(t) = o(t) × tanh(c(t))
25_lstm.py - LSTM from Scratch
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-np.clip(x, -500, 500)))

class LSTMCell:
    """Single LSTM cell โ€” from scratch"""

    def __init__(self, input_size, hidden_size):
        n = input_size + hidden_size
        s = np.sqrt(2.0 / n)
        self.hidden_size = hidden_size

        # Combined weights for all 4 gates (efficiency!)
        # [forget, input, candidate, output]
        self.W = np.random.randn(n, 4 * hidden_size) * s
        self.b = np.zeros((1, 4 * hidden_size))
        # Initialize forget gate bias to 1 (remember by default)
        self.b[0, :hidden_size] = 1.0

    def forward(self, x, h_prev, c_prev):
        """One timestep forward"""
        H = self.hidden_size

        # Concatenate input and previous hidden state
        combined = np.concatenate([h_prev, x], axis=1)

        # Compute all 4 gates at once
        gates = combined @ self.W + self.b

        # Split into individual gates
        f = sigmoid(gates[:, :H])          # Forget gate
        i = sigmoid(gates[:, H:2*H])       # Input gate
        c_tilde = np.tanh(gates[:, 2*H:3*H])  # Candidate
        o = sigmoid(gates[:, 3*H:])        # Output gate

        # Update cell state
        c_new = f * c_prev + i * c_tilde   # forget old + add new

        # Compute hidden state
        h_new = o * np.tanh(c_new)         # output filtered

        # Cache for backprop
        self.cache = (x, h_prev, c_prev, f, i, c_tilde, o, c_new)

        return h_new, c_new

# Demo
lstm = LSTMCell(input_size=10, hidden_size=32)
h = np.zeros((1, 32))
c = np.zeros((1, 32))

# Process 5 timesteps
for t in range(5):
    x = np.random.randn(1, 10)
    h, c = lstm.forward(x, h, c)
    print(f"t={t}: h_norm={np.linalg.norm(h):.4f}, c_norm={np.linalg.norm(c):.4f}")

🎓 Why Does LSTM Solve Vanishing Gradients?
The cell state c(t) is a "highway": information flows through elementwise multiplication and addition only, not repeated tanh squashing. A forget gate near 1 lets gradients flow backward without shrinking, like a dedicated "express lane" for important information.
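The "highway" claim can be checked numerically. A simplified sketch (my own illustration, ignoring the indirect gradient path through h): along the cell state, dc(t)/dc(t-1) is just the forget gate f(t) elementwise, so the gradient surviving T steps is the product of the forget-gate values:

```python
import numpy as np

# Along the cell-state path, the per-step gradient factor is f(t).
# With learned gates near 1 ("remember" regime), the product over
# 100 steps stays usable; a vanilla RNN factor of ~0.9 does not.
rng = np.random.default_rng(1)
T = 100

f_gates = rng.uniform(0.98, 1.0, size=T)   # forget gates near 1
lstm_factor = np.prod(f_gates)             # surviving gradient, LSTM path

rnn_factor = 0.9 ** T                      # vanilla RNN: |tanh' * w| ~ 0.9/step

print(f"LSTM cell-state path, {T} steps: {lstm_factor:.3f}")
print(f"Vanilla RNN path,     {T} steps: {rnn_factor:.2e}")
```

With f(t) = 1 exactly, the product is 1 and the gradient passes through untouched; that is the "express lane".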

⚡

5. GRU - Gated Recurrent Unit

A leaner LSTM: 2 gates, faster, similar performance

GRU simplifies LSTM: only 2 gates (reset and update), no separate cell state. Faster to train, fewer parameters, and often performs comparably to LSTM.

26_gru.py - GRU from Scratch
import numpy as np

def sigmoid(x):
    # Same numerically stable sigmoid as in 25_lstm.py
    return 1 / (1 + np.exp(-np.clip(x, -500, 500)))

class GRUCell:
    """GRU: 2 gates, no separate cell state"""

    def __init__(self, input_size, hidden_size):
        n = input_size + hidden_size
        s = np.sqrt(2.0 / n)
        self.H = hidden_size

        # Weights for reset & update gates
        self.Wz = np.random.randn(n, hidden_size) * s   # update gate
        self.Wr = np.random.randn(n, hidden_size) * s   # reset gate
        self.Wh = np.random.randn(n, hidden_size) * s   # candidate
        self.bz = np.zeros((1, hidden_size))
        self.br = np.zeros((1, hidden_size))
        self.bh = np.zeros((1, hidden_size))

    def forward(self, x, h_prev):
        combined = np.concatenate([h_prev, x], axis=1)

        # Update gate: how much of old state to keep
        z = sigmoid(combined @ self.Wz + self.bz)

        # Reset gate: how much of old state to use for candidate
        r = sigmoid(combined @ self.Wr + self.br)

        # Candidate hidden state
        combined_r = np.concatenate([r * h_prev, x], axis=1)
        h_tilde = np.tanh(combined_r @ self.Wh + self.bh)

        # Final hidden state: interpolate old and new
        h_new = z * h_prev + (1 - z) * h_tilde

        return h_new

# GRU vs LSTM parameter comparison (n = input_size + hidden_size, as above):
# LSTM: 4 gates (f, i, c̃, o)  → 4 × n × H weights
# GRU:  3 matrices (z, r, h̃)  → 3 × n × H weights ← 25% fewer!

🎓 LSTM vs GRU - When to Use Which?
LSTM → very long sequences that need precise long-term memory.
GRU → smaller datasets, faster training, short-to-medium sequences.
Rule of thumb: start with GRU; if it's not enough, try LSTM.
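To make the "25% fewer" figure concrete, here is a quick parameter count for the two from-scratch cells above (weights plus biases, input_size=10 and hidden_size=32 as in the demos; `lstm_params` and `gru_params` are hypothetical helper names):

```python
# Parameter counts for the from-scratch cells, where n = input_size + hidden_size.
def lstm_params(input_size, hidden_size):
    n = input_size + hidden_size
    return 4 * (n * hidden_size + hidden_size)   # W: (n, 4H), b: (1, 4H)

def gru_params(input_size, hidden_size):
    n = input_size + hidden_size
    return 3 * (n * hidden_size + hidden_size)   # Wz/Wr/Wh + bz/br/bh

lstm_n, gru_n = lstm_params(10, 32), gru_params(10, 32)
print(f"LSTM: {lstm_n}, GRU: {gru_n} -> {100 * (1 - gru_n / lstm_n):.0f}% fewer")
# prints: LSTM: 5504, GRU: 4128 -> 25% fewer
```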

โœ๏ธ

6. Character-Level Text Generator

Train an RNN to write text, one character at a time

We'll train an LSTM to predict the next character in a text sequence. After training, the model can generate new text that mimics the writing style of the training data, one character at a time.

27_text_generator.py - Char-Level Text Gen 🔥
import numpy as np

# =====================================================
# 1. PREPARE TEXT DATA
# =====================================================
text = """To be or not to be that is the question
Whether tis nobler in the mind to suffer
The slings and arrows of outrageous fortune
Or to take arms against a sea of troubles"""

# Build character vocabulary
chars = sorted(set(text))
char_to_idx = {c: i for i, c in enumerate(chars)}
idx_to_char = {i: c for c, i in char_to_idx.items()}
vocab_size = len(chars)
print(f"Vocab size: {vocab_size} chars")

# One-hot encode
def one_hot_char(idx, size):
    v = np.zeros((1, size))
    v[0, idx] = 1
    return v

# =====================================================
# 2. TRAINING LOOP
# =====================================================
hidden_size = 64
seq_length = 25    # chars per training sequence
lr = 0.01
lstm = LSTMCell(vocab_size, hidden_size)
Why = np.random.randn(hidden_size, vocab_size) * 0.01
by = np.zeros((1, vocab_size))

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

print("🔥 Training text generator...")
for iteration in range(1000):
    # Random starting position
    start = np.random.randint(0, len(text) - seq_length - 1)
    inputs = [char_to_idx[c] for c in text[start:start+seq_length]]
    targets = [char_to_idx[c] for c in text[start+1:start+seq_length+1]]

    # Forward pass through sequence
    h = np.zeros((1, hidden_size))
    c = np.zeros((1, hidden_size))
    loss = 0

    for t in range(seq_length):
        x = one_hot_char(inputs[t], vocab_size)
        h, c = lstm.forward(x, h, c)
        logits = h @ Why + by
        probs = softmax(logits)
        loss -= np.log(probs[0, targets[t]] + 1e-12)

    # NOTE: the backward pass (BPTT) is omitted for brevity, so this loss
    # will not actually decrease; a full version would compute gradients
    # and update lstm.W, Why and by on every iteration.
    if (iteration + 1) % 200 == 0:
        print(f"  Iter {iteration+1:>4} | Loss: {loss/seq_length:.4f}")

# =====================================================
# 3. GENERATE TEXT!
# =====================================================
def generate(seed_char, length=100):
    h = np.zeros((1, hidden_size))
    c = np.zeros((1, hidden_size))
    idx = char_to_idx[seed_char]
    result = seed_char

    for _ in range(length):
        x = one_hot_char(idx, vocab_size)
        h, c = lstm.forward(x, h, c)
        logits = h @ Why + by
        probs = softmax(logits).flatten()
        # Sample from probability distribution
        idx = np.random.choice(vocab_size, p=probs)
        result += idx_to_char[idx]

    return result

print("\n✍️ Generated text:")
print(generate("T", 150))

🎉 The Model Writes by Itself! After enough training, the model can generate Shakespeare-like text, learning spelling patterns, spacing, even sentence structure, just from predicting the next character one at a time.
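One knob worth adding on top of this (a common extension, not part of the script above): sampling temperature. Dividing the logits by a temperature before the softmax controls how adventurous the generated text is; `sample_with_temperature` below is a hypothetical helper:

```python
import numpy as np

# Temperature-scaled sampling: T < 1 sharpens the distribution
# (safer, more repetitive text); T > 1 flattens it (more creative,
# more misspellings). T = 1 recovers plain softmax sampling.
def sample_with_temperature(logits, temperature=1.0, rng=None):
    rng = rng or np.random.default_rng()
    z = logits / temperature
    z = z - z.max()                        # numerical stability
    probs = np.exp(z) / np.exp(z).sum()
    return rng.choice(len(probs), p=probs)

logits = np.array([2.0, 1.0, 0.1])
rng = np.random.default_rng(0)
cold = [sample_with_temperature(logits, 0.1, rng) for _ in range(100)]
hot = [sample_with_temperature(logits, 5.0, rng) for _ in range(100)]
print(f"T=0.1 picked index 0 {cold.count(0)} times out of 100")  # nearly always
print(f"T=5.0 picked index 0 {hot.count(0)} times out of 100")   # much more varied
```

In generate(), this would replace the `np.random.choice(vocab_size, p=probs)` line, with the logits divided by the chosen temperature first.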

😊

7. Sentiment Analysis - Text Emotion Classification

Read a sentence → determine positive or negative

The most popular RNN application: read the entire sentence, take the last hidden state as a "summary", then classify the sentiment. The last hidden state contains information from all previous words.

28_sentiment_analysis.py - Simple Sentiment Classifier
import numpy as np

# NOTE: reuses sigmoid() and LSTMCell from 25_lstm.py

# =====================================================
# 1. TOY DATASET
# =====================================================
data = [
    ("this movie is great",       1),  # positive
    ("terrible waste of time",    0),  # negative
    ("i love this film",          1),
    ("worst movie ever",          0),
    ("absolutely wonderful",      1),
    ("boring and awful",          0),
    ("amazing performance",       1),
    ("horrible acting",           0),
]

# Build word vocabulary
all_words = set()
for text, _ in data:
    all_words.update(text.split())
word2idx = {w: i+1 for i, w in enumerate(sorted(all_words))}
word2idx[""] = 0   # index 0 = padding / unknown words
vocab_size = len(word2idx)

# =====================================================
# 2. ENCODE SENTENCES
# =====================================================
def encode_sentence(text, max_len=5):
    words = text.split()
    indices = [word2idx.get(w, 0) for w in words]
    # Pad to fixed length
    while len(indices) < max_len:
        indices.append(0)
    return indices[:max_len]

# Simple word embedding (random, learned during training)
embed_dim = 8
embeddings = np.random.randn(vocab_size, embed_dim) * 0.1

# =====================================================
# 3. SENTIMENT CLASSIFIER
# LSTM โ†’ last hidden โ†’ sigmoid โ†’ pos/neg
# =====================================================
lstm = LSTMCell(input_size=embed_dim, hidden_size=16)
W_out = np.random.randn(16, 1) * 0.1
b_out = np.zeros((1, 1))

print("😊 Training sentiment classifier...")
for epoch in range(200):
    total_loss = 0
    for text, label in data:
        indices = encode_sentence(text)
        h = np.zeros((1, 16))
        c = np.zeros((1, 16))

        # Forward through each word
        for idx in indices:
            x = embeddings[idx:idx+1]  # (1, 8)
            h, c = lstm.forward(x, h, c)

        # Classify using last hidden state
        logit = h @ W_out + b_out
        pred = sigmoid(logit)

        # Binary cross-entropy loss (forward pass only: the BPTT gradient
        # updates for lstm.W, W_out and b_out are omitted for brevity)
        loss = -(label * np.log(pred + 1e-12)
                + (1-label) * np.log(1-pred + 1e-12))
        total_loss += loss.item()

    if (epoch+1) % 50 == 0:
        print(f"  Epoch {epoch+1:>3} | Loss: {total_loss/len(data):.4f}")

# =====================================================
# 4. TEST
# =====================================================
print("\n🎯 Predictions:")
for text, label in data:
    indices = encode_sentence(text)
    h, c = np.zeros((1,16)), np.zeros((1,16))
    for idx in indices:
        h, c = lstm.forward(embeddings[idx:idx+1], h, c)
    pred = sigmoid(h @ W_out + b_out).item()
    emoji = "😊" if pred > 0.5 else "😞"
    print(f"  {emoji} {pred:.2f} | {text}")

🎉 RNN Understands Context!
The model reads word by word, building "understanding" in the hidden state, then classifies emotion based on the entire sentence. Words like "love" and "great" push toward positive, "terrible" and "worst" toward negative, and the model learns this on its own from data!

📝

8. Page 5 Summary

What we've learned
Concept             | What It Is                                | Key Code
--------------------|-------------------------------------------|---------------------------
RNN                 | Network with a hidden state (memory)      | h = tanh(Wx·x + Wh·h + b)
Hidden State        | "Memory" vector, updated every timestep   | h(t) = f(h(t-1), x(t))
BPTT                | Backprop unrolled through time            | Σ dL/dW per timestep
Vanishing Gradient  | Gradients vanish in long sequences        | 0.9¹⁰⁰ ≈ 0
LSTM                | 3 gates + cell state = long-term memory   | c = f*c + i*c̃
GRU                 | 2 gates, leaner than LSTM                 | h = z*h + (1-z)*h̃
Text Generation     | Predict next char → generate              | sample(softmax(logits))
Sentiment Analysis  | Read sentence → classify emotion          | sigmoid(h_last @ W)
Gradient Clipping   | Prevents exploding gradients              | g * (max/norm)
โ† Page Sebelumnyaโ† Previous Page

Page 4 โ€” Regularization & Advanced Optimization

📘

Coming Next: Page 6 - Word Embeddings & NLP Pipeline

From one-hot to word vectors: Word2Vec, GloVe, and how to represent words as meaningful vectors. Building a complete NLP pipeline: tokenization, embedding, model, and evaluation. Stay tuned!