šŸ“ Artikel ini ditulis dalam Bahasa Indonesia & English
šŸ“ This article is available in English & Bahasa Indonesia

šŸ“ Belajar TensorFlow — Page 5Learn TensorFlow — Page 5

Natural Language Processing
with TensorFlow

Processing text with TensorFlow. Page 5 covers in depth: why text needs to be converted to numbers, the TextVectorization layer for in-model preprocessing, the Embedding layer that turns words into meaningful vectors, RNN/LSTM/GRU architectures in Keras, Bidirectional LSTM for two-way context, IMDB review sentiment classification (25k train, 87%+ accuracy), sequence padding and masking, pre-trained embeddings from TF Hub and GloVe, and building a complete NLP pipeline.

šŸ“… March 2026 Ā· ⱔ32 min read
šŸ· NLP Ā· TextVectorization Ā· Embedding Ā· LSTM Ā· GRU Ā· BiLSTM Ā· IMDB Ā· TF Hub
šŸ“š Learn TensorFlow Series:

šŸ“‘ Table of Contents — Page 5

  1. Text → Numbers — Why and how
  2. TextVectorization — Preprocessing inside the model
  3. Embedding Layer — Words → meaningful vectors
  4. RNN & LSTM in Keras — Processing sequences
  5. GRU — Lighter LSTM alternative
  6. Bidirectional LSTM — Forward and backward context
  7. Project: IMDB Sentiment Classifier — 87%+ accuracy
  8. Padding & Masking — Handling variable-length sequences
  9. Pre-trained Embeddings — GloVe, TF Hub, Universal Sentence Encoder
  10. Complete NLP Pipeline — From raw text to prediction
  11. Summary & Page 6 Preview
šŸ”¤

1. Text → Numbers — NLP Foundation

Neural networks can only process numbers — text must be converted

Neural networks cannot directly process text strings. All text must be converted to numerical representations. In the Neural Network series Page 6, we discussed Word2Vec and one-hot encoding. In TensorFlow, this process is automated through the TextVectorization and Embedding layers.

NLP Pipeline: Text → Prediction

"I love this movie!"
  ↓ Step 1: Tokenize (split into words/subwords)
["i", "love", "this", "movie"]
  ↓ Step 2: Encode (word → integer index)
[42, 156, 8, 2041]
  ↓ Step 3: Pad/Truncate (fixed length, e.g., 200)
[42, 156, 8, 2041, 0, 0, 0, ... 0]
  ↓ Step 4: Embed (integer → dense vector)
[[0.12, -0.34, 0.56, ...],   ← "i"    = 64-dim vector
 [0.78, 0.23, -0.91, ...],   ← "love" = 64-dim vector
 ...]                        shape: (200, 64)
  ↓ Step 5: Sequence Model (LSTM/GRU/Transformer)
[0.82, -0.13, ...]           ← sentence representation
  ↓ Step 6: Classify (Dense + Sigmoid/Softmax)
0.94 → Positive sentiment! āœ…

TensorFlow performs ALL of these steps in one model! TextVectorization handles Steps 1-3, Embedding handles Step 4.
šŸ“¦

2. TextVectorization — Preprocessing Inside the Model

Tokenize, encode, and pad text — all in one Keras layer

TextVectorization is a Keras layer that converts text strings to integer indices. Big advantage: preprocessing lives inside the model, so when you export the model, preprocessing comes with it — no separate preprocessing needed at inference time.

30_text_vectorization.py — TextVectorization Deep Dive
import tensorflow as tf
from tensorflow.keras import layers

# ===========================
# 1. Create TextVectorization layer
# ===========================
vectorizer = layers.TextVectorization(
    max_tokens=10000,            # vocabulary size (top 10k words)
    output_mode='int',            # output integer indices
    output_sequence_length=200,   # pad/truncate to 200 tokens
    standardize='lower_and_strip_punctuation',  # lowercase + remove punct
    split='whitespace',           # split on spaces (default)
)

# ===========================
# 2. Adapt — build vocabulary from training data
# ===========================
train_texts = [
    "I love this movie, it was amazing!",
    "Terrible film, waste of time.",
    "Great acting and wonderful story.",
    "The worst movie I have ever seen.",
    # ... thousands more
]

vectorizer.adapt(train_texts)  # builds vocabulary!

# Check vocabulary
vocab = vectorizer.get_vocabulary()
print(f"Vocab size: {len(vocab)}")
print(f"First 20: {vocab[:20]}")
# ['', '[UNK]', 'the', 'i', 'movie', 'was', 'this', ...]
# Index 0 = padding, Index 1 = unknown word

# ===========================
# 3. Vectorize text
# ===========================
sample = tf.constant(["I love this movie"])
encoded = vectorizer(sample)
print(encoded)
# tf.Tensor([[3, 42, 6, 4, 0, 0, 0, ... 0]], shape=(1, 200))
#            i  love this movie  pad  pad  pad    pad

# ===========================
# 4. Use INSIDE a model (BEST approach!)
# ===========================
model = tf.keras.Sequential([
    vectorizer,                          # text → integers (in-model!)
    layers.Embedding(10000, 64),         # integers → vectors
    layers.Bidirectional(layers.LSTM(64)),
    layers.Dense(64, activation='relu'),
    layers.Dense(1, activation='sigmoid')
])

# Train directly with raw strings!
# model.fit(train_texts, train_labels, epochs=10)
# At inference: model.predict(["Great movie!"]) → works directly!
# No separate preprocessing needed — it's ALL in the model.

# ===========================
# 5. output_mode options
# ===========================
# 'int'       → [42, 156, 8, 2041, 0, ...] (for LSTM/Transformer)
# 'multi_hot' → [0, 1, 0, 0, 1, 1, 0, ...] (bag of words)
# 'count'     → [0, 2, 0, 0, 1, 3, 0, ...] (word counts)
# 'tf_idf'    → [0, 0.7, 0, 0, 1.2, ...] (TF-IDF weights)

šŸŽ“ Why TextVectorization Inside the Model?
Without: you need separate preprocessing during training AND inference. Risk: a preprocessing mismatch between training and production → hidden bugs.
With: preprocessing = part of the model. Export the model (.keras or SavedModel) → preprocessing included. model.predict("raw text") works directly. This is the best practice for production NLP.

🧊

3. Embedding Layer — Words → Meaningful Vectors

Turning integer indices into dense vectors that capture word meaning

In the NN series Page 6, we discussed Word2Vec from scratch — how similar words have nearby vectors. The Embedding layer in Keras does the same thing: each word is mapped to a dense vector (e.g., 64 dimensions). These vectors are learned during training, so words with similar meanings end up with similar vectors.

31_embedding_layer.py — Embedding Deep Dive
import tensorflow as tf
from tensorflow.keras import layers
import numpy as np

# ===========================
# 1. How Embedding works
# ===========================
embedding = layers.Embedding(
    input_dim=10000,     # vocabulary size
    output_dim=64,       # embedding dimension
    input_length=200,    # sequence length (optional)
)

# Input: batch of word indices
x = tf.constant([[42, 156, 8, 2041, 0]])  # 1 sentence, 5 words
output = embedding(x)
print(output.shape)  # (1, 5, 64) — each word → 64-dim vector!

# Internally, this is just a LOOKUP TABLE:
# embedding.weights[0] has shape (10000, 64)
# Word 42 → row 42 of the table → [0.12, -0.34, ...]
# Word 156 → row 156 → [0.78, 0.23, ...]
# These rows are LEARNED during training!

print(f"Embedding matrix: {embedding.weights[0].shape}")
# (10000, 64) = 640,000 learnable parameters

# ===========================
# 2. Visualize learned embeddings
# ===========================
# After training, similar words have similar vectors:
# cosine_similarity("good", "great")    ≈ 0.85
# cosine_similarity("good", "terrible") ≈ -0.72
# vec("king") - vec("man") + vec("woman") ≈ vec("queen")

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Get vectors after training:
# weights = embedding.get_weights()[0]
# vec_good = weights[vocab.index("good")]
# vec_great = weights[vocab.index("great")]
# print(cosine_similarity(vec_good, vec_great))  # ≈ 0.85

# ===========================
# 3. Embedding dimension guidelines
# ===========================
# Vocab 1k-10k:   embedding_dim = 32-64
# Vocab 10k-50k:  embedding_dim = 64-128
# Vocab 50k-100k: embedding_dim = 128-256
# Rule of thumb: dim ≈ vocab^(1/4)
# Too small: can't capture nuances
# Too large: overfitting, slow training

# ===========================
# 4. Pre-trained embeddings (GloVe, Word2Vec)
# ===========================
# Load GloVe: https://nlp.stanford.edu/projects/glove/
# glove_matrix = np.zeros((10000, 100))  # build from GloVe file
# for word, idx in word_index.items():
#     if word in glove_dict:
#         glove_matrix[idx] = glove_dict[word]

# Use pre-trained weights:
# embedding = layers.Embedding(10000, 100,
#     embeddings_initializer=tf.keras.initializers.Constant(glove_matrix),
#     trainable=False)  # freeze! (or True to fine-tune)
One-Hot vs Embedding — Why Embedding Wins

One-Hot Encoding (sparse, high-dim):
  "cat" → [0, 0, 0, 1, 0, 0, ..., 0]   (10,000-dim, mostly zeros!)
  "dog" → [0, 0, 0, 0, 0, 1, ..., 0]   (no similarity info!)
  similarity("cat", "dog") = 0  ← WRONG! They're both animals!

Embedding (dense, low-dim):
  "cat" → [0.32, -0.18, 0.74, ..., 0.12]   (64-dim, every value matters!)
  "dog" → [0.28, -0.21, 0.71, ..., 0.15]   (similar to cat!)
  similarity("cat", "dog") = 0.93  ← CORRECT! Both are animals.

Embedding advantages:
āœ… Low-dimensional (64 vs 10,000)
āœ… Captures semantic similarity
āœ… Learned from data (or pre-trained)
āœ… Works as input to LSTM/Transformer
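The lookup-table view from the comments above can be verified with a few lines of plain NumPy (toy sizes and random values, purely illustrative):

```python
import numpy as np

# Toy embedding matrix: vocabulary of 5 words, 4-dim vectors
rng = np.random.default_rng(0)
embedding_matrix = rng.normal(size=(5, 4)).astype("float32")

# A "sentence" of word indices — embedding is plain row lookup (gather)
sentence = np.array([3, 1, 4])
vectors = embedding_matrix[sentence]

print(vectors.shape)                                 # (3, 4)
print(np.allclose(vectors[0], embedding_matrix[3]))  # True
```

During training, backprop only updates the rows that were actually looked up, which is why embedding layers scale well to large vocabularies.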
šŸ”„

4. RNN & LSTM in Keras — Processing Sequences

In the NN series Page 5, we built an LSTM manually. Now: one line of Keras.

In the Neural Network series Page 5, we implemented VanillaRNN and LSTM from scratch — forget gate, input gate, output gate, cell state — hundreds of lines of NumPy. Now in Keras: layers.LSTM(64). But the concepts are identical.

32_lstm_keras.py — LSTM & GRU in Keras
import tensorflow as tf
from tensorflow.keras import layers

# ===========================
# 1. Simple LSTM
# ===========================
# Input: (batch, timesteps, features)
# For text: (batch, seq_len, embedding_dim)

lstm_layer = layers.LSTM(
    units=64,                   # output dimension (hidden size)
    return_sequences=False,     # only return LAST output
    # return_sequences=True,     # return ALL timestep outputs
    dropout=0.2,                # dropout on inputs
    recurrent_dropout=0.2,      # dropout on recurrent state
)
# Input:  (batch, 200, 64)  → 200 timesteps, 64 features
# Output: (batch, 64)       → last hidden state (return_sequences=False)
# Output: (batch, 200, 64)  → all hidden states (return_sequences=True)

# ===========================
# 2. Stacking LSTM layers
# ===========================
model = tf.keras.Sequential([
    layers.Embedding(10000, 64, input_length=200),

    # Layer 1: return ALL timestep outputs → feed to next LSTM
    layers.LSTM(128, return_sequences=True, dropout=0.2),

    # Layer 2: return only LAST output → feed to Dense
    layers.LSTM(64, return_sequences=False, dropout=0.2),

    layers.Dense(64, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(1, activation='sigmoid')
])

# CRITICAL: return_sequences
# When stacking LSTMs: all layers EXCEPT the last need return_sequences=True
# Last LSTM: return_sequences=False (or use GlobalAveragePooling1D)

# ===========================
# 3. Parameter count
# ===========================
# LSTM(64) with input_dim=64:
# Parameters = 4 Ɨ ((input_dim + units + 1) Ɨ units)
#            = 4 Ɨ ((64 + 64 + 1) Ɨ 64)
#            = 4 Ɨ 8,256 = 33,024
# The "4" = 4 gates: forget, input, cell candidate, output

model.summary()
# Embedding:  10000 Ɨ 64 = 640,000
# LSTM 128:   4 Ɨ (64+128+1) Ɨ 128 = 98,816
# LSTM 64:    4 Ɨ (128+64+1) Ɨ 64 = 49,408
# Dense:      64 Ɨ 64 + 64 = 4,160
# Total:      ~792k parameters

šŸŽ“ return_sequences: When True vs False?
return_sequences=False (default): output only the hidden state from the last timestep. Shape: (batch, units). Use when: last LSTM layer before Dense, or for classification.
return_sequences=True: output the hidden state from every timestep. Shape: (batch, timesteps, units). Use when: stacking LSTM layers, or for sequence-to-sequence tasks (translation, tagging).

Rule: Stacking LSTMs? All return_sequences=True except the last LSTM.
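A quick sketch to confirm the two output shapes on random inputs (assumes TensorFlow is installed; the numbers are arbitrary):

```python
import tensorflow as tf
from tensorflow.keras import layers

x = tf.random.normal((2, 10, 8))  # (batch=2, timesteps=10, features=8)

last_only = layers.LSTM(16)(x)                         # return_sequences=False
all_steps = layers.LSTM(16, return_sequences=True)(x)  # every timestep

print(last_only.shape)  # (2, 16)
print(all_steps.shape)  # (2, 10, 16)
```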

⚔

5. GRU — A Lighter LSTM Alternative

2 gates vs 4 gates — faster, often just as good

33_gru.py — GRU vs LSTM
import tensorflow as tf
from tensorflow.keras import layers

# GRU: 2 gates (reset, update) vs LSTM: 3 gates + cell state
gru_model = tf.keras.Sequential([
    layers.Embedding(10000, 64),
    layers.GRU(64, return_sequences=True, dropout=0.2),
    layers.GRU(32, dropout=0.2),
    layers.Dense(1, activation='sigmoid')
])

# GRU(64) params  = 3 Ɨ ((64 + 64 + 1) Ɨ 64) = 24,768 (reset_after=False)
# (Keras's default reset_after=True adds an extra bias set → 24,960)
# LSTM(64) params = 4 Ɨ ((64 + 64 + 1) Ɨ 64) = 33,024
# GRU has ~25% fewer parameters → faster training!

# ===========================
# When to use which?
# ===========================
# GRU:  faster, fewer params, good for shorter sequences
#       Try GRU first for speed, switch to LSTM if needed
# LSTM: more expressive (separate cell state), better for
#       very long sequences, slightly more accurate on some tasks
# In practice: difference is often < 1% accuracy
Aspect         | LSTM                               | GRU
Gates          | 3 (forget, input, output) + cell   | 2 (reset, update)
Parameters     | 4 Ɨ (input + hidden + 1) Ɨ hidden  | 3 Ɨ (input + hidden + 1) Ɨ hidden
Speed          | Slower                             | ~25% faster
Memory         | Separate cell state                | Combined hidden/cell
Long sequences | Slightly better                    | Good enough
Recommendation | Default for NLP                    | Try first for speed
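The parameter formulas in the table can be checked with plain arithmetic. Note this uses the reset_after=False formulation from the comments above; Keras's GRU default (reset_after=True) adds one extra bias set per gate, so model.summary() reports slightly more:

```python
def lstm_params(input_dim, units):
    # 4 gates: forget, input, cell candidate, output
    return 4 * ((input_dim + units + 1) * units)

def gru_params(input_dim, units):
    # 3 gate/candidate blocks (reset_after=False formulation)
    return 3 * ((input_dim + units + 1) * units)

print(lstm_params(64, 64))   # 33024
print(gru_params(64, 64))    # 24768
print(f"{1 - gru_params(64, 64) / lstm_params(64, 64):.0%} fewer")  # 25% fewer
```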
↔ļø

6. Bidirectional LSTM — Two-Way Context

Read the sequence forward AND backward — capture context from both directions

A regular LSTM only reads left to right. But in language, context often comes from both directions. Example: "The movie was not good" — the word "not" changes the meaning of "good" that comes after it. A Bidirectional LSTM runs two LSTMs in parallel: one forward (→) and one backward (←), then combines both outputs.

34_bidirectional.py — BiLSTM Complete
from tensorflow.keras import layers
import tensorflow as tf

# ===========================
# 1. Bidirectional wrapper
# ===========================
bilstm = layers.Bidirectional(
    layers.LSTM(64, return_sequences=True),
    merge_mode='concat'  # default: concatenate forward + backward
    # merge_mode='sum'   # add forward + backward
    # merge_mode='mul'   # multiply
    # merge_mode='ave'   # average
)
# Input:  (batch, 200, 64)    → 200 timesteps, 64 features
# Output: (batch, 200, 128)   → 64 forward + 64 backward = 128!

# ===========================
# 2. Full BiLSTM model for text classification
# ===========================
model = tf.keras.Sequential([
    layers.Embedding(10000, 64, input_length=200),

    # BiLSTM layer 1: captures bidirectional patterns
    layers.Bidirectional(layers.LSTM(64, return_sequences=True,
                                     dropout=0.2, recurrent_dropout=0.2)),
    # Output: (batch, 200, 128) — 64 forward + 64 backward

    # BiLSTM layer 2: further refine
    layers.Bidirectional(layers.LSTM(32, dropout=0.2)),
    # Output: (batch, 64) — 32 forward + 32 backward

    # Classifier
    layers.Dense(64, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(1, activation='sigmoid')
])

model.compile(optimizer='adam', loss='binary_crossentropy',
              metrics=['accuracy'])
model.summary()
# Embedding: 640k | BiLSTM1: 66k | BiLSTM2: 25k | Dense: 4k
# Total: ~735k parameters

# ===========================
# 3. How Bidirectional helps
# ===========================
# "The movie was NOT good"
# Forward LSTM (→): sees "not" BEFORE "good" → context captured
# Backward LSTM (←): sees "good" BEFORE "not" → also captured!
# Combined: model understands "not good" = negative from BOTH sides

# Compare accuracy:
# Unidirectional LSTM: ~84% on IMDB
# Bidirectional LSTM:  ~87% on IMDB ← +3% from bidirectionality!
šŸŽ¬

7. Project: IMDB Sentiment Classifier — 87%+ Accuracy

25,000 positive reviews + 25,000 negative — binary sentiment classification

35_imdb_complete.py — IMDB Sentiment Analysis šŸ”„
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import numpy as np

# ===========================
# 1. LOAD IMDB DATASET
# ===========================
VOCAB_SIZE = 10000    # top 10k most common words
MAX_LEN = 200         # pad/truncate to 200 words

(X_train, y_train), (X_test, y_test) = keras.datasets.imdb.load_data(
    num_words=VOCAB_SIZE)

print(f"Train: {len(X_train)} reviews")   # 25,000
print(f"Test:  {len(X_test)} reviews")    # 25,000
print(f"Sample lengths: {[len(x) for x in X_train[:5]]}")
# [218, 189, 141, 550, 147] — variable length!

# Pad sequences to fixed length
X_train = keras.utils.pad_sequences(X_train, maxlen=MAX_LEN,
                                     padding='post', truncating='post')
X_test = keras.utils.pad_sequences(X_test, maxlen=MAX_LEN,
                                    padding='post', truncating='post')
print(f"After padding: {X_train.shape}")  # (25000, 200)

# ===========================
# 2. BUILD BiLSTM MODEL
# ===========================
model = keras.Sequential([
    # Embedding: word index → dense vector
    layers.Embedding(VOCAB_SIZE, 64, input_length=MAX_LEN,
                     mask_zero=True),  # mask padding (0s)!

    # BiLSTM layers
    layers.Bidirectional(layers.LSTM(64, return_sequences=True,
                                     dropout=0.2, recurrent_dropout=0.2)),
    layers.Bidirectional(layers.LSTM(32, dropout=0.2)),

    # Classifier
    layers.Dense(64, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(1, activation='sigmoid')
])

# ===========================
# 3. COMPILE
# ===========================
model.compile(
    optimizer=keras.optimizers.Adam(1e-3),
    loss='binary_crossentropy',
    metrics=['accuracy']
)
model.summary()

# ===========================
# 4. TRAIN
# ===========================
history = model.fit(
    X_train, y_train,
    epochs=15,
    batch_size=64,
    validation_split=0.2,
    callbacks=[
        keras.callbacks.EarlyStopping(patience=3, restore_best_weights=True),
        keras.callbacks.ReduceLROnPlateau(factor=0.5, patience=2)
    ]
)

# ===========================
# 5. EVALUATE
# ===========================
test_loss, test_acc = model.evaluate(X_test, y_test, verbose=0)
print(f"\nšŸŽ¬ Test Accuracy: {test_acc:.1%}")
# šŸŽ¬ Test Accuracy: 87.2% with BiLSTM! šŸŽ‰
# Compare: simple LSTM = 84%, BiLSTM = 87%, BERT (Page 6) = 95%+

# ===========================
# 6. PREDICT on new reviews
# ===========================
word_index = keras.datasets.imdb.get_word_index()
# load_data shifts all indices by 3 (0=pad, 1=start, 2=unknown)
reverse_index = {v + 3: k for k, v in word_index.items()}

def encode_review(text):
    words = text.lower().split()
    # Shift by 3 to match load_data; unknown / out-of-vocab words → 2
    encoded = [word_index[w] + 3
               if w in word_index and word_index[w] + 3 < VOCAB_SIZE
               else 2
               for w in words]
    # Use the SAME padding settings as training!
    return keras.utils.pad_sequences([encoded], maxlen=MAX_LEN,
                                     padding='post', truncating='post')

review = "This film was absolutely terrible waste of time"
pred = model.predict(encode_review(review))[0, 0]
print(f"'{review}'")
print(f"Sentiment: {'Positive' if pred > 0.5 else 'Negative'} ({pred:.1%})")
# Sentiment: Negative āœ“

šŸŽ¬ 87.2% Accuracy with BiLSTM!
Compare our accuracy evolution:
• NN series (manual NumPy): ~80% (hundreds of lines of code)
• Simple LSTM in Keras: ~84% (20 lines of code)
• BiLSTM in Keras: 87.2% (25 lines of code)
• BERT fine-tuned (Page 6): 95%+ (coming next!)
Each technique brings an improvement — but BERT will be a total game-changer.

šŸ“

8. Padding & Masking — Variable-Length Sequences

Handling sentences of different lengths without polluting the model

36_padding_masking.py — Padding & Masking
import tensorflow as tf
from tensorflow.keras import layers

# ===========================
# 1. Padding — make all sequences same length
# ===========================
sequences = [[4, 2, 8],               # 3 words
             [1, 5, 9, 3, 7],        # 5 words
             [6]]                     # 1 word

padded = tf.keras.utils.pad_sequences(sequences, maxlen=5,
    padding='post',      # add zeros at END (recommended for RNN)
    truncating='post',   # if too long, cut from END
    value=0              # padding value (0 = default)
)
print(padded)
# [[4, 2, 8, 0, 0],
#  [1, 5, 9, 3, 7],
#  [6, 0, 0, 0, 0]]

# ===========================
# 2. Masking — tell model to IGNORE padding
# ===========================
# Method 1: mask_zero=True in Embedding
embedding = layers.Embedding(10000, 64, mask_zero=True)
# This tells downstream layers: "index 0 = padding, ignore it!"
# LSTM and GRU automatically use this mask → skip padded timesteps

# Method 2: Masking layer (explicit)
model = tf.keras.Sequential([
    layers.Embedding(10000, 64),
    layers.Masking(mask_value=0.0),   # explicit mask on zero vectors
    layers.LSTM(64),
    layers.Dense(1, activation='sigmoid')
])

# ===========================
# 3. Why masking matters
# ===========================
# Without masking: LSTM processes padding tokens as real data
#   → "I love this movie 0 0 0 0 0 0" 
#   → LSTM "reads" 6 zeros → pollutes hidden state!
# With masking: LSTM skips padding tokens completely
#   → Only processes "I love this movie" → cleaner output!
# Impact: +1-2% accuracy improvement, especially for short texts
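The effect of mask_zero=True can be inspected directly via the layer's compute_mask method (assumes TensorFlow is installed; indices are arbitrary):

```python
import tensorflow as tf
from tensorflow.keras import layers

emb = layers.Embedding(100, 8, mask_zero=True)
x = tf.constant([[5, 3, 0, 0]])   # two real tokens followed by two pads

mask = emb.compute_mask(x)
print(mask.numpy())   # [[ True  True False False]]
# Downstream LSTM/GRU layers receive this mask and skip the False steps.
```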
🌐

9. Pre-trained Embeddings — GloVe & TF Hub

Leverage embeddings already trained on billions of words

37_pretrained_embeddings.py — GloVe & TF Hub
import tensorflow as tf
import numpy as np

# ===========================
# Method 1: Load GloVe embeddings
# Download: https://nlp.stanford.edu/projects/glove/
# ===========================
def load_glove(filepath, embedding_dim=100):
    """Load GloVe vectors from file"""
    embeddings = {}
    with open(filepath, encoding='utf-8') as f:
        for line in f:
            values = line.split()
            word = values[0]
            vector = np.asarray(values[1:], dtype='float32')
            embeddings[word] = vector
    print(f"Loaded {len(embeddings)} word vectors")
    return embeddings

# glove = load_glove('glove.6B.100d.txt')  # 400k words, 100-dim

# Build embedding matrix for your vocabulary
def build_embedding_matrix(word_index, glove, vocab_size, embed_dim):
    matrix = np.zeros((vocab_size, embed_dim))
    for word, idx in word_index.items():
        if idx < vocab_size and word in glove:
            matrix[idx] = glove[word]
    return matrix

# embedding_matrix = build_embedding_matrix(word_index, glove, 10000, 100)

# Use in model (freeze or fine-tune):
# embedding = layers.Embedding(10000, 100,
#     embeddings_initializer=tf.keras.initializers.Constant(embedding_matrix),
#     trainable=False)  # freeze pre-trained weights

# ===========================
# Method 2: TF Hub — pre-trained sentence encoders
# ===========================
# Universal Sentence Encoder — maps ANY sentence to 512-dim vector
# import tensorflow_hub as hub
# 
# embed = hub.KerasLayer(
#     "https://tfhub.dev/google/universal-sentence-encoder/4",
#     trainable=False)  # 512-dim output per sentence
# 
# # Super simple classifier:
# model = tf.keras.Sequential([
#     embed,                                  # sentence → 512-dim
#     layers.Dense(64, activation='relu'),
#     layers.Dropout(0.3),
#     layers.Dense(1, activation='sigmoid')
# ])
# → 90%+ accuracy with minimal code!
# → Works with raw string input — no tokenization needed!

# ===========================
# Method 3: NNLM (Neural Network Language Model)
# ===========================
# embed = hub.KerasLayer(
#     "https://tfhub.dev/google/nnlm-en-dim128/2",
#     trainable=True)  # 128-dim, fine-tunable

šŸŽ“ When to Use Pre-trained Embeddings?
Small dataset (<10k samples): always use pre-trained! Your model doesn't have enough data to learn good embeddings from scratch.
Large dataset (>100k samples): you can train from scratch, but pre-trained + fine-tune is often still better.
TF Hub USE vs GloVe: USE is easier (input = raw strings), GloVe is more flexible (per-word embeddings). For quick classification: USE. For research: GloVe/Word2Vec.
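The build_embedding_matrix helper above can be exercised with a toy stand-in for the GloVe dict (all vector values here are made up for illustration; real vectors come from the downloaded GloVe file):

```python
import numpy as np

# Hypothetical mini "GloVe" dict — 4-dim for brevity
glove = {
    "good": np.array([0.1, 0.2, 0.3, 0.4], dtype="float32"),
    "bad":  np.array([0.5, -0.1, 0.0, 0.2], dtype="float32"),
}
word_index = {"good": 2, "bad": 3, "rareword": 4}

vocab_size, embed_dim = 5, 4
matrix = np.zeros((vocab_size, embed_dim), dtype="float32")
for word, idx in word_index.items():
    if idx < vocab_size and word in glove:
        matrix[idx] = glove[word]

print(matrix[2])   # GloVe vector for "good"
print(matrix[4])   # all zeros — "rareword" is not in GloVe
```

Rows for out-of-GloVe words stay zero; with trainable=True they can still be learned during fine-tuning.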

šŸ”§

10. Complete NLP Pipeline — Raw Text → Prediction

Production template: TextVectorization + Embedding + BiLSTM + inference

38_nlp_pipeline.py — Production NLP Template šŸ”„
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# ═══════════════════════════════════════
# šŸ”„ PRODUCTION NLP PIPELINE TEMPLATE
# Input: raw text strings
# Output: sentiment/class prediction
# ═══════════════════════════════════════

# Config
VOCAB_SIZE = 20000
MAX_LEN = 200
EMBED_DIM = 64

# 1. Vectorizer (adapt on training data)
# Assumes train_texts (list of strings) and train_labels are defined
vectorizer = layers.TextVectorization(
    max_tokens=VOCAB_SIZE,
    output_sequence_length=MAX_LEN)
vectorizer.adapt(train_texts)  # build vocabulary

# 2. Model (preprocessing INSIDE!)
model = keras.Sequential([
    vectorizer,                                      # text → integers
    layers.Embedding(VOCAB_SIZE, EMBED_DIM, mask_zero=True),
    layers.Bidirectional(layers.LSTM(64, return_sequences=True,
                                     dropout=0.2)),
    layers.Bidirectional(layers.LSTM(32, dropout=0.2)),
    layers.Dense(64, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(1, activation='sigmoid')          # binary
    # layers.Dense(NUM_CLASSES, activation='softmax')  # multi-class
])

# 3. Train with RAW STRINGS!
model.compile(optimizer='adam', loss='binary_crossentropy',
              metrics=['accuracy'])

model.fit(
    train_texts, train_labels,   # raw strings + labels!
    epochs=15, batch_size=64,
    validation_split=0.2,
    callbacks=[keras.callbacks.EarlyStopping(patience=3,
               restore_best_weights=True)]
)

# 4. Predict on raw text — NO preprocessing needed!
reviews = [
    "This movie was absolutely fantastic! Best film of the year.",
    "Terrible waste of time. Worst movie ever made.",
    "It was okay, nothing special but not bad either.",
]

predictions = model.predict(reviews)
for review, pred in zip(reviews, predictions):
    sentiment = "Positive" if pred[0] > 0.5 else "Negative"
    print(f"{sentiment} ({pred[0]:.1%}): {review[:50]}...")
# Positive (94.2%): This movie was absolutely fantastic! Best fi...
# Negative (3.1%):  Terrible waste of time. Worst movie ever ma...
# Positive (61.3%): It was okay, nothing special but not bad ei...

# 5. Save — preprocessing included!
model.save("sentiment_analyzer.keras")
# loaded = keras.models.load_model("sentiment_analyzer.keras")
# loaded.predict(["Great movie!"]) → works with raw text! āœ…
šŸ“

11. Page 5 Summary

Everything we learned
Concept             | What It Is                         | Key Code
TextVectorization   | Text → integers inside the model   | TextVectorization(max_tokens=10000)
Embedding           | Integer → meaningful dense vector  | Embedding(10000, 64, mask_zero=True)
LSTM                | Sequence model: 3 gates + cell     | LSTM(64, return_sequences=True)
GRU                 | Lighter LSTM: 2 gates              | GRU(64, dropout=0.2)
Bidirectional       | Read forward + backward            | Bidirectional(LSTM(64))
Padding             | Make sequences the same length     | pad_sequences(X, maxlen=200)
Masking             | Ignore padding in the model        | mask_zero=True
Pre-trained         | Already-trained embeddings         | GloVe, TF Hub USE, NNLM
Production Pipeline | Raw text → prediction              | TextVectorization → Embedding → BiLSTM
← Page Sebelumnya← Previous Page

Page 4 — tf.data Pipeline & Performance


šŸ“˜

Coming Next: Page 6 — Transformer & BERT in TensorFlow

The modern architecture that revolutionized NLP: the Multi-Head Attention layer (built into Keras!), building a Transformer Encoder from building blocks, Positional Encoding, fine-tuning BERT from TF Hub, Hugging Face + TensorFlow integration, 95%+ text classification with BERT, and an LSTM vs Transformer comparison. The game-changer that makes LSTM look outdated!