๐Ÿ“ Artikel ini ditulis dalam Bahasa Indonesia & English
๐Ÿ“ This article is available in English & Bahasa Indonesia

🧠 Learn Hugging Face — Page 3

Fine-Tuning GPT &
Text Generation



From BERT (understanding text) to GPT (generating text). Page 3 covers in super detail: the fundamental difference between encoder (BERT) vs decoder (GPT) — architecture, attention masks, and use cases; Causal Language Modeling (CLM) — how GPT learns to predict the next word; text generation from GPT-2 using pipeline and manual generate(); EVERY generation parameter explained (temperature, top_k, top_p, repetition_penalty, beam search, sampling strategies); fine-tuning GPT-2 on a custom text corpus (poetry, code, dialogue); Instruction Tuning — turning GPT into an instruction-following assistant; prompt templates and formatting; building a simple CLI chatbot with fine-tuned GPT-2; where to run it (Colab setup for GPT); and a generative model comparison (GPT-2, Bloom, LLaMA, Mistral).

📅 March 2026 · ⏱ 45 min read
๐Ÿท GPT-2Text GenerationCausal LMTemperatureTop-KTop-PBeam SearchInstruction TuningChatbot
📚 Learn Hugging Face Series:

📑 Table of Contents — Page 3

  1. Encoder vs Decoder — BERT vs GPT: two different worlds
  2. Causal Language Modeling — How GPT learns
  3. Where to Run GPT? — Colab setup, VRAM, model sizes
  4. Text Generation with Pipeline — 1 line → generate text
  5. Manual generate() — Full control: token by token
  6. EVERY Generation Parameter — temperature, top_k, top_p, beam, etc.
  7. Sampling Strategies Visual — Why temperature 0.7 vs 1.5 differs drastically
  8. Fine-Tuning GPT-2 on Custom Corpus — Poetry, code, dialogue
  9. Instruction Tuning — GPT → instruction-following assistant
  10. Project: Simple CLI Chatbot — Interactive fine-tuned GPT-2
  11. Other Generative Models — Bloom, LLaMA, Mistral, Gemma
  12. Summary & Page 4 Preview

1. Encoder vs Decoder — BERT vs GPT: Two Different Worlds

BERT reads ALL words at once. GPT reads one by one, left to right.


On Page 2, we fine-tuned BERT (an encoder) to understand text — sentiment classification, NER, QA. Now we switch to GPT (a decoder) to generate text — writing stories, answering questions, coding. The difference isn't just the task — the architecture is fundamentally different.

BERT (Encoder) vs GPT (Decoder) — The Fundamental Difference

BERT (Encoder) — Bidirectional
─────────────────────────────────────────────────────
Input: "The movie was [MASK] good"
Attention: EVERY word attends to ALL other words (bidirectional)
  "The"    attends to → "The" "movie" "was" "[MASK]" "good"
  "movie"  attends to → "The" "movie" "was" "[MASK]" "good"
  "was"    attends to → "The" "movie" "was" "[MASK]" "good"
  "[MASK]" attends to → "The" "movie" "was" "[MASK]" "good"
  "good"   attends to → "The" "movie" "was" "[MASK]" "good"
Task: guess [MASK] → "really"  (Masked Language Model)
Use:  Classification, NER, QA, Similarity ← UNDERSTANDING text

GPT (Decoder) — Causal / Left-to-Right
─────────────────────────────────────────────────────
Input: "The movie was really"
Attention: each word attends ONLY to PREVIOUS words (causal)
  "The"    attends to → "The"
  "movie"  attends to → "The" "movie"
  "was"    attends to → "The" "movie" "was"
  "really" attends to → "The" "movie" "was" "really"
  → Predict next: "good"  ← the next word!
Task: predict the NEXT word → "good"  (Causal Language Model)
Use:  Text generation, Chat, Code, Story ← GENERATING text

Key Insight:
  BERT sees the future (bidirectional) → great for UNDERSTANDING
  GPT does NOT see the future (causal) → great for GENERATING
  (because at generation time, the next word doesn't exist yet!)
Aspect | BERT (Encoder) | GPT (Decoder) | T5 (Encoder-Decoder)
Attention | Bidirectional (sees all) | Causal (sees left only) | Encoder: bi, Decoder: causal
Pre-training | Masked LM: guess [MASK] | Next token: predict next | Span corruption
Output | Representation (embedding) | Next token probability | Sequence output
Best For | Classification, NER, QA | Generation, Chat, Code | Translation, Summarization
Model Examples | BERT, RoBERTa, DeBERTa | GPT-2, LLaMA, Mistral | T5, BART, mBART
HF Auto Class | AutoModelForSequenceClassification | AutoModelForCausalLM | AutoModelForSeq2SeqLM
Page in This Series | Page 2 (fine-tune BERT) | Page 3 (this!) | Page 4 (T5, translation)


🎓 Why Can't GPT "See the Future"?
Imagine you're writing a sentence — you write one word at a time, left to right. When writing word 5, word 6 doesn't exist yet! GPT works exactly like this: it predicts the next word based on previous words only.

If GPT could see ahead (like BERT), it would "cheat" — no need to learn prediction, just copy from the future. This is why the causal attention mask is so important: it blocks information from future positions during training and inference.
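The causal mask described above is just a triangular pattern over attention scores: future positions get -inf before the softmax, so they receive zero weight. A minimal pure-Python sketch of the idea (not the actual GPT-2 implementation):

```python
import math

def causal_attention_weights(scores):
    """Apply a causal mask to a (seq_len x seq_len) score matrix, then softmax each row.
    Position i may attend only to positions 0..i; future positions get -inf."""
    n = len(scores)
    weights = []
    for i, row in enumerate(scores):
        masked = [row[j] if j <= i else float("-inf") for j in range(n)]
        m = max(masked[: i + 1])
        exps = [math.exp(x - m) if x != float("-inf") else 0.0 for x in masked]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

# Uniform scores: each token attends equally to all VISIBLE (past + self) tokens
attn = causal_attention_weights([[0.0] * 4 for _ in range(4)])
for row in attn:
    print([round(w, 2) for w in row])
# Row 0 attends only to token 0; row 3 attends uniformly to tokens 0..3.
```

Each row still sums to 1, but probability mass can only land on the left of (and including) the diagonal — exactly the "no peeking at the future" rule.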


2. Causal Language Modeling — How GPT Learns

The simplest yet most powerful task: predict the next word, over and over


Causal Language Modeling (CLM) is GPT's training task: given a sequence of words, predict the next word. This is repeated for every position in the sequence. Example: from the sentence "I love eating fried rice", GPT learns:

Causal Language Modeling — Predict the Next Word

Input: "I love eating fried rice"

Training targets (for EVERY position):
  Input tokens:                     → Predict:
  ─────────────────────────────────   ─────────
  [BOS]                             → "I"
  [BOS] I                           → "love"
  [BOS] I love                      → "eating"
  [BOS] I love eating               → "fried"
  [BOS] I love eating fried         → "rice"
  [BOS] I love eating fried rice    → [EOS]

Loss = CrossEntropy between the predicted token and the actual next token,
averaged over ALL positions in the sequence.

This is what makes CLM powerful:
  From ONE sentence, GPT gets 6 training examples!
  From Wikipedia (3.3 billion words) → billions of training examples.
  GPT learns grammar, facts, reasoning — all from next-word prediction.
18_causal_lm_concept.py — Causal LM Loss Calculation (Python)
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# ===========================
# 1. Load GPT-2
# ===========================
model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# GPT-2 sizes:
# "gpt2"        → 117M params, ~500MB  (small, fits Colab easily)
# "gpt2-medium" → 345M params, ~1.4GB  (medium)
# "gpt2-large"  → 774M params, ~3.1GB  (large, tight on T4)
# "gpt2-xl"     → 1.5B params, ~6.2GB  (XL, needs >16GB VRAM)

# ===========================
# 2. Tokenize & compute loss
# ===========================
text = "The capital of France is Paris"
inputs = tokenizer(text, return_tensors="pt")

# For CLM: labels = input_ids (shifted internally by the model!)
# Model predicts token[i+1] from tokens[0:i]
outputs = model(**inputs, labels=inputs["input_ids"])

print(f"Loss: {outputs.loss.item():.4f}")
# Loss: 3.2145 (lower = better at predicting next tokens)
print(f"Perplexity: {torch.exp(outputs.loss).item():.2f}")
# Perplexity: 24.89 (lower = better, 1.0 = perfect prediction)

# ===========================
# 3. What the model sees internally:
# ===========================
tokens = tokenizer.tokenize(text)
print(f"Tokens: {tokens}")
# ['The', 'Ġcapital', 'Ġof', 'ĠFrance', 'Ġis', 'ĠParis']
# Ġ = space prefix (GPT-2 BPE tokenizer)

# Model internally shifts labels:
# Position 0: sees "The"                    → should predict "capital"
# Position 1: sees "The capital"            → should predict "of"
# Position 2: sees "The capital of"         → should predict "France"
# Position 3: sees "The capital of France"  → should predict "is"
# Position 4: sees "The capital of France is" → should predict "Paris"
# Loss = average cross-entropy over all positions

# ===========================
# 4. Check what GPT-2 predicts at each position
# ===========================
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits  # (1, seq_len, vocab_size=50257)

for i in range(len(tokens) - 1):
    predicted_id = logits[0, i].argmax().item()
    predicted_token = tokenizer.decode(predicted_id)
    actual_token = tokens[i + 1]
    match = "✅" if predicted_token.strip() == actual_token.replace("Ġ", "") else "❌"
    context = " ".join(tokens[:i+1]).replace("Ġ", "")
    print(f"  '{context}' → predicted: '{predicted_token}' | actual: '{actual_token}' {match}")
# 'The'                        → predicted: ' first'   | actual: 'Ġcapital' ❌
# 'The capital'                → predicted: ' of'      | actual: 'Ġof'      ✅
# 'The capital of'             → predicted: ' the'     | actual: 'ĠFrance'  ❌
# 'The capital of France'      → predicted: ' is'      | actual: 'Ġis'      ✅
# 'The capital of France is'   → predicted: ' Paris'   | actual: 'ĠParis'   ✅
# GPT-2 knows Paris is the capital of France! 🎉

3. Where to Run GPT? — Setup, VRAM, and Limitations

GPT-2 small fits on free Colab. GPT-2 medium/large needs tricks. LLaMA needs cloud.
Model | Params | VRAM Inference | VRAM Fine-Tune FP16 | Colab T4 (16GB)?
GPT-2 small | 117M | ~1 GB | ~5 GB | ✅ Very comfortable
GPT-2 medium | 345M | ~2 GB | ~13 GB | ⚠️ Small batch + grad accum
GPT-2 large | 774M | ~4 GB | >16 GB | ❌ Needs gradient checkpointing
Bloom-560M | 560M | ~2 GB | ~14 GB | ⚠️ Tight
LLaMA 3.2 1B | 1B | ~4 GB | ~16 GB (LoRA) | ⚠️ LoRA only (Page 8)
Mistral 7B | 7B | ~15 GB | ~40 GB | ❌ Needs A100
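The inference numbers in this table roughly follow params × bytes-per-param (FP32 = 4 bytes, FP16 = 2) plus some runtime overhead. A back-of-the-envelope helper (a rough sketch with an assumed 1.2× overhead factor, not an exact measurement — activations and KV-cache add more at runtime):

```python
def est_memory_gb(n_params: float, bytes_per_param: int = 4, overhead: float = 1.2) -> float:
    """Rough weight-memory estimate in GB: params x bytes x overhead."""
    return n_params * bytes_per_param * overhead / 1e9

# Parameter counts from the GPT-2 model cards
for name, params in [("gpt2", 124e6), ("gpt2-medium", 355e6), ("gpt2-large", 774e6)]:
    print(f"{name:12s} FP32 ~ {est_memory_gb(params):.1f} GB | FP16 ~ {est_memory_gb(params, 2):.1f} GB")
```

This is why halving precision (FP32 → FP16) roughly halves the VRAM needed for weights — the trick used throughout this series to squeeze bigger models onto a T4.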
Colab Setup for GPT-2 (Python)
# Cell 1: Verify GPU
!nvidia-smi
import torch
print(f"GPU: {torch.cuda.get_device_name(0)}, VRAM: {torch.cuda.get_device_properties(0).total_memory/1e9:.1f} GB")

# Cell 2: Install
!pip install -q transformers datasets accelerate

# Cell 3: Test GPT-2 inference (< 1 minute download)
from transformers import pipeline
generator = pipeline("text-generation", model="gpt2", device=0)
print(generator("Artificial intelligence will", max_new_tokens=30)[0]["generated_text"])
# ✅ Ready! GPT-2 small = ~500MB, fits easily on T4

# IMPORTANT: the GPT-2 tokenizer has NO pad token!
# It must be set manually before fine-tuning:
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # ← REQUIRED for GPT-2!
# Without this: a padding error when fine-tuning


💡 All code on this page uses GPT-2 small (117M), which runs comfortably on free Google Colab. Inference: <1 second per generation. Fine-tuning: ~10-20 minutes. Want a smarter model? Switch to "gpt2-medium" (same syntax, smaller batch needed) or wait for Page 8 (LoRA for LLaMA/Mistral).


4. Text Generation with Pipeline — 1-Line Magic

Give a prompt, GPT continues it — that simple
19_text_generation_pipeline.py — Pipeline Generation 🔥 (Python)
from transformers import pipeline

# ===========================
# 1. Basic generation
# ===========================
generator = pipeline("text-generation", model="gpt2", device=0)

result = generator("The future of artificial intelligence is", max_new_tokens=50)
print(result[0]["generated_text"])
# "The future of artificial intelligence is not just about the technology,
#  but about how we use it. The question is whether we can build systems..."

# ===========================
# 2. Multiple completions
# ===========================
results = generator(
    "Once upon a time in Jakarta,",
    max_new_tokens=80,
    num_return_sequences=3,     # generate 3 different completions!
    do_sample=True,              # enable random sampling
    temperature=0.8,             # creativity level
)
for i, r in enumerate(results):
    print(f"\n--- Completion {i+1} ---")
    print(r["generated_text"])
# Each completion is different! (random sampling)

# ===========================
# 3. Different generation strategies
# ===========================
# Deterministic (greedy — always the same)
result_greedy = generator("AI is", max_new_tokens=20, do_sample=False)

# Creative (high temperature sampling)
result_creative = generator("AI is", max_new_tokens=20,
    do_sample=True, temperature=1.2, top_p=0.9)

# Focused (low temperature)
result_focused = generator("AI is", max_new_tokens=20,
    do_sample=True, temperature=0.3)

print(f"Greedy:   {result_greedy[0]['generated_text']}")
print(f"Creative: {result_creative[0]['generated_text']}")
print(f"Focused:  {result_focused[0]['generated_text']}")
# Greedy:   "AI is a very important part of the future of the world."
# Creative: "AI is an existential rollercoaster of digital consciousness..."
# Focused:  "AI is a field of computer science that focuses on..."
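The greedy vs. sampling contrast above can be reproduced on a toy distribution without loading a model. A minimal sketch of the two decoding choices the pipeline makes internally (the vocabulary and probabilities here are made up for illustration):

```python
import random

vocab = ["good", "great", "bad", "weird"]
probs = [0.6, 0.25, 0.1, 0.05]   # hypothetical next-token distribution

def greedy(vocab, probs):
    # do_sample=False: always pick the highest-probability token
    return vocab[max(range(len(probs)), key=probs.__getitem__)]

def sample(vocab, probs, rng):
    # do_sample=True: draw randomly, weighted by probability
    return rng.choices(vocab, weights=probs, k=1)[0]

rng = random.Random(42)          # fixed seed → reproducible "sampling"
print(greedy(vocab, probs))      # always "good"
print([sample(vocab, probs, rng) for _ in range(5)])
```

Greedy output never changes; sampled output varies run to run unless you fix the seed (transformers offers `set_seed` for the same purpose).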

5. Manual generate() — Full Control Token by Token

Pipeline wraps generate(). Now we access it directly for full control.
20_manual_generate.py — model.generate() Deep Dive (Python)
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).to("cuda")

# ===========================
# 1. Basic generate()
# ===========================
prompt = "Indonesia is a beautiful country with"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=50,       # generate 50 NEW tokens
        do_sample=True,           # enable sampling
        temperature=0.7,          # creativity
        top_k=50,                 # consider top 50 tokens
        top_p=0.9,                # nucleus sampling
        repetition_penalty=1.2,   # penalize repetition
        pad_token_id=tokenizer.eos_token_id,
    )

# Decode โ€” skip the prompt tokens
generated_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(generated_text)

# ===========================
# 2. Stream generation (token by token) โ€” like ChatGPT!
# ===========================
from transformers import TextIteratorStreamer
from threading import Thread

streamer = TextIteratorStreamer(tokenizer, skip_special_tokens=True)

inputs = tokenizer("The meaning of life is", return_tensors="pt").to("cuda")
gen_kwargs = {**inputs, "max_new_tokens": 100, "streamer": streamer,
              "do_sample": True, "temperature": 0.7}

# Run in separate thread (non-blocking)
thread = Thread(target=model.generate, kwargs=gen_kwargs)
thread.start()

# Print tokens as they arrive!
for text in streamer:
    print(text, end="", flush=True)
# "The meaning of life is" ← appears token by token, like ChatGPT!

# ===========================
# 3. Access generation logits (for custom post-processing)
# ===========================
outputs = model.generate(
    **inputs,
    max_new_tokens=5,
    output_scores=True,          # return logits for each step!
    return_dict_in_generate=True,
)

# outputs.scores = tuple of (batch_size, vocab_size) tensors, one per generated token
for i, scores in enumerate(outputs.scores):
    probs = torch.softmax(scores[0], dim=-1)
    top5 = torch.topk(probs, 5)
    print(f"\nStep {i+1} top-5 candidates:")
    for prob, idx in zip(top5.values, top5.indices):
        token = tokenizer.decode(idx)
        print(f"  '{token}': {prob:.1%}")
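Under the hood, greedy decoding in generate() is just a loop: feed everything so far, take the argmax token, append, repeat until EOS or the length limit. A sketch of that loop with a stand-in next-token function (a hypothetical hard-coded bigram table instead of a real model forward pass):

```python
def greedy_decode(prompt_tokens, next_token_fn, eos, max_new=10):
    # The core loop inside generate(): predict, append, repeat.
    tokens = list(prompt_tokens)
    for _ in range(max_new):
        nxt = next_token_fn(tokens)   # in the real case: model forward + argmax of last-position logits
        tokens.append(nxt)
        if nxt == eos:
            break
    return tokens

# Stand-in "model": a hard-coded bigram table (for illustration only)
bigram = {"The": "cat", "cat": "sat", "sat": "on", "on": "the", "the": "mat", "mat": "<eos>"}
out = greedy_decode(["The"], lambda toks: bigram[toks[-1]], eos="<eos>")
print(" ".join(out))  # The cat sat on the mat <eos>
```

Swapping the argmax for a weighted draw turns this same loop into sampling; everything else (temperature, top_k, top_p) just reshapes the distribution before that draw.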

6. EVERY Generation Parameter — Explained in Detail

temperature, top_k, top_p, repetition_penalty, beam search — effects and when to use
21_generation_params.py — Parameter Encyclopedia 📖 (Python)
# ═══════════════════════════════════════════════════
# EVERY GENERATION PARAMETER — EXPLAINED
# ═══════════════════════════════════════════════════

output = model.generate(
    **inputs,

    # ── LENGTH CONTROL ──
    max_new_tokens=100,     # generate MAX 100 new tokens
    min_new_tokens=10,      # generate MIN 10 tokens (prevent empty output)
    # max_length=150,        # alternative: total length (prompt + generated)

    # ── SAMPLING vs GREEDY ──
    do_sample=True,         # True=random sampling, False=greedy (deterministic)
    # Greedy: always pick the highest-probability token → boring, repetitive
    # Sampling: randomly pick from the probability distribution → creative, varied

    # ── TEMPERATURE ── (only applies when do_sample=True)
    temperature=0.7,        # Controls randomness of sampling
    # temperature=0.1 → almost greedy (very focused, repetitive)
    # temperature=0.7 → balanced (creative but coherent) ← RECOMMENDED
    # temperature=1.0 → standard (model's natural distribution)
    # temperature=1.5 → very random (wild, often incoherent)
    # temperature=2.0 → chaos (mostly nonsense)
    #
    # HOW IT WORKS:
    # logits_adjusted = logits / temperature
    # probs = softmax(logits_adjusted)
    # Low temp → sharper distribution → top token dominates
    # High temp → flatter distribution → more variety

    # ── TOP-K SAMPLING ── (only when do_sample=True)
    top_k=50,               # Only consider the top K highest-probability tokens
    # top_k=1   → greedy (only the top token)
    # top_k=10  → conservative (limited vocabulary)
    # top_k=50  → balanced ← DEFAULT
    # top_k=0   → disabled (consider ALL tokens)
    #
    # PROBLEM: top_k=50 treats all distributions equally.
    # If the model is very confident, the top 5 tokens may hold 95% probability
    # → tokens 6-50 are almost random noise!
    # SOLUTION: use top_p instead (or together)

    # ── TOP-P (NUCLEUS) SAMPLING ── (only when do_sample=True)
    top_p=0.9,              # Keep tokens until cumulative probability reaches P
    # top_p=0.9 → keep tokens that sum to 90% probability
    # If the model is confident: might keep only 3 tokens (they sum to 90%)
    # If the model is uncertain: might keep 50 tokens (all needed for 90%)
    # ADAPTS to the model's confidence! Better than fixed top_k.
    #
    # top_p=1.0  → disabled (keep all tokens)
    # top_p=0.95 → slightly conservative
    # top_p=0.9  → balanced ← RECOMMENDED
    # top_p=0.5  → very focused

    # ── REPETITION CONTROL ──
    repetition_penalty=1.2, # Penalize tokens that already appeared
    # 1.0 = no penalty (can repeat freely)
    # 1.1 = mild (some repetition OK)
    # 1.2 = moderate ← RECOMMENDED for most use cases
    # 1.5 = strong (almost never repeats)
    # 2.0+ = too strong (forced to use rare words)

    no_repeat_ngram_size=3, # Never repeat the same 3-word phrase
    # 0 = disabled, 2 = no repeated bigrams, 3 = no repeated trigrams

    # ── BEAM SEARCH ── (alternative to sampling)
    # num_beams=5,           # explore 5 paths simultaneously
    # early_stopping=True,   # stop when all beams finish
    # length_penalty=1.0,    # >1 = prefer longer, <1 = prefer shorter
    # Beam search: more coherent but LESS creative than sampling
    # Good for: translation, summarization
    # Bad for: creative writing, chat (too boring)
    # NOTE: num_beams>1 with do_sample=True switches to beam-sample decoding (rarely what you want)

    # ── STOP CONDITIONS ──
    pad_token_id=tokenizer.eos_token_id,
    eos_token_id=tokenizer.eos_token_id,
    # stop_strings=["Human:", "\n\n"],  # stop at these strings
)
Parameter | Default | Recommended | Effect
temperature | 1.0 | 0.7 | ↓ = focused, ↑ = creative
top_k | 50 | 50 | Fixed number of candidate tokens
top_p | 1.0 | 0.9 | Cumulative probability threshold (adaptive)
repetition_penalty | 1.0 | 1.1-1.3 | Prevents word repetition
no_repeat_ngram_size | 0 | 3 | Prevents phrase repetition
num_beams | 1 | 1 (sampling) / 5 (translation) | Wider search but deterministic
max_new_tokens | 20 | 50-500 | Maximum output length


🎓 Quick Recipes for Various Use Cases:
Chatbot: temperature=0.7, top_p=0.9, repetition_penalty=1.2
Creative writing: temperature=0.9, top_p=0.95, top_k=100
Code generation: temperature=0.2, top_p=0.9 (needs accuracy!)
Factual text: do_sample=False (greedy, deterministic)
Translation: num_beams=5, do_sample=False, length_penalty=1.0
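The temperature mechanics these recipes rely on (divide logits by T, then softmax) are easy to verify on toy logits. A minimal sketch, with made-up logit values:

```python
import math

def softmax_with_temperature(logits, t):
    # Same formula as in the parameter notes: softmax(logits / t)
    scaled = [x / t for x in logits]
    m = max(scaled)                          # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [4.0, 2.0, 1.0]                     # toy next-token logits
for t in (0.1, 0.7, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: " + ", ".join(f"{p:.3f}" for p in probs))
# Low T → the top token dominates; high T → the distribution flattens.
```

Printing the three rows shows exactly why temperature=0.2 suits code generation (near-deterministic) while 0.9 suits creative writing (flatter, more variety).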


7. Sampling Strategies Visual — Temperature & Top-P Visualized

A visual look at how parameters change the probability distribution
Temperature Effect — Probability Distribution for the Next Token

Context: "The cat sat on the"
Model's raw logits → softmax → probabilities:

Temperature = 0.1 (almost greedy):
  "mat"    ████████████████████████████████████ 92%
  "floor"  ███ 5%
  "bed"    █ 2%
  "table"  ▏1%
  others   ▏0%
  → Almost always "mat" (boring but safe)

Temperature = 0.7 (balanced, RECOMMENDED):
  "mat"    ████████████████████ 52%
  "floor"  █████████ 22%
  "bed"    █████ 12%
  "table"  ███ 8%
  "roof"   █ 3%
  others   █ 3%
  → Usually "mat" but sometimes "floor"/"bed" (natural!)

Temperature = 1.5 (very creative):
  "mat"    ██████ 18%
  "floor"  █████ 15%
  "bed"    ████ 12%
  "table"  ████ 11%
  "roof"   ███ 10%
  "moon"   ██ 8%
  "pizza"  ██ 7%
  others   █████████ 19%
  → Could be "moon" or "pizza" (creative but often weird!)
Top-P (Nucleus) vs Top-K — Adaptive vs Fixed

Scenario 1: the model is CONFIDENT
  "The capital of France is" → next token:
  "Paris"   ████████████████████████ 90%
  "the"     ██ 5%
  "a"       █ 3%
  "located" ▏1%
  ...
  Top-K=50:  keeps 50 tokens (45 of them near 0% — WASTED!)
  Top-P=0.9: keeps ONLY 1 token ("Paris" = 90% ≥ 0.9) → efficient!

Scenario 2: the model is UNCERTAIN
  "I went to the" → next token:
  "store"    ██████ 18%
  "park"     █████ 15%
  "hospital" ████ 12%
  "school"   ████ 11%
  "gym"      ███ 10%
  "library"  ███ 9%
  "office"   ██ 8%
  ...
  Top-K=50:  keeps 50 tokens (OK here, but K=50 is arbitrary)
  Top-P=0.9: keeps ~8 tokens (summing to 90%) → adapts to uncertainty!

Takeaway: Top-P ADAPTS to model confidence. Top-K does not.
Best practice: use BOTH — top_p=0.9, top_k=50
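This adaptivity is easy to verify with a tiny nucleus-filter sketch (pure Python, not the transformers implementation; the probability tables below mirror the two scenarios above):

```python
def nucleus(probs, p=0.9):
    # Sort descending, keep tokens until cumulative probability reaches p.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cum = [], 0.0
    for token, prob in ranked:
        kept.append(token)
        cum += prob
        if cum >= p:
            break
    return kept

confident = {"Paris": 0.90, "the": 0.05, "a": 0.03, "located": 0.02}
uncertain = {"store": 0.18, "home": 0.17, "park": 0.15, "hospital": 0.12,
             "school": 0.11, "gym": 0.10, "library": 0.09, "office": 0.08}

print(nucleus(confident))        # a single token already covers 90%
print(len(nucleus(uncertain)))   # several tokens needed to reach 90%
```

Same threshold, very different candidate-set sizes — that is the nucleus in "nucleus sampling". A fixed top_k would keep the same count in both cases.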

8. Fine-Tuning GPT-2 on Custom Corpus — Build Your Own GPT

Teach GPT-2 your writing style: poetry, code, dialogue, a specific language


Fine-tuning GPT-2 = providing example text; GPT then learns to predict the next word in your domain. After fine-tuning, GPT can generate text similar to your training data. Example: fine-tune on poetry → GPT becomes a "poet". Fine-tune on Python code → GPT becomes a "programmer".

22_finetune_gpt2.py — Fine-Tune GPT-2 Complete 🔥🔥🔥 (Python)
from transformers import (
    AutoTokenizer, AutoModelForCausalLM,
    TrainingArguments, Trainer, DataCollatorForLanguageModeling
)
from datasets import load_dataset

# ═══════════════════════════════════════
# STEP 1: LOAD MODEL & TOKENIZER
# ═══════════════════════════════════════
model_name = "gpt2"  # 117M params, fits Colab T4
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# CRITICAL: GPT-2 has NO pad token! Must set it!
tokenizer.pad_token = tokenizer.eos_token
model.config.pad_token_id = tokenizer.eos_token_id

print(f"Model params: {model.num_parameters():,}")  # 124,439,808

# ═══════════════════════════════════════
# STEP 2: LOAD & PREPARE DATASET
# ═══════════════════════════════════════
# Option A: From Hugging Face Hub
dataset = load_dataset("wikitext", "wikitext-2-raw-v1")
# Option B: From your own text file
# dataset = load_dataset("text", data_files="my_corpus.txt")
# Option C: From CSV with "text" column
# dataset = load_dataset("csv", data_files="poems.csv")

print(dataset)
print(f"Sample: {dataset['train'][0]['text'][:100]}...")

# ═══════════════════════════════════════
# STEP 3: TOKENIZE
# ═══════════════════════════════════════
def tokenize_function(examples):
    return tokenizer(
        examples["text"],
        truncation=True,
        max_length=512,          # GPT-2 max context = 1024
        return_overflowing_tokens=True,  # split long texts into chunks!
        return_length=True,
    )

tokenized = dataset.map(tokenize_function, batched=True,
                         remove_columns=dataset["train"].column_names)

# Filter out very short sequences
tokenized = tokenized.filter(lambda x: len(x["input_ids"]) > 10)
print(f"Training examples: {len(tokenized['train'])}")

# ═══════════════════════════════════════
# STEP 4: DATA COLLATOR (special for CLM!)
# ═══════════════════════════════════════
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=False,                  # False = Causal LM (GPT)
    # mlm=True → Masked LM (BERT) — NOT for GPT!
)
# DataCollatorForLanguageModeling automatically:
# 1. Pads sequences in each batch
# 2. Creates labels = input_ids (shifted by 1 internally)
# 3. Sets label=-100 for padding tokens (ignored in loss)

# ═══════════════════════════════════════
# STEP 5: TRAINING
# ═══════════════════════════════════════
args = TrainingArguments(
    output_dir="./gpt2-finetuned",
    num_train_epochs=3,
    per_device_train_batch_size=4,       # smaller batch for generation models
    gradient_accumulation_steps=8,       # effective batch = 4 ร— 8 = 32
    learning_rate=5e-5,                  # slightly higher than BERT (5e-5 vs 2e-5)
    weight_decay=0.01,
    warmup_ratio=0.1,
    fp16=True,
    logging_steps=50,
    save_strategy="epoch",
    save_total_limit=2,
    prediction_loss_only=True,           # don't compute metrics (CLM only needs loss)
    report_to="none",
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    data_collator=data_collator,
    tokenizer=tokenizer,
)

print("๐Ÿ‹๏ธ Training GPT-2...")
trainer.train()

# โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•
# STEP 6: EVALUATE (Perplexity)
# โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•
import math
eval_results = trainer.evaluate()
perplexity = math.exp(eval_results["eval_loss"])
print(f"Perplexity: {perplexity:.2f}")
# Lower = better. GPT-2 base on WikiText: ~30. Fine-tuned: ~20-25.
# Human text: ~20-50 depending on domain.

# โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•
# STEP 7: SAVE & GENERATE!
# โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•
trainer.save_model("./gpt2-finetuned-final")
tokenizer.save_pretrained("./gpt2-finetuned-final")

# Test generation!
from transformers import pipeline
gen = pipeline("text-generation", model="./gpt2-finetuned-final", device=0)

prompts = ["In the field of machine learning,", "The history of Indonesia"]
for p in prompts:
    result = gen(p, max_new_tokens=80, do_sample=True, temperature=0.7,
                 top_p=0.9, repetition_penalty=1.2)
    print(f"\nPrompt: {p}")
    print(f"Output: {result[0]['generated_text']}")

print("\\n๐Ÿ† GPT-2 fine-tuning complete!")

๐ŸŽ“ Key Differences from BERT Fine-Tuning (Page 2):
1. Auto Class: AutoModelForCausalLM not AutoModelForSequenceClassification
2. Data Collator: DataCollatorForLanguageModeling(mlm=False) not DataCollatorWithPadding
3. Labels: Automatic (labels = shifted input_ids). No "label" column needed in dataset.
4. Pad token: tokenizer.pad_token = tokenizer.eos_token โ€” GPT-2 has no default pad token!
5. Batch size: Smaller (4-8 vs 16-32) because long sequences = more VRAM.
6. LR: Slightly higher (5e-5 vs 2e-5) โ€” GPT fine-tuning generally needs larger LR.
7. Metric: Perplexity (not accuracy/F1) โ€” because there's no "right/wrong label" in text generation.
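Point 3 (automatic labels) can be made concrete with a toy example (plain Python, made-up token ids): the collator simply copies input_ids into labels, and inside the model's forward pass each position's prediction is scored against the *next* position's label.

```python
# Toy illustration of the causal-LM label shift (made-up token ids).
input_ids = [464, 3290, 318, 922]   # e.g. four tokens of "The dog is good"
labels = list(input_ids)            # DataCollatorForLanguageModeling(mlm=False) just copies

# Inside the model, the pairing is effectively:
predict_from = input_ids[:-1]       # positions the model predicts from
targets      = labels[1:]           # the "next token" each position must hit

print(list(zip(predict_from, targets)))
# [(464, 3290), (3290, 318), (318, 922)]
```

This is why no "label" column is needed: the text itself is the supervision signal.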

๐Ÿ“‹


9. Instruction Tuning โ€” GPT โ†’ Instruction-Following Assistant

From plain text completion to an instruction-following model โ€” the foundation of ChatGPT

Plain GPT-2 only continues text โ€” it doesn't "answer questions" or "follow instructions". Instruction tuning teaches GPT to understand instruction formats and give appropriate responses. This is the technique that turned GPT-3 into ChatGPT.

23_instruction_tuning.py โ€” Data Format for Instruction Tuning (python)
# ===========================
# 1. Instruction tuning data format
# ===========================
# Each training example = instruction + response in one string

# Alpaca-style format (the most popular):
training_examples = [
    """### Instruction:
Summarize the following text in one sentence.

### Input:
Hugging Face is a company that provides tools and platforms for machine learning. They are best known for their Transformers library, which provides thousands of pre-trained models for natural language processing, computer vision, and audio tasks.

### Response:
Hugging Face is an ML company known for their Transformers library offering thousands of pre-trained models for NLP, vision, and audio.""",

    """### Instruction:
Translate the following English text to Indonesian.

### Input:
I love learning about artificial intelligence.

### Response:
Saya suka belajar tentang kecerdasan buatan.""",

    """### Instruction:
What is the capital of Japan?

### Response:
The capital of Japan is Tokyo.""",
]

# Format ChatML (used by many chat models):
chat_examples = [
    """<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
What is Python?<|im_end|>
<|im_start|>assistant
Python is a high-level programming language known for its readability and versatility.<|im_end|>""",
]

# ===========================
# 2. Prepare dataset
# ===========================
from datasets import Dataset

# From a list of strings:
dataset = Dataset.from_dict({"text": training_examples})

# Or from a JSONL file:
# {"instruction": "...", "input": "...", "output": "..."}
# dataset = load_dataset("json", data_files="instructions.jsonl")

# Format each row into a single string (uses the tokenizer loaded in Section 8)
def format_instruction(example):
    if example.get("input"):
        text = f"""### Instruction:\n{example['instruction']}\n\n### Input:\n{example['input']}\n\n### Response:\n{example['output']}{tokenizer.eos_token}"""
    else:
        text = f"""### Instruction:\n{example['instruction']}\n\n### Response:\n{example['output']}{tokenizer.eos_token}"""
    return {"text": text}

# dataset = dataset.map(format_instruction)
# Then tokenize & train exactly like Section 8!

# ===========================
# 3. Inference with instruction format
# ===========================
def ask(instruction, input_text=""):
    if input_text:
        prompt = f"### Instruction:\n{instruction}\n\n### Input:\n{input_text}\n\n### Response:\n"
    else:
        prompt = f"### Instruction:\n{instruction}\n\n### Response:\n"

    result = gen(prompt, max_new_tokens=200, do_sample=True,
                 temperature=0.7, top_p=0.9,
                 repetition_penalty=1.2)
    response = result[0]["generated_text"][len(prompt):]
    # Stop at next "###" (prevent generating another instruction)
    if "###" in response:
        response = response[:response.index("###")]
    return response.strip()

print(ask("What is the largest planet in our solar system?"))
# "Jupiter is the largest planet in our solar system."
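The `###` cutoff in `ask()` is worth seeing in isolation: raw output from an instruction-tuned model often runs on into a new instruction block, and everything from the first marker onward must be discarded. A plain-Python sketch (the raw string is a made-up model output):

```python
# Sketch of the stop-marker trick from ask() (hypothetical raw model output).
raw = "The capital of Japan is Tokyo.\n\n### Instruction:\nName another city."

response = raw
if "###" in response:
    response = response[:response.index("###")]  # drop everything from the marker on

print(response.strip())  # The capital of Japan is Tokyo.
```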
๐Ÿ’ฌ

10. Project: Simple CLI Chatbot โ€” Interactive GPT-2

Terminal chatbot you can talk to โ€” using fine-tuned GPT-2
24_chatbot_cli.py โ€” Interactive CLI Chatbot ๐Ÿ”ฅ (python)
from transformers import pipeline

# Load the fine-tuned model (or plain GPT-2 for a demo)
gen = pipeline("text-generation",
    model="./gpt2-finetuned-final",  # or "gpt2" for a demo
    device=0)

def chat(user_input, history=""):
    """Generate chatbot response."""
    prompt = history + f"### Human: {user_input}\n### Assistant:"

    result = gen(
        prompt,
        max_new_tokens=150,
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
        repetition_penalty=1.3,
        pad_token_id=gen.tokenizer.eos_token_id,
    )

    full_text = result[0]["generated_text"]
    response = full_text[len(prompt):].strip()

    # Stop at next "### Human:" or newline
    for stop in ["### Human:", "###", "\n\n"]:
        if stop in response:
            response = response[:response.index(stop)]

    # Update history for context
    new_history = prompt + " " + response + "\n"
    return response.strip(), new_history

# โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•
# Interactive loop
# โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•
print("๐Ÿค– GPT-2 Chatbot (type 'quit' to exit)")
print("=" * 50)

history = ""
while True:
    user_input = input("\n๐Ÿ‘ค You: ")
    if user_input.lower() in ["quit", "exit", "q"]:
        print("๐Ÿ‘‹ Bye!")
        break

    response, history = chat(user_input, history)
    print(f"๐Ÿค– Bot: {response}")

# โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•
# Sample conversation:
# ๐Ÿ‘ค You: What is machine learning?
# ๐Ÿค– Bot: Machine learning is a subset of AI that enables systems
#         to learn from data without being explicitly programmed.
# ๐Ÿ‘ค You: Give me an example.
# ๐Ÿค– Bot: A spam filter that learns to identify spam emails by
#         analyzing thousands of examples is a common ML application.
# โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•

๐Ÿ’ก Realistic Note: GPT-2 (117M) is a small model โ€” its answers are often less accurate and coherent compared to ChatGPT (175B+). This is for learning concepts. For production chatbots, use LLaMA/Mistral 7B+ with LoRA fine-tuning (Page 8) or large model APIs (GPT-4, Claude).

๐ŸŒ

11. Other Generative Models โ€” Bloom, LLaMA, Mistral, Gemma

GPT-2 = learning. For production, there are far more powerful models.
Model     | Params    | Languages    | License       | Best For
----------|-----------|--------------|---------------|----------------------------
GPT-2     | 117M-1.5B | English      | MIT (free)    | Learning, experiments โญ
Bloom     | 560M-176B | 46 languages | Open RAIL-M   | Multilingual generation
LLaMA 3.2 | 1B-90B    | Multi + ID   | Meta License  | State-of-the-art open โญ
Mistral   | 7B-8x22B  | Multi        | Apache 2.0    | Best size/quality ratio โญ
Gemma 2   | 2B-27B    | Multi        | Gemma License | Google's open model
Qwen 2.5  | 0.5B-72B  | Multi + ID   | Apache 2.0    | Strong multilingual + code
Phi-3     | 3.8B-14B  | English      | MIT           | Small but powerful

๐ŸŽ“ Generative Model Roadmap in This Series:
Page 3 (this): GPT-2 (117M) โ€” learn CLM concepts, generation params, fine-tuning
Page 8: LoRA & QLoRA โ€” fine-tune LLaMA/Mistral 7B on Colab!
Page 9: RLHF โ€” align models with human preferences (ChatGPT method)
You're building the foundation for fine-tuning large models in upcoming pages.

๐Ÿ“

12. Page 3 Summary

Everything we learned
Concept             | What It Is                           | Key Code
--------------------|--------------------------------------|--------------------------------------------
Encoder vs Decoder  | BERT (bidirectional) vs GPT (causal) | AutoModelForCausalLM
Causal LM           | Next-token prediction                | labels=input_ids (auto-shifted)
Pipeline Generation | 1-line text generation               | pipeline("text-generation")
model.generate()    | Full-control generation              | model.generate(**inputs, ...)
Temperature         | Creativity control (0.1-2.0)         | temperature=0.7
Top-P (Nucleus)     | Adaptive probability cutoff          | top_p=0.9
Top-K               | Fixed candidate count                | top_k=50
Repetition Penalty  | Prevents repetition                  | repetition_penalty=1.2
Fine-Tune GPT-2     | Custom corpus โ†’ custom GPT           | DataCollatorForLanguageModeling(mlm=False)
Instruction Tuning  | GPT โ†’ instruction follower           | "### Instruction:\n...\n### Response:\n"
Streaming           | Token-by-token output                | TextIteratorStreamer
Perplexity          | Generation quality metric            | exp(eval_loss)
โ† Page Sebelumnyaโ† Previous Page

Page 2 โ€” Fine-Tuning BERT & Trainer API

๐Ÿ“˜

Coming Next: Page 4 โ€” Token Classification & NER

From sentence classification (Page 2) to per-token classification! Page 4 covers: Named Entity Recognition (NER) โ€” identifying people, places, organizations, POS Tagging, BIO/IOB2 labeling scheme, tokenization alignment (subword โ†’ word labels), fine-tuning BERT for NER on custom datasets, per-entity evaluation (seqeval), and building a production NER pipeline.