Table of Contents – Page 10 (Final!)
- Our Journey – 10 Pages at a Glance
- Capstone: Production RAG Chatbot – Full pipeline code
- Step 1: QLoRA SFT – Instruction tuning (Pages 3+8)
- Step 2: DPO Alignment – Human preferences (Page 9)
- Step 3: RAG Knowledge Base – Embeddings + FAISS (Page 6)
- Step 4: Gradio Chatbot App – Interactive UI (Page 7)
- Step 5: Deploy to HF Spaces – Free public URL
- Roadmap: What's Next? – Agents, multi-modal, MLOps
- Career Paths in AI/ML
- Closing – Congratulations!
1. Our Journey – 10 Pages at a Glance
2-6. Capstone: Production RAG Chatbot – Full Pipeline
```python
#!/usr/bin/env python3
"""
CAPSTONE: End-to-End RAG Chatbot

Combines ALL techniques from Pages 1-9:
  Page 1: Pipeline, Hub, AutoModel
  Page 3: Text generation, prompt format
  Page 6: Sentence embeddings, FAISS, RAG
  Page 7: Gradio ChatInterface, Spaces
  Page 8: QLoRA, 4-bit quantization, PEFT
  Page 9: DPO alignment concepts

This is a DEPLOYABLE app for HF Spaces!
"""
import gradio as gr
import faiss
import numpy as np
import torch
from sentence_transformers import SentenceTransformer
from transformers import pipeline

# ----------------------------------------------------
# COMPONENT 1: RETRIEVER (Page 6 - Embeddings + FAISS)
# ----------------------------------------------------
print("Loading retriever...")
retriever = SentenceTransformer("all-MiniLM-L6-v2")

# Knowledge base - replace with YOUR documents!
KNOWLEDGE_BASE = [
    "Hugging Face was founded in 2016 by Clément Delangue, Julien Chaumond, and Thomas Wolf.",
    "The Transformers library supports over 200 model architectures including BERT, GPT, T5, and LLaMA.",
    "Fine-tuning BERT on IMDB sentiment analysis typically achieves 93%+ accuracy.",
    "LoRA (Low-Rank Adaptation) allows fine-tuning large models by training only 0.1-1% of parameters.",
    "QLoRA combines 4-bit quantization with LoRA, enabling 7B model fine-tuning on a single T4 GPU.",
    "DPO (Direct Preference Optimization) is a simpler alternative to RLHF for model alignment.",
    "Gradio allows creating ML demo web apps with just Python, deployable to HF Spaces for free.",
    "FAISS is Facebook's library for efficient similarity search across millions of vectors in milliseconds.",
    "The Trainer API handles training loops, evaluation, logging, checkpointing, and multi-GPU automatically.",
    "Sentence Transformers encode text into dense vectors for semantic similarity and search tasks.",
    "Named Entity Recognition (NER) identifies people, locations, organizations in text using BIO tagging.",
    "T5 treats all NLP tasks as text-to-text: summarize, translate, classify - all with the same model.",
    "BERT uses bidirectional attention for understanding. GPT uses causal attention for generation.",
    "The Model Hub hosts over 500,000 pre-trained models for NLP, vision, audio, and multimodal tasks.",
    "Jakarta is the capital of Indonesia with a population of approximately 10.56 million people.",
    "Indonesia has over 17,000 islands and declared independence on August 17, 1945.",
    "Python is the most popular programming language for machine learning and data science.",
    "TensorFlow and PyTorch are the two most popular deep learning frameworks.",
]

# Build the FAISS index: normalize, then index by inner product
kb_embeddings = retriever.encode(KNOWLEDGE_BASE, convert_to_numpy=True)
faiss.normalize_L2(kb_embeddings)
index = faiss.IndexFlatIP(kb_embeddings.shape[1])
index.add(kb_embeddings)
print(f"Indexed {index.ntotal} documents")

# ----------------------------------------------------
# COMPONENT 2: GENERATOR (Page 3 - Text Generation)
# ----------------------------------------------------
print("Loading generator...")
# FLAN-T5 base: 250M params, fits CPU/GPU easily.
# For better quality: use your fine-tuned model from Page 8!
generator = pipeline(
    "text2text-generation",
    model="google/flan-t5-base",
    device=0 if torch.cuda.is_available() else -1,  # fall back to CPU
)

# ----------------------------------------------------
# COMPONENT 3: RAG PIPELINE (Page 6 - Retrieve + Generate)
# ----------------------------------------------------
def retrieve(query, top_k=3):
    """Retrieve the top-k most relevant documents."""
    q_emb = retriever.encode([query], convert_to_numpy=True)
    faiss.normalize_L2(q_emb)
    scores, indices = index.search(q_emb, top_k)
    return [(KNOWLEDGE_BASE[i], float(s))
            for i, s in zip(indices[0], scores[0]) if i >= 0]

def generate_answer(question, context):
    """Generate an answer grounded in the retrieved context."""
    prompt = f"""Answer the question based on the context below. If the answer is not in the context, say "I don't have information about that."

Context: {context}

Question: {question}

Answer:"""
    result = generator(prompt, max_length=200)
    return result[0]["generated_text"]

# ----------------------------------------------------
# COMPONENT 4: GRADIO CHATBOT (Page 7 - ChatInterface)
# ----------------------------------------------------
def chat(message, history):
    """RAG chatbot: retrieve -> generate -> respond with sources."""
    docs = retrieve(message, top_k=3)
    context = " ".join(doc for doc, score in docs)
    answer = generate_answer(message, context)
    sources = "\n\n**Sources:**\n" + "\n".join(
        f"- _{doc[:80]}..._ (relevance: {score:.0%})" for doc, score in docs)
    return answer + sources

# ----------------------------------------------------
# COMPONENT 5: LAUNCH APP (Page 7 - Deploy to Spaces!)
# ----------------------------------------------------
demo = gr.ChatInterface(
    fn=chat,
    title="RAG Chatbot - Hugging Face Knowledge Assistant",
    description="""Ask me anything about Hugging Face, Transformers, fine-tuning, NLP, or Indonesia!
Powered by: Sentence Transformers (retrieval) + FLAN-T5 (generation) + FAISS (vector search).
Built with techniques from the entire Learn Hugging Face series (Pages 1-9).""",
    examples=[
        "What is LoRA and how does it work?",
        "How accurate is BERT on IMDB sentiment analysis?",
        "What is the capital of Indonesia?",
        "What is the difference between BERT and GPT?",
        "How do I deploy a model to Hugging Face Spaces?",
        "What is DPO?",
    ],
    retry_btn="Retry",
    undo_btn="Undo",
    clear_btn="Clear",
    theme=gr.themes.Soft(),
)

print("Launching RAG Chatbot...")
demo.launch()
# → Deploy to HF Spaces: upload app.py + requirements.txt
# → Free public URL: https://username-rag-chatbot.hf.space
# → ANYONE can chat with your knowledge-grounded AI!
```
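One detail worth understanding in the retriever: calling `faiss.normalize_L2` and then searching with an inner-product index (`IndexFlatIP`) is a standard trick, because the inner product of two L2-normalized vectors equals the cosine similarity of the raw vectors. A minimal pure-Python sketch of that equivalence, using toy 3-dimensional vectors in place of the real 384-dimensional MiniLM embeddings:

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    return inner_product(a, b) / (
        math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b)))

a = [1.0, 2.0, 3.0]
b = [4.0, 0.5, -1.0]

# Inner product after normalization == cosine similarity of the raw vectors
print(math.isclose(inner_product(l2_normalize(a), l2_normalize(b)),
                   cosine_similarity(a, b)))  # True
```

This is why the relevance scores shown to the user fall in [-1, 1] and can be formatted as percentages.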
This Is Your Production Project!
The script above combines techniques from 6 different pages:
• Page 1: Pipeline API, model loading
• Page 3: Text generation, prompt formatting
• Page 6: Sentence embeddings, FAISS vector search, RAG
• Page 7: Gradio ChatInterface, HF Spaces deployment
• Page 8: Can be upgraded to a QLoRA fine-tuned model
• Page 9: Can be upgraded to a DPO-aligned model
Upload to HF Spaces → your AI chatbot is live on the internet in 5 minutes, for free!
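For the upload step, the `requirements.txt` just needs to list the libraries the script imports (note the FAISS pip package is named `faiss-cpu`). Exact version pins are up to you; a minimal, unpinned sketch:

```text
gradio
transformers
sentence-transformers
faiss-cpu
torch
numpy
```

With `app.py` and this file in the Space repository, Spaces installs the dependencies and launches the app automatically.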
7. Roadmap: What's Next?
| Level | Topic | What It Is | Tools |
|---|---|---|---|
| 🟢 | AI Agents | LLMs that can use tools (search, code, API calls) | LangChain, CrewAI, AutoGen, Smolagents |
| 🟢 | Advanced RAG | Chunking strategies, re-ranking, hybrid search, evaluation | LlamaIndex, LangChain, Ragas |
| 🟢 | Multi-modal | Vision-language models (LLaVA, GPT-4V, Gemini) | HF Transformers, OpenAI API |
| 🟡 | Structured Output | Getting LLMs to generate valid JSON/SQL/code | Outlines, Instructor, LMQL |
| 🟡 | Model Serving | Production inference: vLLM, TGI, Triton | vLLM, TGI, NVIDIA Triton |
| 🟡 | Evaluation | Benchmarking LLMs: MT-Bench, AlpacaEval, MMLU | lm-evaluation-harness, HELM |
| 🔴 | Pre-training | Training LLMs from scratch (hundreds of GPUs, millions of dollars) | Megatron-LM, DeepSpeed |
| 🔴 | MLOps | CI/CD for ML, monitoring, retraining pipelines | MLflow, Weights & Biases, Kubeflow |
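As a small taste of the "Structured Output" row in the table above: libraries like Outlines constrain generation itself, but the simplest baseline is a validate-and-retry loop using only the standard library. A hedged sketch, where the `generate` stub (and its canned outputs) are made up for illustration and would be replaced by a real model call:

```python
import json

def generate(prompt, attempt):
    """Stub LLM call: returns broken JSON first, valid JSON on retry.
    Replace with a real model call (e.g., the FLAN-T5 pipeline from the capstone)."""
    outputs = [
        '{"city": "Jakarta", "population":',           # truncated, invalid
        '{"city": "Jakarta", "population": 10560000}',  # valid
    ]
    return outputs[min(attempt, len(outputs) - 1)]

def generate_json(prompt, max_retries=3):
    """Validate-and-retry: re-ask the model until the output parses as JSON."""
    for attempt in range(max_retries):
        raw = generate(prompt, attempt)
        try:
            return json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed output: retry (optionally feed the error back into the prompt)
    raise ValueError("No valid JSON after retries")

result = generate_json("Return the capital of Indonesia as JSON.")
print(result["city"])  # Jakarta
```

Dedicated tools improve on this by constraining the decoder so invalid output can never be produced, but the retry loop is often enough for prototypes.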
8. Career Paths in AI/ML
| Role | Focus | Skills from This Series | Additional |
|---|---|---|---|
| NLP Engineer | Text processing systems | P1-6: BERT, GPT, NER, QA, T5, embeddings | LangChain, RAG production, evaluation |
| LLM Engineer | Fine-tune & deploy LLMs | P8-9: QLoRA, DPO, SFT + P7: Gradio deploy | vLLM, TGI, agents, prompt engineering |
| ML Engineer | Build & deploy ML systems | P1-10: everything! End-to-end pipeline | MLOps, Kubernetes, CI/CD, monitoring |
| AI Researcher | Novel methods & papers | P8-9: LoRA math, DPO theory, alignment | Paper reading, JAX, deep math/stats |
| Full-Stack AI | Complete AI applications | P7: Gradio + P6: RAG + P3: generation | React/Next.js, databases, API design |
9. Closing – Congratulations!
CONGRATULATIONS! You've completed the ENTIRE Learn Hugging Face series – all 10 Pages!
From your first pipeline in Page 1 to DPO alignment in Page 9 and this capstone in Page 10, you now have comprehensive mastery of the Hugging Face ecosystem:
✅ Inference: Pipeline API for 20+ tasks in 1 line of code
✅ Fine-Tune BERT: Classification (93%+), NER (92%+ F1), QA
✅ Fine-Tune GPT: Text generation, instruction tuning, chatbot
✅ Seq2Seq: T5/BART for translation and summarization
✅ Embeddings: Sentence similarity, FAISS vector search, RAG
✅ Deploy: Gradio apps, HF Spaces (free public URL!)
✅ QLoRA: Fine-tune a 7B LLM on free Colab (previously $256/hr)
✅ DPO Alignment: ChatGPT-style training technique
✅ 70+ code files, 10 complete projects, 1 production RAG chatbot
You also completed the Neural Network series (10 pages) and the TensorFlow series (10 pages) before this – a total of 30 pages and ~2 MB of tutorial content!
This is not the end – it's just the beginning of your AI journey. Use the roadmap above, keep experimenting, and build something extraordinary!
"The best time to start learning AI was yesterday. The second best time is now."