📑 Table of Contents — Part 6
- The Gap: Research → Production
- TorchScript — Export without a Python dependency
- ONNX Export — Universal format, 3× latency reduction
- FastAPI Inference Server — A REST API for the model
- Docker Containerization — Package & deploy anywhere
- Quantization & Pruning — Smaller, faster models
- Deployment Checklist
- Summary & Part 7 Preview
🚀
1. The Gap: Research → Production
77% of ML models never make it to production. This part covers how to close that gap: the deployment pipeline, from notebook to production.
📦
2. TorchScript — Export Without Python
Serialize the model so it can run without a Python interpreter.

```python
import torch

# Load the trained model
model = MyModel()
model.load_state_dict(torch.load("model_weights.pt"))
model.eval()

# Method 1: Tracing (recommended for models without control flow)
example_input = torch.randn(1, 3, 224, 224)
traced = torch.jit.trace(model, example_input)
traced.save("model_traced.pt")

# Method 2: Scripting (for models with if/for)
scripted = torch.jit.script(model)
scripted.save("model_scripted.pt")

# Load in production (no class definition needed!)
loaded = torch.jit.load("model_traced.pt")
output = loaded(example_input)
# ✅ Runs in C++, on mobile, without Python!
```
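The tracing-vs-scripting distinction matters whenever the forward pass branches on the data. A minimal sketch (the `Gate` module here is a made-up example) showing how tracing freezes the branch taken at trace time, while scripting preserves it:

```python
import torch
import torch.nn as nn

class Gate(nn.Module):
    def forward(self, x):
        # data-dependent control flow
        if x.sum() > 0:
            return x * 2
        return x * -1

model = Gate()
pos, neg = torch.ones(3), -torch.ones(3)

traced = torch.jit.trace(model, pos)  # records only the branch taken for `pos`
scripted = torch.jit.script(model)    # compiles both branches

out_traced = traced(neg)      # tensor([-2., -2., -2.]): the `if` was baked in
out_scripted = scripted(neg)  # tensor([1., 1., 1.]): branch re-evaluated
```

This is why the rule of thumb above says tracing for straight-line models, scripting for models with `if`/`for`.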
⚡
3. ONNX Export — 3× Faster Inference
Open Neural Network Exchange: a universal format for every platform.

```python
import torch
import onnxruntime as ort

# ===== EXPORT to ONNX =====
model.eval()
dummy = torch.randn(1, 3, 224, 224)
torch.onnx.export(
    model, dummy, "model.onnx",
    input_names=["input"],
    output_names=["output"],
    dynamic_axes={"input": {0: "batch"}},  # dynamic batch size
    dynamo=True  # recommended exporter on PyTorch 2.5+
)

# ===== INFERENCE with ONNX Runtime =====
session = ort.InferenceSession("model.onnx")
result = session.run(
    None, {"input": dummy.numpy()}
)
# ✅ Up to 3× faster than PyTorch eager mode!
# ✅ Deploys to: CPU, GPU, TensorRT, OpenVINO, mobile
```
| Runtime | Latency | Size | Notes |
|---|---|---|---|
| 📊 PyTorch Eager | ~45 ms | 44.7 MB | Needs Python + PyTorch; flexible but slow for production |
| ⚡ ONNX Runtime | ~15 ms (3× faster) | 44.7 MB | No Python dependency; automatic graph optimization |
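Latency numbers like the ones above are hardware-dependent, so measure on your own target machine. A small timing helper (the name `bench_ms` is my own) that works for any callable, eager or ONNX:

```python
import time

def bench_ms(fn, warmup=5, iters=50):
    """Average wall-clock milliseconds per call, after a warmup."""
    for _ in range(warmup):
        fn()                      # let caches/JITs settle before timing
    t0 = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - t0) * 1000 / iters

# Usage (assuming `model`, `dummy`, and `session` from the ONNX example):
#   eager_ms = bench_ms(lambda: model(dummy))
#   onnx_ms  = bench_ms(lambda: session.run(None, {"input": dummy.numpy()}))
elapsed = bench_ms(lambda: sum(range(1000)))
```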
🌐
4. FastAPI Inference Server
REST API: send an image, get a prediction back. Production-ready.

```python
# pip install fastapi uvicorn onnxruntime pillow
from fastapi import FastAPI, UploadFile
import onnxruntime as ort
import numpy as np
from PIL import Image
import io

app = FastAPI(title="Image Classifier API")

# Load the ONNX model ONCE at startup
session = ort.InferenceSession("model.onnx")
CLASSES = ["cat", "dog", "bird", "fish", "horse"]

MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32).reshape(3, 1, 1)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32).reshape(3, 1, 1)

def preprocess(image_bytes):
    # convert("RGB") guards against grayscale/RGBA uploads
    img = Image.open(io.BytesIO(image_bytes)).convert("RGB").resize((224, 224))
    arr = np.array(img).transpose(2, 0, 1).astype(np.float32) / 255.0
    arr = (arr - MEAN) / STD  # (3,1,1) shapes broadcast per channel
    return arr[np.newaxis]    # add batch dimension → (1, 3, 224, 224)

@app.post("/predict")
async def predict(file: UploadFile):
    data = await file.read()
    tensor = preprocess(data)
    logits = session.run(None, {"input": tensor})[0]
    probs = np.exp(logits) / np.exp(logits).sum()
    idx = int(probs[0].argmax())
    return {"class": CLASSES[idx], "confidence": float(probs[0][idx])}

# Run:  uvicorn 23_fastapi_server:app --host 0.0.0.0 --port 8000
# Test: curl -X POST -F "file=@cat.jpg" http://localhost:8000/predict
# → {"class": "cat", "confidence": 0.9847}
```
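The softmax in the handler exponentiates raw logits, which can overflow to `inf` for large values. A numerically stable drop-in variant (subtract the row max first; the function name is my own):

```python
import numpy as np

def stable_softmax(logits: np.ndarray) -> np.ndarray:
    """Softmax over the last axis; max-subtraction avoids exp() overflow."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

big = np.array([[1000.0, 0.0]])  # naive exp(1000) would overflow to inf
probs = stable_softmax(big)
# rows still sum to 1 and stay finite
```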
🐳
5. Docker Containerization
Package everything, deploy anywhere.

```dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY model.onnx .
COPY 23_fastapi_server.py main.py
EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]

# Build & run:
#   docker build -t ml-api .
#   docker run -p 8000:8000 ml-api
# Image size: ~350 MB (CPU-only, no GPU libraries)
```
📐
6. Quantization — Smaller, Faster Models
Float32 → Int8: 4× smaller on disk, 2-4× faster inference.

```python
import torch.quantization

# Dynamic quantization (the easiest method, CPU only)
quantized = torch.quantization.quantize_dynamic(
    model,
    {torch.nn.Linear, torch.nn.LSTM},  # layer types to quantize
    dtype=torch.qint8                  # Float32 → Int8
)

# Compare sizes
torch.save(model.state_dict(), "original.pt")       # 44.7 MB
torch.save(quantized.state_dict(), "quantized.pt")  # 11.2 MB ← 4× smaller!

# Benchmark
# Original:  45 ms/inference, 44.7 MB
# Quantized: 18 ms/inference, 11.2 MB ← 2.5× faster, 4× smaller!
# Accuracy drop: ~0.5-1% (barely noticeable)
```
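The 44.7 MB → 11.2 MB numbers above come from the author's model; the same comparison can be reproduced on any module with `Linear` layers. A sketch using a small throwaway MLP (layer sizes are arbitrary):

```python
import os
import torch

mlp = torch.nn.Sequential(
    torch.nn.Linear(512, 512),
    torch.nn.ReLU(),
    torch.nn.Linear(512, 10),
)
q = torch.quantization.quantize_dynamic(mlp, {torch.nn.Linear}, dtype=torch.qint8)

def size_mb(module, path):
    """Serialize the state_dict, report its size in MB, clean up."""
    torch.save(module.state_dict(), path)
    mb = os.path.getsize(path) / 1e6
    os.remove(path)
    return mb

fp32_mb = size_mb(mlp, "tmp_fp32.pt")
int8_mb = size_mb(q, "tmp_int8.pt")
print(f"fp32: {fp32_mb:.2f} MB, int8: {int8_mb:.2f} MB")
```

On this toy MLP the int8 checkpoint comes out roughly a quarter of the float32 one, matching the ~4× claim.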
| Technique | Size | Speed | Accuracy Drop | Difficulty |
|---|---|---|---|---|
| Dynamic Quantization | 4× smaller | 2-4× faster | ~0.5% | 🟢 Easy |
| Static Quantization | 4× smaller | 3-5× faster | ~0.3% | 🟡 Medium |
| Pruning | 2-10× smaller | 1.5-3× faster | ~1% | 🟡 Medium |
| ONNX + Quantization | 4× smaller | 5-8× faster | ~0.5% | 🟢 Easy |
| Knowledge Distillation | 10-50× smaller | 10× faster | ~2% | 🔴 Hard |
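Knowledge distillation, the last row, trains a small student to match a large teacher's softened outputs. A sketch of the standard distillation loss (temperature `T` and mixing weight `alpha` are the usual hyperparameters; the exact values vary by task):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    # soft term: KL divergence between temperature-softened distributions
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)  # rescale so gradients keep their original magnitude
    # hard term: ordinary cross-entropy against the true labels
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# toy batch: 8 samples, 5 classes
loss = distillation_loss(torch.randn(8, 5), torch.randn(8, 5),
                         torch.randint(0, 5, (8,)))
```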
✅
7. Deployment Checklist
10 steps before production:

| # | Step | Tool |
|---|---|---|
| 1 | model.eval() + torch.no_grad() | PyTorch |
| 2 | Export to ONNX or TorchScript | torch.onnx.export |
| 3 | Validate: ONNX output ≈ PyTorch output | np.allclose() |
| 4 | Quantize for speed/size | quantize_dynamic |
| 5 | Build a FastAPI inference server | FastAPI + Uvicorn |
| 6 | Add input validation & error handling | Pydantic |
| 7 | Containerize with Docker | Dockerfile |
| 8 | Load test (p50, p95, p99 latency) | Locust / wrk |
| 9 | Set up monitoring & logging | Prometheus + Grafana |
| 10 | Deploy to the cloud + auto-scaling | AWS / GCP / Azure |
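Step 6 can be as small as a Pydantic model for the response payload, which also documents the schema in FastAPI's auto-generated OpenAPI docs. A sketch (the model and field names here are my own, mirroring the `/predict` response):

```python
from pydantic import BaseModel, Field, ValidationError

class Prediction(BaseModel):
    # mirrors the /predict response; confidence must be a valid probability
    predicted_class: str
    confidence: float = Field(ge=0.0, le=1.0)

ok = Prediction(predicted_class="cat", confidence=0.98)

rejected = False
try:
    Prediction(predicted_class="cat", confidence=1.7)  # out of [0, 1] range
except ValidationError:
    rejected = True
```

Passing `response_model=Prediction` to the route decorator makes FastAPI enforce this schema on every response.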
📝
8. Part 6 Summary
Deployment essentials:

| Concept | What It Is | Key Code |
|---|---|---|
| TorchScript | Serialize a model without Python | torch.jit.trace(model, input) |
| ONNX Export | Universal format, 3× faster | torch.onnx.export(model, ...) |
| ONNX Runtime | Optimized inference engine | ort.InferenceSession("model.onnx") |
| FastAPI | REST API server | @app.post("/predict") |
| Docker | Container → deploy anywhere | docker build -t ml-api . |
| Quantization | Float32 → Int8: 4× smaller | quantize_dynamic(model, ...) |

Next: Part 7 — Generative AI: GANs & Autoencoders
From classifying to creating! Learn to build models that generate new images: Variational Autoencoders (VAE), DCGAN, and generating faces, digits, and art from noise.