πŸ“ Artikel ini ditulis dalam Bahasa Indonesia & English
πŸ“ This article is available in English & Bahasa Indonesia

πŸš€ Learn TensorFlow β€” Page 9

TF Serving &
Model Deployment



A model in a notebook is useless until deployed. Page 9 covers in depth: SavedModel format and export, TF Serving for production REST & gRPC API, TFLite for Android/iOS/edge deployment with quantization, TF.js for browser inference, Docker containerization for consistent deployment, model versioning and A/B testing, batch prediction for high throughput, production model monitoring, and a complete deployment checklist.

πŸ“… March 2026 ⏱ 32 min read
🏷 SavedModel Β· TF Serving Β· TFLite Β· TF.js Β· Docker Β· REST API Β· gRPC Β· Quantization
πŸ“š Learn TensorFlow Series:

πŸ“‘ Table of Contents β€” Page 9

  1. Deployment Overview β€” From notebook to production
  2. SavedModel Format β€” Standard export for all platforms
  3. TF Serving β€” REST API & gRPC production server
  4. Docker Deployment β€” Containers for consistency
  5. TFLite β€” Mobile & edge: quantization, optimization
  6. TF.js β€” Model in browser: convert & load
  7. Model Versioning β€” A/B testing & safe rollback
  8. Batch Prediction β€” High throughput offline inference
  9. Production Monitoring β€” Data drift, latency, accuracy
  10. Deployment Checklist β€” Before going to production
  11. Summary & Page 10 Preview
πŸ—ΊοΈ

1. Deployment Overview β€” From Notebook to the Real World

Training = 10% of an ML project. Deployment & maintenance = 90%.


You can now train accurate models (Pages 1-8). But a model in a Jupyter notebook can't serve users. Deployment means making your model accept input and return predictions in a real-time, reliable, and scalable way. TensorFlow has the most complete deployment ecosystem β€” one model can be deployed to server, phone, browser, and edge devices.

TensorFlow Deployment Ecosystem β€” One Model, Every Platform

  Train            Export                Deploy
  ─────            ──────                ──────
  model.fit()  β†’   model.save()     β†’   SavedModel/
  (Pages 1-8)      (standard format)        β”‚
                                            β”œβ”€β”€ TF Serving β†’ REST/gRPC API (server)
                                            β”‚   └── Docker container
                                            β”œβ”€β”€ TFLite β†’ Android / iOS / RPi
                                            β”‚   └── .tflite (quantized, 5-10Γ— smaller)
                                            β”œβ”€β”€ TF.js β†’ Browser (client-side)
                                            β”‚   └── model.json + weights
                                            └── TFX Pipeline β†’ Full MLOps (Page 10)
                                                └── Vertex AI (Google Cloud)

All paths start from SavedModel β€” the universal exchange format.
πŸ’Ύ

2. SavedModel Format β€” Standard Export

One format for all: serving, TFLite, TF.js, TFX β€” everything starts here
60_savedmodel.py β€” Export & Load Models (Python)
import tensorflow as tf
from tensorflow import keras

# ===========================
# 1. Save β€” SavedModel format (RECOMMENDED for deployment)
# ===========================
model.save("saved_model/my_classifier")
# (In Keras 3, use model.export("saved_model/my_classifier") to write a SavedModel)
# Creates directory structure:
# saved_model/my_classifier/
#   β”œβ”€β”€ saved_model.pb          ← computation graph
#   β”œβ”€β”€ fingerprint.pb          ← integrity check
#   └── variables/
#       β”œβ”€β”€ variables.data-00000-of-00001  ← weights
#       └── variables.index

# ===========================
# 2. Save β€” Keras native format (.keras)
# ===========================
model.save("my_model.keras")         # single file, includes architecture
# Good for: development, sharing models with Keras users
# Bad for: TF Serving (needs SavedModel format)

# ===========================
# 3. Save β€” Weights only
# ===========================
model.save_weights("weights/my_weights.weights.h5")
# Only weights, no architecture. Must rebuild model first when loading.
# Good for: checkpointing during training, transfer learning

# ===========================
# 4. Load models
# ===========================
# SavedModel
loaded_sm = tf.keras.models.load_model("saved_model/my_classifier")
predictions = loaded_sm.predict(X_test[:5])

# Keras format
loaded_keras = tf.keras.models.load_model("my_model.keras")

# Weights only (must have identical architecture!)
new_model = build_model()  # same architecture
new_model.load_weights("weights/my_weights.weights.h5")

# ===========================
# 5. Inspect SavedModel with CLI
# ===========================
# saved_model_cli show --dir saved_model/my_classifier --all
# Shows: input/output signatures, shapes, dtypes
# This is what TF Serving uses to know the API!

# ===========================
# 6. Add custom serving signature
# ===========================
# For models with custom preprocessing:
class ServableModel(tf.Module):
    def __init__(self, model):
        self.model = model

    @tf.function(input_signature=[tf.TensorSpec(shape=[None, 224, 224, 3], dtype=tf.float32)])
    def serve(self, images):
        # Preprocessing included in serving!
        images = images / 255.0
        predictions = self.model(images, training=False)
        return {"predictions": predictions}

servable = ServableModel(model)
tf.saved_model.save(servable, "saved_model/servable",
                    signatures={"serving_default": servable.serve})


πŸŽ“ SavedModel vs .keras vs .h5 β€” When to Use What?
SavedModel/ (directory): Deploy to TF Serving, TFLite, TF.js. Production standard.
.keras (single file): Development, sharing, prototyping. Keras standard.
.weights.h5 (weights only): Checkpointing, transfer learning. Needs identical architecture.
Rule: For production β†’ always SavedModel. For development β†’ .keras.

πŸ–₯️

3. TF Serving β€” Production REST & gRPC Server

Production-grade server from Google: auto batching, model versioning, GPU support


TF Serving is a high-performance C++ server for serving TensorFlow models. Designed for production: auto-batching requests, GPU support, model hot-swapping (update models without downtime), and built-in monitoring. Used by Google to serve billions of predictions per day.

61_tf_serving.py β€” Setup & Client Code (Python)
import tensorflow as tf
import requests
import json
import numpy as np

# ===========================
# 1. Save model with version number
# ===========================
model.save("saved_model/my_classifier/1")   # version 1
# Later: model_v2.save("saved_model/my_classifier/2")  # version 2
# TF Serving auto-detects and serves the LATEST version!

# Directory structure for TF Serving:
# saved_model/my_classifier/
#   β”œβ”€β”€ 1/                    ← version 1
#   β”‚   β”œβ”€β”€ saved_model.pb
#   β”‚   └── variables/
#   └── 2/                    ← version 2 (latest, auto-served)
#       β”œβ”€β”€ saved_model.pb
#       └── variables/

# ===========================
# 2. REST API Client
# ===========================
# TF Serving runs on port 8501 (REST) and 8500 (gRPC)

# Prepare input data
test_images = X_test[:5].tolist()  # must be JSON-serializable

# Send REST request
url = "http://localhost:8501/v1/models/my_classifier:predict"
payload = json.dumps({"instances": test_images})
headers = {"Content-Type": "application/json"}

response = requests.post(url, data=payload, headers=headers)
result = response.json()

predictions = np.array(result["predictions"])
predicted_classes = np.argmax(predictions, axis=1)
print(f"Predicted: {predicted_classes}")
# Predicted: [7, 2, 1, 0, 4]

# Request specific version:
# url = "http://localhost:8501/v1/models/my_classifier/versions/1:predict"

# ===========================
# 3. gRPC Client (faster for production!)
# ===========================
# pip install tensorflow-serving-api
# import grpc
# from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc
# 
# channel = grpc.insecure_channel('localhost:8500')
# stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)
# 
# request = predict_pb2.PredictRequest()
# request.model_spec.name = 'my_classifier'
# request.inputs['input_1'].CopyFrom(
#     tf.make_tensor_proto(test_images, dtype=tf.float32))
# 
# response = stub.Predict(request, timeout=10.0)
# predictions = tf.make_ndarray(response.outputs['dense_1'])

# gRPC vs REST:
# gRPC: ~2-5Γ— faster (binary protocol, no JSON serialization)
# REST: easier to debug, works with curl, browser-friendly
# Production recommendation: gRPC for internal services, REST for external API
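The gRPC speedup is easy to see in miniature: JSON encodes every float as decimal text, while gRPC ships tensors as raw bytes inside a protobuf. A self-contained size comparison (plain Python + NumPy with a hypothetical 5Γ—784 batch β€” an illustration, not an actual TF Serving call):

```python
import json
import numpy as np

# A fake "image batch": 5 samples of 784 float32 features (hypothetical shape)
batch = np.random.rand(5, 784).astype(np.float32)

# REST: every number becomes decimal text inside a JSON document
json_payload = json.dumps({"instances": batch.tolist()}).encode("utf-8")

# gRPC: tensors travel as raw little-endian bytes inside a protobuf
binary_payload = batch.tobytes()

print(f"JSON payload:   {len(json_payload):,} bytes")
print(f"Binary payload: {len(binary_payload):,} bytes")
print(f"JSON overhead:  {len(json_payload) / len(binary_payload):.1f}x")
# JSON is several times larger β€” and parsing that text costs CPU on both ends
```

The bandwidth gap is only part of the story; skipping text parsing is where much of the 2-5Γ— latency win comes from.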
🐳

4. Docker Deployment β€” Containers for Consistency

"Works on my machine" β†’ works everywhere with Docker
62_docker_deploy.sh β€” Docker TF Serving Setup (bash)
# ===========================
# 1. Pull TF Serving Docker image
# ===========================
docker pull tensorflow/serving              # CPU version
docker pull tensorflow/serving:latest-gpu   # GPU version (needs nvidia-docker)

# ===========================
# 2. Run TF Serving container
# ===========================
docker run -d --name tf_serving \
  -p 8501:8501 \
  -p 8500:8500 \
  --mount type=bind,source=$(pwd)/saved_model/my_classifier,target=/models/my_classifier \
  -e MODEL_NAME=my_classifier \
  tensorflow/serving
# MODEL_BASE_PATH defaults to /models; the server looks for
# ${MODEL_BASE_PATH}/${MODEL_NAME} = /models/my_classifier (our mount target)

# ===========================
# 3. Test with curl
# ===========================
curl -X POST http://localhost:8501/v1/models/my_classifier:predict \
  -H "Content-Type: application/json" \
  -d '{"instances": [[0.1, 0.2, 0.3, 0.4]]}'

# Check model status
curl http://localhost:8501/v1/models/my_classifier

# ===========================
# 4. Custom Dockerfile (with model baked in)
# ===========================
# FROM tensorflow/serving
# COPY saved_model/my_classifier /models/my_classifier
# ENV MODEL_NAME=my_classifier
# EXPOSE 8501 8500
# 
# docker build -t my-ml-service .
# docker run -p 8501:8501 my-ml-service

# ===========================
# 5. Docker Compose (multiple services)
# ===========================
# version: '3'
# services:
#   tf-serving:
#     image: tensorflow/serving
#     ports: ["8501:8501", "8500:8500"]
#     volumes: ["./saved_model:/models"]
#     environment:
#       - MODEL_NAME=my_classifier
#   api:
#     build: ./api
#     ports: ["5000:5000"]
#     depends_on: [tf-serving]
πŸ“±

5. TFLite β€” Mobile & Edge Deployment

5-10Γ— smaller models, inference on Android/iOS/Raspberry Pi without internet
63_tflite.py β€” Convert, Optimize, & Deploy to Mobile (Python)
import tensorflow as tf
import numpy as np

# ===========================
# 1. Basic conversion (no optimization)
# ===========================
converter = tf.lite.TFLiteConverter.from_saved_model("saved_model/my_classifier/1")
tflite_model = converter.convert()

with open("model.tflite", "wb") as f:
    f.write(tflite_model)

original_size = 25.6  # MB (SavedModel)
tflite_size = len(tflite_model) / (1024 * 1024)
print(f"Original: {original_size:.1f} MB")
print(f"TFLite:   {tflite_size:.1f} MB")
print(f"Reduction: {original_size/tflite_size:.1f}Γ—")

# ===========================
# 2. Dynamic range quantization (RECOMMENDED default)
# ===========================
converter = tf.lite.TFLiteConverter.from_saved_model("saved_model/my_classifier/1")
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # dynamic range!
tflite_quant = converter.convert()

print(f"Quantized: {len(tflite_quant)/1024/1024:.1f} MB")
# ~4Γ— smaller than original! (float32 β†’ int8 weights)
# Accuracy loss: typically < 1%

# ===========================
# 3. Full integer quantization (smallest, fastest)
# ===========================
def representative_dataset():
    """Provide sample data for calibration"""
    for i in range(100):
        sample = X_train[i:i+1].astype(np.float32)
        yield [sample]

converter = tf.lite.TFLiteConverter.from_saved_model("saved_model/my_classifier/1")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8   # or tf.uint8
converter.inference_output_type = tf.int8
tflite_int8 = converter.convert()

print(f"Full INT8: {len(tflite_int8)/1024/1024:.1f} MB")
# ~4Γ— smaller AND ~2-4Γ— faster inference on mobile!
# Runs on CPU integer units β€” no GPU needed!

# ===========================
# 4. Float16 quantization (GPU-friendly mobile)
# ===========================
converter = tf.lite.TFLiteConverter.from_saved_model("saved_model/my_classifier/1")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]
tflite_fp16 = converter.convert()
# ~2Γ— smaller, runs on mobile GPU (faster than INT8 on GPU)

# ===========================
# 5. Test TFLite model in Python
# ===========================
interpreter = tf.lite.Interpreter(model_content=tflite_quant)
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Run inference
test_input = X_test[0:1].astype(np.float32)
interpreter.set_tensor(input_details[0]['index'], test_input)
interpreter.invoke()
output = interpreter.get_tensor(output_details[0]['index'])
print(f"TFLite prediction: {np.argmax(output)}")
# Should match original model prediction!
Quantization    | Size Reduction | Speed         | Accuracy Impact | Best For
No quantization | 1Γ— (baseline)  | 1Γ— (baseline) | 0%              | Maximum accuracy
Dynamic range   | ~4Γ— smaller    | ~2Γ— faster    | < 1%            | Default choice ⭐
Float16         | ~2Γ— smaller    | ~1.5Γ— faster  | ~0%             | Mobile GPU
Full INT8       | ~4Γ— smaller    | ~3Γ— faster    | 1-3%            | Edge/IoT devices
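The ~4Γ— figure for dynamic range quantization falls straight out of the arithmetic: float32 weights (4 bytes each) are stored as int8 (1 byte each) plus a float scale. A minimal NumPy sketch of the underlying idea β€” deliberately simplified; TFLite's real scheme quantizes per channel and also handles activations:

```python
import numpy as np

# Hypothetical float32 weight matrix
w = np.random.randn(256, 128).astype(np.float32)

# Symmetric quantization: map [-max|w|, +max|w|] onto [-127, 127]
scale = np.abs(w).max() / 127.0
w_int8 = np.round(w / scale).astype(np.int8)       # what gets stored: 1 byte/weight
w_restored = w_int8.astype(np.float32) * scale     # dequantized at inference time

print(f"float32 size:  {w.nbytes / 1024:.0f} KB")
print(f"int8 size:     {w_int8.nbytes / 1024:.0f} KB  (4x smaller)")
print(f"max abs error: {np.abs(w - w_restored).max():.4f}")  # bounded by ~scale/2
```

The rounding error is at most half a quantization step per weight, which is why the accuracy impact of dynamic range quantization is typically under 1%.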
🌐

6. TF.js β€” Model in the Browser

Client-side inference: no server needed, privacy-preserving, instant response
64_tfjs.sh β€” Convert to TF.js (bash)
# Install converter
pip install tensorflowjs

# Convert SavedModel β†’ TF.js format
tensorflowjs_converter \
  --input_format=tf_saved_model \
  --output_format=tfjs_graph_model \
  --quantize_uint8 \
  saved_model/my_classifier/1 \
  web_model/

# Output files:
# web_model/
#   β”œβ”€β”€ model.json          ← architecture + weight manifest
#   └── group1-shard1of1.bin ← weights binary
65_tfjs_browser.html β€” Load & Predict in Browser (HTML)
<!-- Load TF.js library -->
<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs"></script>

<script>
async function loadAndPredict() {
    // Load model from your web server
    const model = await tf.loadGraphModel('web_model/model.json');

    // Create input tensor (e.g., from canvas/image)
    const input = tf.zeros([1, 224, 224, 3]);

    // Run inference
    const prediction = model.predict(input);
    const classIndex = prediction.argMax(1).dataSync()[0];

    console.log(`Predicted class: ${classIndex}`);

    // Clean up tensors (prevent memory leak!)
    input.dispose();
    prediction.dispose();
}

loadAndPredict();
</script>

<!-- Use cases: -->
<!-- β€’ Real-time webcam classification -->
<!-- β€’ Image editing/filtering -->
<!-- β€’ Text sentiment analysis -->
<!-- β€’ Pose detection (PoseNet) -->
<!-- β€’ No server costs! No data leaves user's device! -->
πŸ“‹

7. Model Versioning β€” A/B Testing & Rollback

Deploy new models without downtime, compare performance, roll back if issues arise
66_versioning.py β€” Model Version Management (Python)
import tensorflow as tf
import requests
import json

# ===========================
# 1. Save with version numbers
# ===========================
model_v1.save("models/classifier/1")    # January model
model_v2.save("models/classifier/2")    # February model (improved)
model_v3.save("models/classifier/3")    # March model (latest)

# TF Serving auto-serves LATEST version (v3)
# But you can request ANY version:

# ===========================
# 2. Request specific version
# ===========================
# Latest (default):
url_latest = "http://localhost:8501/v1/models/classifier:predict"

# Specific version:
url_v1 = "http://localhost:8501/v1/models/classifier/versions/1:predict"
url_v2 = "http://localhost:8501/v1/models/classifier/versions/2:predict"
url_v3 = "http://localhost:8501/v1/models/classifier/versions/3:predict"

# ===========================
# 3. A/B Testing β€” compare versions
# ===========================
import random

def predict_with_ab_test(input_data, traffic_split=0.1):
    """Send 10% traffic to new model, 90% to stable model"""
    if random.random() < traffic_split:
        url = url_v3          # new model (10%)
        version = "v3_new"
    else:
        url = url_v2          # stable model (90%)
        version = "v2_stable"

    resp = requests.post(url, json={"instances": input_data})
    result = resp.json()["predictions"]

    # Log for offline comparison (log_prediction = your own logging helper)
    log_prediction(version, input_data, result)
    return result

# ===========================
# 4. Rollback β€” if new model performs poorly
# ===========================
# Option 1: Delete version folder
# rm -rf models/classifier/3   β†’ TF Serving auto-falls back to v2

# Option 2: Model config file
# model_config.txt:
# model_config_list {
#   config {
#     name: 'classifier'
#     base_path: '/models/classifier'
#     model_platform: 'tensorflow'
#     model_version_policy {
#       specific { versions: 2 }  ← force serve v2 only
#     }
#   }
# }
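One caveat with the random.random() split above: the same user can bounce between model versions across requests. A common refinement β€” a sketch, not a TF Serving feature β€” is to hash a stable user ID so each user consistently sees one variant:

```python
import hashlib

def assign_variant(user_id: str, traffic_split: float = 0.1) -> str:
    """Deterministically route a user: same user always hits the same model."""
    # Hash the ID into one of 100 buckets; buckets 0-9 get the new model (10%)
    bucket = int(hashlib.md5(user_id.encode()).hexdigest(), 16) % 100
    return "v3_new" if bucket < traffic_split * 100 else "v2_stable"

# A user is never bounced between models mid-session
assert assign_variant("user_42") == assign_variant("user_42")

# The split still comes out close to 10% / 90% across many users
counts = {"v3_new": 0, "v2_stable": 0}
for i in range(10_000):
    counts[assign_variant(f"user_{i}")] += 1
print(counts)  # roughly 1,000 vs 9,000
```

Sticky assignment also makes the A/B logs cleaner: each user's outcomes belong to exactly one variant, which simplifies the comparison.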
πŸ“Š

8. Batch Prediction β€” High Throughput Offline

Process millions of predictions at once β€” for analytics, scoring, pipelines
67_batch_prediction.py β€” Efficient Batch Inference (Python)
import tensorflow as tf
import numpy as np

# ===========================
# 1. Simple batch prediction
# ===========================
model = tf.keras.models.load_model("saved_model/my_classifier")

# Predict on large dataset
all_predictions = model.predict(X_large, batch_size=256, verbose=1)
# Progress bar: 100000/100000 [==============================] - 45s

# ===========================
# 2. tf.data pipeline for batch prediction (memory-efficient)
# ===========================
predict_ds = (tf.data.Dataset.from_tensor_slices(X_large)
    .batch(256)
    .prefetch(tf.data.AUTOTUNE))

all_preds = []
for batch in predict_ds:
    preds = model(batch, training=False)
    all_preds.append(preds.numpy())

all_predictions = np.concatenate(all_preds, axis=0)
print(f"Predicted {len(all_predictions)} samples")

# ===========================
# 3. Predict from files (no need to load all into RAM)
# ===========================
file_ds = (tf.data.TFRecordDataset('data/large_dataset.tfrecord')
    .map(parse_fn, num_parallel_calls=tf.data.AUTOTUNE)
    .batch(256)
    .prefetch(tf.data.AUTOTUNE))

results = model.predict(file_ds)
# Processes terabytes of data without running out of memory!
πŸ“‘

9. Production Monitoring β€” Detect Problems Before It's Too Late

A model that's great in the lab can fail in production β€” monitor continuously!


πŸŽ“ 5 Things to Monitor in Production:
1. Latency: How many ms per prediction? Target: <100ms for real-time. Alert if >500ms.
2. Throughput: How many requests/second? Is the server overwhelmed?
3. Error Rate: What % of requests fail (500 errors, timeouts)?
4. Data Drift: Has input distribution changed from training data? If so, model may be stale.
5. Prediction Drift: Has output distribution changed? E.g., suddenly 90% predictions are class A β€” might be a bug or data shift.
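Points 4 and 5 can be approximated without special tooling: save per-feature statistics at training time and compare them against a rolling window of production inputs. A minimal NumPy sketch using z-scores on feature means β€” real systems often use KS tests or PSI, and the alert threshold here is an illustrative assumption:

```python
import numpy as np

def drift_score(train_mean, train_std, live_batch):
    """Per-feature z-score: how far live feature means drifted from training."""
    live_mean = live_batch.mean(axis=0)
    return np.abs(live_mean - train_mean) / (train_std + 1e-8)

# Reference statistics, computed once at training time (simulated data here)
rng = np.random.default_rng(0)
X_train_sim = rng.normal(0.0, 1.0, size=(10_000, 4))
train_mean, train_std = X_train_sim.mean(axis=0), X_train_sim.std(axis=0)

# A production window where feature 2 has drifted
X_live = rng.normal(0.0, 1.0, size=(500, 4))
X_live[:, 2] += 3.0   # simulated drift

scores = drift_score(train_mean, train_std, X_live)
drifted = np.where(scores > 1.0)[0]   # threshold = 1 std, a judgment call
print(f"Drift scores: {np.round(scores, 2)}")
print(f"⚠️ Drifted features: {drifted.tolist()}")  # β†’ [2]
```

The same pattern applied to the model's *output* distribution catches prediction drift (point 5).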

68_monitoring.py β€” Basic Prediction Monitoring (Python)
import time
import numpy as np
from collections import defaultdict

# Simple prediction logger
class PredictionMonitor:
    def __init__(self):
        self.latencies = []
        self.predictions = defaultdict(int)
        self.errors = 0
        self.total = 0

    def predict_and_log(self, model, input_data):
        self.total += 1
        start = time.time()
        try:
            pred = model.predict(input_data, verbose=0)
            latency = (time.time() - start) * 1000  # ms
            self.latencies.append(latency)

            pred_class = np.argmax(pred, axis=1)[0]
            self.predictions[pred_class] += 1

            # Alert on high latency
            if latency > 500:
                print(f"⚠️ HIGH LATENCY: {latency:.0f}ms")

            return pred
        except Exception as e:
            self.errors += 1
            print(f"❌ ERROR: {e}")
            return None

    def report(self):
        print(f"\nπŸ“Š Monitoring Report:")
        print(f"  Total requests: {self.total}")
        print(f"  Error rate: {self.errors/max(self.total,1):.1%}")
        if self.latencies:
            print(f"  Avg latency: {np.mean(self.latencies):.1f}ms")
            print(f"  P99 latency: {np.percentile(self.latencies, 99):.1f}ms")
        print(f"  Class distribution: {dict(self.predictions)}")
βœ…

10. Deployment Checklist β€” Before Going to Production

10 mandatory steps before your model serves real users
#  | Step                          | Detail                                            | Check
1  | Test accuracy on held-out set | Ensure performance meets expectations             | ☐
2  | Test with edge case data      | Empty input, blank images, very long text         | ☐
3  | Benchmark latency             | Target: <100ms real-time, <1s batch               | ☐
4  | Quantize if mobile            | Dynamic range quantization β†’ 4Γ— smaller           | ☐
5  | Test TFLite accuracy          | Ensure quantization doesn't hurt accuracy         | ☐
6  | Set up versioning             | SavedModel/1, /2, /3 β€” always rollback-ready      | ☐
7  | Docker container              | Reproducible environment, easy scaling            | ☐
8  | Load testing                  | How many concurrent requests before a crash?      | ☐
9  | Monitoring setup              | Latency, error rate, prediction distribution      | ☐
10 | Rollback plan                 | If the new model fails, revert to the previous version | ☐
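Step 3 doesn't require load-testing infrastructure to get a first number, and percentiles matter far more than the average. A framework-free sketch β€” `predict_fn` below is a dummy stand-in for your real `model.predict()` or REST call:

```python
import time
import numpy as np

def benchmark(predict_fn, n_requests: int = 200):
    """Time repeated single predictions; report p50/p95/p99 latency in ms."""
    latencies = []
    for _ in range(n_requests):
        start = time.perf_counter()
        predict_fn()
        latencies.append((time.perf_counter() - start) * 1000)
    p50, p95, p99 = np.percentile(latencies, [50, 95, 99])
    print(f"p50: {p50:.1f}ms  p95: {p95:.1f}ms  p99: {p99:.1f}ms")
    return p50, p95, p99

# Dummy stand-in: pretend inference takes ~2ms
def predict_fn():
    time.sleep(0.002)

p50, p95, p99 = benchmark(predict_fn)
assert p99 < 100, "p99 over the real-time budget β€” investigate before launch"
```

Gate the deployment on p99, not the mean: the slowest 1% of requests is what users complain about.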
πŸ“

11. Page 9 Summary

Everything we learned
Deployment Decision Tree β€” Pick the Right Platform

Where will the model run?
β”œβ”€β”€ Server/Cloud β†’ TF Serving + Docker
β”‚   β”œβ”€β”€ Real-time API? β†’ REST (simple) or gRPC (fast)
β”‚   └── Batch processing? β†’ model.predict() + tf.data pipeline
β”œβ”€β”€ Mobile/Edge β†’ TFLite
β”‚   β”œβ”€β”€ Android/iOS app? β†’ .tflite + TFLite interpreter
β”‚   β”œβ”€β”€ Raspberry Pi/IoT? β†’ .tflite + INT8 quantization
β”‚   └── Coral Edge TPU? β†’ .tflite + EdgeTPU compiler
└── Browser β†’ TF.js
    β”œβ”€β”€ Client-side inference? β†’ tfjs_graph_model
    β”œβ”€β”€ Privacy requirement? β†’ no data leaves the device!
    └── No server costs? β†’ all computation on the user's device

All paths start from: model.save("saved_model/name/version")
Concept      | What It Is                  | Key Code
SavedModel   | Universal export format     | model.save("saved_model/v1")
TF Serving   | Production REST/gRPC server | docker run tensorflow/serving
TFLite       | Mobile & edge deployment    | TFLiteConverter + quantize
TF.js        | Browser inference           | tensorflowjs_converter
Quantization | 2-4Γ— model compression      | Optimize.DEFAULT
Docker       | Container deployment        | docker run -p 8501:8501
Versioning   | A/B testing & rollback      | saved_model/name/1, /2, /3
Monitoring   | Latency, drift, errors      | PredictionMonitor class
← Previous Page

Page 8 β€” GAN & Generative Models

πŸ“˜

Coming Next: Page 10 β€” Capstone: End-to-End ML Project πŸ†

Grand finale! Combine EVERYTHING from Pages 1-9 in one complete project: tf.data pipeline β†’ augmentation β†’ EfficientNet transfer learning β†’ custom training β†’ TensorBoard monitoring β†’ SavedModel export β†’ TFLite β†’ TF Serving β†’ Docker deployment. Plus TFX, Vertex AI, and JAX roadmap. Series finale!