Bob Builds Serverless AI Applications with Kubernetes on AlmaLinux
Let’s dive into Chapter 50, “Bob Builds Serverless AI Applications with Kubernetes!” In this chapter, Bob explores how to combine serverless architecture and AI-powered services on Kubernetes, enabling scalable, cost-efficient, and intelligent applications.
1. Introduction: Why Serverless for AI Applications?
Bob’s company wants to build AI-powered services that scale dynamically based on demand, while keeping infrastructure costs low. Serverless architecture on Kubernetes is the perfect solution, enabling resource-efficient, event-driven AI applications.
“Serverless and AI—low overhead, high intelligence. Let’s make it happen!” Bob says, eager to begin.
2. Setting Up a Serverless Platform
Bob starts by deploying Knative, a Kubernetes-based serverless platform.
Installing Knative:
Bob installs Knative Serving and Eventing:
kubectl apply -f https://github.com/knative/serving/releases/download/knative-v1.9.0/serving-crds.yaml
kubectl apply -f https://github.com/knative/serving/releases/download/knative-v1.9.0/serving-core.yaml
kubectl apply -f https://github.com/knative/eventing/releases/download/knative-v1.9.0/eventing-crds.yaml
kubectl apply -f https://github.com/knative/eventing/releases/download/knative-v1.9.0/eventing-core.yaml
Verifying Installation:
kubectl get pods -n knative-serving
kubectl get pods -n knative-eventing
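Knative Serving also needs a networking layer to route external traffic; the core manifests above do not include one. A minimal sketch using Kourier (the version is assumed to match the Serving release above):

kubectl apply -f https://github.com/knative/net-kourier/releases/download/knative-v1.9.0/kourier.yaml
kubectl patch configmap/config-network \
  --namespace knative-serving \
  --type merge \
  --patch '{"data":{"ingress-class":"kourier.ingress.networking.knative.dev"}}'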
“Knative brings serverless capabilities to my Kubernetes cluster!” Bob says.
3. Deploying an AI-Powered Serverless Application
Bob builds a serverless function for image recognition using a pre-trained AI model.
Creating the Function:
Bob writes a Python serverless function:
from flask import Flask, request
import numpy as np
from PIL import Image
from tensorflow.keras.models import load_model

app = Flask(__name__)
model = load_model("image_recognition_model.h5")

@app.route('/predict', methods=['POST'])
def predict():
    # Decode the upload into an array the model accepts
    # (224x224 is an assumption; use the model's actual input size)
    image = Image.open(request.files['image']).convert("RGB").resize((224, 224))
    batch = np.expand_dims(np.asarray(image) / 255.0, axis=0)
    prediction = model.predict(batch)
    return {"prediction": prediction.tolist()}

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)  # Knative routes traffic to port 8080 by default
Packaging and Deploying:
Bob containerizes the function:
FROM python:3.9
RUN pip install flask tensorflow pillow numpy
ADD app.py /app.py
ADD image_recognition_model.h5 /image_recognition_model.h5
CMD ["python", "app.py"]
He deploys it with Knative Serving:
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: image-recognition
spec:
  template:
    spec:
      containers:
        - image: myrepo/image-recognition:latest
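Once the Service reports Ready, Bob can look up its URL and send a test request (sample.jpg is a placeholder image file):

kubectl get ksvc image-recognition
curl -X POST -F "image=@sample.jpg" http://image-recognition.default.example.com/predict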
“Serverless AI is live and ready to process images on demand!” Bob says.
4. Scaling AI Workloads Dynamically
Bob ensures the AI function scales automatically based on user demand.
Configuring Autoscaling:
Bob adds Knative autoscaling annotations to the revision template:
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/minScale: "1"
        autoscaling.knative.dev/maxScale: "10"
Testing Load:
He uses a load-testing tool to simulate multiple requests:
hey -z 30s -c 50 http://image-recognition.default.example.com/predict
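While the load test runs, Bob watches Knative spin replicas up and down:

kubectl get pods -l serving.knative.dev/service=image-recognition -w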
“Dynamic scaling keeps my AI service efficient and responsive!” Bob says.
5. Adding Event-Driven Processing
Bob integrates Knative Eventing to trigger AI functions based on events.
Creating an Event Source:
Bob sets up a PingSource to send periodic events:
apiVersion: sources.knative.dev/v1
kind: PingSource
metadata:
  name: periodic-trigger
spec:
  schedule: "*/5 * * * *"
  contentType: "application/json"
  data: '{"action": "process_new_images"}'
  sink:
    ref:
      apiVersion: serving.knative.dev/v1
      kind: Service
      name: image-recognition
Testing Event Flow:
kubectl get events
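He also tails the function’s logs to confirm the scheduled events arrive every five minutes (user-container is the name Knative gives the application container):

kubectl logs -l serving.knative.dev/service=image-recognition -c user-container --tail=20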
“Event-driven architecture makes my AI functions smarter and more reactive!” Bob notes.
6. Storing AI Model Predictions
Bob sets up a database to store predictions for analysis.
Deploying PostgreSQL:
Bob uses Helm to deploy a PostgreSQL database:
helm repo add bitnami https://charts.bitnami.com/bitnami
helm install postgresql bitnami/postgresql
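The predictions table has to exist before the function can write to it. A one-off sketch using a throwaway psql pod (the host, database, and credentials match the script below; the column types are assumptions):

kubectl run psql-client --rm -it --restart=Never --image=bitnami/postgresql --command -- \
  env PGPASSWORD=secret psql -h postgresql-service -U admin -d predictions \
  -c "CREATE TABLE IF NOT EXISTS predictions (image_id TEXT, result JSONB, created_at TIMESTAMPTZ DEFAULT now());"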
Saving Predictions:
He writes a script to save predictions:
import psycopg2

# Connect to the PostgreSQL service deployed above
conn = psycopg2.connect("dbname=predictions user=admin password=secret host=postgresql-service")
cur = conn.cursor()
# image_id and result come from the prediction handler
cur.execute("INSERT INTO predictions (image_id, result) VALUES (%s, %s)", (image_id, result))
conn.commit()
conn.close()
“Stored predictions make analysis and future improvements easier!” Bob says.
7. Monitoring and Debugging
Bob integrates monitoring tools to track performance and troubleshoot issues.
Using Prometheus and Grafana:
Bob collects metrics from Knative services and creates dashboards for:
- Request latency.
- Scaling behavior.
- Error rates.
Configuring Alerts:
He adds alerts for function timeouts:
groups:
  - name: serverless-alerts
    rules:
      - alert: FunctionTimeout
        expr: request_duration_seconds > 1
        for: 1m
        labels:
          severity: warning
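If Prometheus is managed by the Prometheus Operator, the same rule can be loaded as a PrometheusRule resource (a sketch; the namespace is an assumption):

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: serverless-alerts
  namespace: monitoring
spec:
  groups:
    - name: serverless-alerts
      rules:
        - alert: FunctionTimeout
          expr: request_duration_seconds > 1
          for: 1m
          labels:
            severity: warning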
“Monitoring keeps my serverless AI applications reliable!” Bob says.
8. Securing Serverless AI Applications
Bob ensures the security of his serverless workloads.
Using HTTPS:
Bob installs cert-manager to provision TLS certificates for the AI function:
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.13.0/cert-manager.yaml
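With cert-manager running, Bob defines an issuer that requests certificates from Let’s Encrypt. A sketch assuming HTTP-01 validation (the email and ingress class are placeholders):

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: bob@example.com
    privateKeySecretRef:
      name: letsencrypt-prod
    solvers:
      - http01:
          ingress:
            class: nginx  # assumption: set to the cluster's actual ingress class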
Managing Secrets with Kubernetes:
He stores database credentials securely:
kubectl create secret generic db-credentials \
  --from-literal=username=admin \
  --from-literal=password=secret123
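The function can then read these credentials from environment variables instead of hard-coding them. The relevant part of the Knative Service spec:

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: image-recognition
spec:
  template:
    spec:
      containers:
        - image: myrepo/image-recognition:latest
          env:
            - name: DB_USERNAME
              valueFrom:
                secretKeyRef:
                  name: db-credentials
                  key: username
            - name: DB_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: db-credentials
                  key: password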
“Security is paramount for user trust and data protection!” Bob says.
9. Optimizing Costs for Serverless AI
Bob explores cost-saving strategies for his serverless AI applications.
Using Spot Instances for Low-Priority Functions:
Bob deploys non-critical functions on Spot Instances:
nodeSelector:
  cloud.google.com/gke-preemptible: "true"
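One caveat: Knative Serving rejects nodeSelector in the revision pod spec unless the corresponding feature flag is enabled. A sketch based on Knative’s config-features flags:

kubectl patch configmap/config-features \
  --namespace knative-serving \
  --type merge \
  --patch '{"data":{"kubernetes.podspec-nodeselector":"enabled"}}'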
Reviewing Function Costs:
He uses tools like Kubecost to analyze function expenses:
helm repo add kubecost https://kubecost.github.io/cost-analyzer/
helm install kubecost kubecost/cost-analyzer --namespace kubecost --create-namespace
“Serverless architecture keeps costs under control without sacrificing performance!” Bob notes.
10. Conclusion: Bob’s Serverless AI Breakthrough
With Knative, dynamic scaling, event-driven triggers, and secure integrations, Bob has successfully built intelligent serverless AI applications. His setup is highly scalable, cost-effective, and ready for real-world workloads.
Next, Bob plans to explore Kubernetes for Quantum Computing Workloads, venturing into the future of computing.
Stay tuned for the next chapter: “Bob Explores Quantum Computing with Kubernetes!”