Bob Explores AI/ML Workloads on AlmaLinux
Bob’s next adventure was to delve into artificial intelligence (AI) and machine learning (ML) workloads on AlmaLinux. With growing interest in data-driven applications, Bob aimed to configure his AlmaLinux server to handle data processing, model training, and inference tasks efficiently.
“AI and ML are the future of computing—let’s see what AlmaLinux can do!” Bob said, ready to explore.
Chapter Outline: “Bob Explores AI/ML Workloads on AlmaLinux”
Introduction: Why AI/ML on AlmaLinux?
- Overview of AI/ML workloads.
- Why AlmaLinux is a solid choice for AI/ML.
Setting Up an AI/ML Environment
- Installing Python, Jupyter, and common ML libraries.
- Configuring GPU support with CUDA and cuDNN.
Running AI/ML Workloads
- Using TensorFlow and PyTorch.
- Training and testing a simple ML model.
Optimizing Performance for AI/ML
- Managing resources with Docker and Podman.
- Fine-tuning CPU and GPU performance.
Deploying AI Models
- Setting up a REST API with Flask for model inference.
- Automating model deployment with Ansible.
Monitoring and Scaling AI/ML Applications
- Using Prometheus and Grafana to monitor workloads.
- Scaling ML services with Kubernetes.
Conclusion: Bob Reflects on AI/ML Mastery
Part 1: Introduction: Why AI/ML on AlmaLinux?
Bob learned that AI/ML workloads are computationally intensive, requiring powerful hardware and optimized software environments. AlmaLinux offers stability and compatibility, making it ideal for running AI/ML frameworks.
Why Use AlmaLinux for AI/ML?
- Open-source: No licensing fees, full control over the environment.
- Stable: Based on RHEL, ensuring reliability.
- Scalable: Supports modern tools like Docker, Kubernetes, and TensorFlow.
“AlmaLinux provides a solid foundation for AI innovation!” Bob said.
Part 2: Setting Up an AI/ML Environment
Step 1: Installing Python and Jupyter
Install Python and essential tools:
sudo dnf install -y python3 python3-pip
Install JupyterLab:
pip3 install jupyterlab
Start Jupyter:
jupyter-lab --no-browser --ip=0.0.0.0 --port=8888
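Because Jupyter is bound to all interfaces on port 8888, Bob also opened that port in firewalld (assuming AlmaLinux's default firewalld is active):
sudo firewall-cmd --permanent --add-port=8888/tcp
sudo firewall-cmd --reload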
Step 2: Installing ML Libraries
Install common libraries:
pip3 install numpy pandas matplotlib scikit-learn
Install TensorFlow and PyTorch:
pip3 install tensorflow torch torchvision
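A quick import check confirms both frameworks installed correctly (a simple sanity check, not part of the original steps):
python3 -c "import tensorflow as tf, torch; print(tf.__version__, torch.__version__)"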
Step 3: Configuring GPU Support
If Bob’s server had an NVIDIA GPU:
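These packages are not in the AlmaLinux base repositories, so Bob would first add NVIDIA's CUDA repository (a sketch, assuming AlmaLinux 9 on x86_64; the exact URL varies by OS version and architecture):
sudo dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel9/x86_64/cuda-rhel9.repo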
Install CUDA:
sudo dnf install -y nvidia-driver cuda
Install cuDNN:
sudo dnf install -y libcudnn8
Verify GPU support in TensorFlow:
import tensorflow as tf
print("GPUs Available:", tf.config.list_physical_devices('GPU'))
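Since PyTorch was installed too, Bob could verify that it also sees the GPU:
import torch
print("CUDA available:", torch.cuda.is_available())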
“The AI environment is ready—time to build something cool!” Bob said.
Part 3: Running AI/ML Workloads
Step 1: Training a Simple Model
Bob created a basic TensorFlow script to train a model on the MNIST dataset.
Save the following Python script as mnist_train.py:
import tensorflow as tf
from tensorflow.keras import layers, models

# Load the MNIST dataset and scale pixel values to [0, 1]
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# Build a simple model
model = models.Sequential([
    layers.Flatten(input_shape=(28, 28)),
    layers.Dense(128, activation='relu'),
    layers.Dense(10, activation='softmax')
])

# Compile and train the model, keeping the history for plotting later
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
history = model.fit(x_train, y_train, epochs=5)

# Evaluate on the test set
test_loss, test_acc = model.evaluate(x_test, y_test)
print(f"Test accuracy: {test_acc}")

# Save the trained model for serving (used by the Flask API in Part 5)
model.save('mnist_model.h5')
Run the script:
python3 mnist_train.py
Step 2: Visualizing Results
Bob used Matplotlib to plot training results:
Add to the script:
import matplotlib.pyplot as plt

plt.plot(history.history['accuracy'], label='accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.show()
“Training a model was easier than I thought!” Bob said.
Part 4: Optimizing Performance for AI/ML
Step 1: Using Docker or Podman
Bob containerized his AI workloads for portability:
Create a Dockerfile:
FROM tensorflow/tensorflow:latest-gpu
WORKDIR /app
COPY mnist_train.py .
CMD ["python", "mnist_train.py"]
Build and run the container:
podman build -t ai-workload .
podman run --gpus all ai-workload
(GPU passthrough in Podman requires the NVIDIA Container Toolkit to be installed on the host.)
Step 2: Fine-Tuning Hardware Performance
Monitor GPU usage:
nvidia-smi
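For a live view while a training run is in progress, the same output can be refreshed every second:
watch -n 1 nvidia-smi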
Optimize TensorFlow for the GPU:
import tensorflow as tf

# Enable memory growth so TensorFlow allocates GPU memory as needed
gpus = tf.config.list_physical_devices('GPU')
if gpus:
    tf.config.experimental.set_memory_growth(gpus[0], True)
“Optimized hardware ensures maximum speed for training!” Bob said.
Part 5: Deploying AI Models
Step 1: Building a REST API
Install Flask:
pip3 install flask
Create an API script and save it as api.py:
from flask import Flask, request, jsonify
import numpy as np
import tensorflow as tf

app = Flask(__name__)
model = tf.keras.models.load_model('mnist_model.h5')

@app.route('/predict', methods=['POST'])
def predict():
    # Expect a JSON body like {"input": [[...28x28 pixel rows...]]}
    data = request.json
    prediction = model.predict(np.array(data['input']))
    return jsonify({'prediction': prediction.tolist()})

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
Run the API:
python3 api.py
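To exercise the endpoint, Bob could send one MNIST test image from a small client script (a sketch; it assumes the requests library is installed and the API is running locally):
import requests
import tensorflow as tf

# Pull one test image and normalize it the same way as during training
(_, _), (x_test, _) = tf.keras.datasets.mnist.load_data()
sample = (x_test[:1] / 255.0).tolist()

# POST it to the Flask API and print the class probabilities
resp = requests.post('http://localhost:5000/predict', json={'input': sample})
print(resp.json())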
Step 2: Automating Deployment with Ansible
Bob created an Ansible playbook to deploy the API across multiple servers:
Example playbook:
---
- name: Deploy AI API
  hosts: ai-servers
  tasks:
    - name: Copy API script
      copy:
        src: /home/bob/api.py
        dest: /opt/ai/api.py
    - name: Install dependencies
      pip:
        name:
          - flask
          - tensorflow
    - name: Start API
      shell: nohup python3 /opt/ai/api.py &
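The playbook can then be run against all API servers (assuming it is saved as deploy_ai_api.yml and the ai-servers group is defined in Bob's inventory):
ansible-playbook -i inventory deploy_ai_api.yml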
Part 6: Monitoring and Scaling AI/ML Applications
Step 1: Monitoring Workloads
- Use Prometheus to track GPU and CPU metrics (a sample scrape configuration is sketched below).
- Visualize the collected metrics with Grafana dashboards.
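A minimal prometheus.yml sketch, assuming node_exporter (default port 9100) exposes CPU and memory metrics and NVIDIA's DCGM exporter (default port 9400) exposes GPU metrics on the server:
scrape_configs:
  - job_name: 'node'   # CPU, memory, disk via node_exporter
    static_configs:
      - targets: ['localhost:9100']
  - job_name: 'gpu'    # GPU utilization via dcgm-exporter
    static_configs:
      - targets: ['localhost:9400']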
Step 2: Scaling with Kubernetes
Bob used Kubernetes to manage multiple instances of his AI API:
Create a deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ai-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: ai-api
  template:
    metadata:
      labels:
        app: ai-api
    spec:
      containers:
        - name: ai-api
          image: ai-workload  # assumes the image has been pushed somewhere the cluster can pull it from
          ports:
            - containerPort: 5000
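Applying the manifest and exposing the API is then straightforward (assuming the file is saved as ai-api-deployment.yaml):
kubectl apply -f ai-api-deployment.yaml
kubectl expose deployment ai-api --type=NodePort --port=5000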
Conclusion: Bob Reflects on AI/ML Mastery
Bob successfully configured AlmaLinux to handle AI/ML workloads, from training models to deploying them as scalable APIs. He felt confident in AlmaLinux’s capabilities for data-driven applications.
Next, Bob plans to explore Linux Storage Management with AlmaLinux.