
Deploy custom model

1. Overview

This tutorial will guide you through deploying a custom model onto Highwind.

Tutorial steps

  1. Containerise an existing trained model using a Docker Repo asset
  2. Test the containerised model locally to confirm that it works as expected
  3. Deploy the model onto Highwind

Prerequisites

Note

This tutorial assumes that you have finished your initial experimentation phase and have established your data preprocessing and model training processes in something like a Jupyter notebook environment. Continue once you have a trained model that you want to deploy for inference.

  • You need to have followed the Getting Started page
  • You need a trained model that you have saved to disk
  • You require a small input sample that you can give to your model for inference
  • We suggest creating a new folder in your project called deployment that will contain your deployment resources

Notes

This tutorial is based on the following project folder structure. This is not a strict requirement, but it helps to structure your project in a similar way (at least to follow along with this tutorial):

.
├── deployment # Deployment resources
├── notebooks # Initial experimentation notebooks
├── saved_model # Saved trained model
└── src # Auxiliary code
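
If you have not yet saved your trained model to disk, here is a minimal sketch of what that could look like. It assumes a scikit-learn scaler and model persisted with joblib (the example main.py in section 2.1 loads exactly these two files); adjust the names and libraries to match your own project:

# save_model.py (hypothetical helper): persists a trained scikit-learn
# scaler and model into the saved_model folder used in this tutorial.
from pathlib import Path

import joblib
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Stand-in training data with 4 features (replace with your own data)
X_train, y_train = make_classification(n_samples=100, n_features=4, random_state=0)

scaler = StandardScaler().fit(X_train)
model = LogisticRegression().fit(scaler.transform(X_train), y_train)

# These are the two files that main.py loads at startup
Path("saved_model").mkdir(exist_ok=True)
joblib.dump(scaler, "saved_model/scaler.joblib")
joblib.dump(model, "saved_model/model.joblib")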

2. Containerise your model

2.1 Create a main.py file

In the deployment folder, create a main.py file that describes how to use the model for inference with KServe.

This is where you capture the details of how your model integrates with the KServe API (required for serving models on Highwind). This includes accepting input payloads, preprocessing them, and using them as features to make predictions. The process mainly involves putting the relevant sections of your existing inference code in the correct place so that your model works with KServe.

Note

Your custom model class in main.py will extend the kserve.Model base class, which defines several handlers: load, preprocess, predict and postprocess. These handlers are executed in sequence: the output of preprocess is passed to predict as its input, predict runs inference for your model, and postprocess turns the raw prediction result into a user-friendly inference response. The load handler contains the custom code that loads your model into memory from the local file system or from remote model storage; it is good practice to call load in the model server class's __init__ method so that your model is loaded on startup and ready to serve prediction requests. For more information on these KServe handlers, please visit the KServe docs.

  • In your main.py file, create a new class that extends the kserve.Model base class and update the following methods:
    • load(): How to load your trained model from disk
    • preprocess(): Any preprocessing to be done before calling predict
    • predict(): The inference procedure
    • postprocess(): Any post-processing to do after calling predict

Here is an example main.py file that you can use for inspiration:

Tips

You must accept an argument named model_name in your main.py script as shown below.

import argparse
from typing import Dict
import numpy as np
import joblib
from kserve import Model, ModelServer, model_server, InferRequest, InferOutput, InferResponse
from kserve.utils.utils import generate_uuid


class MyModel(Model):

    def __init__(self, name: str):
        super().__init__(name)
        self.name = name
        self.model = None
        self.scaler = None
        self.ready = False
        self.load()

    def load(self):
        # Load feature scaler and trained model
        self.scaler = joblib.load("/app/saved_model/scaler.joblib")
        self.model = joblib.load("/app/saved_model/model.joblib")
        self.ready = True

    def preprocess(self, payload: InferRequest, **kwargs) -> np.ndarray:
        # Scale input features
        infer_input = payload.inputs[0]
        raw_data = np.array(infer_input.data)
        scaled_data = self.scaler.transform(raw_data)
        return scaled_data

    def predict(self, data: np.ndarray, **kwargs) -> InferResponse:
        # Model prediction on scaled features
        result = self.model.predict(data)
        response_id = generate_uuid()
        infer_output = InferOutput(name="output-0", shape=list(result.shape), datatype="FP32", data=result.tolist())
        infer_response = InferResponse(model_name=self.name, infer_outputs=[infer_output], response_id=response_id)
        return infer_response

    # def postprocess(self, payload, **kwargs):
    #     # Optionally postprocess payload
    #     return payload


parser = argparse.ArgumentParser(parents=[model_server.parser])
parser.add_argument(
    "--model_name",
    default="model",
    help="The name that the model is served under."
)
args, _ = parser.parse_known_args()

if __name__ == "__main__":
    model = MyModel(args.model_name)
    ModelServer().start([model])

2.2 Create a Dockerfile

In the deployment folder, create a Dockerfile that wraps the following:

  • Inference dependencies
  • Model definition code (and any auxiliary utils)
  • Trained model file
  • The main.py file that describes how to use the model for inference with KServe

Here is an example Dockerfile that you can use for inspiration:

Tips

Make sure you use an ENTRYPOINT as shown below (not a CMD).

FROM --platform=linux/amd64 python:3.9.18-slim

WORKDIR /app

# Dependencies
COPY ./serve-requirements.txt .
RUN pip install --no-cache-dir -r serve-requirements.txt

# Make custom src code visible
COPY ./src /app/src
ENV PYTHONPATH="${PYTHONPATH}:/app"

# Trained model and definition with main script
COPY ./saved_model /app/saved_model
COPY ./main.py /app/main.py

# Set entrypoint
ENTRYPOINT ["python", "-m", "main"]
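
The Dockerfile above installs the inference dependencies from a serve-requirements.txt file that sits next to it in the deployment folder. As a rough sketch, the example main.py shown earlier would need at least the following packages (pin versions to match your training environment; scikit-learn is only required because this tutorial assumes a scikit-learn scaler and model):

kserve
joblib
numpy
scikit-learn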

3. (Optional) Test containerised inference service locally

Note

All commands below are run from the deployment directory.

  1. Build your model container locally by running
docker build -t local/highwind/my-model .
  2. Create a docker-compose.yaml in the deployment directory that will spin up your container for local testing. It should look similar to this:

Note

Note the injection of the model_name argument

version: "3.9"
services:
  my_model_util:
    container_name: my_model_util
    image: local/highwind/my-model:latest
    command: --model_name=model
    working_dir: /app
    ports:
      - "8080:8080"
  3. Now you can spin up your model container locally by running
docker compose up -d
  4. Once your model container is running locally, send it a payload for testing.

If you have not yet created an example payload, create one called input.json in the deployment folder.

Tips

Your example payload must conform to the KServe v2 inference protocol.

Here is an example payload that conforms to this protocol:

{
    "inputs": [
        {
            "name": "input-0",
            "shape": [1, 4],
            "datatype": "FP32",
            "parameters": null,
            "data": [[0.02, 0.49, 0.99, 0.41]]
        }
    ]
}
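
If you prefer to generate this file programmatically, here is a minimal sketch (a hypothetical helper that assumes the same four raw input features as above) that writes input.json from a single sample:

# make_payload.py (hypothetical helper): writes input.json in the
# KServe v2 inference protocol format from one raw (unscaled) sample.
import json

sample = [[0.02, 0.49, 0.99, 0.41]]  # one row of raw input features

payload = {
    "inputs": [
        {
            "name": "input-0",
            "shape": [len(sample), len(sample[0])],
            "datatype": "FP32",
            "parameters": None,
            "data": sample,
        }
    ]
}

with open("input.json", "w") as f:
    json.dump(payload, f, indent=4)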

Now that you have an example payload ready, you can send it to your model using a POST request by either:

  • Option 1: From a terminal, run

    curl -X POST http://localhost:8080/v2/models/model/infer -H 'Content-Type: application/json' -d @./input.json
    
  • Option 2: Using Postman, create a raw JSON POST request with the body defined in input.json and send it to http://localhost:8080/v2/models/model/infer
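
  • Option 3: From Python, using the requests package (a minimal sketch; it assumes requests is installed and that you run it from the deployment folder so it can find input.json):

    # test_request.py (hypothetical helper): POSTs input.json to the locally
    # running model container and prints the inference response.
    import json

    import requests

    URL = "http://localhost:8080/v2/models/model/infer"

    with open("input.json") as f:
        payload = json.load(f)

    response = requests.post(URL, json=payload, timeout=30)
    response.raise_for_status()
    print(json.dumps(response.json(), indent=2))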

If you don't get any errors and you get back a response that makes sense for your model, you can continue to the next step and deploy onto Highwind.

4. Deploy model onto Highwind

Follow the steps below to deploy the containerised model onto Highwind:

Start by referring to the Zindi documentation on the Deploy page.

  • Within this page, navigate to the section titled How to Deploy on Highwind
  • To deploy a model on Highwind, we start by creating a new Asset

4.1 Create a new Asset

  • While on the Deploy page, locate the dropdown titled 1.1 Create an Asset, followed by 1.2 Push your Docker image

  • Follow the instructions provided in the dropdown menus to complete the process of creating an Asset.

    Tips

    Select KServe Predictor for the image type.

  • Proceed to Create a new Use Case

4.2 Create a new Use Case

  • While on the Deploy page, locate the dropdown titled 2.1 Create a Use Case, followed by 2.2 Link the Assets

  • Follow the instructions provided in the dropdown menus to complete the process of creating a new Use Case.

  • Once done, proceed to the next step Deploy a Use Case

4.3 Deploy a Use Case

  • While on the Deploy page, locate the dropdown titled 3.1 Deploy the Use Case.

  • Follow the instructions provided in the dropdown menu to complete the process of deploying a new Use Case.

🥳 Congratulations! If you have reached this point, you have successfully deployed a custom model onto Highwind.

You can now test your inference endpoint through the Highwind User Interface by passing the example payload (in deployment/input.json) you created earlier.