Deploy custom model
1. Overview
This tutorial will guide you through deploying a custom model onto Highwind.
Tutorial steps
- To do this, we will containerise an existing trained model using a Docker Repo asset
- Then we will perform some local testing to prove to ourselves that our containerised model works as expected
- Finally, we will deploy our model onto Highwind
Prerequisites
Note
This tutorial assumes that you have finished your initial experimentation phase and have established your data preprocessing and model training processes in something like a Jupyter notebook environment. Continue if you have a trained model that you want to deploy for inference.
- You need to have followed the Getting Started page
- You need a trained model that you have saved to disk
- You require a small input sample that you can give to your model for inference
- We suggest creating a new folder in your project called deployment that will contain your deployment resources
Notes
This tutorial is based on the following project folder structure. This is not a strict requirement, but it will help to structure your project in a similar way (at least to follow along with this tutorial):
.
├── deployment # Deployment resources
├── notebooks # Initial experimentation notebooks
├── saved_model # Saved trained model
└── src # Auxiliary code
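If you are unsure what the saved_model folder above should contain, here is a minimal sketch of the final saving step, assuming a scikit-learn model and a feature scaler serialised with joblib (which is what the example main.py later in this tutorial loads). The training data and model below are purely illustrative; substitute your own training code and artefact names:

from pathlib import Path

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Illustrative training data only: 4 features per sample, matching the example payload used later
X = np.random.rand(100, 4)
y = np.random.randint(0, 2, size=100)

# Stand-ins for your own preprocessing and model training
scaler = StandardScaler().fit(X)
model = LogisticRegression().fit(scaler.transform(X), y)

# Save both artefacts so the inference container can load them later
Path("saved_model").mkdir(exist_ok=True)
joblib.dump(scaler, "saved_model/scaler.joblib")
joblib.dump(model, "saved_model/model.joblib")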
2. Containerise your model
2.1 Create a main.py file
In the deployment folder, create a main.py file which describes how to use the model for inference using KServe.
This is where you capture how your model integrates with the KServe API (required for serving models on Highwind): how it accepts input payloads, processes them, and uses them as features to make predictions. This mainly involves putting the relevant sections of your existing inference code in the correct place so that your model works on KServe.
Note
Your custom model class in main.py will extend the kserve.Model base class, where several handlers are defined: load, preprocess, predict and postprocess. These handlers are executed in sequence: the output of preprocess is passed to predict as its input, the predict handler runs inference for your model, and the postprocess handler then turns the raw prediction result into a user-friendly inference response. The load handler is where you write custom code to load your model into memory, either from the local file system or from remote model storage. A good practice is to call load in the model class's __init__ method so that your model is loaded on startup and ready to serve prediction requests. For more information on these KServe handlers, please visit the KServe docs.
- In your main.py file, create a new class that extends the kserve.Model base class and update the following methods:
  - load(): How to load your trained model from disk
  - preprocess(): Any preprocessing to be done before calling predict
  - predict(): The inference procedure
  - postprocess(): Any post-processing to do after calling predict
Here is an example main.py file that you can use for inspiration:
Tips
You must accept an argument named model_name in your main.py script, as shown below.
import argparse
from typing import Dict

import numpy as np
import joblib
from kserve import Model, ModelServer, model_server, InferRequest, InferOutput, InferResponse
from kserve.utils.utils import generate_uuid


class MyModel(Model):
    def __init__(self, name: str):
        super().__init__(name)
        self.name = name
        self.model = None
        self.ready = False
        self.load()

    def load(self):
        # Load feature scaler and trained model
        self.scaler = joblib.load("/app/saved_model/scaler.joblib")
        self.model = joblib.load("/app/saved_model/model.joblib")
        self.ready = True

    def preprocess(self, payload: InferRequest, **kwargs) -> np.ndarray:
        # Scale input features
        infer_input = payload.inputs[0]
        raw_data = np.array(infer_input.data)
        scaled_data = self.scaler.transform(raw_data)
        return scaled_data

    def predict(self, data: np.ndarray, **kwargs) -> InferResponse:
        # Model prediction on scaled features
        result = self.model.predict(data)
        response_id = generate_uuid()
        infer_output = InferOutput(name="output-0", shape=list(result.shape), datatype="FP32", data=result)
        infer_response = InferResponse(model_name=self.name, infer_outputs=[infer_output], response_id=response_id)
        return infer_response

    # def postprocess(self, payload, **kwargs):
    #     # Optionally postprocess payload
    #     return payload


parser = argparse.ArgumentParser(parents=[model_server.parser])
parser.add_argument(
    "--model_name",
    default="model",
    help="The name that the model is served under.",
)
args, _ = parser.parse_known_args()

if __name__ == "__main__":
    model = MyModel(args.model_name)
    ModelServer().start([model])
2.2 Create a Dockerfile
In the deployment folder, create a Dockerfile that wraps the following:
- Inference dependencies
- Model definition code (and any auxiliary utils)
- Trained model file
- The main.py file that describes how to use the model for inference using KServe
Here is an example Dockerfile that you can use for inspiration:
Tips
Make sure you use an ENTRYPOINT as shown below (not a CMD).
FROM --platform=linux/amd64 python:3.9.18-slim
WORKDIR /app
# Dependencies
COPY ./serve-requirements.txt .
RUN pip install --no-cache-dir -r serve-requirements.txt
# Make custom src code visible
COPY ./src /app/src
ENV PYTHONPATH "${PYTHONPATH}:/app"
# Trained model and definition with main script
COPY ./saved_model /app/saved_model
COPY ./main.py /app/main.py
# Set entrypoint
ENTRYPOINT ["python", "-m", "main"]
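The Dockerfile above copies a serve-requirements.txt file that is not shown in this tutorial. As a rough, illustrative guide, for the example main.py it would need to include at least the packages imported there, plus whatever library your saved artefacts require (for example scikit-learn for the joblib files used here); pin versions to match your training environment:

kserve
numpy
joblib
scikit-learn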
3. (Optional) Test containerised inference service locally
Note
All commands below are run from the deployment directory.
- Build your model container locally by running
docker build -t local/highwind/my-model .
- Create a docker-compose.yaml in the deployment directory that will spin up your container for local testing. It should look similar to this:
Note
Note the injection of the model_name argument.
version: "3.9"

services:
  my_model_util:
    container_name: my_model_util
    image: local/highwind/my-model:latest
    command: --model_name=model
    working_dir: /app
    ports:
      - "8080:8080"
- Now you can spin up your model container locally by running
docker compose up -d
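Optionally, before sending an inference request, you can check that the server is up and your model has loaded. Assuming the standard KServe v2 protocol endpoints, a readiness check against the model name injected above looks like this:

curl http://localhost:8080/v2/models/model/ready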
- Once your model container is running locally, send it a payload for testing.
If you have not yet created an example payload, create one called input.json in the deployment folder.
Tips
Your example payload must conform to the KServe v2 inference protocol.
Here is an example payload that conforms to this protocol:
{
  "inputs": [
    {
      "name": "input-0",
      "shape": [1],
      "datatype": "BYTES",
      "parameters": null,
      "data": [[0.02, 0.49, 0.99, 0.41]]
    }
  ]
}
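For intuition, the data field above is what the example preprocess() method converts to a numpy array before scaling; a quick illustrative check of the resulting shape:

import numpy as np

# Mirrors what preprocess() receives via payload.inputs[0].data for the payload above
raw_data = np.array([[0.02, 0.49, 0.99, 0.41]])
print(raw_data.shape)  # (1, 4): one sample with four features, ready for scaler.transform()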
Now that you have an example payload ready, you can send it to your model using a POST request by either:
- Option 1: From a terminal, run
  curl -X POST http://localhost:8080/v2/models/model/infer -H 'Content-Type: application/json' -d @./input.json
- Option 2: Using Postman, create a raw JSON POST request with the body defined in input.json and send it to http://localhost:8080/v2/models/model/infer
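Alternatively, here is a sketch of sending the same request from Python using the requests library (assuming it is installed in your environment):

import json

import requests

# Load the example payload created earlier and post it to the local model server
with open("input.json") as f:
    payload = json.load(f)

response = requests.post(
    "http://localhost:8080/v2/models/model/infer",
    json=payload,
    timeout=30,
)
print(response.status_code)
print(response.json())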
If you don't get any errors and you get back a response that makes sense for your model, you can continue to the next step and deploy onto Highwind.
4. Deploy model onto Highwind
Follow the steps below to deploy the containerised model onto Highwind:
Start by referring to the Zindi documentation on the Deploy page
- Within this page, navigate to the section titled How to Deploy on Highwind
- To deploy a model on Highwind, we start by creating a new Asset
1. Create a new Asset
- While on the Deploy page, locate the dropdown titled 1.1 Create an Asset, followed by 1.2 Push your Docker image
- Follow the instructions provided in the dropdown menus to complete the process of creating an Asset.
Tips
Select KServe Predictor for the image type
- Proceed to Create a new Use Case
2. Create a new Use Case
- While on the Deploy page, locate the dropdown titled 2.1 Create a Use Case, followed by 2.2 Link the Assets
- Follow the instructions provided in the dropdown menus to complete the process of creating a new Use Case.
- Once done, proceed to the next step, Deploy a Use Case
3. Deploy a Use Case
- While on the Deploy page, locate the dropdown titled 3.1 Deploy the Use Case.
- Follow the instructions provided in the dropdown menu to complete the process of deploying a new Use Case.
🥳 Congratulations! If you have reached this point, you have successfully deployed a custom model onto Highwind.
You can now test your inference endpoint through the Highwind User Interface by passing the example payload (in deployment/input.json) you created earlier.