4. Submit
Submission is the final step where your model is officially entered into the competition for scoring and ranking.
You must follow each step below carefully to ensure your solution is submitted correctly.
1. (Pre-requisite) Deploy on Highwind
Before you submit your solution on the Zindi website, it is essential to first deploy it on Highwind. Here's a clarification of the terms:
- Deploy on Highwind:
- Make your solution available for scoring by deploying it on Highwind.
- You can deploy multiple times a day to make sure your solution works!
- Submit on Zindi:
- Enter your deployed solution into the competition on Zindi to receive your score and see your ranking on the leaderboard.
- You can submit three times per day, so put your best foot forward!
2. Submit on Zindi
To submit your solution, create a zip file containing the following artifacts:
.
├── deployment
│   ├── Dockerfile
│   ├── requirements.txt (all python environment files)
│   └── main.py
├── image_name.txt
└── README.md
Make sure your submission is structured as demonstrated in the Highwind examples repository. This ensures compliance with the required format and facilitates the review and scoring process. Also ensure that the file names do not have spaces in them.
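If you build the archive with a script, a minimal Python sketch using the standard-library zipfile module is shown below. The output name submission.zip and the file list are assumptions based on the tree above; adjust them to match your project and the competition's instructions.
import zipfile

# Files expected in the archive (names taken from the tree above; adjust if your
# requirements file is named differently in your Dockerfile).
files = [
    "deployment/Dockerfile",
    "deployment/requirements.txt",
    "deployment/main.py",
    "image_name.txt",
    "README.md",
]

# Write submission.zip using the same relative paths so the structure is preserved.
with zipfile.ZipFile("submission.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    for path in files:
        zf.write(path, arcname=path)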
requirements.txt
Your requirements file does not have to be named exactly as in the example, but its name must match whatever is referenced in your Dockerfile.
image_name.txt
The image_name.txt file should contain the Image URI from the 1. (Pre-requisite) Deploy on Highwind step. If you are unsure how to retrieve the Image URI, follow this guide.
For additional guidance and detailed steps on submitting to Zindi and grading your model, please refer to the following resource: Model Grading and Submitting to Zindi.
3. Leaderboard Update
Please be aware that it can take up to 15 minutes for the leaderboard score to update, as we run the test set through your deployed model.
To understand your score better, please refer to the evaluation criteria page on the competition website.
Evaluation Details
This section discusses the following:
- How metrics are scaled to reflect the correct weighted importance
- How metrics are transformed when higher/lower is better
- How private weightings are re-scaled to sum to 1.0
- How to locally self-evaluate your solution before submitting it on Zindi
Metrics scaling
Because the raw metrics have different units and scales (e.g. latency in milliseconds vs. per cent CPU usage), they are each scaled to be between 0 and 1. This ensures the weighting assigned to each metric reflects its true importance.
The final public score \(s_{\text{pub}} \in [0,1]\) (which can be expressed as a percentage) is calculated as the following weighted sum:
\[
s_{\text{pub}} = w_b\, b + w_l\, l + w_m\, m + w_c\, c + w_i\, i
\]
where:
- \(w_j \in [0,1]\): weight of component \(j \in \{b, l, m, c, i\}\)
- \(b \in [0,1]\): BLEU score (scaled)
- \(l \in [0,1]\): average single-sentence inference latency (scaled)
- \(m \in [0,1]\): peak memory usage during inference (scaled)
- \(c \in [0,1]\): average CPU usage during inference (scaled)
- \(i \in [0,1]\): inference Docker image size (scaled)
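To make the calculation concrete, here is a minimal Python sketch of the weighted sum. The weights and component values below are hypothetical placeholders (the official weights are listed in the competition's evaluation criteria), and each component is assumed to already be scaled and direction-adjusted as described in the sections that follow.
# Hypothetical weights for illustration only; they are assumed to sum to 1.0.
weights = {"bleu": 0.5, "latency": 0.2, "memory": 0.1, "cpu": 0.1, "image_size": 0.1}

# Example scaled (and direction-adjusted) component values, each in [0, 1].
components = {"bleu": 0.8, "latency": 0.6, "memory": 0.7, "cpu": 0.5, "image_size": 0.9}

# Public score: weighted sum of the scaled components.
s_pub = sum(weights[k] * components[k] for k in weights)
print(f"Public score: {s_pub:.2f}")  # 0.73 for these example values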
To scale each of these components, we apply the following transformation (sigmoid of the z-score):
\[
x_t = \sigma\!\left(z(x_a)\right) = \frac{1}{1 + e^{-z(x_a)}}
\]
where:
- \(x_t \in [0,1]\): is the scaled value of component \(x\)
- \(x_a \in \mathbb{R}\): is the actual raw measured value of component \(x\)
- \(z(x_a) = \frac{x_a - \hat\mu_x}{\hat\sigma_x} \in \mathbb{R}\): is the z-score of the measured component \(x_a\)
- \(\hat\mu_x \in \mathbb{R}\): is our estimate of the mean value of component measurements \(x_a\) of all submissions
- \(\hat\sigma_x \in \mathbb{R}\): is our estimate of the standard deviation of component measurements \(x_a\) of all submissions (dividing by this creates a unitless metric for each component)
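As an illustration, here is a minimal Python sketch of this scaling step. The mean and standard deviation below are made-up placeholders; the real values are estimated from the measurements of all submissions and are not something you can compute locally.
import math

def scale(x_a: float, mu: float, sigma: float) -> float:
    """Sigmoid of the z-score: maps a raw measurement into (0, 1)."""
    z = (x_a - mu) / sigma
    return 1.0 / (1.0 + math.exp(-z))

# Example: a raw latency of 500 ms, assuming (hypothetically) a mean of 800 ms
# and a standard deviation of 300 ms across all submissions.
latency_scaled = scale(500, mu=800, sigma=300)  # roughly 0.27, since 500 ms is below the mean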
Metrics Transformation
Since we perform a weighted sum to get the final score, each component needs to contribute to (not detract from) the score. This means we have to convert all components to align with the "higher is better" direction.
Any metrics that are already aligned this way can be used as is. However, for metrics like inference latency, where lower is better, we transform the scaled value as follows:
\[
x_d = 1 - x_t
\]
where:
- \(x_d \in [0,1]\): is the final direction-adjusted value of the scaled component \(x_t\)
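Continuing the hypothetical latency example from the previous sketch, the direction adjustment is a simple flip:
# Latency is a "lower is better" metric, so flip its scaled value into the
# "higher is better" direction before it enters the weighted sum.
latency_adjusted = 1.0 - latency_scaled  # roughly 0.73: below-average latency now scores well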
Metrics re-scaling
The previous weighted sum equation shows how the public score is calculated. When we calculate the private score using additional metrics like code quality, the total weight of all components exceeds \(1.0\). In this case, we simply re-scale each component so that the total weight is \(1.0\) and perform a similar weighted sum. This means that you can interpret the weights in the table of criteria as relative weights.
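As a small illustration of this re-scaling, here is a Python sketch with hypothetical relative weights, including an extra private-score component for code quality (none of these numbers are the official weights):
# Hypothetical relative weights; with the extra code_quality component they sum to 1.2.
relative_weights = {
    "bleu": 0.5, "latency": 0.2, "memory": 0.1,
    "cpu": 0.1, "image_size": 0.1, "code_quality": 0.2,
}

# Re-scale so the weights sum to 1.0; the ratios between components are preserved.
total = sum(relative_weights.values())
private_weights = {k: w / total for k, w in relative_weights.items()}
# e.g. private_weights["code_quality"] == 0.2 / 1.2, roughly 0.167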
Local self-assessment
You can independently evaluate your solutions locally before submitting them. Follow these steps:
Prerequisites:
The following information needs to be determined or estimated for your model:
- Model's BLEU score
- Model's single-sentence inference latency (in milliseconds)
- Model's memory usage during inference as a fraction of the total 2GB limit (e.g. 1GB = 0.5)
- Model's single-core CPU usage during inference as a fraction (e.g. 75% = 0.75)
- The inference container's docker image size in GB
- The main inference script's code quality as measured by pylint
- Pull the Docker image that contains the self-assessment app and give it a name:
For MacOS or Linux users:
export ZINDI_SELF_ASSESS_IMAGE="melioconsulting/zindi-self-assess:latest"
docker pull $ZINDI_SELF_ASSESS_IMAGE
docker tag $ZINDI_SELF_ASSESS_IMAGE zindi-self-assess:latest
For Windows PowerShell users:
$env:ZINDI_SELF_ASSESS_IMAGE = "melioconsulting/zindi-self-assess:latest"
docker pull $env:ZINDI_SELF_ASSESS_IMAGE
docker tag $env:ZINDI_SELF_ASSESS_IMAGE zindi-self-assess:latest
- Run the container so that the assessment app can calculate your scores. This exposes port 8000, where the self-assessment app will be listening for requests:
docker run --rm -p 8000:8000 zindi-self-assess:latest
- Send your measurements/estimates to the app's evaluate endpoint with a POST request and check the public score that gets returned. In this example we use curl to send the payload; you can also use alternative tools like Postman. Note that this is only an estimate of your score and does not reflect the actual public/private score on the leaderboard. Once the app is running, you can also check the docs on how to use it at http://localhost:8000/docs
For MacOS or Linux users:
curl -X 'POST' 'http://localhost:8000/evaluate/' \
  -H 'Content-Type: application/json' \
  -d '{"bleu": 35, "latency": 500, "memory": 0.7, "cpu": 0.7, "image_size": 3, "code_quality": 0.6}'
For Windows PowerShell users:
💡 Copy and paste the commands separately. This will give you a nicely formatted JSON output.
$body = @{
    "bleu" = 35
    "latency" = 500
    "memory" = 0.7
    "cpu" = 0.7
    "image_size" = 3
    "code_quality" = 0.6
} | ConvertTo-Json
$response = Invoke-WebRequest -Uri 'http://localhost:8000/evaluate/' -Method POST -Body $body -ContentType 'application/json'
$responseObject = $response.Content | ConvertFrom-Json
$responseObject | ConvertTo-Json -Depth 10
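If you prefer to stay in Python, the same request can be sent with the requests package, as in the minimal sketch below. This assumes requests is installed and the self-assessment container is running locally; the payload values are the same illustrative numbers used in the curl example.
import requests

# Illustrative measurements/estimates for your own model.
payload = {
    "bleu": 35,
    "latency": 500,       # single-sentence inference latency in milliseconds
    "memory": 0.7,        # fraction of the 2GB memory limit used during inference
    "cpu": 0.7,           # single-core CPU usage during inference as a fraction
    "image_size": 3,      # inference Docker image size in GB
    "code_quality": 0.6,  # pylint-based code quality score
}

# POST to the locally running self-assessment app and print the estimated score.
response = requests.post("http://localhost:8000/evaluate/", json=payload, timeout=30)
response.raise_for_status()
print(response.json())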
Zindi Rules
Here are the important submission rules:
- Each team is limited to a maximum of 10 submissions for testing.
- This is an MLOps competition, and we expect participants to follow good experiment tracking methodologies.
- Your final submission will be the one scored, so keep your last submission saved and make sure to resubmit your best work as your final submission.