4. Submit
Submission is the final step where your model is officially entered into the competition for scoring and ranking.
You must follow each step below carefully to ensure your solution is submitted correctly.
1. (Pre-requisite) Deploy on Highwind
Before you submit your solution on the Zindi website, it is essential to first deploy it on Highwind. Here's a clarification of the terms:
- Deploy on Highwind:
- Make your solution available for scoring by deploying it on Highwind.
- You can deploy multiple times a day to make sure your solution works!
- Submit on Zindi:
- Enter your deployed solution into the competition on Zindi to receive your score and see your ranking on the leaderboard.
- You can submit three times per day, so put your best foot forward!
2. Submit on Zindi
To submit your solution, create a zip file containing the following artifacts:
.
├── deployment
│   ├── Dockerfile
│   ├── requirements.txt (all python environment files)
│   └── main.py
├── image_name.txt
└── README.md
Make sure your submission is structured as demonstrated in the Highwind examples repository. This ensures compliance with the required format and facilitates the review and scoring process. Also ensure that the file names do not have spaces in them.
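If you build the archive with a script, a minimal Python sketch using the standard-library zipfile module is shown below. The output name submission.zip and the file list are assumptions based on the tree above; adjust them to match your project and the competition's instructions.
import zipfile

# Files expected in the archive (names taken from the tree above; adjust if your
# requirements file is named differently in your Dockerfile).
files = [
    "deployment/Dockerfile",
    "deployment/requirements.txt",
    "deployment/main.py",
    "image_name.txt",
    "README.md",
]

# Write submission.zip using the same relative paths so the structure is preserved.
with zipfile.ZipFile("submission.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    for path in files:
        zf.write(path, arcname=path)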
requirements.txt
Your requirements file does not have to be named exactly as in the example, but its name must match whatever is referenced in your Dockerfile.
image_name.txt
The image_name.txt file should contain the Image URI from the 1. (Pre-requisite) Deploy on Highwind step. If you are unsure how to retrieve the Image URI, follow this guide.
For additional guidance and detailed steps on submitting to Zindi and grading your model, please refer to the following resource: Model Grading and Submitting to Zindi.
3. Leaderboard Update
Please be aware that it can take up to 15 minutes for the leaderboard score to update, as we run the test set through your deployed model.
To understand your score better, please refer to the evaluation criteria page on the competition website.
Evaluation Details
This section discusses the following:
- How metrics are scaled to reflect the correct weighted importance
- How metrics are transformed when higher/lower is better
- How private weightings are re-scaled to sum to 1.0
- How to locally self-evaluate your solution before submitting it on Zindi
Metrics scaling
Because the raw metrics have different units and scales (e.g. latency in milliseconds vs. per cent CPU usage), they are each scaled to be between 0 and 1. This ensures the weighting assigned to each metric reflects its true importance.
The final public score \(s_{\text{pub}} \in [0,1]\) (which can be expressed as a percentage) is calculated as the following weighted sum:
\[
s_{\text{pub}} = w_b\, b + w_l\, l + w_m\, m + w_c\, c + w_i\, i
\]
where:
- \(w_j \in [0,1]\): weight of component \(j \in \{b, l, m, c, i\}\)
- \(b \in [0,1]\): BLEU score (scaled)
- \(l \in [0,1]\): average single-sentence inference latency (scaled)
- \(m \in [0,1]\): peak memory usage during inference (scaled)
- \(c \in [0,1]\): average CPU usage during inference (scaled)
- \(i \in [0,1]\): inference Docker image size (scaled)
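To make the calculation concrete, here is a minimal Python sketch of the weighted sum. The weights and component values below are hypothetical placeholders (the official weights are listed in the competition's evaluation criteria), and each component is assumed to already be scaled and direction-adjusted as described in the sections that follow.
# Hypothetical weights for illustration only; they are assumed to sum to 1.0.
weights = {"bleu": 0.5, "latency": 0.2, "memory": 0.1, "cpu": 0.1, "image_size": 0.1}

# Example scaled (and direction-adjusted) component values, each in [0, 1].
components = {"bleu": 0.8, "latency": 0.6, "memory": 0.7, "cpu": 0.5, "image_size": 0.9}

# Public score: weighted sum of the scaled components.
s_pub = sum(weights[k] * components[k] for k in weights)
print(f"Public score: {s_pub:.2f}")  # 0.73 for these example values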
To scale each of these components, we apply the following transformation (sigmoid of the z-score):
\[
x_t = \sigma\!\left(z(x_a)\right) = \frac{1}{1 + e^{-z(x_a)}}
\]
where:
- \(x_t \in [0,1]\): is the scaled value of component \(x\)
- \(x_a \in \mathbb{R}\): is the actual raw measured value of component \(x\)
- \(z(x_a) = \frac{x_a - \hat\mu_x}{\hat\sigma_x} \in \mathbb{R}\): is the z-score of the measured component \(x_a\)
- \(\hat\mu_x \in \mathbb{R}\): is our estimate of the mean value of component measurements \(x_a\) of all submissions
- \(\hat\sigma_x \in \mathbb{R}\): is our estimate of the standard deviation of component measurements \(x_a\) of all submissions (dividing by this creates a unitless metric for each component)
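As an illustration, here is a minimal Python sketch of this scaling step. The mean and standard deviation below are made-up placeholders; the real values are estimated from the measurements of all submissions and are not something you can compute locally.
import math

def scale(x_a: float, mu: float, sigma: float) -> float:
    """Sigmoid of the z-score: maps a raw measurement into (0, 1)."""
    z = (x_a - mu) / sigma
    return 1.0 / (1.0 + math.exp(-z))

# Example: a raw latency of 500 ms, assuming (hypothetically) a mean of 800 ms
# and a standard deviation of 300 ms across all submissions.
latency_scaled = scale(500, mu=800, sigma=300)  # roughly 0.27, since 500 ms is below the mean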
Metrics Transformation
Since we perform a weighted sum to get the final score, each component needs to contribute to (not detract from) the score. This means we have to convert all components to align with the "higher is better" direction.
Any metrics that are already aligned this way can be used as is. However, for metrics like inference latency, where lower is better, we transform the scaled value as follows:
\[
x_d = 1 - x_t
\]
where:
- \(x_d \in [0,1]\): is the final direction-adjusted value of the scaled component \(x_t\)
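Continuing the hypothetical latency example from the previous sketch, the direction adjustment is a simple flip:
# Latency is a "lower is better" metric, so flip its scaled value into the
# "higher is better" direction before it enters the weighted sum.
latency_adjusted = 1.0 - latency_scaled  # roughly 0.73: below-average latency now scores well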
Metrics re-scaling
The previous weighted sum equation shows how the public score is calculated. When we calculate the private score using additional metrics like code quality, the total weight of all components exceeds \(1.0\). In this case, we simply re-scale each component so that the total weight is \(1.0\) and perform a similar weighted sum. This means that you can interpret the weights in the table of criteria as relative weights.
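As a small illustration of this re-scaling, here is a Python sketch with hypothetical relative weights, including an extra private-score component for code quality (none of these numbers are the official weights):
# Hypothetical relative weights; with the extra code_quality component they sum to 1.2.
relative_weights = {
    "bleu": 0.5, "latency": 0.2, "memory": 0.1,
    "cpu": 0.1, "image_size": 0.1, "code_quality": 0.2,
}

# Re-scale so the weights sum to 1.0; the ratios between components are preserved.
total = sum(relative_weights.values())
private_weights = {k: w / total for k, w in relative_weights.items()}
# e.g. private_weights["code_quality"] == 0.2 / 1.2, roughly 0.167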
Local self-assessment
You can independently evaluate your solutions locally before submitting them. Follow these steps:
Prerequisites:
The following information needs to be determined or estimated for your model:
- Model's BLEU score
- Model's single-sentence inference latency (in milliseconds)
- Model's memory usage during inference as a fraction of the total 2GB limit (e.g. 1GB = 0.5)
- Model's single-core CPU usage during inference as a fraction (e.g. 75% = 0.75)
- The inference container's docker image size in GB
- The main inference script's code quality as measured by pylint
- Pull the Docker image that contains the self-assessment app and give it a name:
For MacOS or Linux users:
export ZINDI_SELF_ASSESS_IMAGE="melioconsulting/zindi-self-assess:latest"
docker pull $ZINDI_SELF_ASSESS_IMAGE
docker tag $ZINDI_SELF_ASSESS_IMAGE zindi-self-assess:latest
For Windows PowerShell users:
$env:ZINDI_SELF_ASSESS_IMAGE = "melioconsulting/zindi-self-assess:latest"
docker pull $env:ZINDI_SELF_ASSESS_IMAGE
docker tag $env:ZINDI_SELF_ASSESS_IMAGE zindi-self-assess:latest
- Run the container so that the assessment app can calculate your scores. This exposes port 8000, where the self-assessment app will be listening for requests:
docker run --rm -p 8000:8000 zindi-self-assess:latest
- Send your measurements/estimates to the app's evaluate endpoint with a POST request and check the public score that gets returned. In this example we use curl to send the payload; you can also use alternative tools like Postman. Note that this is only an estimate of your score and does not reflect the actual public/private score on the leaderboard. Once the app is running, you can also check the docs on how to use it at http://localhost:8000/docs
For MacOS or Linux users:
curl -X 'POST' 'http://localhost:8000/evaluate/' \
  -H 'Content-Type: application/json' \
  -d '{"bleu": 35, "latency": 500, "memory": 0.7, "cpu": 0.7, "image_size": 3, "code_quality": 0.6}'
For Windows PowerShell users:
💡 Copy and paste the commands separately. This will give you a nicely formatted JSON output.
$body = @{
    "bleu" = 35
    "latency" = 500
    "memory" = 0.7
    "cpu" = 0.7
    "image_size" = 3
    "code_quality" = 0.6
} | ConvertTo-Json
$response = Invoke-WebRequest -Uri 'http://localhost:8000/evaluate/' -Method POST -Body $body -ContentType 'application/json'
$responseObject = $response.Content | ConvertFrom-Json
$responseObject | ConvertTo-Json -Depth 10
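If you prefer to stay in Python, the same request can be sent with the requests package, as in the minimal sketch below. This assumes requests is installed and the self-assessment container is running locally; the payload values are the same illustrative numbers used in the curl example.
import requests

# Illustrative measurements/estimates for your own model.
payload = {
    "bleu": 35,
    "latency": 500,       # single-sentence inference latency in milliseconds
    "memory": 0.7,        # fraction of the 2GB memory limit used during inference
    "cpu": 0.7,           # single-core CPU usage during inference as a fraction
    "image_size": 3,      # inference Docker image size in GB
    "code_quality": 0.6,  # pylint-based code quality score
}

# POST to the locally running self-assessment app and print the estimated score.
response = requests.post("http://localhost:8000/evaluate/", json=payload, timeout=30)
response.raise_for_status()
print(response.json())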
Zindi Rules
Here are the important submission rules:
- Each team is limited to a maximum of 10 submissions for testing.
- This is an MLOps competition, and we expect participants to follow good experiment tracking methodologies.
- Your final submission will be the one scored, so keep your last submission saved and make sure to resubmit your best work as your final submission.