Canary Testing
This tutorial shows how to validate SLOs for multiple versions of a model in KServe using metrics fetched from a metrics database such as Prometheus. We demonstrate this with the sklearn-iris model used to describe canary rollouts in KServe.
Before you begin
- Try your first experiment. Understand the main concepts behind Iter8 experiments.
- Ensure that you have the kubectl CLI.
- Have access to a cluster running KServe. You can create a KServe Quickstart environment as follows:
curl -s "https://raw.githubusercontent.com/kserve/kserve/release-0.10/hack/quick_install.sh" | bash
- Install Prometheus monitoring for KServe using these instructions.
Experiment Setup
Deploy two models to compare and generate load against them. We follow the instructions for the KServe canary rollout example to deploy the models.
Create InferenceService for Initial Model
kubectl apply -f - <<EOF
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "sklearn-iris"
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      storageUri: "gs://kfserving-examples/models/sklearn/1.0/model"
EOF
Update InferenceService with a Canary Model
kubectl apply -f - <<EOF
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "sklearn-iris"
spec:
  predictor:
    canaryTrafficPercent: 10
    model:
      modelFormat:
        name: sklearn
      storageUri: "gs://kfserving-examples/models/sklearn/1.0/model-2"
EOF
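To shift a different share of traffic to the canary, the same manifest can be re-applied with another canaryTrafficPercent value. The following is a minimal sketch of that idea. The helper name render_isvc is hypothetical (it is not part of KServe or Iter8), and the manifest fields are copied from the example above.

```shell
# render_isvc PERCENT: print the sklearn-iris canary manifest with the given
# canaryTrafficPercent. Hypothetical helper; fields mirror the manifest above.
render_isvc() {
  isvc_percent="$1"
  # Accept only whole numbers between 0 and 100.
  case "$isvc_percent" in
    ''|*[!0-9]*) echo "percent must be an integer between 0 and 100" >&2; return 1 ;;
  esac
  if [ "$isvc_percent" -gt 100 ]; then
    echo "percent must be an integer between 0 and 100" >&2
    return 1
  fi
  cat <<EOF
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "sklearn-iris"
spec:
  predictor:
    canaryTrafficPercent: ${isvc_percent}
    model:
      modelFormat:
        name: sklearn
      storageUri: "gs://kfserving-examples/models/sklearn/1.0/model-2"
EOF
}

# Usage (requires cluster access):
#   render_isvc 50 | kubectl apply -f -
```

Keeping the manifest in one function means the only value that changes between rollout steps is the traffic percentage.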
Generate Load
In a separate terminal, port-forward requests to the cluster (the command keeps running until interrupted):
INGRESS_GATEWAY=$(kubectl get svc --namespace istio-system --selector="app=istio-ingressgateway" --output jsonpath='{.items[0].metadata.name}')
kubectl port-forward --namespace istio-system svc/$INGRESS_GATEWAY 8080:80
Send prediction requests to the inference service. The following script generates about one request a second. In a production cluster, this step is not required since your inference service will receive requests from real users.
SERVICE_HOSTNAME="sklearn-iris.default.example.com"
# Alternatively, derive the hostname from the InferenceService:
# SERVICE_HOSTNAME=$(kubectl get inferenceservice sklearn-iris -o jsonpath='{.status.url}' | cut -d "/" -f 3)
cat <<EOF > "./iris-input.json"
{
  "instances": [
    [6.8, 2.8, 4.8, 1.4],
    [6.0, 3.4, 4.5, 1.6]
  ]
}
EOF
while true; do
  curl -H "Host: ${SERVICE_HOSTNAME}" \
    http://localhost:8080/v1/models/sklearn-iris:predict \
    -d @./iris-input.json
  sleep 1
done
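The Host header must carry only the host part of the InferenceService URL, which is what the cut command in the comment above extracts. A small sketch of that step in isolation follows. The function name host_from_url is hypothetical, and it assumes the URL has the usual scheme://host form.

```shell
# host_from_url URL: drop the scheme (and any path) and print only the host.
# Splitting on "/" puts the host in the third field: "http:", "", "host".
host_from_url() {
  printf '%s\n' "$1" | cut -d "/" -f 3
}

# Usage (requires cluster access):
#   SERVICE_HOSTNAME=$(host_from_url "$(kubectl get inferenceservice sklearn-iris -o jsonpath='{.status.url}')")
```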
Launch Iter8 Experiment
Launch an Iter8 experiment inside the Kubernetes cluster:
iter8 k launch \
  --set "tasks={ready,custommetrics,assess}" \
  --set ready.isvc=sklearn-iris \
  --set ready.timeout=180s \
  --set custommetrics.templates.kserve-prometheus="https://gist.githubusercontent.com/kalantar/adc6c9b0efe483c00b8f0c20605ac36c/raw/c4562e87b7ac0652b0e46f8f494d024307bff7a1/kserve-prometheus.tpl" \
  --set custommetrics.values.labels.service_name=sklearn-iris-predictor-default \
  --set 'custommetrics.versionValues[0].labels.revision_name=sklearn-iris-predictor-default-00002' \
  --set 'custommetrics.versionValues[1].labels.revision_name=sklearn-iris-predictor-default-00001' \
  --set "custommetrics.values.latencyPercentiles={50,75,90,95}" \
  --set assess.SLOs.upper.kserve-prometheus/error-count=0 \
  --set assess.SLOs.upper.kserve-prometheus/latency-mean=25 \
  --set assess.SLOs.upper.kserve-prometheus/latency-p'90'=40 \
  --set runner=cronjob \
  --set cronjobSchedule="*/1 * * * *"
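The two versionValues flags in the command above follow a regular pattern: one flag per revision, indexed from 0. As a rough sketch, the flags could be generated from a list of revision names instead of being typed by hand. The function name version_flags is hypothetical, and the index assigned to each revision simply follows the input order, so arrange the names in the order you want the versions reported.

```shell
# version_flags: read revision names from stdin (one per line) and print the
# matching --set arguments, indexed in input order. Blank lines are skipped.
version_flags() {
  awk 'NF { printf "--set custommetrics.versionValues[%d].labels.revision_name=%s\n", n++, $1 }'
}

# Usage: print the flags for two revisions, then copy them into the command,
# quoting each argument so the shell does not interpret the square brackets.
#   printf '%s\n' sklearn-iris-predictor-default-00002 \
#                 sklearn-iris-predictor-default-00001 | version_flags
```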
About this experiment
This experiment consists of three tasks: ready, custommetrics, and assess.
The ready task checks that the sklearn-iris InferenceService exists and is Ready.
The custommetrics task reads metrics from a Prometheus service as defined by the template. The template is parameterised using labels for the service name and revision name. You can identify the revision names from the InferenceService:
kubectl get isvc sklearn-iris -o json \
  | jq -r '.status.components.predictor.traffic | .[] | .revisionName'
Revision names end in a five-digit suffix (-ddddd), for example sklearn-iris-predictor-default-00001.
The assess task verifies that the model versions satisfy the specified SLOs:
- there are no errors
- the mean latency of the prediction does not exceed 25 msec, and
- the 90th percentile latency for prediction does not exceed 40 msec.
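Each of these SLOs is an upper limit: the observed metric value must not exceed the configured number. The sketch below illustrates that comparison only. It is not how Iter8 implements the assess task, and the function names within_upper_limit and check_slos are hypothetical.

```shell
# within_upper_limit VALUE LIMIT: succeed when VALUE <= LIMIT.
# awk performs the floating-point comparison that test(1) cannot.
within_upper_limit() {
  awk -v v="$1" -v limit="$2" 'BEGIN { exit !(v + 0 <= limit + 0) }'
}

# check_slos ERROR_COUNT LATENCY_MEAN LATENCY_P90: apply the three limits
# used in this experiment (0 errors, 25 msec mean, 40 msec 90th percentile).
check_slos() {
  within_upper_limit "$1" 0 &&
  within_upper_limit "$2" 25 &&
  within_upper_limit "$3" 40
}

# Usage with made-up metric values:
#   check_slos 0 18.2 39.9 && echo "SLOs satisfied"
```

A version passes only if all three comparisons succeed, so a single error is enough to fail the check regardless of latency.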
This is a multi-loop Kubernetes experiment; its runner is cronjob. The cronjobSchedule expression specifies how often the experiment runs (here, once every minute). On each run, the metric values are refreshed and the SLOs are validated against the updated values.
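The schedule uses standard five-field cron syntax: minute, hour, day of month, month, and day of week. A step value in the minute field, such as */1, means "every N minutes". As a small sketch, a schedule for a different interval could be produced as follows; the function name every_n_minutes is hypothetical.

```shell
# every_n_minutes N: print a cron expression that fires every N minutes.
# N must be a whole number between 1 and 59.
every_n_minutes() {
  case "$1" in
    ''|*[!0-9]*) echo "interval must be a whole number of minutes" >&2; return 1 ;;
  esac
  if [ "$1" -lt 1 ] || [ "$1" -gt 59 ]; then
    echo "interval must be between 1 and 59" >&2
    return 1
  fi
  printf '*/%s * * * *\n' "$1"
}

# Usage: pass the result to the launch command, for example
#   --set cronjobSchedule="$(every_n_minutes 5)"
```

A longer interval reduces the query load on Prometheus at the cost of slower feedback on SLO violations.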
You can assert experiment outcomes, view an experiment report, and view experiment logs as described in your first experiment.
Clean up
To clean up, delete the Iter8 experiment:
iter8 k delete
Remove the InferenceService and the request data:
kubectl delete inferenceservice sklearn-iris
rm ./iris-input.json
You can remove Prometheus using these instructions.