Canary Testing

This tutorial shows how easy it is to validate SLOs for multiple versions of a model in KServe when fetching metrics from a metrics database like Prometheus. We demonstrate this using the sklearn-iris model from the KServe canary rollout example.

Before you begin
  1. Try your first experiment. Understand the main concepts behind Iter8 experiments.
  2. Ensure that you have the kubectl CLI.
  3. Have access to a cluster running KServe. You can create a KServe Quickstart environment as follows:
    curl -s "https://raw.githubusercontent.com/kserve/kserve/release-0.10/hack/quick_install.sh" | bash
    
  4. Install Prometheus monitoring for KServe using these instructions.
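
Before proceeding, you can verify that the core components are running. A minimal check, assuming the Quickstart script installed KServe into the kserve namespace and Knative into knative-serving (adjust the namespaces if your installation differs):

kubectl get pods -n kserve
kubectl get pods -n knative-serving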

Experiment Setup

Deploy two models to compare and generate load against them. We follow the instructions for the KServe canary rollout example to deploy the models.

Create InferenceService for Initial Model

kubectl apply -f - <<EOF
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "sklearn-iris"
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      storageUri: "gs://kfserving-examples/models/sklearn/1.0/model"
EOF
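
Before rolling out the canary, you may want to wait until the initial model is Ready. One way to do this, sketched with kubectl wait (the timeout value is arbitrary):

kubectl wait --for=condition=Ready inferenceservice/sklearn-iris --timeout=180s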

Update InferenceService with a Canary Model

kubectl apply -f - <<EOF
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "sklearn-iris"
spec:
  predictor:
    canaryTrafficPercent: 10
    model:
      modelFormat:
        name: sklearn
      storageUri: "gs://kfserving-examples/models/sklearn/1.0/model-2"
EOF
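
Once the canary revision is ready, you can inspect the resulting traffic split from the InferenceService status; it should show roughly 90% of traffic on the initial revision and 10% on the canary. For example:

kubectl get isvc sklearn-iris -o jsonpath='{.status.components.predictor.traffic}'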

Generate Load

In a separate terminal, port-forward requests to the cluster (this command blocks):

INGRESS_GATEWAY=$(kubectl get svc --namespace istio-system --selector="app=istio-ingressgateway" --output jsonpath='{.items[0].metadata.name}')
kubectl port-forward --namespace istio-system svc/$INGRESS_GATEWAY 8080:80

Send prediction requests to the inference service. The following script generates roughly one request per second. In a production cluster this step is unnecessary, since the inference service receives requests from real users.

SERVICE_HOSTNAME="sklearn-iris.default.example.com"
# Alternatively, derive it from the InferenceService status:
# kubectl get inferenceservice sklearn-iris -o jsonpath='{.status.url}' | cut -d "/" -f 3

cat <<EOF > "./iris-input.json"
{
  "instances": [
    [6.8,  2.8,  4.8,  1.4],
    [6.0,  3.4,  4.5,  1.6]
  ]
}
EOF

while true; do 
  curl -H "Host: ${SERVICE_HOSTNAME}" \
    http://localhost:8080/v1/models/sklearn-iris:predict \
    -d @./iris-input.json
  sleep 1
done
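
If both versions are serving correctly, each request returns a prediction response. For this example it looks like the following (the exact class values depend on the model):

{"predictions": [1, 1]}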

Launch Iter8 Experiment

Launch an Iter8 experiment inside the Kubernetes cluster:

iter8 k launch \
--set "tasks={ready,custommetrics,assess}" \
--set ready.isvc=sklearn-iris \
--set ready.timeout=180s \
--set custommetrics.templates.kserve-prometheus="https://gist.githubusercontent.com/kalantar/adc6c9b0efe483c00b8f0c20605ac36c/raw/c4562e87b7ac0652b0e46f8f494d024307bff7a1/kserve-prometheus.tpl" \
--set custommetrics.values.labels.service_name=sklearn-iris-predictor-default \
--set 'custommetrics.versionValues[0].labels.revision_name=sklearn-iris-predictor-default-00002' \
--set 'custommetrics.versionValues[1].labels.revision_name=sklearn-iris-predictor-default-00001' \
--set "custommetrics.values.latencyPercentiles={50,75,90,95}" \
--set assess.SLOs.upper.kserve-prometheus/error-count=0 \
--set assess.SLOs.upper.kserve-prometheus/latency-mean=25 \
--set assess.SLOs.upper.kserve-prometheus/latency-p'90'=40 \
--set runner=cronjob \
--set cronjobSchedule="*/1 * * * *"

About this experiment

This experiment consists of three tasks, namely, ready, custommetrics and assess.

The ready task checks if the sklearn-iris InferenceService exists and is Ready.

The custommetrics task reads metrics from a Prometheus service as defined by the template. The template is parameterised using labels: a service_name shared by both versions, and a revision_name identifying each version. You can identify the revision names from the InferenceService:

kubectl get isvc sklearn-iris -o json \
| jq -r '.status.components.predictor.traffic | .[] | .revisionName'

The service name is the common prefix of the revision names; that is, a revision name with its trailing numeric suffix (for example, -00002) removed.
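
For instance, a minimal shell sketch that derives both values, assuming the default namespace and taking the first traffic entry:

REVISION_NAME=$(kubectl get isvc sklearn-iris -o jsonpath='{.status.components.predictor.traffic[0].revisionName}')
SERVICE_NAME=${REVISION_NAME%-*}   # strip the trailing revision suffix, e.g., -00002
echo "service: ${SERVICE_NAME}, revision: ${REVISION_NAME}"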

The assess task verifies whether the model versions satisfy the specified SLOs:

  • there are no errors
  • the mean latency of the prediction does not exceed 25 msec, and
  • the 90th percentile latency for prediction does not exceed 40 msec.

This is a multi-loop Kubernetes experiment; its runner is cronjob. The cronjobSchedule expression specifies how often the experiment loop runs; each run refreshes the metric values and re-validates the SLOs against them.
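
Because the runner is cronjob, Iter8 creates a Kubernetes CronJob to drive the loops; you can list it with kubectl (the exact name depends on your Iter8 release and experiment group):

kubectl get cronjobs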


You can assert experiment outcomes, view an experiment report, and view experiment logs as described in your first experiment.
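
For example, sketched with the Iter8 CLI (conditions and flags as described in the Iter8 documentation for your release):

# assert that no loop has failed and that the SLOs are satisfied
iter8 k assert -c nofailure -c slos --timeout 60s

# view a report of the latest metric values and SLO results
iter8 k report

# view experiment logs
iter8 k log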

Clean up

To clean up, delete the Iter8 experiment:

iter8 k delete

Remove the InferenceService and the request data:

kubectl delete inferenceservice sklearn-iris
rm ./iris-input.json

You can remove Prometheus using these instructions.