Continuous Validation

This tutorial shows how easy it is to validate SLOs for a single KServe model by fetching metrics from a metrics database like Prometheus. We show this using the sklearn-iris model from the first InferenceService example in the KServe documentation.

Before you begin
  1. Try your first experiment. Understand the main concepts behind Iter8 experiments.
  2. Ensure that you have the kubectl CLI.
  3. Have access to a cluster running KServe. You can create a KServe Quickstart environment as follows:
    curl -s "https://raw.githubusercontent.com/kserve/kserve/release-0.10/hack/quick_install.sh" | bash
    
  4. Install Prometheus monitoring for KServe using these instructions.
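
If you want to double-check these prerequisites, commands along the following lines can help. This is only an optional sanity check; it assumes the quick-install script placed the KServe controller in the kserve namespace.

# Check that the kubectl CLI is available
kubectl version --client

# Check that the KServe controller is running (the quick install uses the kserve namespace)
kubectl get pods -n kserve

# Check that the InferenceService CRD is registered
kubectl get crd inferenceservices.serving.kserve.io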

Experiment Setup

Deploy the model and generate load against it. To deploy the model, we follow the instructions for the first InferenceService in the KServe documentation.

Create InferenceService for Initial Model

kubectl apply -f - <<EOF
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "sklearn-iris"
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      storageUri: "gs://kfserving-examples/models/sklearn/1.0/model"
EOF
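
Optionally, wait for the InferenceService to become Ready before moving on; kubectl wait can watch the Ready condition on the resource (an illustrative check, not part of the original walkthrough):

kubectl wait --for=condition=Ready inferenceservice/sklearn-iris --timeout=180s
kubectl get inferenceservice sklearn-iris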

Generate Load

Port forward requests to the cluster:

INGRESS_GATEWAY=$(kubectl get svc --namespace istio-system --selector="app=istio-ingressgateway" --output jsonpath='{.items[0].metadata.name}')
kubectl port-forward --namespace istio-system svc/$INGRESS_GATEWAY 8080:80
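
If you prefer to keep working in the same terminal, the port-forward can also be run in the background (a small convenience, assuming a bash-like shell):

kubectl port-forward --namespace istio-system svc/$INGRESS_GATEWAY 8080:80 &
# Stop it later with: kill %1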

Send prediction requests to the inference service. The following script sends roughly one request per second. In a production cluster, this step is not needed, since your inference service receives requests from real users.

SERVICE_HOSTNAME="sklearn-iris.default.example.com"
# Alternatively, read the hostname from the InferenceService status:
# SERVICE_HOSTNAME=$(kubectl get inferenceservice sklearn-iris -o jsonpath='{.status.url}' | cut -d "/" -f 3)

cat <<EOF > "./iris-input.json"
{
  "instances": [
    [6.8,  2.8,  4.8,  1.4],
    [6.0,  3.4,  4.5,  1.6]
  ]
}
EOF

while true; do 
  curl -H "Host: ${SERVICE_HOSTNAME}" \
    http://localhost:8080/v1/models/sklearn-iris:predict \
    -d @./iris-input.json
  sleep 1
done
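
To confirm that the model is reachable through the port-forward, the KServe v1 protocol also exposes a model readiness endpoint; the exact response shape can vary by model server, so treat the expected output below as indicative only:

curl -s -H "Host: ${SERVICE_HOSTNAME}" \
  http://localhost:8080/v1/models/sklearn-iris
# A ready model typically responds with something like {"name":"sklearn-iris","ready":true}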

Launch Iter8 Experiment

Launch the Iter8 experiment inside the Kubernetes cluster:

iter8 k launch \
--set "tasks={ready,custommetrics,assess}" \
--set ready.isvc=sklearn-iris \
--set ready.timeout=180s \
--set custommetrics.templates.kserve-prometheus="https://gist.githubusercontent.com/kalantar/adc6c9b0efe483c00b8f0c20605ac36c/raw/c4562e87b7ac0652b0e46f8f494d024307bff7a1/kserve-prometheus.tpl" \
--set custommetrics.values.labels.service_name=sklearn-iris-predictor-default \
--set "custommetrics.values.latencyPercentiles={50,75,90,95}" \
--set assess.SLOs.upper.kserve-prometheus/error-count=0 \
--set assess.SLOs.upper.kserve-prometheus/latency-mean=25 \
--set assess.SLOs.upper.kserve-prometheus/latency-p'90'=40 \
--set runner=cronjob \
--set cronjobSchedule="*/1 * * * *"

About this experiment

This experiment consists of three tasks: ready, custommetrics, and assess.

The ready task checks if the sklearn-iris InferenceService exists and is Ready.

The custommetrics task reads metrics from a Prometheus service using the queries defined in the template.
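
The template itself is an Iter8 custom-metrics provider definition, rendered with the values passed via custommetrics.values. The real definition is at the gist URL used in the launch command; the snippet below only illustrates its general shape, with placeholder URL, metric name, and query rather than the gist's actual contents:

url: <your Prometheus query endpoint>
provider: kserve-prometheus
method: GET
metrics:
- name: error-count
  type: counter
  params:
  - name: query
    value: |
      <PromQL query referencing {{ .labels.service_name }} goes here>
  jqExpression: .data.result[0].value[1] | tonumber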

The assess task verifies if the model satisfies the specified SLOs:

  • there are no errors
  • the mean latency of the prediction does not exceed 25 msec, and
  • the 90th percentile latency for prediction does not exceed 40 msec.

This is a multi-loop Kubernetes experiment; its runner is cronjob. The cronjobSchedule expression sets how often the experiment loop runs, periodically refreshing the metric values and re-validating the SLOs against them.
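
Because the runner is a cronjob, each loop runs as a Kubernetes Job spawned by a CronJob in the experiment's namespace. A quick way to watch the periodic runs (object names depend on your Iter8 version and experiment group):

kubectl get cronjobs
kubectl get jobs --watch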
You can assert experiment outcomes, view an experiment report, and view experiment logs as described in your first experiment.
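
For reference, the corresponding Iter8 CLI invocations look roughly like the following. Since a multi-loop experiment keeps running, asserting conditions such as nofailure and slos (rather than completed) is the natural check here; adjust flags to match your Iter8 version:

iter8 k assert -c nofailure -c slos --timeout 60s
iter8 k report
iter8 k log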

Clean up

To clean up, delete the Iter8 experiment:

iter8 k delete

Remove the InferenceService and the request data:

kubectl delete inferenceservice sklearn-iris
rm ./iris-input.json

You can remove Prometheus using these instructions.