Continuous Validation
This tutorial shows how easy it is to validate SLOs for a single model in KServe when fetching metrics from a metrics database such as Prometheus. We show this using the sklearn-iris model from the First InferenceService tutorial in the KServe documentation.
Before you begin
- Try your first experiment to understand the main concepts behind Iter8 experiments.
- Ensure that you have the kubectl CLI.
- Have access to a cluster running KServe. You can create a KServe Quickstart environment as follows:
curl -s "https://raw.githubusercontent.com/kserve/kserve/release-0.10/hack/quick_install.sh" | bash
- Install Prometheus monitoring for KServe using these instructions.
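Before continuing, it can help to confirm that the command-line tools this tutorial calls are installed. The following is a minimal sketch; the function name `missing_tools` is hypothetical, and the tool list (kubectl, iter8, curl) is an assumption drawn from the commands used later in this tutorial.

```shell
# Report which of the given commands are not on PATH.
# Returns the number of missing tools (0 means everything was found).
missing_tools() {
  local missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}

# Usage (tool list assumed from the commands in this tutorial):
#   missing_tools kubectl iter8 curl || echo "install the missing tools first"
```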
Experiment Setup
Deploy the model and generate load against it. To deploy the model, we follow the instructions in the KServe First InferenceService tutorial.
Create InferenceService for Initial Model
kubectl apply -f - <<EOF
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "sklearn-iris"
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      storageUri: "gs://kfserving-examples/models/sklearn/1.0/model"
EOF
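The Iter8 ready task performs a readiness check inside the experiment, but you may also want to block until the InferenceService is Ready before generating load. A minimal sketch using `kubectl wait`, which watches a resource's status conditions; the function name `wait_for_isvc` is hypothetical, and the 180s default simply mirrors the `ready.timeout` value used in the launch command.

```shell
# Block until the named InferenceService reports the Ready condition,
# or fail once the timeout expires.
wait_for_isvc() {
  local name="${1:?usage: wait_for_isvc NAME [TIMEOUT]}"
  local timeout="${2:-180s}"
  kubectl wait --for=condition=Ready "inferenceservice/${name}" --timeout="${timeout}"
}

# Usage: wait_for_isvc sklearn-iris 180s
```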
Generate Load
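Port-forward the Istio ingress gateway so that requests sent to localhost:8080 reach the cluster. The port-forward runs in the foreground, so keep it open in its own terminal: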
INGRESS_GATEWAY=$(kubectl get svc --namespace istio-system --selector="app=istio-ingressgateway" --output jsonpath='{.items[0].metadata.name}')
kubectl port-forward --namespace istio-system svc/$INGRESS_GATEWAY 8080:80
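With the port-forward running, a single request can confirm that the gateway routes to the model before you start the load loop. This sketch assumes the KServe V1 protocol model endpoint (`GET /v1/models/<name>`) and that its JSON response contains `"ready": true` once the model is loaded; the function name `model_ready` is hypothetical.

```shell
# Ask the model server, through the port-forwarded gateway, whether the model is ready.
# Assumption: the V1 protocol response includes "ready": true when the model is loaded.
model_ready() {
  local host="${1:-sklearn-iris.default.example.com}"
  local model="${2:-sklearn-iris}"
  curl -s -H "Host: ${host}" "http://localhost:8080/v1/models/${model}" \
    | grep -Eq '"ready"[[:space:]]*:[[:space:]]*true'
}

# Usage (port-forward running in another terminal):
#   model_ready && echo "model is ready"
```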
From another terminal, send prediction requests to the InferenceService. The following script sends about one request per second. In a production cluster, this step is not required since your InferenceService will receive requests from real users.
SERVICE_HOSTNAME="sklearn-iris.default.example.com"
# Alternatively, derive the hostname from the InferenceService status:
# kubectl get inferenceservice sklearn-iris -o jsonpath='{.status.url}' | cut -d "/" -f 3
cat <<EOF > "./iris-input.json"
{
  "instances": [
    [6.8, 2.8, 4.8, 1.4],
    [6.0, 3.4, 4.5, 1.6]
  ]
}
EOF
# Send one prediction request per second until interrupted (Ctrl+C)
while true; do
  curl -H "Host: ${SERVICE_HOSTNAME}" \
    http://localhost:8080/v1/models/sklearn-iris:predict \
    -d @./iris-input.json
  sleep 1
done
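For a quick, bounded check instead of an endless loop, the load script can be adapted to send a fixed number of requests and count the responses that are not HTTP 200. Because the experiment's error-count SLO is 0, a non-zero count here likely signals an SLO violation ahead. This is a sketch: the function name `send_predictions` and its COUNT and DELAY parameters are hypothetical; the hostname, endpoint, and `./iris-input.json` come from the script above.

```shell
# Send COUNT prediction requests, DELAY seconds apart, and print how many failed.
send_predictions() {
  local count="${1:-10}" delay="${2:-1}"
  local host="${SERVICE_HOSTNAME:-sklearn-iris.default.example.com}"
  local failures=0 i=0 code
  while [ "$i" -lt "$count" ]; do
    # -o /dev/null discards the body; -w prints only the HTTP status code.
    # If curl cannot connect at all, treat the request as failed (status 000).
    code=$(curl -s -o /dev/null -w '%{http_code}' \
      -H "Host: ${host}" \
      http://localhost:8080/v1/models/sklearn-iris:predict \
      -d @./iris-input.json) || code=000
    [ "$code" = "200" ] || failures=$((failures + 1))
    i=$((i + 1))
    sleep "$delay"
  done
  echo "$failures"
}

# Usage: send_predictions 60   # about one minute of traffic; prints the failure count
```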
Launch Iter8 Experiment
Launch the Iter8 experiment inside the Kubernetes cluster:
iter8 k launch \
--set "tasks={ready,custommetrics,assess}" \
--set ready.isvc=sklearn-iris \
--set ready.timeout=180s \
--set custommetrics.templates.kserve-prometheus="https://gist.githubusercontent.com/kalantar/adc6c9b0efe483c00b8f0c20605ac36c/raw/c4562e87b7ac0652b0e46f8f494d024307bff7a1/kserve-prometheus.tpl" \
--set custommetrics.values.labels.service_name=sklearn-iris-predictor-default \
--set "custommetrics.values.latencyPercentiles={50,75,90,95}" \
--set assess.SLOs.upper.kserve-prometheus/error-count=0 \
--set assess.SLOs.upper.kserve-prometheus/latency-mean=25 \
--set assess.SLOs.upper.kserve-prometheus/latency-p'90'=40 \
--set runner=cronjob \
--set cronjobSchedule="*/1 * * * *"
About this experiment
This experiment consists of three tasks: ready, custommetrics, and assess.
The ready task checks if the sklearn-iris InferenceService exists and is Ready.
The custommetrics task reads metrics from a Prometheus service, as defined by the kserve-prometheus template referenced in the launch command.
The assess task verifies if the model satisfies the specified SLOs:
- there are no errors
- the mean latency of the prediction does not exceed 25 msec, and
- the 90th percentile latency for prediction does not exceed 40 msec.
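Each of these SLOs is an upper limit: it holds when the observed metric does not exceed the limit. The sketch below only illustrates that comparison with the limits from the launch command; it is not Iter8's implementation, the function names are hypothetical, and the observed values you pass in are placeholders for what custommetrics would read from Prometheus. awk is used because POSIX shell arithmetic is integer-only.

```shell
# An upper-limit SLO is satisfied when VALUE <= LIMIT (floating-point comparison).
slo_satisfied() {
  awk -v v="$1" -v limit="$2" 'BEGIN { exit !(v <= limit) }'
}

# Check ERROR_COUNT, LATENCY_MEAN, and LATENCY_P90 against this experiment's limits.
check_slos() {
  local failed=0
  slo_satisfied "$1" 0  || { echo "error-count $1 > 0"; failed=1; }
  slo_satisfied "$2" 25 || { echo "latency-mean $2 > 25 msec"; failed=1; }
  slo_satisfied "$3" 40 || { echo "latency-p90 $3 > 40 msec"; failed=1; }
  return "$failed"
}

# Usage with placeholder values: check_slos 0 18.2 33 && echo "all SLOs satisfied"
```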
This is a multi-loop Kubernetes experiment; its runner is cronjob. The cronjobSchedule expression specifies how often the experiment runs; each run refreshes the metric values and performs SLO validation using the updated values. The schedule "*/1 * * * *" runs the experiment every minute.
You can assert experiment outcomes, view an experiment report, and view experiment logs as described in your first experiment.
Clean up
To clean up, delete the Iter8 experiment:
iter8 k delete
Remove the InferenceService and the request data:
kubectl delete inferenceservice sklearn-iris
rm ./iris-input.json
You can remove Prometheus using these instructions.
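The Iter8 and KServe clean-up steps can be collected into one function that tolerates resources that are already gone, so it is safe to re-run. A sketch: the function name `cleanup` is hypothetical, and it assumes you have already stopped the load loop and the port-forward (for example with Ctrl+C). Prometheus removal is left to its own instructions.

```shell
# Delete the experiment, the InferenceService, and the request data.
# Each step tolerates a missing resource so the function can be re-run.
cleanup() {
  iter8 k delete || echo "iter8 k delete failed (the experiment may already be gone)" >&2
  kubectl delete inferenceservice sklearn-iris --ignore-not-found
  rm -f ./iris-input.json
}

# Usage: cleanup
```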