Load test a KServe model (via HTTP)

This tutorial shows how to run a load test for a KServe model when requests are made over HTTP. We use a scikit-learn (sklearn) model to demonstrate; the same approach works for any model type.

Before you begin

Complete "Your first performance test" and make sure you understand the main concepts behind Iter8 experiments.

Ensure that you have the kubectl CLI.

Have access to a cluster running KServe. You can create a KServe Quickstart environment as follows:

curl -s "https://raw.githubusercontent.com/kserve/kserve/release-0.11/hack/quick_install.sh" | bash

Have Grafana available. For example, Grafana can be installed on your cluster as follows:

kubectl create deploy grafana --image=grafana/grafana
kubectl expose deploy grafana --port=3000

Install the Iter8 controller

Install the controller using either Helm or Kustomize, scoped either to a single namespace or to the whole cluster.

Helm, namespace scoped:

helm install --repo https://iter8-tools.github.io/iter8 --version 0.1.11 iter8 controller

Helm, cluster scoped:

helm install --repo https://iter8-tools.github.io/iter8 --version 0.1.11 iter8 controller \
--set clusterScoped=true

Kustomize, namespace scoped:

kubectl apply -k 'https://github.com/iter8-tools/iter8.git/kustomize/controller/namespaceScoped?ref=v0.16.6'

Kustomize, cluster scoped:

kubectl apply -k 'https://github.com/iter8-tools/iter8.git/kustomize/controller/clusterScoped?ref=v0.16.6'

Deploy an InferenceService

Create an InferenceService which exposes an HTTP port. The following serves the sklearn irisv2 model:

cat <<EOF | kubectl apply -f -
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "sklearn-irisv2"
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      runtime: kserve-mlserver
      storageUri: "gs://seldon-models/sklearn/mms/lr_model"
EOF
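
Before launching the experiment, it can help to confirm that the InferenceService has become ready. The following is a minimal sketch, assuming the InferenceService reports a Ready condition that kubectl wait can observe. The function wait_for_isvc and the KUBECTL override are hypothetical helpers introduced here for illustration; they are not part of Iter8 or KServe.

```shell
# Hypothetical helper: block until the named InferenceService reports Ready.
# KUBECTL can be overridden (for example KUBECTL=echo) to print the arguments
# instead of contacting a cluster.
wait_for_isvc() {
  name="$1"
  timeout="${2:-180s}"
  ${KUBECTL:-kubectl} wait --for=condition=Ready --timeout="$timeout" \
    "inferenceservice/$name"
}

# Example call (requires access to the cluster):
# wait_for_isvc sklearn-irisv2 180s
```

Running the helper with KUBECTL=echo prints the kubectl arguments without touching the cluster, which is a convenient way to review the command first.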

Launch experiment

Launch an Iter8 experiment inside the Kubernetes cluster:

iter8 k launch \
--set "tasks={ready,http}" \
--set ready.isvc=sklearn-irisv2 \
--set ready.timeout=180s \
--set http.url=http://sklearn-irisv2.default.svc.cluster.local/v2/models/sklearn-irisv2/infer \
--set http.payloadURL=https://gist.githubusercontent.com/kalantar/d2dd03e8ebff2c57c3cfa992b44a54ad/raw/97a0480d0dfb1deef56af73a0dd31c80dc9b71f4/sklearn-irisv2-input.json \
--set http.contentType="application/json"

About this experiment

This experiment consists of two tasks: ready and http.

The ready task checks if the sklearn-irisv2 InferenceService exists and is Ready.

The http task sends requests to the cluster-local HTTP service exposed by the InferenceService at http://sklearn-irisv2.default.svc.cluster.local/v2/models/sklearn-irisv2/infer, and collects Iter8's built-in HTTP load test metrics.
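
To adapt the experiment to another InferenceService, the URL passed as http.url can be derived from the service name and namespace. The sketch below follows the pattern of the URL used in this tutorial and assumes that the model name equals the InferenceService name, as it does here. The function isvc_infer_url is a hypothetical helper, not an Iter8 or KServe command.

```shell
# Hypothetical helper: build the cluster-local inference URL for an
# InferenceService, following the pattern of the URL used in this tutorial.
# Assumes the model name is the same as the InferenceService name.
isvc_infer_url() {
  name="$1"
  namespace="${2:-default}"
  printf 'http://%s.%s.svc.cluster.local/v2/models/%s/infer\n' \
    "$name" "$namespace" "$name"
}

isvc_infer_url sklearn-irisv2 default
# prints: http://sklearn-irisv2.default.svc.cluster.local/v2/models/sklearn-irisv2/infer
```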

View results using Grafana

Inspect the metrics using Grafana. If Grafana is deployed to your cluster, port-forward requests as follows:

kubectl port-forward service/grafana 3000:3000

Open Grafana by going to http://localhost:3000.

Add a JSON API data source named Iter8 with the following parameters:

URL: http://iter8.default:8080/httpDashboard
Query string: namespace=default&experiment=default
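
If the experiment runs elsewhere, the query string changes accordingly. The sketch below assumes, based on the parameter names, that the two values are the namespace and the name of the experiment; this tutorial uses default for both. The function iter8_query_string is a hypothetical helper introduced for illustration.

```shell
# Hypothetical helper: build the query string for the Iter8 data source.
# Assumes the values are the experiment's namespace and name.
iter8_query_string() {
  namespace="${1:-default}"
  experiment="${2:-default}"
  printf 'namespace=%s&experiment=%s\n' "$namespace" "$experiment"
}

iter8_query_string default default
# prints: namespace=default&experiment=default
```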

Create a new dashboard by importing it: paste the contents of the http Grafana dashboard into the text box and load it. Associate it with the JSON API data source defined above.

The Iter8 dashboard will look like the following:

[Figure: Iter8 HTTP load test dashboard in Grafana]

Cleanup

iter8 k delete
kubectl delete inferenceservice sklearn-irisv2

Uninstall the Iter8 controller

Helm:

helm delete iter8

Kustomize, namespace scoped:

kubectl delete -k 'https://github.com/iter8-tools/iter8.git/kustomize/controller/namespaceScoped?ref=v0.16.6'

Kustomize, cluster scoped:

kubectl delete -k 'https://github.com/iter8-tools/iter8.git/kustomize/controller/clusterScoped?ref=v0.16.6'

Some variations and extensions of this experiment

The http task can be configured with load-related parameters such as the number of requests, queries per second, or number of parallel connections.
The http task can be configured to send various types of content as payload.
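
As a sketch of the first variation, the load-related settings could be passed as additional --set flags on the launch command. The parameter names http.numRequests, http.qps, and http.connections used below are assumptions; check the http task documentation for your Iter8 version before relying on them. The function http_load_flags is a hypothetical helper that only prints the flags.

```shell
# Hypothetical helper: print --set flags for the load-related parameters.
# The parameter names (numRequests, qps, connections) are assumptions and
# should be checked against the http task documentation.
http_load_flags() {
  num_requests="$1"
  qps="$2"
  connections="$3"
  printf '%s\n' "--set http.numRequests=$num_requests --set http.qps=$qps --set http.connections=$connections"
}

http_load_flags 200 10 5
# prints: --set http.numRequests=200 --set http.qps=10 --set http.connections=5
```

The printed flags would be appended to the iter8 k launch command shown earlier, alongside the existing http.url and http.payloadURL settings.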