Load test a KServe model (via HTTP)
This tutorial shows how to run a load test for a KServe model using HTTP requests. We use an sklearn model to demonstrate; the same approach works for any model type.
Before you begin
- Ensure that you have the kubectl and helm CLIs installed.
- Have access to a cluster running KServe. If using a local cluster (for example, Kind or Minikube), we recommend providing the cluster with at least 16 GB of memory. You can create a KServe Quickstart environment as follows:
curl -s "https://raw.githubusercontent.com/kserve/kserve/release-0.11/hack/quick_install.sh" | bash
- Have Grafana available. For example, Grafana can be installed on your cluster as follows:
kubectl create deploy grafana --image=grafana/grafana
kubectl expose deploy grafana --port=3000
Install the Iter8 controller
Iter8 can be installed and configured to watch resources either in a single namespace (namespace-scoped) or in the whole cluster (cluster-scoped). Run only one of the following two commands.
To install Iter8 namespace-scoped:
helm install --repo https://iter8-tools.github.io/iter8 --version 0.18 iter8 controller
To install Iter8 cluster-scoped:
helm install --repo https://iter8-tools.github.io/iter8 --version 0.18 iter8 controller \
--set clusterScoped=true
For additional install options, see Iter8 Installation.
Deploy an InferenceService
Create an InferenceService which exposes an HTTP port. The following serves the sklearn irisv2 model:
cat <<EOF | kubectl apply -f -
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "sklearn-irisv2"
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      runtime: kserve-mlserver
      storageUri: "gs://seldon-models/sklearn/mms/lr_model"
EOF
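Before launching the test, it can help to confirm by hand that the InferenceService has become Ready. The following is a minimal sketch, assuming the InferenceService lives in the current namespace; the helper name wait_for_isvc and the 180s default timeout are illustrative choices, not part of Iter8 or KServe.

```shell
#!/usr/bin/env bash
# Hypothetical helper: block until the named InferenceService reports Ready.
#   $1 = InferenceService name
#   $2 = timeout (assumed default: 180s)
wait_for_isvc() {
  local name="$1"
  local timeout="${2:-180s}"
  kubectl wait --for=condition=Ready "inferenceservice/${name}" --timeout="${timeout}"
}

# Usage (requires access to the cluster):
#   wait_for_isvc sklearn-irisv2 180s
```

The command exits with a non-zero status if the condition is not met within the timeout, so it can also gate later steps in a script.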
Launch performance test
helm upgrade --install \
--repo https://iter8-tools.github.io/iter8 --version 0.18 model-test iter8 \
--set "tasks={ready,http}" \
--set ready.isvc=sklearn-irisv2 \
--set ready.timeout=180s \
--set http.url=http://sklearn-irisv2.default.svc.cluster.local/v2/models/sklearn-irisv2/infer \
--set http.payloadURL=https://gist.githubusercontent.com/kalantar/d2dd03e8ebff2c57c3cfa992b44a54ad/raw/97a0480d0dfb1deef56af73a0dd31c80dc9b71f4/sklearn-irisv2-input.json \
--set http.contentType="application/json"
About this performance test
This performance test consists of two tasks, namely, ready and http.
The ready task checks if the sklearn-irisv2 InferenceService exists and is Ready.
The http task sends requests to the cluster-local HTTP service whose URL is exposed by the InferenceService, http://sklearn-irisv2.default.svc.cluster.local/v2/models/sklearn-irisv2/infer, and collects Iter8's built-in HTTP load test metrics.
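To see what a single request from the http task might look like, the sketch below assembles the same cluster-local URL from its parts and prepares a sample payload. The payload shown is an assumption: it follows the V2 inference protocol for a four-feature iris model, but the file actually downloaded from http.payloadURL may differ. The helper name send_one_request is hypothetical.

```shell
#!/usr/bin/env bash
# Build the cluster-local inference URL from its parts
# (values taken from the helm command above).
ISVC="sklearn-irisv2"
NAMESPACE="default"
URL="http://${ISVC}.${NAMESPACE}.svc.cluster.local/v2/models/${ISVC}/infer"

# Assumed sample payload in V2 inference protocol format; the payload used by
# the test (http.payloadURL) may differ.
PAYLOAD_FILE="$(mktemp)"
cat > "${PAYLOAD_FILE}" <<'EOF'
{"inputs": [{"name": "input-0", "shape": [2, 4], "datatype": "FP32",
  "data": [[6.8, 2.8, 4.8, 1.4], [6.0, 3.4, 4.5, 1.6]]}]}
EOF

# Hypothetical helper: send one request with the same content type as the test.
# Run it from inside the cluster, because the URL only resolves there.
send_one_request() {
  curl -s -X POST -H "Content-Type: application/json" \
    --data @"${PAYLOAD_FILE}" "${URL}"
}

echo "${URL}"
```

The helper is only defined here, not called, so the snippet can be sourced safely outside the cluster.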
View results using Grafana
Inspect the metrics using Grafana. If Grafana is deployed to your cluster, port-forward requests as follows:
kubectl port-forward service/grafana 3000:3000
Open Grafana in a browser by going to http://localhost:3000 and log in. The default username and password are admin/admin.
Add a JSON API data source named model-test with the following parameters:
- URL: http://iter8.default:8080/httpDashboard
- Query string: namespace=default&test=model-test
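The two parameters above combine into a single request URL, which can be useful when checking that the test has produced data. The sketch below makes some assumptions: the service name iter8 and port 8080 are inferred from the data source URL above, the variable and helper names are hypothetical, and the port-forward mentioned in the comment must be started separately.

```shell
#!/usr/bin/env bash
# Namespace where the Iter8 controller runs, and namespace/name of the test.
ITER8_NAMESPACE="default"
TEST_NAMESPACE="default"
TEST_NAME="model-test"

DATA_SOURCE_URL="http://iter8.${ITER8_NAMESPACE}:8080/httpDashboard"
QUERY_STRING="namespace=${TEST_NAMESPACE}&test=${TEST_NAME}"

# Hypothetical helper: fetch the dashboard data through a local port-forward.
# Assumes this is running in another terminal:
#   kubectl port-forward service/iter8 8080:8080
fetch_dashboard_data() {
  curl -s "http://localhost:8080/httpDashboard?${QUERY_STRING}"
}

echo "${DATA_SOURCE_URL}?${QUERY_STRING}"
```

If the test ran in a different namespace or under a different release name, only TEST_NAMESPACE and TEST_NAME need to change.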
Create a new dashboard by import. Paste the contents of the http Grafana dashboard into the text box and load it. Associate it with the JSON API data source defined above.
The Iter8 dashboard will look like the following:
Cleanup
helm delete model-test
kubectl delete inferenceservice sklearn-irisv2
Uninstall the Iter8 controller
helm delete iter8
For additional uninstall options, see Iter8 Uninstall.
If you installed Grafana, you can delete it as follows:
kubectl delete svc/grafana deploy/grafana