# Load test a KServe model (via gRPC)
This tutorial shows how to run a load test for a KServe model that serves requests over gRPC. We use a sklearn model to demonstrate; the same approach works for any model type.
## Before you begin

- Try Your first performance test to understand the main concepts behind Iter8.
- Ensure that you have the kubectl and helm CLIs.
- Have access to a cluster running KServe. You can create a KServe Quickstart environment as follows:

    ```shell
    curl -s "https://raw.githubusercontent.com/kserve/kserve/release-0.11/hack/quick_install.sh" | bash
    ```

- Have Grafana available. For example, Grafana can be installed on your cluster as follows:

    ```shell
    kubectl create deploy grafana --image=grafana/grafana
    kubectl expose deploy grafana --port=3000
    ```
## Install the Iter8 controller

Install the Iter8 controller using Helm or Kustomize, in either namespace-scoped or cluster-scoped mode.

Helm (namespace scoped):

```shell
helm install --repo https://iter8-tools.github.io/iter8 --version 0.1.12 iter8 controller
```

Helm (cluster scoped):

```shell
helm install --repo https://iter8-tools.github.io/iter8 --version 0.1.12 iter8 controller \
--set clusterScoped=true
```

Kustomize (namespace scoped):

```shell
kubectl apply -k 'https://github.com/iter8-tools/iter8.git/kustomize/controller/namespaceScoped?ref=v0.17.1'
```

Kustomize (cluster scoped):

```shell
kubectl apply -k 'https://github.com/iter8-tools/iter8.git/kustomize/controller/clusterScoped?ref=v0.17.1'
```
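Whichever method you choose, you can confirm the controller came up before proceeding. A quick sanity check, assuming you installed into the current namespace (the exact pod name varies by chart version):

```shell
# Confirm the Iter8 controller pod is Running in the install namespace
kubectl get pods | grep iter8
```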
## Deploy an InferenceService
Create an InferenceService which exposes a gRPC port. The following serves the sklearn irisv2 model:
```shell
cat <<EOF | kubectl create -f -
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "sklearn-irisv2"
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      runtime: kserve-mlserver
      protocolVersion: v2
      storageUri: "gs://seldon-models/sklearn/mms/lr_model"
      ports:
        - containerPort: 9000
          name: h2c
          protocol: TCP
EOF
```
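The performance test launched below performs a similar readiness check via its ready task, but you can also wait explicitly before proceeding. A minimal sketch (the timeout value is illustrative):

```shell
# Block until the InferenceService reaches the Ready condition
kubectl wait --for=condition=Ready --timeout=180s isvc/sklearn-irisv2
```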
## Launch performance test
```shell
# Cluster-local host of the predictor; sed strips the "http://" scheme
GRPC_HOST=$(kubectl get isvc sklearn-irisv2 -o jsonpath='{.status.components.predictor.address.url}' | sed 's#.*//##')
GRPC_PORT=80
```
```shell
helm upgrade --install \
--repo https://iter8-tools.github.io/iter8 --version 0.17 model-test iter8 \
--set "tasks={ready,grpc}" \
--set ready.isvc=sklearn-irisv2 \
--set ready.timeout=180s \
--set grpc.protoURL=https://raw.githubusercontent.com/kserve/kserve/master/docs/predict-api/v2/grpc_predict_v2.proto \
--set grpc.host=${GRPC_HOST}:${GRPC_PORT} \
--set grpc.call=inference.GRPCInferenceService.ModelInfer \
--set grpc.dataURL=https://gist.githubusercontent.com/kalantar/6e9eaa03cad8f4e86b20eeb712efef45/raw/56496ed5fa9078b8c9cdad590d275ab93beaaee4/sklearn-irisv2-input-grpc.json
```
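The grpc.dataURL parameter points to a JSON file whose contents match the ModelInferRequest message in the proto file. For reference, a KServe v2 inference payload for this model looks roughly like the following; the field values here are illustrative, not the exact contents of the gist:

```json
{
  "model_name": "sklearn-irisv2",
  "inputs": [
    {
      "name": "input-0",
      "datatype": "FP32",
      "shape": [1, 4],
      "contents": { "fp32_contents": [6.8, 2.8, 4.8, 1.4] }
    }
  ]
}
```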
**About this performance test**

This performance test consists of two tasks, namely `ready` and `grpc`.

The `ready` task checks if the `sklearn-irisv2` InferenceService exists and is `Ready`.

The `grpc` task sends call requests to the `inference.GRPCInferenceService.ModelInfer` method of the cluster-local gRPC service with host address `${GRPC_HOST}:${GRPC_PORT}`, and collects Iter8's built-in gRPC load test metrics.
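To sanity-check the endpoint outside of Iter8, you can issue the same call manually with a tool like grpcurl. A minimal sketch, assuming grpcurl is available and that you run it from a pod inside the cluster (the cluster-local host is not resolvable from your workstation); the local filenames are illustrative:

```shell
# Fetch the proto definition and a sample request body
curl -sO https://raw.githubusercontent.com/kserve/kserve/master/docs/predict-api/v2/grpc_predict_v2.proto
curl -s -o input.json https://gist.githubusercontent.com/kalantar/6e9eaa03cad8f4e86b20eeb712efef45/raw/56496ed5fa9078b8c9cdad590d275ab93beaaee4/sklearn-irisv2-input-grpc.json

# Invoke ModelInfer once; -plaintext since the service is not TLS-terminated
grpcurl -plaintext -proto grpc_predict_v2.proto -d @ \
  ${GRPC_HOST}:${GRPC_PORT} inference.GRPCInferenceService.ModelInfer < input.json
```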
## View results using Grafana

Inspect the metrics using Grafana. If Grafana is deployed to your cluster, port-forward requests as follows:

```shell
kubectl port-forward service/grafana 3000:3000
```
Open Grafana by going to http://localhost:3000.
Add a JSON API data source `model-test` with the following parameters:

- URL: `http://iter8.default:8080/grpcDashboard`
- Query string: `namespace=default&test=model-test`
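The URL above assumes the Iter8 controller service is named iter8 and runs in the default namespace. If the data source fails to connect, you can query the same endpoint directly; a quick check via port-forwarding:

```shell
# Port-forward the Iter8 service, then fetch the dashboard JSON it serves
kubectl port-forward service/iter8 8080:8080 &
curl -s 'http://localhost:8080/grpcDashboard?namespace=default&test=model-test'
```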
Create a new dashboard by import. Paste the contents of the grpc Grafana dashboard into the text box and load it. Associate it with the JSON API data source defined above.

The resulting Iter8 dashboard displays the built-in gRPC load test metrics, such as latency and error statistics.
## Cleanup

```shell
helm delete model-test
kubectl delete inferenceservice sklearn-irisv2
```
## Uninstall the Iter8 controller

Helm:

```shell
helm delete iter8
```

Kustomize (namespace scoped):

```shell
kubectl delete -k 'https://github.com/iter8-tools/iter8.git/kustomize/controller/namespaceScoped?ref=v0.17.1'
```

Kustomize (cluster scoped):

```shell
kubectl delete -k 'https://github.com/iter8-tools/iter8.git/kustomize/controller/clusterScoped?ref=v0.17.1'
```
**Some variations and extensions of this performance test**

- The `grpc` task can be configured with load-related parameters, such as the total number of requests, requests per second, or number of concurrent connections, as shown in the sketch after this list.
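A hedged sketch of such a configuration; the parameter names grpc.total, grpc.rps, and grpc.concurrency are assumptions, chosen to mirror the settings of the ghz load generator that the grpc task builds on, and may differ in your chart version:

```shell
# Same test as above, with explicit load shaping
# (grpc.total/grpc.rps/grpc.concurrency are assumed parameter names)
helm upgrade --install \
--repo https://iter8-tools.github.io/iter8 --version 0.17 model-test iter8 \
--set "tasks={ready,grpc}" \
--set ready.isvc=sklearn-irisv2 \
--set ready.timeout=180s \
--set grpc.protoURL=https://raw.githubusercontent.com/kserve/kserve/master/docs/predict-api/v2/grpc_predict_v2.proto \
--set grpc.host=${GRPC_HOST}:${GRPC_PORT} \
--set grpc.call=inference.GRPCInferenceService.ModelInfer \
--set grpc.dataURL=https://gist.githubusercontent.com/kalantar/6e9eaa03cad8f4e86b20eeb712efef45/raw/56496ed5fa9078b8c9cdad590d275ab93beaaee4/sklearn-irisv2-input-grpc.json \
--set grpc.total=500 \
--set grpc.rps=25 \
--set grpc.concurrency=50
```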