Load Test a KServe Model (via gRPC)¶
This tutorial shows how easy it is to run a load test for a KServe model that serves gRPC requests. It uses a sklearn model as the example; the same approach works for any model type.
Before you begin
- Try your first experiment. Understand the main concepts behind Iter8 experiments.
- Ensure that you have the kubectl CLI.
- Have access to a cluster running KServe. You can create a KServe Quickstart environment as follows:
curl -s "https://raw.githubusercontent.com/kserve/kserve/release-0.10/hack/quick_install.sh" | bash
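Once the script completes, you can confirm that KServe is up; a minimal check, assuming the quickstart installs the KServe controller into the kserve namespace:

# the KServe controller pod should reach the Running state
kubectl get pods -n kserve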
Deploy an InferenceService¶
Create an InferenceService which exposes a gRPC port. The following serves the sklearn irisv2 model:
cat <<EOF | kubectl create -f -
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "sklearn-irisv2"
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      runtime: kserve-mlserver
      protocolVersion: v2
      storageUri: "gs://seldon-models/sklearn/mms/lr_model"
      ports:
        - containerPort: 9000
          name: h2c
          protocol: TCP
EOF
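The InferenceService may take a minute or two to become ready. An optional check before proceeding (the ready task in the experiment below performs an equivalent in-cluster wait):

# wait for the InferenceService to report the Ready condition
kubectl wait --for=condition=Ready isvc/sklearn-irisv2 --timeout=180s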
Launch Experiment¶
Launch the Iter8 experiment inside the Kubernetes cluster:
GRPC_HOST=$(kubectl get isvc sklearn-irisv2 -o jsonpath='{.status.components.predictor.address.url}' | sed 's#.*//##')
GRPC_PORT=80
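The jsonpath expression reads the predictor's cluster-local URL, and sed strips the scheme prefix. Before launching, you can sanity-check the extracted host address; a cluster-local hostname is expected, though its exact form depends on your KServe networking configuration:

echo ${GRPC_HOST}:${GRPC_PORT}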
iter8 k launch \
--set "tasks={ready,grpc,assess}" \
--set ready.isvc=sklearn-irisv2 \
--set ready.timeout=180s \
--set grpc.protoURL=https://raw.githubusercontent.com/kserve/kserve/master/docs/predict-api/v2/grpc_predict_v2.proto \
--set grpc.host=${GRPC_HOST}:${GRPC_PORT} \
--set grpc.call=inference.GRPCInferenceService.ModelInfer \
--set grpc.dataURL=https://gist.githubusercontent.com/kalantar/6e9eaa03cad8f4e86b20eeb712efef45/raw/56496ed5fa9078b8c9cdad590d275ab93beaaee4/sklearn-irisv2-input-grpc.json \
--set assess.SLOs.upper.grpc/error-rate=0 \
--set assess.SLOs.upper.grpc/latency/mean=50 \
--set assess.SLOs.upper.grpc/latency/p'97\.5'=200 \
--set runner=job
About this experiment
This experiment consists of three tasks, namely, ready, grpc, and assess.
The ready task checks if the sklearn-irisv2 InferenceService exists and is Ready.
The grpc task sends gRPC requests to the inference.GRPCInferenceService.ModelInfer method of the cluster-local gRPC service with host address ${GRPC_HOST}:${GRPC_PORT}, and collects Iter8's built-in gRPC load test metrics.
The assess task verifies if the app satisfies the specified SLOs: i) there are no errors, ii) the mean latency of the service does not exceed 50 msec, and iii) the 97.5th percentile latency does not exceed 200 msec.
This is a single-loop Kubernetes experiment where all of the tasks above run once, after which the experiment finishes. Hence, its runner value is set to job.
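For reference, the file at dataURL is a JSON-encoded ModelInferRequest, as defined by the KServe v2 gRPC predict protocol (the grpc_predict_v2.proto referenced above). A payload of this shape might look as follows; this is an illustrative example with made-up tensor values, not the contents of the linked gist:

{
  "model_name": "sklearn-irisv2",
  "inputs": [
    {
      "name": "input-0",
      "shape": [1, 4],
      "datatype": "FP32",
      "contents": { "fp32_contents": [6.8, 2.8, 4.8, 1.4] }
    }
  ]
}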
You can assert experiment outcomes, view an experiment report, and view experiment logs as described in your first experiment.
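For example, after the experiment completes (command names as introduced in your first experiment; the timeout below is an illustrative value):

# assert that the experiment completed without failures and satisfied the SLOs
iter8 k assert -c completed -c nofailure -c slos --timeout 300s
# view a report of the experiment
iter8 k report
# view experiment logs
iter8 k log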
Some variations and extensions of this experiment
- The grpc task can be configured with load-related parameters, such as the total number of requests, requests per second, or the number of concurrent connections; see the sketch after this list.
- The assess task can be configured with SLOs for any of Iter8's built-in gRPC load test metrics.
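A minimal sketch of such a configuration. The parameter names grpc.total, grpc.rps, and grpc.concurrency are assumptions based on the ghz load generator that backs Iter8's grpc task; confirm them against the grpc task reference for your Iter8 version. Insert these settings into the iter8 k launch command above, before the --set runner=job line:

--set grpc.total=500 \
--set grpc.rps=25.0 \
--set grpc.concurrency=10 \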
Clean up¶
iter8 k delete
kubectl delete inferenceservice sklearn-irisv2