Skip to content

Session Affinity

Scenario: Canary rollout with session affinity

Session affinity ensures that the version to which a particular user's request is routed remains consistent throughout the duration of the experiment. In this tutorial, you will use an experiment involving two user groups, 1 and 2. Reqeusts from user group 1 will have a userhash header value prefixed with 111 and will be routed to the baseline version. Requests from user group 2 will have a userhash header value prefixed with 101 and will be routed to the candidate version. The experiment is shown below.

Session affinity

Platform setup

Follow these steps to install Iter8, KFServing and Prometheus in your K8s cluster.

1. Create ML model versions

Deploy two KFServing inference services corresponding to two versions of a TensorFlow classification model, along with an Istio virtual service to split traffic between them.

kubectl apply -f $ITER8/samples/kfserving/quickstart/baseline.yaml
kubectl apply -f $ITER8/samples/kfserving/quickstart/candidate.yaml
kubectl apply -f $ITER8/samples/kfserving/session-affinity/routing-rule.yaml
kubectl wait --for=condition=Ready isvc/flowers -n ns-baseline
kubectl wait --for=condition=Ready isvc/flowers -n ns-candidate
Istio virtual service
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: routing-rule
  namespace: default
spec:
  gateways:
  - knative-serving/knative-ingress-gateway
  hosts:
  - example.com
  http:
  - match:
    - headers:
        userhash: # user hash is a 10-digit random binary string
          prefix: "101" # in expectation, 1/8th of user hashes will match this prefix
    route: # matching users will always go to v2
    - destination:
        host: flowers-predictor-default.ns-candidate.svc.cluster.local
      headers:
        request:
          set:
            Host: flowers-predictor-default.ns-candidate
        response:
          set:
            version: flowers-v2
  - route: # non-matching users will always go to v1
    - destination:
        host: flowers-predictor-default.ns-baseline.svc.cluster.local
      headers:
        request:
          set:
            Host: flowers-predictor-default.ns-baseline
        response:
          set:
            version: flowers-v1

2. Generate requests

Generate requests to your model as follows.

INGRESS_GATEWAY_SERVICE=$(kubectl get svc -n istio-system --selector="app=istio-ingressgateway" --output jsonpath='{.items[0].metadata.name}')
kubectl port-forward -n istio-system svc/${INGRESS_GATEWAY_SERVICE} 8080:80
curl -o /tmp/input.json https://raw.githubusercontent.com/kubeflow/kfserving/master/docs/samples/v1beta1/rollout/input.json
while true; do
curl -v -H "Host: example.com" -H "userhash: 1111100000" localhost:8080/v1/models/flowers:predict -d @/tmp/input.json
sleep 0.29
done
curl -o /tmp/input.json https://raw.githubusercontent.com/kubeflow/kfserving/master/docs/samples/v1beta1/rollout/input.json
while true; do
curl -v -H "Host: example.com" -H "userhash: 1010101010" localhost:8080/v1/models/flowers:predict -d @/tmp/input.json
sleep 2.0
done

3. Define metrics

Please follow Step 3 of the quick start tutorial.

4. Launch experiment

kubectl apply -f $ITER8/samples/kfserving/session-affinity/experiment.yaml
Look inside experiment.yaml
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
apiVersion: iter8.tools/v2alpha2
kind: Experiment
metadata:
  name: session-affinity-exp
spec:
  target: flowers
  strategy:
    testingPattern: A/B
    deploymentPattern: FixedSplit
    actions:
      # when the experiment completes, promote the winning version using kubectl apply
      finish:
      - task: common/exec
        with:
          cmd: /bin/bash
          args: [ "-c", "kubectl apply -f {{ .promote }}" ]
  criteria:
    requestCount: iter8-kfserving/request-count
    rewards: # Business rewards
    - metric: iter8-kfserving/user-engagement
      preferredDirection: High # maximize user engagement
    objectives:
    - metric: iter8-kfserving/mean-latency
      upperLimit: 2000
    - metric: iter8-kfserving/95th-percentile-tail-latency
      upperLimit: 5000
    - metric: iter8-kfserving/error-rate
      upperLimit: "0.01"
  duration:
    intervalSeconds: 10
    iterationsPerLoop: 10
  versionInfo:
    # information about model versions used in this experiment
    baseline:
      name: flowers-v1
      variables:
      - name: ns
        value: ns-baseline
      - name: promote
        value: https://raw.githubusercontent.com/iter8-tools/iter8/master/samples/kfserving/quickstart/promote-v1.yaml
    candidates:
    - name: flowers-v2
      variables:
      - name: ns
        value: ns-candidate
      - name: promote
        value: https://raw.githubusercontent.com/iter8-tools/iter8/master/samples/kfserving/quickstart/promote-v2.yaml

5. Observe experiment

Follow these steps to observe your experiment.

6. Cleanup

kubectl delete -f $ITER8/samples/kfserving/session-affinity/experiment.yaml
kubectl delete -f $ITER8/samples/kfserving/session-affinity/routing-rule.yaml
kubectl delete -f $ITER8/samples/kfserving/quickstart/candidate.yaml
kubectl delete -f $ITER8/samples/kfserving/quickstart/baseline.yaml
Back to top