Chaos Injection with SLOs
Perform a joint Iter8 and LitmusChaos experiment. This joint experiment lets you verify whether an app remains resilient (continues to satisfy its SLOs) in the midst of chaos (pod kills).
In this tutorial, the app consists of a Kubernetes service and deployment. The chaos experiment kills the app's pods intermittently. At the same time, the Iter8 experiment load tests the app and validates its service-level objectives (SLOs).
Before you begin
- Try your first experiment. Understand the main concepts behind Iter8 experiments.
- Ensure that you have the kubectl CLI.
- Install Litmus in Kubernetes using these steps.
- Create the httpbin deployment file.
cat <<EOF > deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: httpbin
  labels:
    app: httpbin
spec:
  replicas: 1
  selector:
    matchLabels:
      app: httpbin
  template:
    metadata:
      labels:
        app: httpbin
    spec:
      containers:
      - name: httpbin
        image: kennethreitz/httpbin
        ports:
        - containerPort: 80
      initContainers:
      - name: init-myservice
        image: busybox:1.28
        command: ['sh', '-c', 'sleep 1']
EOF
- Create the httpbin deployment.
kubectl apply -f deploy.yaml
- Create the httpbin service.
kubectl expose deploy httpbin --port=80
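Before launching anything, it can help to confirm that the command-line tools used in this tutorial are installed. The following is a minimal sketch; require_cli is a hypothetical helper name, not part of Iter8 or Litmus.

```shell
# Hypothetical helper: report whether each CLI used in this tutorial is on PATH.
# Returns non-zero if at least one of the named tools is missing.
require_cli() {
  missing=0
  for cli in "$@"; do
    if command -v "$cli" >/dev/null 2>&1; then
      echo "found: $cli"
    else
      echo "missing: $cli" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For this tutorial, the call would be: require_cli kubectl helm iter8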
Launch experiments
Launch the LitmusChaos and Iter8 experiments as described below.
helm install httpbin litmuschaos \
--repo https://iter8-tools.github.io/hub/ \
--set applabel='app=httpbin' \
--set totalChaosDuration=3600 \
--set chaosInterval=5
About this LitmusChaos experiment
This is a LitmusChaos pod-delete experiment packaged as a Helm chart for reuse. It causes (forced or graceful) pod failure of specific or random replicas of an application resource, in this case pods carrying the label app with the value httpbin.
The chaos experiment attempts to delete pod(s) once every chaosInterval (5) seconds and terminates after totalChaosDuration (3600) seconds.
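As a rough worked example, the two values passed to helm install bound how many deletion attempts can occur. The sketch below assumes every attempt starts exactly on schedule; the actual count depends on how Litmus schedules each iteration and may be lower.

```shell
# Values copied from the helm install command above.
total_chaos_duration=3600   # seconds
chaos_interval=5            # seconds

# Integer division gives an upper bound on the number of deletion attempts.
max_attempts=$((total_chaos_duration / chaos_interval))
echo "at most ${max_attempts} pod-deletion attempts"
```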
iter8 k launch \
--set "tasks={ready,http,assess}" \
--set ready.deploy=httpbin \
--set ready.service=httpbin \
--set ready.chaosengine=litmuschaos-httpbin \
--set ready.timeout=60s \
--set http.url=http://httpbin.default/get \
--set http.duration=30s \
--set http.qps=20 \
--set assess.SLOs.upper.http/latency-mean=50 \
--set assess.SLOs.upper.http/latency-p99=100 \
--set assess.SLOs.upper.http/error-count=0 \
--set runner=job
About this Iter8 experiment
This Iter8 experiment is similar to your first Iter8 experiment, with some notable changes. The ready task in this experiment also checks that the chaosengine resource exists before the experiment starts. In addition to the mean latency and error count SLOs, the experiment verifies that the 99th percentile latency is under 100 msec.
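To make the meaning of an upper SLO limit concrete, here is a minimal sketch of the comparison involved. It is not Iter8's implementation. within_upper_limit is a hypothetical helper, the observed values are invented for illustration, and the comparison is assumed to be inclusive (an error-count limit of 0 could not otherwise be satisfied).

```shell
# Hypothetical helper: succeed when a metric value is within its upper limit.
# awk is used because the shell's built-in arithmetic is integer-only.
within_upper_limit() {
  awk -v value="$1" -v limit="$2" 'BEGIN { exit !(value <= limit) }'
}

# Limits copied from the launch command; the observed values (first argument)
# are invented purely for illustration.
within_upper_limit 42.7 50 && echo "latency-mean: ok"
within_upper_limit 130 100 || echo "latency-p99: violated"
within_upper_limit 0 0     && echo "error-count: ok"
```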
Observe experiments
Observe the LitmusChaos and Iter8 experiments as follows.
Verify that the phase of the chaos experiment is Running.
kubectl get chaosresults/litmuschaos-httpbin-pod-delete -n default \
-o jsonpath='{.status.experimentStatus.phase}'
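If the chaos result has not reached the expected phase yet, the check above can be repeated in a polling loop. The following is a minimal sketch; get_phase and wait_for_phase are hypothetical helper names, and the default attempt count and delay are arbitrary.

```shell
# Hypothetical helper: read the current phase using the command shown above.
get_phase() {
  kubectl get chaosresults/litmuschaos-httpbin-pod-delete -n default \
    -o jsonpath='{.status.experimentStatus.phase}'
}

# Hypothetical helper: poll until the phase equals the wanted value.
# Usage: wait_for_phase <phase> [attempts] [delay-seconds]
wait_for_phase() {
  want="$1"
  attempts="${2:-30}"
  delay="${3:-10}"
  phase=""
  i=0
  while [ "$i" -lt "$attempts" ]; do
    phase="$(get_phase 2>/dev/null)"
    if [ "$phase" = "$want" ]; then
      echo "phase is $want"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "timed out waiting for phase $want (last seen: ${phase:-none})" >&2
  return 1
}
```

For example, wait_for_phase Running checks up to 30 times, 10 seconds apart, and returns non-zero if the phase never matches.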
On completion of the LitmusChaos experiment
After the LitmusChaos experiment completes (after about 3600 seconds), the phase of the experiment changes to Completed. At that point, you can verify that the chaos experiment returns a Pass verdict. The Pass verdict states that the application is still running after chaos has ended.
While the LitmusChaos experiment is still running, its verdict is set to Awaited.
kubectl get chaosresults/litmuschaos-httpbin-pod-delete -n default \
-o jsonpath='{.status.experimentStatus.verdict}'
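The two verdicts described above can be turned into a readable status message with a small helper. This is a minimal sketch; describe_verdict is a hypothetical name, and any verdict other than Pass or Awaited is simply reported as unexpected.

```shell
# Hypothetical helper: turn a chaos verdict into a short status message.
# Returns non-zero for anything other than Pass or Awaited.
describe_verdict() {
  case "$1" in
    Pass)    echo "chaos completed; the app was still running afterwards" ;;
    Awaited) echo "chaos is still running; check again later" ;;
    *)       echo "unexpected verdict: ${1:-(empty)}"; return 1 ;;
  esac
}
```

The output of the kubectl command above can be passed to the helper as its single argument.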
Due to chaos injection, and the fact that the number of replicas is set to 1, SLOs are not expected to be satisfied within the Iter8 experiment. Verify this is the case.
# the SLOs assertion is expected to fail
iter8 k assert -c completed -c nofailure -c slos --timeout 30s
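Because the assertion is expected to fail here, a script that runs it may want to treat a failure as the successful outcome. The following is a minimal sketch; expect_failure is a hypothetical wrapper and works with any command.

```shell
# Hypothetical wrapper: succeed only when the wrapped command fails.
expect_failure() {
  if "$@"; then
    echo "unexpected success: $*" >&2
    return 1
  fi
  echo "failed as expected: $*"
}
```

For example: expect_failure iter8 k assert -c completed -c nofailure -c slos --timeout 30s. Note that this wrapper cannot tell an SLO violation apart from any other reason for a non-zero exit, such as the experiment not having completed within the timeout.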
For a more detailed report of the Iter8 experiment, run the report command.
iter8 k report
Cleanup experiments
Clean up the LitmusChaos and Iter8 experiments as described below.
helm uninstall httpbin
iter8 k delete
Scale app and retry
Scale up the app to 3 replicas.
kubectl scale --replicas=3 -n default deploy/httpbin
The scaled app should now be more resilient. Relaunch the experiments and observe them as described above. This time, you should find that the SLOs are satisfied.
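Relaunching before the new replicas are ready could skew the result, so it may be worth waiting for the rollout to settle first. The following is a minimal sketch; scale_and_wait is a hypothetical helper, and the 120s timeout is an arbitrary choice.

```shell
# Hypothetical helper: scale the deployment, then wait until all replicas
# are available before returning.
scale_and_wait() {
  replicas="$1"
  kubectl scale --replicas="$replicas" -n default deploy/httpbin &&
    kubectl rollout status deploy/httpbin -n default --timeout=120s
}
```

Calling scale_and_wait 3 performs the scaling step above and returns non-zero if the rollout does not finish within the timeout.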
Cleanup
Clean up the Kubernetes cluster by removing the resources created in this tutorial.
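A minimal sketch of the cleanup, assuming everything was created in the default namespace as in the steps above and that Litmus itself should stay installed. cleanup_tutorial is a hypothetical helper name; the commands inside it mirror the ones used earlier in this tutorial.

```shell
# Hypothetical helper collecting the cleanup steps for this tutorial.
cleanup_tutorial() {
  helm uninstall httpbin           # remove the LitmusChaos experiment
  iter8 k delete                   # remove the Iter8 experiment
  kubectl delete svc/httpbin       # remove the service created earlier
  kubectl delete deploy/httpbin    # remove the deployment created earlier
}
```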
Some variations and extensions of this experiment
- Reuse the above experiment with your app by replacing the httpbin app with your app, and modifying the experiment values appropriately.
- Iter8 supports load testing and SLO validation for gRPC services. Try a joint chaos injection and SLO validation experiment for gRPC.
- Litmus makes it possible to inject over 51 types of chaos. Modify the LitmusChaos Helm chart with new templates in order to use any of these other types of chaos experiments.