custommetrics¶
Fetch metrics from databases (like Prometheus) and other REST APIs.
Usage Example¶
In this example, the custommetrics task fetches metrics from the Prometheus database that is created by Istio's Prometheus add-on.
iter8 k launch \
--set "tasks={custommetrics,assess}" \
--set custommetrics.templates.istio-prom="https://raw.githubusercontent.com/iter8-tools/hub/main/templates/custommetrics/istio-prom.tpl" \
--set custommetrics.values.labels.destination_app=httpbin \
--set custommetrics.values.labels.namespace=default \
--set assess.SLOs.upper.istio-prom/error-rate=0 \
--set assess.SLOs.upper.istio-prom/latency-mean=100 \
--set runner=cronjob \
--set cronjobSchedule="*/1 * * * *"
Parameters¶
Name | Type | Description |
---|---|---|
templates | map[string]string | A map where each key is the name of a provider, and the corresponding value is a URL containing the provider template. |
values | map[string]interface{} | A map that contains the values for variables in provider templates. When there are two or more app versions, this map contains values that are common to all versions. |
versionValues | []map[string]interface{} | An array that contains version-specific values for variables in provider templates. While fetching metrics for version i, the task merges values with versionValues[i] (latter takes precedence), and the merged map contains the values for variables in provider templates. |
How it works¶
The logic of this task is illustrated by the following flowchart.
graph TD
A([Start]) --> B([Get provider template]);
B --> C([Compute variable values]);
C --> D([Create provider spec by combining template with values]);
D --> E([Query database]);
E --> F([Process response]);
F --> G([Update metric value in experiment]);
G --> H{Done with all metrics?};
H ---->|No| E;
H ---->|Yes| I{Done with all versions?};
I ---->|No| C;
I ---->|Yes| J([End]);
We describe the concepts of provider spec and provider template next.
Provider spec¶
Iter8 needs the following information in order to fetch metrics from a database.
- The HTTP URL where the database can be queried.
- The HTTP headers and method (GET/POST) to be used while querying the database.
- For each metric to be fetched from the database:
  - The specific HTTP query to be used, in particular, the HTTP query parameters and body (if any).
  - The logic for parsing the query response and retrieving the metric value.
The above information is encapsulated by ProviderSpec, a data structure which Iter8 associates with each provider, and Metric, a data structure which Iter8 associates with each metric provided by a provider.
Golang type definitions for ProviderSpec and Metric
The ProviderSpec and Metric data structures together supply Iter8 with all the information needed to query databases, process the response to extract metric values, store the metric values in experiments, and display them in experiment reports with auxiliary information (such as description and units). Metric types are defined here.
Provider template¶
Rather than supplying provider specs directly, Iter8 enables users to supply one or more Golang templates for provider specs. Iter8 combines the provider templates with values, in order to generate provider specs in YAML format, and uses them to query for the metrics.
Example provider specs:
- istio-prom for Istio's Prometheus plugin
In order to create provider templates and use them in experiments, it is necessary to have a clear understanding of how variable values are computed, and how the response from the database is processed by Iter8. We describe these steps next.
Computing variable values¶
Variable values are configured explicitly by the user during experiment launch. The sole exception to this rule is the elapsedTimeSeconds variable, which is computed by Iter8. The rest of this section describes how to configure values for one or more app versions and how Iter8 computes elapsedTimeSeconds.
When the experiment involves a single version of the app, template variable values are supplied directly as part of the custommetrics.values map. See usage example for an illustration.
When the experiment involves two or more versions of the app, values that are shared by all versions are supplied as part of the custommetrics.values map, and values that are specific to versions are supplied as part of the custommetrics.versionValues list. The length of this list is the number of versions, and custommetrics.versionValues[i] is the map that holds values specific to version i. Iter8 merges custommetrics.values with custommetrics.versionValues[i] (latter takes precedence), and uses the resulting map for version i when substituting template variables. Configuring values for two versions is illustrated in the following usage example.
iter8 k launch \
--set "tasks={custommetrics,assess}" \
--set custommetrics.templates.istio-prom="https://raw.githubusercontent.com/iter8-tools/hub/main/templates/custommetrics/istio-prom.tpl" \
--set custommetrics.values.labels.namespace=default \
--set custommetrics.values.labels.destination_app=httpbin \
--set custommetrics.values.labels.reporter=destination \
--set 'custommetrics.versionValues[0].labels.destination_version=v1' \
--set 'custommetrics.versionValues[1].labels.destination_version=v2' \
--set assess.SLOs.upper.istio-prom/error-rate=0 \
--set assess.SLOs.upper.istio-prom/latency-mean=100 \
--set runner=cronjob \
--set cronjobSchedule="*/1 * * * *"
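The merge described above can be sketched in a few lines of Python. This is only an illustration, not Iter8's implementation: the function name merge_values is hypothetical, and the recursive handling of nested maps is an assumption inferred from the example, where the shared labels and the per-version destination_version label must end up in the same labels map.

```python
import copy


def merge_values(common, specific):
    """Return a new map: `common` overlaid with `specific` (specific wins).

    Nested maps are merged recursively (an assumption, see the lead-in).
    """
    merged = copy.deepcopy(common)
    for key, value in specific.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_values(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Values mirroring the two-version usage example.
values = {
    "labels": {
        "namespace": "default",
        "destination_app": "httpbin",
        "reporter": "destination",
    }
}
version_values = [
    {"labels": {"destination_version": "v1"}},
    {"labels": {"destination_version": "v2"}},
]

# One merged map per app version; the shared map is left untouched.
per_version = [merge_values(values, vv) for vv in version_values]
```

With these inputs, per_version[1]["labels"] holds the three shared labels plus destination_version set to v2.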
A metric query often involves specifying the time window over which the metric needs to be computed. In provider templates, a special template variable named elapsedTimeSeconds holds the length of this time window. Its use within a template is illustrated in the following snippets.
sum(last_over_time(istio_requests_total{
destination_app="httpbin",
namespace="default"
}[3600s]))
sum(last_over_time(istio_requests_total{
destination_app="httpbin",
namespace="default"
}[{{ .elapsedTimeSeconds }}s]))
The first snippet hard-codes the time window as 3600 seconds; the second sets it using elapsedTimeSeconds. Iter8 computes the value of the elapsedTimeSeconds variable dynamically in this task. This is the desirable behavior in multi-loop experiments (see usage example), where metrics need to be fetched periodically, and the time window over which metrics are computed stretches farther back with each loop. The following sequence diagram illustrates how elapsedTimeSeconds changes over loops.
sequenceDiagram
startingTime-)loop1: elapsedTimeSeconds=60;
startingTime-)loop2: elapsedTimeSeconds=120;
startingTime-)loop3: elapsedTimeSeconds=180;
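To make the effect of the growing window concrete, the sketch below renders the templated query for the three loops in the diagram. The render function is a hypothetical, regex-based stand-in that handles only simple variable references of the form {{ .name }}; it is not Go's template engine and is not how Iter8 renders provider templates.

```python
import re

# The templated query from the snippet above.
QUERY_TEMPLATE = """sum(last_over_time(istio_requests_total{
destination_app="httpbin",
namespace="default"
}[{{ .elapsedTimeSeconds }}s]))"""

# Matches simple references such as {{ .elapsedTimeSeconds }} or {{ .labels.namespace }}.
_VARIABLE = re.compile(r"\{\{\s*\.([A-Za-z_][\w.]*)\s*\}\}")


def render(template, values):
    """Substitute each {{ .path }} reference with the value found in `values`."""
    def lookup(match):
        node = values
        for part in match.group(1).split("."):
            node = node[part]
        return str(node)

    return _VARIABLE.sub(lookup, template)


# One rendered query per loop; the window stretches farther back each time.
queries = [render(QUERY_TEMPLATE, {"elapsedTimeSeconds": s}) for s in (60, 120, 180)]
```

The three rendered queries differ only in the range selector: [60s], [120s], and [180s].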
Iter8 computes elapsedTimeSeconds based on another variable named startingTime. The default value of startingTime is the time at which the experiment is launched. The user can override the default by explicitly configuring startingTime during experiment launch, in the RFC 3339 format (for example, 2020-02-01T09:44:40Z or 2020-02-01T09:44:40.954641934Z). Iter8 sets elapsedTimeSeconds as the difference (in seconds) between the current time and startingTime. This logic is illustrated in the following flowchart.
graph TD
A([Start]) --> B{startingTime parameter supplied?};
B ---->|Yes| C([elapsedTimeSeconds = currentTime - startingTime]);
B ---->|No| D([startingTime = time when experiment was launched]);
D --> C;
C --> E([End]);
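The flowchart can be sketched in Python as follows. The function names are hypothetical, truncating the result to whole seconds is an assumption, and the parser covers only the common RFC 3339 shapes shown above; none of this is Iter8's actual code.

```python
import re
from datetime import datetime, timedelta, timezone


def parse_rfc3339(ts):
    """Parse timestamps such as 2020-02-01T09:44:40Z or ...40.954641934Z."""
    ts = ts.strip()
    if ts[-1] in "Zz":
        ts = ts[:-1] + "+00:00"  # spell out the UTC offset
    # Normalize fractional seconds to exactly six digits (microseconds).
    ts = re.sub(r"\.(\d+)", lambda m: "." + m.group(1)[:6].ljust(6, "0"), ts)
    return datetime.fromisoformat(ts)


def elapsed_time_seconds(values, launch_time, now):
    """Use startingTime when supplied, otherwise fall back to the launch time."""
    supplied = values.get("startingTime")
    start = parse_rfc3339(supplied) if supplied else launch_time
    return int((now - start).total_seconds())


launch = datetime(2020, 2, 1, 9, 44, 40, tzinfo=timezone.utc)

# No startingTime supplied: the window starts at experiment launch.
default_elapsed = elapsed_time_seconds({}, launch, launch + timedelta(seconds=180))

# startingTime supplied: the window starts at the given timestamp.
custom_elapsed = elapsed_time_seconds(
    {"startingTime": "2020-02-01T09:44:40.954641934Z"},
    launch,
    datetime(2020, 2, 1, 9, 45, 41, tzinfo=timezone.utc),
)
```

Here default_elapsed is 180, matching the third loop in the sequence diagram, and custom_elapsed is 60.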
Note that the above design enables the user to supply different startingTime values for different app versions (for instance, based on the creation timestamps of the versions).
To set a common startingTime for all versions:
--set custommetrics.values.startingTime="2020-02-01T09:44:40Z"
To set a different startingTime for each version:
--set 'custommetrics.versionValues[0].startingTime=2020-02-01T09:44:40Z' \
--set 'custommetrics.versionValues[1].startingTime=2020-02-05T14:22:15Z'
Processing response¶
The metrics provider is expected to respond to Iter8's HTTP request for a metric with a JSON object. The format of this JSON object is provider-specific. Iter8 uses jq to extract the metric value from the JSON response of the provider. The jqExpression used by Iter8 is supplied as part of the metric definition. When the jqExpression is applied to the JSON response, it is expected to yield a number.
The format of the Prometheus JSON response is defined here. A sample Prometheus response is as follows.
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "value": [1556823494.744, "21.7639"]
      }
    ]
  }
}
Consider the jqExpression defined in the sample Prometheus metric. Let us apply it to the sample JSON response from Prometheus.
echo '{
"status": "success",
"data": {
"resultType": "vector",
"result": [
{
"value": [1556823494.744, "21.7639"]
}
]
}
}' | jq ".data.result[0].value[1] | tonumber"
The command prints 21.7639, a number, as required by Iter8. Note: The shell command above is for illustration only. Iter8 uses Python bindings for jq to evaluate the jqExpression.
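For readers without jq at hand, the same extraction can be sketched with Python's standard library. This is a plain-Python equivalent of the expression .data.result[0].value[1] | tonumber, written for illustration only; the function name extract_metric is hypothetical, and this is not how Iter8 evaluates a jqExpression.

```python
import json

# The sample Prometheus response used in the shell command above.
RESPONSE = """{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "value": [1556823494.744, "21.7639"]
      }
    ]
  }
}"""


def extract_metric(raw):
    """Mirror `.data.result[0].value[1] | tonumber` on a JSON document."""
    doc = json.loads(raw)
    # value is [timestamp, "metric value as a string"]; convert to a number.
    return float(doc["data"]["result"][0]["value"][1])


metric_value = extract_metric(RESPONSE)
```

The second element of value is a string in the response, which is why the conversion to a number (tonumber in jq, float here) is needed.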
Defining and using providers¶
- Understand how the custommetrics task works; this is described in this section.
- Create your provider template and serve it from a URL. A sample provider template is in this section.
- Configure the custommetrics task with one or more provider templates. An example of custommetrics configuration is in this section.
- The metrics fetched by this task can be used to assess app versions in Iter8 experiments. An example that illustrates the use of both custommetrics and assess tasks together is in this section.