In the Resources article, we learned how to discover resources, filter them, drill down for more detail, and see how they relate to one another. Beyond discovering resources, we also want to know how they are operating: host CPU, disk, memory, and network utilization, as well as service latency and error rates. To get this information, we need metrics.
In Op, metrics are time-series data annotated with a name, a resource, and other tag metadata. Let's query the current value of a metric: the average CPU utilization for each of our hosts over the last 30 seconds:
ID | TYPE | NAME                | TIMESTAMPS          | CPU_USAGE
1  | HOST | i-08442999c268bb61d | 2021/07/02 12:27:30 | 4.69
2  | HOST | i-08269143cfca5afb4 | 2021/07/02 12:27:30 | 11.21
3  | HOST | i-0714d77e82ae5486e | 2021/07/02 12:27:30 | 7.75
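The Op query behind this output is not reproduced here. The following Python sketch illustrates only the data model. Each metric sample is a record that carries a name, a resource, a timestamp, a value, and free-form tags. The sketch averages one metric per resource over a trailing window. The `Sample` class, the `average_by_resource` function, and the host names are hypothetical and are not part of Op.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Iterable


@dataclass
class Sample:
    """One point in a metric time series (hypothetical model, not Op's)."""
    name: str                 # metric name, e.g. "cpu_usage"
    resource: str             # resource the sample belongs to, e.g. a host ID
    timestamp: float          # seconds since the epoch
    value: float
    tags: Dict[str, str] = field(default_factory=dict)  # other tag metadata


def average_by_resource(samples: Iterable[Sample], name: str,
                        now: float, window_s: float) -> Dict[str, float]:
    """Average the metric `name` per resource over the window (now - window_s, now]."""
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for s in samples:
        if s.name == name and now - window_s < s.timestamp <= now:
            sums[s.resource] += s.value
            counts[s.resource] += 1
    return {r: sums[r] / counts[r] for r in sums}


if __name__ == "__main__":
    samples = [
        Sample("cpu_usage", "host-a", 100.0, 4.0),
        Sample("cpu_usage", "host-a", 110.0, 6.0),
        Sample("cpu_usage", "host-b", 105.0, 10.0),
        Sample("cpu_usage", "host-b", 60.0, 90.0),   # outside the 30 s window
        Sample("mem_usage", "host-a", 108.0, 50.0),  # different metric, ignored
    ]
    # prints {'host-a': 5.0, 'host-b': 10.0}
    print(average_by_resource(samples, "cpu_usage", now=110.0, window_s=30.0))
```

Each row of the table above corresponds to one entry of such a per-resource result.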
Op resources and metrics mesh together naturally. Prefixing a metric query with a resource query restricts the returned metrics to those associated with the resources that the resource query matches.
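Conceptually, the prefix works as a two-step filter. The resource query selects a set of resources, and the metric query then keeps only the samples attached to them. The Python sketch below shows this idea under stated assumptions. It does not use Op syntax. The dictionary shapes and fields such as `region` are hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical shapes for illustration only; Op's real data model may differ.
Resource = Dict[str, str]     # e.g. {"id": "host-a", "type": "HOST", "region": "us-west-2"}
Sample = Dict[str, object]    # e.g. {"name": "cpu_usage", "resource": "host-a", "value": 4.7}


def metrics_for_resources(resources: List[Resource], samples: List[Sample],
                          resource_filter: Callable[[Resource], bool]) -> List[Sample]:
    """Evaluate a resource filter, then keep only samples attached to matching resources."""
    # Step 1: the "resource query" selects a set of resource IDs.
    selected = {r["id"] for r in resources if resource_filter(r)}
    # Step 2: the "metric query" is narrowed to samples on those resources.
    return [s for s in samples if s["resource"] in selected]


if __name__ == "__main__":
    resources = [
        {"id": "host-a", "type": "HOST", "region": "us-west-2"},
        {"id": "host-b", "type": "HOST", "region": "us-east-1"},
    ]
    samples = [
        {"name": "cpu_usage", "resource": "host-a", "value": 4.7},
        {"name": "cpu_usage", "resource": "host-b", "value": 11.2},
    ]
    west = metrics_for_resources(resources, samples, lambda r: r["region"] == "us-west-2")
    print(west)  # only the host-a sample remains
```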
As with resources, we can define useful metric queries and save them for later. Let's define average CPU usage over an interval:
As shown above, our definition statements are fully parameterized. Op's macro system can express complex substitutions using a familiar syntax.
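Op's actual macro syntax is not reproduced here. The Python sketch below illustrates only the general idea of parameterized, layered substitution. Saved definitions can take arguments and can refer to other definitions. Expansion repeats until no defined name remains. The definition names `cpu_avg` and `cpu_2_min`, the `{param}` placeholders, and the `avg(...)` query strings are all hypothetical. They are not Op syntax.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical definition bank: name -> (parameter names, body template).
# Bodies use {param} placeholders; the avg(...) strings are illustrative, not Op syntax.
Definitions = Dict[str, Tuple[List[str], str]]


def expand(query: str, defs: Definitions, max_passes: int = 10) -> str:
    """Repeatedly replace defined names (optionally called with arguments) by their bodies."""
    if not defs:
        return query
    # Longest names first so the alternation prefers the most specific match.
    names = "|".join(sorted((re.escape(n) for n in defs), key=len, reverse=True))
    # A defined name, optionally followed by a flat argument list: name or name(a, b)
    pattern = re.compile(rf"\b({names})\b(?:\(([^()]*)\))?")

    def substitute(m) -> str:
        params, body = defs[m.group(1)]
        raw = m.group(2)
        args = [a.strip() for a in raw.split(",")] if raw else []
        if len(args) != len(params):
            raise ValueError(f"{m.group(1)} expects {len(params)} argument(s), got {len(args)}")
        return body.format(**dict(zip(params, args)))

    for _ in range(max_passes):
        expanded = pattern.sub(substitute, query)
        if expanded == query:      # nothing left to substitute
            return expanded
        query = expanded
    raise RecursionError("definition expansion did not converge")


if __name__ == "__main__":
    defs = {
        "cpu_avg": (["interval"], "avg(cpu_usage, {interval})"),  # parameterized
        "cpu_2_min": ([], "cpu_avg(2m)"),                         # built on cpu_avg
    }
    print(expand("cpu_2_min", defs))       # avg(cpu_usage, 2m)
    print(expand("max(cpu_2_min)", defs))  # max(avg(cpu_usage, 2m))
```

Capping the number of passes is a simple guard against definitions that refer to each other in a cycle.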
Op also lets users create custom derived metrics. Let's create a new metric called "cpu_usage_new":
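The definition of `cpu_usage_new` is not reproduced here. A derived metric is essentially a new series computed pointwise from existing ones, and the Python sketch below shows that idea with the inputs aligned on shared timestamps. The `derive` helper and the user-plus-system CPU formula are hypothetical examples. They are not Op's implementation or the article's definition.

```python
from typing import Callable, Dict

Series = Dict[float, float]  # timestamp -> value


def derive(formula: Callable[..., float], *inputs: Series) -> Series:
    """Build a derived series by applying `formula` at timestamps shared by all inputs."""
    if not inputs:
        raise ValueError("derive() needs at least one input series")
    shared = set(inputs[0]).intersection(*inputs[1:])
    return {t: formula(*(series[t] for series in inputs)) for t in sorted(shared)}


if __name__ == "__main__":
    cpu_user = {0.0: 3.0, 10.0: 4.0, 20.0: 5.0}
    cpu_system = {0.0: 1.0, 10.0: 1.5}           # no sample at t=20
    cpu_busy = derive(lambda user, system: user + system, cpu_user, cpu_system)
    print(cpu_busy)  # {0.0: 4.0, 10.0: 5.5}
```

Dropping timestamps that are missing from any input is one simple alignment choice. A real system might interpolate or resample instead.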
These examples show the power of Op: multiple layers of substitution let you express a complex query very succinctly. In the heat of a Sev 1 or similarly critical incident, short commands like these are easy to remember and quick to execute.
Op lets users list all previously defined metrics. Imagine you are interactively debugging your cluster with Op and want the cpu_2_min metric, defined above, for each of your hosts, but you don't remember the name of the definition. Let's list our existing metrics:
To view a single metric by name:
The output now includes the metric's description alongside its name and formula. Op supports standard CRUD (create, read, update, and delete) operations on metric definitions; see the Op Commands Glossary for syntax and supported operations. Together, these features let an operations team rapidly build a shared bank of commonly used metric definitions. That bank spreads operations knowledge across the team and lets anyone quickly gather valuable system information without reinventing or misremembering common metric formulas.
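Op's own commands for this are documented in the Op Commands Glossary and are not reproduced here. The Python sketch below is a minimal in-memory illustration of CRUD over a shared bank of named metric definitions. `MetricRegistry`, `MetricDefinition`, and their method names are hypothetical, and the formula strings are only illustrative.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class MetricDefinition:
    name: str
    formula: str
    description: str = ""


class MetricRegistry:
    """Hypothetical in-memory store of named metric definitions."""

    def __init__(self) -> None:
        self._defs: Dict[str, MetricDefinition] = {}

    def create(self, name: str, formula: str, description: str = "") -> None:
        if name in self._defs:
            raise KeyError(f"metric {name!r} already exists")
        self._defs[name] = MetricDefinition(name, formula, description)

    def read(self, name: str) -> MetricDefinition:
        return self._defs[name]  # raises KeyError for unknown names

    def update(self, name: str, *, formula: Optional[str] = None,
               description: Optional[str] = None) -> None:
        definition = self._defs[name]
        if formula is not None:
            definition.formula = formula
        if description is not None:
            definition.description = description

    def delete(self, name: str) -> None:
        del self._defs[name]

    def list_names(self) -> List[str]:
        return sorted(self._defs)


if __name__ == "__main__":
    registry = MetricRegistry()
    registry.create("cpu_2_min", "avg(cpu_usage, 2m)", "Average CPU over 2 minutes")
    print(registry.list_names())        # ['cpu_2_min']
    print(registry.read("cpu_2_min"))   # name, formula, and description together
```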
Op plugs directly into the Prometheus exporter ecosystem: it can pull metrics from any Prometheus exporter as well as from Prometheus itself. Exporters such as Envoy and cAdvisor are auto-discovered and can be ingested by the Shoreline agent.
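The Shoreline agent's ingestion code is not shown here. Exporters publish samples in the Prometheus text exposition format, and the Python sketch below is a minimal parser for simple sample lines in that format. It is a deliberately limited sketch. It skips `#` comment lines, does not unescape label values, and does not handle `}` inside a label value. A production integration should use a complete parser instead, such as the one in the official Prometheus Python client.

```python
import re
from typing import Dict, List, Optional, Tuple

# metric_name{label="value",...} value [timestamp_ms]
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>[^}]*)\})?'
    r'\s+(?P<value>\S+)'
    r'(?:\s+(?P<ts>-?\d+))?\s*$'
)
LABEL_RE = re.compile(r'(?P<key>[a-zA-Z_][a-zA-Z0-9_]*)="(?P<val>(?:[^"\\]|\\.)*)"')

ParsedSample = Tuple[str, Dict[str, str], float, Optional[int]]


def parse_exposition(text: str) -> List[ParsedSample]:
    """Parse simple sample lines into (name, labels, value, timestamp_ms) tuples."""
    samples: List[ParsedSample] = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE comments
            continue
        m = SAMPLE_RE.match(line)
        if m is None:
            raise ValueError(f"unparseable line: {line!r}")
        labels = {lm["key"]: lm["val"] for lm in LABEL_RE.finditer(m["labels"] or "")}
        ts = int(m["ts"]) if m["ts"] is not None else None
        # float() also accepts the format's special values NaN, +Inf and -Inf
        samples.append((m["name"], labels, float(m["value"]), ts))
    return samples


if __name__ == "__main__":
    text = (
        "# TYPE http_requests_total counter\n"
        'http_requests_total{method="post",code="200"} 1027 1395066363000\n'
        "up 1\n"
    )
    for sample in parse_exposition(text):
        print(sample)
```

In practice, the text would come from an HTTP GET against the exporter's metrics endpoint, which is conventionally `/metrics`.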
For further information on Envoy, please see Envoy Overview.
For examples of cAdvisor metrics, please see Monitoring container metrics using cAdvisor.