Metrics

In the Resources article, we learned how to discover resources, filter them, drill down for more detail, and explore their relationships. But beyond discovering resources, we also want to know how they are operating: host CPU, disk, memory, and network utilization, as well as service latency and error rates. To get this information, we need metrics. Op metrics are time-series data tagged with a name, a resource, and other metadata. Let's query a metric right now and get the average CPU utilization for each of our hosts over the last 30 seconds:

op>
host | cpu_usage | window(30s) | resolution=10 | mean(3)
 ID | TYPE | NAME                | TIMESTAMPS          | CPU_USAGE
 1  | HOST | i-08442999c268bb61d | 2021/07/02 12:27:30 |      4.69
 2  | HOST | i-08269143cfca5afb4 | 2021/07/02 12:27:30 |     11.21
 3  | HOST | i-0714d77e82ae5486e | 2021/07/02 12:27:30 |      7.75

Op resources and metrics naturally mesh together: prefixing a metric query with a resource query narrows the results to only the metrics associated with the returned resources.
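
The first query above already does this: host is a resource query, and piping it into cpu_usage restricts the metric to our three hosts. Assuming the name filter used later with list metrics also works on resource queries, we could narrow it further to a single host from the sample output:

op>
host | name="i-08442999c268bb61d" | cpu_usage | window(30s)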

Creating New Metrics

As with resources, we can define and save useful metric queries for later use. Let's define a metric for average CPU usage over a two-minute window:

op>
metric cpu_2_min = cpu_usage | window(120s)

As shown above, definition statements are fully parameterized. Op's macro system allows complex substitutions to be expressed with a familiar syntax.
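
Purely as an illustration of what a parameterized definition could look like (the duration parameter and syntax here are hypothetical; the Op Commands Glossary documents the exact form), the window length could itself become a parameter:

op>
metric cpu_avg(duration) = cpu_usage | window(duration)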

Op also lets users create custom derived metrics. Let's create a new metric called "cpu_usage_new":

op>
metric cpu_usage_new = (100 - 100 * (metric_query(metric_names="node_cpu_seconds_total", tags={"mode":"idle"}) | irate(2) | group() | mean)) | lower_bound(0) | upper_bound(100)

If you need to update the formula, simply overwrite it with the following:

op>
cpu_usage_new.val = [new formula]
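
For example, assuming you wanted to drop the upper bound from the original cpu_usage_new formula, the overwrite could look like this:

op>
cpu_usage_new.val = (100 - 100 * (metric_query(metric_names="node_cpu_seconds_total", tags={"mode":"idle"}) | irate(2) | group() | mean)) | lower_bound(0)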

To set the unit of measurement (uom) for the metric:

op>
cpu_usage_new.units = "percent"

Now we can leverage both of the Op commands we have built up. Let's get the average CPU utilization over the last two minutes for each of our hosts:

op>
host | cpu_2_min

The above examples show the power of Op: multiple layers of substitution let you express a complex query very succinctly. And in the heat of a Sev 1 or similarly critical incident, you can easily recall the commands and execute them efficiently.
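
For instance, because cpu_2_min was defined as cpu_usage | window(120s), the macro expansion makes the query above equivalent to writing out the full pipeline:

op>
host | cpu_usage | window(120s)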

List Existing Metrics

Op allows the user to list all previously defined metrics. Imagine you are interactively debugging your cluster with Op and want to apply the cpu_2_min metric defined above to each of your hosts, but you don't remember its name. Let's list our existing metrics:

op>
list metrics

To view a single metric by name:

op>
list metrics | name="cpu_usage_new"

In the full listing we see an entry for our cpu_2_min metric. We know what this metric represents, since we defined it and gave it an appropriate name. But what if someone new to the ops team wants more information about it? Let's add a description to our metric:

op>
cpu_2_min.description = "real time cpu usage over last 2 min"

Now, if we list the metrics again:

op>
list metrics

The new team member will now see our metric description alongside the metric name and formula. Op supports standard CRUD (create, read, update, and delete) operations over metric definitions; please refer to the Op Commands Glossary for more information on syntax and supported operations. These features let an operations team rapidly build up a shared bank of commonly used metric statements, spreading operational knowledge and making it possible to gather valuable system information quickly without reinventing or misremembering common metric formulas.
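
For example, assuming a saved definition is no longer needed, it can be removed with a delete statement (refer to the glossary for the exact form):

op>
delete cpu_usage_new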

Metrics Exporter Support

Op plugs directly into the Prometheus exporter ecosystem: it can pull metrics from any Prometheus exporter as well as from Prometheus itself. Exporters such as Envoy and cAdvisor are auto-discovered and ingested by the Shoreline agent.
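
For example, metrics exposed by any exporter can be queried by name with metric_query, just as in the cpu_usage_new definition above (assuming node_cpu_seconds_total is being collected in your environment):

op>
host | metric_query(metric_names="node_cpu_seconds_total", tags={"mode":"idle"}) | window(60s)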

For further information on Envoy, please see Envoy Overview.

For examples of cAdvisor metrics, please see Monitoring container metrics using cAdvisor.