Dynamic Filters

Filter the results of a Resource query based on runtime commands and Op statements.

Simple Resource queries let you retrieve Resources by type or tag. Dynamic filters let you narrow the returned Resources further using more advanced criteria, such as Metric queries, Linux commands, and Events.

Using Dynamic Filters

To add a Dynamic Filter to your Resource query, pipe the result to the filter function:

op>
<resource_query> | filter(<criteria>)

The filter function is exclusive -- the result excludes all Resources which do not satisfy the criteria.

A Resource meets the filter criteria if any of its data points satisfies the condition. Consequently, if the Metric query returns no data points for a Resource, that Resource fails to meet the criteria and is excluded.

For example, there are 30 pods in total, but half of them report no memory usage, so 15 are excluded:

op>
pod | filter(pod_memory_usage) | count
 RESOURCE_COUNT
 15
Filters excluded 15/30 pods

You can change the default behavior of excluding Resources with no data by setting the include_missing parameter to true. Doing so includes all Resources with missing data.

op>
pod | filter(pod_memory_usage) | include_missing=true | count
 RESOURCE_COUNT
 30
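
The inclusion rule described above can also be modeled outside of Op. The following Python sketch only illustrates the semantics and is not Op's implementation; the keep_resource helper and the sample pod data are hypothetical.

```python
from typing import Callable, Dict, List


def keep_resource(points: List[float],
                  condition: Callable[[float], bool] = lambda v: True,
                  include_missing: bool = False) -> bool:
    """Keep a Resource if ANY data point satisfies the condition.

    A Resource with no data points is kept only when include_missing
    is true. Hypothetical helper, not part of Op.
    """
    if not points:
        return include_missing
    return any(condition(v) for v in points)


# Hypothetical sample: two pods report memory usage, one reports nothing.
pods: Dict[str, List[float]] = {
    "pod-a": [12.5, 12.7],
    "pod-b": [],
    "pod-c": [3.1],
}

default_result = [name for name, pts in pods.items() if keep_resource(pts)]
with_missing = [name for name, pts in pods.items()
                if keep_resource(pts, include_missing=True)]

print(default_result)  # ['pod-a', 'pod-c']
print(with_missing)    # ['pod-a', 'pod-b', 'pod-c']
```

With the default settings, pod-b is dropped because it has no data points; setting include_missing keeps it.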

Filter by Metric Query

Reduce the Resource query collection based on the Metric query result applied to each Resource.

To illustrate, we'll first chain a Resource query with a Metric query to display memory usage for all host Resources.

op>
host | mem_used
 ID | TYPE | NAME                | REGION    | AZ         | TIMESTAMPS          | MEM_USED
 1  | HOST | i-027db1eb164210c5f | us-west-2 | us-west-2a | 2021/06/30 14:02:23 |    48.60
    |      |                     |           |            | 2021/06/30 14:02:24 |    48.60
    |      |                     |           |            | 2021/06/30 14:02:25 |    48.59
    |      |                     |           |            | 2021/06/30 14:02:26 |    48.60
    |      |                     |           |            | 2021/06/30 14:02:27 |    48.59
 2  | HOST | i-080a0e999c044f9f4 | us-west-2 | us-west-2c | 2021/06/30 14:02:23 |    27.96
    |      |                     |           |            | 2021/06/30 14:02:24 |    27.96
    |      |                     |           |            | 2021/06/30 14:02:25 |    27.95
    |      |                     |           |            | 2021/06/30 14:02:26 |    27.96
    |      |                     |           |            | 2021/06/30 14:02:27 |    27.95
 3  | HOST | i-03d223a4a97d193ca | us-west-2 | us-west-2b | 2021/06/30 14:02:23 |    45.85
    |      |                     |           |            | 2021/06/30 14:02:24 |    45.86
    |      |                     |           |            | 2021/06/30 14:02:25 |    45.85
    |      |                     |           |            | 2021/06/30 14:02:26 |    45.85
    |      |                     |           |            | 2021/06/30 14:02:27 |    45.85

Passing the standard mem_used Metric as the filter criteria returns all Resources that report any memory usage, which is every host in this case.

op>
host | filter(mem_used)
 ID | TYPE | NAME                | REGION    | AZ
 1  | HOST | i-027db1eb164210c5f | us-west-2 | us-west-2a
 2  | HOST | i-080a0e999c044f9f4 | us-west-2 | us-west-2c
 3  | HOST | i-03d223a4a97d193ca | us-west-2 | us-west-2b

You can also pass a comparison value to a Metric query filter criterion, using any of the standard comparison operators: ==, !=, >, >=, <, and <=.

Here we're getting host Resources using at least 40% of their memory.

op>
host | filter(mem_used >= 40)
 ID | TYPE | NAME                | REGION    | AZ
 1  | HOST | i-027db1eb164210c5f | us-west-2 | us-west-2a
 3  | HOST | i-03d223a4a97d193ca | us-west-2 | us-west-2b
Filters excluded 1/3 hosts

You can combine multiple Metric queries into a single filter criterion using the "and" or "or" operators. Here we're getting hosts with at least 40% memory usage and under 15% CPU usage.

op>
host | filter((mem_used >= 40) and (cpu_usage < 15))
 ID | TYPE | NAME                | REGION    | AZ
 1  | HOST | i-027db1eb164210c5f | us-west-2 | us-west-2a
 3  | HOST | i-03d223a4a97d193ca | us-west-2 | us-west-2b
Filters excluded 1/3 hosts

You can continue the Op statement by chaining additional commands onto the filtered results. For example, piping a Metric query onto the above filtered Resource collection lets you output Metric data for just those filtered results.

op>
host | filter((mem_used >= 40) and (cpu_usage < 15)) | mem_used | cpu_usage
 ID | TYPE | NAME                | TIMESTAMPS          | MEM_USED | CPU_USAGE
 1  | HOST | i-027db1eb164210c5f | 2021/06/30 14:53:53 |    48.66 |       ---
    |      |                     | 2021/06/30 14:53:54 |    48.66 |      3.25
    |      |                     | 2021/06/30 14:53:55 |    48.65 |      3.75
    |      |                     | 2021/06/30 14:53:56 |    48.65 |      5.75
    |      |                     | 2021/06/30 14:53:57 |    48.65 |      5.25
 3  | HOST | i-03d223a4a97d193ca | 2021/06/30 14:53:53 |    45.85 |       ---
    |      |                     | 2021/06/30 14:53:54 |    45.86 |      6.75
    |      |                     | 2021/06/30 14:53:55 |    45.86 |      5.50
    |      |                     | 2021/06/30 14:53:56 |    45.86 |      7.00
    |      |                     | 2021/06/30 14:53:57 |    45.86 |      6.00

Filter by Linux Command

Filters based on Linux commands are evaluated by exit code. Op supports integer exit code comparisons using ==, !=, >, >=, <, and <=. The default comparison against the exit code is == 0.

Consider a cluster with three hosts. We're executing a Linux command to echo the average CPU usage from the last second to stdout:

op>
host | `echo $[100-$(vmstat 1 2|tail -1|awk '{print $15}')]`
 ID | TYPE | NAME                | STATUS | STDOUT
 1  | HOST | i-08d4799e91dfd5f51 |   0    | 8
 2  | HOST | i-0aa578e9eb2c111dc |   0    | 4
 3  | HOST | i-02da5463c5bb8c476 |   0    | 9

Notice that the STATUS value (the exit code) is 0 for every host. Passing the same Linux command to the filter() function therefore compares each exit code against the default == 0 and returns all Resources.

op>
host | filter(`echo $[100-$(vmstat 1 2|tail -1|awk '{print $15}')]`)
 ID | TYPE | NAME
 1  | HOST | i-08d4799e91dfd5f51
 2  | HOST | i-0aa578e9eb2c111dc
 3  | HOST | i-02da5463c5bb8c476
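
To make the exit code comparison concrete, here is a minimal Python sketch of the same idea run against the local machine. It assumes a POSIX sh is available; the passes_exit_code_filter helper and the COMPARATORS table are hypothetical names used only for illustration and say nothing about how Op evaluates filters internally.

```python
import operator
import subprocess

# The six comparisons listed above, mapped to Python's operator module.
COMPARATORS = {
    "==": operator.eq, "!=": operator.ne,
    ">": operator.gt, ">=": operator.ge,
    "<": operator.lt, "<=": operator.le,
}


def passes_exit_code_filter(command: str, op: str = "==", value: int = 0) -> bool:
    """Run a shell command locally and compare its exit code.

    The defaults mirror the '== 0' comparison described above.
    Hypothetical helper: Op runs the command on each Resource instead.
    """
    exit_code = subprocess.run(["sh", "-c", command],
                               stdout=subprocess.DEVNULL,
                               stderr=subprocess.DEVNULL).returncode
    return COMPARATORS[op](exit_code, value)


print(passes_exit_code_filter("echo 8"))           # True: echo exits with 0
print(passes_exit_code_filter("exit 3"))           # False: 3 is not == 0
print(passes_exit_code_filter("exit 3", ">=", 2))  # True: 3 >= 2
```

Note that the stdout of the command ("8" in the first call) plays no part in the result; only the exit code is compared.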

As illustrated above, Op does not currently support comparisons of the stdout stream. However, there are two potential workarounds:

Modify the exit code

You can modify your Linux command to return an exit code value.

Below, the command checks whether the value is greater than 7 and passes the result to the shell's exit command. The boolean is inverted so that the exit code follows convention: 0 indicates the host passed the comparison.

op>
host | filter(`exit $[!$[100-$(vmstat 1 2|tail -1|awk '{print $15}') > 7]]`)
 ID | TYPE | NAME
 2  | HOST | i-0aa578e9eb2c111dc
 3  | HOST | i-02da5463c5bb8c476
Filters excluded 1/3 hosts
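
The inversion is easy to get backwards, so here is a small Python sketch of the arithmetic in the command above. The exit_code_for helper and the sample readings are hypothetical and are not taken from the output shown.

```python
def exit_code_for(value: int, threshold: int = 7) -> int:
    """Mirror of: exit $[!$[value > threshold]]

    The inner comparison yields 1 (true) or 0 (false); the ! flips it,
    so a host that passes the comparison exits with 0.
    """
    comparison = 1 if value > threshold else 0
    return 0 if comparison else 1


# Hypothetical CPU readings.
for reading in (4, 7, 9):
    code = exit_code_for(reading)
    print(reading, code, "kept" if code == 0 else "excluded")
```

Only the reading of 9 produces exit code 0 and is kept by the default == 0 comparison; 4 and 7 both exit with 1 and are excluded.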

Use environment variables

The res_env_var Action property can be set to the name of an environment variable, whose value is then evaluated instead of the exit status in certain contexts.

res_env_var is most useful when an Alarm's fire_query or clear_query executes an Action.

Consider the following Action that sets the AVG_CPU_USAGE environment variable to the current CPU usage.

op>
action high_cpu_action = `AVG_CPU_USAGE=$[100-$(vmstat 1 2|tail -1|awk '{print $15}')]`
Created action 'high_cpu_action'.

Set res_env_var to the AVG_CPU_USAGE environment variable.

op>
high_cpu_action.res_env_var = "AVG_CPU_USAGE"
Updated action 'high_cpu_action'.

The value returned by high_cpu_action is no longer the default exit status code. Instead, the value of AVG_CPU_USAGE is returned.

Next, create an Alarm with a fire_query evaluating the result of the high_cpu_action. Here we're firing the Alarm when AVG_CPU_USAGE is at least 10.

op>
alarm high_cpu_alarm = high_cpu_action >= 10
op>
high_cpu_alarm.clear_query = high_cpu_action < 10
op>
high_cpu_alarm.resource_query = host

Lastly, enable the Alarm.

op>
enable high_cpu_alarm

The high_cpu_alarm now executes the high_cpu_action, which internally returns the value of the environment variable set in res_env_var. Below we're filtering to find host Resources on which the high_cpu_alarm has fired:

op>
host | filter((events | type="alarm" | name="high_cpu_alarm"))
 ID | TYPE | NAME                | REGION    | AZ
 2  | HOST | i-0a7803bd4fa45dcac | us-west-2 | us-west-2a
 3  | HOST | i-010a03bf359e54b1b | us-west-2 | us-west-2b
Filters excluded 1/3 hosts

The res_env_var property allows you to create complex Action and Alarm relationships based on any environment variables on the Resource.

Filter by Event

You can filter Resources by Events, based on properties such as the Event type, name, and count.

For example, below we're filtering all host Resources to those that have triggered the my_cpu_alarm:

op>
hosts | filter(events | type="alarm" | name="my_cpu_alarm")
RESOURCE_ID | RESOURCE_NAME       | RESOURCE_TYPE | ALARM_NAME        | STATUS   | STEP_TYPE   | TIMESTAMP
2           | i-0746dfd68699f2227 | HOST          | my_cpu_alarm      | resolved |             |
            |                     |               |                   |          | ALARM_FIRE  | 2021-09-01T15:38:45-07:00
            |                     |               |                   |          | ALARM_CLEAR | 2021-09-01T15:38:46-07:00
2           | i-0746dfd68699f2227 | HOST          | my_cpu_alarm      | resolved |             |
            |                     |               |                   |          | ALARM_FIRE  | 2021-09-01T15:38:43-07:00
            |                     |               |                   |          | ALARM_CLEAR | 2021-09-01T15:38:44-07:00
3           | i-0e1d2e73331dd57b1 | HOST          | my_cpu_alarm      | resolved |             |
            |                     |               |                   |          | ALARM_FIRE  | 2021-09-01T15:38:09-07:00
            |                     |               |                   |          | ALARM_CLEAR | 2021-09-01T15:43:18-07:00
Filters excluded 1/3 hosts

Piping the Event query to the count command returns the number of matching Events, which you can compare against a value to further reduce the returned Resources.

Here we're seeing that only one host triggered my_cpu_alarm multiple times.

op>
hosts | filter((events | type="alarm" | name="my_cpu_alarm" | count) >= 2)
GROUP     | RESOURCE_ID | RESOURCE_TYPE | RESOURCE_NAME       | EVENT_TYPE | FIRED | CLEARED | ACTIVE | TOTAL_COUNT
group_all |             |               |                     | ALARMS     | 3     | 3       | 0      | 3
3         |             |               |                     | ALARMS     | 1     | 1       | 0      | 1
          | 3           | HOST          | i-0e1d2e73331dd57b1 |            |       |         |        |
2         |             |               |                     | ALARMS     | 2     | 2       | 0      | 2
          | 2           | HOST          | i-0746dfd68699f2227 |            |       |         |        |
Filters excluded 2/3 hosts
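
As a rough model of the count comparison, the Python sketch below counts matching Events per host and keeps the hosts that reach a minimum. The event list is modeled on the Events listed earlier (host 2 fired my_cpu_alarm twice, host 3 once, host 1 never); the hosts_with_min_events helper is hypothetical and not part of Op.

```python
from collections import Counter
from typing import List, Tuple

# (resource_id, alarm_name) pairs modeled on the Events listed earlier.
events: List[Tuple[int, str]] = [
    (2, "my_cpu_alarm"),
    (2, "my_cpu_alarm"),
    (3, "my_cpu_alarm"),
]
host_ids = [1, 2, 3]


def hosts_with_min_events(ids: List[int],
                          events: List[Tuple[int, str]],
                          name: str,
                          minimum: int) -> List[int]:
    """Return the ids whose number of Events named `name` is >= minimum."""
    counts = Counter(rid for rid, alarm in events if alarm == name)
    # Counter returns 0 for ids with no matching Events.
    return [rid for rid in ids if counts[rid] >= minimum]


print(hosts_with_min_events(host_ids, events, "my_cpu_alarm", 2))  # [2]
print(hosts_with_min_events(host_ids, events, "my_cpu_alarm", 1))  # [2, 3]
```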

You can also use additional Event query options, such as status, to keep only Resources with an Alarm in the triggered state. Here no host has an Alarm in that state, so all three are excluded.

op>
hosts | filter((events | type="alarm" | status="triggered"))
Filters excluded 3/3 hosts

Using Filter Results

Since filter results are Resource collections, you can pipe the results to subsequent Op commands, including Metric queries, Linux commands, and Event queries.

For example, here's a simple Metric query to get the memory usage of the filtered hosts.

op>
host | filter(cpu_usage > 10) | mem_used
 ID | TYPE | NAME                | TIMESTAMPS          | MEM_USED
 1  | HOST | i-08d4799e91dfd5f51 | 2021/07/01 16:34:54 |    28.31
    |      |                     | 2021/07/01 16:34:55 |    28.32
    |      |                     | 2021/07/01 16:34:56 |    28.31
    |      |                     | 2021/07/01 16:34:57 |    28.32
    |      |                     | 2021/07/01 16:34:58 |    28.31
 2  | HOST | i-0aa578e9eb2c111dc | 2021/07/01 16:34:54 |    21.46
    |      |                     | 2021/07/01 16:34:55 |    21.45
    |      |                     | 2021/07/01 16:34:56 |    21.45
    |      |                     | 2021/07/01 16:34:57 |    21.45
    |      |                     | 2021/07/01 16:34:58 |    21.45
Filters excluded 1/3 hosts

Below is a Linux command to list files in the /tmp directory.

op>
host | filter(cpu_usage > 10) | `ls /tmp`
 ID | TYPE | NAME                | STATUS | STDOUT
 2  | HOST | i-0aa578e9eb2c111dc |   0    | foo.txt
Filters excluded 2/3 hosts

Here is an Event query getting all Actions.

op>
host | filter(cpu_usage > 10) | events | type="action"

Chaining Filters

Filters are chainable, allowing you to refine the resulting Resource collection.

For example, below we're filtering hosts with over 15% cpu_usage that also contain foo.txt in the /tmp directory.

op>
host | filter(cpu_usage > 15) | filter(`test -f /tmp/foo.txt`)
 ID | TYPE | NAME
 2  | HOST | i-0aa578e9eb2c111dc
 3  | HOST | i-02da5463c5bb8c476
Filters excluded 1/3 hosts

Op also supports filters in the middle of a Resource query. Such a filter applies only to the Resources at that level; the rest of the Resource query is then evaluated against the filtered set to produce the result collection.

For example, consider the following Op statement.

op>
host | filter(cpu_usage > 15) | .pod | .container | filter(container_memory_usage > 50)
 ID | TYPE      | NAME
 18 | CONTAINER | test6-cust.shoreline-2h2bm.shoreline
 19 | CONTAINER | samplejava.tomcat-7c6d66f845-nv4zq.tomcat
 21 | CONTAINER | monitoring.test6c-node-exporter-gvkhh.prometheus-node-exporter
 23 | CONTAINER | kube-system.fluentd-sp4bj.fluentd
 24 | CONTAINER | kube-system.coredns-cbbcfbf8-ccdsf.coredns
 25 | CONTAINER | kube-system.coredns-cbbcfbf8-6gcp7.coredns
Filters excluded 2/3 hosts, 11/25 pods, 19/25 containers

Op processes this statement in the following steps:

  1. Get all hosts
  2. Retrieve only hosts with more than 15% CPU usage
  3. Get pods of those filtered hosts
  4. Get containers of those pods
  5. Retrieve only containers with more than 50% memory usage
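
The five steps above can be sketched in Python as a series of list comprehensions. This is a simplified model under stated assumptions: each Resource carries a single usage value rather than a series of data points, and the Host, Pod, and Container classes and the sample inventory are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Container:
    name: str
    memory_usage: float


@dataclass
class Pod:
    name: str
    containers: List[Container] = field(default_factory=list)


@dataclass
class Host:
    name: str
    cpu_usage: float
    pods: List[Pod] = field(default_factory=list)


# Hypothetical inventory (step 1: all hosts).
hosts = [
    Host("host-1", 9.0, [Pod("pod-a", [Container("c1", 80.0)])]),
    Host("host-2", 22.0, [
        Pod("pod-b", [Container("c2", 65.0), Container("c3", 10.0)]),
        Pod("pod-c", [Container("c4", 30.0)]),
    ]),
]

busy_hosts = [h for h in hosts if h.cpu_usage > 15]           # step 2
pods = [p for h in busy_hosts for p in h.pods]                # step 3
containers = [c for p in pods for c in p.containers]          # step 4
result = [c.name for c in containers if c.memory_usage > 50]  # step 5

print(result)  # ['c2']
```

Container c1 uses 80% memory but is absent from the result because its host was already excluded in step 2, which is the effect of placing a filter mid-query.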