Alarms

Alarms are fully customizable Metric or status checks that automatically trigger remediation Actions.

An Alarm regularly evaluates one or more Metric thresholds or custom Resource queries. When the thresholds or shell commands you define are met, the Alarm is raised, and any connected Bot triggers its remedial Actions.

Alarm Properties

Each Alarm can define many properties to determine its behavior. Creating an Alarm requires at least a name and a fire_query, as the creation syntax later on this page shows.

Check out Alarm Properties for details on all available properties and how to use them.

Create an Alarm

Alarms are created using either the CLI or the Configuration UI.

Below we'll walk through how to create an Alarm within the CLI that meets the following criteria:

  • Applies to all host Resources within the us-east-1a Availability Zone
  • Fires whenever CPU usage exceeds 65% for at least 15 seconds during the previous 30-second period
  • Resolves whenever CPU usage falls below 55% for at least 20 seconds during the previous 30-second period
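The "at least N seconds during the previous 30-second period" wording can be made concrete with a small Python sketch. It is only an illustration of the arithmetic, not how the platform evaluates Alarms internally: it assumes one CPU sample per second, and the function name and sample data are hypothetical.

```python
def breaches_in_window(samples, threshold, window=30):
    """Count how many of the most recent `window` samples exceed `threshold`.

    Assumes one sample per second, so the count equals "seconds above
    the threshold" within the window.
    """
    recent = list(samples)[-window:]
    return sum(1 for value in recent if value > threshold)


# Hypothetical readings: 14 seconds at 50% CPU, then 16 seconds at 70% CPU.
cpu = [50] * 14 + [70] * 16

# Fire condition from the criteria: above 65% for at least 15 of the last 30 seconds.
should_fire = breaches_in_window(cpu, threshold=65) >= 15
print(should_fire)  # True (16 of the last 30 samples exceed 65)
```

The resolve condition follows the same pattern with the comparison reversed (below 55%) and a required count of 20.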

Creating a basic Alarm with Op uses the following syntax: alarm <name> = <fire_query>

  • name - The name of the Alarm. This value must contain only alphanumeric or underscore characters, and must be globally unique.
  • fire_query - The Op statement that triggers the Alarm.
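If you generate Alarm names from scripts, a quick local check of the naming rule can catch mistakes before you reach the CLI. The Python sketch below is a hypothetical helper, not part of Op. It assumes "alphanumeric" means ASCII letters and digits, and it cannot check global uniqueness, which only the platform knows.

```python
import re

# Assumption: only ASCII letters, digits, and underscores are allowed.
_NAME_PATTERN = re.compile(r"[A-Za-z0-9_]+")


def is_valid_alarm_name(name: str) -> bool:
    """Return True if `name` contains only letters, digits, and underscores."""
    return _NAME_PATTERN.fullmatch(name) is not None


print(is_valid_alarm_name("high_cpu_alarm"))  # True
print(is_valid_alarm_name("high-cpu-alarm"))  # False (hyphens are not allowed)
```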
  1. Start by defining an Alarm named high_cpu_alarm and set the fire_query property to meet the requirements outlined above.

    op>
    alarm high_cpu_alarm = (cpu_usage > 65 | sum(30)) >= 15
    Created alarm 'high_cpu_alarm'.
    
  2. Define the clear_query to tell the Alarm when to resolve itself

    op>
    high_cpu_alarm.clear_query = (cpu_usage < 55 | sum(30)) >= 20
    Updated alarm 'high_cpu_alarm'.
    
  3. Set the Resources the Alarm applies to with the resource_query property

    op>
    high_cpu_alarm.resource_query = host | az=["us-east-1a"]
    Updated alarm 'high_cpu_alarm'.
    
  4. Lastly, enable the Alarm with the enable command

    op>
    enable high_cpu_alarm
    Updated alarm 'high_cpu_alarm'.
    

That's it! Your Alarm is now active and ready to be connected to an Action via a Bot.
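Because the fire threshold (65%) and the clear threshold (55%) differ, the Alarm stays raised while CPU usage sits between the two values. The Python sketch below simulates that behavior to show why the gap matters. It is a simplified model under stated assumptions (one sample per second, conditions re-evaluated every second), not the platform's actual evaluation logic, and the function name and sample data are hypothetical.

```python
def evaluate_alarm(samples, fire_at=65, fire_secs=15,
                   clear_at=55, clear_secs=20, window=30):
    """Return a list of (second, event) transitions for a stream of CPU samples."""
    firing = False
    transitions = []
    for t in range(1, len(samples) + 1):
        # The most recent `window` samples seen so far.
        recent = samples[max(0, t - window):t]
        if not firing:
            # Fire: enough seconds above the fire threshold.
            if sum(v > fire_at for v in recent) >= fire_secs:
                firing = True
                transitions.append((t, "fired"))
        else:
            # Clear: enough seconds below the clear threshold.
            if sum(v < clear_at for v in recent) >= clear_secs:
                firing = False
                transitions.append((t, "cleared"))
    return transitions


# 10 s idle, 20 s at 80%, 30 s at 60% (between thresholds), 30 s at 50%.
cpu = [40] * 10 + [80] * 20 + [60] * 30 + [50] * 30
print(evaluate_alarm(cpu))  # [(25, 'fired'), (80, 'cleared')]
```

In this run the Alarm fires after 15 seconds at 80%, remains raised through the whole 60% period, and clears only once usage has been below 55% for 20 seconds. This gap keeps the Alarm from flapping when usage hovers near a single threshold.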

Use the list command to verify that the Alarm is defined and enabled:

op>
list alarms | name="high_cpu_alarm"
TYPE  | NAME           | ENABLED | RESOURCE_QUERY           | FIRE_QUERY
ALARM | high_cpu_alarm | 1       | host | az=["us-east-1a"] | ( cpu_usage > 65 | sum ( 30 ) ) >= 15

Edit an Alarm

You can edit an existing Alarm from either the CLI or the UI.

To edit an existing Alarm in the CLI, set a new value for the appropriate Alarm property using dot notation.

For example, to change the Alarm description:

op>
high_cpu_alarm.description = "<string_description>"

See Alarm Properties for a list of editable properties.

Delete an Alarm

You can delete an existing Alarm from either the CLI or the UI.

To delete an Alarm via the CLI, use the delete Op command:

op>
delete high_cpu_alarm
Deleted alarm 'high_cpu_alarm'.

Examples

You can create many types of Alarms.