In this 15-minute tutorial, we'll cover the fundamental building blocks of Shoreline. The focus is Op, Shoreline's domain-specific language for operations. Op allows operators (SREs, DevOps engineers, sysadmins, and SWEs) to:
- Interactively debug their systems
- Create automated remediations to detect and mitigate issues without operator involvement
This tutorial assumes Shoreline is running in a Kubernetes environment on AWS. As of writing, Shoreline also runs directly on AWS VMs, but that setup is not covered here.
Operators care about the resources they are responsible for. To that end, Shoreline discovers the hosts, pods, and containers in your environment. Let's begin with a basic resource query to see all of our hosts.
The pipe operator is inspired by the shell's pipe: you use it to chain together statements in Op, passing data from one to the next. For example, a tag filter placed after a resource query narrows the hosts down to only those whose tags match.
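To make the chaining idea concrete, here is a minimal Python model of a resource query piped into a tag filter. This is a conceptual sketch only, not Op syntax: the `Host` and `Query` classes, the `tag_filter` helper, the `az` tag key, and the host names are all hypothetical and exist purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """Hypothetical stand-in for a discovered host and its tags."""
    name: str
    tags: dict = field(default_factory=dict)


class Query:
    """Holds a list of resources and lets stages be chained with `|`."""

    def __init__(self, items):
        self.items = list(items)

    def __or__(self, stage):
        # Each stage receives the previous stage's output, like a shell pipe.
        return Query(stage(self.items))


def tag_filter(key, value):
    """Build a stage that keeps only resources whose tag `key` equals `value`."""
    def stage(resources):
        return [r for r in resources if r.tags.get(key) == value]
    return stage


# Illustrative inventory; a real system would discover these.
hosts = Query([
    Host("host-a", {"az": "us-west-2a"}),
    Host("host-b", {"az": "us-west-2c"}),
    Host("host-c", {"az": "us-west-2c"}),
])

matching = hosts | tag_filter("az", "us-west-2c")
names = [h.name for h in matching.items]
print(names)
```

Because every stage takes a list and returns a list, stages can be appended in any number without the earlier ones needing to know what comes next.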
Let's combine more statements. Op also discovers resource topology, e.g. which pods are scheduled to which hosts and which containers sit in those pods. Let's fetch all of the pods scheduled to the hosts in zone us-west-2c:
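The topology traversal can be pictured as two lookups applied in sequence: first select hosts by zone, then follow the host-to-pod relationship. The Python sketch below models that idea under assumed data; the `HOST_ZONES` and `POD_PLACEMENT` tables and the function names are hypothetical and do not reflect how Shoreline stores topology.

```python
# Hypothetical inventory: which zone each host is in, and which host runs each pod.
HOST_ZONES = {
    "host-a": "us-west-2a",
    "host-b": "us-west-2c",
}
POD_PLACEMENT = {
    "web-1": "host-a",
    "web-2": "host-b",
    "db-1": "host-b",
}


def hosts_in_zone(zone):
    """Return the names of hosts located in the given zone."""
    return sorted(h for h, z in HOST_ZONES.items() if z == zone)


def pods_on(hosts):
    """Return the names of pods scheduled to any of the given hosts."""
    selected = set(hosts)
    return sorted(p for p, h in POD_PLACEMENT.items() if h in selected)


# Equivalent in spirit to: hosts, filtered by zone, then expanded to their pods.
pods = pods_on(hosts_in_zone("us-west-2c"))
print(pods)
```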
These symbols are accessible to all operators. This prevents the issue of commands being trapped in folks' heads or growing out of date on wikis.
Operators care about which resources they are responsible for and how those resources are performing. To that end, metrics are integrated into Shoreline as a first-class entity. We can combine resource and metric queries using the pipe operator. Let's fetch the CPU usage for all of our hosts:
Like Prometheus, Shoreline polls Kubernetes exporters to ingest metrics, which makes it easy to drop into a Kubernetes environment. Shoreline also supports deploying to virtual machines, and in that environment it installs the same exporters to gather metrics.
Op contains an entire metrics expression language. The full syntax is outside the scope of this tutorial, but we can do some simple manipulations of our CPU metric. Let's grab it for the last 30 seconds:
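Conceptually, taking the last 30 seconds of a metric means keeping only the samples whose timestamps fall inside a trailing window. The Python sketch below illustrates that operation; it is not Op's metric syntax, and the `window` function, the `(timestamp, value)` sample shape, and the one-sample-per-second rate are assumptions made for the example.

```python
def window(samples, now, seconds):
    """Keep samples with a timestamp in the half-open interval (now - seconds, now]."""
    return [(t, v) for t, v in samples if now - seconds < t <= now]


# Assumed data: one CPU sample per second for 100 seconds, as (timestamp, percent).
samples = [(t, 40.0 + (t % 10)) for t in range(100)]

recent = window(samples, now=99, seconds=30)
print(len(recent), recent[0][0], recent[-1][0])
```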
While Shoreline is useful as a resource inventory and observability tool, it really shines as a mitigation tool, i.e. Shoreline lets you take action. To that end, Shoreline supports executing distributed commands, which can run on hosts or on containers. Let's start by creating a resource query that returns our Shoreline agent containers; we'll use these to test actions.
Individual resources, metrics, and actions are useful for interactive debugging, but the power to perform automated remediation comes from combining them into higher-level constructs. The first of these is an alarm. Alarms can fire on metrics and/or Linux commands, which means you can create alarms that take into account fast-moving metrics along with system state.
Let's create the most basic type of alarm: an alarm on a single metric. We'll define the alarm to fire when cpu_usage for a host has exceeded 70% at least 20 times in the last 30 seconds.
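The fire condition is a count over a window rather than a single threshold check. A minimal Python sketch of that logic follows; it is an illustration rather than Op syntax, it assumes one sample per second (so a 30-second window holds 30 values), and the names `count_breaches` and `should_fire` are hypothetical.

```python
def count_breaches(values, threshold):
    """Count how many samples in the window are strictly above the threshold."""
    return sum(1 for v in values if v > threshold)


def should_fire(values, threshold=70.0, min_breaches=20):
    """Fire when the threshold was exceeded at least `min_breaches` times."""
    return count_breaches(values, threshold) >= min_breaches


# Assumed 30-sample windows (one sample per second over 30 seconds).
busy_window = [85.0] * 20 + [50.0] * 10   # 20 breaches
calm_window = [85.0] * 19 + [50.0] * 11   # 19 breaches

print(should_fire(busy_window), should_fire(calm_window))
```

Counting breaches rather than reacting to a single sample keeps a brief CPU spike from triggering the alarm.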
Next, we need to define how to clear the alarm. The fire and clear conditions do not have to be symmetrical. For example, we might want it to be easier to enter the alarm state than to exit it, so that we have some certainty the alarm won't flap up and down.
In this case, though, we'll define a symmetrical alarm that clears on exactly the opposite condition, i.e. we'll clear the alarm when CPU usage has exceeded 70% fewer than 20 times in the last 30 seconds.
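Fire and clear conditions together form a small two-state machine: the fire condition is only consulted while the alarm is clear, and the clear condition only while it is firing. The Python sketch below models this under stated assumptions; the `Alarm` class and `count_over` helper are hypothetical, the windows assume one sample per second, and none of this is Op syntax.

```python
class Alarm:
    """Two-state alarm with independent fire and clear conditions."""

    def __init__(self, fire_when, clear_when):
        self.fire_when = fire_when
        self.clear_when = clear_when
        self.firing = False

    def evaluate(self, values):
        """Update the state from the latest window and return whether it is firing."""
        if not self.firing and self.fire_when(values):
            self.firing = True
        elif self.firing and self.clear_when(values):
            self.firing = False
        return self.firing


def count_over(values, threshold=70.0):
    """Count samples strictly above the threshold."""
    return sum(1 for v in values if v > threshold)


# Symmetrical alarm: the clear condition is the exact negation of the fire condition.
high_cpu = Alarm(
    fire_when=lambda vs: count_over(vs) >= 20,
    clear_when=lambda vs: count_over(vs) < 20,
)

quiet = [50.0] * 30
busy = [90.0] * 25 + [50.0] * 5
states = [high_cpu.evaluate(w) for w in (quiet, busy, busy, quiet)]
print(states)
```

An asymmetrical alarm would only change `clear_when`, for instance requiring fewer than 5 breaches before clearing, which makes the alarm harder to exit than to enter.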
So far we have created an alarm, but nothing happens if the alarm fires. To solve that problem, we need to link an action to the alarm. To do that, we'll define a bot. Bots are if-this-then-that style structures that link alarms to actions. Let's create a bot that runs our top_memory action from before whenever the high_cpu_alarm fires:
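The if-this-then-that relationship can be sketched in Python as an object that watches an alarm condition and invokes an action when the alarm starts firing. This is a conceptual model, not Op syntax. The names `top_memory` and `high_cpu_alarm` come from the tutorial, but their bodies here are hypothetical stand-ins, and running the action once per transition into the firing state is an assumption of this sketch rather than documented Shoreline behavior.

```python
class Bot:
    """Links an alarm condition to an action: if the alarm fires, run the action."""

    def __init__(self, alarm, action):
        self.alarm = alarm
        self.action = action
        self._was_firing = False

    def tick(self, resource, values):
        """Evaluate the alarm for one window; run the action on a new firing."""
        firing = self.alarm(values)
        result = None
        if firing and not self._was_firing:
            # Assumption: act once when the alarm starts firing, not on every tick.
            result = self.action(resource)
        self._was_firing = firing
        return result


def high_cpu_alarm(values, threshold=70.0, min_breaches=20):
    """Stand-in alarm: true when the threshold was exceeded at least 20 times."""
    return sum(1 for v in values if v > threshold) >= min_breaches


def top_memory(resource):
    """Stand-in action; a real action would run a command on the resource."""
    return f"ran top_memory on {resource}"


bot = Bot(high_cpu_alarm, top_memory)
quiet = [50.0] * 30
busy = [90.0] * 30
results = [bot.tick("host-b", w) for w in (quiet, busy, busy, quiet, busy)]
print(results)
```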
We hope you found this a useful introduction to Op and Shoreline. There are many more capabilities and parameters that were not covered in this tutorial; you can find them throughout the rest of the documentation or by reaching out via Slack or email. We hope this gives you enough starting knowledge to begin automating operational tasks!