Metric Scraper

A guide to configuring the Shoreline Metric Scraper.

Prometheus powers the Shoreline Metric Scraper, which allows you to generate a wide variety of multi-dimensional time series metric data. You'll configure the Metric Scraper as part of your Shoreline Kubernetes (k8s) installation process.

Basic Configuration

Metric Scraper configuration is defined as a ConfigMap within your Shoreline Kubernetes installation.

apiVersion: v1
data:
  scraper.yml: |
    scrape_configs:
      - job_name: '<job_name>'
kind: ConfigMap
metadata:
  name: scraper-config
  namespace: <company_name>

The scrape_configs section is an extension of Prometheus' scrape_config and provides the same capabilities. Use it to define jobs that use a set of targets and associated parameters which describe how to scrape those targets.
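
For illustration, a filled-in scrape_configs entry might look like the following minimal sketch. It assumes the standard Prometheus fields apply; the job name, scrape interval, and target address are placeholders:

scrape_configs:
  - job_name: 'example-exporter'    # placeholder job name
    scrape_interval: 15s            # hypothetical scrape interval
    metrics_path: /metrics          # default Prometheus metrics path
    static_configs:
      - targets: ['localhost:9100'] # hypothetical exporter address

Service discovery, resource mapping, and relabeling options can then be added per job, as described in the sections below.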

Service Discovery

The Shoreline Metric Scraper supports two types of service discovery configurations: kubernetes_sd_config and static_config.

kubernetes_sd_config

The kubernetes_sd_configs section allows you to retrieve scrape targets from the Kubernetes REST API.

For example, the following kubernetes_sd_config discovers scrape targets from service endpoints within the monitoring namespace.

kubernetes_sd_configs:
  - role: endpoints
    namespaces:
      names:
        - monitoring

static_config

The static_configs section allows you to specify a static list of scrape targets and to set labels for them.

For example, the following static_config defines a target at localhost:8080 and sets the service label to web-app:

static_configs:
  - targets: ['localhost:8080']
    labels:
      service: 'web-app'

resource_mapping_config

The optional resource_mapping_config section allows you to add extra configuration that maps Prometheus metrics to Shoreline Resources.

resource_mapping_config:
  # per scraper mapping or per metrics mapping
  mode: <exporter(default)|metric>
  # the list of possible Resource types
  resource_types:
    - <host|pod|container>
  # default resource type, when the __shoreline_rq_type is not specified in relabel_configs. Usually it's used in exporter mode.
  default_resource_type: <host(default)|pod|container>

For example, a resource_mapping_config for a job definition based on node exporter might look like the following:

resource_mapping_config:
  mode: exporter
  resource_types:
    - host # default resource type, default value of __shoreline_rq_type

A job definition that uses metrics from cAdvisor might look like:

resource_mapping_config:
  mode: metric
  resource_types:
    - pod
    - container
  default_resource_type: container # default resource type, default value of __shoreline_rq_type

  • mode: metric allows mapped metrics to come from different Resources.
  • resource_types: [pod, container] allows metrics to specifically come from pod and container Resources.
  • default_resource_type defines the default Resource type. This value is mapped to the __shoreline_rq_type meta label.

relabel_config

A relabel_config allows you to rewrite target labels before they are scraped. Each relabel_config is defined in the relabel_configs section, and the configs are applied in order.

For example, the following relabel_config finds any k8s endpoints with a name that matches node-exporter and passes those to the rest of the configuration:

relabel_configs:
  - source_labels: [__meta_kubernetes_endpoints_name]
    regex: node-exporter
    action: keep
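
Because the rules run in order, an earlier rule can produce the label that a later rule matches on. The following is a minimal sketch, not taken from a real installation; it assumes a kubernetes_sd_configs role of pod and pods that carry a hypothetical app Kubernetes label:

relabel_configs:
  # 1. Copy every Kubernetes pod label (e.g. the hypothetical app label) into a target label of the same name.
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(.+)
  # 2. Keep only the targets whose freshly mapped app label equals node-exporter.
  - source_labels: [app]
    regex: node-exporter
    action: keep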

For more info see the Prometheus: relabel_config documentation.

metric_relabel_config

A metric_relabel_config uses the same syntax and rules as relabel_config, but these configurations are applied at the end of the ingestion process, making it an ideal place to select only the metrics you need.

For example, consider the following metric_relabel_config:

metric_relabel_configs:
  - source_labels: [__name__] # metric filter
    action: keep
    regex: (node_cpu_seconds_total|node_memory_MemFree_bytes|node_memory_Cached_bytes)

  • The __name__ meta label contains each metric's name, so matching on it filters metrics by name.
  • The regex contains a list of metric names to match.
  • action: keep drops all metrics that do not match any of the regex values.

Another example is a series of metric_relabel_configs, used in a cAdvisor job definition.

metric_relabel_configs:
  - source_labels: [__name__] # metric filter
    action: keep
    regex: (container_cpu_usage_seconds_total|container_memory_usage_bytes|container_spec_cpu_quota)
  - source_labels: [container]
    target_label: __shoreline_rq_type
    regex: (^$|^POD$)
    action: replace
    replacement: pod
  - source_labels: [namespace]
    target_label: __shoreline_rq_tags_namespace
  - source_labels: [pod]
    target_label: __shoreline_rq_tags_pod_name
  - source_labels: [container]
    regex: (^([^P]|P[^O]|PO[^D])+$) # not empty and not 'POD'
    target_label: __shoreline_rq_tags_container_name

Let's quickly review what each of the above metric_relabel_configs does:

  1. Removes all metrics by name except for a select few targeting container Resources.
  2. If the container label is an empty string, or equal to POD, sets the __shoreline_rq_type meta label to pod.
  3. Sets the __shoreline_rq_tags_namespace meta label to the value of the namespace label.
  4. Sets the __shoreline_rq_tags_pod_name meta label to the value of the pod label.
  5. Finally, if the container label is not empty and is not POD, sets the __shoreline_rq_tags_container_name meta label to the value of the container label.

The above metric_relabel_configs section allows you to look up Shoreline Resources based on metric names. When using types and tags to make Resource queries, there should be exactly one matching Resource returned.

Define a Job

A job definition typically consists of a service discovery definition and one or more relabel_configs and metric_relabel_configs.

Node Exporter

- job_name: 'node-exporter'
  kubernetes_sd_configs:
    - role: endpoints
      namespaces:
        names:
          - monitoring
  relabel_configs:
    - source_labels: [__meta_kubernetes_endpoints_name]
      regex: node-exporter
      action: keep
  metric_relabel_configs:
    - source_labels: [__name__] # metric filter
      action: keep
      regex: (node_cpu_seconds_total|node_memory_MemFree_bytes|node_memory_Cached_bytes|node_memory_MemTotal_bytes|node_memory_Buffers_bytes|node_filesystem_avail_bytes|node_network_transmit_drop_total|node_network_transmit_packets_total|node_network_transmit_bytes_total|node_network_receive_packets_total|node_network_receive_bytes_total|node_load5|node_memory_MemAvailable_bytes|node_disk_reads_completed_total|node_disk_writes_completed_total|node_disk_read_bytes_total|node_disk_written_bytes_total|node_disk_read_time_seconds_total|node_disk_write_time_seconds_total|node_vmstat_pgfault|node_vmstat_pgmajfault|node_vmstat_oom_kill|node_filesystem_free_bytes|node_filesystem_size_bytes|node_network_receive_errs_total|node_network_transmit_errs_total|node_network_receive_drop_total)

  • This job looks for an exposed node exporter k8s endpoint, using the monitoring namespace and a name of node-exporter.
  • It filters out all metrics that do not match any of the existing names in the metric_relabel_config regex value.

cAdvisor

- job_name: 'cadvisor'
  scheme: https
  metrics_path: /metrics/cadvisor
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
    - role: node
  resource_mapping_config:
    mode: metric
    resource_types:
      - pod
      - container
    default_resource_type: container # default resource type, default value of __shoreline_rq_type
  relabel_configs:
    - action: labelmap
      regex: __meta_kubernetes_node_label_(.+)
  metric_relabel_configs:
    - source_labels: [__name__] # metric filter
      action: keep
      regex: (container_cpu_usage_seconds_total|container_memory_usage_bytes|container_spec_cpu_quota|container_fs_limit_bytes|container_spec_memory_limit_bytes|container_network_receive_packets_dropped_total|container_network_transmit_packets_dropped_total|container_network_receive_bytes_total|container_network_transmit_bytes_total|container_fs_read_seconds_total|container_fs_reads_bytes_total|container_fs_reads_total|container_fs_write_seconds_total|container_fs_writes_merged_total|container_fs_writes_total|container_memory_working_set_bytes|container_fs_writes_bytes_total)
    - source_labels: [container]
      target_label: __shoreline_rq_type
      regex: (^$|^POD$)
      action: replace
      replacement: pod
    - source_labels: [namespace]
      target_label: __shoreline_rq_tags_namespace
    - source_labels: [pod]
      target_label: __shoreline_rq_tags_pod_name
    - source_labels: [container]
      regex: (^([^P]|P[^O]|PO[^D])+$) # not empty and not 'POD'
      target_label: __shoreline_rq_tags_container_name
    - regex: (image|id|name|namespace|pod)
      action: labeldrop

  • This cAdvisor job uses a non-default endpoint, so the scheme, metrics_path, tls_config, and bearer_token_file attributes are required to find, authenticate, and scrape data from it.
  • The kubernetes_sd_configs role is set to node to discover the cAdvisor address.

The remaining configuration is detailed elsewhere in this article.

Troubleshooting

Invalid config format

The Agent enters a CrashLoop if the configuration syntax or a value is invalid. Please check the logs to find the appropriate error message and debug the issue. If you're still unable to determine the root cause, please contact your Shoreline representative for assistance.

Verify metric data

When the Agent is running but no metric data is available, the first step is to check if you can query a specific metric using the CLI.

For example, assume your Metric Scraper configuration ingests the elasticsearch_os_cpu_percent metric and maps the pod Resource type. The following Op statement allows you to check if you're collecting metric data:

op>
pods | metric_query(metric_names="elasticsearch_os_cpu_percent")

When no Metric Query data is returned from an expected metric, that indicates there may be a scraper configuration issue.

Debugging from scraper logs

Prerequisites

Start by setting the Elixir logger level to info. There are two ways to do it.

  • Set the environment variable for the Shoreline Agent DaemonSet, which persistently changes the log level of all services on the Agent.

    spec:
      template:
        spec:
          containers:
            - env:
                - name: ELIXIR_LOGGER_LEVEL
                  value: info
    
  • Alternatively, change the log level of the scraper at runtime. To do so, make an HTTP POST request to http://localhost:5252/debug/loglevel/info.

    For example, using cURL:

    $ curl -X POST http://localhost:5252/debug/loglevel/info
    

Check the scraper logs

The basic syntax to get logs via kubectl is:

kubectl --kubeconfig <kubeconfig_path> -n <namespace> logs -f <pod_name>

The -f flag streams the result to stdout, which allows you to use tools like grep to filter the results.

For example, to parse the logs for references to the cadvisor scraper job you might use something like this:

$ kubectl --kubeconfig .kube/acme-cust-config -n acme-cust logs -f acme-foo | grep "cadvisor"

time=2021-08-04T15:51:00.260 level=info msg=total dropped targets by labels: 0 node=10.78.132.59 scrapePool=cadvisor service=scraper_service
time=2021-08-04T15:51:00.260 level=info msg=total active targets: 1 node=10.78.132.59 scrapePool=cadvisor service=scraper_service
time=2021-08-04T15:51:01.106 level=info msg=metric datas: 439 node=10.78.132.59 scraperId=cadvisor-https://10.78.140.79:10250/metrics/cadvisor-5eecec86cb2cad3 service=scraper_service
time=2021-08-04T15:51:02.099 level=info msg=metric datas: 439 node=10.78.132.59 scraperId=cadvisor-https://10.78.140.79:10250/metrics/cadvisor-5eecec86cb2cad3 service=scraper_service
time=2021-08-04T15:51:03.125 level=info msg=metric datas: 439 node=10.78.132.59 scraperId=cadvisor-https://10.78.140.79:10250/metrics/cadvisor-5eecec86cb2cad3 service=scraper_service
time=2021-08-04T15:51:04.102 level=info msg=metric datas: 439 node=10.78.132.59 scraperId=cadvisor-https://10.78.140.79:10250/metrics/cadvisor-5eecec86cb2cad3 service=scraper_service

In this case the cadvisor scraper job is working and generating metric data. However, there are a number of key indicators to look for if there's a potential problem:

  1. If both the total dropped targets by labels and the total active targets are 0, it means no target was discovered based on the role you set in the config. For kubernetes_sd_config, the namespace and k8s label/field selectors are applied; please check if they are set correctly (see the sketch after this list).
  2. If the total active targets is 0, it means there is no target that matches the conditions after target relabeling (i.e., after applying the relabel_config rules). Please check if the relabel_config is set correctly.
  3. If there's a target not found issue, please make sure the exporter is running on the same host as the Agent where you are checking the logs.
  4. If the total active targets is greater than or equal to 1, but metric datas is 0:
    • It could be that the resource mapping failed. In exporter mode, you should be able to find log entries with Exporter resource lookup not found. If you set resource_mapping_config.mode to metric, please remember to set the log level to debug; you should then find log entries with lookupResource not found resource id. Please check if you set the __shoreline_rq related tags correctly. You can verify it by performing the same Resource query within the CLI.
    • Alternatively, it could be that the metrics are all filtered out by your metric_relabel_config.
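
For the first indicator, it is worth double-checking how narrowly service discovery is scoped. As a minimal sketch, assuming the scraper supports the standard Prometheus selectors block, a kubernetes_sd_configs entry that restricts discovery by namespace plus label and field selectors might look like this (the monitoring namespace and the selector values are placeholders for illustration):

kubernetes_sd_configs:
  - role: endpoints
    namespaces:
      names:
        - monitoring                          # only discover targets in this namespace
    selectors:
      - role: endpoints
        label: "app=node-exporter"            # placeholder k8s label selector; must match your exporter's labels
        field: "metadata.name=node-exporter"  # placeholder optional k8s field selector

If either selector matches nothing in the namespace, no targets are discovered and both counters stay at 0.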

Examples

Below is an example log of a Resource not found at the metric level, from a job named elasticsearch.

$ kubectl --kubeconfig .kube/acme-cust-config -n acme-cust logs -f acme-foo | grep "elasticsearch"

time=2021-08-16T04:49:18.085 level=debug msg=lookupResource not found resource id, lset: {__name__="elasticsearch_indices_flush_time_seconds", cluster="op-packs", es_client_node="true", es_data_node="true", es_ingest_node="true", es_master_node="true", host="10.78.131.129", name="op-packs-es-default-2"} node=10.78.151.110 scraperId=elasticsearch-http://10.78.148.110:9108/metrics-b24efa0fb57d8b0d service=scraper_service

Here's an example of a Resource not found at the exporter level, from a job named argo-metrics-test.

$ kubectl --kubeconfig .kube/acme-cust-config -n acme-cust logs -f acme-foo | grep "argo-metrics-test"

time=2021-08-04T15:51:00.260 level=info msg=total dropped targets by labels: 0 node=10.78.132.59 scrapePool=argo-metrics-test service=scraper_service
time=2021-08-04T15:51:00.260 level=info msg=total active targets: 1 node=10.78.132.59 scrapePool=argo-metrics-test service=scraper_service
time=2021-08-04T15:51:30.000 level=info msg=Exporter resource lookup not found, retry in next round node=10.78.132.59 scraperId=argo-metrics-test-http://172.20.232.37:9090/metrics-1cf98a3b052a73f3 service=scraper_service

Verify the pod and Agent share a node

In some cases it may be useful to execute a Resource query via the CLI to verify that the pod you're not getting metrics from actually shares the same Kubernetes node as the Shoreline Agent.

To illustrate, consider the following cluster of pods:

$ kubectl --kubeconfig .kube/test12-cust-config -n shoreline2 get pods -o wide

NAME                                                              IP              NODE
elastic-exporter-v1-0-0-prometheus-elasticsearch-exporter-4l69z   10.78.148.110   ip-10-78-134-108.us-west-2.compute.internal
acme-bar                                                          10.78.141.148   ip-10-78-147-219.us-west-2.compute.internal
acme-foo                                                          10.78.141.98    ip-10-78-134-108.us-west-2.compute.internal
shoreline-jwbph                                                   10.78.151.110   ip-10-78-134-108.us-west-2.compute.internal

In this case, the Shoreline Agent pod (shoreline-jwbph) is on the same node as the exporter pod (elastic-exporter) and the acme-foo pod. Thus, to get metrics from acme-foo your metric_relabel_config might look something like this:

metric_relabel_configs:
  - target_label: __shoreline_rq_type
    action: replace
    replacement: pod
  - replacement: shoreline2
    action: replace
    target_label: __shoreline_rq_tags_namespace
  - replacement: acme-foo
    action: replace
    target_label: __shoreline_rq_tags_pod_name

In this case, the acme-foo pod is correctly configured, so metrics from the elastic-exporter should be coming through.

If you know the pod name and Agent Kubernetes node name, you can use a simple Resource query to quickly determine if a pod shares the same host as the Agent:

op>
hosts | k8s_node_name="ip-10-78-134-108.us-west-2.compute.internal" | .pods | namespace="acme" | pod_name="acme-foo"
ID | TYPE | NAME          | REGION    | AZ
11 | POD  | acme.acme-foo | us-west-2 | us-west-2b

This Op statement is broken into the following sub-queries:

  1. Get all host Resources
  2. Filter hosts to those with a k8s node name of ip-10-78-134-108.us-west-2.compute.internal
  3. Get all pods within the target host
  4. Find the pod with the acme namespace and acme-foo name

Conversely, the acme-bar pod does not share the same Kubernetes node (i.e., host) as the Shoreline Agent, which means metrics for acme-bar aren't ingested.

Again, we can use the same Resource query syntax to verify this disconnect:

op>
hosts | k8s_node_name="ip-10-78-134-108.us-west-2.compute.internal" | .pods | namespace="acme" | pod_name="acme-bar"

Since acme-bar is not within the ip-10-78-134-108.us-west-2.compute.internal host, the query above returns no result.