
Monitor instances

Export metrics, collect them, visualize them.

# Monitoring

Monitoring your instances lets you keep track of server load and health over time. Even looking at the stats once a day can make a huge difference, as it lets you catch problems before they turn into catastrophic disasters.
I have been monitoring my servers this way for years, and there have been many cases where I was grateful for having set it all up. In this small article I have included two guides for setting these services up: first with NixOS, and then a much shorter one with docker-compose, since the main focus of this article is NixOS.

[Diagram made with Excalidraw]

# Prometheus

Prometheus is an open-source monitoring system. It helps you track, collect, and analyze metrics from various applications and infrastructure components. It collects metrics from other pieces of software called exporters, which serve an HTTP endpoint that returns data in the Prometheus exposition format.
Here is an example from node-exporter:

# curl http://localhost:9100/metrics

# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 2.54196405e+06
node_cpu_seconds_total{cpu="0",mode="iowait"} 4213.44
node_cpu_seconds_total{cpu="0",mode="irq"} 0
node_cpu_seconds_total{cpu="0",mode="nice"} 0.06
node_cpu_seconds_total{cpu="0",mode="softirq"} 743.4
...

# Grafana

Grafana is an open-source data visualization and monitoring platform. It has hundreds of built-in features and can query data sources like Prometheus, InfluxDB, MySQL and so on…

# NixOS

Nix makes it trivial to set up these services, as there are already predefined options for them in nixpkgs. Below are example configuration files that you can just copy and paste.

I have a guide on remote deployment for NixOS; below is an example folder structure you can use to deploy the services, followed by a sketch of how the modules get imported.

• services/
  • some-service.nix
  • monitoring/
    • prometheus.nix
    • grafana.nix
    • exporters/
      • node.nix
      • smartctl.nix
• configuration.nix
• flake.nix
• flake.lock
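
These modules then just need to be imported from configuration.nix. A minimal sketch, assuming configuration.nix sits next to the services/ folder; adjust the paths to your own layout:

    # configuration.nix (excerpt)
    { ... }: {
      imports = [
        ./services/monitoring/exporters/node.nix
        ./services/monitoring/exporters/smartctl.nix
        ./services/monitoring/prometheus.nix
        ./services/monitoring/grafana.nix
        ./services/nginx.nix # only needed for the reverse proxy shown later
      ];
    }
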
# Exporters

First up is node-exporter. It exports all kinds of system metrics, ranging from CPU usage and load average to systemd service counts.

# Node-exporter

    
    # /services/monitoring/exporters/node.nix
    { pkgs, ... }: {
      services.prometheus.exporters.node = {
        enable = true;
        #port = 9001; #default is 9100
        enabledCollectors = [ "systemd" ];
      };
    }
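
After a rebuild, you can quickly check that the exporter answers. This assumes the default port and the prometheus-node-exporter unit name that nixpkgs generates for this exporter:

    systemctl status prometheus-node-exporter.service
    curl -s http://localhost:9100/metrics | grep node_load1
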
    

# Smartctl

Smartctl is a tool from the smartmontools package, a collection of monitoring utilities for hard drives and SSDs.
This exporter lets you keep an eye on the health of your drive(s), and smartd will also send you a wall notification if one of your drives develops bad sectors, which usually means it is dying.

    
    # /services/monitoring/exporters/smartctl.nix
    { pkgs, ... }: {
      # exporter
      services.prometheus.exporters.smartctl = {
        enable = true;
        devices = [ "/dev/sda" ];
      };
      # for wall notifications
      services.smartd = {
        enable = true;
        notifications.wall.enable = true;
        devices = [
          {
            device = "/dev/sda";
          }
        ];
      };
    
    }
    

If you happen to have other drives, you can use lsblk to check their paths:

    
    nix-shell -p util-linux --command lsblk
    

For example, here are my PC's drives:

    
    NAME                              MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINTS
    sda                                 8:0    1     0B  0 disk
    nvme1n1                           259:0    0 476,9G  0 disk
    ├─nvme1n1p1                       259:1    0   512M  0 part  /boot
    ├─nvme1n1p2                       259:2    0 467,6G  0 part
    │ └─luks-bbb8e429-bee1-4b5e-8ce8-c54f5f4f29a2
    │                                 254:0    0 467,6G  0 crypt /nix/store
    │                                                            /
    └─nvme1n1p3                       259:3    0   8,8G  0 part
      └─luks-f7e86dde-55a5-4306-a7c2-cf2d93c9ee0b
                                      254:1    0   8,8G  0 crypt [SWAP]
    nvme0n1                           259:4    0 931,5G  0 disk  /mnt/data
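
If a machine has several drives you want to watch, both lists simply grow. A small sketch using the NVMe device names from the output above; adjust these to your own drives:

    # /services/monitoring/exporters/smartctl.nix (excerpt)
    {
      services.prometheus.exporters.smartctl.devices = [ "/dev/nvme0n1" "/dev/nvme1n1" ];
      services.smartd.devices = [
        { device = "/dev/nvme0n1"; }
        { device = "/dev/nvme1n1"; }
      ];
    }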
    

# Prometheus

Now that we have set up these two exporters, we need to collect their metrics somehow.
Here is a config file for Prometheus with the scrape configs already written:

    
    # /services/monitoring/prometheus.nix
    {pkgs, config, ... }:{
    
      services.prometheus = {
        enable = true;
    
        scrapeConfigs = [
          {
            job_name = "node";
            scrape_interval = "5s";
            static_configs = [
              {
                targets = [ "localhost:${toString config.services.prometheus.exporters.node.port}" ];
                labels = { alias = "node.server1.local"; };
              }
            ];
          }
          {
            job_name = "smartctl";
            scrape_interval = "5s";
            static_configs = [
              {
                targets = [ "localhost:${toString config.services.prometheus.exporters.smartctl.port}" ];
                labels = { alias = "smartctl.server1.local"; };
              }
            ];
          }
        ];
      };
    }
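
Once deployed, Prometheus listens on port 9090 by default, and its Status –> Targets page should show both exporters as up. You can also ask its HTTP API, for example:

    curl -s http://localhost:9090/api/v1/targets | grep -o '"health":"[a-z]*"'
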
    

I recommend raising that 5s scrape interval if you are short on storage, because as you can imagine it generates a lot of data: on average roughly 16 kB per scrape for node-exporter. A day has 86400 seconds; divided by 5 that is 17280 scrapes a day.
17280 × 16 = 276480 kB, so roughly 270 MB a day, and every additional server multiplies that.
30 days of scraping is therefore about 8 GB per server. Keep in mind that by default Prometheus only keeps data for 15 days, and the retention period is configurable.
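
If you want to tune this, here is a sketch assuming the globalConfig and retentionTime options of the nixpkgs Prometheus module; drop the per-job scrape_interval lines above so the global value applies:

    # /services/monitoring/prometheus.nix (excerpt)
    {
      services.prometheus = {
        # scrape every 30s instead of 5s for jobs that do not set their own interval
        globalConfig.scrape_interval = "30s";
        # keep one week of data instead of the default "15d"
        retentionTime = "7d";
      };
    }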

# Grafana

Now let's get a sexy dashboard like the one below. First we have to set up Grafana.

[Screenshot: Node exporter full dashboard (id 1860)]

    
    # /services/monitoring/grafana.nix
    { pkgs, config, ... }:
    let
      grafanaPort = 3000;
    in
    {
      services.grafana = {
        enable = true;
        settings.server = {
          http_port = grafanaPort;
          http_addr = "0.0.0.0";
        };
        provision = {
          enable = true;
          datasources.settings.datasources = [
            {
              name = "prometheus";
              type = "prometheus";
              url = "http://127.0.0.1:${toString config.services.prometheus.port}";
              isDefault = true;
            }
          ];
        };
      };
    
      networking.firewall = {
        allowedTCPPorts = [ grafanaPort ];
        allowedUDPPorts = [ grafanaPort ];
      };
    }
    

If you want to access it over the internet through the nginx reverse proxy instead, change the following:

• set http_addr = "127.0.0.1"
• remove the firewall allowedTCPPorts/allowedUDPPorts entries for Grafana

This ensures traffic will only flow through the nginx reverse proxy.

Remember to set networking.domain = "example.com" to your own domain.

    
    # /services/nginx.nix
    { pkgs, config, ... }:
    let
      url = "http://127.0.0.1:${toString config.services.grafana.settings.server.http_port}";
    in {
      services.nginx = {
        enable = true;
    
        virtualHosts = {
          "grafana.${config.networking.domain}" = {
            # Auto cert by let's encrypt
            forceSSL = true;
            enableACME = true;
    
            locations."/" = {
              proxyPass = url;
              extraConfig = "proxy_set_header Host $host;";
            };
    
            locations."/api" = {
              extraConfig = ''
                proxy_http_version 1.1;
                proxy_set_header Upgrade $http_upgrade;
                proxy_set_header Connection $connection_upgrade;
                proxy_set_header Host $host;
              '';
              proxyPass = url;
            };
          };
        };
      };
    
      # enable 80 and 443 ports for nginx
      networking.firewall = {
        enable = true;
        allowedTCPPorts = [
          443
          80
        ];
        allowedUDPPorts = [
          443
          80
        ];
      };
    }
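
For enableACME to work, NixOS also expects you to accept the Let's Encrypt terms of service and provide a contact e-mail somewhere in your configuration. A minimal sketch with placeholder values:

    # e.g. in configuration.nix
    {
      networking.domain = "example.com";
      security.acme = {
        acceptTerms = true;
        defaults.email = "you@example.com"; # Let's Encrypt contact address
      };
    }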
    

# Log in

The default user is admin and the password is admin. Grafana will ask you to change it upon logging in!

# Add the dashboards

For node-exporter you can go to Dashboards –> New –> Import –> paste in 1860.
Now you can see all the metrics of all your server(s).

# Docker-compose

• docker-compose.yml
• prometheus.yml

# Compose project

I did not include a reverse proxy or smartctl, as I forgot how to actually do it outside of Nix; that's how long I've been using it :/

    
    # docker-compose.yml
    version: "3.8"
    
    networks:
      monitoring:
        driver: bridge
    
    volumes:
      prometheus_data: {}
    
    services:
      node-exporter:
        image: prom/node-exporter:latest
        container_name: node-exporter
        restart: unless-stopped
        hostname: node-exporter
        volumes:
          - /proc:/host/proc:ro
          - /sys:/host/sys:ro
          - /:/rootfs:ro
        command:
          - "--path.procfs=/host/proc"
          - "--path.rootfs=/rootfs"
          - "--path.sysfs=/host/sys"
          - "--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)"
        networks:
          - monitoring
    
      prometheus:
        image: prom/prometheus:latest
        container_name: prometheus
        restart: unless-stopped
        hostname: prometheus
        volumes:
          - ./prometheus.yml:/etc/prometheus/prometheus.yml
          - prometheus_data:/prometheus
        command:
          - "--config.file=/etc/prometheus/prometheus.yml"
          - "--storage.tsdb.path=/prometheus"
          - "--web.console.libraries=/etc/prometheus/console_libraries"
          - "--web.console.templates=/etc/prometheus/consoles"
          - "--web.enable-lifecycle"
        networks:
          - monitoring
    
      grafana:
        image: grafana/grafana:latest
        container_name: grafana
        networks:
          - monitoring
        restart: unless-stopped
        ports:
          - '3000:3000'
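
Note that, as written, Grafana's dashboards and users live only inside the container. If you want them to survive the container being re-created, you could add a named volume for /var/lib/grafana, roughly like this (an addition on top of the file above):

    # extra bits for docker-compose.yml
    volumes:
      prometheus_data: {}
      grafana_data: {}

    services:
      grafana:
        volumes:
          - grafana_data:/var/lib/grafana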
    
    
    # ./prometheus.yml
    global:
      scrape_interval: 5s
    
    scrape_configs:
      - job_name: "node"
        static_configs:
          - targets: ["node-exporter:9100"]
    
    
    docker compose up -d
    

# Set up Prometheus as a data source inside Grafana

Head to Connections –> Data sources –> Add new data source –> Prometheus.
Type in http://prometheus:9090 as the URL and click Save & test at the bottom.

Now you can add the dashboards, as explained in the section above.

Photo by Chris Yang on Unsplash