heartbeat/docs/NAGIOS_INTEGRATION.md
andreas a534c06b26 feat: nagios operator for direct exit-code severity mapping
Add ComparisonOperator.NAGIOS ("nagios") that maps Nagios exit codes
directly to alert levels (0=OK 1=WARNING 2=CRITICAL 3=UNKNOWN) without
requiring numeric warning/critical thresholds. Hysteresis is bypassed for
discrete codes. Display template defaults to "{check_name}: {output}".
_format_display() handles None threshold_value gracefully.

Add nagios_runner.status_code as a built-in default threshold config so
nagios checks alert out of the box.

Also: fix alerts.html scrolling (override html,body), make hostname a link
to /plugins#<hostname>, remove overall_status/overall_status_code/plugin_count
from nagios_runner and hbc_mini, replace with computed worst-status in
plugins.html via nagiosWorstStatus() helper.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-05 12:26:56 -04:00

Nagios Plugin Integration Guide

The Heartbeat monitoring system can run existing Nagios-compatible monitoring plugins through its nagios_runner plugin. This lets you reuse the large library of existing Nagios plugins without modifying them.

Quick Start

1. Install Nagios Plugins

Debian/Ubuntu:

sudo apt-get install nagios-plugins

RHEL/CentOS/Fedora:

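# On RHEL/CentOS, nagios-plugins-all is usually provided by the EPEL repository
sudo yum install nagios-plugins-all
# or, on dnf-based releases
sudo dnf install nagios-plugins-all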

Arch Linux:

sudo pacman -S monitoring-plugins

2. Configure Heartbeat

Add the nagios_runner section to your ~/.hb.yaml config:

nagios_runner:
  interval: 60          # Run plugins every 60 seconds
  timeout: 30           # Command timeout in seconds
  commands:
    - name: check_disk_root
      command: /usr/lib/nagios/plugins/check_disk -w 20% -c 10% -p /
    
    - name: check_load
      command: /usr/lib/nagios/plugins/check_load -w 5,4,3 -c 10,8,6
    
    - name: check_procs
      command: /usr/lib/nagios/plugins/check_procs -w 250 -c 400

3. Start Heartbeat Client

hbc -v localhost

The client will now execute the configured Nagios plugins and send their results to the server.

How It Works

Nagios Plugin Standard

Nagios plugins follow a simple interface:

  1. Exit Codes:

    • 0 = OK
    • 1 = WARNING
    • 2 = CRITICAL
    • 3 = UNKNOWN
  2. Output Format:

    STATUS - Message | performance_data
    
  3. Performance Data Format:

    'label'=value[UOM];[warn];[crit];[min];[max]
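The interface above is small enough to sketch in a few lines of Python. This is an illustration only, not the nagios_runner implementation: `run_plugin` and `STATUS_BY_EXIT_CODE` are hypothetical names, and treating out-of-range exit codes and launch failures as UNKNOWN is an assumption.

```python
import subprocess
import sys

# Exit-code convention from the list above. Mapping any other
# code to UNKNOWN is an assumption made for this sketch.
STATUS_BY_EXIT_CODE = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def run_plugin(argv, timeout=30):
    """Run a plugin and return (status, message, raw_perfdata)."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return "UNKNOWN", f"could not run plugin: {exc}", ""

    status = STATUS_BY_EXIT_CODE.get(proc.returncode, "UNKNOWN")

    # Only the first output line is split here: text before "|" is the
    # human-readable message, text after it is the performance data.
    lines = proc.stdout.strip().splitlines()
    first_line = lines[0] if lines else ""
    message, _, perfdata = first_line.partition("|")
    return status, message.strip(), perfdata.strip()


if __name__ == "__main__":
    # Stand-in for a real plugin: prints one status line and exits with 1.
    fake_plugin = [
        sys.executable,
        "-c",
        "import sys; print('WARNING - Value is 85 | value=85;80;90;0;100'); sys.exit(1)",
    ]
    print(run_plugin(fake_plugin))
```

Note that the status comes from the exit code, not from the word at the start of the output line.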
    

Example Plugin Output

$ /usr/lib/nagios/plugins/check_disk -w 20% -c 10% -p /
DISK OK - free space: / 156 GB (78%); | /=44GB;127;142;0;159

This output includes:

  • Status: DISK OK
  • Message: free space: / 156 GB (78%)
  • Performance Data: /=44GB;127;142;0;159
    • Current value: 44GB
    • Warning threshold: 127GB
    • Critical threshold: 142GB
    • Min: 0GB
    • Max: 159GB
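The breakdown above can be reproduced with a small parser. The following is a minimal sketch that assumes the 'label'=value[UOM];[warn];[crit];[min];[max] layout; `parse_perfdata` is a hypothetical helper, not a Heartbeat API. Thresholds are kept as strings because plugins may emit range expressions there.

```python
import re

# One performance-data item: 'label'=value[UOM];[warn];[crit];[min];[max]
PERF_ITEM = re.compile(
    r"(?:'(?P<quoted>[^']+)'|(?P<bare>[^\s=']+))"    # label, optionally quoted
    r"=(?P<value>-?\d+(?:\.\d+)?)(?P<uom>[^;\s\d]*)"  # numeric value and unit
    r"(?P<rest>(?:;[^;\s]*){0,4})"                    # up to four optional fields
)
FIELDS = ("warn", "crit", "min", "max")


def parse_perfdata(perfdata):
    """Return {label: {"value": float, "uom": str or None, ...}}."""
    metrics = {}
    for match in PERF_ITEM.finditer(perfdata):
        label = match.group("quoted") or match.group("bare")
        entry = {
            "value": float(match.group("value")),
            "uom": match.group("uom") or None,
        }
        # "rest" starts with ";", so drop the empty first element.
        parts = match.group("rest").split(";")[1:]
        for field, raw in zip(FIELDS, parts):
            entry[field] = raw or None
        metrics[label] = entry
    return metrics


if __name__ == "__main__":
    print(parse_perfdata("/=44GB;127;142;0;159"))
```

Fields that the plugin omits are simply absent from the resulting entry, which mirrors the "if present" wording used for collected data.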

Data Collected

The nagios_runner plugin collects:

For each configured command:

  • {name}_status - Status string (OK, WARNING, CRITICAL, UNKNOWN)
  • {name}_status_code - Numeric exit code (0-3)
  • {name}_output - Status message
  • {name}_{metric} - Each performance metric value
  • {name}_{metric}_uom - Unit of measurement (if present)
  • {name}_{metric}_warn - Warning threshold (if present)
  • {name}_{metric}_crit - Critical threshold (if present)
  • {name}_{metric}_min - Minimum value (if present)
  • {name}_{metric}_max - Maximum value (if present)
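To make the naming scheme concrete, here is a hedged sketch of how one check result could be flattened into those keys. `flatten_result` is a hypothetical function, and the sketch uses metric labels verbatim; whether nagios_runner sanitizes labels (for example `/` from check_disk) is not covered here.

```python
def flatten_result(name, status_code, output, metrics):
    """Build the flat key/value mapping described in the list above."""
    status_names = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}
    data = {
        f"{name}_status": status_names.get(status_code, "UNKNOWN"),
        f"{name}_status_code": status_code,
        f"{name}_output": output,
    }
    for label, metric in metrics.items():
        data[f"{name}_{label}"] = metric["value"]
        # Optional fields are emitted only when the plugin supplied them.
        for suffix in ("uom", "warn", "crit", "min", "max"):
            if metric.get(suffix) is not None:
                data[f"{name}_{label}_{suffix}"] = metric[suffix]
    return data


if __name__ == "__main__":
    example = flatten_result(
        "check_load",
        0,
        "OK - load average: 0.10",
        {"load1": {"value": 0.1, "warn": "5", "crit": "10", "min": "0"}},
    )
    for key, value in example.items():
        print(f"{key} = {value}")
```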

Configuration Options

nagios_runner:
  # Collection interval in seconds (default: 60)
  interval: 60
  
  # Command execution timeout in seconds (default: 30)
  timeout: 30
  
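  # Execute commands via shell (default: true)
  # Set to false to run each command directly without a shell. This is
  # safer against shell injection, but pipes, redirects, and variable
  # expansion in the command string will not work.
  shell: true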
  
  # List of Nagios plugins to run
  commands:
    - name: unique_name       # Required: unique identifier
      command: /path/to/plugin [args]  # Required: full command to execute
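The difference between the two `shell` settings, and the role of `timeout`, can be illustrated with Python's `subprocess` module. This is a sketch under assumptions: `execute` is a hypothetical function, and reporting timeouts and launch failures as exit code 3 (UNKNOWN) is a choice made for the example, not documented Heartbeat behavior.

```python
import shlex
import subprocess
import sys


def execute(command, shell=True, timeout=30):
    """Run one command string and return (exit_code, stdout_text)."""
    # shell=True: the string is handed to the shell unchanged.
    # shell=False: the string is split into an argument list and run directly,
    # so shell metacharacters are passed to the plugin as literal arguments.
    args = command if shell else shlex.split(command)
    try:
        proc = subprocess.run(
            args, shell=shell, capture_output=True, text=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        return 3, f"Command timed out after {timeout}s"
    except OSError as exc:
        return 3, f"Could not run command: {exc}"
    return proc.returncode, proc.stdout.strip()


if __name__ == "__main__":
    # Stand-in for a real plugin so the example runs anywhere Python does.
    snippet = "import sys; print('OK - demo'); sys.exit(0)"
    command = f"{shlex.quote(sys.executable)} -c {shlex.quote(snippet)}"
    print(execute(command, shell=False))
```

Direct execution avoids a class of quoting and injection problems, which is why it is the safer choice when a command does not need shell features.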

Common Nagios Plugins

System Resources

Disk Space:

- name: check_disk_root
  command: /usr/lib/nagios/plugins/check_disk -w 20% -c 10% -p /

Load Average:

- name: check_load
  command: /usr/lib/nagios/plugins/check_load -w 5,4,3 -c 10,8,6

Swap Usage:

- name: check_swap
  command: /usr/lib/nagios/plugins/check_swap -w 20% -c 10%

Process Count:

- name: check_procs
  command: /usr/lib/nagios/plugins/check_procs -w 250 -c 400

Users Logged In:

- name: check_users
  command: /usr/lib/nagios/plugins/check_users -w 5 -c 10

Network Services

SSH:

- name: check_ssh
  command: /usr/lib/nagios/plugins/check_ssh localhost

HTTP:

- name: check_http_local
  command: /usr/lib/nagios/plugins/check_http -H localhost
  
- name: check_http_ssl
  command: /usr/lib/nagios/plugins/check_http -H example.com --ssl

DNS:

- name: check_dns
  command: /usr/lib/nagios/plugins/check_dns -H google.com

Ping:

- name: check_ping_gateway
  command: /usr/lib/nagios/plugins/check_ping -H 192.168.1.1 -w 100,20% -c 500,60%

Databases

MySQL:

- name: check_mysql
  command: /usr/lib/nagios/plugins/check_mysql -H localhost -u user -p password

PostgreSQL:

- name: check_pgsql
  command: /usr/lib/nagios/plugins/check_pgsql -H localhost -d database

Writing Custom Nagios Plugins

You can write your own Nagios-compatible plugins in any language. Here are two simple examples:

Bash:

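#!/bin/bash
# /usr/local/bin/check_example.sh

# Get the value to check (replace some_command with your own check)
value=$(some_command)

# Define thresholds
warn=80
crit=90

# Report UNKNOWN if the value is missing or not a non-negative integer
if ! [[ "$value" =~ ^[0-9]+$ ]]; then
    echo "UNKNOWN - Could not read value"
    exit 3
fi

# Check and output result (variables are quoted so empty values cannot break the test)
if [ "$value" -ge "$crit" ]; then
    echo "CRITICAL - Value is $value | value=${value};${warn};${crit};0;100"
    exit 2
elif [ "$value" -ge "$warn" ]; then
    echo "WARNING - Value is $value | value=${value};${warn};${crit};0;100"
    exit 1
else
    echo "OK - Value is $value | value=${value};${warn};${crit};0;100"
    exit 0
fi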

Python:

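#!/usr/bin/env python3
# /usr/local/bin/check_example.py

import sys


def get_value():
    """Return the measured value. Placeholder: replace with your check logic."""
    return 42


def check_something():
    try:
        value = get_value()
    except Exception as exc:
        # Anything that prevents the check from running is reported as UNKNOWN
        print(f"UNKNOWN - {exc}")
        sys.exit(3)

    warn = 80
    crit = 90

    # Performance data: label=value;warn;crit;min;max
    perfdata = f"value={value};{warn};{crit};0;100"

    if value >= crit:
        print(f"CRITICAL - Value is {value} | {perfdata}")
        sys.exit(2)
    elif value >= warn:
        print(f"WARNING - Value is {value} | {perfdata}")
        sys.exit(1)
    else:
        print(f"OK - Value is {value} | {perfdata}")
        sys.exit(0)


if __name__ == "__main__":
    check_something()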

Then configure in Heartbeat:

nagios_runner:
  commands:
    - name: my_custom_check
      command: /usr/local/bin/check_example.sh

Troubleshooting

Plugin not found

Error: Command not found

Solution: Use the full path to the plugin. Common locations:

  • /usr/lib/nagios/plugins/
  • /usr/lib64/nagios/plugins/
  • /usr/local/nagios/libexec/
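If you are unsure which of these directories your distribution uses, a short script can search them. This is a convenience sketch only; `find_plugin` and `COMMON_PLUGIN_DIRS` are hypothetical names, and the directory list is simply the one given above.

```python
import os

# The common locations listed above; extend this for your own system.
COMMON_PLUGIN_DIRS = (
    "/usr/lib/nagios/plugins",
    "/usr/lib64/nagios/plugins",
    "/usr/local/nagios/libexec",
)


def find_plugin(name, search_dirs=COMMON_PLUGIN_DIRS):
    """Return the full path of the first executable file called name, or None."""
    for directory in search_dirs:
        candidate = os.path.join(directory, name)
        # The file must exist and be executable by the current user.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


if __name__ == "__main__":
    path = find_plugin("check_disk")
    print(path if path else "check_disk not found in the common locations")
```

Because the check uses the current user's permissions, run it as the same user that runs the Heartbeat client.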

Permission denied

Error: Permission denied

Solution: Ensure the plugin is executable:

chmod +x /path/to/plugin

Timeout errors

Command timed out after 30s

Solution: Increase the timeout in config:

nagios_runner:
  timeout: 60  # Increase timeout

No performance data

If performance data is not being parsed:

  1. Check that the plugin output includes the | separator
  2. Verify that the performance data follows the format 'label'=value[UOM];...
  3. Enable debug logging: hbc -v -x localhost

Benefits

  1. Massive Plugin Library: Thousands of existing Nagios plugins available
  2. No Rewriting: Use plugins as-is without modification
  3. Community Support: Well-documented and maintained plugins
  4. Flexibility: Mix Nagios plugins with native Heartbeat plugins
  5. Standard Interface: Consistent exit codes and output format
  6. Performance Data: Automatic extraction of metrics

Next Steps

  • Configure threshold alerts based on Nagios plugin status codes
  • View plugin data in the Heartbeat web UI
  • Create custom plugins for your specific monitoring needs
  • Integrate with existing Nagios/Icinga configurations