Nagios Plugin Integration Guide

The Heartbeat monitoring system can run existing Nagios-compatible monitoring plugins through its nagios_runner plugin, so you can use the thousands of existing Nagios plugins without modifying them.

Quick Start

1. Install Nagios Plugins

Debian/Ubuntu:

sudo apt-get install nagios-plugins

RHEL/CentOS/Fedora:

# On RHEL/CentOS, nagios-plugins-all is provided by the EPEL repository
sudo yum install nagios-plugins-all
# or
sudo dnf install nagios-plugins-all

Arch Linux:

sudo pacman -S monitoring-plugins

2. Configure Heartbeat

Add the nagios_runner section to your ~/.hb.yaml config:

nagios_runner:
  interval: 60          # Run plugins every 60 seconds
  timeout: 30           # Command timeout in seconds
  commands:
    - name: check_disk_root
      command: /usr/lib/nagios/plugins/check_disk -w 20% -c 10% -p /
    
    - name: check_load
      command: /usr/lib/nagios/plugins/check_load -w 5,4,3 -c 10,8,6
    
    - name: check_procs
      command: /usr/lib/nagios/plugins/check_procs -w 250 -c 400

3. Start Heartbeat Client

hbc -v localhost

The client will now execute the configured Nagios plugins and send their results to the server.

How It Works

Nagios Plugin Standard

Nagios plugins follow a simple interface:

  1. Exit Codes:

    • 0 = OK
    • 1 = WARNING
    • 2 = CRITICAL
    • 3 = UNKNOWN
  2. Output Format:

    STATUS - Message | performance_data
    
  3. Performance Data Format:

    'label'=value[UOM];[warn];[crit];[min];[max]
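The interface above can be sketched in a few lines of Python. This is a minimal illustration, not the nagios_runner implementation: the helper name split_plugin_output is hypothetical, and treating exit codes outside 0-3 as UNKNOWN and reading only the first output line are assumptions.

```python
# Exit-code mapping from the Nagios plugin standard described above.
STATUS_NAMES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def split_plugin_output(exit_code, stdout):
    """Return (status, message, perfdata) for one plugin run.

    Hypothetical helper for illustration; not part of Heartbeat.
    """
    # Assumption: any exit code outside 0-3 is reported as UNKNOWN.
    status = STATUS_NAMES.get(exit_code, "UNKNOWN")
    # Assumption: only the first line of output is considered.
    lines = stdout.strip().splitlines()
    first_line = lines[0] if lines else ""
    # Everything after the first "|" is performance data.
    message, _, perfdata = first_line.partition("|")
    return status, message.strip(), perfdata.strip()
```

A plugin that prints no "|" separator simply yields an empty performance data string.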
    

Example Plugin Output

$ /usr/lib/nagios/plugins/check_disk -w 20% -c 10% -p /
DISK OK - free space: / 156 GB (78%); | /=44GB;127;142;0;159

This output includes:

  • Status: DISK OK
  • Message: free space: / 156 GB (78%)
  • Performance Data: /=44GB;127;142;0;159
    • Current value: 44GB
    • Warning threshold: 127GB
    • Critical threshold: 142GB
    • Min: 0GB
    • Max: 159GB
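The breakdown above can be reproduced with a small parser. The sketch below is an assumption about how one performance data item might be split, not the parser Heartbeat uses; the names PERF_ITEM and parse_perf_item are hypothetical. Thresholds are kept as strings because Nagios thresholds may also be ranges such as 10:20.

```python
import re

# One item: 'label'=value[UOM];[warn];[crit];[min];[max]
# The quotes around the label are optional.
PERF_ITEM = re.compile(
    r"^'?(?P<label>[^'=]+)'?="
    r"(?P<value>-?\d+(?:\.\d+)?)"
    r"(?P<uom>[A-Za-z%]*)"
    r"(?P<rest>(?:;[^;]*)*)$"
)


def parse_perf_item(item):
    """Parse a single performance data item into a dict, or None if malformed."""
    match = PERF_ITEM.match(item.strip())
    if match is None:
        return None
    # "rest" looks like ";127;142;0;159"; drop the empty piece before the first ";".
    fields = match.group("rest").split(";")[1:5]
    # Missing trailing fields are allowed, so pad to exactly four entries.
    fields += [None] * (4 - len(fields))
    warn, crit, minimum, maximum = (field or None for field in fields)
    return {
        "label": match.group("label"),
        "value": float(match.group("value")),
        "uom": match.group("uom") or None,
        "warn": warn,
        "crit": crit,
        "min": minimum,
        "max": maximum,
    }
```

Calling parse_perf_item("/=44GB;127;142;0;159") gives the label "/", the value 44.0, the unit "GB", and the four threshold and range fields as strings.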

Data Collected

The nagios_runner plugin collects:

For each configured command:

  • {name}_status - Status string (OK, WARNING, CRITICAL, UNKNOWN)
  • {name}_status_code - Numeric exit code (0-3)
  • {name}_output - Status message
  • {name}_{metric} - Each performance metric value
  • {name}_{metric}_uom - Unit of measurement (if present)
  • {name}_{metric}_warn - Warning threshold (if present)
  • {name}_{metric}_crit - Critical threshold (if present)
  • {name}_{metric}_min - Minimum value (if present)
  • {name}_{metric}_max - Maximum value (if present)

Overall:

  • overall_status - Worst status from all commands
  • overall_status_code - Worst status code
  • plugin_count - Number of Nagios plugins executed
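To make the key naming above concrete, here is a minimal sketch of how one result could be flattened into those keys. It is not the nagios_runner code: flatten_result and summarize are hypothetical names, the metric dict layout is an assumption, and "worst" is assumed to mean the highest numeric exit code.

```python
STATUS_NAMES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def flatten_result(name, exit_code, message, metrics):
    """Build the per-command keys for one plugin run.

    metrics is assumed to be a list of dicts with the keys
    label, value, uom, warn, crit, min, max.
    """
    data = {
        f"{name}_status": STATUS_NAMES.get(exit_code, "UNKNOWN"),
        f"{name}_status_code": exit_code,
        f"{name}_output": message,
    }
    for metric in metrics:
        key = f"{name}_{metric['label']}"
        data[key] = metric["value"]
        # Optional fields are emitted only if present.
        for field in ("uom", "warn", "crit", "min", "max"):
            if metric.get(field) not in (None, ""):
                data[f"{key}_{field}"] = metric[field]
    return data


def summarize(exit_codes):
    """Build the overall keys from the exit codes of all commands."""
    # Assumption: the worst status is the highest numeric exit code.
    worst = max(exit_codes, default=0)
    return {
        "overall_status": STATUS_NAMES.get(worst, "UNKNOWN"),
        "overall_status_code": worst,
        "plugin_count": len(exit_codes),
    }
```

For a command named check_load reporting a metric load1, this produces keys such as check_load_status, check_load_load1, and check_load_load1_warn.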

Configuration Options

nagios_runner:
  # Collection interval in seconds (default: 60)
  interval: 60
  
  # Command execution timeout in seconds (default: 30)
  timeout: 30
  
  # Execute commands via shell (default: true)
  # Set to false for direct execution (more secure but less flexible)
  shell: true
  
  # List of Nagios plugins to run
  commands:
    - name: unique_name       # Required: unique identifier
      command: /path/to/plugin [args]  # Required: full command to execute
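The effect of the timeout and shell options can be illustrated with Python's subprocess module. This is a sketch under assumptions, not the actual nagios_runner code: run_command is a hypothetical name, and mapping timeouts and launch failures to exit code 3 (UNKNOWN) is an assumption.

```python
import shlex
import subprocess


def run_command(command, timeout=30, shell=True):
    """Run one configured command and return (exit_code, stdout)."""
    # With shell=False the command string is split into an argument list,
    # so pipes, globs, and variable expansion are not available.
    args = command if shell else shlex.split(command)
    try:
        proc = subprocess.run(
            args,
            shell=shell,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # Assumption: a timed-out plugin is reported as UNKNOWN.
        return 3, f"Command timed out after {timeout}s"
    except OSError as exc:
        # Covers a missing or non-executable plugin when shell=False.
        return 3, str(exc)
    return proc.returncode, proc.stdout.strip()
```

Direct execution avoids shell interpretation of the command string, which is why it is the safer choice when the command needs no shell features.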

Common Nagios Plugins

System Resources

Disk Space:

- name: check_disk_root
  command: /usr/lib/nagios/plugins/check_disk -w 20% -c 10% -p /

Load Average:

- name: check_load
  command: /usr/lib/nagios/plugins/check_load -w 5,4,3 -c 10,8,6

Swap Usage:

- name: check_swap
  command: /usr/lib/nagios/plugins/check_swap -w 20% -c 10%

Process Count:

- name: check_procs
  command: /usr/lib/nagios/plugins/check_procs -w 250 -c 400

Users Logged In:

- name: check_users
  command: /usr/lib/nagios/plugins/check_users -w 5 -c 10

Network Services

SSH:

- name: check_ssh
  command: /usr/lib/nagios/plugins/check_ssh localhost

HTTP:

- name: check_http_local
  command: /usr/lib/nagios/plugins/check_http -H localhost
  
- name: check_http_ssl
  command: /usr/lib/nagios/plugins/check_http -H example.com --ssl

DNS:

- name: check_dns
  command: /usr/lib/nagios/plugins/check_dns -H google.com

Ping:

- name: check_ping_gateway
  command: /usr/lib/nagios/plugins/check_ping -H 192.168.1.1 -w 100,20% -c 500,60%

Databases

MySQL:

- name: check_mysql
  command: /usr/lib/nagios/plugins/check_mysql -H localhost -u user -p password

PostgreSQL:

- name: check_pgsql
  command: /usr/lib/nagios/plugins/check_pgsql -H localhost -d database

Writing Custom Nagios Plugins

You can write your own Nagios-compatible plugins in any language. Here are two simple examples:

Bash:

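#!/bin/bash
# /usr/local/bin/check_example.sh

# Get the value to check (replace some_command with your own check;
# it must print a single non-negative integer)
value=$(some_command)

# Exit UNKNOWN (3) if the check did not produce an integer
if ! [[ "$value" =~ ^[0-9]+$ ]]; then
    echo "UNKNOWN - Could not read value"
    exit 3
fi

# Define thresholds
warn=80
crit=90

# Performance data: label=value;warn;crit;min;max
perfdata="value=${value};${warn};${crit};0;100"

# Check and output result
if [ "$value" -ge "$crit" ]; then
    echo "CRITICAL - Value is $value | $perfdata"
    exit 2
elif [ "$value" -ge "$warn" ]; then
    echo "WARNING - Value is $value | $perfdata"
    exit 1
else
    echo "OK - Value is $value | $perfdata"
    exit 0
fi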

Python:

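#!/usr/bin/env python3
# /usr/local/bin/check_example.py

import sys


def get_value():
    """Placeholder: replace with your own check logic; must return a number."""
    raise NotImplementedError("implement get_value() for your check")


def check_something():
    warn = 80
    crit = 90

    try:
        value = get_value()
    except Exception as exc:
        # A failure of the check itself is reported as UNKNOWN (exit code 3)
        print(f"UNKNOWN - {exc}")
        sys.exit(3)

    # Performance data: label=value;warn;crit;min;max
    perfdata = f"value={value};{warn};{crit};0;100"

    if value >= crit:
        print(f"CRITICAL - Value is {value} | {perfdata}")
        sys.exit(2)
    elif value >= warn:
        print(f"WARNING - Value is {value} | {perfdata}")
        sys.exit(1)
    else:
        print(f"OK - Value is {value} | {perfdata}")
        sys.exit(0)


if __name__ == "__main__":
    check_something()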

Then add the plugin to your Heartbeat config:

nagios_runner:
  commands:
    - name: my_custom_check
      command: /usr/local/bin/check_example.sh

Troubleshooting

Plugin not found

Error: Command not found

Solution: Use the full path to the plugin. Common locations:

  • /usr/lib/nagios/plugins/
  • /usr/lib64/nagios/plugins/
  • /usr/local/nagios/libexec/
  • /usr/lib/monitoring-plugins/ (Arch Linux monitoring-plugins package)
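If you are unsure which directory your distribution uses, a short script can search for a plugin by name. This is a minimal sketch: find_plugin and COMMON_PLUGIN_DIRS are hypothetical names, and the directory list should be extended with any other location your system uses.

```python
import os

# Candidate directories; adjust for your system.
COMMON_PLUGIN_DIRS = [
    "/usr/lib/nagios/plugins",
    "/usr/lib64/nagios/plugins",
    "/usr/local/nagios/libexec",
]


def find_plugin(name, directories=COMMON_PLUGIN_DIRS):
    """Return the full path of the first executable plugin found, else None."""
    for directory in directories:
        candidate = os.path.join(directory, name)
        # The file must exist and be executable by the current user.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


if __name__ == "__main__":
    print(find_plugin("check_disk") or "check_disk not found")
```

The returned path can be used directly in the command field of a nagios_runner entry.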

Permission denied

Error: Permission denied

Solution: Ensure the plugin is executable:

chmod +x /path/to/plugin

Timeout errors

Command timed out after 30s

Solution: Increase the timeout in config:

nagios_runner:
  timeout: 60  # Increase timeout

No performance data

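If performance data is not being parsed:

  1. Check that the plugin output includes the | separator between the message and the performance data
  2. Verify that the performance data follows the format 'label'=value[UOM];[warn];[crit];[min];[max]
  3. Enable debug logging: hbc -v -x localhost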

Benefits

  1. Massive Plugin Library: Thousands of existing Nagios plugins available
  2. No Rewriting: Use plugins as-is without modification
  3. Community Support: Well-documented and maintained plugins
  4. Flexibility: Mix Nagios plugins with native Heartbeat plugins
  5. Standard Interface: Consistent exit codes and output format
  6. Performance Data: Automatic extraction of metrics

Next Steps

  • Configure threshold alerts based on Nagios plugin status codes
  • View plugin data in the Heartbeat web UI
  • Create custom plugins for your specific monitoring needs
  • Integrate with existing Nagios/Icinga configurations