heartbeat/docs/NAGIOS_INTEGRATION.md
andreas a534c06b26 feat: nagios operator for direct exit-code severity mapping
Add ComparisonOperator.NAGIOS ("nagios") that maps Nagios exit codes
directly to alert levels (0=OK 1=WARNING 2=CRITICAL 3=UNKNOWN) without
requiring numeric warning/critical thresholds. Hysteresis is bypassed for
discrete codes. Display template defaults to "{check_name}: {output}".
_format_display() handles None threshold_value gracefully.

Add nagios_runner.status_code as a built-in default threshold config so
nagios checks alert out of the box.

Also: fix alerts.html scrolling (override html,body), make hostname a link
to /plugins#<hostname>, remove overall_status/overall_status_code/plugin_count
from nagios_runner and hbc_mini, replace with computed worst-status in
plugins.html via nagiosWorstStatus() helper.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-05 12:26:56 -04:00


# Nagios Plugin Integration Guide
The Heartbeat monitoring system can run existing Nagios-compatible monitoring plugins through its `nagios_runner` plugin, so you can reuse the thousands of existing Nagios plugins without modification.
## Quick Start
### 1. Install Nagios Plugins
**Debian/Ubuntu:**
```bash
sudo apt-get install nagios-plugins
```
**RHEL/CentOS/Fedora:**
```bash
sudo yum install nagios-plugins-all
# or
sudo dnf install nagios-plugins-all
```
**Arch Linux:**
```bash
sudo pacman -S monitoring-plugins
```
### 2. Configure Heartbeat
Add the `nagios_runner` section to your `~/.hb.yaml` config:
```yaml
nagios_runner:
  interval: 60  # Run plugins every 60 seconds
  timeout: 30   # Command timeout in seconds
  commands:
    - name: check_disk_root
      command: /usr/lib/nagios/plugins/check_disk -w 20% -c 10% -p /
    - name: check_load
      command: /usr/lib/nagios/plugins/check_load -w 5,4,3 -c 10,8,6
    - name: check_procs
      command: /usr/lib/nagios/plugins/check_procs -w 250 -c 400
```
### 3. Start Heartbeat Client
```bash
hbc -v localhost
```
The client will now execute the configured Nagios plugins and send their results to the server.
## How It Works
### Nagios Plugin Standard
Nagios plugins follow a simple interface:
1. **Exit Codes:**
   - `0` = OK
   - `1` = WARNING
   - `2` = CRITICAL
   - `3` = UNKNOWN
2. **Output Format:**
   ```
   STATUS - Message | performance_data
   ```
3. **Performance Data Format:**
   ```
   'label'=value[UOM];[warn];[crit];[min];[max]
   ```
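The interface above can be sketched in a few lines of Python. This is only an illustration of the convention, not Heartbeat's actual implementation: `run_plugin` and `EXIT_STATUS` are hypothetical names, and the "plugin" is a stand-in one-liner so the example runs without any Nagios plugins installed.

```python
import subprocess
import sys

# Exit-code meanings from the Nagios plugin convention described above.
EXIT_STATUS = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def run_plugin(argv, timeout=30):
    """Run a plugin and return (status, message, perfdata).

    Hypothetical helper for illustration; not Heartbeat's own code.
    """
    proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    # Assumption: any exit code outside 0-3 is treated as UNKNOWN.
    status = EXIT_STATUS.get(proc.returncode, "UNKNOWN")
    lines = proc.stdout.strip().splitlines()
    first_line = lines[0] if lines else ""
    # Text before the first "|" is the message; text after it is perfdata.
    message, _, perfdata = first_line.partition("|")
    return status, message.strip(), perfdata.strip()


# Stand-in "plugin": prints one status line and exits with code 1 (WARNING).
fake_plugin = [
    sys.executable,
    "-c",
    "import sys; print('LOAD WARNING - load average: 5.2 | load1=5.2;5;10;0;'); sys.exit(1)",
]
result = run_plugin(fake_plugin)
print(result)
```

The exit code alone decides the status here; the `STATUS` word in the text output is informational.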
### Example Plugin Output
```bash
$ /usr/lib/nagios/plugins/check_disk -w 20% -c 10% -p /
DISK OK - free space: / 156 GB (78%); | /=44GB;127;142;0;159
```
This output includes:
- **Status:** `DISK OK`
- **Message:** `free space: / 156 GB (78%)`
- **Performance Data:** `/=44GB;127;142;0;159`
  - Current value: 44GB
  - Warning threshold: 127GB
  - Critical threshold: 142GB
  - Min: 0GB
  - Max: 159GB
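To make the breakdown above concrete, here is a minimal sketch of a performance-data parser. It is an assumption about how such parsing could be done, not a copy of what `nagios_runner` does; `PERF_RE` and `parse_perfdata` are hypothetical names. Thresholds are kept as strings because the Nagios convention also allows range expressions such as `10:20`.

```python
import re

# One metric: 'label'=value[UOM];[warn];[crit];[min];[max]
PERF_RE = re.compile(
    r"(?:'(?P<quoted>[^']+)'|(?P<bare>[^\s=']+))"  # label, optionally single-quoted
    r"=(?P<value>-?\d+(?:\.\d+)?)"                  # numeric value
    r"(?P<uom>[^;\s\d]*)"                           # unit of measurement, may be empty
    r"(?P<rest>(?:;[^;\s]*)*)"                      # ;warn;crit;min;max, each optional
)


def parse_perfdata(perfdata):
    """Parse a perfdata string into {label: {value, uom, warn, crit, min, max}}."""
    metrics = {}
    for match in PERF_RE.finditer(perfdata):
        label = match.group("quoted") or match.group("bare")
        # Drop the empty piece before the first ";" and pad to four fields.
        fields = match.group("rest").split(";")[1:]
        fields += [""] * (4 - len(fields))
        warn, crit, minimum, maximum = fields[:4]
        metrics[label] = {
            "value": float(match.group("value")),
            "uom": match.group("uom"),
            "warn": warn,
            "crit": crit,
            "min": minimum,
            "max": maximum,
        }
    return metrics


parsed = parse_perfdata("/=44GB;127;142;0;159")
print(parsed)
```

This sketch skips non-numeric values, so a real parser would need extra handling for them.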
### Data Collected
The `nagios_runner` plugin collects:
**For each configured command:**
- `{name}_status` - Status string (OK, WARNING, CRITICAL, UNKNOWN)
- `{name}_status_code` - Numeric exit code (0-3)
- `{name}_output` - Status message
- `{name}_{metric}` - Each performance metric value
- `{name}_{metric}_uom` - Unit of measurement (if present)
- `{name}_{metric}_warn` - Warning threshold (if present)
- `{name}_{metric}_crit` - Critical threshold (if present)
- `{name}_{metric}_min` - Minimum value (if present)
- `{name}_{metric}_max` - Maximum value (if present)
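The key naming above can be illustrated with a small sketch that flattens one plugin result into a dictionary. `flatten_result` is a hypothetical helper written for this guide; how Heartbeat itself builds these keys (for example, how it treats labels containing characters such as `/`) is not shown here and is not assumed.

```python
def flatten_result(name, status_code, output, metrics):
    """Build {name}_* keys from one plugin result (illustrative only)."""
    status_names = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}
    data = {
        f"{name}_status": status_names.get(status_code, "UNKNOWN"),
        f"{name}_status_code": status_code,
        f"{name}_output": output,
    }
    for metric, fields in metrics.items():
        data[f"{name}_{metric}"] = fields["value"]
        for suffix in ("uom", "warn", "crit", "min", "max"):
            # Optional fields are emitted only when the plugin supplied them.
            if fields.get(suffix):
                data[f"{name}_{metric}_{suffix}"] = fields[suffix]
    return data


data = flatten_result(
    "check_load",
    0,
    "OK - load average: 0.5",
    {"load1": {"value": 0.5, "uom": "", "warn": "5", "crit": "10", "min": "0", "max": ""}},
)
for key, value in data.items():
    print(key, "=", value)
```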
## Configuration Options
```yaml
nagios_runner:
  # Collection interval in seconds (default: 60)
  interval: 60
  # Command execution timeout in seconds (default: 30)
  timeout: 30
  # Execute commands via shell (default: true)
  # Set to false for direct execution (more secure but less flexible)
  shell: true
  # List of Nagios plugins to run
  commands:
    - name: unique_name                # Required: unique identifier
      command: /path/to/plugin [args]  # Required: full command to execute
```
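The practical difference between the two `shell` modes can be sketched as follows. This assumes direct execution splits the command string into an argument list (here with Python's `shlex.split`); whether `nagios_runner` does exactly this is an assumption, and `subprocess_args` is a hypothetical helper.

```python
import shlex


def subprocess_args(command, shell=True):
    """Return (args, kwargs) suitable for subprocess.run in each mode."""
    if shell:
        # The whole string is handed to the shell, so pipes, globs and
        # variable expansion work, but so does any injected shell syntax.
        return command, {"shell": True}
    # Direct execution: the string is split into argv and never reaches a shell.
    return shlex.split(command), {"shell": False}


command = "/usr/lib/nagios/plugins/check_disk -w 20% -c 10% -p /"
direct_args, direct_kwargs = subprocess_args(command, shell=False)
print(direct_args)

# With direct execution a pipe is just a literal argument, not a pipeline.
piped_args, _ = subprocess_args("check_a | grep x", shell=False)
print(piped_args)
```

This is why direct execution is the safer choice when commands do not rely on shell features.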
## Common Nagios Plugins
### System Resources
**Disk Space:**
```yaml
- name: check_disk_root
  command: /usr/lib/nagios/plugins/check_disk -w 20% -c 10% -p /
```
**Load Average:**
```yaml
- name: check_load
  command: /usr/lib/nagios/plugins/check_load -w 5,4,3 -c 10,8,6
```
**Swap Usage:**
```yaml
- name: check_swap
  command: /usr/lib/nagios/plugins/check_swap -w 20% -c 10%
```
**Process Count:**
```yaml
- name: check_procs
  command: /usr/lib/nagios/plugins/check_procs -w 250 -c 400
```
**Users Logged In:**
```yaml
- name: check_users
  command: /usr/lib/nagios/plugins/check_users -w 5 -c 10
```
### Network Services
**SSH:**
```yaml
- name: check_ssh
  command: /usr/lib/nagios/plugins/check_ssh localhost
```
**HTTP:**
```yaml
- name: check_http_local
  command: /usr/lib/nagios/plugins/check_http -H localhost
- name: check_http_ssl
  command: /usr/lib/nagios/plugins/check_http -H example.com --ssl
```
**DNS:**
```yaml
- name: check_dns
  command: /usr/lib/nagios/plugins/check_dns -H google.com
```
**Ping:**
```yaml
- name: check_ping_gateway
  command: /usr/lib/nagios/plugins/check_ping -H 192.168.1.1 -w 100,20% -c 500,60%
```
### Databases
**MySQL:**
```yaml
- name: check_mysql
  command: /usr/lib/nagios/plugins/check_mysql -H localhost -u user -p password
```
**PostgreSQL:**
```yaml
- name: check_pgsql
  command: /usr/lib/nagios/plugins/check_pgsql -H localhost -d database
```
## Writing Custom Nagios Plugins
You can write your own Nagios-compatible plugins in any language. Here's a simple example:
**Bash:**
```bash
#!/bin/bash
# /usr/local/bin/check_example.sh
# Get the value to check
value=$(some_command)
# Define thresholds
warn=80
crit=90
# Check and output result
# Quote the variables so an empty value fails the test cleanly
if [ "$value" -ge "$crit" ]; then
    echo "CRITICAL - Value is $value | value=${value};${warn};${crit};0;100"
    exit 2
elif [ "$value" -ge "$warn" ]; then
    echo "WARNING - Value is $value | value=${value};${warn};${crit};0;100"
    exit 1
else
    echo "OK - Value is $value | value=${value};${warn};${crit};0;100"
    exit 0
fi
```
**Python:**
```python
#!/usr/bin/env python3
# /usr/local/bin/check_example.py
import sys


def get_value():
    # Placeholder so the script runs as-is; replace with your check logic.
    return 42


def check_something():
    value = get_value()
    warn = 80
    crit = 90
    perfdata = f"value={value};{warn};{crit};0;100"
    if value >= crit:
        print(f"CRITICAL - Value is {value} | {perfdata}")
        sys.exit(2)
    elif value >= warn:
        print(f"WARNING - Value is {value} | {perfdata}")
        sys.exit(1)
    else:
        print(f"OK - Value is {value} | {perfdata}")
        sys.exit(0)


if __name__ == "__main__":
    check_something()
```
Then configure in Heartbeat:
```yaml
nagios_runner:
  commands:
    - name: my_custom_check
      command: /usr/local/bin/check_example.sh
```
## Troubleshooting
### Plugin not found
```
Error: Command not found
```
**Solution:** Use the full path to the plugin. Common locations:
- `/usr/lib/nagios/plugins/`
- `/usr/lib64/nagios/plugins/`
- `/usr/local/nagios/libexec/`
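If you are unsure which of these directories your distribution uses, a short script can look for you. The sketch below is a hypothetical helper (`find_plugin` is not part of Heartbeat); it checks the locations listed above and also confirms the file is executable, which catches the permission problem described in the next section.

```python
import os

# The common install locations listed above.
COMMON_PLUGIN_DIRS = (
    "/usr/lib/nagios/plugins",
    "/usr/lib64/nagios/plugins",
    "/usr/local/nagios/libexec",
)


def find_plugin(name, search_dirs=COMMON_PLUGIN_DIRS):
    """Return the full path of an executable plugin, or None if not found."""
    for directory in search_dirs:
        candidate = os.path.join(directory, name)
        # Must be a regular file and executable by the current user.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


# Prints a full path if check_disk is installed in one of the directories,
# otherwise None.
print(find_plugin("check_disk"))
```

Use the returned path in the `command:` field of your config.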
### Permission denied
```
Error: Permission denied
```
**Solution:** Ensure the plugin is executable:
```bash
chmod +x /path/to/plugin
```
### Timeout errors
```
Command timed out after 30s
```
**Solution:** Increase the timeout in config:
```yaml
nagios_runner:
  timeout: 60  # Increase timeout
```
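For context, a timeout of this kind is typically enforced by the process runner. The following sketch shows one way to do it with Python's `subprocess` module; reporting a timed-out command as UNKNOWN (code 3) is an assumption made for the example, and `run_with_timeout` is a hypothetical name, not Heartbeat's API.

```python
import subprocess
import sys


def run_with_timeout(argv, timeout):
    """Return (exit_code, output); a timeout is reported as UNKNOWN (3)."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing is left behind.
        return 3, f"Command timed out after {timeout}s"
    return proc.returncode, proc.stdout.strip()


# Stand-in commands so the example runs without Nagios plugins installed.
slow = [sys.executable, "-c", "import time; time.sleep(5)"]
fast = [sys.executable, "-c", "print('OK - done')"]

slow_result = run_with_timeout(slow, 0.5)
fast_result = run_with_timeout(fast, 30)
print(slow_result)
print(fast_result)
```

If a plugin regularly needs longer than the configured limit, check it for slow network lookups before simply raising the timeout.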
### No performance data
If performance data is not being parsed:
1. Check that the plugin output includes the `|` separator.
2. Verify that the performance data follows the format `'label'=value[UOM];...`.
3. Enable debug logging: `hbc -v -x localhost`
## Benefits
1. **Massive Plugin Library:** Thousands of existing Nagios plugins available
2. **No Rewriting:** Use plugins as-is without modification
3. **Community Support:** Well-documented and maintained plugins
4. **Flexibility:** Mix Nagios plugins with native Heartbeat plugins
5. **Standard Interface:** Consistent exit codes and output format
6. **Performance Data:** Automatic extraction of metrics
## Resources
- [Nagios Plugin Development Guidelines](https://nagios-plugins.org/doc/guidelines.html)
- [Monitoring Plugins Project](https://www.monitoring-plugins.org/)
- [Nagios Exchange](https://exchange.nagios.org/) - Plugin repository
- [Checkmk Local Checks](https://docs.checkmk.com/latest/en/localchecks.html) - Related check format
## Next Steps
- Configure threshold alerts based on Nagios plugin status codes
- View plugin data in the Heartbeat web UI
- Create custom plugins for your specific monitoring needs
- Integrate with existing Nagios/Icinga configurations