docs: update README with changes since 917d6a4

- ZFS monitor plugin (zfs_monitor) added to plugin list and features
- nagios_runner: async execution, stderr capture, signal handling, path validation
- Threshold alerting: de-escalation suppression, short-duration suppression, ping_monitor thresholds
- Per-host watch flag and role-filtered dashboards
- HTTP API & Web UI: hostname links in Live View, Host Overview with ZFS renderer, alert pie chart in nav bar, Settings threshold viewer
- hbc connection retry: indefinite retry for IPv4; IPv6 dropped after 3 early startup failures
- hbc daemon mode: logs routed to syslog after daemonizing
- hbc_mini: noted zfs_monitor and IPv6 early-fail protection not available

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@@ -27,6 +27,7 @@ A lightweight daemon that listens for UDP heartbeat messages and acts on them: k
 - Configurable retention and backup management
 - **Plugin system for extensible monitoring** ✅
 - Collect system metrics (CPU, memory, disk, network)
+- Monitor ZFS pool health, capacity, and I/O via `zpool(8)`
 - Execute existing Nagios monitoring plugins
 - Create custom plugins with simple Python classes
 - **Threshold alerting system** ✅
@@ -34,6 +35,8 @@ A lightweight daemon that listens for UDP heartbeat messages and acts on them: k
 - Hysteresis to prevent alert flapping
 - Automatic notifications on state changes
 - Re-notification for ongoing alerts
+- **Per-host watch flag** — set `watch: false` on any host to silence all notifications for that host without removing its configuration ✅
+- **Role-filtered dashboards** — Live Dashboard and Host Overview show only hosts where the logged-in user is owner or manager (admins see all) ✅
 - Modular codebase suitable for unit testing and CI ✅
 
 ---
@@ -61,12 +64,16 @@ Heartbeat includes a comprehensive plugin architecture that extends monitoring b
 - `network_monitor`: Monitors network interface statistics, bandwidth, and connections
 - `filesystem_info`: Collects mounted filesystem information (physical filesystems only by default)
 - `nagios_runner`: Executes Nagios monitoring plugins (check_disk, check_load, check_http, etc.)
+- `zfs_monitor`: Monitors ZFS pool health, capacity, fragmentation, dedup ratio, and cumulative I/O via `zpool(8)`
 
 ### Nagios Integration
 
 The `nagios_runner` plugin provides seamless integration with the vast Nagios plugin ecosystem. You can run any Nagios-compatible plugin and have the results automatically parsed and stored:
 
-- Executes plugins via subprocess with timeout protection
+- Executes plugins asynchronously (non-blocking) with timeout protection
+- Captures both stdout and stderr; if stdout is empty, stderr is used as the status message
+- Handles signal-killed processes (negative exit code → UNKNOWN status)
+- Validates absolute command paths at startup and warns on missing or non-executable files
 - Parses exit codes (OK/WARNING/CRITICAL/UNKNOWN)
 - Extracts performance data with thresholds
 - Reports aggregated status across all configured checks
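Illustrative note on the hunk above: the `nagios_runner` behaviour it describes (non-blocking execution with a timeout, stderr used when stdout is empty, negative exit code mapped to UNKNOWN) could look roughly like the sketch below. This is not the plugin's actual code. The names `classify` and `run_check`, the 10-second default timeout, and the wording of the messages are assumptions made for illustration; only the 0/1/2/3 exit-code convention comes from Nagios itself.

```python
import asyncio
import sys

# Nagios exit-code convention: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN.
STATUS_NAMES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def classify(returncode, stdout, stderr):
    """Map a finished check to (status, message). Hypothetical helper."""
    if returncode < 0:
        # A negative return code means the process was killed by a signal.
        return "UNKNOWN", f"killed by signal {-returncode}"
    # Any exit code outside 0-3 is also treated as UNKNOWN.
    status = STATUS_NAMES.get(returncode, "UNKNOWN")
    # Fall back to stderr when the plugin printed nothing on stdout.
    message = stdout.strip() or stderr.strip()
    return status, message


async def run_check(argv, timeout=10.0):
    """Run one check without blocking the event loop. Hypothetical helper."""
    proc = await asyncio.create_subprocess_exec(
        *argv,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    try:
        out, err = await asyncio.wait_for(proc.communicate(), timeout)
    except asyncio.TimeoutError:
        # Timeout protection: kill the child and reap it before returning.
        proc.kill()
        await proc.wait()
        return "UNKNOWN", f"timed out after {timeout}s"
    return classify(proc.returncode, out.decode(), err.decode())


if __name__ == "__main__":
    # Stand-in for a real Nagios plugin: writes to stderr only, exits 1.
    fake_plugin = [sys.executable, "-c",
                   "import sys; sys.stderr.write('boom'); sys.exit(1)"]
    print(asyncio.run(run_check(fake_plugin)))
```

Keeping the classification in a pure function separate from the subprocess call makes the exit-code and stderr-fallback rules easy to unit test without spawning processes.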
@@ -147,9 +154,11 @@ Heartbeat includes a sophisticated threshold alerting system that monitors plugi
 - **Multi-level alerts**: WARNING and CRITICAL severity levels
 - **Flexible operators**: Support for >, >=, <, <=, ==, != comparisons
 - **Hysteresis**: Prevents alert flapping with configurable recovery thresholds
-- **Smart notifications**: Alerts only on state changes, not every check
+- **Smart notifications**: Alerts only on state changes, not every check; de-escalations (e.g. CRITICAL → WARNING) do not generate a notification
 - **Re-notifications**: Periodic reminders for ongoing alerts
+- **Short-duration suppression**: Recovery notifications are suppressed for down events under 4 seconds (avoids noise from transient blips)
 - **Journal integration**: All threshold events logged for audit trail
+- **`ping_monitor` thresholds**: Latency and packet-loss thresholds use the same format as all other plugin metrics
 
 ### Configuration
 
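Illustrative note on the hunk above: the notification rules it adds (notify on state changes only, stay silent on de-escalation, suppress recovery notices for outages under 4 seconds) can be read as a single decision function. The sketch below is one possible reading, not the project's implementation. The function name `should_notify`, the `down_seconds` parameter, and the choice to treat a missing duration as "long enough" are assumptions; re-notification for ongoing alerts is deliberately left out.

```python
# Severity ordering used to tell escalation from de-escalation.
SEVERITY = {"OK": 0, "WARNING": 1, "CRITICAL": 2}

# Outages shorter than this do not produce a recovery notification.
MIN_DOWN_SECONDS = 4.0


def should_notify(previous, current, down_seconds=None):
    """Decide whether a state transition should send a notification.

    Hypothetical helper: `previous` and `current` are state names,
    `down_seconds` is how long the alert was active (if known).
    """
    if current == previous:
        # No state change; periodic re-notification is handled elsewhere.
        return False
    if current == "OK":
        # Recovery: suppressed when the outage was a transient blip.
        return down_seconds is None or down_seconds >= MIN_DOWN_SECONDS
    # Escalations notify; de-escalations (CRITICAL -> WARNING) do not.
    return SEVERITY[current] > SEVERITY[previous]


if __name__ == "__main__":
    print(should_notify("OK", "WARNING"))            # escalation
    print(should_notify("CRITICAL", "WARNING"))      # de-escalation
    print(should_notify("CRITICAL", "OK", 2.5))      # short outage recovery
```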
@@ -363,9 +372,10 @@ Heartbeat includes a built-in HTTP/WebSocket server that provides both a REST AP
 ### Web Dashboards
 
 - **Login** (`/login`): Browser login form (shown automatically when auth is configured)
-- **Live View** (`/live`): Real-time host connectivity, latency, and messages
-- **Plugin Metrics** (`/plugins`): Browse and visualize metrics from all plugins
-- **Alerts Dashboard** (`/alerts`): Monitor active alerts with severity filtering
+- **Live View** (`/live`): Real-time host connectivity, latency, and messages; hostnames link directly to the Host Overview page
+- **Host Overview** (`/plugins/<host>`): Per-host plugin metrics with ZFS pool visualization; filtered to hosts where the logged-in user is owner or manager (admins see all)
+- **Alerts Dashboard** (`/alerts`): Monitor active alerts with severity filtering; alert count pie chart shown in the navigation bar
+- **Settings** (`/settings`): Server configuration, user management, and threshold configuration viewer
 
 ### API Endpoints
 
@@ -476,6 +486,10 @@ plugins:
 
 All monitoring plugins default to 5-minute (300 second) intervals, but can be customized as needed.
 
+**Connection retry:** If a server is temporarily unreachable, `hbc` retries `open()` indefinitely on every heartbeat interval. IPv6 connections that never succeeded during early startup are dropped after 3 consecutive failures (to handle hosts without IPv6 routing), while IPv4 connections always retry.
+
+**Daemon logging:** When running with `-d`, `hbc` routes all log output to syslog (`LOG_DAEMON` facility) after daemonizing. Without `-d`, logs go to stderr as usual.
+
 ### hbc_mini — single-file client (no external dependencies)
 
 `scripts/hbc_mini.py` is a self-contained version of the heartbeat client that requires only Python 3.8+ and no external packages. Copy it to any host and run it directly — no virtualenv, no `pip install`.
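Illustrative note on the hunk above: the connection retry policy it documents (IPv4 always retries, IPv6 is dropped after 3 consecutive failures if it has never connected) can be modelled with a small per-endpoint state record. This is a sketch under assumptions, not the `hbc` source. The `Endpoint` class, its field names, and the reading of "early startup" as "has never connected successfully" are all hypothetical.

```python
from dataclasses import dataclass

# Consecutive failures after which a never-connected IPv6 endpoint is dropped.
IPV6_EARLY_FAIL_LIMIT = 3


@dataclass
class Endpoint:
    """Retry bookkeeping for one server address (hypothetical structure)."""
    address: str
    is_ipv6: bool
    ever_connected: bool = False
    consecutive_failures: int = 0
    dropped: bool = False

    def record_attempt(self, success):
        """Update state after one open() attempt on a heartbeat interval."""
        if success:
            self.ever_connected = True
            self.consecutive_failures = 0
            return
        self.consecutive_failures += 1
        # Only IPv6 endpoints that have never worked are given up on;
        # this covers hosts that have no IPv6 routing at all.
        if (self.is_ipv6 and not self.ever_connected
                and self.consecutive_failures >= IPV6_EARLY_FAIL_LIMIT):
            self.dropped = True

    def should_retry(self):
        """IPv4, and IPv6 that has connected at least once, retry forever."""
        return not self.dropped


if __name__ == "__main__":
    v6 = Endpoint("2001:db8::1", is_ipv6=True)
    for _ in range(3):
        v6.record_attempt(False)
    print(v6.should_retry())
```

An IPv6 endpoint that connected even once is kept, so a later outage is treated like any temporary failure and retried on every interval.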
@@ -531,8 +545,10 @@ python3 hbc_mini.py -m "maintenance starting" your-server.example.com
 
 - No YAML config (use JSON instead)
 - No `filesystem_info` plugin
+- No `zfs_monitor` plugin (requires `zpool(8)` and the full plugin loader)
 - `cpu_monitor` does not report per-core usage or CPU frequency (no psutil)
 - Plugins cannot be loaded from external `.py` files — all plugins are compiled in
+- No IPv6 early-fail protection — connections that fail to open at startup are silently skipped rather than retried
 
 Everything else — heartbeat protocol, ACK/CMD/UPD handling, `hb_install.sh`-based self-update, daemonize, syslog — is identical to the full client.
 