From babb5d61aa0c42b257e7b1616542f11ee6b5f8dc Mon Sep 17 00:00:00 2001
From: Andreas Wrede <andreas@wrede.ca>
Date: Mon, 4 May 2026 12:46:35 +0200
Subject: [PATCH] docs: update README with changes since 917d6a4

- ZFS monitor plugin (zfs_monitor) added to plugin list and features
- nagios_runner: async execution, stderr capture, signal handling, path validation
- Threshold alerting: de-escalation suppression, short-duration suppression, ping_monitor thresholds
- Per-host watch flag and role-filtered dashboards
- HTTP API & Web UI: hostname links in Live View, Host Overview with ZFS renderer, alert pie chart in nav bar, Settings threshold viewer
- hbc connection retry: indefinite retry for IPv4; IPv6 dropped after 3 early startup failures
- hbc daemon mode: logs routed to syslog after daemonizing
- hbc_mini: noted zfs_monitor and IPv6 early-fail protection not available

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---
 README.md | 26 +++++++++++++++++++++-----
 1 file changed, 21 insertions(+), 5 deletions(-)

diff --git a/README.md b/README.md
index 1194aa2..4c698bd 100644
--- a/README.md
+++ b/README.md
@@ -27,6 +27,7 @@ A lightweight daemon that listens for UDP heartbeat messages and acts on them: k
   - Configurable retention and backup management
 - **Plugin system for extensible monitoring** ✅
   - Collect system metrics (CPU, memory, disk, network)
+  - Monitor ZFS pool health, capacity, and I/O via `zpool(8)`
   - Execute existing Nagios monitoring plugins
   - Create custom plugins with simple Python classes
 - **Threshold alerting system** ✅
@@ -34,6 +35,8 @@ A lightweight daemon that listens for UDP heartbeat messages and acts on them: k
   - Hysteresis to prevent alert flapping
   - Automatic notifications on state changes
   - Re-notification for ongoing alerts
+- **Per-host watch flag**: Set `watch: false` on any host to silence all notifications for that host without removing its configuration ✅
+- **Role-filtered dashboards**: Live View and Host Overview show only hosts where the logged-in user is owner or manager (admins see all) ✅
 - Modular codebase suitable for unit testing and CI ✅
 
 ---
@@ -61,12 +64,16 @@ Heartbeat includes a comprehensive plugin architecture that extends monitoring b
 - `network_monitor`: Monitors network interface statistics, bandwidth, and connections
 - `filesystem_info`: Collects mounted filesystem information (physical filesystems only by default)
 - `nagios_runner`: Executes Nagios monitoring plugins (check_disk, check_load, check_http, etc.)
+- `zfs_monitor`: Monitors ZFS pool health, capacity, fragmentation, dedup ratio, and cumulative I/O via `zpool(8)`
 
 ### Nagios Integration
 
 The `nagios_runner` plugin provides seamless integration with the vast Nagios plugin ecosystem. You can run any Nagios-compatible plugin and have the results automatically parsed and stored:
 
-- Executes plugins via subprocess with timeout protection
+- Executes plugins asynchronously (non-blocking) with timeout protection
+- Captures both stdout and stderr; if stdout is empty, stderr is used as the status message
+- Handles signal-killed processes (negative exit code → UNKNOWN status)
+- Validates absolute command paths at startup and warns on missing or non-executable files
 - Parses exit codes (OK/WARNING/CRITICAL/UNKNOWN)
 - Extracts performance data with thresholds
 - Reports aggregated status across all configured checks
@@ -147,9 +154,11 @@ Heartbeat includes a sophisticated threshold alerting system that monitors plugi
 - **Multi-level alerts**: WARNING and CRITICAL severity levels
 - **Flexible operators**: Support for >, >=, <, <=, ==, != comparisons
 - **Hysteresis**: Prevents alert flapping with configurable recovery thresholds
-- **Smart notifications**: Alerts only on state changes, not every check
+- **Smart notifications**: Alerts only on state changes, not every check; de-escalations (e.g. CRITICAL → WARNING) do not generate a notification
 - **Re-notifications**: Periodic reminders for ongoing alerts
+- **Short-duration suppression**: Recovery notifications are suppressed when the host was down for less than 4 seconds (avoids noise from transient blips)
 - **Journal integration**: All threshold events logged for audit trail
+- **`ping_monitor` thresholds**: Latency and packet-loss thresholds use the same format as all other plugin metrics
 
 ### Configuration
 
@@ -363,9 +372,10 @@ Heartbeat includes a built-in HTTP/WebSocket server that provides both a REST AP
 ### Web Dashboards
 
 - **Login** (`/login`): Browser login form (shown automatically when auth is configured)
-- **Live View** (`/live`): Real-time host connectivity, latency, and messages
-- **Plugin Metrics** (`/plugins`): Browse and visualize metrics from all plugins
-- **Alerts Dashboard** (`/alerts`): Monitor active alerts with severity filtering
+- **Live View** (`/live`): Real-time host connectivity, latency, and messages; hostnames link directly to the Host Overview page
+- **Host Overview** (`/plugins/<host>`): Per-host plugin metrics with ZFS pool visualization; filtered to hosts where the logged-in user is owner or manager (admins see all)
+- **Alerts Dashboard** (`/alerts`): Monitor active alerts with severity filtering; alert count pie chart shown in the navigation bar
+- **Settings** (`/settings`): Server configuration, user management, and threshold configuration viewer
 
 ### API Endpoints
 
@@ -476,6 +486,10 @@ plugins:
 
 All monitoring plugins default to 5-minute (300 second) intervals, but can be customized as needed.
 
+**Connection retry:** If a server is temporarily unreachable, `hbc` retries `open()` on every heartbeat interval. IPv4 connections retry indefinitely; IPv6 connections that have never succeeded are dropped after 3 consecutive failures during early startup (to handle hosts without IPv6 routing).
+
+**Daemon logging:** When running with `-d`, `hbc` routes all log output to syslog (`LOG_DAEMON` facility) after daemonizing. Without `-d`, logs go to stderr as usual.
+
 ### hbc_mini — single-file client (no external dependencies)
 
 `scripts/hbc_mini.py` is a self-contained version of the heartbeat client that requires only Python 3.8+ and no external packages. Copy it to any host and run it directly — no virtualenv, no `pip install`.
@@ -531,8 +545,10 @@ python3 hbc_mini.py -m "maintenance starting" your-server.example.com
 
 - No YAML config (use JSON instead)
 - No `filesystem_info` plugin
+- No `zfs_monitor` plugin (requires `zpool(8)` and the full plugin loader)
 - `cpu_monitor` does not report per-core usage or CPU frequency (no psutil)
 - Plugins cannot be loaded from external `.py` files — all plugins are compiled in
+- No IPv6 early-fail protection — connections that fail to open at startup are silently skipped rather than retried
 
 Everything else — heartbeat protocol, ACK/CMD/UPD handling, `hb_install.sh`-based self-update, daemonize, syslog — is identical to the full client.