diff --git a/README.md b/README.md index 2e01cd3..86d973b 100644 --- a/README.md +++ b/README.md @@ -1,195 +1,354 @@ -# Heartbeat Daemon (hbd) โœ… +# Heartbeat Daemon (hbd) -A lightweight daemon that listens for UDP heartbeat messages and acts on them: keeps host state, optionally updates DNS records via `nsupdate`, forwards messages to WebSocket clients, and sends notifications (email, Pushover, Mattermost, Signal). It is a refactor of a previously monolithic script into a modular Python package (`hbd`). +A lightweight UDP-based host monitoring system. Monitored hosts run a client (`hbc`) that sends periodic heartbeat packets and system metrics to a central server (`hbd`). The server tracks host reachability, evaluates metric thresholds, sends notifications, and serves a web dashboard. --- -## ๐Ÿ“Œ Features +## Architecture -- Receive and parse heartbeat datagrams (text or zlib-compressed) โœ… -- Maintain host state and detect up/down transitions โœ… -- Queue DNS updates via `nsupdate` and run them in an asyncio background task โœ… -- WebSocket API for live updates (hosts & messages) โœ… -- Notification pipeline (email, Pushover, Mattermost, Signal) โœ… -- **User management & access control** โœ… - - Optional user accounts with bcrypt-style password hashing (stdlib only) - - Per-host roles: owner, manager, monitor - - Session-based auth with cookie support (browser login page included) - - Backwards compatible: no auth required when no users are configured -- **HTTP API & Web UI** โœ… - - REST API for plugin data, alerts, host information, and user management - - Live dashboard with WebSocket updates - - Interactive plugin metrics visualization - - Alerts dashboard with filtering and summaries -- **Message journal with automatic log rotation** โœ… - - Logs all received messages in JSON format - - Size-based automatic rotation - - Configurable retention and backup management -- **Plugin system for extensible monitoring** โœ… - - Collect system metrics (CPU, memory, 
disk, network) - - Monitor ZFS pool health, capacity, and I/O via `zpool(8)` - - Execute existing Nagios monitoring plugins - - Create custom plugins with simple Python classes -- **Threshold alerting system** โœ… - - Monitor metrics against configurable WARNING/CRITICAL thresholds - - Hysteresis to prevent alert flapping - - Automatic notifications on state changes - - Re-notification for ongoing alerts -- **Per-host watch flag** โ€” set `watch: false` on any host to silence all notifications for that host without removing its configuration โœ… -- **Role-filtered dashboards** โ€” Live Dashboard and Host Overview show only hosts where the logged-in user is owner or manager (admins see all) โœ… -- Modular codebase suitable for unit testing and CI โœ… +``` + [ host running hbc ] [ server running hbd ] + โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” + โ”‚ heartbeat client โ”‚ UDP 50003 โ”‚ heartbeat daemon โ”‚ + โ”‚ โ”‚ โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€> โ”‚ โ”‚ + โ”‚ plugins: โ”‚ HTB / PLG โ”‚ host state tracking โ”‚ + โ”‚ - cpu_monitor โ”‚ โ”‚ threshold evaluation โ”‚ + โ”‚ - memory_monitor โ”‚ <โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€ โ”‚ DNS updates (nsupdate) โ”‚ + โ”‚ - disk_monitor โ”‚ ACK/CMD/UPD โ”‚ notifications โ”‚ + โ”‚ - nagios_runner โ”‚ โ”‚ web dashboard + REST API โ”‚ + โ”‚ - ... 
│ │ WebSocket live updates │ + └────────────────────┘ └────────────────────────────┘ +``` + +**Package:** `hbd` v5.3.4 +**Python:** 3.11+ + +### Subpackages + +| Package | Purpose | +|---|---| +| `hbd.common` | Protocol encoding/decoding, shared utilities | +| `hbd.server` | The `hbd` daemon | +| `hbd.client` | The `hbc` client | --- -## 🔌 Plugin System +## Installation -Heartbeat includes a comprehensive plugin architecture that extends monitoring beyond simple heartbeats. The plugin system allows you to: +Dependencies are declared in `pyproject.toml`. Install into a virtualenv: -- **Collect system information**: OS details, hardware info, system configuration -- **Monitor resources**: CPU usage, memory, disk space, network statistics -- **Run Nagios plugins**: Execute thousands of existing Nagios monitoring plugins without modification -- **Create custom plugins**: Build your own monitoring logic with simple Python classes +```bash +# Server + client +pip install . 
-### Plugin Types +# Using the install script +scripts/hb_install.sh +``` -- **InfoPlugin**: Collects static information once (e.g., OS version, hardware specs) -- **MonitorPlugin**: Collects metrics periodically (e.g., CPU usage every 30 seconds) +**Entry points:** +- `hbd` — server (`hbd.server.cli:main`) +- `hbc` — client (`hbd.client.main:main`) -### Built-in Plugins +**Runtime dependencies:** -- `os_info`: Collects OS, kernel, distribution, and architecture information -- `cpu_monitor`: Monitors CPU usage, load average, frequency, process counts, and uptime -- `memory_monitor`: Monitors RAM and swap usage, available memory (ZFS ARC-aware) -- `disk_monitor`: Monitors disk usage, I/O statistics, and filesystem metrics -- `network_monitor`: Monitors network interface statistics, bandwidth, and connections -- `ping_monitor`: Measures round-trip latency to configured hosts -- `filesystem_info`: Collects mounted filesystem information (physical filesystems only by default) -- `nagios_runner`: Executes Nagios monitoring plugins (check_disk, check_load, check_http, etc.) -- `zfs_monitor`: Monitors ZFS pool health, capacity, fragmentation, dedup ratio, and cumulative I/O via `zpool(8)` +| Component | Packages | +|---|---| +| Both | PyYAML ≥6.0 | +| Client | psutil ≥5.9.0 | +| Server | aiohttp ≥3.11, websockets ≥13.2, Jinja2 ≥3.1.6, ruamel.yaml ≥0.18, mattermostdriver ≥7.3.0, matrix-nio ≥0.24 | -### Nagios Integration +--- -The `nagios_runner` plugin provides seamless integration with the vast Nagios plugin ecosystem. 
You can run any Nagios-compatible plugin and have the results automatically parsed and stored: +## Server (`hbd`) -- Executes plugins asynchronously (non-blocking) with timeout protection -- Captures both stdout and stderr; if stdout is empty, stderr is used as the status message -- Handles signal-killed processes (negative exit code โ†’ UNKNOWN status) -- Validates absolute command paths at startup and warns on missing or non-executable files -- Parses exit codes (OK/WARNING/CRITICAL/UNKNOWN) -- Extracts performance data with thresholds -- Reports per-check status, exit code, and output; no aggregate rollup field +### Starting the server -See [docs/NAGIOS_INTEGRATION.md](docs/NAGIOS_INTEGRATION.md) for complete integration guide including configuration examples and custom plugin development. +```bash +# Foreground, verbose, with config file +hbd serve -c /etc/hb.yaml -f -v -### Creating Custom Plugins +# As a module +python -m hbd.server.cli serve -c /etc/hb.yaml +``` + +### CLI subcommands + +| Command | Description | +|---|---| +| `hbd serve` | Start the daemon (default) | +| `hbd passwd ` | Generate a password hash for config | +| `hbd notify` | Test notification channels | +| `hbd stop` | Stop a running daemon | +| `hbd reload` | Reload config (send SIGHUP) | +| `hbd restart` | Restart daemon | + +### Configuration (`~/.hb.yaml`) + +```yaml +# Network +hb_port: 50003 # UDP port for heartbeat messages +hbd_port: 50004 # HTTP API / web UI port +hbd_host: "" # Bind address (empty = all interfaces) +ws_port: 50005 # WebSocket port (plain) +wss_port: ~ # WebSocket port (TLS; requires cert_path/wss_pem/wss_key) + +# Timing +interval: 20 # Expected heartbeat interval (seconds) +grace: 2 # Extra seconds before declaring a host overdue + +# Persistence +pickfile: ~/.hb.pick # Host state persistence +pidfile: ~/.hb.pid +logfile: ~/.hb.log + +# Message journal +journal_enabled: true +journal_dir: /var/log/heartbeat +journal_file: messages.journal +journal_max_size: 
104857600 # 100 MB +journal_max_backups: 10 + +# DNS +nsupdate_bin: /usr/bin/nsupdate +dyndomains: + - example.com + +# Threshold alert re-notification interval (seconds) +threshold_renotify_interval: 3600 + +# Notification channels +notification_channels: + pushover_ops: + type: pushover + token: YOUR_APP_TOKEN + user: YOUR_USER_KEY + email_ops: + type: email + smtp_server: smtp.example.com + port: 587 + user: alerts@example.com + password: secret + recipients: [ops@example.com] + +# Users +users: + alice: + full_name: Alice Smith + password: pbkdf2:sha256:... # generate with: hbd passwd alice + admin: true + notification_channels: [pushover_ops] + bob: + password: pbkdf2:sha256:... + notification_channels: [email_ops] + +default_owner: alice + +# Hosts +hosts: + webserver01: + dyndns: true # Update DNS when address changes + owner: alice + managers: [bob] + monitors: [] + database01: + watch: false # Suppress all notifications for this host +``` + +Send SIGHUP (or `hbd reload`) to reload configuration without restarting. Changes to ports, certificates, pickle path, and journal path require a full restart. + +### Persistence + +Host state (reachability, plugin data, alert states) is saved to `pickfile` every 5 minutes and on clean shutdown. The server loads this state on startup. 
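A periodic save like the one described under Persistence is commonly written atomically, so that a crash in the middle of a write cannot leave a truncated `pickfile`. The following is a minimal sketch under stated assumptions, not the daemon's actual code: it assumes the host state is a plain picklable dict, and `save_state` / `load_state` are hypothetical helper names.

```python
import contextlib
import os
import pickle
import tempfile


def save_state(state, path):
    """Write state atomically: dump to a temp file, then rename over the target."""
    path = os.path.expanduser(path)
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the same directory so os.replace stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".hb-state-")
    try:
        with os.fdopen(fd, "wb") as handle:
            pickle.dump(state, handle)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_path)
        raise


def load_state(path):
    """Return the saved state, or an empty dict when no file exists yet."""
    path = os.path.expanduser(path)
    try:
        with open(path, "rb") as handle:
            return pickle.load(handle)
    except FileNotFoundError:
        return {}
```

In a daemon, a helper like this would typically be called from a background task every 300 seconds and once more from the shutdown handler.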
+ +--- + +## Client (`hbc`) + +### Usage + +```bash +# Basic โ€” send heartbeats to a server +hbc your-server.example.com + +# Multiple servers +hbc server1.example.com server2.example.com + +# With config file, running as a daemon +hbc -d -c /etc/hbc.yaml your-server.example.com + +# Send a boot message, then heartbeat normally +hbc -b your-server.example.com + +# One-off message +hbc -m "maintenance starting" your-server.example.com + +# Force IPv4 or IPv6 only +hbc -4 your-server.example.com +hbc -6 your-server.example.com +``` + +### Options + +| Flag | Description | +|---|---| +| `-b`, `--boot` | Send a boot message at startup | +| `-c`, `--config FILE` | Config file path (default: `~/.hbc.yaml`) | +| `-d`, `--daemon` | Daemonize (logs go to syslog) | +| `-m`, `--message TEXT` | Send a one-off message and exit | +| `-n`, `--name NAME` | Override reported hostname | +| `-v`, `--verbose` | Verbose output | +| `-x`, `--debug` | Debug level (repeatable) | +| `-4` / `-6` | Restrict to IPv4 or IPv6 | + +### Configuration (`~/.hbc.yaml`) + +```yaml +hb_port: 50003 # Server UDP port +interval: 10 # Heartbeat interval (seconds) +owner: alice # Optional: claim ownership of this host + +plugins: + cpu_monitor: + interval: 300 # Override collection interval + per_core: true # Report per-core CPU usage + memory_monitor: + interval: 300 + disk_monitor: + interval: 300 + network_monitor: + interval: 300 + ping_monitor: + interval: 60 + hosts: [8.8.8.8, 192.168.1.1] + nagios_runner: + interval: 300 + commands: + - name: check_load + command: /usr/lib/nagios/plugins/check_load -w 5,4,3 -c 10,8,6 + - name: check_disk_root + command: /usr/lib/nagios/plugins/check_disk -w 20% -c 10% -p / + zfs_monitor: + interval: 300 +``` + +### Connection behaviour + +- The client sends heartbeats over UDP to each server address resolved from the hostname (IPv4 and IPv6). +- If a connection fails to open at startup, IPv6 connections are dropped after 3 consecutive failures. 
IPv4 connections retry indefinitely. +- In daemon mode (`-d`), all log output goes to syslog (`LOG_DAEMON` facility). + +--- + +## UDP Protocol + +All messages are zlib-compressed key=value pairs with an ID prefix. + +``` +!: +``` + +Payload format: `key=value;key=value;...` + +| Message | Direction | Purpose | +|---|---|---| +| `HTB` | client → server | Heartbeat (name, timestamp, RTT, acks, interval) | +| `PLG` | client → server | Plugin data (plugin name + metrics) | +| `ACK` | server → client | Acknowledgment | +| `CMD` | server → client | Execute a shell command on the client | +| `UPD` | server → client | Trigger self-update via `hb_install.sh` | + +Value encoding: +- Floats: 5 decimal places +- Lists/dicts: JSON prefixed with `@` +- Booleans: `1` / `0` + +RTT is measured using kernel SO_TIMESTAMP when available (Linux, macOS, FreeBSD), falling back to application-layer timing. + +--- + +## Plugin System + +Plugins run on the client and collect system metrics that are sent to the server as `PLG` messages. 
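To make the `PLG` payload more concrete, the value-encoding rules from the UDP Protocol section can be sketched as follows. This is an illustration, not the code in `hbd.common`: the function names are hypothetical, the compact JSON separators are an assumption, the ID-prefix framing is left out, and the sketch ignores how `;` or `=` inside a value would be escaped.

```python
import json
import zlib


def encode_value(value):
    """Encode one value following the README's value-encoding rules."""
    # bool must be tested before numbers: bool is a subclass of int.
    if isinstance(value, bool):
        return "1" if value else "0"
    if isinstance(value, float):
        return f"{value:.5f}"  # floats: 5 decimal places
    if isinstance(value, (list, dict)):
        return "@" + json.dumps(value, separators=(",", ":"))  # JSON prefixed with @
    return str(value)


def encode_payload(fields):
    """Join fields as key=value;key=value;..."""
    return ";".join(f"{key}={encode_value(value)}" for key, value in fields.items())


def decode_payload(text):
    """Split a payload back into a dict; only @-prefixed values are parsed."""
    fields = {}
    for pair in text.split(";"):
        key, _, raw = pair.partition("=")
        fields[key] = json.loads(raw[1:]) if raw.startswith("@") else raw
    return fields


payload = encode_payload(
    {"plugin": "cpu_monitor", "cpu_percent": 12.5, "per_core": True, "load": [0.4, 0.3]}
)
# The payload is zlib-compressed before it goes on the wire.
compressed = zlib.compress(payload.encode("utf-8"))
```

Note that the decoder cannot tell a boolean `1` from an integer `1`; a real implementation needs type knowledge on the receiving side.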
+ +### Plugin types + +| Type | `interval` | When collected | +|---|---|---| +| `InfoPlugin` | 0 | Once at startup; re-collected on server request | +| `MonitorPlugin` | 30 (default) | Periodically on the configured interval | + +### Built-in plugins + +| Plugin | Type | Data collected | +|---|---|---| +| `os_info` | Info | OS, kernel, distro, architecture, Python version, hbc version | +| `cpu_monitor` | Monitor | cpu_percent, per-core usage, load averages, process count, frequency | +| `memory_monitor` | Monitor | RAM and swap usage (ZFS ARC-aware) | +| `disk_monitor` | Monitor | Per-partition usage, disk I/O stats | +| `network_monitor` | Monitor | Per-interface byte/packet counts, connection count | +| `ping_monitor` | Monitor | RTT, packet loss, jitter per configured host | +| `filesystem_info` | Info | Mounted filesystems (excludes pseudo filesystems) | +| `nagios_runner` | Monitor | Output of configured Nagios-compatible check commands | +| `zfs_monitor` | Monitor | ZFS pool health, capacity, fragmentation, dedup ratio, I/O | + +### Custom plugins + +Create a `.py` file in `hbd/client/plugins/`: ```python from hbd.client.plugin import MonitorPlugin -class DiskMonitorPlugin(MonitorPlugin): - name = "disk_monitor" - interval = 60 # Run every 60 seconds - +class MyPlugin(MonitorPlugin): + name = "my_plugin" + interval = 60 + async def collect(self): - return { - "disk_usage": get_disk_usage(), - "timestamp": time.time() - } + return {"my_metric": 42} ``` -Place plugins in `hbd/client/plugins/` and they'll be automatically discovered and loaded by the client. +`initialize()` is called once at load time; return `False` to disable the plugin (e.g., if a required binary is missing). ---- +### Nagios integration -## ๐Ÿ“ Message Journal - -Heartbeat includes a message journal that logs all received messages with automatic rotation. 
- -### Features - -- **JSON Format**: All messages logged in JSONL (JSON Lines) format for easy parsing -- **Automatic Rotation**: Size-based rotation with configurable thresholds -- **Backup Management**: Keeps configurable number of rotated log files -- **Non-blocking**: Async logging with minimal performance impact - -### Configuration +The `nagios_runner` plugin executes any Nagios-compatible check binary: ```yaml -# Message journal settings -journal_enabled: true # Enable/disable journaling -journal_dir: /var/log/heartbeat # Journal directory -journal_file: messages.journal # Base filename -journal_max_size: 104857600 # Max size (100MB default) -journal_max_backups: 10 # Number of backups to keep +plugins: + nagios_runner: + commands: + - name: check_http + command: /usr/lib/nagios/plugins/check_http -H example.com ``` -### Example Journal Entry - -```json -{"timestamp":1711234567.123,"datetime":"2026-03-28T12:34:56","source_ip":"192.168.1.100","source_port":50003,"message":{"ID":"HTB","name":"webserver1","interval":30}} -``` - -### Analyzing Journal Files - -```bash -# View recent messages -tail -100 /var/log/heartbeat/messages.journal | jq . - -# Count messages by type -cat /var/log/heartbeat/messages.journal | jq -r '.message.ID' | sort | uniq -c - -# Filter by hostname -cat /var/log/heartbeat/messages.journal | jq 'select(.message.name == "webserver1")' -``` - -See [docs/MESSAGE_JOURNAL.md](docs/MESSAGE_JOURNAL.md) for complete documentation including rotation behavior, integration with log management systems, and analysis examples. +- Commands are validated (absolute paths, executable) at startup. +- Exit codes map to OK / WARNING / CRITICAL / UNKNOWN. +- Performance data fields are extracted and stored individually. +- The `nagios` threshold operator maps exit codes directly to alert levels (see Threshold Alerting). 
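The exit-code mapping and performance-data extraction listed above can be sketched as follows. This is a simplified illustration rather than the `nagios_runner` implementation: `parse_check_result` and the keys of the returned dict are hypothetical, and the regular expression handles only the common `label=value[unit];warn;crit;min;max` form of Nagios performance data.

```python
import re

STATUS_BY_EXIT_CODE = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}

# Matches 'quoted label'=12.5ms or bare_label=12.5ms; thresholds after ';' are skipped.
PERFDATA_RE = re.compile(r"(?:'([^']+)'|([^\s=']+))=([-+]?\d*\.?\d+)([^;\s]*)")


def parse_check_result(exit_code, stdout):
    """Turn a check's exit code and stdout into a status plus numeric perfdata."""
    # Anything outside 0-3 (including negative codes from signals) is UNKNOWN.
    status = STATUS_BY_EXIT_CODE.get(exit_code, "UNKNOWN")
    text, _, perf = stdout.partition("|")
    perfdata = {}
    for quoted, bare, value, _unit in PERFDATA_RE.findall(perf):
        perfdata[quoted or bare] = float(value)
    return {
        "status": status,
        "status_code": exit_code,
        "output": text.strip(),
        "perfdata": perfdata,
    }


result = parse_check_result(
    1, "DISK WARNING - free space: / 2643 MB (44%) | /=2643MB;5948;5958;0;5968"
)
```

Here `result["status"]` is `"WARNING"` and `result["perfdata"]` holds the single numeric field for `/`.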
--- -## ๐Ÿšจ Threshold Alerting +## Threshold Alerting -Heartbeat includes a sophisticated threshold alerting system that monitors plugin metrics and triggers notifications when values exceed configured limits. - -### Features - -- **Multi-level alerts**: WARNING and CRITICAL severity levels -- **Flexible operators**: Support for >, >=, <, <=, ==, != comparisons -- **Hysteresis**: Prevents alert flapping with configurable recovery thresholds -- **Smart notifications**: Alerts only on state changes, not every check; de-escalations (e.g. CRITICAL โ†’ WARNING) do not generate a notification -- **Re-notifications**: Periodic reminders for ongoing alerts -- **Short-duration suppression**: Recovery notifications are suppressed for down events under 4 seconds (avoids noise from transient blips) -- **Journal integration**: All threshold events logged for audit trail -- **`ping_monitor` thresholds**: Latency and packet-loss thresholds use the same format as all other plugin metrics +The server evaluates plugin metrics against configurable thresholds and fires notifications on state changes. 
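The `hysteresis` setting used in the configuration below can be read as a shifted recovery limit: once a threshold is breached, the value has to move past `threshold * (1 - hysteresis)` before the alert clears. The sketch below shows that idea for a single threshold. The function name is hypothetical, the behaviour for `<` and `<=` (shifting the limit upward) is an assumption, and `==` / `!=` are left out.

```python
import operator

COMPARATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


def is_alerting(value, threshold, op=">", hysteresis=0.0, was_alerting=False):
    """Return True while the value counts as breaching the threshold."""
    compare = COMPARATORS[op]
    if was_alerting and hysteresis:
        # While alerting, move the limit toward the safe side so the value
        # has to recover clearly before the alert clears.
        if op in (">", ">="):
            threshold = threshold * (1 - hysteresis)
        else:
            threshold = threshold * (1 + hysteresis)  # assumed mirror case
    return compare(value, threshold)


# critical=90 with hysteresis=0.1 gives a recovery limit of about 81.
alerting = False
history = []
for cpu_percent in (91.0, 85.0, 79.0):
    alerting = is_alerting(cpu_percent, 90.0, ">", 0.1, alerting)
    history.append(alerting)
```

With these inputs the alert stays active at 85 and clears at 79, which matches the `recover at 81 when critical=90` comment in the configuration example.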
### Configuration ```yaml thresholds: - # RTT (Round-Trip Time) thresholds for heartbeat monitoring - # These are checked on every HTB message arrival - rtt: - webserver01: - warning: 100.0 # Warn when RTT > 100ms - critical: 500.0 # Critical when RTT > 500ms - - database01: - warning: 50.0 - critical: 200.0 - - # Plugin metric thresholds cpu_monitor: cpu_percent: - warning: 80.0 # Warn when CPU > 80% - critical: 90.0 # Critical when CPU > 90% - operator: ">" - hysteresis: 0.02 # 2% hysteresis to prevent flapping - display: "(threshold: {op_symbol} {threshold_value}%)" # optional - + warning: 80.0 + critical: 90.0 + operator: ">" # >, >=, <, <=, ==, != (default: >) + hysteresis: 0.1 # 10%: recover at 81 when critical=90 + count: 1 # Require N consecutive breaches before alerting + display: "CPU {cpu_percent}% (threshold: {op_symbol}{threshold_value})" + memory_monitor: percent: warning: 85.0 critical: 95.0 - + disk_monitor: partitions: /: @@ -197,142 +356,19 @@ thresholds: warning: 80.0 critical: 90.0 free_gb: - warning: 10.0 # Alert when < 10GB free + warning: 10.0 critical: 5.0 - operator: "<" # Inverse threshold + operator: "<" -# Global settings -threshold_renotify_interval: 3600 # Re-notify every hour for ongoing alerts + nagios_runner: + status_code: + operator: "nagios" # 0=OK 1=WARNING 2=CRITICAL 3=UNKNOWN + display: "{check_name}: {output}" ``` -### RTT Monitoring +### Per-host threshold profiles -Heartbeat monitors network latency (Round-Trip Time) for each host's heartbeat messages. 
RTT thresholds are **fully integrated with the threshold alerting system**: - -- **Per-host configuration**: Set different thresholds for each monitored host -- **Real-time checking**: Thresholds evaluated on every HTB message arrival -- **Alert state tracking**: RTT alerts use the same state management as plugin metrics -- **Hysteresis support**: Configurable hysteresis prevents rapid state transitions -- **Alerts dashboard**: RTT alerts visible on the `/alerts` web page alongside plugin alerts -- **Smart notifications**: Only triggers on state changes (OK โ†’ WARNING โ†’ CRITICAL) -- **Re-notification**: Periodic reminders for ongoing RTT issues -- **Event & journal logging**: All RTT events logged for audit trail - -**Configuration format:** -```yaml -thresholds: - rtt: - : - warning: # Warn when RTT > this value - critical: # Critical when RTT > this value - hysteresis: 0.02 # Optional: 2% hysteresis (default) -``` - -**Example alerts:** -``` -WARNING: webserver01 - rtt.webserver01 = 125.3 -CRITICAL: database01 - rtt.database01 = 520.1 -RECOVERED: webserver01 - rtt.webserver01 = 45.2 (WARNING -> OK) -``` - -RTT alerts appear on the Alerts dashboard and can be filtered by severity level. The `metric_path` format is `rtt.`, making it easy to distinguish from plugin metrics. - -### Alert Behavior - -1. **State Changes**: Notifications sent when crossing thresholds - - OK โ†’ WARNING: Early notification - - WARNING โ†’ CRITICAL: Escalation - - CRITICAL โ†’ OK: Recovery - -2. **Hysteresis**: Prevents rapid state transitions - ``` - Critical threshold: 90% - Hysteresis: 10% - Recovery threshold: 81% (90 - 10% of 90) - - Value 91% โ†’ CRITICAL (threshold crossed) - Value 85% โ†’ CRITICAL (still above 81%) - Value 79% โ†’ OK (below recovery threshold) - ``` - -3. 
**Re-notifications**: Periodic reminders for ongoing alerts - - Default: Every 60 minutes - - Configurable via `threshold_renotify_interval` - -### Example Notifications - -``` -WARNING: webserver01 - cpu_monitor.cpu_percent = 85.0 -CRITICAL: webserver01 - memory_monitor.percent = 96.0 -RECOVERED: database01 - disk_monitor./.percent = 75.0 (WARNING -> OK) -REMINDER (CRITICAL): mailserver - cpu_monitor.load_1min = 12.5 (ongoing for 3600s) -``` - -### Supported Metrics - -All plugin metrics can be thresholded: - -- **CPU**: cpu_percent, load_1min, load_5min, load_15min -- **Memory**: percent, available_mb, swap_percent -- **Disk**: Per-partition percent, free_gb, free_mb -- **Network**: errors_total, dropped packets, connection counts -- **Nagios**: Any field emitted by `nagios_runner` (`_status_code`, `_status`, `_output`, performance data fields) - -### Display Format Templates - -Each threshold entry accepts an optional `display` field โ€” a Python format string shown in notifications and on the Alerts dashboard: - -```yaml -nagios_runner: - status_code: - warning: 1 - critical: 2 - operator: ">=" - display: "{check_name}: exit {value} (expected < {threshold_value})" -``` - -Available variables: - -| Variable | Description | -|---|---| -| `{value}` | Current metric value | -| `{threshold_value}` | Threshold that was crossed | -| `{op_symbol}` | Comparison operator (`>`, `<`, `>=`, โ€ฆ); `"nagios"` for the nagios operator | -| `{check_name}` | Prefix stripped by generic matching (see below) | -| `{metric_name}` | Full field name within the plugin data | -| `{output}` | For `nagios_runner` generic matches: the matched check's status text (alias for `{check_name}_output`) | -| `{status}` | For `nagios_runner` generic matches: the matched check's status name โ€” OK/WARNING/CRITICAL/UNKNOWN (alias for `{check_name}_status`) | -| any plugin field | Any other field present in the plugin's data | - -### Generic Threshold Matching - -When a metric name has no exact 
threshold entry, the server progressively strips leading underscore-separated segments and re-tries the lookup. This lets a single generic entry cover an entire family of metrics. - -The classic use case is `nagios_runner`, which names each metric after the command that produced it: - -``` -nagios_runner.check_disk_root_status_code โ†’ no exact match -nagios_runner.disk_root_status_code โ†’ no match -nagios_runner.root_status_code โ†’ no match -nagios_runner.status_code โ†’ matched โœ“ -``` - -Configure the generic threshold once using the `nagios` operator, which maps exit codes directly to alert severity without requiring numeric warning/critical values: - -```yaml -nagios_runner: - status_code: - operator: "nagios" # 0=OK 1=WARNING 2=CRITICAL 3=UNKNOWN - display: "{check_name}: {output}" -``` - -The stripped prefix (`check_disk_root` in the example above) is available as `{check_name}` in the display template, so you can identify which check triggered the alert without writing a separate threshold entry per command. - -Exact matches always take priority. A generic entry only applies when no specific one is defined. - -### Per-Host Threshold Profiles - -Named threshold configurations let different hosts use different limits. A host's `threshold_config` can be a single name or a **list** โ€” lists are applied left-to-right so profiles compose without duplication: +Named profiles let different hosts use different thresholds. A single name or a list is accepted; lists are applied left-to-right. 
```yaml threshold_configs: @@ -340,49 +376,166 @@ threshold_configs: thresholds: cpu_monitor: cpu_percent: {warning: 80, critical: 90} - memory_monitor: - memory_percent: {warning: 85, critical: 95} - tight_cpu: # override CPU limits only + tight_cpu: thresholds: cpu_monitor: cpu_percent: {warning: 60, critical: 75} - db_disk: # add a database partition check - thresholds: - disk_monitor: - partitions: - /var/lib/postgresql: - percent: {warning: 75, critical: 88} - hosts: web-01: - threshold_config: default # single profile - + threshold_config: default db-01: - threshold_config: [tight_cpu, db_disk] # layered: CPU override + extra disk check + threshold_config: [default, tight_cpu] ``` -Each named config's overrides are applied in order on top of the defaults. Metrics not mentioned in a profile are inherited unchanged. +### Alert states -See [docs/THRESHOLD_ALERTING.md](docs/THRESHOLD_ALERTING.md) for comprehensive documentation including best practices, troubleshooting, and advanced configuration. +| State | Meaning | +|---|---| +| OK | Metric within normal range | +| WARNING | Metric crossed warning threshold | +| CRITICAL | Metric crossed critical threshold | +| UNKNOWN | Cannot determine (e.g. Nagios exit code 3) | + +Notifications are sent on state transitions (OK → WARNING, WARNING → CRITICAL, CRITICAL → OK). De-escalations (CRITICAL → WARNING) do not trigger a notification. Ongoing alerts generate a re-notification every `threshold_renotify_interval` seconds (default: 3600). Alerts can be acknowledged via the web UI or API to suppress re-notifications. + +### RTT thresholds + +The server measures heartbeat round-trip time and supports RTT thresholds using the same format: + +```yaml +thresholds: + rtt: + webserver01: + warning: 100.0 # ms + critical: 500.0 +``` + +### Generic threshold matching + +When a metric has no exact threshold entry, the server strips leading segments and retries. 
This allows one entry to cover all Nagios checks: + +``` +nagios_runner.check_disk_root_status_code → no match +nagios_runner.disk_root_status_code → no match +nagios_runner.root_status_code → no match +nagios_runner.status_code → matched ✓ +``` + +The stripped prefix (`check_disk_root`) is available as `{check_name}` in the `display` template. + +### Display template variables + +| Variable | Description | +|---|---| +| `{value}` | Current metric value | +| `{threshold_value}` | Threshold that was crossed | +| `{op_symbol}` | Comparison operator | +| `{check_name}` | Prefix stripped by generic matching | +| `{metric_name}` | Full field name | +| `{output}` | Nagios check output text | +| `{status}` | Nagios status name (OK/WARNING/CRITICAL/UNKNOWN) | +| any plugin field | Any field present in the plugin's data | --- -## 👥 User Management +## Notification Channels -Heartbeat supports optional user accounts with role-based access control per host. +Notifications are dispatched to the host's owner, managers, and monitors. Each user specifies which channels to use. + +### Supported channel types + +| Type | Required fields | +|---|---| +| `pushover` | `token`, `user` | +| `email` | `smtp_server`, `recipients`, `sender`, `user`, `password`, `port` | +| `mattermost` | `webhook_url`, `channel` | +| `matrix` | `homeserver`, `user`, `password`, `room_id` | +| `signal` | `phone_number`, `recipient` | +| `sms_voipms` | `api_key`, `recipient` | + +Each channel can set a `min_level` (`WARNING` or `CRITICAL`) to filter low-severity alerts. + +Recovery notifications are only sent to channels that received the original alert. + +--- + +## Web Dashboard & HTTP API + +The server exposes a web UI and REST API on `hbd_port` (default 50004). 
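The REST API on `hbd_port` can also be driven from Python with only the standard library. The sketch below mirrors the login and bearer-token flow of the `curl` example in the REST API section. The helper names are hypothetical, and `localhost` is an assumption about where the server runs.

```python
import json
import urllib.request

BASE_URL = "http://localhost:50004"  # default hbd_port; the host is an assumption


def build_login_request(username, password, base_url=BASE_URL):
    """Build the POST request that creates a session."""
    body = json.dumps({"username": username, "password": password}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/0/auth/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_api_request(path, token, base_url=BASE_URL):
    """Build an authenticated GET request for a path under /api/0/."""
    return urllib.request.Request(
        f"{base_url}/api/0/{path.lstrip('/')}",
        headers={"Authorization": f"Bearer {token}"},
    )


def fetch_json(request, timeout=10):
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Against a running server (not executed here):
# token = fetch_json(build_login_request("alice", "secret"))["token"]
# hosts = fetch_json(build_api_request("hosts", token))
```

Keeping request construction separate from `fetch_json` makes the URL and header handling testable without a running daemon.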
+ +### Web pages + +| Path | Description | +|---|---| +| `/login` | Login form (shown automatically when auth is configured) | +| `/live` | Real-time host connectivity, RTT, and message stream | +| `/plugins/` | Per-host plugin metrics | +| `/alerts` | Active alerts with severity filtering | +| `/settings` | Server config, users, notification channels, thresholds | + +Live views use WebSocket connections for real-time updates. + +Non-admin users see only hosts where they have a role (monitor, manager, or owner). Admins see all hosts. + +### REST API + +All endpoints are under `/api/0/`. When authentication is configured, include a session token: + +```bash +# Log in, get a token +TOKEN=$(curl -s -X POST http://localhost:50004/api/0/auth/login \ + -H 'Content-Type: application/json' \ + -d '{"username":"alice","password":"secret"}' | jq -r .token) + +# Use the token +curl -H "Authorization: Bearer $TOKEN" http://localhost:50004/api/0/hosts +``` + +| Method | Endpoint | Description | +|---|---|---| +| GET | `/api/0/hosts` | All visible hosts | +| GET | `/api/0/alerts` | All active alerts | +| GET | `/api/0/alert_summary` | Count of ok/warning/critical | +| GET | `/api/0/messages` | Last 30 messages | +| GET | `/api/0/hosts/{host}/plugins` | All plugin data for host | +| GET | `/api/0/hosts/{host}/plugins/{plugin}?limit=N` | Plugin samples | +| GET | `/api/0/hosts/{host}/alerts` | Alert states for host | +| GET | `/api/0/hosts/{host}/access` | Access roles | +| PUT | `/api/0/hosts/{host}/access` | Update access roles | +| GET | `/api/0/hosts/{host}/info` | Host info (hbc version, thresholds) | +| POST | `/api/0/alerts/acknowledge` | Acknowledge alert | +| GET | `/api/0/users` | All users (admin only) | +| GET | `/api/0/users/me` | Current user profile | +| PUT | `/api/0/users/me` | Update own profile | +| POST | `/api/0/auth/login` | Create session | +| POST | `/api/0/auth/logout` | Destroy session | +| GET | `/api/0/config` | Server config (secrets redacted) | +| 
POST | `/api/0/config` | Update config | +| GET | `/api/0/config/backups` | List config backups | +| POST | `/api/0/config/rollback` | Roll back to previous config | +| GET | `/api/0/notification_channels` | List channels | +| POST | `/api/0/notification_channels` | Create channel | +| PUT | `/api/0/notification_channels/{name}` | Update channel | +| DELETE | `/api/0/notification_channels/{name}` | Delete channel | + +--- + +## User Management & Authentication + +When no `users:` block is in config, the server runs unauthenticated โ€” all existing behaviour is preserved. ### Roles -- **monitor** โ€” view status, plugin data, alerts -- **manager** โ€” monitor + queue commands, trigger DNS, queue upgrades -- **owner** โ€” manager + drop host, transfer ownership, update access -- **admin** (user flag) โ€” owner-level access on every host +| Role | Capabilities | +|---|---| +| monitor | View status, plugin data, alerts | +| manager | monitor + queue commands, trigger DNS, queue upgrades | +| owner | manager + drop host, transfer ownership, update access | +| admin | Owner-level on all hosts + access to server config and users | -When no users are configured the server runs in **unauthenticated mode** โ€” all existing behaviour is unchanged. - -### Quick setup +### Setup ```yaml users: @@ -390,386 +543,213 @@ users: full_name: Alice Smith password: pbkdf2:sha256:... # hbd passwd alice admin: true + notification_channels: [pushover_ops] -default_owner: alice +default_owner: alice # Owns any host with no explicit owner hosts: webserver01: owner: alice managers: [bob] monitors: [carol] - dyndns: true # update DNS record when IP changes ``` -```bash -# Generate a password hash -hbd passwd alice -``` +Password hashing uses PBKDF2-HMAC-SHA256 (260,000 iterations). Sessions expire after 24 hours. -Browser users are redirected to `/login` automatically. The session cookie is set on login, so `fetch()` calls from dashboards work without any JavaScript changes. 
- -See [docs/USERS.md](docs/USERS.md) for complete user management documentation. - ---- - -## ๐ŸŒ HTTP API & Web UI - -Heartbeat includes a built-in HTTP/WebSocket server that provides both a REST API and web-based dashboards for monitoring and visualization. - -### Features - -- **User auth**: Optional session-based authentication with per-host role enforcement -- **REST API**: JSON endpoints for accessing plugin data, alerts, host information, and user management -- **Live Dashboard**: Real-time WebSocket-powered host status view -- **Plugin Metrics**: Interactive visualization of all plugin data with auto-refresh -- **Alerts Dashboard**: Comprehensive alert monitoring with filtering and summaries - -### Web Dashboards - -- **Login** (`/login`): Browser login form (shown automatically when auth is configured) -- **Live View** (`/live`): Real-time host connectivity, latency, and messages; hostnames link directly to the Host Overview page -- **Host Overview** (`/plugins/`): Per-host plugin metrics with ZFS pool visualization; filtered to hosts where the logged-in user is owner or manager (admins see all) -- **Alerts Dashboard** (`/alerts`): Monitor active alerts with severity filtering; alert count pie chart shown in the navigation bar -- **Settings** (`/settings`): Server configuration, user management, and threshold configuration viewer - -### API Endpoints - -```bash -# Log in (when auth is configured) -TOKEN=$(curl -s -X POST http://localhost:50004/api/0/auth/login \ - -H 'Content-Type: application/json' \ - -d '{"username":"alice","password":"secret"}' | jq -r .token) -AUTH="-H \"Authorization: Bearer $TOKEN\"" - -# List all monitored hosts -curl $AUTH http://localhost:50004/api/0/hosts - -# Get all plugin data for a host -curl $AUTH http://localhost:50004/api/0/hosts/webserver01/plugins - -# Get detailed plugin history (last 50 samples) -curl $AUTH "http://localhost:50004/api/0/hosts/webserver01/plugins/cpu_monitor?limit=50" - -# Get alert states for a 
specific host -curl $AUTH http://localhost:50004/api/0/hosts/webserver01/alerts - -# Get all active alerts across all hosts -curl $AUTH http://localhost:50004/api/0/alerts - -# View/update host access roles -curl $AUTH http://localhost:50004/api/0/hosts/webserver01/access -``` - -See [docs/HTTP_API.md](docs/HTTP_API.md) for complete API documentation including response formats, error handling, and integration examples. - ---- - -## ⚙️ Quickstart - -Prerequisites: - -- Python 3.11+ (project uses language features from recent Python) -- `nsupdate` (for DNS updates) if using dynamic DNS - -Install dependencies (recommended into a venv): - -This project now declares its dependencies in `pyproject.toml`. Instead -of the old `requirements.txt` flow, install the package into a virtualenv -using `pip`: - -See `scripts/hb_install.sh` for a way to install. - -Run the daemon (example): - -```bash -# run with default config lookup (~/.hb.yaml) -hbd -c .hb.yaml -f -v -``` - -You can also run it directly via the package entrypoint after installation: - -```bash -python -m hbd.server.cli -c /path/to/config.yaml -``` - -### Running the Client - -The heartbeat client (`hbc`) sends periodic heartbeats and plugin data to the server: - -```bash -# Basic usage pointing to server (host is a positional argument) -hbc your-server.example.com - -# Run as daemon with a config file -hbc -d -c /etc/hbc.yaml your-server.example.com - -# Send a one-off boot message -hbc --boot your-server.example.com - -# Verbose output -hbc -v your-server.example.com - -# Send 'boot' and 'shutdown' messages on start and exit -hbc -b your-server.example.com -``` - -You can also run it via the module entrypoint: - -```bash -python -m hbd.client.main your-server.example.com -``` - -Client configuration can also be specified in YAML (`~/.hbc.yaml`): +OAuth2 login (Gitea) is supported: ```yaml -hb_port: 50003 # Server port (default: 50003) -interval: 30 # Heartbeat interval in seconds -plugins: - cpu_monitor: - 
interval: 300 # Check every 5 minutes (default) - per_core: true - memory_monitor: - interval: 300 # Check every 5 minutes (default) - disk_monitor: - interval: 300 # Check every 5 minutes (default) - network_monitor: - interval: 300 # Check every 5 minutes (default) - nagios_runner: - interval: 300 # Check every 5 minutes (default) - commands: - - name: check_load - command: /usr/lib/nagios/plugins/check_load -w 5,4,3 -c 10,8,6 - - name: check_disk - command: /usr/lib/nagios/plugins/check_disk -w 20% -c 10% -p / +oauth: + gitea: + url: https://git.example.com + client_id: xxx + client_secret: yyy ``` -The server hostname is always passed as a positional command-line argument; there is no `server:` config key. +--- -All monitoring plugins default to 5-minute (300 second) intervals, but can be customized as needed. +## Dynamic DNS -**Connection retry:** If a server is temporarily unreachable, `hbc` retries `open()` indefinitely on every heartbeat interval. IPv6 connections that never succeeded during early startup are dropped after 3 consecutive failures (to handle hosts without IPv6 routing), while IPv4 connections always retry. +When `dyndns: true` is set on a host and `dyndomains` is configured, the server updates DNS via `nsupdate` whenever the host's source address changes. -**Daemon logging:** When running with `-d`, `hbc` routes all log output to syslog (`LOG_DAEMON` facility) after daemonizing. Without `-d`, logs go to stderr as usual. +```yaml +nsupdate_bin: /usr/bin/nsupdate +dyndomains: + - example.com -### hbc_mini: single-file client (no external dependencies) +hosts: + webserver01: + dyndns: true +``` -`scripts/hbc_mini.py` is a self-contained version of the heartbeat client that requires only Python 3.8+ and no external packages. Copy it to any host and run it directly, with no virtualenv and no `pip install`. +DNS updates run asynchronously in a background worker. 
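One way to picture the asynchronous DNS worker is as an `asyncio` queue consumer. This is a rough sketch under stated assumptions, not the code in `hbd.server.dns`: the names `dns_worker`, `build_payload`, `STOP`, and the injected `apply_update` callable are hypothetical, and the 60-second TTL is an arbitrary choice. The sketch only builds the text that would be fed to `nsupdate`; it does not run the binary.

```python
import asyncio

STOP = object()  # hypothetical sentinel that tells the worker to exit


def build_payload(fqdn, ip, ttl=60):
    """Build nsupdate input that replaces the address record for fqdn."""
    # Assumption: an address containing ':' is IPv6 and gets an AAAA record.
    rtype = "AAAA" if ":" in ip else "A"
    return (
        f"update delete {fqdn} {rtype}\n"
        f"update add {fqdn} {ttl} {rtype} {ip}\n"
        "send\n"
    )


async def dns_worker(queue, apply_update):
    """Consume (fqdn, ip) items until the STOP sentinel arrives."""
    while True:
        item = await queue.get()
        if item is STOP:
            break
        fqdn, ip = item
        # apply_update stands in for piping the payload to the nsupdate binary.
        await apply_update(build_payload(fqdn, ip))


async def demo():
    applied = []

    async def fake_apply(payload):
        applied.append(payload)

    queue = asyncio.Queue()
    worker = asyncio.create_task(dns_worker(queue, fake_apply))
    await queue.put(("webserver01.example.com", "192.0.2.10"))
    await queue.put(STOP)
    await worker
    return applied
```

Calling `asyncio.run(demo())` returns the list of payloads the worker would have submitted. Keeping the update function injectable makes the queue logic testable on hosts where `nsupdate` is not installed.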
+ +--- + +## Message Journal + +All received messages are logged in JSONL format with automatic size-based rotation. + +```yaml +journal_enabled: true +journal_dir: /var/log/heartbeat +journal_file: messages.journal +journal_max_size: 104857600 # 100 MB +journal_max_backups: 10 +``` + +Example entry: + +```json +{"timestamp":1711234567.123,"datetime":"2026-03-28T12:34:56","source_ip":"192.168.1.100","source_port":50003,"message":{"ID":"HTB","name":"webserver01","interval":10}} +``` + +--- + +## `hbc_mini`: Zero-dependency client + +`scripts/hbc_mini.py` is a single-file client requiring only Python 3.8+ and no external packages. Copy it to any host and run directly. ```bash -# Basic usage python3 hbc_mini.py your-server.example.com - -# Run as daemon -python3 hbc_mini.py -d your-server.example.com - -# Send a boot message -python3 hbc_mini.py -b your-server.example.com - -# Send a one-off message -python3 hbc_mini.py -m "maintenance starting" your-server.example.com +python3 hbc_mini.py -d your-server.example.com # daemon mode +python3 hbc_mini.py -b your-server.example.com # send boot message ``` -**Config:** `~/.hbc.json` (same keys as `~/.hbc.yaml`, JSON format). Example: +Config: `~/.hbc.json` (JSON format, same keys as `~/.hbc.yaml`). + +**Available plugins:** + +| Plugin | Platform | +|---|---| +| `os_info` | All | +| `ping_monitor` | All | +| `nagios_runner` | All (not Windows) | +| `cpu_monitor` | Linux (`/proc/stat`; no per-core, no frequency) | +| `memory_monitor` | Linux (`/proc/meminfo`) | +| `disk_monitor` | Linux, macOS, BSD (`df -P`) | +| `network_monitor` | Linux (`/proc/net/dev`) | + +Not available vs full `hbc`: no YAML config, no `filesystem_info`, no `zfs_monitor`, no IPv6 early-fail protection. + +--- + +## `hbc_mini.c`: C client + +`scripts/c/hbc_mini.c` is a single-file C port of `hbc_mini.py`. It has no runtime dependencies beyond libc, zlib, pthreads, and libm, and runs on Linux, FreeBSD, NetBSD, and DragonFly BSD. 
+ +### Build + +```bash +cc -O2 -o hbc_mini scripts/c/hbc_mini.c -lz -lpthread -lm +``` + +### Usage + +The CLI is identical to `hbc_mini.py`: + +```bash +./hbc_mini your-server.example.com +./hbc_mini -d your-server.example.com # daemon mode (logs to syslog) +./hbc_mini -b your-server.example.com # send boot message +./hbc_mini -m "note" your-server.example.com # send one-shot message +./hbc_mini -4 your-server.example.com # IPv4 only +./hbc_mini -6 your-server.example.com # IPv6 only +``` + +Config: `~/.hbc.json` (JSON, same keys as the Python version). + +### Architecture + +The C client uses two threads: + +- **Main thread**: heartbeat sender loop + `select()`-based receive loop (1 s timeout). Sends `HTB` at the configured interval, receives `ACK`/`CMD` messages, and re-sends `os_info` on server request. +- **Monitor thread**: all periodic plugins in a single thread with a 1-second sleep loop. Each plugin has its own next-run timestamp tracked independently. + +SIGHUP causes the process to restart itself via `execv()`. SIGTERM/SIGINT trigger a clean shutdown (sends a shutdown heartbeat if `-b` was used). 
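The monitor thread's scheduling, a single loop that ticks once per second while each plugin keeps its own next-run timestamp, can be sketched in a few lines. The sketch is written in Python for brevity rather than C, and it is an illustration of the idea, not a transcription of `hbc_mini.c`: the function name `due_plugins` and the dictionary layout are assumptions.

```python
def due_plugins(next_run, intervals, now):
    """Return the plugins due at `now` and advance their next-run timestamps.

    next_run  -- dict: plugin name -> timestamp of its next scheduled run
    intervals -- dict: plugin name -> interval in seconds
    """
    due = []
    for name, when in next_run.items():
        if now >= when:
            due.append(name)
            # Each plugin is rescheduled independently of the others.
            next_run[name] = now + intervals[name]
    return due


# Simulate six one-second ticks with two plugins on different intervals.
intervals = {"cpu_monitor": 2, "ping_monitor": 3}
next_run = {name: 0 for name in intervals}
schedule = []
for tick in range(6):
    ran = due_plugins(next_run, intervals, tick)
    if ran:
        schedule.append((tick, ran))
```

In a real loop the tick would come from the clock and be followed by a one-second sleep; here the ticks are simulated so the resulting schedule is deterministic.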
+ +### Available plugins + +| Plugin | Platform | Data source | +|---|---|---| +| `os_info` | Linux, FreeBSD, NetBSD, DragonFly | `uname(2)`, `/etc/os-release`, `kern.osrelease` sysctl | +| `cpu_monitor` | Linux | `/proc/stat` | +| `cpu_monitor` | FreeBSD, DragonFly, NetBSD | `kern.cp_time` sysctl | +| `memory_monitor` | Linux | `/proc/meminfo` (ZFS ARC-aware) | +| `memory_monitor` | FreeBSD, DragonFly | `vm.stats.vm.*` sysctl | +| `memory_monitor` | NetBSD | `VM_UVMEXP` sysctl | +| `disk_monitor` | All | `df -P` subprocess | +| `network_monitor` | Linux | `/proc/net/dev` | +| `network_monitor` | FreeBSD, NetBSD, DragonFly | `getifaddrs()` + `AF_LINK` | +| `ping_monitor` | All | `ping` subprocess | +| `nagios_runner` | All | `popen()` subprocess | + +`cpu_monitor` reports: `cpu_percent`, `cpu_user`, `cpu_system`, `cpu_idle`, `cpu_iowait` (Linux only), load averages, `cpu_core_count`, `uptime_seconds`. + +`memory_monitor` reports: `memory_total`, `memory_used`, `memory_available`, `memory_free`, `memory_percent`, and swap fields when swap is present. + +`network_monitor` reports per-interface cumulative `bytes_recv`/`bytes_sent` and interval deltas. The loopback interface (`lo`) is skipped by default; this is configurable: ```json { - "hb_port": 50003, - "interval": 30, "plugins": { - "ping_monitor": { - "interval": 60, - "hosts": ["8.8.8.8", "192.168.1.1"] - }, - "nagios_runner": { - "interval": 300, - "commands": [ - {"name": "check_load", "command": "/usr/lib/nagios/plugins/check_load -w 5,4,3 -c 10,8,6"} - ] + "network_monitor": { + "skip_interfaces": ["lo", "docker0"] } } } ``` -**Plugin availability:** +`disk_monitor` reports per-mount `total`, `used`, `free`, `percent`. 
An optional mount filter restricts reporting to specific paths: -| Plugin | Platform | Data source | -|---|---|---| -| `os_info` | all | `platform` stdlib | -| `ping_monitor` | all | `ping` subprocess | -| `nagios_runner` | all (not Windows) | subprocess | -| `cpu_monitor` | Linux | `/proc/stat` | -| `memory_monitor` | Linux | `/proc/meminfo` | -| `disk_monitor` | Linux, macOS, BSD | `df -P` subprocess | -| `network_monitor` | Linux | `/proc/net/dev` | - -**What is not available compared to the full `hbc`:** - -- No YAML config (use JSON instead) -- No `filesystem_info` plugin -- No `zfs_monitor` plugin (requires `zpool(8)` and the full plugin loader) -- `cpu_monitor` does not report per-core usage or CPU frequency (no psutil) -- Plugins cannot be loaded from external `.py` files; all plugins are compiled in -- No IPv6 early-fail protection: connections that fail to open at startup are silently skipped rather than retried - -Everything else (heartbeat protocol, ACK/CMD/UPD handling, `hb_install.sh`-based self-update, daemonize, syslog) is identical to the full client. - ---- - -## 🐞 Debugging in VS Code - -This repository includes a ready-to-use `.vscode/launch.json` with configurations to run or attach the VS Code debugger to `hbd`. - -- Ensure the **Python** extension is installed and select the project `.venv` as the interpreter (bottom-left of VS Code). -- Use **F5** and pick one of these configurations from the Run view: - - **Python: Run hbd (module)**: runs `hbd.server.cli` as a module and sets `PYTHONPATH` to the workspace root (recommended). - - **Python: Run hbd with debugpy (listen)**: launches `debugpy` and `hbd` together; useful when you want the process to listen for a debugger. - - **Python: Attach (localhost:5678)**: attach the debugger to a running process started with `debugpy`. - -To start `hbd` manually and wait for the debugger to attach, run: - -```bash -PYTHONPATH=. 
python -m debugpy --listen 5678 --wait-for-client -m hbd.server.cli -c .hb.yaml -f -v +```json +{ + "plugins": { + "disk_monitor": { + "mounts": ["/", "/data"] + } + } +} ``` -Set breakpoints in modules such as `hbd/server/udp.py`, `hbd/server/dns.py`, or `hbd/server/main.py`, and use the **Attach** configuration to connect. Use `justMyCode: false` if you need to step into third-party code. +### Differences from `hbc_mini.py` + +- No `filesystem_info` or `zfs_monitor` plugins +- `UPD` (self-update) messages are logged but not acted on +- No IPv6 early-fail protection +- Config is JSON only (`~/.hbc.json`), no YAML --- -## 🛠 Configuration +## Development -`hbd` reads YAML configuration (optional). If `PyYAML` is not installed, built-in defaults are used. Example configuration keys (see `hbd/server/config.py`): - -- `hb_port`: UDP port to listen for heartbeats (default: 50003) -- `hbd_port`: internal control port (default: 50004) -- `hbd_host`: bind address for HTTP/WSS -- `pickfile`: path for persisted state -- `logfile`: path to log file -- `pushsrv`: push service (`pushover`|`mattermost`|`all`) -- `interval` / `grace`: heartbeat timing configuration -- `dyndomains`: list of DNS domains to update via `nsupdate` for hosts with `dyndns` set -- `nsupdate_bin`: path to nsupdate binary -- `ws_port`: port for plain WebSocket connections (default: 50005) -- `wss_port`: port for secure WebSocket (WSS) connections (default: none). - If set, `hbd` will attempt to serve WSS on this port when `wss_pem` and - `wss_key` SSL files are available under `cert_path` (see below). 
-- `cert_path`: directory where TLS certificate and key are looked up (default: /usr/local/etc/ssl/) -- `wss_pem`: filename for the certificate chain (default: fullchain.pem) -- `wss_key`: filename for the private key (default: privkey.pem) -- `users`: mapping of username → user attributes (full_name, avatar, password, admin, notification_channels) -- `default_owner`: username that owns hosts with no explicit owner (falls back to first admin user) - -Example `.hb.yaml` (minimal): - -```yaml -hbd_host: 0.0.0.0 -hbd_port: 50004 -dyndomains: - - example.com -nsupdate_bin: /usr/bin/nsupdate -pushsrv: pushover -hosts: - myhost: - dyndns: true # update DNS when this host's IP changes -``` - -> Tip: `SERVER_DEFAULTS` in `hbd/server/config.py` contains the canonical defaults and accepted configuration keys. - ---- - -## 🔧 Architecture & Modules - -The package is organized into three subpackages: - -**`hbd.common`**, shared code used by both client and server: -- `hbd.common.proto`: serialization/deserialization of heartbeat messages (supports compressed payloads and plugin data) -- `hbd.common.utils`: small utility helpers (`shortname`, `dur`, `initlog`) - -**`hbd.server`**, the heartbeat daemon (`hbd`): -- `hbd.server.cli`: CLI entrypoint and argument parsing -- `hbd.server.main`: async orchestration to run UDP/HTTP/WSS components -- `hbd.server.udp`: UDP parsing and `handle_datagram` implementation (main state machine) -- `hbd.server.dns`: `create_nsupdate_payload`, `nsupdate`, and an asyncio DNS worker (`start_dns_worker`). - The DNS worker runs as an `asyncio` task and the package exposes a small thread-safe bridge - so legacy synchronous code can `put()` updates into the queue. 
-- `hbd.server.notify`: email and push notification helpers -- `hbd.server.ws`: WebSocket server and thread-safe broadcast helpers -- `hbd.server.http`: HTTP handler factory for the status UI/API -- `hbd.server.journal`: message journal with size-based log rotation and backup management -- `hbd.server.threshold`: threshold alerting engine -- `hbd.server.monitor`: host state monitoring -- `hbd.server.hbdclass`: `Host` class and shared server state -- `hbd.server.config`: configuration loader and defaults - -**`hbd.client`**, the heartbeat client (`hbc`): -- `hbd.client.main`: client entrypoint; sends heartbeats and plugin data to the server -- `hbd.client.plugin`: plugin framework with base classes, registry, and dynamic loader -- `hbd.client.plugins/`: built-in plugins (os_info, cpu_monitor, memory_monitor, disk_monitor, network_monitor, filesystem_info, nagios_runner) -- `hbd.client.config`: client configuration loader - -This modular layout makes the code easier to test and maintain. - -**Runtime & Shutdown** - -- The main runtime is asyncio-based. Services (UDP listener, HTTP server, WebSocket server, monitor, and DNS worker) run as asyncio tasks. -- On SIGINT/SIGTERM the server triggers a graceful shutdown: it cancels active tasks, signals the DNS worker via a sentinel, and cleans up resources before exit. -- The DNS update worker is implemented as an `asyncio` task; synchronous producers can still enqueue DNS updates via a small thread-safe bridge available at `hbd.server.hbdclass.Host.dnsQ`. - -**Templates & Static Files** - -- Template files are located under `hbd/server/templates`. The HTTP server resolves templates relative to the `hbd.server` package but the path can be overridden with the `templates_dir` config key. -- Static assets (CSS/JS/images) are served from `hbd/server/static` via the `/static/` HTTP route. 
- ---- - -## 🧪 Testing & Dev - -Tests are implemented using `unittest` and additional tests rely on `pytest` if you prefer. To run tests locally without installing anything beyond the dev requirements: +### Running tests ```bash -# with project root on PYTHONPATH PYTHONPATH=. python -m unittest discover -v -# or with pytest if installed +# or pytest -q ``` -Developer tooling included: - -- `pyproject.toml`: project metadata and dependencies -- `tox.ini`: convenience wrappers for running tests, lint, and mypy - -To run linters and type checks locally: +### Linting and type checking ```bash -# after installing dev deps tox -e lint tox -e mypy ``` ---- +### Debugging in VS Code -## 🚀 Running in production +A `.vscode/launch.json` is included with configurations for running and attaching the debugger. Select the project `.venv` as the Python interpreter, then use F5. -- Use your system service manager (systemd, launchd, etc.) to run `hbd` in the background. -- Ensure `nsupdate` and necessary credentials are available for dynamic DNS updates. -- Configure TLS for WSS if you enable secure websockets. +To start with debugpy and wait for attach: -> Note: The project contains a small example for obtaining DNS-verified certs (certbot with RFC2136); see earlier commit history or ask me to re-add the example to this README if you want it documented here. +```bash +PYTHONPATH=. python -m debugpy --listen 5678 --wait-for-client -m hbd.server.cli serve -c .hb.yaml -f -v +``` --- -## 🤝 Contributing +## License -Contributions welcome! Please: - -1. Open an issue to discuss larger changes. -2. Create a topic branch and a clear PR. -3. Add tests for new features and run linters. -4. Keep changes focused and documented. - ---- - -## 📜 License - -This repository is licensed under the MIT license. See `LICENSE` for details. - ---- +MIT. See `LICENSE` for details.