hbd (5.0.12)
Installation
pip install --index-url --extra-index-url https://pypi.org/simple hbd
About this package
Heartbeat monitoring system — client (hbc) and server (hbd)
Heartbeat Daemon (hbd) ✅
A lightweight daemon that listens for UDP heartbeat messages and acts on them: keeps host state, optionally updates DNS records via nsupdate, forwards messages to WebSocket clients, and sends notifications (email, Pushover, Mattermost, Signal). It is a refactor of a previously monolithic script into a modular Python package (hbd).
📌 Features
- Receive and parse heartbeat datagrams (text or zlib-compressed) ✅
- Maintain host state and detect up/down transitions ✅
- Queue DNS updates via nsupdate and run them in a background asyncio worker ✅
- WebSocket API for live updates (hosts & messages) ✅
- Notification pipeline (email, Pushover, Mattermost, Signal) ✅
- User management & access control ✅
  - Optional user accounts with bcrypt-style password hashing (stdlib only)
  - Per-host roles: owner, manager, monitor
  - Session-based auth with cookie support (browser login page included)
  - Backwards compatible: no auth required when no users are configured
- HTTP API & Web UI ✅
  - REST API for plugin data, alerts, host information, and user management
  - Live dashboard with WebSocket updates
  - Interactive plugin metrics visualization
  - Alerts dashboard with filtering and summaries
- Message journal with automatic log rotation ✅
  - Logs all received messages in JSON format
  - Size-based automatic rotation
  - Configurable retention and backup management
- Plugin system for extensible monitoring ✅
  - Collect system metrics (CPU, memory, disk, network)
  - Execute existing Nagios monitoring plugins
  - Create custom plugins with simple Python classes
- Threshold alerting system ✅
  - Monitor metrics against configurable WARNING/CRITICAL thresholds
  - Hysteresis to prevent alert flapping
  - Automatic notifications on state changes
  - Re-notification for ongoing alerts
- Modular codebase suitable for unit testing and CI ✅
🔌 Plugin System
Heartbeat includes a comprehensive plugin architecture that extends monitoring beyond simple heartbeats. The plugin system allows you to:
- Collect system information: OS details, hardware info, system configuration
- Monitor resources: CPU usage, memory, disk space, network statistics
- Run Nagios plugins: Execute thousands of existing Nagios monitoring plugins without modification
- Create custom plugins: Build your own monitoring logic with simple Python classes
Plugin Types
- InfoPlugin: Collects static information once (e.g., OS version, hardware specs)
- MonitorPlugin: Collects metrics periodically (e.g., CPU usage every 30 seconds)
Built-in Plugins
- os_info: Collects OS, kernel, distribution, and architecture information
- cpu_monitor: Monitors CPU usage, load average, frequency, and process counts
- memory_monitor: Monitors RAM and swap usage, available memory
- disk_monitor: Monitors disk usage, I/O statistics, and filesystem metrics
- network_monitor: Monitors network interface statistics, bandwidth, and connections
- filesystem_info: Collects mounted filesystem information (physical filesystems only by default)
- nagios_runner: Executes Nagios monitoring plugins (check_disk, check_load, check_http, etc.)
Nagios Integration
The nagios_runner plugin provides seamless integration with the vast Nagios plugin ecosystem. You can run any Nagios-compatible plugin and have the results automatically parsed and stored:
- Executes plugins via subprocess with timeout protection
- Parses exit codes (OK/WARNING/CRITICAL/UNKNOWN)
- Extracts performance data with thresholds
- Reports aggregated status across all configured checks
See docs/NAGIOS_INTEGRATION.md for complete integration guide including configuration examples and custom plugin development.
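The steps listed above (subprocess with timeout, exit-code mapping, perfdata extraction) can be sketched in plain Python. This is a minimal illustration and not the actual nagios_runner code: the names `run_check` and `parse_perfdata` are hypothetical, and the perfdata parser is simplified (it does not handle quoted labels containing spaces or multi-line output).

```python
import shlex
import subprocess

# Standard Nagios plugin exit codes
STATUS_BY_EXIT_CODE = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def parse_perfdata(output):
    """Extract 'label=value;warn;crit' items that follow the '|' separator."""
    if "|" not in output:
        return {}
    metrics = {}
    for item in output.split("|", 1)[1].split():
        label, _, rest = item.partition("=")
        fields = rest.split(";")
        metrics[label.strip("'")] = {
            "value": fields[0],
            "warn": fields[1] if len(fields) > 1 else "",
            "crit": fields[2] if len(fields) > 2 else "",
        }
    return metrics


def run_check(command, timeout=10.0):
    """Run one check command; report UNKNOWN if it times out or cannot start."""
    try:
        proc = subprocess.run(
            shlex.split(command), capture_output=True, text=True, timeout=timeout
        )
    except (subprocess.TimeoutExpired, OSError) as exc:
        return {"status": "UNKNOWN", "output": str(exc), "perfdata": {}}
    lines = proc.stdout.strip().splitlines()
    first_line = lines[0] if lines else ""
    return {
        "status": STATUS_BY_EXIT_CODE.get(proc.returncode, "UNKNOWN"),
        "output": first_line,
        "perfdata": parse_perfdata(first_line),
    }
```

For example, `run_check("/usr/lib/nagios/plugins/check_load -w 5,4,3 -c 10,8,6")` would return a dictionary with the mapped status, the first output line, and any parsed performance data.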
Creating Custom Plugins
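import shutil
import time

from hbd.plugin import MonitorPlugin


class DiskMonitorPlugin(MonitorPlugin):
    name = "disk_monitor"
    interval = 60  # Run every 60 seconds

    async def collect(self):
        # Percentage of the root filesystem in use, via the standard library
        usage = shutil.disk_usage("/")
        return {
            "disk_usage": round(usage.used / usage.total * 100, 1),
            "timestamp": time.time(),
        }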
Place plugins in hbd/plugins/ and they'll be automatically discovered and loaded by the client.
📝 Message Journal
Heartbeat includes a message journal that logs all received messages with automatic rotation.
Features
- JSON Format: All messages logged in JSONL (JSON Lines) format for easy parsing
- Automatic Rotation: Size-based rotation with configurable thresholds
- Backup Management: Keeps configurable number of rotated log files
- Non-blocking: Async logging with minimal performance impact
Configuration
# Message journal settings
journal_enabled: true # Enable/disable journaling
journal_dir: /var/log/heartbeat # Journal directory
journal_file: messages.journal # Base filename
journal_max_size: 104857600 # Max size (100MB default)
journal_max_backups: 10 # Number of backups to keep
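To make the interaction between `journal_max_size` and `journal_max_backups` concrete, here is a minimal sketch of size-based rotation for a JSONL file. It is not the hbd.journal implementation: the class name `RotatingJournal` and the backup naming scheme (`messages.journal.1`, `.2`, ...) are assumptions made for illustration, and it expects `max_backups` to be at least 1.

```python
import json
import os
import time


class RotatingJournal:
    """Append JSON lines to a file and rotate it once it would exceed max_size."""

    def __init__(self, path, max_size=104857600, max_backups=10):
        self.path = path
        self.max_size = max_size
        self.max_backups = max_backups

    def _rotate(self):
        # Drop the oldest backup, then shift .1 -> .2, .2 -> .3, and so on.
        oldest = f"{self.path}.{self.max_backups}"
        if os.path.exists(oldest):
            os.remove(oldest)
        for index in range(self.max_backups - 1, 0, -1):
            src = f"{self.path}.{index}"
            if os.path.exists(src):
                os.replace(src, f"{self.path}.{index + 1}")
        os.replace(self.path, f"{self.path}.1")

    def write(self, source_ip, source_port, message):
        entry = {
            "timestamp": time.time(),
            "source_ip": source_ip,
            "source_port": source_port,
            "message": message,
        }
        line = json.dumps(entry, separators=(",", ":")) + "\n"
        # Rotate before writing if this line would push the file past the limit.
        if os.path.exists(self.path) and (
            os.path.getsize(self.path) + len(line.encode()) > self.max_size
        ):
            self._rotate()
        with open(self.path, "a", encoding="utf-8") as handle:
            handle.write(line)
```

With the defaults shown in the configuration, at most the active file plus ten backups would exist at any time under this scheme.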
Example Journal Entry
{"timestamp":1711234567.123,"datetime":"2026-03-28T12:34:56","source_ip":"192.168.1.100","source_port":50003,"message":{"ID":"HTB","name":"webserver1","interval":30}}
Analyzing Journal Files
# View recent messages
tail -100 /var/log/heartbeat/messages.journal | jq .
# Count messages by type
cat /var/log/heartbeat/messages.journal | jq -r '.message.ID' | sort | uniq -c
# Filter by hostname
cat /var/log/heartbeat/messages.journal | jq 'select(.message.name == "webserver1")'
See docs/MESSAGE_JOURNAL.md for complete documentation including rotation behavior, integration with log management systems, and analysis examples.
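The same analysis can be done in Python when jq is not available. The sketch below relies only on the entry layout shown in the example journal entry (a top-level `message` object with `ID` and `name`); the function names are hypothetical helpers, not part of hbd.

```python
import json
from collections import Counter


def read_journal(path):
    """Yield one decoded entry per non-empty JSONL line, skipping corrupt lines."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            try:
                yield json.loads(line)
            except json.JSONDecodeError:
                continue


def count_by_type(entries):
    """Count entries by message ID (for example 'HTB')."""
    return Counter(entry.get("message", {}).get("ID") for entry in entries)


def filter_by_host(entries, name):
    """Keep only entries whose message names the given host."""
    return [e for e in entries if e.get("message", {}).get("name") == name]
```

For example, `count_by_type(read_journal("/var/log/heartbeat/messages.journal"))` mirrors the `sort | uniq -c` pipeline above.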
🚨 Threshold Alerting
Heartbeat includes a sophisticated threshold alerting system that monitors plugin metrics and triggers notifications when values exceed configured limits.
Features
- Multi-level alerts: WARNING and CRITICAL severity levels
- Flexible operators: Support for >, >=, <, <=, ==, != comparisons
- Hysteresis: Prevents alert flapping with configurable recovery thresholds
- Smart notifications: Alerts only on state changes, not every check
- Re-notifications: Periodic reminders for ongoing alerts
- Journal integration: All threshold events logged for audit trail
Configuration
thresholds:
  # RTT (Round-Trip Time) thresholds for heartbeat monitoring
  # These are checked on every HTB message arrival
  rtt:
    webserver01:
      warning: 100.0    # Warn when RTT > 100ms
      critical: 500.0   # Critical when RTT > 500ms
    database01:
      warning: 50.0
      critical: 200.0
  # Plugin metric thresholds
  cpu_monitor:
    cpu_percent:
      warning: 80.0     # Warn when CPU > 80%
      critical: 90.0    # Critical when CPU > 90%
      operator: ">"
      hysteresis: 0.1   # 10% hysteresis to prevent flapping
  memory_monitor:
    percent:
      warning: 85.0
      critical: 95.0
  disk_monitor:
    partitions:
      /:
        percent:
          warning: 80.0
          critical: 90.0
        free_gb:
          warning: 10.0   # Alert when < 10GB free
          critical: 5.0
          operator: "<"   # Inverse threshold
# Global settings
threshold_renotify_interval: 3600   # Re-notify every hour for ongoing alerts
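How a value maps to a severity under a given operator can be sketched as follows. This is an illustration of the comparison logic only (no hysteresis or state tracking), and `classify` is a hypothetical helper rather than a function from hbd.

```python
import operator

# The six comparison operators listed under Features
COMPARATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
}


def classify(value, warning, critical, op=">"):
    """Return 'CRITICAL', 'WARNING', or 'OK'; the critical limit is checked first."""
    compare = COMPARATORS[op]
    if compare(value, critical):
        return "CRITICAL"
    if compare(value, warning):
        return "WARNING"
    return "OK"
```

With the configuration above, `classify(85.0, 80.0, 90.0)` yields WARNING for cpu_percent, while `classify(4.0, 10.0, 5.0, "<")` yields CRITICAL for free_gb because the inverse operator treats smaller values as worse.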
RTT Monitoring
Heartbeat monitors network latency (Round-Trip Time) for each host's heartbeat messages. RTT thresholds are fully integrated with the threshold alerting system:
- Per-host configuration: Set different thresholds for each monitored host
- Real-time checking: Thresholds evaluated on every HTB message arrival
- Alert state tracking: RTT alerts use the same state management as plugin metrics
- Hysteresis support: Configurable hysteresis prevents rapid state transitions
- Alerts dashboard: RTT alerts visible on the /alerts web page alongside plugin alerts
- Smart notifications: Only triggers on state changes (OK → WARNING → CRITICAL)
- Re-notification: Periodic reminders for ongoing RTT issues
- Event & journal logging: All RTT events logged for audit trail
Configuration format:
thresholds:
  rtt:
    <hostname>:
      warning: <milliseconds>    # Warn when RTT > this value
      critical: <milliseconds>   # Critical when RTT > this value
      hysteresis: 0.1            # Optional: 10% hysteresis (default)
Example alerts:
WARNING: webserver01 - rtt.webserver01 = 125.3
CRITICAL: database01 - rtt.database01 = 520.1
RECOVERED: webserver01 - rtt.webserver01 = 45.2 (WARNING -> OK)
RTT alerts appear on the Alerts dashboard and can be filtered by severity level. The metric_path format is rtt.<hostname>, making it easy to distinguish from plugin metrics.
Alert Behavior
- State Changes: Notifications sent when crossing thresholds
  - OK → WARNING: Early notification
  - WARNING → CRITICAL: Escalation
  - CRITICAL → OK: Recovery
- Hysteresis: Prevents rapid state transitions
    Critical threshold: 90%
    Hysteresis: 10%
    Recovery threshold: 81% (90 - 10% of 90)
    Value 91% → CRITICAL (threshold crossed)
    Value 85% → CRITICAL (still above 81%)
    Value 79% → OK (below recovery threshold)
- Re-notifications: Periodic reminders for ongoing alerts
  - Default: Every 60 minutes
  - Configurable via threshold_renotify_interval
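The hysteresis walk-through above can be reproduced with a small state machine. This sketch covers a single ">" threshold only and assumes the recovery threshold is `threshold * (1 - hysteresis)`, which matches the 90% → 81% arithmetic in the example; `HysteresisAlert` is a hypothetical name, not the class hbd uses.

```python
class HysteresisAlert:
    """Track one '>' threshold with a lower recovery threshold."""

    def __init__(self, threshold, hysteresis=0.1):
        self.threshold = threshold
        # 90 with 10% hysteresis recovers below 81
        self.recovery = threshold * (1 - hysteresis)
        self.active = False

    def update(self, value):
        """Feed one sample and return the resulting state."""
        if not self.active and value > self.threshold:
            self.active = True   # threshold crossed
        elif self.active and value < self.recovery:
            self.active = False  # dropped below the recovery threshold
        return "CRITICAL" if self.active else "OK"
```

Feeding 91, 85, and 79 in order gives CRITICAL, CRITICAL, OK. The same 85 arriving while the alert is inactive stays OK, which is what prevents flapping around the limit.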
Example Notifications
WARNING: webserver01 - cpu_monitor.cpu_percent = 85.0
CRITICAL: webserver01 - memory_monitor.percent = 96.0
RECOVERED: database01 - disk_monitor./.percent = 75.0 (WARNING -> OK)
REMINDER (CRITICAL): mailserver - cpu_monitor.load_1min = 12.5 (ongoing for 3600s)
Supported Metrics
All plugin metrics can be thresholded:
- CPU: cpu_percent, load_1min, load_5min, load_15min
- Memory: percent, available_mb, swap_percent
- Disk: Per-partition percent, free_gb, free_mb
- Network: errors_total, dropped packets, connection counts
- Nagios: exit_code mapping (0=OK, 1=WARNING, 2=CRITICAL)
See docs/THRESHOLD_ALERTING.md for comprehensive documentation including best practices, troubleshooting, and advanced configuration.
👥 User Management
Heartbeat supports optional user accounts with role-based access control per host.
Roles
- monitor — view status, plugin data, alerts
- manager — monitor + queue commands, trigger DNS, queue upgrades
- owner — manager + drop host, transfer ownership, update access
- admin (user flag) — owner-level access on every host
When no users are configured the server runs in unauthenticated mode — all existing behaviour is unchanged.
Quick setup
users:
  alice:
    full_name: Alice Smith
    password: pbkdf2:sha256:...   # hbd passwd alice
    admin: true
default_owner: alice
hosts:
  webserver01:
    owner: alice
    managers: [bob]
    monitors: [carol]
# Generate a password hash
hbd passwd alice
Browser users are redirected to /login automatically. The session cookie is set on login, so fetch() calls from dashboards work without any JavaScript changes.
See docs/USERS.md for complete user management documentation.
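The role rules in this section (each role includes the one below it, the admin flag grants owner-level access everywhere, and no configured users means no restrictions) can be sketched as a small permission check. This is an illustration against the config shape from Quick setup, not hbd's actual access-control code; `effective_role` and `is_allowed` are hypothetical names, and the fallback of `default_owner` to the first admin user is not modelled.

```python
ROLE_RANK = {"monitor": 1, "manager": 2, "owner": 3}


def effective_role(username, host, config):
    """Return the highest role the user holds on the host, or None."""
    user = (config.get("users") or {}).get(username)
    if user is None:
        return None
    if user.get("admin"):
        return "owner"  # admin flag: owner-level access on every host
    access = (config.get("hosts") or {}).get(host) or {}
    if username == access.get("owner", config.get("default_owner")):
        return "owner"
    if username in (access.get("managers") or []):
        return "manager"
    if username in (access.get("monitors") or []):
        return "monitor"
    return None


def is_allowed(username, host, required, config):
    """True if the user's role on the host is at least the required role."""
    if not config.get("users"):
        return True  # no users configured: unauthenticated mode
    role = effective_role(username, host, config)
    return role is not None and ROLE_RANK[role] >= ROLE_RANK[required]
```

With the Quick setup configuration, bob passes a "manager" check on webserver01 but fails an "owner" check, while carol passes only "monitor".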
🌐 HTTP API & Web UI
Heartbeat includes a built-in HTTP/WebSocket server that provides both a REST API and web-based dashboards for monitoring and visualization.
Features
- User auth: Optional session-based authentication with per-host role enforcement
- REST API: JSON endpoints for accessing plugin data, alerts, host information, and user management
- Live Dashboard: Real-time WebSocket-powered host status view
- Plugin Metrics: Interactive visualization of all plugin data with auto-refresh
- Alerts Dashboard: Comprehensive alert monitoring with filtering and summaries
Web Dashboards
- Login (/login): Browser login form (shown automatically when auth is configured)
- Live View (/live): Real-time host connectivity, latency, and messages
- Plugin Metrics (/plugins): Browse and visualize metrics from all plugins
- Alerts Dashboard (/alerts): Monitor active alerts with severity filtering
API Endpoints
# Log in (when auth is configured)
TOKEN=$(curl -s -X POST http://localhost:50004/api/0/auth/login \
  -H 'Content-Type: application/json' \
  -d '{"username":"alice","password":"secret"}' | jq -r .token)
# The header is passed inline: storing it in a string variable and expanding
# it unquoted would split "Authorization: Bearer ..." into separate words.
# List all monitored hosts
curl -H "Authorization: Bearer $TOKEN" http://localhost:50004/api/0/hosts
# Get all plugin data for a host
curl -H "Authorization: Bearer $TOKEN" http://localhost:50004/api/0/hosts/webserver01/plugins
# Get detailed plugin history (last 50 samples)
curl -H "Authorization: Bearer $TOKEN" "http://localhost:50004/api/0/hosts/webserver01/plugins/cpu_monitor?limit=50"
# Get alert states for a specific host
curl -H "Authorization: Bearer $TOKEN" http://localhost:50004/api/0/hosts/webserver01/alerts
# Get all active alerts across all hosts
curl -H "Authorization: Bearer $TOKEN" http://localhost:50004/api/0/alerts
# View/update host access roles
curl -H "Authorization: Bearer $TOKEN" http://localhost:50004/api/0/hosts/webserver01/access
See docs/HTTP_API.md for complete API documentation including response formats, error handling, and integration examples.
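For scripts that prefer Python over curl, the same calls can be made with the standard library. This sketch only uses the URLs, the JSON login body, the `token` response field, and the Bearer header that appear in the curl examples; the helper names are hypothetical, and the response shapes are not assumed beyond being JSON.

```python
import json
import urllib.request

BASE_URL = "http://localhost:50004/api/0"


def login_request(username, password, base_url=BASE_URL):
    """Build the POST request for /auth/login."""
    body = json.dumps({"username": username, "password": password}).encode()
    return urllib.request.Request(
        f"{base_url}/auth/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def api_request(path, token=None, base_url=BASE_URL):
    """Build a GET request, adding the Bearer token when one is given."""
    headers = {"Authorization": f"Bearer {token}"} if token else {}
    return urllib.request.Request(f"{base_url}{path}", headers=headers)


def send(request, timeout=5.0):
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Usage against a running server:
#   token = send(login_request("alice", "secret"))["token"]
#   hosts = send(api_request("/hosts", token))
#   alerts = send(api_request("/hosts/webserver01/alerts", token))
```

Keeping request construction separate from sending makes the helpers easy to unit test without a live server.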
⚙️ Quickstart
Prerequisites:
- Python 3.10+ (project uses language features from recent Python)
- nsupdate (for DNS updates) if using dynamic DNS
Install dependencies (recommended into a venv):
This project declares its dependencies in pyproject.toml. Instead of the old requirements.txt flow, install the package into a virtualenv using pip. See scripts/install.sh for one way to do this.
Run the daemon (example):
# run with an explicit config file (the default config lookup is ~/.hb.yaml)
hbd -c .hb.yaml -f -v
You can also run it directly via the package entrypoint after installation:
python -m hbd.cli -c /path/to/config.yaml
Running the Client
The heartbeat client (hbc) sends periodic heartbeats and plugin data to the server:
# Basic usage pointing to server
python -m hbd.hbc --server your-server.example.com
# With custom configuration
python -m hbd.hbc --server 192.168.1.100 --port 50003 --interval 30
# Run with specific plugins enabled/disabled
python -m hbd.hbc --server hbd.local --disable-plugin os_info
Client configuration can also be specified in YAML:
server: hbd.example.com
port: 50003
interval: 30
plugins:
  cpu_monitor:
    interval: 300   # Check every 5 minutes (default)
    per_core: true
  memory_monitor:
    interval: 300   # Check every 5 minutes (default)
  disk_monitor:
    interval: 300   # Check every 5 minutes (default)
  network_monitor:
    interval: 300   # Check every 5 minutes (default)
  nagios_runner:
    interval: 300   # Check every 5 minutes (default)
    commands:
      - /usr/lib/nagios/plugins/check_load -w 5,4,3 -c 10,8,6
      - /usr/lib/nagios/plugins/check_disk -w 20% -c 10% -p /
All monitoring plugins default to 5-minute (300 second) intervals, but can be customized as needed.
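To give a feel for what a heartbeat datagram might carry, here is a rough sketch that encodes the fields seen in the example journal entry (ID, name, interval) and accepts both plain and zlib-compressed payloads. The JSON-over-UDP encoding is an assumption made for illustration only: the real wire format is whatever hbd.proto implements, and the function names here are hypothetical.

```python
import json
import socket
import zlib


def encode_heartbeat(name, interval, compress=False):
    """Serialize a heartbeat as JSON bytes (assumed format), optionally compressed."""
    payload = json.dumps({"ID": "HTB", "name": name, "interval": interval}).encode("utf-8")
    return zlib.compress(payload) if compress else payload


def decode_heartbeat(datagram):
    """Accept either a zlib-compressed or a plain-text datagram."""
    try:
        datagram = zlib.decompress(datagram)
    except zlib.error:
        pass  # not compressed: treat the bytes as plain text
    return json.loads(datagram.decode("utf-8"))


def send_heartbeat(server, name, interval=30, port=50003, compress=False):
    """Send one datagram to the server's heartbeat port."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(encode_heartbeat(name, interval, compress), (server, port))
```

Trying decompression first and falling back to plain text is one simple way for a receiver to support both payload kinds without a separate flag.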
🐞 Debugging in VS Code
This repository includes a ready-to-use .vscode/launch.json with configurations to run or attach the VS Code debugger to hbd.
- Ensure the Python extension is installed and select the project .venv as the interpreter (bottom-left of VS Code).
- Use F5 and pick one of these configurations from the Run view:
  - "Python: Run hbd (module)" runs hbd.cli as a module and sets PYTHONPATH to the workspace root (recommended).
  - "Python: Run hbd with debugpy (listen)" launches debugpy and hbd together; useful when you want the process to listen for a debugger.
  - "Python: Attach (localhost:5678)" attaches the debugger to a running process started with debugpy.
To start hbd manually and wait for the debugger to attach, run:
PYTHONPATH=. python -m debugpy --listen 5678 --wait-for-client -m hbd.cli -c .hb.yaml -f -v
Set breakpoints in modules such as hbd/udp.py, hbd/dns.py, or hbd/server.py, and use the Attach configuration to connect. Use justMyCode: false if you need to step into third-party code.
🛠 Configuration
hbd reads YAML configuration (optional). If PyYAML is not installed, built-in defaults are used. Example configuration keys (see hbd/config.py):
- hb_port: UDP port to listen for heartbeats (default: 50003)
- hbd_port: internal control port (default: 50004)
- hbd_host: bind address for HTTP/WSS
- pickfile: path for persisted state
- logfile: path to log file
- logfmt: text or msg
- pushsrv: push service (pushover|mattermost|all)
- interval / grace: heartbeat timing configuration
- dyndomains: list of dyndomains to update via nsupdate
- nsupdate_bin: path to nsupdate binary
- ws_port: port for plain WebSocket connections (default: 50005)
- wss_port: port for secure WebSocket (WSS) connections (default: none). If set, hbd will attempt to serve WSS on this port when wss_pem and wss_key SSL files are available under cert_path (see below).
- cert_path: directory where TLS certificate and key are looked up (default: /usr/local/etc/ssl/)
- wss_pem: filename for the certificate chain (default: fullchain.pem)
- wss_key: filename for the private key (default: privkey.pem)
- users: mapping of username → user attributes (full_name, avatar, password, admin, notification_channels)
- default_owner: username that owns hosts with no explicit owner (falls back to first admin user)
Example .hb.yaml (minimal):
hbd_host: 0.0.0.0
hbd_port: 50004
dyndomains:
- example.com
nsupdate_bin: /usr/bin/nsupdate
pushsrv: pushover
Tip: config.DEFAULTS in hbd/config.py contains the canonical defaults and accepted configuration keys.
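The "optional PyYAML, otherwise defaults" behaviour described above can be sketched like this. It is not the code in hbd/config.py: `SKETCH_DEFAULTS` holds only the defaults quoted in the key list, and `load_config` and `tls_paths` are hypothetical helpers.

```python
import os

# Only the defaults stated in the key list; hbd's real table has more keys.
SKETCH_DEFAULTS = {
    "hb_port": 50003,
    "hbd_port": 50004,
    "ws_port": 50005,
    "wss_port": None,
    "cert_path": "/usr/local/etc/ssl/",
    "wss_pem": "fullchain.pem",
    "wss_key": "privkey.pem",
}


def load_config(path=None):
    """Return defaults overlaid with values from a YAML file, if it can be read."""
    config = dict(SKETCH_DEFAULTS)
    if path is None:
        return config
    try:
        import yaml  # PyYAML is optional
    except ImportError:
        return config  # fall back to built-in defaults
    with open(path, encoding="utf-8") as handle:
        config.update(yaml.safe_load(handle) or {})
    return config


def tls_paths(config):
    """Resolve the certificate chain and private key under cert_path."""
    return (
        os.path.join(config["cert_path"], config["wss_pem"]),
        os.path.join(config["cert_path"], config["wss_key"]),
    )
```

Copying the defaults before updating keeps the shared table unmodified, so repeated loads start from the same baseline.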
🔧 Architecture & Modules
- hbd.proto: serialization/deserialization of heartbeat messages (supports compressed payloads and plugin data)
- hbd.udp: UDP parsing and handle_datagram implementation (main state machine)
- hbd.dns: create_nsupdate_payload, nsupdate, and an asyncio DNS worker (start_dns_worker). The DNS worker now runs as an asyncio task and the package exposes a small thread-safe bridge so legacy synchronous code can put() updates into the queue; there is no longer a permanently-blocking background threading.Thread.
- hbd.notify: email and push notification helpers
- hbd.ws: WebSocket server and thread-safe broadcast helpers
- hbd.http: HTTP handler factory for the status UI/API
- hbd.journal: message journal with size-based log rotation and backup management
- hbd.plugin: plugin framework with base classes, registry, and dynamic loader
- hbd.plugins/: built-in plugins (os_info, cpu_monitor, memory_monitor, disk_monitor, network_monitor, filesystem_info, nagios_runner)
- hbd.hbc: heartbeat client that sends heartbeats and plugin data to server
- hbd.utils: small utility helpers (shortname, dur, initlog)
- hbd.cli: CLI entrypoint and argument parsing
- hbd.server: async orchestration to run UDP/HTTP/WSS components
This modular layout makes the code easier to test and maintain.
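As an illustration of the kind of payload a DNS update involves, the sketch below builds an nsupdate command script that replaces a host's address record. It is not hbd's create_nsupdate_payload: the function names, the default TTL, and the delete-then-add approach are assumptions, and only standard nsupdate commands (server, zone, update delete, update add, send) are used.

```python
import subprocess


def build_nsupdate_script(hostname, zone, address, ttl=60, server=None):
    """Return nsupdate commands that replace the host's A or AAAA record."""
    fqdn = f"{hostname}.{zone}."
    record_type = "AAAA" if ":" in address else "A"
    lines = [f"server {server}"] if server else []
    lines += [
        f"zone {zone}",
        f"update delete {fqdn} {record_type}",
        f"update add {fqdn} {ttl} {record_type} {address}",
        "send",
    ]
    return "\n".join(lines) + "\n"


def run_nsupdate(script, nsupdate_bin="/usr/bin/nsupdate", timeout=30):
    """Feed the script to nsupdate on stdin and return its exit code."""
    proc = subprocess.run(
        [nsupdate_bin], input=script, text=True, capture_output=True, timeout=timeout
    )
    return proc.returncode
```

Authentication (for example a TSIG key passed to nsupdate) is left out here and would need to be added for a real deployment.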
Runtime & Shutdown
- The main runtime is asyncio-based. Services (UDP listener, HTTP server, WebSocket server, monitor, and DNS worker) run as asyncio tasks.
- On SIGINT/SIGTERM the server triggers a graceful shutdown: it cancels active tasks, signals the DNS worker via a sentinel, and cleans up resources before exit.
- The DNS update worker is implemented as an asyncio task; synchronous producers can still enqueue DNS updates via a small thread-safe bridge available at hbd.hbdclass.Host.dnsQ.
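The pattern described above (an asyncio worker fed through a thread-safe bridge and stopped with a sentinel) can be sketched as follows. This is a generic illustration, not the code behind hbd.hbdclass.Host.dnsQ: `ThreadSafeQueue`, `dns_worker`, and `_STOP` are hypothetical names, and the handler stands in for a blocking nsupdate call.

```python
import asyncio
import threading

_STOP = object()  # sentinel that tells the worker to exit


class ThreadSafeQueue:
    """Lets synchronous code put() items onto an asyncio.Queue owned by a loop."""

    def __init__(self, loop):
        self._loop = loop
        self._queue = asyncio.Queue()

    def put(self, item):
        # Safe to call from any thread: the enqueue runs on the loop's thread.
        self._loop.call_soon_threadsafe(self._queue.put_nowait, item)

    async def get(self):
        return await self._queue.get()


async def dns_worker(bridge, handle_update):
    """Process queued updates until the sentinel arrives."""
    while True:
        item = await bridge.get()
        if item is _STOP:
            break
        # Run the (possibly blocking) handler off the event loop thread.
        await asyncio.to_thread(handle_update, item)


async def main():
    processed = []
    bridge = ThreadSafeQueue(asyncio.get_running_loop())
    worker = asyncio.create_task(dns_worker(bridge, processed.append))
    # A legacy synchronous producer running in its own thread
    producer = threading.Thread(target=lambda: [bridge.put(n) for n in ("a", "b")])
    producer.start()
    await asyncio.to_thread(producer.join)
    bridge.put(_STOP)  # graceful shutdown: the worker drains, then exits
    await worker
    return processed
```

Running `asyncio.run(main())` processes both queued items in order before the worker exits. Because the sentinel is enqueued behind pending updates, nothing already queued is lost at shutdown.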
Templates & Static Files
- Template files are located under hbd/templates by default. The HTTP server resolves templates relative to the hbd package but the path can be overridden with the templates_dir config key.
- Static assets (CSS/JS/images) are served from hbd/static via the /static/<path> HTTP route. Place your static files in that directory or configure the HTTP server as needed.
🧪 Testing & Dev
Tests are implemented with unittest; additional tests rely on pytest, which you can also use as the runner if you prefer. To run tests locally without installing anything beyond the dev requirements:
# with project root on PYTHONPATH
PYTHONPATH=. python -m unittest discover -v
# or with pytest if installed
pytest -q
Developer tooling included:
- pyproject.toml: project metadata and dependencies
- tox.ini: convenience wrappers for running tests, lint, and mypy
To run linters and type checks locally:
# after installing dev deps
tox -e lint
tox -e mypy
🚀 Running in production
- Use your system service manager (systemd, launchd, etc.) to run hbd in the background.
- Ensure nsupdate and necessary credentials are available for dynamic DNS updates.
- Configure TLS for WSS if you enable secure websockets.
Note: The project contains a small example for obtaining DNS-verified certs (certbot with RFC2136); see the earlier commit history.
🤝 Contributing
Contributions welcome! Please:
- Open an issue to discuss larger changes.
- Create a topic branch and a clear PR.
- Add tests for new features and run linters.
- Keep changes focused and documented.
📜 License
This repository is licensed under the MIT license. See LICENSE for details.