
hbd (5.1.1)

Published 2026-04-12 13:06:38 -04:00 by andreas

Installation

pip install --index-url  --extra-index-url https://pypi.org/simple hbd

About this package

Heartbeat monitoring system — client (hbc) and server (hbd)

Heartbeat Daemon (hbd)

A lightweight daemon that listens for UDP heartbeat messages and acts on them: keeps host state, optionally updates DNS records via nsupdate, forwards messages to WebSocket clients, and sends notifications (email, Pushover, Mattermost, Signal). It is a refactor of a previously monolithic script into a modular Python package (hbd).


📌 Features

  • Receive and parse heartbeat datagrams (text or zlib-compressed)
  • Maintain host state and detect up/down transitions
  • Queue DNS updates via nsupdate and run them in a background worker (an asyncio task)
  • WebSocket API for live updates (hosts & messages)
  • Notification pipeline (email, Pushover, Mattermost, Signal)
  • User management & access control
    • Optional user accounts with PBKDF2-SHA256 password hashing (stdlib only)
    • Per-host roles: owner, manager, monitor
    • Session-based auth with cookie support (browser login page included)
    • Backwards compatible: no auth required when no users are configured
  • HTTP API & Web UI
    • REST API for plugin data, alerts, host information, and user management
    • Live dashboard with WebSocket updates
    • Interactive plugin metrics visualization
    • Alerts dashboard with filtering and summaries
  • Message journal with automatic log rotation
    • Logs all received messages in JSON format
    • Size-based automatic rotation
    • Configurable retention and backup management
  • Plugin system for extensible monitoring
    • Collect system metrics (CPU, memory, disk, network)
    • Execute existing Nagios monitoring plugins
    • Create custom plugins with simple Python classes
  • Threshold alerting system
    • Monitor metrics against configurable WARNING/CRITICAL thresholds
    • Hysteresis to prevent alert flapping
    • Automatic notifications on state changes
    • Re-notification for ongoing alerts
  • Modular codebase suitable for unit testing and CI

🔌 Plugin System

Heartbeat's plugin architecture extends monitoring beyond simple heartbeats. Plugins let you:

  • Collect system information: OS details, hardware info, system configuration
  • Monitor resources: CPU usage, memory, disk space, network statistics
  • Run Nagios plugins: Execute existing Nagios monitoring plugins without modification
  • Create custom plugins: Build your own monitoring logic with simple Python classes

Plugin Types

  • InfoPlugin: Collects static information once (e.g., OS version, hardware specs)
  • MonitorPlugin: Collects metrics periodically (e.g., CPU usage every 30 seconds)

Built-in Plugins

  • os_info: Collects OS, kernel, distribution, and architecture information
  • cpu_monitor: Monitors CPU usage, load average, frequency, and process counts
  • memory_monitor: Monitors RAM and swap usage, available memory
  • disk_monitor: Monitors disk usage, I/O statistics, and filesystem metrics
  • network_monitor: Monitors network interface statistics, bandwidth, and connections
  • filesystem_info: Collects mounted filesystem information (physical filesystems only by default)
  • nagios_runner: Executes Nagios monitoring plugins (check_disk, check_load, check_http, etc.)

Nagios Integration

The nagios_runner plugin runs any Nagios-compatible plugin and parses and stores the results:

  • Executes plugins via subprocess with timeout protection
  • Parses exit codes (OK/WARNING/CRITICAL/UNKNOWN)
  • Extracts performance data with thresholds
  • Reports aggregated status across all configured checks
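
The steps above can be sketched with the standard library alone. This is an illustration, not the actual nagios_runner code: the names run_check and parse_output are hypothetical, and quoted perfdata labels that contain spaces are not handled.

```python
import shlex
import subprocess

# Standard Nagios plugin exit codes.
STATUS = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def parse_output(stdout):
    """Split the first output line into (text, perfdata dict) at the '|'."""
    first_line = stdout.strip().splitlines()[0] if stdout.strip() else ""
    text, _, perf = first_line.partition("|")
    perfdata = {}
    for item in perf.split():
        label, _, rest = item.partition("=")
        if not rest:
            continue
        # Fields are value;warn;crit;min;max, with trailing ones optional.
        fields = rest.split(";") + [""] * 4
        perfdata[label.strip("'")] = {
            "value": fields[0],
            "warn": fields[1],
            "crit": fields[2],
        }
    return text.strip(), perfdata


def run_check(command, timeout=10.0):
    """Run one plugin command; a timeout or launch error maps to UNKNOWN."""
    try:
        proc = subprocess.run(
            shlex.split(command), capture_output=True, text=True, timeout=timeout
        )
    except (subprocess.TimeoutExpired, OSError) as exc:
        return {"status": "UNKNOWN", "text": str(exc), "perfdata": {}}
    text, perfdata = parse_output(proc.stdout)
    return {
        "status": STATUS.get(proc.returncode, "UNKNOWN"),
        "text": text,
        "perfdata": perfdata,
    }
```

Calling run_check with one of the command lines from the client configuration (for example the check_load entry) would return a dict with the mapped status, the human-readable text, and any performance data.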

See docs/NAGIOS_INTEGRATION.md for the complete integration guide, including configuration examples and custom plugin development.

Creating Custom Plugins

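import shutil
import time

from hbd.client.plugin import MonitorPlugin


def get_disk_usage(path="/"):
    """Illustrative helper (not part of hbd): used percentage for path."""
    usage = shutil.disk_usage(path)
    return round(usage.used / usage.total * 100, 1)


class DiskMonitorPlugin(MonitorPlugin):
    name = "disk_monitor"
    interval = 60  # Run every 60 seconds

    async def collect(self):
        return {
            "disk_usage": get_disk_usage(),
            "timestamp": time.time(),
        }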

Place plugins in hbd/client/plugins/ and they'll be automatically discovered and loaded by the client.


📝 Message Journal

Heartbeat includes a message journal that logs all received messages with automatic rotation.

Features

  • JSON Format: All messages logged in JSONL (JSON Lines) format for easy parsing
  • Automatic Rotation: Size-based rotation with configurable thresholds
  • Backup Management: Keeps configurable number of rotated log files
  • Non-blocking: Async logging with minimal performance impact
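
Size-based rotation with a fixed number of backups is the same behaviour the standard library's logging.handlers.RotatingFileHandler provides, so it makes a convenient way to picture the feature. The sketch below is only an analogy under that assumption: hbd's own journal module may be implemented differently, this version is synchronous rather than async, and make_journal and write_entry are hypothetical names.

```python
import json
import logging
import time
from logging.handlers import RotatingFileHandler


def make_journal(path, max_size=104857600, max_backups=10):
    """Return a logger that appends one line per call and rotates by size."""
    handler = RotatingFileHandler(path, maxBytes=max_size, backupCount=max_backups)
    handler.setFormatter(logging.Formatter("%(message)s"))  # raw JSON only
    journal = logging.getLogger(f"journal:{path}")
    journal.setLevel(logging.INFO)
    journal.propagate = False  # keep entries out of the root logger
    journal.addHandler(handler)
    return journal


def write_entry(journal, source_ip, source_port, message):
    """Serialize one received message as a single JSON line."""
    entry = {
        "timestamp": time.time(),
        "source_ip": source_ip,
        "source_port": source_port,
        "message": message,
    }
    journal.info(json.dumps(entry, separators=(",", ":")))
```

With this handler, rotated files are named by appending .1, .2, and so on to the base filename, and the oldest backup is dropped once the backup count is reached.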

Configuration

# Message journal settings
journal_enabled: true                    # Enable/disable journaling
journal_dir: /var/log/heartbeat         # Journal directory
journal_file: messages.journal           # Base filename
journal_max_size: 104857600             # Max size (100MB default)
journal_max_backups: 10                 # Number of backups to keep

Example Journal Entry

{"timestamp":1711234567.123,"datetime":"2026-03-28T12:34:56","source_ip":"192.168.1.100","source_port":50003,"message":{"ID":"HTB","name":"webserver1","interval":30}}

Analyzing Journal Files

# View recent messages
tail -100 /var/log/heartbeat/messages.journal | jq .

# Count messages by type
cat /var/log/heartbeat/messages.journal | jq -r '.message.ID' | sort | uniq -c

# Filter by hostname
cat /var/log/heartbeat/messages.journal | jq 'select(.message.name == "webserver1")'
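
Where jq is not available, the same analysis can be done in a few lines of Python. This sketch assumes only the entry layout shown in the example journal entry; the function names are hypothetical.

```python
import json
from collections import Counter


def read_journal(lines):
    """Yield parsed entries, skipping blank or truncated lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            continue  # e.g. a partially written last line


def count_by_type(entries):
    """Count messages by their message.ID field."""
    return Counter(e.get("message", {}).get("ID") for e in entries)


def for_host(entries, name):
    """Keep only entries whose message.name equals the given host name."""
    return [e for e in entries if e.get("message", {}).get("name") == name]
```

Open the journal file and pass the file object to read_journal; the resulting iterator can be fed to count_by_type or for_host.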

See docs/MESSAGE_JOURNAL.md for complete documentation including rotation behavior, integration with log management systems, and analysis examples.


🚨 Threshold Alerting

Heartbeat's threshold alerting system monitors plugin metrics and triggers notifications when values cross configured limits.

Features

  • Multi-level alerts: WARNING and CRITICAL severity levels
  • Flexible operators: Support for >, >=, <, <=, ==, != comparisons
  • Hysteresis: Prevents alert flapping with configurable recovery thresholds
  • Smart notifications: Alerts only on state changes, not every check
  • Re-notifications: Periodic reminders for ongoing alerts
  • Journal integration: All threshold events logged for audit trail
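
As a rough illustration of how the severity levels and operators combine, the sketch below classifies a single value. It assumes that ">" is the default operator and that the critical limit is checked before the warning limit; classify is a hypothetical name, not hbd's API, and hysteresis is left out here.

```python
import operator

# The comparison operators listed above, mapped to stdlib functions.
OPERATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
}


def classify(value, warning, critical, op=">"):
    """Return CRITICAL, WARNING or OK; the critical limit is checked first."""
    compare = OPERATORS[op]
    if compare(value, critical):
        return "CRITICAL"
    if compare(value, warning):
        return "WARNING"
    return "OK"
```

For an inverse threshold such as free disk space, pass op="<" with the warning limit above the critical one.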

Configuration

thresholds:
  # RTT (Round-Trip Time) thresholds for heartbeat monitoring
  # These are checked on every HTB message arrival
  rtt:
    webserver01:
      warning: 100.0   # Warn when RTT > 100ms
      critical: 500.0  # Critical when RTT > 500ms
    
    database01:
      warning: 50.0
      critical: 200.0
  
  # Plugin metric thresholds
  cpu_monitor:
    cpu_percent:
      warning: 80.0      # Warn when CPU > 80%
      critical: 90.0     # Critical when CPU > 90%
      operator: ">"
      hysteresis: 0.1    # 10% hysteresis to prevent flapping
  
  memory_monitor:
    percent:
      warning: 85.0
      critical: 95.0
  
  disk_monitor:
    partitions:
      /:
        percent:
          warning: 80.0
          critical: 90.0
        free_gb:
          warning: 10.0   # Alert when < 10GB free
          critical: 5.0
          operator: "<"   # Inverse threshold

# Global settings
threshold_renotify_interval: 3600  # Re-notify every hour for ongoing alerts

RTT Monitoring

Heartbeat monitors network latency (Round-Trip Time) for each host's heartbeat messages. RTT thresholds are fully integrated with the threshold alerting system:

  • Per-host configuration: Set different thresholds for each monitored host
  • Real-time checking: Thresholds evaluated on every HTB message arrival
  • Alert state tracking: RTT alerts use the same state management as plugin metrics
  • Hysteresis support: Configurable hysteresis prevents rapid state transitions
  • Alerts dashboard: RTT alerts visible on the /alerts web page alongside plugin alerts
  • Smart notifications: Only triggers on state changes (OK → WARNING → CRITICAL)
  • Re-notification: Periodic reminders for ongoing RTT issues
  • Event & journal logging: All RTT events logged for audit trail

Configuration format:

thresholds:
  rtt:
    <hostname>:
      warning: <milliseconds>   # Warn when RTT > this value
      critical: <milliseconds>  # Critical when RTT > this value
      hysteresis: 0.1           # Optional: 10% hysteresis (default)

Example alerts:

WARNING: webserver01 - rtt.webserver01 = 125.3
CRITICAL: database01 - rtt.database01 = 520.1
RECOVERED: webserver01 - rtt.webserver01 = 45.2 (WARNING -> OK)

RTT alerts appear on the Alerts dashboard and can be filtered by severity level. The metric_path format is rtt.<hostname>, making it easy to distinguish from plugin metrics.

Alert Behavior

  1. State Changes: Notifications sent when crossing thresholds

    • OK → WARNING: Early notification
    • WARNING → CRITICAL: Escalation
    • CRITICAL → OK: Recovery
  2. Hysteresis: Prevents rapid state transitions

    Critical threshold: 90%
    Hysteresis: 10%
    Recovery threshold: 81% (90 - 10% of 90)
    
    Value 91% → CRITICAL (threshold crossed)
    Value 85% → CRITICAL (still above 81%)
    Value 79% → OK (below recovery threshold)
    
  3. Re-notifications: Periodic reminders for ongoing alerts

    • Default: Every 60 minutes
    • Configurable via threshold_renotify_interval
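
The hysteresis rule in step 2 can be written down for a single threshold as follows. This is a sketch rather than hbd's implementation: the function names are hypothetical, and the mirrored rule for "<" thresholds is an assumption that the text above does not confirm.

```python
def recovery_threshold(threshold, hysteresis=0.1, op=">"):
    """Level the value must pass to clear an alert raised at `threshold`."""
    if op == ">":
        return threshold * (1 - hysteresis)
    # Assumption: "<" thresholds mirror the rule and recover above the limit.
    return threshold * (1 + hysteresis)


def still_alerting(value, threshold, was_alerting, hysteresis=0.1, op=">"):
    """Apply hysteresis: an active alert only clears past the recovery level."""
    if was_alerting:
        limit = recovery_threshold(threshold, hysteresis, op)
    else:
        limit = threshold
    return value > limit if op == ">" else value < limit
```

Feeding the values 91, 85 and 79 through still_alerting with a threshold of 90, carrying the previous result forward each time, reproduces the sequence in step 2: the alert is raised, held, then cleared.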

Example Notifications

WARNING: webserver01 - cpu_monitor.cpu_percent = 85.0
CRITICAL: webserver01 - memory_monitor.percent = 96.0
RECOVERED: database01 - disk_monitor./.percent = 75.0 (WARNING -> OK)
REMINDER (CRITICAL): mailserver - cpu_monitor.load_1min = 12.5 (ongoing for 3600s)

Supported Metrics

All plugin metrics can be thresholded:

  • CPU: cpu_percent, load_1min, load_5min, load_15min
  • Memory: percent, available_mb, swap_percent
  • Disk: Per-partition percent, free_gb, free_mb
  • Network: errors_total, dropped packets, connection counts
  • Nagios: exit_code mapping (0=OK, 1=WARNING, 2=CRITICAL)

See docs/THRESHOLD_ALERTING.md for comprehensive documentation including best practices, troubleshooting, and advanced configuration.


👥 User Management

Heartbeat supports optional user accounts with role-based access control per host.

Roles

  • monitor — view status, plugin data, alerts
  • manager — monitor + queue commands, trigger DNS, queue upgrades
  • owner — manager + drop host, transfer ownership, update access
  • admin (user flag) — owner-level access on every host

When no users are configured the server runs in unauthenticated mode — all existing behaviour is unchanged.

Quick setup

users:
  alice:
    full_name: Alice Smith
    password: pbkdf2:sha256:...    # hbd passwd alice
    admin: true

default_owner: alice

hosts:
  webserver01:
    owner: alice
    managers: [bob]
    monitors: [carol]

# Generate a password hash
hbd passwd alice

Browser users are redirected to /login automatically. The session cookie is set on login, so fetch() calls from dashboards work without any JavaScript changes.
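
The role rules can be sketched against the users and hosts mappings from the quick setup. This is a simplified model with hypothetical function names; it ignores default_owner and is not hbd's actual access-control code.

```python
# Roles ordered from least to most privileged.
ROLE_RANK = {"monitor": 1, "manager": 2, "owner": 3}


def role_for(username, host_cfg, users):
    """Return the user's role on one host, or None if they have no access."""
    if users.get(username, {}).get("admin"):
        return "owner"  # the admin flag grants owner-level access everywhere
    if host_cfg.get("owner") == username:
        return "owner"
    if username in host_cfg.get("managers", []):
        return "manager"
    if username in host_cfg.get("monitors", []):
        return "monitor"
    return None


def allowed(username, required, host_cfg, users):
    """True if the user holds `required` or a higher role on the host."""
    if not users:
        return True  # no users configured: unauthenticated mode
    role = role_for(username, host_cfg, users)
    return role is not None and ROLE_RANK[role] >= ROLE_RANK[required]
```

With the quick-setup configuration, bob (a manager) passes a check for the monitor role, while carol (a monitor) fails a check for manager.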

See docs/USERS.md for complete user management documentation.


🌐 HTTP API & Web UI

Heartbeat includes a built-in HTTP/WebSocket server that provides both a REST API and web-based dashboards for monitoring and visualization.

Features

  • User auth: Optional session-based authentication with per-host role enforcement
  • REST API: JSON endpoints for accessing plugin data, alerts, host information, and user management
  • Live Dashboard: Real-time WebSocket-powered host status view
  • Plugin Metrics: Interactive visualization of all plugin data with auto-refresh
  • Alerts Dashboard: Comprehensive alert monitoring with filtering and summaries

Web Dashboards

  • Login (/login): Browser login form (shown automatically when auth is configured)
  • Live View (/live): Real-time host connectivity, latency, and messages
  • Plugin Metrics (/plugins): Browse and visualize metrics from all plugins
  • Alerts Dashboard (/alerts): Monitor active alerts with severity filtering

API Endpoints

# Log in (when auth is configured)
TOKEN=$(curl -s -X POST http://localhost:50004/api/0/auth/login \
  -H 'Content-Type: application/json' \
  -d '{"username":"alice","password":"secret"}' | jq -r .token)
# Bash array, so the header survives word splitting as a single argument
AUTH=(-H "Authorization: Bearer $TOKEN")

# List all monitored hosts
curl "${AUTH[@]}" http://localhost:50004/api/0/hosts

# Get all plugin data for a host
curl "${AUTH[@]}" http://localhost:50004/api/0/hosts/webserver01/plugins

# Get detailed plugin history (last 50 samples)
curl "${AUTH[@]}" "http://localhost:50004/api/0/hosts/webserver01/plugins/cpu_monitor?limit=50"

# Get alert states for a specific host
curl "${AUTH[@]}" http://localhost:50004/api/0/hosts/webserver01/alerts

# Get all active alerts across all hosts
curl "${AUTH[@]}" http://localhost:50004/api/0/alerts

# View/update host access roles
curl "${AUTH[@]}" http://localhost:50004/api/0/hosts/webserver01/access

See docs/HTTP_API.md for complete API documentation including response formats, error handling, and integration examples.
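
The curl calls above map onto a small Python client built on urllib. The endpoints and the token field are taken from those examples; the helper names are hypothetical and the response bodies are assumed to be JSON.

```python
import json
import urllib.request

BASE = "http://localhost:50004/api/0"


def login_request(username, password):
    """Build the POST request for /auth/login."""
    body = json.dumps({"username": username, "password": password}).encode()
    return urllib.request.Request(
        f"{BASE}/auth/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def api_request(path, token=None):
    """Build a GET request, adding the bearer token when one is given."""
    headers = {"Authorization": f"Bearer {token}"} if token else {}
    return urllib.request.Request(f"{BASE}{path}", headers=headers)


def fetch_json(request, timeout=5.0):
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

Against a running server, fetch_json(login_request("alice", "secret"))["token"] would yield the token, which can then be passed to api_request("/hosts", token) and the other paths.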


⚙️ Quickstart

Prerequisites:

  • Python 3.11+ (project uses language features from recent Python)
  • nsupdate (for DNS updates) if using dynamic DNS

Install dependencies (recommended into a venv):

This project declares its dependencies in pyproject.toml. Instead of the old requirements.txt flow, install the package into a virtualenv using pip; scripts/install.sh shows one way to do this.

Run the daemon (example):

# run with an explicit config file (the default lookup path is ~/.hb.yaml)
hbd -c .hb.yaml -f -v

You can also run it directly via the package entrypoint after installation:

python -m hbd.server.cli -c /path/to/config.yaml

Running the Client

The heartbeat client (hbc) sends periodic heartbeats and plugin data to the server:

# Basic usage pointing to server (host is a positional argument)
hbc your-server.example.com

# Run as daemon with a config file
hbc -d -c /etc/hbc.yaml your-server.example.com

# Send a one-off boot message
hbc --boot your-server.example.com

# Verbose output
hbc -v your-server.example.com

You can also run it via the module entrypoint:

python -m hbd.client.main your-server.example.com

Client configuration can also be specified in YAML:

server: hbd.example.com
port: 50003
interval: 30
plugins:
  cpu_monitor:
    interval: 300      # Check every 5 minutes (default)
    per_core: true
  memory_monitor:
    interval: 300      # Check every 5 minutes (default)
  disk_monitor:
    interval: 300      # Check every 5 minutes (default)
  network_monitor:
    interval: 300      # Check every 5 minutes (default)
  nagios_runner:
    interval: 300      # Check every 5 minutes (default)
    commands:
      - /usr/lib/nagios/plugins/check_load -w 5,4,3 -c 10,8,6
      - /usr/lib/nagios/plugins/check_disk -w 20% -c 10% -p /

All monitoring plugins default to 5-minute (300 second) intervals, but can be customized as needed.

🐞 Debugging in VS Code

This repository includes a ready-to-use .vscode/launch.json with configurations to run or attach the VS Code debugger to hbd.

  • Ensure the Python extension is installed and select the project .venv as the interpreter (bottom-left of VS Code).
  • Use F5 and pick one of these configurations from the Run view:
    • Python: Run hbd (module) — runs hbd.server.cli as a module and sets PYTHONPATH to the workspace root (recommended).
    • Python: Run hbd with debugpy (listen) — launches debugpy and hbd together; useful when you want the process to listen for a debugger.
    • Python: Attach (localhost:5678) — attach the debugger to a running process started with debugpy.

To start hbd manually and wait for the debugger to attach, run:

PYTHONPATH=. python -m debugpy --listen 5678 --wait-for-client -m hbd.server.cli -c .hb.yaml -f -v

Set breakpoints in modules such as hbd/server/udp.py, hbd/server/dns.py, or hbd/server/main.py, and use the Attach configuration to connect. Use justMyCode: false if you need to step into third-party code.


🛠 Configuration

hbd reads YAML configuration (optional). If PyYAML is not installed, built-in defaults are used. Example configuration keys (see hbd/server/config.py):

  • hb_port: UDP port to listen for heartbeats (default: 50003)
  • hbd_port: port for the HTTP API and web UI (default: 50004)
  • hbd_host: bind address for HTTP/WSS
  • pickfile: path for persisted state
  • logfile: path to log file
  • pushsrv: push service (pushover|mattermost|all)
  • interval / grace: heartbeat timing configuration
  • dyndomains: list of dyndomains to update via nsupdate
  • nsupdate_bin: path to nsupdate binary
  • ws_port: port for plain WebSocket connections (default: 50005)
  • wss_port: port for secure WebSocket (WSS) connections (default: none). If set, hbd will attempt to serve WSS on this port when wss_pem and wss_key SSL files are available under cert_path (see below).
  • cert_path: directory where TLS certificate and key are looked up (default: /usr/local/etc/ssl/)
  • wss_pem: filename for the certificate chain (default: fullchain.pem)
  • wss_key: filename for the private key (default: privkey.pem)
  • users: mapping of username → user attributes (full_name, avatar, password, admin, notification_channels)
  • default_owner: username that owns hosts with no explicit owner (falls back to first admin user)
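
The WSS-related keys interact as described: WSS is only attempted when wss_port is set and both files exist under cert_path. A minimal sketch of that lookup, using the documented defaults, might look like this. The function names are hypothetical and this is not the code in hbd.server.config.

```python
import os
import ssl

# Defaults as documented in the list above.
TLS_DEFAULTS = {
    "cert_path": "/usr/local/etc/ssl/",
    "wss_pem": "fullchain.pem",
    "wss_key": "privkey.pem",
}


def tls_files(cfg):
    """Return (cert, key) paths if WSS is configured and both files exist."""
    if not cfg.get("wss_port"):
        return None  # WSS disabled: no port configured
    merged = {**TLS_DEFAULTS, **cfg}
    cert = os.path.join(merged["cert_path"], merged["wss_pem"])
    key = os.path.join(merged["cert_path"], merged["wss_key"])
    if os.path.isfile(cert) and os.path.isfile(key):
        return cert, key
    return None


def make_ssl_context(cfg):
    """Build a server-side TLS context, or None when WSS cannot be served."""
    files = tls_files(cfg)
    if files is None:
        return None
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    context.load_cert_chain(certfile=files[0], keyfile=files[1])
    return context
```

Returning None instead of raising lets the caller fall back to the plain WebSocket port when certificates are missing.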

Example .hb.yaml (minimal):

hbd_host: 0.0.0.0
hbd_port: 50004
dyndomains:
  - example.com
nsupdate_bin: /usr/bin/nsupdate
pushsrv: pushover

Tip: SERVER_DEFAULTS in hbd/server/config.py contains the canonical defaults and accepted configuration keys.
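
The fallback to built-in defaults when PyYAML is missing follows a common optional-import pattern. The sketch below shows the idea with a small subset of the documented defaults; DEFAULTS and load_config are hypothetical stand-ins, not the real SERVER_DEFAULTS or loader.

```python
# Subset of the documented defaults; the full set lives in SERVER_DEFAULTS.
DEFAULTS = {"hb_port": 50003, "hbd_port": 50004, "ws_port": 50005, "wss_port": None}

try:
    import yaml  # PyYAML, optional
except ImportError:
    yaml = None


def load_config(path=None):
    """Overlay keys from a YAML file on the defaults; fall back to defaults."""
    config = dict(DEFAULTS)
    if path is None or yaml is None:
        return config  # no file given, or PyYAML not installed
    with open(path, encoding="utf-8") as fh:
        loaded = yaml.safe_load(fh) or {}
    config.update(loaded)
    return config
```

Copying the defaults before updating keeps the module-level dict unchanged between calls.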


🔧 Architecture & Modules

The package is organized into three subpackages:

hbd.common — shared code used by both client and server:

  • hbd.common.proto — serialization/deserialization of heartbeat messages (supports compressed payloads and plugin data)
  • hbd.common.utils — small utility helpers (shortname, dur, initlog)
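
To illustrate what handling "text or zlib-compressed" payloads involves, here is a minimal round-trip sketch. It assumes the text form is JSON, which the journal entry format suggests but this README does not state; encode and decode are hypothetical names, not the hbd.common.proto API.

```python
import json
import zlib


def encode(message, compress=False):
    """Serialize a message dict to bytes, optionally zlib-compressed."""
    raw = json.dumps(message, separators=(",", ":")).encode("utf-8")
    return zlib.compress(raw) if compress else raw


def decode(datagram):
    """Parse a datagram that is either plain text or zlib-compressed."""
    try:
        raw = zlib.decompress(datagram)
    except zlib.error:
        raw = datagram  # not a zlib stream: treat it as plain text
    return json.loads(raw.decode("utf-8"))
```

Trying decompression first works because plain JSON text does not form a valid zlib header, so uncompressed payloads fall through to the except branch.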

hbd.server — the heartbeat daemon (hbd):

  • hbd.server.cli — CLI entrypoint and argument parsing
  • hbd.server.main — async orchestration to run UDP/HTTP/WSS components
  • hbd.server.udp — UDP parsing and handle_datagram implementation (main state machine)
  • hbd.server.dns — create_nsupdate_payload, nsupdate, and an asyncio DNS worker (start_dns_worker). The DNS worker runs as an asyncio task, and the package exposes a small thread-safe bridge so legacy synchronous code can put() updates into the queue.
  • hbd.server.notify — email and push notification helpers
  • hbd.server.ws — WebSocket server and thread-safe broadcast helpers
  • hbd.server.http — HTTP handler factory for the status UI/API
  • hbd.server.journal — message journal with size-based log rotation and backup management
  • hbd.server.threshold — threshold alerting engine
  • hbd.server.monitor — host state monitoring
  • hbd.server.hbdclass — Host class and shared server state
  • hbd.server.config — configuration loader and defaults
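
For readers unfamiliar with nsupdate, the kind of payload a DNS update involves can be sketched as below. The function names, the 60-second TTL, and the delete-then-add strategy are assumptions for illustration; the real create_nsupdate_payload may build something different, and authentication options are omitted.

```python
import ipaddress
import subprocess


def build_nsupdate_script(hostname, domain, address, ttl=60):
    """Return nsupdate commands that replace the host's address record."""
    ip = ipaddress.ip_address(address)  # raises ValueError on bad input
    rtype = "A" if ip.version == 4 else "AAAA"
    fqdn = f"{hostname}.{domain}."
    return "\n".join([
        f"zone {domain}",
        f"update delete {fqdn} {rtype}",
        f"update add {fqdn} {ttl} {rtype} {ip}",
        "send",
        "",
    ])


def run_nsupdate(script, nsupdate_bin="/usr/bin/nsupdate", timeout=10.0):
    """Feed the script to nsupdate on stdin; True on exit status 0."""
    proc = subprocess.run(
        [nsupdate_bin], input=script, text=True,
        capture_output=True, timeout=timeout,
    )
    return proc.returncode == 0
```

Validating the address with ipaddress before building the script keeps malformed heartbeat data out of the commands passed to nsupdate.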

hbd.client — the heartbeat client (hbc):

  • hbd.client.main — client entrypoint; sends heartbeats and plugin data to the server
  • hbd.client.plugin — plugin framework with base classes, registry, and dynamic loader
  • hbd.client.plugins/ — built-in plugins (os_info, cpu_monitor, memory_monitor, disk_monitor, network_monitor, filesystem_info, nagios_runner)
  • hbd.client.config — client configuration loader

This modular layout makes the code easier to test and maintain.

Runtime & Shutdown

  • The main runtime is asyncio-based. Services (UDP listener, HTTP server, WebSocket server, monitor, and DNS worker) run as asyncio tasks.
  • On SIGINT/SIGTERM the server triggers a graceful shutdown: it cancels active tasks, signals the DNS worker via a sentinel, and cleans up resources before exit.
  • The DNS update worker is implemented as an asyncio task; synchronous producers can still enqueue DNS updates via a small thread-safe bridge available at hbd.server.hbdclass.Host.dnsQ.
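
The worker, sentinel, and thread-safe bridge described above follow a standard asyncio pattern, sketched here in isolation. All names are hypothetical; this shows the mechanism rather than the code behind Host.dnsQ, and signal handling is left out.

```python
import asyncio
import threading

STOP = object()  # sentinel that tells the worker to exit


async def dns_worker(queue, handle):
    """Process queued updates one at a time until the sentinel arrives."""
    while True:
        item = await queue.get()
        if item is STOP:
            break
        handle(item)


def put_threadsafe(loop, queue, item):
    """Bridge for synchronous producers running in other threads."""
    loop.call_soon_threadsafe(queue.put_nowait, item)


async def main():
    loop = asyncio.get_running_loop()
    queue = asyncio.Queue()
    done = []
    worker = asyncio.create_task(dns_worker(queue, done.append))

    # A plain thread enqueues work, then the sentinel, via the bridge.
    def producer():
        put_threadsafe(loop, queue, "webserver01")
        put_threadsafe(loop, queue, STOP)

    thread = threading.Thread(target=producer)
    thread.start()
    await worker  # returns once the sentinel has been consumed
    thread.join()
    return done
```

The bridge goes through loop.call_soon_threadsafe because asyncio.Queue is not thread-safe; calling put_nowait directly from another thread could corrupt the queue or fail to wake the worker.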

Templates & Static Files

  • Template files are located under hbd/server/templates. The HTTP server resolves templates relative to the hbd.server package but the path can be overridden with the templates_dir config key.
  • Static assets (CSS/JS/images) are served from hbd/server/static via the /static/<path> HTTP route.

🧪 Testing & Dev

Tests are implemented using unittest; pytest also works as a runner if you prefer it. To run tests locally without installing anything beyond the dev requirements:

# with project root on PYTHONPATH
PYTHONPATH=. python -m unittest discover -v
# or with pytest if installed
pytest -q

Developer tooling included:

  • pyproject.toml — project metadata and dependencies
  • tox.ini — convenience wrappers for running tests, lint, and mypy

To run linters and type checks locally:

# after installing dev deps
tox -e lint
tox -e mypy

🚀 Running in production

  • Use your system service manager (systemd, launchd, etc.) to run hbd in the background.
  • Ensure nsupdate and necessary credentials are available for dynamic DNS updates.
  • Configure TLS for WSS if you enable secure websockets.

Note: The project contained a small example for obtaining DNS-verified certs (certbot with RFC2136); see the earlier commit history.


🤝 Contributing

Contributions welcome! Please:

  1. Open an issue to discuss larger changes.
  2. Create a topic branch and a clear PR.
  3. Add tests for new features and run linters.
  4. Keep changes focused and documented.

📜 License

This repository is licensed under the MIT license. See LICENSE for details.



Requirements

Requires Python: >=3.11
Versions (5)
5.1.1 2026-04-12
5.1.0 2026-04-11
5.0.12 2026-04-08
5.0.11 2026-04-07
5.0.9 2026-04-07