hbd (5.0.12)

Published 2026-04-08 16:47:19 -04:00 by andreas

To install the package using pip, run the following command:

pip install --index-url  --extra-index-url https://pypi.org/simple hbd

For more information on the PyPI registry, see the documentation.

Heartbeat monitoring system — client (hbc) and server (hbd)

Heartbeat Daemon (hbd) ✅

A lightweight daemon that listens for UDP heartbeat messages and acts on them: keeps host state, optionally updates DNS records via nsupdate, forwards messages to WebSocket clients, and sends notifications (email, Pushover, Mattermost, Signal). It is a refactor of a previously monolithic script into a modular Python package (hbd).

📌 Features

Receive and parse heartbeat datagrams (text or zlib-compressed) ✅
Maintain host state and detect up/down transitions ✅
Queue DNS updates via nsupdate and run them in a background asyncio worker ✅
WebSocket API for live updates (hosts & messages) ✅
Notification pipeline (email, Pushover, Mattermost, Signal) ✅
User management & access control ✅
- Optional user accounts with PBKDF2-SHA256 password hashing (stdlib only)
- Per-host roles: owner, manager, monitor
- Session-based auth with cookie support (browser login page included)
- Backwards compatible: no auth required when no users are configured
HTTP API & Web UI ✅
- REST API for plugin data, alerts, host information, and user management
- Live dashboard with WebSocket updates
- Interactive plugin metrics visualization
- Alerts dashboard with filtering and summaries
Message journal with automatic log rotation ✅
- Logs all received messages in JSON format
- Size-based automatic rotation
- Configurable retention and backup management
Plugin system for extensible monitoring ✅
- Collect system metrics (CPU, memory, disk, network)
- Execute existing Nagios monitoring plugins
- Create custom plugins with simple Python classes
Threshold alerting system ✅
- Monitor metrics against configurable WARNING/CRITICAL thresholds
- Hysteresis to prevent alert flapping
- Automatic notifications on state changes
- Re-notification for ongoing alerts
Modular codebase suitable for unit testing and CI ✅

🔌 Plugin System

Heartbeat includes a comprehensive plugin architecture that extends monitoring beyond simple heartbeats. The plugin system allows you to:

Collect system information: OS details, hardware info, system configuration
Monitor resources: CPU usage, memory, disk space, network statistics
Run Nagios plugins: Execute thousands of existing Nagios monitoring plugins without modification
Create custom plugins: Build your own monitoring logic with simple Python classes

Plugin Types

InfoPlugin: Collects static information once (e.g., OS version, hardware specs)
MonitorPlugin: Collects metrics periodically (e.g., CPU usage every 30 seconds)

Built-in Plugins

os_info: Collects OS, kernel, distribution, and architecture information
cpu_monitor: Monitors CPU usage, load average, frequency, and process counts
memory_monitor: Monitors RAM and swap usage, available memory
disk_monitor: Monitors disk usage, I/O statistics, and filesystem metrics
network_monitor: Monitors network interface statistics, bandwidth, and connections
filesystem_info: Collects mounted filesystem information (physical filesystems only by default)
nagios_runner: Executes Nagios monitoring plugins (check_disk, check_load, check_http, etc.)

Nagios Integration

The nagios_runner plugin provides seamless integration with the vast Nagios plugin ecosystem. You can run any Nagios-compatible plugin and have the results automatically parsed and stored:

Executes plugins via subprocess with timeout protection
Parses exit codes (OK/WARNING/CRITICAL/UNKNOWN)
Extracts performance data with thresholds
Reports aggregated status across all configured checks
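
The steps above can be sketched roughly as follows. This is a minimal illustration, not the nagios_runner implementation: it assumes the standard Nagios plugin output convention (`text | label=value[UOM];warn;crit;min;max`), and the names `parse_perfdata`, `run_check`, and `worst_status` are hypothetical, as is the severity ordering used for aggregation.

```python
import shlex
import subprocess

# Standard Nagios exit-code convention
STATUS = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def parse_perfdata(output: str) -> dict:
    """Parse 'text | label=value[UOM];warn;crit;min;max ...' into a dict."""
    _, _, perf = output.partition("|")
    metrics = {}
    for item in shlex.split(perf):
        label, sep, rest = item.partition("=")
        if not sep:
            continue
        fields = rest.split(";")
        metrics[label] = {
            "value": fields[0],  # kept as text, unit suffix included
            "warn": fields[1] if len(fields) > 1 else "",
            "crit": fields[2] if len(fields) > 2 else "",
        }
    return metrics


def run_check(command: str, timeout: float = 10.0) -> dict:
    """Run one check; a timeout or a missing binary is reported as UNKNOWN."""
    try:
        proc = subprocess.run(
            shlex.split(command), capture_output=True, text=True, timeout=timeout
        )
    except (subprocess.TimeoutExpired, OSError) as exc:
        return {"status": "UNKNOWN", "output": str(exc), "perfdata": {}}
    lines = proc.stdout.strip().splitlines()
    first_line = lines[0] if lines else ""
    return {
        "status": STATUS.get(proc.returncode, "UNKNOWN"),
        "output": first_line,
        "perfdata": parse_perfdata(first_line),
    }


def worst_status(results: list) -> str:
    """Aggregate several results (assumed order: OK < UNKNOWN < WARNING < CRITICAL)."""
    order = ["OK", "UNKNOWN", "WARNING", "CRITICAL"]
    return max((r["status"] for r in results), key=order.index, default="OK")
```

Calling `run_check("/usr/lib/nagios/plugins/check_load -w 5,4,3 -c 10,8,6")` would return the mapped status, the first output line, and the parsed performance data for that check.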

See docs/NAGIOS_INTEGRATION.md for complete integration guide including configuration examples and custom plugin development.

Creating Custom Plugins

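import shutil
import time

from hbd.plugin import MonitorPlugin

def get_disk_usage(path="/"):
    """Return the percentage of disk space used at path."""
    usage = shutil.disk_usage(path)  # total, used, and free bytes
    return round(100 * usage.used / usage.total, 1)

class DiskMonitorPlugin(MonitorPlugin):
    name = "disk_monitor"
    interval = 60  # Run every 60 seconds

    async def collect(self):
        # Return one sample; the client sends it to the server with the plugin data
        return {
            "disk_usage": get_disk_usage(),
            "timestamp": time.time(),
        }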

Place plugins in hbd/plugins/ and they'll be automatically discovered and loaded by the client.
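
Before dropping a plugin into hbd/plugins/, it can help to exercise its collect() method on its own. The sketch below does that with a stand-in base class so it runs without hbd installed; it assumes the real MonitorPlugin only needs `name`, `interval`, and an async `collect`, and the `RootDiskPlugin` and `run_plugin` names are hypothetical.

```python
import asyncio
import shutil
import time


class MonitorPlugin:
    """Stand-in for hbd.plugin.MonitorPlugin (assumed interface only)."""
    name = "base"
    interval = 300

    async def collect(self) -> dict:
        raise NotImplementedError


class RootDiskPlugin(MonitorPlugin):
    name = "root_disk"  # hypothetical plugin name
    interval = 60

    async def collect(self) -> dict:
        usage = shutil.disk_usage("/")  # total, used, and free bytes
        return {
            "percent": round(100 * usage.used / usage.total, 1),
            "free_gb": round(usage.free / 1024**3, 2),
            "timestamp": time.time(),
        }


async def run_plugin(plugin: MonitorPlugin, rounds: int, delay: float) -> list:
    """Call collect() `rounds` times, sleeping `delay` seconds in between."""
    samples = []
    for i in range(rounds):
        samples.append(await plugin.collect())
        if i < rounds - 1:
            await asyncio.sleep(delay)
    return samples


# Example: samples = asyncio.run(run_plugin(RootDiskPlugin(), rounds=2, delay=0.1))
```

Each sample is a plain dict, which keeps the plugin easy to unit test independently of the client's scheduling.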

📝 Message Journal

Heartbeat includes a message journal that logs all received messages with automatic rotation.

Features

JSON Format: All messages logged in JSONL (JSON Lines) format for easy parsing
Automatic Rotation: Size-based rotation with configurable thresholds
Backup Management: Keeps configurable number of rotated log files
Non-blocking: Async logging with minimal performance impact

Configuration

# Message journal settings
journal_enabled: true                    # Enable/disable journaling
journal_dir: /var/log/heartbeat         # Journal directory
journal_file: messages.journal           # Base filename
journal_max_size: 104857600             # Max size (100MB default)
journal_max_backups: 10                 # Number of backups to keep
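
To make the interplay of journal_max_size and journal_max_backups concrete, here is a minimal sketch of size-based rotation for a JSONL file. It is an illustration only: the numbered backup suffixes (`.1`, `.2`, ...) and the function names are assumptions, and it expects max_backups to be at least 1. See docs/MESSAGE_JOURNAL.md for the actual rotation behavior.

```python
import json
import os
import time


def rotate(path: str, max_backups: int) -> None:
    """Drop the oldest backup, shift the others up, move the live file to .1."""
    oldest = f"{path}.{max_backups}"
    if os.path.exists(oldest):
        os.remove(oldest)
    for n in range(max_backups - 1, 0, -1):
        src = f"{path}.{n}"
        if os.path.exists(src):
            os.replace(src, f"{path}.{n + 1}")
    os.replace(path, f"{path}.1")


def append_entry(path: str, message: dict, source: tuple,
                 max_size: int = 104857600, max_backups: int = 10) -> None:
    """Append one JSON line, rotating first if the file is already too large."""
    if os.path.exists(path) and os.path.getsize(path) >= max_size:
        rotate(path, max_backups)
    entry = {
        "timestamp": time.time(),
        "source_ip": source[0],
        "source_port": source[1],
        "message": message,
    }
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry, separators=(",", ":")) + "\n")
```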

Example Journal Entry

{"timestamp":1711234567.123,"datetime":"2026-03-28T12:34:56","source_ip":"192.168.1.100","source_port":50003,"message":{"ID":"HTB","name":"webserver1","interval":30}}

Analyzing Journal Files

# View recent messages
tail -100 /var/log/heartbeat/messages.journal | jq .

# Count messages by type
jq -r '.message.ID' /var/log/heartbeat/messages.journal | sort | uniq -c

# Filter by hostname
jq 'select(.message.name == "webserver1")' /var/log/heartbeat/messages.journal
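
Where jq is not available, the same count can be done with the standard library. The sketch below assumes only the entry layout shown in the example journal entry (a top-level `message` object with an `ID` field); the function name `count_by_id` is illustrative.

```python
import json
from collections import Counter


def count_by_id(lines) -> Counter:
    """Count journal entries per message ID, skipping blank or corrupt lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # e.g. a partially written final line
        counts[entry.get("message", {}).get("ID", "?")] += 1
    return counts


# Usage with a real journal file:
#   with open("/var/log/heartbeat/messages.journal", encoding="utf-8") as fh:
#       for msg_id, n in count_by_id(fh).most_common():
#           print(n, msg_id)
```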

See docs/MESSAGE_JOURNAL.md for complete documentation including rotation behavior, integration with log management systems, and analysis examples.

🚨 Threshold Alerting

Heartbeat includes a sophisticated threshold alerting system that monitors plugin metrics and triggers notifications when values exceed configured limits.

Features

Multi-level alerts: WARNING and CRITICAL severity levels
Flexible operators: Support for >, >=, <, <=, ==, != comparisons
Hysteresis: Prevents alert flapping with configurable recovery thresholds
Smart notifications: Alerts only on state changes, not every check
Re-notifications: Periodic reminders for ongoing alerts
Journal integration: All threshold events logged for audit trail

Configuration

thresholds:
  # RTT (Round-Trip Time) thresholds for heartbeat monitoring
  # These are checked on every HTB message arrival
  rtt:
    webserver01:
      warning: 100.0   # Warn when RTT > 100ms
      critical: 500.0  # Critical when RTT > 500ms
    
    database01:
      warning: 50.0
      critical: 200.0
  
  # Plugin metric thresholds
  cpu_monitor:
    cpu_percent:
      warning: 80.0      # Warn when CPU > 80%
      critical: 90.0     # Critical when CPU > 90%
      operator: ">"
      hysteresis: 0.1    # 10% hysteresis to prevent flapping
  
  memory_monitor:
    percent:
      warning: 85.0
      critical: 95.0
  
  disk_monitor:
    partitions:
      /:
        percent:
          warning: 80.0
          critical: 90.0
        free_gb:
          warning: 10.0   # Alert when < 10GB free
          critical: 5.0
          operator: "<"   # Inverse threshold

# Global settings
threshold_renotify_interval: 3600  # Re-notify every hour for ongoing alerts
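
As a rough illustration of how one rule from this configuration could be evaluated, the sketch below maps a value to a severity using the configured operator. The function name `evaluate` is hypothetical, and treating ">" as the default operator is an assumption based on the rules above that omit it.

```python
import operator

OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt,
       "<=": operator.le, "==": operator.eq, "!=": operator.ne}


def evaluate(value: float, rule: dict) -> str:
    """Return 'CRITICAL', 'WARNING', or 'OK' for one metric value."""
    compare = OPS[rule.get("operator", ">")]  # assumption: '>' is the default
    if "critical" in rule and compare(value, rule["critical"]):
        return "CRITICAL"
    if "warning" in rule and compare(value, rule["warning"]):
        return "WARNING"
    return "OK"


cpu_rule = {"warning": 80.0, "critical": 90.0, "operator": ">"}
free_rule = {"warning": 10.0, "critical": 5.0, "operator": "<"}

print(evaluate(85.0, cpu_rule))   # WARNING
print(evaluate(7.5, free_rule))   # WARNING
print(evaluate(3.0, free_rule))   # CRITICAL
```

Checking the critical level first means an inverse rule such as free_gb needs no special handling beyond its operator.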

RTT Monitoring

Heartbeat monitors network latency (Round-Trip Time) for each host's heartbeat messages. RTT thresholds are fully integrated with the threshold alerting system:

Per-host configuration: Set different thresholds for each monitored host
Real-time checking: Thresholds evaluated on every HTB message arrival
Alert state tracking: RTT alerts use the same state management as plugin metrics
Hysteresis support: Configurable hysteresis prevents rapid state transitions
Alerts dashboard: RTT alerts visible on the /alerts web page alongside plugin alerts
Smart notifications: Only triggers on state changes (OK → WARNING → CRITICAL)
Re-notification: Periodic reminders for ongoing RTT issues
Event & journal logging: All RTT events logged for audit trail

Configuration format:

thresholds:
  rtt:
    <hostname>:
      warning: <milliseconds>   # Warn when RTT > this value
      critical: <milliseconds>  # Critical when RTT > this value
      hysteresis: 0.1           # Optional: 10% hysteresis (default)

Example alerts:

WARNING: webserver01 - rtt.webserver01 = 125.3
CRITICAL: database01 - rtt.database01 = 520.1
RECOVERED: webserver01 - rtt.webserver01 = 45.2 (WARNING -> OK)

RTT alerts appear on the Alerts dashboard and can be filtered by severity level. The metric_path format is rtt.<hostname>, making it easy to distinguish from plugin metrics.

Alert Behavior

State Changes: Notifications sent when crossing thresholds
- OK → WARNING: Early notification
- WARNING → CRITICAL: Escalation
- CRITICAL → OK: Recovery

Hysteresis: Prevents rapid state transitions

Critical threshold: 90%
Hysteresis: 10%
Recovery threshold: 81% (90 - 10% of 90)

Value 91% → CRITICAL (threshold crossed)
Value 85% → CRITICAL (still above 81%)
Value 79% → OK (below recovery threshold)
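
The same sequence can be reproduced with a small state function. This is a sketch for a single ">" threshold only, not the project's implementation; the name `next_state` is hypothetical, and holding the alert at exactly the recovery value is an assumption.

```python
def next_state(state: str, value: float, critical: float, hysteresis: float) -> str:
    """Single-level '>' threshold with a recovery band below it."""
    recovery = critical * (1 - hysteresis)  # 90 * 0.9 = 81 in the example
    if value > critical:
        return "CRITICAL"
    if state == "CRITICAL" and value >= recovery:
        return "CRITICAL"  # inside the hysteresis band: hold the alert
    return "OK"


state = "OK"
for value in (91.0, 85.0, 79.0, 85.0):
    state = next_state(state, value, critical=90.0, hysteresis=0.1)
    print(value, state)
```

The final 85.0 yields OK while the earlier 85.0 yielded CRITICAL: the result depends on the previous state, which is what prevents flapping around the threshold.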

Re-notifications: Periodic reminders for ongoing alerts
- Default: Every 60 minutes
- Configurable via threshold_renotify_interval

Example Notifications

WARNING: webserver01 - cpu_monitor.cpu_percent = 85.0
CRITICAL: webserver01 - memory_monitor.percent = 96.0
RECOVERED: database01 - disk_monitor./.percent = 75.0 (WARNING -> OK)
REMINDER (CRITICAL): mailserver - cpu_monitor.load_1min = 12.5 (ongoing for 3600s)

Supported Metrics

All plugin metrics can be thresholded:

CPU: cpu_percent, load_1min, load_5min, load_15min
Memory: percent, available_mb, swap_percent
Disk: Per-partition percent, free_gb, free_mb
Network: errors_total, dropped packets, connection counts
Nagios: exit_code mapping (0=OK, 1=WARNING, 2=CRITICAL)

See docs/THRESHOLD_ALERTING.md for comprehensive documentation including best practices, troubleshooting, and advanced configuration.

👥 User Management

Heartbeat supports optional user accounts with role-based access control per host.

Roles

monitor — view status, plugin data, alerts
manager — monitor + queue commands, trigger DNS, queue upgrades
owner — manager + drop host, transfer ownership, update access
admin (user flag) — owner-level access on every host

When no users are configured the server runs in unauthenticated mode — all existing behaviour is unchanged.
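
The role hierarchy can be read as a simple ranking. The sketch below is one way to express it, under the assumption that host entries carry `owner`, `managers`, and `monitors` keys as in the quick setup; the function names are hypothetical and this is not hbd's access-control code.

```python
ROLE_RANK = {"monitor": 1, "manager": 2, "owner": 3}


def role_for(user: str, host_cfg: dict, users: dict, default_owner: str = "") -> str:
    """Return the user's effective role on one host, or '' for no access."""
    if users.get(user, {}).get("admin"):
        return "owner"  # admin flag grants owner-level access on every host
    owner = host_cfg.get("owner") or default_owner
    if owner and user == owner:
        return "owner"
    if user in host_cfg.get("managers", []):
        return "manager"
    if user in host_cfg.get("monitors", []):
        return "monitor"
    return ""


def allowed(user: str, needed: str, host_cfg: dict, users: dict) -> bool:
    """True in unauthenticated mode (no users) or when the role suffices."""
    if not users:
        return True
    return ROLE_RANK.get(role_for(user, host_cfg, users), 0) >= ROLE_RANK[needed]
```

For example, a user listed under `managers` passes a check for "monitor" or "manager" but not for "owner".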

Quick setup

users:
  alice:
    full_name: Alice Smith
    password: pbkdf2:sha256:...    # hbd passwd alice
    admin: true

default_owner: alice

hosts:
  webserver01:
    owner: alice
    managers: [bob]
    monitors: [carol]

# Generate a password hash
hbd passwd alice

Browser users are redirected to /login automatically. The session cookie is set on login, so fetch() calls from dashboards work without any JavaScript changes.

See docs/USERS.md for complete user management documentation.

🌐 HTTP API & Web UI

Heartbeat includes a built-in HTTP/WebSocket server that provides both a REST API and web-based dashboards for monitoring and visualization.

Features

User auth: Optional session-based authentication with per-host role enforcement
REST API: JSON endpoints for accessing plugin data, alerts, host information, and user management
Live Dashboard: Real-time WebSocket-powered host status view
Plugin Metrics: Interactive visualization of all plugin data with auto-refresh
Alerts Dashboard: Comprehensive alert monitoring with filtering and summaries

Web Dashboards

Login (/login): Browser login form (shown automatically when auth is configured)
Live View (/live): Real-time host connectivity, latency, and messages
Plugin Metrics (/plugins): Browse and visualize metrics from all plugins
Alerts Dashboard (/alerts): Monitor active alerts with severity filtering

API Endpoints

# Log in (when auth is configured)
TOKEN=$(curl -s -X POST http://localhost:50004/api/0/auth/login \
  -H 'Content-Type: application/json' \
  -d '{"username":"alice","password":"secret"}' | jq -r .token)
# Store the header as a bash array so the quoted value survives word splitting
AUTH=(-H "Authorization: Bearer $TOKEN")

# List all monitored hosts
curl "${AUTH[@]}" http://localhost:50004/api/0/hosts

# Get all plugin data for a host
curl "${AUTH[@]}" http://localhost:50004/api/0/hosts/webserver01/plugins

# Get detailed plugin history (last 50 samples)
curl "${AUTH[@]}" "http://localhost:50004/api/0/hosts/webserver01/plugins/cpu_monitor?limit=50"

# Get alert states for a specific host
curl "${AUTH[@]}" http://localhost:50004/api/0/hosts/webserver01/alerts

# Get all active alerts across all hosts
curl "${AUTH[@]}" http://localhost:50004/api/0/alerts

# View/update host access roles
curl "${AUTH[@]}" http://localhost:50004/api/0/hosts/webserver01/access
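
The same calls can be made from Python with only the standard library. This sketch assumes the endpoints and the `token` response field shown in the curl examples; the helper names are hypothetical, and response shapes are documented in docs/HTTP_API.md rather than here.

```python
import json
import urllib.request

BASE = "http://localhost:50004/api/0"


def login_request(username: str, password: str) -> urllib.request.Request:
    """Build the POST request for /auth/login."""
    body = json.dumps({"username": username, "password": password}).encode()
    return urllib.request.Request(
        f"{BASE}/auth/login", data=body, method="POST",
        headers={"Content-Type": "application/json"},
    )


def api_request(path: str, token: str = "") -> urllib.request.Request:
    """Build a GET request, adding the bearer token when one is given."""
    req = urllib.request.Request(f"{BASE}{path}")
    if token:
        req.add_header("Authorization", f"Bearer {token}")
    return req


def fetch_json(req: urllib.request.Request, timeout: float = 5.0):
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


# Usage against a running server:
#   token = fetch_json(login_request("alice", "secret"))["token"]
#   hosts = fetch_json(api_request("/hosts", token))
```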

See docs/HTTP_API.md for complete API documentation including response formats, error handling, and integration examples.

⚙️ Quickstart

Prerequisites:

Python 3.11+ (the package metadata declares Requires Python: >=3.11)
nsupdate (for DNS updates) if using dynamic DNS

Install dependencies (recommended into a venv):

This project declares its dependencies in pyproject.toml. Instead of the old requirements.txt flow, install the package into a virtualenv using pip, or see scripts/install.sh for a scripted install.

Run the daemon (example):

# run with an explicit config file (the default lookup is ~/.hb.yaml)
hbd -c .hb.yaml -f -v

You can also run it directly via the package entrypoint after installation:

python -m hbd.cli -c /path/to/config.yaml

Running the Client

The heartbeat client (hbc) sends periodic heartbeats and plugin data to the server:

# Basic usage pointing to server
python -m hbd.hbc --server your-server.example.com

# With custom configuration
python -m hbd.hbc --server 192.168.1.100 --port 50003 --interval 30

# Run with specific plugins enabled/disabled
python -m hbd.hbc --server hbd.local --disable-plugin os_info

Client configuration can also be specified in YAML:

server: hbd.example.com
port: 50003
interval: 30
plugins:
  cpu_monitor:
    interval: 300      # Check every 5 minutes (default)
    per_core: true
  memory_monitor:
    interval: 300      # Check every 5 minutes (default)
  disk_monitor:
    interval: 300      # Check every 5 minutes (default)
  network_monitor:
    interval: 300      # Check every 5 minutes (default)
  nagios_runner:
    interval: 300      # Check every 5 minutes (default)
    commands:
      - /usr/lib/nagios/plugins/check_load -w 5,4,3 -c 10,8,6
      - /usr/lib/nagios/plugins/check_disk -w 20% -c 10% -p /

All monitoring plugins default to 5-minute (300 second) intervals, but can be customized as needed.
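
For orientation, a bare-bones heartbeat sender could look like the sketch below. The wire format is an assumption: it reuses the `ID`, `name`, and `interval` fields seen in the example journal entry and sends them as plain JSON, which may differ from what hbc actually transmits. The function names are hypothetical.

```python
import json
import socket


def build_heartbeat(name: str, interval: int) -> bytes:
    """Encode a heartbeat as JSON bytes (assumed wire format)."""
    return json.dumps({"ID": "HTB", "name": name, "interval": interval}).encode()


def send_heartbeat(server: str, port: int, name: str, interval: int) -> None:
    """Send a single heartbeat datagram; UDP gives no delivery guarantee."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(build_heartbeat(name, interval), (server, port))


# Example: send_heartbeat("hbd.example.com", 50003, "webserver1", 30)
```

A real client repeats this every `interval` seconds and adds plugin data, which is what hbc does for you.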

🐞 Debugging in VS Code

This repository includes a ready-to-use .vscode/launch.json with configurations to run or attach the VS Code debugger to hbd.

Ensure the Python extension is installed and select the project .venv as the interpreter (bottom-left of VS Code).
Use F5 and pick one of these configurations from the Run view:
- Python: Run hbd (module) — runs hbd.cli as a module and sets PYTHONPATH to the workspace root (recommended).
- Python: Run hbd with debugpy (listen) — launches debugpy and hbd together; useful when you want the process to listen for a debugger.
- Python: Attach (localhost:5678) — attach the debugger to a running process started with debugpy.

To start hbd manually and wait for the debugger to attach, run:

PYTHONPATH=. python -m debugpy --listen 5678 --wait-for-client -m hbd.cli -c .hb.yaml -f -v

Set breakpoints in modules such as hbd/udp.py, hbd/dns.py, or hbd/server.py, and use the Attach configuration to connect. Use justMyCode: false if you need to step into third-party code.

🛠 Configuration

hbd reads YAML configuration (optional). If PyYAML is not installed, built-in defaults are used. Example configuration keys (see hbd/config.py):

hb_port: UDP port to listen for heartbeats (default: 50003)
hbd_port: HTTP API and control port (default: 50004)
hbd_host: bind address for HTTP/WSS
pickfile: path for persisted state
logfile: path to log file
logfmt: text or msg
pushsrv: push service (pushover|mattermost|all)
interval / grace: heartbeat timing configuration
dyndomains: list of dyndomains to update via nsupdate
nsupdate_bin: path to nsupdate binary
ws_port: port for plain WebSocket connections (default: 50005)
wss_port: port for secure WebSocket (WSS) connections (default: none). If set, hbd will attempt to serve WSS on this port when wss_pem and wss_key SSL files are available under cert_path (see below).
cert_path: directory where TLS certificate and key are looked up (default: /usr/local/etc/ssl/)
wss_pem: filename for the certificate chain (default: fullchain.pem)
wss_key: filename for the private key (default: privkey.pem)
users: mapping of username → user attributes (full_name, avatar, password, admin, notification_channels)
default_owner: username that owns hosts with no explicit owner (falls back to first admin user)

Example .hb.yaml (minimal):

hbd_host: 0.0.0.0
hbd_port: 50004
dyndomains:
  - example.com
nsupdate_bin: /usr/bin/nsupdate
pushsrv: pushover

Tip: config.DEFAULTS in hbd/config.py contains the canonical defaults and accepted configuration keys.
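
The "defaults unless PyYAML is available" behavior can be pictured with the sketch below. It is not the code in hbd/config.py: the `DEFAULTS` dict here is a small hypothetical subset built from the defaults listed above, and `load_config` is an illustrative name.

```python
# Hypothetical subset of defaults, taken from the key list in this README
DEFAULTS = {"hb_port": 50003, "hbd_port": 50004, "ws_port": 50005,
            "wss_pem": "fullchain.pem", "wss_key": "privkey.pem"}


def load_config(path=None) -> dict:
    """Overlay values from a YAML file on top of built-in defaults."""
    config = dict(DEFAULTS)
    if path is None:
        return config
    try:
        import yaml  # PyYAML is optional
    except ImportError:
        return config  # no PyYAML: fall back to the defaults
    with open(path, encoding="utf-8") as fh:
        loaded = yaml.safe_load(fh) or {}
    config.update(loaded)
    return config
```

Copying the defaults before updating keeps the shared dict unchanged, so repeated loads do not leak values from one call into the next.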

🔧 Architecture & Modules

hbd.proto — serialization/deserialization of heartbeat messages (supports compressed payloads and plugin data)
hbd.udp — UDP parsing and handle_datagram implementation (main state machine)
hbd.dns — create_nsupdate_payload, nsupdate, and an asyncio DNS worker (start_dns_worker). The DNS worker now runs as an asyncio task and the package exposes a small thread-safe bridge so legacy synchronous code can put() updates into the queue; there is no longer a permanently-blocking background threading.Thread.
hbd.notify — email and push notification helpers
hbd.ws — WebSocket server and thread-safe broadcast helpers
hbd.http — HTTP handler factory for the status UI/API
hbd.journal — message journal with size-based log rotation and backup management
hbd.plugin — plugin framework with base classes, registry, and dynamic loader
hbd.plugins/ — built-in plugins (os_info, cpu_monitor, memory_monitor, disk_monitor, network_monitor, filesystem_info, nagios_runner)
hbd.hbc — heartbeat client that sends heartbeats and plugin data to server
hbd.utils — small utility helpers (shortname, dur, initlog)
hbd.cli — CLI entrypoint and argument parsing
hbd.server — async orchestration to run UDP/HTTP/WSS components

This modular layout makes the code easier to test and maintain.
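
As an example of what a module like hbd.proto has to handle, the sketch below decodes a datagram that may be plain text or zlib-compressed. It assumes the payload is JSON with an `ID` field, as in the example journal entry; the function name and the validation rule are hypothetical.

```python
import json
import zlib


def decode_datagram(data: bytes) -> dict:
    """Decode a heartbeat that is either plain text or zlib-compressed."""
    try:
        data = zlib.decompress(data)
    except zlib.error:
        pass  # not compressed: treat the bytes as plain text
    message = json.loads(data.decode("utf-8"))
    if not isinstance(message, dict) or "ID" not in message:
        raise ValueError("not a heartbeat message")
    return message
```

Trying decompression first works because plain JSON text does not start with a valid zlib header, so uncompressed input fails fast and falls through.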

Runtime & Shutdown

The main runtime is asyncio-based. Services (UDP listener, HTTP server, WebSocket server, monitor, and DNS worker) run as asyncio tasks.
On SIGINT/SIGTERM the server triggers a graceful shutdown: it cancels active tasks, signals the DNS worker via a sentinel, and cleans up resources before exit.
The DNS update worker is implemented as an asyncio task; synchronous producers can still enqueue DNS updates via a small thread-safe bridge available at hbd.hbdclass.Host.dnsQ.
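
The worker, sentinel, and thread-safe bridge described above follow a common asyncio pattern, sketched here in isolation. All names (`STOP`, `dns_worker`, `ThreadSafeQueue`) are hypothetical and the worker only records items instead of calling nsupdate; the actual bridge lives at hbd.hbdclass.Host.dnsQ.

```python
import asyncio
import threading

STOP = object()  # sentinel that tells the worker to exit


async def dns_worker(queue: asyncio.Queue, handled: list) -> None:
    """Consume updates until the sentinel arrives."""
    while True:
        item = await queue.get()
        if item is STOP:
            break
        handled.append(item)  # a real worker would run nsupdate here


class ThreadSafeQueue:
    """Lets synchronous code in other threads put() into an asyncio queue."""

    def __init__(self, loop: asyncio.AbstractEventLoop, queue: asyncio.Queue):
        self._loop, self._queue = loop, queue

    def put(self, item) -> None:
        self._loop.call_soon_threadsafe(self._queue.put_nowait, item)


async def main() -> list:
    handled = []
    queue = asyncio.Queue()
    worker = asyncio.create_task(dns_worker(queue, handled))
    bridge = ThreadSafeQueue(asyncio.get_running_loop(), queue)

    # A legacy synchronous producer running in its own thread
    producer = threading.Thread(target=bridge.put, args=("host1.example.com",))
    producer.start()
    await asyncio.to_thread(producer.join)

    bridge.put(STOP)  # graceful shutdown: signal the worker, then wait for it
    await worker
    return handled


# Example: asyncio.run(main()) returns the list of handled updates
```

Because the sentinel travels through the same queue as the updates, everything enqueued before shutdown is processed before the worker exits.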

Templates & Static Files

Template files are located under hbd/templates by default. The HTTP server resolves templates relative to the hbd package but the path can be overridden with the templates_dir config key.
Static assets (CSS/JS/images) are served from hbd/static via the /static/<path> HTTP route. Place your static files in that directory or configure the HTTP server as needed.

🧪 Testing & Dev

Tests are implemented with unittest, and some additional tests rely on pytest. To run the tests locally without installing anything beyond the dev requirements:

# with project root on PYTHONPATH
PYTHONPATH=. python -m unittest discover -v
# or with pytest if installed
pytest -q

Developer tooling included:

pyproject.toml — project metadata and dependencies
tox.ini — convenience wrappers for running tests, lint, and mypy

To run linters and type checks locally:

# after installing dev deps
tox -e lint
tox -e mypy

🚀 Running in production

Use your system service manager (systemd, launchd, etc.) to run hbd in the background.
Ensure nsupdate and necessary credentials are available for dynamic DNS updates.
Configure TLS for WSS if you enable secure websockets.

Note: A small example for obtaining DNS-verified certs (certbot with RFC2136) is available in the earlier commit history.

🤝 Contributing

Contributions welcome! Please:

Open an issue to discuss larger changes.
Create a topic branch and a clear PR.
Add tests for new features and run linters.
Keep changes focused and documented.

📜 License

This repository is licensed under the MIT license. See LICENSE for details.


Requires Python: >=3.11

Details

PyPI

2026-04-08 16:47:19 -04:00

heartbeat contributors

305 KiB

Assets (2)

hbd-5.0.12-py3-none-any.whl 155 KiB

hbd-5.0.12.tar.gz 150 KiB

Versions (5)

5.1.1: 2026-04-12
5.1.0: 2026-04-11
5.0.12: 2026-04-08
5.0.11: 2026-04-07
5.0.9: 2026-04-07
