
Heartbeat Daemon (hbd)

A lightweight UDP-based host monitoring system. Monitored hosts run a client (hbc) that sends periodic heartbeat packets and system metrics to a central server (hbd). The server tracks host reachability, evaluates metric thresholds, sends notifications, and serves a web dashboard.


Architecture

  [ host running hbc ]                [ server running hbd ]
  ┌────────────────────┐              ┌────────────────────────────┐
  │  heartbeat client  │  UDP 50003   │  heartbeat daemon          │
  │                    │ ──────────>  │                            │
  │  plugins:          │  HTB / PLG   │  host state tracking       │
  │  - cpu_monitor     │              │  threshold evaluation      │
  │  - memory_monitor  │  <────────── │  DNS updates (nsupdate)    │
  │  - disk_monitor    │  ACK/CMD/UPD │  notifications             │
  │  - nagios_runner   │              │  web dashboard + REST API  │
  │  - ...             │              │  WebSocket live updates    │
  └────────────────────┘              └────────────────────────────┘

Package: hbd v5.3.10
Python: 3.11+

Subpackages

Package       Purpose
hbd.common    Protocol encoding/decoding, shared utilities
hbd.server    The hbd daemon
hbd.client    The hbc client

Installation

Dependencies are declared in pyproject.toml. Install into a virtualenv:

# Server + client
pip install .

# Using the install script
scripts/hb_install.sh

Entry points:

  • hbd — server (hbd.server.cli:main)
  • hbc — client (hbd.client.main:main)

Runtime dependencies:

Component   Packages
Both        PyYAML ≥6.0
Client      psutil ≥5.9.0
Server      aiohttp ≥3.11, websockets ≥13.2, Jinja2 ≥3.1.6, ruamel.yaml ≥0.18,
            mattermostdriver ≥7.3.0, matrix-nio ≥0.24

Server (hbd)

Starting the server

# Foreground, verbose, with config file
hbd serve -c /etc/hb.yaml -f -v

# As a module
python -m hbd.server.cli serve -c /etc/hb.yaml

CLI subcommands

Command                 Description
hbd serve               Start the daemon (default)
hbd passwd <username>   Generate a password hash for config
hbd notify              Test notification channels
hbd stop                Stop a running daemon
hbd reload              Reload config (send SIGHUP)
hbd restart             Restart daemon

Configuration (~/.hb.yaml)

# Network
hb_port: 50003          # UDP port for heartbeat messages
hbd_port: 50004         # HTTP API / web UI port
hbd_host: ""            # Bind address (empty = all interfaces)
ws_port: 50005          # WebSocket port (plain)
wss_port: ~             # WebSocket port (TLS; requires cert_path/wss_pem/wss_key)

# Timing
interval: 20            # Expected heartbeat interval (seconds)
grace: 2                # Extra seconds before declaring a host overdue

# Persistence
pickfile: ~/.hb.pick    # Host state persistence
pidfile: ~/.hb.pid
logfile: ~/.hb.log

# Message journal
journal_enabled: true
journal_dir: /var/log/heartbeat
journal_file: messages.journal
journal_max_size: 104857600   # 100 MB
journal_max_backups: 10

# DNS
nsupdate_bin: /usr/bin/nsupdate
dyndomains:
  - example.com

# Threshold alert re-notification interval (seconds)
threshold_renotify_interval: 3600

# Notification channels
notification_channels:
  pushover_ops:
    type: pushover
    token: YOUR_APP_TOKEN
    user: YOUR_USER_KEY
  email_ops:
    type: email
    smtp_server: smtp.example.com
    port: 587
    sender: alerts@example.com
    user: alerts@example.com
    password: secret
    recipients: [ops@example.com]

# Users
users:
  alice:
    full_name: Alice Smith
    password: pbkdf2:sha256:...    # generate with: hbd passwd alice
    admin: true
    notification_channels: [pushover_ops]
  bob:
    password: pbkdf2:sha256:...
    notification_channels: [email_ops]

default_owner: alice

# Hosts
hosts:
  webserver01:
    dyndns: true          # Update DNS when address changes
    owner: alice
    managers: [bob]
    monitors: []
  database01:
    watch: false          # Suppress all notifications for this host

Send SIGHUP (or hbd reload) to reload configuration without restarting. Changes to ports, certificates, pickle path, and journal path require a full restart.

Persistence

Host state (reachability, plugin data, alert states) is saved to pickfile every 5 minutes and on clean shutdown. The server loads this state on startup.


Client (hbc)

Usage

# Basic — send heartbeats to a server
hbc your-server.example.com

# Multiple servers
hbc server1.example.com server2.example.com

# With config file, running as a daemon
hbc -d -c /etc/hbc.yaml your-server.example.com

# Send a boot message, then heartbeat normally
hbc -b your-server.example.com

# One-off message
hbc -m "maintenance starting" your-server.example.com

# Force IPv4 or IPv6 only
hbc -4 your-server.example.com
hbc -6 your-server.example.com

Options

Flag                 Description
-b, --boot           Send a boot message at startup
-c, --config FILE    Config file path (default: ~/.hbc.yaml)
-d, --daemon         Daemonize (logs go to syslog)
-m, --message TEXT   Send a one-off message and exit
-n, --name NAME      Override reported hostname
-v, --verbose        Verbose output
-x, --debug          Debug level (repeatable)
-4 / -6              Restrict to IPv4 or IPv6

Configuration (~/.hbc.yaml)

hb_port: 50003         # Server UDP port
interval: 10           # Heartbeat interval (seconds)
owner: alice           # Optional: claim ownership of this host

plugins:
  cpu_monitor:
    interval: 300      # Override collection interval
    per_core: true     # Report per-core CPU usage
  memory_monitor:
    interval: 300
  disk_monitor:
    interval: 300
  network_monitor:
    interval: 300
  ping_monitor:
    interval: 60
    hosts: [8.8.8.8, 192.168.1.1]
  nagios_runner:
    interval: 300
    commands:
      - name: check_load
        command: /usr/lib/nagios/plugins/check_load -w 5,4,3 -c 10,8,6
      - name: check_disk_root
        command: /usr/lib/nagios/plugins/check_disk -w 20% -c 10% -p /
  zfs_monitor:
    interval: 300

Connection behaviour

  • The client sends heartbeats over UDP to each server address resolved from the hostname (IPv4 and IPv6).
  • If a connection cannot be opened at startup, IPv6 connections are dropped after 3 consecutive failures, while IPv4 connections are retried indefinitely.
  • In daemon mode (-d), all log output goes to syslog (LOG_DAEMON facility).
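A minimal Python sketch of the connection policy described above, assuming hbc resolves each server with getaddrinfo() and keeps a per-address failure counter that resets on success. The Target class and the resolve_targets, record_startup_failure, and record_success helpers are hypothetical names, not hbc's actual internals.

```python
import socket
from dataclasses import dataclass

IPV6_MAX_STARTUP_FAILURES = 3  # IPv6 targets give up after this many failures


@dataclass
class Target:
    family: int        # socket.AF_INET or socket.AF_INET6
    sockaddr: tuple    # address tuple as returned by getaddrinfo()
    failures: int = 0  # consecutive failed attempts
    dropped: bool = False


def resolve_targets(host: str, port: int) -> list[Target]:
    """Resolve every IPv4/IPv6 address for host, one Target per address."""
    seen = set()
    targets = []
    for family, _type, _proto, _canon, sockaddr in socket.getaddrinfo(
        host, port, type=socket.SOCK_DGRAM
    ):
        if family in (socket.AF_INET, socket.AF_INET6) and sockaddr not in seen:
            seen.add(sockaddr)
            targets.append(Target(family, sockaddr))
    return targets


def record_startup_failure(t: Target) -> None:
    """Count a failed open; IPv6 is dropped after 3 in a row, IPv4 never."""
    t.failures += 1
    if t.family == socket.AF_INET6 and t.failures >= IPV6_MAX_STARTUP_FAILURES:
        t.dropped = True


def record_success(t: Target) -> None:
    """A successful open breaks the run of consecutive failures."""
    t.failures = 0
```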

UDP Protocol

All messages are zlib-compressed key=value pairs with an ID prefix.

!<ID>: <zlib-compressed payload>

Payload format: key=value;key=value;...

Message   Direction         Purpose
HTB       client → server   Heartbeat (name, timestamp, RTT, acks, interval)
PLG       client → server   Plugin data (plugin name + metrics)
ACK       server → client   Acknowledgment
CMD       server → client   Execute a shell command on the client
UPD       server → client   Trigger self-update via hb_install.sh

Value encoding:

  • Floats: 5 decimal places
  • Lists/dicts: JSON prefixed with @
  • Booleans: 1 / 0
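To make the wire format concrete, here is a hedged round-trip sketch. It assumes the header is the literal bytes `!<ID>: ` followed by a zlib stream of the UTF-8 payload, and that `;` and `=` never occur inside values (the sketch does no escaping). encode_message and decode_message are illustrative names; decoded scalars come back as strings because the encoding carries no type information.

```python
import json
import zlib


def encode_value(v) -> str:
    # bool must be checked before numbers: bool is a subclass of int
    if isinstance(v, bool):
        return "1" if v else "0"
    if isinstance(v, float):
        return f"{v:.5f}"  # floats are sent with 5 decimal places
    if isinstance(v, (list, dict)):
        return "@" + json.dumps(v, separators=(",", ":"))  # JSON prefixed with @
    return str(v)


def encode_message(msg_id: str, fields: dict) -> bytes:
    payload = ";".join(f"{k}={encode_value(v)}" for k, v in fields.items())
    return f"!{msg_id}: ".encode("ascii") + zlib.compress(payload.encode("utf-8"))


def decode_message(packet: bytes) -> dict:
    header, sep, body = packet.partition(b": ")
    if not header.startswith(b"!") or not sep:
        raise ValueError("not a heartbeat packet")
    msg = {"ID": header[1:].decode("ascii")}
    text = zlib.decompress(body).decode("utf-8")
    for pair in filter(None, text.split(";")):
        key, _, raw = pair.partition("=")
        # @-prefixed values are JSON; everything else stays a string
        msg[key] = json.loads(raw[1:]) if raw.startswith("@") else raw
    return msg
```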

RTT is measured using kernel SO_TIMESTAMP when available (Linux, macOS, FreeBSD), falling back to application-layer timing.


Plugin System

Plugins run on the client and collect system metrics that are sent to the server as PLG messages.

Plugin types

Type            interval       When collected
InfoPlugin      0              Once at startup; re-collected on server request
MonitorPlugin   30 (default)   Periodically on the configured interval

Built-in plugins

Plugin            Type      Data collected
os_info           Info      OS, kernel, distro, architecture, Python version, hbc version
cpu_monitor       Monitor   cpu_percent, per-core usage, load averages, process count, frequency
memory_monitor    Monitor   RAM and swap usage (ZFS ARC-aware)
disk_monitor      Monitor   Per-partition usage, disk I/O stats
network_monitor   Monitor   Per-interface byte/packet counts, connection count
ping_monitor      Monitor   RTT, packet loss, jitter per configured host
filesystem_info   Info      Mounted filesystems (excludes pseudo filesystems)
nagios_runner     Monitor   Output of configured Nagios-compatible check commands
zfs_monitor       Monitor   ZFS pool health, capacity, fragmentation, dedup ratio, I/O

Custom plugins

Create a .py file in hbd/client/plugins/:

from hbd.client.plugin import MonitorPlugin

class MyPlugin(MonitorPlugin):
    name = "my_plugin"
    interval = 60

    async def collect(self):
        return {"my_metric": 42}

initialize() is called once at load time; return False to disable the plugin (e.g., if a required binary is missing).

Nagios integration

The nagios_runner plugin executes any Nagios-compatible check binary:

plugins:
  nagios_runner:
    commands:
      - name: check_http
        command: /usr/lib/nagios/plugins/check_http -H example.com

  • Commands are validated at startup: each must be an absolute path to an executable file.
  • Exit codes map to OK / WARNING / CRITICAL / UNKNOWN.
  • Performance data fields are extracted and stored individually.
  • The nagios threshold operator maps exit codes directly to alert levels (see Threshold Alerting).
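The following sketch shows one way the validation and exit-code mapping could work. It assumes the standard Nagios plugin output convention (`TEXT | label=value[UOM];warn;crit;min;max ...`) and only parses the first output line; quoted labels containing spaces are not handled. validate_command, parse_check, and run_check are hypothetical helpers, not nagios_runner's real functions.

```python
import os
import re
import shlex
import subprocess

STATUS = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}
_NUM = re.compile(r"^[-+]?\d+(?:\.\d+)?")  # leading number, UOM suffix ignored


def validate_command(command: str) -> bool:
    """True if the first word is an absolute path to an executable file."""
    argv = shlex.split(command)
    return (bool(argv) and os.path.isabs(argv[0])
            and os.path.isfile(argv[0]) and os.access(argv[0], os.X_OK))


def parse_check(output: str, exit_code: int) -> dict:
    """Split the first line into status text and individual perfdata values."""
    first_line = output.strip().splitlines()[0] if output.strip() else ""
    text, _, perf = first_line.partition("|")
    result = {
        "status_code": exit_code,
        # Nagios convention: any code outside 0..3 is treated as UNKNOWN
        "status": STATUS.get(exit_code, "UNKNOWN"),
        "output": text.strip(),
        "perf": {},
    }
    for token in perf.split():
        label, _, rest = token.partition("=")
        match = _NUM.match(rest.split(";", 1)[0])
        if label and match:
            result["perf"][label.strip("'")] = float(match.group())
    return result


def run_check(command: str, timeout: float = 30.0) -> dict:
    proc = subprocess.run(shlex.split(command), capture_output=True,
                          text=True, timeout=timeout)
    return parse_check(proc.stdout, proc.returncode)
```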

Threshold Alerting

The server evaluates plugin metrics against configurable thresholds and fires notifications on state changes.

Configuration

thresholds:
  cpu_monitor:
    cpu_percent:
      warning: 80.0
      critical: 90.0
      operator: ">"         # >, >=, <, <=, ==, != (default: >)
      hysteresis: 0.1       # 10%: recover at 81 when critical=90
      count: 1              # Require N consecutive breaches before alerting
      display: "CPU {cpu_percent}% (threshold: {op_symbol}{threshold_value})"

  memory_monitor:
    percent:
      warning: 85.0
      critical: 95.0

  disk_monitor:
    partitions:
      /:
        percent:
          warning: 80.0
          critical: 90.0
        free_gb:
          warning: 10.0
          critical: 5.0
          operator: "<"

  nagios_runner:
    status_code:
      operator: "nagios"    # 0=OK 1=WARNING 2=CRITICAL 3=UNKNOWN
      display: "{check_name}: {output}"
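A sketch of how the warning/critical levels, `operator`, `hysteresis`, and `count` settings might interact. The semantics are assumptions inferred from the config comments above: hysteresis is a fraction of the limit that must be cleared before a level is released, `count` applies only to escalations, and `==`/`!=` get no hysteresis band. Threshold and MetricState are illustrative names.

```python
from __future__ import annotations

import operator
from dataclasses import dataclass

OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt,
       "<=": operator.le, "==": operator.eq, "!=": operator.ne}
SEVERITY = {"OK": 0, "WARNING": 1, "CRITICAL": 2}


@dataclass
class Threshold:
    warning: float | None = None
    critical: float | None = None
    op: str = ">"            # YAML key: operator
    hysteresis: float = 0.0  # fraction of the limit, e.g. 0.1 = 10%
    count: int = 1           # consecutive breaches needed to escalate


class MetricState:
    def __init__(self, t: Threshold):
        self.t = t
        self.level = "OK"
        self.streak = 0  # consecutive samples worse than the current level

    def _classify(self, value: float) -> str:
        cmp = OPS[self.t.op]
        if self.t.critical is not None and cmp(value, self.t.critical):
            return "CRITICAL"
        if self.t.warning is not None and cmp(value, self.t.warning):
            return "WARNING"
        return "OK"

    def _recovery_point(self, limit: float) -> float:
        # Upper-bound operators recover below the limit, lower-bound ones above it
        if self.t.op in (">", ">="):
            return limit * (1 - self.t.hysteresis)
        if self.t.op in ("<", "<="):
            return limit * (1 + self.t.hysteresis)
        return limit

    def update(self, value: float) -> str:
        raw = self._classify(value)
        if SEVERITY[raw] > SEVERITY[self.level]:
            self.streak += 1
            if self.streak >= self.t.count:
                self.level, self.streak = raw, 0
        else:
            self.streak = 0
            if SEVERITY[raw] < SEVERITY[self.level]:
                limit = self.t.critical if self.level == "CRITICAL" else self.t.warning
                # Hold the current level until the value clears the hysteresis band
                if not OPS[self.t.op](value, self._recovery_point(limit)):
                    self.level = raw
        return self.level
```

With critical=90 and hysteresis=0.1, a CRITICAL metric stays CRITICAL at 85 and is released only once it no longer exceeds 81.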

Per-host threshold profiles

Named profiles let different hosts use different thresholds. threshold_config accepts a single profile name or a list; a list is applied left to right, so later profiles override earlier ones.

threshold_configs:
  default:
    thresholds:
      cpu_monitor:
        cpu_percent: {warning: 80, critical: 90}

  tight_cpu:
    thresholds:
      cpu_monitor:
        cpu_percent: {warning: 60, critical: 75}

hosts:
  web-01:
    threshold_config: default
  db-01:
    threshold_config: [default, tight_cpu]
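Applying profiles left to right can be modeled as a recursive dictionary merge in which later profiles win. This is a sketch under that assumption; what hbd does for a host with no threshold_config (global thresholds or the default profile) is not stated here, so the helper simply returns an empty dict in that case. deep_merge and thresholds_for_host are hypothetical names.

```python
import copy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict: nested dicts are merged, other values from override win."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def thresholds_for_host(host_cfg: dict, threshold_configs: dict) -> dict:
    names = host_cfg.get("threshold_config", [])
    if isinstance(names, str):  # a single profile name is also accepted
        names = [names]
    result: dict = {}
    for name in names:  # left to right: later profiles override earlier ones
        result = deep_merge(result, threshold_configs[name].get("thresholds", {}))
    return result
```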

Alert states

State      Meaning
OK         Metric within normal range
WARNING    Metric crossed warning threshold
CRITICAL   Metric crossed critical threshold
UNKNOWN    Cannot determine (e.g. Nagios exit code 3)

Notifications are sent on state transitions (OK → WARNING, WARNING → CRITICAL, CRITICAL → OK). De-escalations (CRITICAL → WARNING) do not trigger a notification. Ongoing alerts generate a re-notification every threshold_renotify_interval seconds (default: 3600). Alerts can be acknowledged via the web UI or API to suppress re-notifications.
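The transition rules above reduce to a small predicate. This sketch assumes that any recovery to OK notifies, that an escalation notifies even if the previous alert was acknowledged, and it does not model UNKNOWN. should_notify is a hypothetical helper; the real server may track more state.

```python
from __future__ import annotations

SEVERITY = {"OK": 0, "WARNING": 1, "CRITICAL": 2}


def should_notify(prev: str, new: str, *, last_sent: float | None, now: float,
                  acknowledged: bool, renotify_interval: float = 3600.0) -> bool:
    if new != prev:
        if new == "OK":
            return True  # recovery
        # escalation notifies, de-escalation (e.g. CRITICAL -> WARNING) does not
        return SEVERITY[new] > SEVERITY[prev]
    # Same state: only an ongoing, unacknowledged alert is re-notified
    if new == "OK" or acknowledged:
        return False
    return last_sent is None or now - last_sent >= renotify_interval
```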

RTT thresholds

The server measures heartbeat round-trip time and supports RTT thresholds in the same format, keyed by host name:

thresholds:
  rtt:
    webserver01:
      warning: 100.0    # ms
      critical: 500.0

Generic threshold matching

When a metric has no exact threshold entry, the server strips leading underscore-separated segments from the field name, one at a time, and retries. This allows one entry to cover all Nagios checks:

nagios_runner.check_disk_root_status_code → no match
nagios_runner.disk_root_status_code       → no match
nagios_runner.root_status_code            → no match
nagios_runner.status_code                 → matched ✓

The stripped prefix (check_disk_root) is available as {check_name} in the display template.
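The suffix search can be sketched in a few lines, assuming field names split on underscores and thresholds held as a plugin-keyed dict like the YAML above. match_threshold is a hypothetical name; it returns the matched key, its settings, and the stripped prefix that would become {check_name}.

```python
def match_threshold(plugin: str, field: str, thresholds: dict):
    """Try field, then drop one leading '_' segment at a time until a key matches."""
    table = thresholds.get(plugin, {})
    parts = field.split("_")
    for i in range(len(parts)):
        candidate = "_".join(parts[i:])
        if candidate in table:
            check_name = "_".join(parts[:i]) or None  # exposed as {check_name}
            return candidate, table[candidate], check_name
    return None
```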

Display template variables

Variable             Description
{value}              Current metric value
{threshold_value}    Threshold that was crossed
{op_symbol}          Comparison operator
{check_name}         Prefix stripped by generic matching
{metric_name}        Full field name
{output}             Nagios check output text
{status}             Nagios status name (OK/WARNING/CRITICAL/UNKNOWN)
any plugin field     Any field present in the plugin's data (e.g. {cpu_percent})
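Display strings look like Python str.format templates. A minimal renderer, assuming hbd fills them from a flat dict of the variables listed above, could leave unknown placeholders untouched rather than raising. render_display is a hypothetical helper.

```python
class _KeepMissing(dict):
    """Leave unknown placeholders such as {foo} in place instead of raising KeyError.

    Note: a missing key combined with a format spec (e.g. {foo:.1f}) still fails.
    """

    def __missing__(self, key):
        return "{" + key + "}"


def render_display(template: str, fields: dict) -> str:
    return template.format_map(_KeepMissing(fields))
```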

Notification Channels

Notifications are dispatched to the host's owner, managers, and monitors. Each user specifies which channels to use.

Supported channel types

Type          Required fields
pushover      token, user
email         smtp_server, recipients, sender, user, password, port
mattermost    webhook_url, channel
matrix        homeserver, user, password, room_id
signal        phone_number, recipient
sms_voipms    api_key, recipient

Each channel can set a min_level (WARNING or CRITICAL) to filter low-severity alerts.

Recovery notifications are only sent to channels that received the original alert.
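Recipient selection can be sketched as follows, assuming min_level defaults to WARNING when unset and that default_owner stands in for a missing owner (as described under User Management). alert_channels is a hypothetical name. Storing the returned set with the alert gives the recovery behaviour described above: the recovery goes only to that stored set.

```python
LEVELS = {"WARNING": 1, "CRITICAL": 2}


def alert_channels(host_cfg: dict, users: dict, channels: dict, level: str,
                   default_owner=None) -> set[str]:
    """Channels of the owner, managers, and monitors that accept this level."""
    people = [host_cfg.get("owner", default_owner),
              *host_cfg.get("managers", []), *host_cfg.get("monitors", [])]
    selected = set()
    for name in filter(None, people):
        for ch in users.get(name, {}).get("notification_channels", []):
            min_level = channels.get(ch, {}).get("min_level", "WARNING")
            if LEVELS[level] >= LEVELS[min_level]:
                selected.add(ch)
    return selected
```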


Web Dashboard & HTTP API

The server exposes a web UI and REST API on hbd_port (default 50004).

Web pages

Path              Description
/login            Login form (shown automatically when auth is configured)
/live             Real-time host connectivity, RTT, and message stream
/plugins/<host>   Per-host plugin metrics
/alerts           Active alerts with severity filtering
/settings         Server config, users, notification channels, thresholds

Live views use WebSocket connections for real-time updates.

Non-admin users see only hosts where they have a role (monitor, manager, or owner). Admins see all hosts.

REST API

All endpoints are under /api/0/. When authentication is configured, include a session token:

# Log in, get a token
TOKEN=$(curl -s -X POST http://localhost:50004/api/0/auth/login \
  -H 'Content-Type: application/json' \
  -d '{"username":"alice","password":"secret"}' | jq -r .token)

# Use the token
curl -H "Authorization: Bearer $TOKEN" http://localhost:50004/api/0/hosts

Method  Endpoint                                      Description
GET     /api/0/hosts                                  All visible hosts
GET     /api/0/alerts                                 All active alerts
GET     /api/0/alert_summary                          Count of ok/warning/critical
GET     /api/0/messages                               Last 30 messages
GET     /api/0/hosts/{host}/plugins                   All plugin data for host
GET     /api/0/hosts/{host}/plugins/{plugin}?limit=N  Plugin samples
GET     /api/0/hosts/{host}/alerts                    Alert states for host
GET     /api/0/hosts/{host}/access                    Access roles
PUT     /api/0/hosts/{host}/access                    Update access roles
GET     /api/0/hosts/{host}/info                      Host info (hbc version, thresholds)
POST    /api/0/alerts/acknowledge                     Acknowledge alert
GET     /api/0/users                                  All users (admin only)
GET     /api/0/users/me                               Current user profile
PUT     /api/0/users/me                               Update own profile
POST    /api/0/auth/login                             Create session
POST    /api/0/auth/logout                            Destroy session
GET     /api/0/config                                 Server config (secrets redacted)
POST    /api/0/config                                 Update config
GET     /api/0/config/backups                         List config backups
POST    /api/0/config/rollback                        Roll back to previous config
GET     /api/0/notification_channels                  List channels
POST    /api/0/notification_channels                  Create channel
PUT     /api/0/notification_channels/{name}           Update channel
DELETE  /api/0/notification_channels/{name}           Delete channel
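For scripting without curl, the same login-then-query flow can be done with the Python standard library. This sketch assumes the JSON shapes shown in the curl example (a token field in the login response); the helper names are hypothetical, and the usage lines at the bottom are commented out because they need a running server.

```python
import json
import urllib.request


def login_request(base_url: str, username: str, password: str) -> urllib.request.Request:
    body = json.dumps({"username": username, "password": password}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/0/auth/login", data=body, method="POST",
        headers={"Content-Type": "application/json"},
    )


def api_request(base_url: str, path: str, token: str) -> urllib.request.Request:
    # GET request carrying the session token as a Bearer credential
    return urllib.request.Request(
        f"{base_url}/api/0/{path.lstrip('/')}",
        headers={"Authorization": f"Bearer {token}"},
    )


def fetch_json(req: urllib.request.Request, timeout: float = 10.0):
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


# Usage against a running server (not executed here):
# token = fetch_json(login_request("http://localhost:50004", "alice", "secret"))["token"]
# hosts = fetch_json(api_request("http://localhost:50004", "hosts", token))
```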

User Management & Authentication

When the config has no users: block, the server runs unauthenticated and all existing behaviour is preserved.

Roles

Role      Capabilities
monitor   View status, plugin data, alerts
manager   monitor + queue commands, trigger DNS, queue upgrades
owner     manager + drop host, transfer ownership, update access
admin     Owner-level on all hosts + access to server config and users
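A sketch of the role checks, assuming roles are read from each host's owner, managers, and monitors keys, that default_owner applies to hosts with no owner, and that admins are treated as owners everywhere (their extra server-config and user-management rights are outside this sketch). role_on_host, can, and visible_hosts are illustrative names.

```python
from __future__ import annotations

RANK = {"monitor": 1, "manager": 2, "owner": 3}


def role_on_host(username: str, host_cfg: dict, users: dict,
                 default_owner: str | None = None) -> str | None:
    if users.get(username, {}).get("admin"):
        return "owner"  # admins act with owner-level rights on every host
    if host_cfg.get("owner", default_owner) == username:
        return "owner"
    if username in host_cfg.get("managers", []):
        return "manager"
    if username in host_cfg.get("monitors", []):
        return "monitor"
    return None


def can(username: str, host_cfg: dict, users: dict, required: str,
        default_owner: str | None = None) -> bool:
    role = role_on_host(username, host_cfg, users, default_owner)
    return role is not None and RANK[role] >= RANK[required]


def visible_hosts(username: str, hosts: dict, users: dict,
                  default_owner: str | None = None) -> list[str]:
    # Non-admins only see hosts where they hold some role
    return sorted(h for h, cfg in hosts.items()
                  if role_on_host(username, cfg or {}, users, default_owner))
```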

Setup

users:
  alice:
    full_name: Alice Smith
    password: pbkdf2:sha256:...    # hbd passwd alice
    admin: true
    notification_channels: [pushover_ops]

default_owner: alice    # Owns any host with no explicit owner

hosts:
  webserver01:
    owner: alice
    managers: [bob]
    monitors: [carol]

Password hashing uses PBKDF2-HMAC-SHA256 (260,000 iterations). Sessions expire after 24 hours.
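A standard-library sketch of PBKDF2-HMAC-SHA256 at 260,000 iterations. The stored string after `pbkdf2:sha256:` is elided in the config above; this sketch assumes a Werkzeug-style `pbkdf2:sha256:<iterations>$<salt>$<hex digest>` layout, so its output is not guaranteed to be accepted by hbd. Use `hbd passwd` for real config entries.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 260_000


def hash_password(password: str, *, iterations: int = ITERATIONS) -> str:
    salt = secrets.token_hex(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"),
                                 salt.encode("ascii"), iterations)
    # Assumed layout: method string, then salt and hex digest separated by '$'
    return f"pbkdf2:sha256:{iterations}${salt}${digest.hex()}"


def verify_password(password: str, stored: str) -> bool:
    method, salt, expected = stored.split("$", 2)
    scheme, algo, iters = method.split(":")
    if scheme != "pbkdf2":
        return False
    digest = hashlib.pbkdf2_hmac(algo, password.encode("utf-8"),
                                 salt.encode("ascii"), int(iters))
    return hmac.compare_digest(digest.hex(), expected)  # constant-time compare
```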

OAuth2 login (Gitea) is supported:

oauth:
  gitea:
    url: https://git.example.com
    client_id: xxx
    client_secret: yyy

Dynamic DNS

When dyndns: true is set on a host and dyndomains is configured, the server updates DNS via nsupdate whenever the host's source address changes.

nsupdate_bin: /usr/bin/nsupdate
dyndomains:
  - example.com

hosts:
  webserver01:
    dyndns: true

DNS updates run asynchronously in a background worker.
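The nsupdate step can be sketched as generating a command script and piping it to nsupdate's standard input. This assumes the record name is `<host>.<domain>`, a 60-second TTL, and no TSIG key (a real zone will usually need nsupdate's `-k` option); nsupdate_script and run_nsupdate are hypothetical helpers.

```python
import ipaddress
import subprocess


def nsupdate_script(host: str, domain: str, address: str, ttl: int = 60) -> str:
    fqdn = f"{host}.{domain}."
    rtype = "AAAA" if ipaddress.ip_address(address).version == 6 else "A"
    # Replace any existing record of this type with the new address
    return (
        f"zone {domain}.\n"
        f"update delete {fqdn} {rtype}\n"
        f"update add {fqdn} {ttl} {rtype} {address}\n"
        "send\n"
    )


def run_nsupdate(script: str, nsupdate_bin: str = "/usr/bin/nsupdate") -> None:
    # nsupdate reads commands from stdin when no file argument is given
    subprocess.run([nsupdate_bin], input=script, text=True, check=True)
```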


Message Journal

All received messages are logged in JSONL format with automatic size-based rotation.

journal_enabled: true
journal_dir: /var/log/heartbeat
journal_file: messages.journal
journal_max_size: 104857600    # 100 MB
journal_max_backups: 10

Example entry:

{"timestamp":1711234567.123,"datetime":"2026-03-28T12:34:56","source_ip":"192.168.1.100","source_port":50003,"message":{"ID":"HTB","name":"webserver01","interval":10}}
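Size-based rotation of a JSONL file maps directly onto the standard library's RotatingFileHandler. This sketch assumes one JSON object per line in the shape of the example entry, with `datetime` rendered in local time; make_journal and journal_message are hypothetical names, and hbd's own journal writer may be implemented differently.

```python
import json
import logging
from datetime import datetime
from logging.handlers import RotatingFileHandler
from pathlib import Path


def make_journal(journal_dir: str, journal_file: str = "messages.journal",
                 max_size: int = 104_857_600, max_backups: int = 10) -> logging.Logger:
    Path(journal_dir).mkdir(parents=True, exist_ok=True)
    handler = RotatingFileHandler(Path(journal_dir) / journal_file, maxBytes=max_size,
                                  backupCount=max_backups, encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(message)s"))  # raw JSON, one per line
    logger = logging.getLogger(f"hb.journal.{journal_dir}")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.handlers[:] = [handler]
    return logger


def journal_message(logger: logging.Logger, ts: float, source_ip: str,
                    source_port: int, message: dict) -> None:
    entry = {
        "timestamp": round(ts, 3),
        "datetime": datetime.fromtimestamp(ts).strftime("%Y-%m-%dT%H:%M:%S"),
        "source_ip": source_ip,
        "source_port": source_port,
        "message": message,
    }
    logger.info(json.dumps(entry, separators=(",", ":")))
```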

hbc_mini — Zero-dependency client

scripts/hbc_mini.py is a single-file client requiring only Python 3.8+ and no external packages. Copy it to any host and run directly.

python3 hbc_mini.py your-server.example.com
python3 hbc_mini.py -d your-server.example.com     # daemon mode
python3 hbc_mini.py -b your-server.example.com     # send boot message

Config: ~/.hbc.json (JSON format, same keys as ~/.hbc.yaml).

Available plugins:

Plugin            Platform
os_info           All
ping_monitor      All
nagios_runner     All (not Windows)
cpu_monitor       Linux (/proc/stat; no per-core, no frequency)
memory_monitor    Linux (/proc/meminfo)
disk_monitor      Linux, macOS, BSD (df -P)
network_monitor   Linux (/proc/net/dev)

Compared with the full hbc, hbc_mini.py lacks YAML config, the filesystem_info and zfs_monitor plugins, and IPv6 early-fail protection.


hbc_mini.c — C client

scripts/c/hbc_mini.c is a single-file C port of hbc_mini.py. It has no runtime dependencies beyond libc, zlib, pthreads, and libm, and runs on Linux, FreeBSD, NetBSD, and DragonFly BSD.

Build

cc -O2 -o hbc_mini scripts/c/hbc_mini.c -lz -lpthread -lm

Usage

The CLI is identical to hbc_mini.py:

./hbc_mini your-server.example.com
./hbc_mini -d your-server.example.com      # daemon mode (logs to syslog)
./hbc_mini -b your-server.example.com      # send boot message
./hbc_mini -m "note" your-server.example.com   # send one-shot message
./hbc_mini -4 your-server.example.com      # IPv4 only
./hbc_mini -6 your-server.example.com      # IPv6 only

Config: ~/.hbc.json (JSON, same keys as the Python version).

Architecture

The C client uses two threads:

  • Main thread — heartbeat sender loop + select()-based receive loop (1 s timeout). Sends HTB at the configured interval, receives ACK/CMD messages, and re-sends os_info on server request.
  • Monitor thread — all periodic plugins in a single thread with a 1-second sleep loop. Each plugin has its own next-run timestamp tracked independently.
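The monitor thread's scheduling pattern can be illustrated in Python (the C client itself implements it with pthreads). This sketch assumes a plugin's next run is scheduled relative to when it last ran, so small delays accumulate rather than being caught up. Scheduled, run_due, and monitor_loop are hypothetical names.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class Scheduled:
    name: str
    interval: float
    collect: Callable[[], dict]
    next_run: float = 0.0  # 0 means "run on the first tick"


def run_due(plugins: list[Scheduled], now: float) -> dict[str, dict]:
    """Run every plugin whose own next-run timestamp has passed."""
    results = {}
    for p in plugins:
        if now >= p.next_run:
            results[p.name] = p.collect()
            p.next_run = now + p.interval  # each plugin tracks its own schedule
    return results


def monitor_loop(plugins: list[Scheduled], send: Callable[[str, dict], None],
                 stop: Callable[[], bool]) -> None:
    while not stop():
        for name, data in run_due(plugins, time.monotonic()).items():
            send(name, data)
        time.sleep(1.0)  # 1-second tick
```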

SIGHUP makes the process restart itself via execv(). SIGTERM and SIGINT trigger a clean shutdown; if -b was used, a shutdown heartbeat is sent first.

Available plugins

Plugin            Platform                            Data source
os_info           Linux, FreeBSD, NetBSD, DragonFly   uname(2), /etc/os-release, kern.osrelease sysctl
cpu_monitor       Linux                               /proc/stat
cpu_monitor       FreeBSD, DragonFly, NetBSD          kern.cp_time sysctl
memory_monitor    Linux                               /proc/meminfo (ZFS ARC-aware)
memory_monitor    FreeBSD, DragonFly                  vm.stats.vm.* sysctl
memory_monitor    NetBSD                              VM_UVMEXP sysctl
disk_monitor      All                                 df -P subprocess
network_monitor   Linux                               /proc/net/dev
network_monitor   FreeBSD, NetBSD, DragonFly          getifaddrs() + AF_LINK
ping_monitor      All                                 ping subprocess
nagios_runner     All                                 popen() subprocess

cpu_monitor reports: cpu_percent, cpu_user, cpu_system, cpu_idle, cpu_iowait (Linux only), load averages, cpu_core_count, uptime_seconds.
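On Linux, cpu_percent is typically derived from two /proc/stat samples. This sketch assumes iowait is counted as idle time and uses only the first eight counters (user through steal), since guest time is already included in user and nice. The function names are hypothetical and the sketch is written in Python for readability, not taken from the C source.

```python
def parse_cpu_line(stat_text: str) -> tuple[int, int]:
    """Return (idle, total) jiffies from the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        if line.startswith("cpu "):
            # user nice system idle iowait irq softirq steal; guest fields are
            # already counted inside user/nice, so they are left out of the total
            values = [int(v) for v in line.split()[1:9]]
            idle = values[3] + values[4]  # idle + iowait
            return idle, sum(values)
    raise ValueError("no aggregate cpu line")


def cpu_percent(prev_text: str, curr_text: str) -> float:
    idle0, total0 = parse_cpu_line(prev_text)
    idle1, total1 = parse_cpu_line(curr_text)
    dtotal = total1 - total0
    if dtotal <= 0:
        return 0.0  # no time elapsed between samples
    return round(100.0 * (1 - (idle1 - idle0) / dtotal), 1)


def read_proc_stat() -> str:
    with open("/proc/stat") as f:
        return f.read()
```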

memory_monitor reports: memory_total, memory_used, memory_available, memory_free, memory_percent, and swap fields when swap is present.

network_monitor reports per-interface cumulative bytes_recv/bytes_sent and interval deltas. The loopback interface (lo) is skipped by default; this is configurable:

{
  "plugins": {
    "network_monitor": {
      "skip_interfaces": ["lo", "docker0"]
    }
  }
}
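A sketch of reading /proc/net/dev and computing interval deltas, assuming the delta fields are named by appending `_delta` (a hypothetical naming) and that a counter going backwards, for example after an interface reset, is reported as zero. parse_net_dev and interval_deltas are illustrative names.

```python
def parse_net_dev(text: str, skip=("lo",)) -> dict[str, dict[str, int]]:
    counters = {}
    for line in text.splitlines()[2:]:  # the first two lines are headers
        if ":" not in line:
            continue
        # Split on ':' first: large counters can abut the name ("eth0:123...")
        name, data = line.split(":", 1)
        name = name.strip()
        if name in skip:
            continue
        fields = data.split()
        counters[name] = {"bytes_recv": int(fields[0]), "bytes_sent": int(fields[8])}
    return counters


def interval_deltas(prev: dict, curr: dict) -> dict:
    out = {}
    for iface, c in curr.items():
        p = prev.get(iface)
        if p is None:
            continue  # new interface: no baseline yet
        out[iface] = {k + "_delta": max(c[k] - p[k], 0) for k in c}
    return out
```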

disk_monitor reports per-mount total, used, free, percent. An optional mount filter restricts reporting to specific paths:

{
  "plugins": {
    "disk_monitor": {
      "mounts": ["/", "/data"]
    }
  }
}
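A sketch of parsing `df -P` output with the optional mount filter. GNU df reports 1024-blocks while BSD and macOS df default to 512-blocks, so the sketch reads the unit from the header. It assumes device names contain no spaces, and takes free from the Available column. parse_df_p and read_df are hypothetical names.

```python
import subprocess


def parse_df_p(text: str, mounts=None) -> dict[str, dict]:
    lines = text.splitlines()
    if not lines:
        return {}
    block = int(lines[0].split()[1].split("-")[0])  # "1024-blocks" or "512-blocks"
    result = {}
    for line in lines[1:]:
        parts = line.split()
        if len(parts) < 6 or not parts[4].rstrip("%").isdigit():
            continue
        total, used, avail = (int(x) * block for x in parts[1:4])
        mount = " ".join(parts[5:])  # the mount point may contain spaces
        if mounts and mount not in mounts:
            continue
        result[mount] = {"total": total, "used": used, "free": avail,
                         "percent": int(parts[4].rstrip("%"))}
    return result


def read_df() -> str:
    return subprocess.run(["df", "-P"], capture_output=True, text=True, check=True).stdout
```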

Differences from the full hbc client

  • No filesystem_info or zfs_monitor plugins
  • UPD (self-update) messages are logged but not acted on
  • No IPv6 early-fail protection
  • Config is JSON only (~/.hbc.json), no YAML

Development

Running tests

PYTHONPATH=. python -m unittest discover -v
# or
pytest -q

Linting and type checking

tox -e lint
tox -e mypy

Debugging in VS Code

A .vscode/launch.json is included with configurations for running and attaching the debugger. Select the project .venv as the Python interpreter, then press F5.

To start with debugpy and wait for attach:

PYTHONPATH=. python -m debugpy --listen 5678 --wait-for-client -m hbd.server.cli serve -c .hb.yaml -f -v

License

MIT. See LICENSE for details.
