Host Overview (plugins.html): show Update and Delete buttons in the
host-right zone when the logged-in user is the host owner (or admin /
unauthenticated mode). Buttons link to /u?h=<host> and /d?h=<host>
with stopPropagation so they don't toggle the accordion; Delete prompts
for confirmation first.
ThresholdChecker.purge_stale_alerts(): removes alert states whose
metric_path has no matching threshold in the current config. Called
after startup pickle restore and after every SIGHUP config reload so
alerts orphaned by upgrades or config changes do not persist
indefinitely.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Override the global style.css body height/overflow that locks all pages
to the viewport height (a remnant of the old drawer-menu layout).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Show a colour-coded pie chart (red=critical, yellow=warning, green=ok)
to the left of the clock in the nav bar. Backed by a new
GET /api/0/alert_summary endpoint that counts hosts per alert level
for the current user's visible hosts.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- threshold.py: add _find_threshold() with suffix fallback so thresholds
like ping_monitor.rtt_avg match ping_monitor.8_8_8_8_rtt_avg etc.;
each pinged host keeps its own alert state
- hbdclass.py: format RTT as integer ms (round())
- live.html: JS RTT display rounded to nearest ms (Math.round)
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Hostnames in the live dashboard table are now links to /plugins#hostname,
which expands and scrolls to that host's card in the Host Overview page.
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Only notify on worsening transitions (OK→WARNING, OK→CRITICAL,
WARNING→CRITICAL) and recovery (any→OK). De-escalation within alert
states no longer sends a duplicate notification since the metric never
recovered.
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Add renderZfsTables() to plugins.html with health/capacity/frag/dedup
table and cumulative I/O table; colour-code health and capacity thresholds;
add zfs_monitor to plugin_order and summary/render dispatch.
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
- watch: true (default) per host; watch: false suppresses all notifications
for that host in udp.py and threshold.py
- Live Dashboard and Host Overview now show only hosts where the logged-in
user is owner or manager (admins see all); WebSocket broadcasts filtered
per-connection by the same rule
- Add hbd/client/plugins/zfs_monitor.py: collects per-pool health, capacity,
fragmentation, dedup ratio, and cumulative I/O ops/bandwidth via zpool(8)
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
threshold_config in the hosts section now accepts a list of named
configs applied left-to-right on top of the defaults, so focused
override profiles can be mixed without duplication. Single-string
and legacy host_threshold_mapping forms are unchanged.
- Add threshold_raw_configs to store per-config overrides separately
- Normalise threshold_config to list on parse (string or list)
- get_thresholds_for_host folds the list over the default base
- Update README and docs/THRESHOLD_ALERTING.md with examples
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
- fix: matrix/sms_voipms notifications blocked the event loop on timeout;
make send_notification async, dispatch all channel drivers as non-blocking
tasks (asyncio.to_thread for sync drivers, asyncio.wait_for for async);
update all call sites to fire-and-forget via create_task
- feat: add /about page with version, runtime, uptime counter, and repo link
- fix: hbc_mini plugin data format now matches full hbc client so Host
Overview displays memory, disk, and network metrics correctly
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Server now sends a bare UPD command; client runs hb_install.sh to
reinstall from the package registry, then restarts. hb_install.sh
also copies itself alongside hbc on client installs.
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Replace pill-tab plugin view with an accordion layout that shows key
metrics (CPU%, MEM%, top disk%, net delta, nagios status) at a glance
in each host card header. Plugin sections expand as structured tables.
- Rename page to "Host Overview" (URL /plugins unchanged)
- Three-wave parallel data loading: glance plugins on host expand,
on-demand fetch for filesystem_info and extras
- Per-plugin table renderers with inline percent bars and threshold
colour coding
- Add escHtml() for XSS-safe rendering of all field values
- Remove stale planning docs (REFACTORING.md, hbd/Plan.md)
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Threshold alerts (plugin metrics, RTT) were firing immediately on the
first breach. Now every state transition to WARNING/CRITICAL starts a
grace-period timer (grace_seconds from the 'grace' config key). The
notification is deferred until the next heartbeat after grace_seconds
have elapsed. If the metric recovers within the grace window, both the
alert and the recovery are suppressed — no spurious pages for transient
spikes.
Two helper methods added to ThresholdChecker:
- _apply_grace: handles the state-change path (defer or suppress)
- _check_pending_or_renotify: handles the stable-alert path (fire
deferred notification once grace expires, or fall through to reminders)
The overdue case is unchanged — on_overdue already fires only after
interval+grace seconds of silence, which is equivalent behaviour.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
threshold.py was emitting level="RECOVERED" for metric recoveries, which
failed the is_recover check in send_notification (which only matched "RECOVER"),
bypassing _alerted_channels routing and the min_level bypass added in the
previous commit. Changed to "RECOVER" so all recovery paths are consistent.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- AlertState.update() now resets last_notification when the alert level
changes, so a WARNING→CRITICAL escalation restarts the reminder interval
rather than inheriting a nearly-expired timer.
- _dispatch_to_channel() bypasses min_level for RECOVER, so recovery
notifications are delivered even after a server restart when
_alerted_channels is empty and the fallback dispatch path is used.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>