heartbeat

Author	SHA1	Message	Date
Andreas Wrede	691f62aa69	feat: host-level watch flag suppresses notifications; filter dashboard/overview by owner/manager; add ZFS monitor plugin - watch: true (default) per host; watch: false suppresses all notifications for that host in udp.py and threshold.py - Live Dashboard and Host Overview now show only hosts where the logged-in user is owner or manager (admins see all); WebSocket broadcasts filtered per-connection by the same rule - Add hbd/client/plugins/zfs_monitor.py: collects per-pool health, capacity, fragmentation, dedup ratio, and cumulative I/O ops/bandwidth via zpool(8) Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-05-02 12:42:35 -04:00
Andreas Wrede	cffc9805f9	fix: mask api_password and access_token in settings page; add List to threshold imports Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-05-02 11:51:55 -04:00
Andreas Wrede	917d6a401b	feat: composable threshold_config list for per-host threshold layering threshold_config in the hosts section now accepts a list of named configs applied left-to-right on top of the defaults, so focused override profiles can be mixed without duplication. Single-string and legacy host_threshold_mapping forms are unchanged. - Add threshold_raw_configs to store per-config overrides separately - Normalise threshold_config to list on parse (string or list) - get_thresholds_for_host folds the list over the default base - Update README and docs/THRESHOLD_ALERTING.md with examples Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-05-02 10:35:23 -04:00
Andreas Wrede	c4f09e9ced	version 5.1.8 - fix: matrix/sms_voipms notifications blocked the event loop on timeout; make send_notification async, dispatch all channel drivers as non-blocking tasks (asyncio.to_thread for sync drivers, asyncio.wait_for for async); update all call sites to fire-and-forget via create_task - feat: add /about page with version, runtime, uptime counter, and repo link - fix: hbc_mini plugin data format now matches full hbc client so Host Overview displays memory, disk, and network metrics correctly Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-05-01 05:33:27 -04:00
andreas	990c658e65	Apply grace period to all threshold alerts before logging/notifying Threshold alerts (plugin metrics, RTT) were firing immediately on the first breach. Now every state transition to WARNING/CRITICAL starts a grace-period timer (grace_seconds from the 'grace' config key). The notification is deferred until the next heartbeat after grace_seconds have elapsed. If the metric recovers within the grace window, both the alert and the recovery are suppressed — no spurious pages for transient spikes. Two helper methods added to ThresholdChecker: - _apply_grace: handles the state-change path (defer or suppress) - _check_pending_or_renotify: handles the stable-alert path (fire deferred notification once grace expires, or fall through to reminders) The overdue case is unchanged — on_overdue already fires only after interval+grace seconds of silence, which is equivalent behaviour. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-24 12:00:40 +02:00
andreas	b78d6ac0fe	Fix RECOVER routing: use consistent level name and route via alerted channel threshold.py was emitting level="RECOVERED" for metric recoveries, which failed the is_recover check in send_notification (which only matched "RECOVER"), bypassing _alerted_channels routing and the min_level bypass added in the previous commit. Changed to "RECOVER" so all recovery paths are consistent. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-24 11:29:04 +02:00
andreas	afd5060f59	Fix early reminder notifications and lost recovery notifications - AlertState.update() now resets last_notification when the alert level changes, so a WARNING→CRITICAL escalation restarts the reminder interval rather than inheriting a nearly-expired timer. - _dispatch_to_channel() bypasses min_level for RECOVER, so recovery notifications are delivered even after a server restart when _alerted_channels is empty and the fallback dispatch path is used. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-22 18:11:22 +02:00
Andreas Wrede	0199ca4693	re-factor notifications, add sms and matrix as channels	2026-04-12 11:21:21 -04:00
Andreas Wrede	2468386f24	adjust default log, pick and config locations. renotify on critical only, make user sessions persistent	2026-04-10 13:24:57 -04:00
Andreas Wrede	9eedbafe97	Show overdue in alerts instead of null	2026-04-10 09:20:28 -04:00
Andreas Wrede	a5f31c5cb5	update picked data structures	2026-04-10 09:18:38 -04:00
Andreas Wrede	ba27d2e300	Add count to rtt threshold	2026-04-10 08:07:50 -04:00
Andreas Wrede	d281ac5a70	provide defaults for threshold_configs	2026-04-10 07:47:39 -04:00
Andreas Wrede	73aa89f8f4	fix web page issues	2026-04-04 12:43:30 -04:00
Andreas Wrede	941f3ea4b0	display and acknowledge alerts	2026-04-03 06:35:45 -04:00
Andreas Wrede	c5770006f7	hbc proper termination, hbd config reloadable	2026-04-02 07:17:00 -04:00
Andreas Wrede	460d2be9e9	Fix rtt, including bug in time compute	2026-04-01 19:41:53 -04:00
Andreas Wrede	090d341244	per-client threshold config	2026-04-01 15:22:42 -04:00
Andreas Wrede	079e84f729	display tag for alerts, cleanup udp	2026-04-01 11:49:55 -04:00
Andreas Wrede	dd23d9d163	refactor monitor, add threshold testing	2026-03-31 12:22:03 -04:00
Andreas Wrede	ad7178ebcb	Move threshold to server, move eventlog to notify	2026-03-29 20:29:33 -04:00
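
Commit 917d6a401b describes threshold_config as a string or a list of named configs applied left-to-right on top of the defaults. The following is a minimal sketch of that folding idea, not the project's actual code: the function names, the data shapes, and the choice to merge per metric (rather than replace a whole metric entry) are all assumptions made for illustration.

```python
from functools import reduce
from typing import Any, Dict, List, Optional, Union

# Hypothetical shape: metric name -> {level name -> limit}
Thresholds = Dict[str, Dict[str, Any]]


def normalise_config_names(value: Union[str, List[str], None]) -> List[str]:
    """Accept a single name, a list of names, or nothing; always return a list."""
    if value is None:
        return []
    if isinstance(value, str):
        return [value]
    return list(value)


def merge_thresholds(base: Thresholds, override: Thresholds) -> Thresholds:
    """Return a new mapping with override layered on base (assumed per-metric merge)."""
    merged = {metric: dict(levels) for metric, levels in base.items()}
    for metric, levels in override.items():
        merged.setdefault(metric, {}).update(levels)
    return merged


def thresholds_for_host(
    defaults: Thresholds,
    raw_configs: Dict[str, Thresholds],
    config_names: Optional[Union[str, List[str]]],
) -> Thresholds:
    """Fold the named configs left-to-right over a copy of the defaults."""
    names = normalise_config_names(config_names)
    return reduce(
        lambda acc, name: merge_thresholds(acc, raw_configs[name]),
        names,
        merge_thresholds(defaults, {}),  # start from a copy, never mutate defaults
    )


defaults = {"cpu": {"warning": 80, "critical": 95},
            "disk": {"warning": 85, "critical": 95}}
raw_configs = {"db": {"disk": {"warning": 70}},
               "quiet": {"cpu": {"critical": 99}}}

layered = thresholds_for_host(defaults, raw_configs, ["db", "quiet"])
single = thresholds_for_host(defaults, raw_configs, "db")
```

Because later names win, the order of the list matters: a host that lists a broad profile first and a focused one second gets the focused values wherever the two overlap.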

21 Commits
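
Commit 990c658e65 describes a grace period: a transition to WARNING or CRITICAL starts a timer, the notification is deferred until a heartbeat arrives after grace_seconds have elapsed, and a recovery inside the window suppresses both the alert and the recovery. The sketch below illustrates that behaviour under simplifying assumptions. The class GraceTracker and its fields are hypothetical and are not the ThresholdChecker helpers named in the commit; reminder notifications are left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GraceTracker:
    """Hypothetical per-metric tracker; one observe() call per heartbeat."""
    grace_seconds: float
    level: str = "OK"
    pending_since: Optional[float] = None  # when the current grace timer started
    notified: bool = False                 # has any alert been sent since the last OK?

    def observe(self, level: str, now: float) -> Optional[str]:
        """Return the level to notify about, or None if nothing should be sent."""
        if level != self.level:
            self.level = level
            if level == "OK":
                was_notified = self.notified
                self.pending_since = None
                self.notified = False
                # Recovery is reported only if an alert was actually delivered.
                return "RECOVER" if was_notified else None
            # New WARNING/CRITICAL state: start the grace timer and defer.
            self.pending_since = now
            return None
        # Stable alert state: fire the deferred notification once grace expires.
        if (level != "OK" and self.pending_since is not None
                and now - self.pending_since >= self.grace_seconds):
            self.pending_since = None
            self.notified = True
            return level
        return None


tracker = GraceTracker(grace_seconds=60)
events = [
    tracker.observe("CRITICAL", 0),    # breach starts the timer
    tracker.observe("OK", 45),         # recovered inside the window: suppressed
    tracker.observe("WARNING", 100),   # new breach, new timer
    tracker.observe("WARNING", 130),   # still inside the window
    tracker.observe("WARNING", 160),   # grace elapsed: notify
    tracker.observe("OK", 200),        # alert was sent, so report recovery
]
```

With these inputs the transient spike at t=0 produces no notification at all, while the sustained WARNING produces exactly one alert followed by one RECOVER.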