heartbeat

Public Access

Author	SHA1	Message	Date
andreas	3301dbfe34	feat: owner Update/Delete buttons on Host Overview; purge stale alerts on reload Host Overview (plugins.html): show Update and Delete buttons in the host-right zone when the logged-in user is the host owner (or admin / unauthenticated mode). Buttons link to /u?h=<host> and /d?h=<host> with stopPropagation so they don't toggle the accordion; Delete prompts for confirmation first. ThresholdChecker.purge_stale_alerts(): removes alert states whose metric_path has no matching threshold in the current config. Called after startup pickle restore and after every SIGHUP config reload so alerts orphaned by upgrades or config changes do not persist indefinitely. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-04 08:03:46 -04:00
andreas	d00d903e7d	fix: make Alerts page scrollable Override the global style.css body height/overflow that locks all pages to the viewport height (a remnant of the old drawer-menu layout). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-04 13:33:08 +02:00
Andreas Wrede	a99b6b54c7	feat: add alert pie chart to nav bar Show a colour-coded pie chart (red=critical, yellow=warning, green=ok) to the left of the clock in the nav bar. Backed by a new GET /api/0/alert_summary endpoint that counts hosts per alert level for the current user's visible hosts. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-03 13:45:15 -04:00
Andreas Wrede	a76d0fc840	feat: generic ping_monitor thresholds; round RTT to nearest ms - threshold.py: add _find_threshold() with suffix fallback so thresholds like ping_monitor.rtt_avg match ping_monitor.8_8_8_8_rtt_avg etc.; each pinged host keeps its own alert state - hbdclass.py: format RTT as integer ms (round()) - live.html: JS RTT display rounded to nearest ms (Math.round) Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-05-03 06:08:11 -04:00
Andreas Wrede	ae60844a8a	feat: link hostnames in Live Dashboard to Host Overview Hostnames in the live dashboard table are now links to /plugins#hostname, which expands and scrolls to that host's card in the Host Overview page. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-05-02 14:37:08 -04:00
Andreas Wrede	49fa310361	feat: add Threshold Configurations section to settings page Reads threshold_configs (or legacy thresholds) from config and renders per-named-config tables showing metric path, operator, warning/critical values, hysteresis, and count. Disabled entries are dimmed. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-05-02 14:30:31 -04:00
Andreas Wrede	28e2180f7b	fix: suppress notifications on alert de-escalation (e.g. CRITICAL→WARNING) Only notify on worsening transitions (OK→WARNING, OK→CRITICAL, WARNING→CRITICAL) and recovery (any→OK). De-escalation within alert states no longer sends a duplicate notification since the metric never recovered. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-05-02 14:27:18 -04:00
Andreas Wrede	ce0590f015	fix: suppress recover messages for down durations under 4 seconds Transient blips caused by hbc client restarts no longer generate eventlog entries or notifications. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-05-02 14:18:58 -04:00
Andreas Wrede	72fc82b91f	feat: add ZFS pool renderer to Host Overview Add renderZfsTables() to plugins.html with health/capacity/frag/dedup table and cumulative I/O table; colour-code health and capacity thresholds; add zfs_monitor to plugin_order and summary/render dispatch. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-05-02 13:21:28 -04:00
Andreas Wrede	691f62aa69	feat: host-level watch flag suppresses notifications; filter dashboard/overview by owner/manager; add ZFS monitor plugin - watch: true (default) per host; watch: false suppresses all notifications for that host in udp.py and threshold.py - Live Dashboard and Host Overview now show only hosts where the logged-in user is owner or manager (admins see all); WebSocket broadcasts filtered per-connection by the same rule - Add hbd/client/plugins/zfs_monitor.py: collects per-pool health, capacity, fragmentation, dedup ratio, and cumulative I/O ops/bandwidth via zpool(8) Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-05-02 12:42:35 -04:00
Andreas Wrede	cffc9805f9	fix: mask api_password and access_token in settings page; add List to threshold imports Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-05-02 11:51:55 -04:00
Andreas Wrede	917d6a401b	feat: composable threshold_config list for per-host threshold layering threshold_config in the hosts section now accepts a list of named configs applied left-to-right on top of the defaults, so focused override profiles can be mixed without duplication. Single-string and legacy host_threshold_mapping forms are unchanged. - Add threshold_raw_configs to store per-config overrides separately - Normalise threshold_config to list on parse (string or list) - get_thresholds_for_host folds the list over the default base - Update README and docs/THRESHOLD_ALERTING.md with examples Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-05-02 10:35:23 -04:00
Andreas Wrede	c4f09e9ced	version 5.1.8 - fix: matrix/sms_voipms notifications blocked the event loop on timeout; make send_notification async, dispatch all channel drivers as non-blocking tasks (asyncio.to_thread for sync drivers, asyncio.wait_for for async); update all call sites to fire-and-forget via create_task - feat: add /about page with version, runtime, uptime counter, and repo link - fix: hbc_mini plugin data format now matches full hbc client so Host Overview displays memory, disk, and network metrics correctly Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-05-01 05:33:27 -04:00
Andreas Wrede	64710fd4cd	tweak h1 margins	2026-05-01 04:51:11 -04:00
Andreas Wrede	1f5e7465a3	fix nav bar position	2026-05-01 04:32:04 -04:00
Andreas Wrede	b6dcce4f35	simplify eventlog usage, fix arguments	2026-04-30 15:38:46 -04:00
Andreas Wrede	c5ce41762e	feat: update hbc via hb_install.sh instead of code patching Server now sends a bare UPD command; client runs hb_install.sh to reinstall from the package registry, then restarts. hb_install.sh also copies itself alongside hbc on client installs. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-04-30 13:55:15 -04:00
Andreas Wrede	ddf7067d13	feat: redesign Plugin Metrics page as Host Overview Replace pill-tab plugin view with an accordion layout that shows key metrics (CPU%, MEM%, top disk%, net delta, nagios status) at a glance in each host card header. Plugin sections expand as structured tables. - Rename page to "Host Overview" (URL /plugins unchanged) - Three-wave parallel data loading: glance plugins on host expand, on-demand fetch for filesystem_info and extras - Per-plugin table renderers with inline percent bars and threshold colour coding - Add escHtml() for XSS-safe rendering of all field values - Remove stale planning docs (REFACTORING.md, hbd/Plan.md) Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-04-30 08:12:07 -04:00
andreas	990c658e65	Apply grace period to all threshold alerts before logging/notifying Threshold alerts (plugin metrics, RTT) were firing immediately on the first breach. Now every state transition to WARNING/CRITICAL starts a grace-period timer (grace_seconds from the 'grace' config key). The notification is deferred until the next heartbeat after grace_seconds have elapsed. If the metric recovers within the grace window, both the alert and the recovery are suppressed — no spurious pages for transient spikes. Two helper methods added to ThresholdChecker: - _apply_grace: handles the state-change path (defer or suppress) - _check_pending_or_renotify: handles the stable-alert path (fire deferred notification once grace expires, or fall through to reminders) The overdue case is unchanged — on_overdue already fires only after interval+grace seconds of silence, which is equivalent behaviour. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-24 12:00:40 +02:00
andreas	b78d6ac0fe	Fix RECOVER routing: use consistent level name and route via alerted channel threshold.py was emitting level="RECOVERED" for metric recoveries, which failed the is_recover check in send_notification (which only matched "RECOVER"), bypassing _alerted_channels routing and the min_level bypass added in the previous commit. Changed to "RECOVER" so all recovery paths are consistent. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-24 11:29:04 +02:00
andreas	afd5060f59	Fix early reminder notifications and lost recovery notifications - AlertState.update() now resets last_notification when the alert level changes, so a WARNING→CRITICAL escalation restarts the reminder interval rather than inheriting a nearly-expired timer. - _dispatch_to_channel() bypasses min_level for RECOVER, so recovery notifications are delivered even after a server restart when _alerted_channels is empty and the fallback dispatch path is used. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-22 18:11:22 +02:00
Andreas Wrede	5c382d2b8d	One more nit	2026-04-13 09:31:35 -04:00
Andreas Wrede	35bba451f5	Various formatting nits	2026-04-13 09:27:51 -04:00
Andreas Wrede	80edfba0c0	fix inconsistencies in page layout, add swiss clock	2026-04-13 08:45:50 -04:00
Andreas Wrede	6bc8de192e	fix non-alerting of overdue hosts	2026-04-12 18:44:36 -04:00
Andreas Wrede	d0c8c186f4	Fix typo	2026-04-12 13:04:17 -04:00
Andreas Wrede	19f7c8312e	Make columns sortable again, check hbc version, provide mobile html pages	2026-04-12 12:53:00 -04:00
Andreas Wrede	24b0e362fb	provide CLI functions stop, restart and reload for hbd	2026-04-12 12:06:07 -04:00
Andreas Wrede	3a030548c0	Fix profile not updating	2026-04-12 11:57:12 -04:00
Andreas Wrede	094cb7ed9d	Merge branch 'master' of git.wrede.ca:andreas/heartbeat	2026-04-12 11:23:28 -04:00
Andreas Wrede	0199ca4693	re-factor notifications, add sms and matrix as channels	2026-04-12 11:21:21 -04:00
Andreas Wrede	75344ebbbd	re-factor notifications, add sms and matrix as channels	2026-04-12 11:04:00 -04:00
Andreas Wrede	7f049a4e26	accept websocket connection on http:.../ws	2026-04-12 06:44:32 -04:00
Andreas Wrede	6217f7a124	fix bogus notification on new clients	2026-04-10 13:39:18 -04:00
Andreas Wrede	2468386f24	adjust default log, pickle and config locations; renotify on critical only; make user sessions persistent	2026-04-10 13:24:57 -04:00
Andreas Wrede	2015195112	Grace interval on restart of hbd, fix SIGHUP processing	2026-04-10 12:58:38 -04:00
Andreas Wrede	3426185383	Set SO_TIMESTAMP correctly for the various platforms	2026-04-10 11:19:47 -04:00
Andreas Wrede	9eedbafe97	Show overdue in alerts instead of null	2026-04-10 09:20:28 -04:00
Andreas Wrede	a5f31c5cb5	update pickled data structures	2026-04-10 09:18:38 -04:00
Andreas Wrede	2f72cf0118	typo	2026-04-10 09:17:57 -04:00
Andreas Wrede	ba27d2e300	Add count to rtt threshold	2026-04-10 08:07:50 -04:00
Andreas Wrede	381e37efce	fix log-section height	2026-04-10 08:01:22 -04:00
Andreas Wrede	97dfc08f4d	fix log level setting	2026-04-10 08:00:51 -04:00
Andreas Wrede	d281ac5a70	provide defaults for threshold_configs	2026-04-10 07:47:39 -04:00
andreas	d77277857f	Add user management and a settings page	2026-04-08 16:21:55 -04:00
Andreas Wrede	8421f472f2	there is only one __version__	2026-04-07 11:00:22 -04:00
Andreas Wrede	51f9bdc2b5	use SO_TIMESTAMP, works on Linux, FreeBSD and macOS	2026-04-07 10:46:54 -04:00
andreas	02bc42fbf0	get rtt time differently	2026-04-07 10:40:12 -04:00
andreas	832a8b0bda	save state to pickle file, restart timers on restart	2026-04-06 17:24:59 -04:00
Andreas Wrede	73aa89f8f4	fix web page issues	2026-04-04 12:43:30 -04:00
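Commit a76d0fc840 describes a suffix fallback in _find_threshold() so that one generic threshold covers every pinged target. Below is a minimal Python sketch of how such a lookup could work, assuming thresholds are held in a dict keyed by "plugin.field" metric paths. The function name find_threshold and the dict layout are hypothetical, not taken from threshold.py.

```python
from typing import Dict, Optional

# Hypothetical sketch: names and data layout are illustrative, not from threshold.py.


def find_threshold(thresholds: Dict[str, dict], metric_path: str) -> Optional[dict]:
    """Return the threshold config for metric_path, or None.

    An exact key match wins. Otherwise a generic key such as
    "ping_monitor.rtt_avg" matches any metric of the same plugin whose
    field ends in "_rtt_avg", e.g. "ping_monitor.8_8_8_8_rtt_avg".
    """
    if metric_path in thresholds:
        return thresholds[metric_path]
    plugin, _, field = metric_path.partition(".")
    if not field:
        return None
    for key, config in thresholds.items():
        t_plugin, _, t_field = key.partition(".")
        # Same plugin, and the metric's field ends with "_" + generic field name.
        if t_plugin == plugin and t_field and field.endswith("_" + t_field):
            return config
    return None
```

If alert state is keyed by the full metric path rather than by the matched threshold key, each pinged host keeps its own alert state, as the commit message describes. When several generic thresholds could match, this sketch simply returns the first one in dict order.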

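Commit 28e2180f7b limits notifications to worsening transitions and recoveries. A minimal sketch of that rule, assuming the three levels are ordered OK < WARNING < CRITICAL; the Level enum and should_notify() are hypothetical names, not the project's API.

```python
from enum import IntEnum

# Hypothetical sketch of the transition rule; not the project's actual code.


class Level(IntEnum):
    OK = 0
    WARNING = 1
    CRITICAL = 2


def should_notify(old: Level, new: Level) -> bool:
    """Notify on worsening transitions and on recovery, never on de-escalation."""
    if new == old:
        return False
    if new == Level.OK:
        return True        # recovery: any alert level -> OK
    return new > old       # OK->WARNING, OK->CRITICAL, WARNING->CRITICAL
```

A CRITICAL to WARNING step returns False here because the metric never recovered, so a second alert would only duplicate the first.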

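Commit 917d6a401b folds a list of named threshold configs over the defaults, left to right. A minimal sketch, assuming each named config is a dict of per-metric overrides (similar in spirit to threshold_raw_configs) and that a later config replaces a metric's whole entry rather than merging individual fields. normalise_config_names() and thresholds_for_host() are hypothetical stand-ins for the real parser and get_thresholds_for_host, and unknown config names are silently ignored in this sketch.

```python
from functools import reduce
from typing import Dict, List, Union

# Hypothetical sketch: names and merge semantics are assumptions.
Thresholds = Dict[str, dict]


def normalise_config_names(value: Union[str, List[str], None]) -> List[str]:
    """Accept a single config name or a list of names; always return a list."""
    if value is None:
        return []
    if isinstance(value, str):
        return [value]
    return list(value)


def thresholds_for_host(defaults: Thresholds,
                        raw_configs: Dict[str, Thresholds],
                        threshold_config: Union[str, List[str], None]) -> Thresholds:
    """Apply the named override configs left to right on top of the defaults."""
    def apply(base: Thresholds, name: str) -> Thresholds:
        merged = dict(base)                       # never mutate the caller's dicts
        merged.update(raw_configs.get(name, {}))  # later configs win per metric
        return merged

    return reduce(apply, normalise_config_names(threshold_config), dict(defaults))
```

Keeping each named config as a small override, rather than a full copy of the defaults, is what lets focused profiles be combined without duplication.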
59 Commits
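Commit 990c658e65 defers threshold notifications by a grace period. The sketch below models that behaviour for a single metric, assuming levels are the strings "OK", "WARNING" and "CRITICAL" and timestamps are in seconds. GraceState and observe() are hypothetical and only approximate what _apply_grace and _check_pending_or_renotify are described as doing; reminders, de-escalation suppression, and the overdue path are left out.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical sketch of grace-period deferral; not the project's actual code.


@dataclass
class GraceState:
    level: str = "OK"                      # last observed level
    pending_since: Optional[float] = None  # start of a not-yet-notified breach
    alerted: bool = False                  # was an alert sent since the last OK?


def observe(state: GraceState, level: str, now: float,
            grace_seconds: float) -> Optional[str]:
    """Process one heartbeat's level; return what to notify, or None."""
    if level != state.level:
        state.level = level
        if level == "OK":
            # Recovery is only announced if an alert actually went out.
            state.pending_since = None
            was_alerted, state.alerted = state.alerted, False
            return "RECOVER" if was_alerted else None
        # Any transition into WARNING/CRITICAL (re)starts the grace timer.
        state.pending_since = now
        return None
    # Stable level: fire the deferred alert on the first heartbeat past grace.
    if state.pending_since is not None and now - state.pending_since >= grace_seconds:
        state.pending_since = None
        state.alerted = True
        return state.level
    return None
```

With grace_seconds=60, a WARNING seen at t=0 that clears at t=45 produces no notification at all, whereas a CRITICAL that persists from t=100 is announced on the first heartbeat at or after t=160.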