heartbeat

Author	SHA1	Message	Date
Andreas Wrede	a76d0fc840	feat: generic ping_monitor thresholds; round RTT to nearest ms - threshold.py: add _find_threshold() with suffix fallback so thresholds like ping_monitor.rtt_avg match ping_monitor.8_8_8_8_rtt_avg etc.; each pinged host keeps its own alert state - hbdclass.py: format RTT as integer ms (round()) - live.html: JS RTT display rounded to nearest ms (Math.round) Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-05-03 06:08:11 -04:00
Andreas Wrede	94cbb31c48	version 5.1.15	2026-05-02 14:37:11 -04:00
Andreas Wrede	ae60844a8a	feat: link hostnames in Live Dashboard to Host Overview Hostnames in the live dashboard table are now links to /plugins#hostname, which expands and scrolls to that host's card in the Host Overview page. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-05-02 14:37:08 -04:00
Andreas Wrede	49fa310361	feat: add Threshold Configurations section to settings page Reads threshold_configs (or legacy thresholds) from config and renders per-named-config tables showing metric path, operator, warning/critical values, hysteresis, and count. Disabled entries are dimmed. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-05-02 14:30:31 -04:00
Andreas Wrede	28e2180f7b	fix: suppress notifications on alert de-escalation (e.g. CRITICAL→WARNING) Only notify on worsening transitions (OK→WARNING, OK→CRITICAL, WARNING→CRITICAL) and recovery (any→OK). De-escalation within alert states no longer sends a duplicate notification since the metric never recovered. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-05-02 14:27:18 -04:00
Andreas Wrede	ce0590f015	fix: suppress recover messages for down durations under 4 seconds Transient blips caused by hbc client restarts no longer generate eventlog entries or notifications. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-05-02 14:18:58 -04:00
Andreas Wrede	f50acca509	version 5.1.14	2026-05-02 13:21:40 -04:00
Andreas Wrede	72fc82b91f	feat: add ZFS pool renderer to Host Overview Add renderZfsTables() to plugins.html with health/capacity/frag/dedup table and cumulative I/O table; colour-code health and capacity thresholds; add zfs_monitor to plugin_order and summary/render dispatch. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-05-02 13:21:28 -04:00
Andreas Wrede	46f8c32c0b	version 5.1.13	2026-05-02 12:43:06 -04:00
Andreas Wrede	691f62aa69	feat: host-level watch flag suppresses notifications; filter dashboard/overview by owner/manager; add ZFS monitor plugin - watch: true (default) per host; watch: false suppresses all notifications for that host in udp.py and threshold.py - Live Dashboard and Host Overview now show only hosts where the logged-in user is owner or manager (admins see all); WebSocket broadcasts filtered per-connection by the same rule - Add hbd/client/plugins/zfs_monitor.py: collects per-pool health, capacity, fragmentation, dedup ratio, and cumulative I/O ops/bandwidth via zpool(8) Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-05-02 12:42:35 -04:00
Andreas Wrede	cffc9805f9	fix: mask api_password and access_token in settings page; add List to threshold imports Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-05-02 11:51:55 -04:00
Andreas Wrede	917d6a401b	feat: composable threshold_config list for per-host threshold layering threshold_config in the hosts section now accepts a list of named configs applied left-to-right on top of the defaults, so focused override profiles can be mixed without duplication. Single-string and legacy host_threshold_mapping forms are unchanged. - Add threshold_raw_configs to store per-config overrides separately - Normalise threshold_config to list on parse (string or list) - get_thresholds_for_host folds the list over the default base - Update README and docs/THRESHOLD_ALERTING.md with examples Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-05-02 10:35:23 -04:00
Andreas Wrede	2bd3a9beb6	feat: restart on SIGHUP in hbc and hbc_mini Sets dorestart and triggers a clean shutdown; os.execv re-execs the process with the original arguments after cleanup. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-05-02 10:06:26 -04:00
Andreas Wrede	5523c60866	version 5.1.12	2026-05-02 08:56:04 -04:00
Andreas Wrede	26df08eeff	version 5.1.11	2026-05-02 07:55:27 -04:00
Andreas Wrede	6fb67f8615	version 5.1.10	2026-05-01 13:50:15 -04:00
Andreas Wrede	6aae2a1dab	version 5.1.9	2026-05-01 11:13:51 -04:00
Andreas Wrede	c4f09e9ced	version 5.1.8 - fix: matrix/sms_voipms notifications blocked the event loop on timeout; make send_notification async, dispatch all channel drivers as non-blocking tasks (asyncio.to_thread for sync drivers, asyncio.wait_for for async); update all call sites to fire-and-forget via create_task - feat: add /about page with version, runtime, uptime counter, and repo link - fix: hbc_mini plugin data format now matches full hbc client so Host Overview displays memory, disk, and network metrics correctly Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-05-01 05:33:27 -04:00
Andreas Wrede	64710fd4cd	tweak h1 margins	2026-05-01 04:51:11 -04:00
Andreas Wrede	1f5e7465a3	fix nav bar position	2026-05-01 04:32:04 -04:00
Andreas Wrede	b290b21e23	track hbc type and version	2026-04-30 18:22:35 -04:00
Andreas Wrede	65c4267847	version 5.1.7	2026-04-30 17:50:46 -04:00
Andreas Wrede	462a445235	feat: add hbc_mini single-file client; drop dead connections on protocol error - scripts/hbc_mini.py: self-contained hbc with no external deps; uses /proc for CPU/memory/network on Linux, df for disk, JSON config - hbc + hbc_mini: mark connection _dead and stop sending on protocol error - README: document hbc_mini usage, config, and plugin availability - pyproject.toml: include hbc_mini.py in script-files Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-04-30 17:50:19 -04:00
Andreas Wrede	6905bf266a	version 5.1.6	2026-04-30 15:39:11 -04:00
Andreas Wrede	b6dcce4f35	simplify eventlog usage, fix arguments	2026-04-30 15:38:46 -04:00
Andreas Wrede	e6436fc236	version 5.1.5	2026-04-30 13:55:21 -04:00
Andreas Wrede	c5ce41762e	feat: update hbc via hb_install.sh instead of code patching Server now sends a bare UPD command; client runs hb_install.sh to reinstall from the package registry, then restarts. hb_install.sh also copies itself alongside hbc on client installs. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-04-30 13:55:15 -04:00
Andreas Wrede	9af4006097	version 5.1.4	2026-04-30 08:12:15 -04:00
Andreas Wrede	ddf7067d13	feat: redesign Plugin Metrics page as Host Overview Replace pill-tab plugin view with an accordion layout that shows key metrics (CPU%, MEM%, top disk%, net delta, nagios status) at a glance in each host card header. Plugin sections expand as structured tables. - Rename page to "Host Overview" (URL /plugins unchanged) - Three-wave parallel data loading: glance plugins on host expand, on-demand fetch for filesystem_info and extras - Per-plugin table renderers with inline percent bars and threshold colour coding - Add escHtml() for XSS-safe rendering of all field values - Remove stale planning docs (REFACTORING.md, hbd/Plan.md) Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-04-30 08:12:07 -04:00
andreas	7d8ca5d8db	version 5.1.3	2026-04-25 16:52:56 +02:00
andreas	65ceb31d8d	fix: use os.path.exists check for /dev/log instead of dead-code OSError catch	2026-04-25 16:36:00 +02:00
andreas	1c9b6c1ca9	fix: reconfigure logging to syslog after daemonize() instead of no-op basicConfig After daemonize() redirects stderr to /dev/null, the existing StreamHandler writes to /dev/null. logging.basicConfig() is a no-op when handlers are already configured, so log messages are silently lost. Replace the daemon block to: 1. Call daemonize() first 2. Explicitly remove existing handlers (pointing to /dev/null) 3. Add SysLogHandler pointing to /dev/log with fallback to UDP localhost:514 4. Log startup message to the new syslog handler Removes redundant syslog.openlog() call which is no longer needed. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-25 16:29:54 +02:00
andreas	d7e6b478e1	fix: use shlex.split() in nagios_runner path validation to handle quoted paths	2026-04-25 16:28:32 +02:00
andreas	535dbda47d	feat: validate absolute command paths at nagios_runner init	2026-04-25 16:24:33 +02:00
andreas	c9567dddae	fix: remove stale shell config key from NagiosRunnerPlugin docstring	2026-04-25 16:23:03 +02:00
andreas	b5963badd6	feat: async subprocess in nagios_runner with stderr capture and signal handling Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-25 16:18:09 +02:00
andreas	a76a39b4a0	fix: remove redundant no-commands log lines; fix skip_reason docstring style Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-25 16:15:44 +02:00
andreas	94e1597978	feat: set skip_reason on nagios_runner when no commands configured When NagiosRunnerPlugin has no commands configured, set skip_reason before returning False from initialize(). This allows PluginLoader to log INFO (not WARNING) when the plugin is skipped. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-25 16:13:03 +02:00
andreas	c9c2ed772f	fix: document skip_reason in Plugin docstring; remove unused import in test	2026-04-25 16:10:35 +02:00
andreas	aeb78dcb8e	feat: add skip_reason to Plugin; improve PluginLoader init messaging	2026-04-25 16:08:07 +02:00
andreas	c70a4807dc	version 5.1.2	2026-04-25 07:25:06 +02:00
andreas	1a470e7cfa	Fix plugin config lookup shadowed by CLIENT_DEFAULTS plugins key CLIENT_DEFAULTS seeds "plugins": {} so raw_config.get("plugins", raw_config) always returned the empty subdict instead of falling back to the full config. Plugins configured at top-level (e.g. nagios_runner: ...) were therefore never found, resulting in "No Nagios commands configured". Now checks the plugins subdict first, then top-level keys, so both config layouts work correctly. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-24 12:58:42 +02:00
andreas	990c658e65	Apply grace period to all threshold alerts before logging/notifying Threshold alerts (plugin metrics, RTT) were firing immediately on the first breach. Now every state transition to WARNING/CRITICAL starts a grace-period timer (grace_seconds from the 'grace' config key). The notification is deferred until the next heartbeat after grace_seconds have elapsed. If the metric recovers within the grace window, both the alert and the recovery are suppressed — no spurious pages for transient spikes. Two helper methods added to ThresholdChecker: - _apply_grace: handles the state-change path (defer or suppress) - _check_pending_or_renotify: handles the stable-alert path (fire deferred notification once grace expires, or fall through to reminders) The overdue case is unchanged — on_overdue already fires only after interval+grace seconds of silence, which is equivalent behaviour. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-24 12:00:40 +02:00
andreas	b78d6ac0fe	Fix RECOVER routing: use consistent level name and route via alerted channel threshold.py was emitting level="RECOVERED" for metric recoveries, which failed the is_recover check in send_notification (which only matched "RECOVER"), bypassing _alerted_channels routing and the min_level bypass added in the previous commit. Changed to "RECOVER" so all recovery paths are consistent. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-24 11:29:04 +02:00
andreas	afd5060f59	Fix early reminder notifications and lost recovery notifications - AlertState.update() now resets last_notification when the alert level changes, so a WARNING→CRITICAL escalation restarts the reminder interval rather than inheriting a nearly-expired timer. - _dispatch_to_channel() bypasses min_level for RECOVER, so recovery notifications are delivered even after a server restart when _alerted_channels is empty and the fallback dispatch path is used. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-22 18:11:22 +02:00
Andreas Wrede	5c382d2b8d	One more nit	2026-04-13 09:31:35 -04:00
Andreas Wrede	35bba451f5	Various formatting nits	2026-04-13 09:27:51 -04:00
Andreas Wrede	80edfba0c0	fix inconsistencies in page layout, add swiss clock	2026-04-13 08:45:50 -04:00
Andreas Wrede	6bc8de192e	fix non-alerting of overdue hosts	2026-04-12 18:44:36 -04:00
Andreas Wrede	ab33d81b30	catch syntax warning when parsing version string	2026-04-12 16:39:51 -04:00
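
Commit a76d0fc840 describes a suffix fallback that lets a generic threshold such as ping_monitor.rtt_avg apply to per-host metrics such as ping_monitor.8_8_8_8_rtt_avg. The following is a minimal sketch of how such a lookup could work, not the repository's actual _find_threshold(). The function name find_threshold, the dictionary layout, and the warning/critical values are assumptions made for illustration.

```python
from typing import Dict, Optional


def find_threshold(metric_path: str, thresholds: Dict[str, dict]) -> Optional[dict]:
    """Return the threshold for metric_path, falling back to a suffix match.

    Hypothetical sketch: the real lookup in threshold.py may differ.
    """
    # An exact key always wins over any fallback.
    if metric_path in thresholds:
        return thresholds[metric_path]

    plugin, _, metric = metric_path.partition(".")
    best_key = None
    for key in thresholds:
        key_plugin, _, key_metric = key.partition(".")
        if key_plugin != plugin or not key_metric:
            continue
        # Generic key "rtt_avg" matches per-host metric "8_8_8_8_rtt_avg".
        if metric.endswith("_" + key_metric):
            # Prefer the longest (most specific) matching key.
            if best_key is None or len(key) > len(best_key):
                best_key = key
    return thresholds[best_key] if best_key is not None else None


# Illustrative values only; these numbers are not taken from the repository.
thresholds = {
    "ping_monitor.rtt_avg": {"warning": 100, "critical": 250},
    "ping_monitor.1_1_1_1_rtt_avg": {"warning": 50, "critical": 120},
}
generic = find_threshold("ping_monitor.8_8_8_8_rtt_avg", thresholds)
specific = find_threshold("ping_monitor.1_1_1_1_rtt_avg", thresholds)
missing = find_threshold("cpu_monitor.load", thresholds)
```

In this sketch the caller would still key alert state by the full metric path, which is one way each pinged host could keep its own alert state while sharing a single threshold definition.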

292 Commits
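
Commit 28e2180f7b states the notification rule as: notify on worsening transitions (OK→WARNING, OK→CRITICAL, WARNING→CRITICAL) and on recovery (any→OK), but not on de-escalation such as CRITICAL→WARNING. A minimal sketch of that rule follows. The names SEVERITY and should_notify are hypothetical and are not taken from the repository.

```python
# Hypothetical severity ranking used only for this illustration.
SEVERITY = {"OK": 0, "WARNING": 1, "CRITICAL": 2}


def should_notify(old_level: str, new_level: str) -> bool:
    """Return True for worsening transitions and for recovery to OK."""
    if old_level == new_level:
        return False  # no state change, nothing to announce
    if new_level == "OK":
        return True  # recovery (any -> OK)
    # Worsening only: de-escalation within alert states stays silent.
    return SEVERITY[new_level] > SEVERITY[old_level]


worsening = should_notify("WARNING", "CRITICAL")
de_escalation = should_notify("CRITICAL", "WARNING")
recovery = should_notify("CRITICAL", "OK")
```

Ranking the levels numerically keeps the comparison to a single expression and would extend naturally if further alert levels were added.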