Commit Graph

308 Commits

Author SHA1 Message Date
andreas c9567dddae fix: remove stale shell config key from NagiosRunnerPlugin docstring 2026-04-25 16:23:03 +02:00
andreas b5963badd6 feat: async subprocess in nagios_runner with stderr capture and signal handling
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 16:18:09 +02:00
andreas a76a39b4a0 fix: remove redundant no-commands log lines; fix skip_reason docstring style
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 16:15:44 +02:00
andreas 94e1597978 feat: set skip_reason on nagios_runner when no commands configured
When NagiosRunnerPlugin has no commands configured, set skip_reason before
returning False from initialize(). This allows PluginLoader to log INFO
(not WARNING) when the plugin is skipped.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 16:13:03 +02:00
andreas c9c2ed772f fix: document skip_reason in Plugin docstring; remove unused import in test 2026-04-25 16:10:35 +02:00
andreas aeb78dcb8e feat: add skip_reason to Plugin; improve PluginLoader init messaging 2026-04-25 16:08:07 +02:00
andreas c70a4807dc version 5.1.2
Release / release (push) Successful in 6s
2026-04-25 07:25:06 +02:00
andreas 1a470e7cfa Fix plugin config lookup shadowed by CLIENT_DEFAULTS plugins key
CLIENT_DEFAULTS seeds "plugins": {} so raw_config.get("plugins", raw_config)
always returned the empty subdict instead of falling back to the full config.
Plugins configured at top-level (e.g. nagios_runner: ...) were therefore
never found, resulting in "No Nagios commands configured".

Now checks the plugins subdict first, then top-level keys, so both
config layouts work correctly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 12:58:42 +02:00
andreas 990c658e65 Apply grace period to all threshold alerts before logging/notifying
Threshold alerts (plugin metrics, RTT) were firing immediately on the
first breach. Now every state transition to WARNING/CRITICAL starts a
grace-period timer (grace_seconds from the 'grace' config key). The
notification is deferred until the next heartbeat after grace_seconds
have elapsed. If the metric recovers within the grace window, both the
alert and the recovery are suppressed — no spurious pages for transient
spikes.

Two helper methods added to ThresholdChecker:
- _apply_grace: handles the state-change path (defer or suppress)
- _check_pending_or_renotify: handles the stable-alert path (fire
  deferred notification once grace expires, or fall through to reminders)

The overdue case is unchanged — on_overdue already fires only after
interval+grace seconds of silence, which is equivalent behaviour.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 12:00:40 +02:00
andreas b78d6ac0fe Fix RECOVER routing: use consistent level name and route via alerted channel
threshold.py was emitting level="RECOVERED" for metric recoveries, which
failed the is_recover check in send_notification (which only matched "RECOVER"),
bypassing _alerted_channels routing and the min_level bypass added in the
previous commit. Changed to "RECOVER" so all recovery paths are consistent.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 11:29:04 +02:00
andreas afd5060f59 Fix early reminder notifications and lost recovery notifications
- AlertState.update() now resets last_notification when the alert level
  changes, so a WARNING→CRITICAL escalation restarts the reminder interval
  rather than inheriting a nearly-expired timer.
- _dispatch_to_channel() bypasses min_level for RECOVER, so recovery
  notifications are delivered even after a server restart when
  _alerted_channels is empty and the fallback dispatch path is used.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-22 18:11:22 +02:00
Andreas Wrede 5c382d2b8d One more nit 2026-04-13 09:31:35 -04:00
Andreas Wrede 35bba451f5 Various formating nits 2026-04-13 09:27:51 -04:00
Andreas Wrede 80edfba0c0 fix inconsistencies in page layout, add swiss clock 2026-04-13 08:45:50 -04:00
Andreas Wrede 6bc8de192e fix non-alerting of overdue hosts 2026-04-12 18:44:36 -04:00
Andreas Wrede ab33d81b30 catch syntax wanring when parsing version string 2026-04-12 16:39:51 -04:00
Andreas Wrede 1366c69cdc version 5.1.1
Release / release (push) Successful in 5s
2026-04-12 13:06:30 -04:00
Andreas Wrede d0c8c186f4 Fix typo 2026-04-12 13:04:17 -04:00
Andreas Wrede 19f7c8312e Mkae columns sortabel agian, check hbc version, provide modile html pages 2026-04-12 12:53:00 -04:00
Andreas Wrede 24b0e362fb provide cli function stop, restart and reload for hbd
Thought for 1s
2026-04-12 12:06:07 -04:00
Andreas Wrede 3a030548c0 Fix profile not updating 2026-04-12 11:57:12 -04:00
Andreas Wrede 094cb7ed9d Merge branch 'master' of git.wrede.ca:andreas/heartbeat 2026-04-12 11:23:28 -04:00
Andreas Wrede 0199ca4693 re-factor notifications, add sms and matrix as channels 2026-04-12 11:21:21 -04:00
Andreas Wrede 75344ebbbd re-factor notifications, add sms and matrix as channels 2026-04-12 11:04:00 -04:00
Andreas Wrede 7f049a4e26 accept websocket connection on http:.../ws 2026-04-12 06:44:32 -04:00
Andreas Wrede daf5277507 version 5.1.0
Release / release (push) Successful in 5s
2026-04-11 15:26:37 -04:00
Andreas Wrede ee3b72878f Add a ping monitor 2026-04-11 15:25:23 -04:00
Andreas Wrede 6217f7a124 fix bogus notification on new clients 2026-04-10 13:39:18 -04:00
Andreas Wrede 2468386f24 adjust default log, pick and config locations. renotify on critical only, make user sessions persistem 2026-04-10 13:24:57 -04:00
Andreas Wrede 2015195112 Grace interval on restart of hbd, fix SIGHUP processing 2026-04-10 12:58:38 -04:00
Andreas Wrede 3426185383 Set SO_TIMESTAMP correctly for the various platforms 2026-04-10 11:19:47 -04:00
Andreas Wrede 9eedbafe97 Show overdue in alerts instead of null 2026-04-10 09:20:28 -04:00
Andreas Wrede a5f31c5cb5 update picked data strucures 2026-04-10 09:18:38 -04:00
Andreas Wrede 2f72cf0118 typo 2026-04-10 09:17:57 -04:00
Andreas Wrede e9aa7a6f8b info only if no nagios command is defined 2026-04-10 08:19:59 -04:00
Andreas Wrede ba27d2e300 Add count to rtt threshold 2026-04-10 08:07:50 -04:00
Andreas Wrede 381e37efce fix log-section height 2026-04-10 08:01:22 -04:00
Andreas Wrede 97dfc08f4d fix log level settiung 2026-04-10 08:00:51 -04:00
Andreas Wrede d281ac5a70 provide defaults for threshold_configs 2026-04-10 07:47:39 -04:00
andreas 79bf00abfd version 5.0.12
Release / release (push) Successful in 6s
2026-04-08 16:47:12 -04:00
andreas d77277857f Add user management and a settings page 2026-04-08 16:21:55 -04:00
Andreas Wrede 3232239a85 version 5.0.11
Release / release (push) Successful in 5s
2026-04-07 14:19:46 -04:00
Andreas Wrede 68b1c65384 version 5.0.10 2026-04-07 14:15:46 -04:00
Andreas Wrede 5edbaacf81 version 5.0.9
Release / release (push) Successful in 15s
2026-04-07 11:02:19 -04:00
Andreas Wrede 8421f472f2 there is only one __version__ 2026-04-07 11:00:22 -04:00
Andreas Wrede 51f9bdc2b5 use SO_TIMESTAMP, works on Linux, FreeBSD and macOS 2026-04-07 10:46:54 -04:00
andreas 02bc42fbf0 get rtt time differently 2026-04-07 10:40:12 -04:00
andreas 832a8b0bda save state to pickle file, restart timers on restart 2026-04-06 17:24:59 -04:00
Andreas Wrede 57c4b86430 version 5.0.8
Release / release (push) Successful in 6s
2026-04-04 15:18:12 -04:00
Andreas Wrede 8dd002d159 version 5.0.7
Release / release (push) Failing after 1s
2026-04-04 14:45:10 -04:00