When NagiosRunnerPlugin has no commands configured, set skip_reason before
returning False from initialize(). This allows PluginLoader to log INFO
(not WARNING) when the plugin is skipped.
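
A minimal sketch of the pattern, assuming the loader inspects a skip_reason
attribute whenever initialize() returns False. The class bodies, the
load_plugin helper, and the logger name are illustrative assumptions, not
the project's actual code.

```python
import logging

logger = logging.getLogger("plugin_loader")  # assumed logger name


class NagiosRunnerPlugin:
    """Illustrative stand-in; only skip_reason and initialize() come from the commit."""

    def __init__(self, commands=None):
        self.commands = commands or {}
        self.skip_reason = None

    def initialize(self):
        if not self.commands:
            # Mark this as an expected skip rather than a failure.
            self.skip_reason = "No Nagios commands configured"
            return False
        return True


def load_plugin(plugin):
    """Hypothetical loader step: returns (loaded, log_level_used)."""
    if plugin.initialize():
        return True, None
    # A plugin that explains why it was skipped is logged at INFO;
    # an unexplained False is logged at WARNING.
    level = logging.INFO if plugin.skip_reason else logging.WARNING
    logger.log(level, "Plugin skipped: %s",
               plugin.skip_reason or "initialize() returned False")
    return False, level
```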
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
CLIENT_DEFAULTS seeds "plugins": {} so raw_config.get("plugins", raw_config)
always returned the empty subdict instead of falling back to the full config.
Plugins configured at top-level (e.g. nagios_runner: ...) were therefore
never found, resulting in "No Nagios commands configured".
Now checks the plugins subdict first, then top-level keys, so both
config layouts work correctly.
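
The lookup order can be sketched as follows. get_plugin_config is a
hypothetical helper name and the sample configs are made up; only the
"plugins" key and the raw_config.get("plugins", raw_config) expression come
from the description above.

```python
def get_plugin_config(raw_config, name):
    """Hypothetical helper: check the "plugins" subdict first, then top level."""
    plugins = raw_config.get("plugins") or {}
    if name in plugins:
        return plugins[name]
    return raw_config.get(name)


# Defaults seed "plugins" as an empty dict, so the key is always present.
top_level = {"plugins": {}, "nagios_runner": {"commands": {"disk": "check_disk"}}}
nested = {"plugins": {"nagios_runner": {"commands": {"load": "check_load"}}}}

# The old lookup returns the empty subdict and never sees the top-level key.
old_lookup = top_level.get("plugins", top_level)
```

With this order, a plugin defined in both places resolves to the entry under
"plugins".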
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Threshold alerts (plugin metrics, RTT) were firing immediately on the
first breach. Now every state transition to WARNING/CRITICAL starts a
grace-period timer (grace_seconds from the 'grace' config key). The
notification is deferred until the next heartbeat after grace_seconds
have elapsed. If the metric recovers within the grace window, both the
alert and the recovery are suppressed, so transient spikes no longer
cause spurious pages.
Two helper methods added to ThresholdChecker:
- _apply_grace: handles the state-change path (defer or suppress)
- _check_pending_or_renotify: handles the stable-alert path (fire
  deferred notification once grace expires, or fall through to reminders)
The overdue case is unchanged — on_overdue already fires only after
interval+grace seconds of silence, which is equivalent behaviour.
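
A simplified model of the grace logic for a single metric, as a sketch
only. GraceTracker and its fields are hypothetical; the real
ThresholdChecker splits this across _apply_grace and
_check_pending_or_renotify and also handles reminders, which are omitted
here. The clock is passed in so the behaviour is easy to trace.

```python
class GraceTracker:
    """Simplified, hypothetical model of the grace logic for one metric."""

    def __init__(self, grace_seconds):
        self.grace_seconds = grace_seconds
        self.level = "OK"          # current state
        self.pending_since = None  # when an un-notified alert state began
        self.notified = False      # whether an alert has actually been sent

    def observe(self, level, now):
        """Process one heartbeat; return the notification to send, or None."""
        if level != self.level:
            # State-change path: defer or suppress.
            self.level = level
            if level == "OK":
                # Announce the recovery only if the alert itself was announced.
                was_notified = self.notified
                self.pending_since = None
                self.notified = False
                return "RECOVER" if was_notified else None
            # Transition to WARNING/CRITICAL: start the grace timer, send nothing.
            self.pending_since = now
            return None
        # Stable path: fire the deferred notification once grace has expired.
        if (self.pending_since is not None
                and now - self.pending_since >= self.grace_seconds):
            self.pending_since = None
            self.notified = True
            return level
        return None
```

With grace_seconds=60, a WARNING first seen at t=0 is reported on the first
heartbeat at or after t=60. A CRITICAL that clears at t=30 produces no
notification at all.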
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
threshold.py was emitting level="RECOVERED" for metric recoveries. The
is_recover check in send_notification only matches "RECOVER", so these
notifications skipped both the _alerted_channels routing and the min_level
bypass added in the previous commit. Changed to "RECOVER" so all recovery
paths are consistent.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- AlertState.update() now resets last_notification when the alert level
  changes, so a WARNING→CRITICAL escalation restarts the reminder interval
  rather than inheriting a nearly-expired timer.
- _dispatch_to_channel() bypasses min_level for RECOVER, so recovery
  notifications are delivered even after a server restart when
  _alerted_channels is empty and the fallback dispatch path is used.
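
Both changes can be sketched as below. This is an illustration under
assumptions: the LEVEL_ORDER ranking, the AlertState fields, and the
should_dispatch helper (standing in for the check inside
_dispatch_to_channel()) are hypothetical, not the project's code.

```python
LEVEL_ORDER = {"WARNING": 1, "CRITICAL": 2}  # assumed severity ranking


class AlertState:
    """Illustrative stand-in for the real class; fields are assumptions."""

    def __init__(self):
        self.level = None
        self.last_notification = None

    def update(self, level):
        if level != self.level:
            # A level change (e.g. WARNING -> CRITICAL) restarts the
            # reminder interval instead of reusing the old timestamp.
            self.last_notification = None
        self.level = level


def should_dispatch(level, min_level):
    """Hypothetical min_level check: RECOVER always passes."""
    if level == "RECOVER":
        return True
    return LEVEL_ORDER[level] >= LEVEL_ORDER[min_level]
```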
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>