fix: correct ZFS pool status threshold operator and add per-metric grace

The default zfs_monitor.*.status threshold used operator '>' with warning=1,
so a DEGRADED pool (status=1) never alerted (1 > 1 is false) and a FAULTED
pool (status=2) only triggered WARNING instead of CRITICAL.

Fix the operator to '>=' in THRESHOLD_DEFAULTS and the example config.

Also adds a per-metric grace period override (ThresholdConfig.grace) so
individual thresholds can bypass or shorten the global grace delay. Alerts
with grace=0 fire immediately on state change rather than waiting for a
second collection cycle. Sets grace=0 on zfs_monitor.*.status so pool
degradation alerts fire on the first data report after the event.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
Andreas Wrede
2026-05-13 06:33:06 -04:00
parent 4160e34a96
commit 0f90be659e
5 changed files with 49 additions and 16 deletions
+1
View File
@@ -248,6 +248,7 @@ def get_settings_sections(config: dict, threshold_checker=None) -> list:
"count": tc.count,
"enabled": tc.enabled,
"display": tc.display or "",
"grace": tc.grace,
}
threshold_config_list = []