fix: correct ZFS pool status threshold operator and add per-metric grace
The default zfs_monitor.*.status threshold used operator '>' with warning=1, so a DEGRADED pool (status=1) never alerted (1 > 1 is false) and a FAULTED pool (status=2) only triggered WARNING instead of CRITICAL. Fix the operator to '>=' in THRESHOLD_DEFAULTS and the example config. Also adds a per-metric grace period override (ThresholdConfig.grace) so individual thresholds can bypass or shorten the global grace delay. Alerts with grace=0 fire immediately on state change rather than waiting for a second collection cycle. Sets grace=0 on zfs_monitor.*.status so pool degradation alerts fire on the first data report after the event. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
@@ -61,6 +61,13 @@ def _insert_threshold_metric(thresholds: dict, metric_path: str, values: dict) -
|
||||
except (TypeError, ValueError):
|
||||
pass
|
||||
|
||||
grace = values.get("grace")
|
||||
if grace is not None:
|
||||
try:
|
||||
cfg["grace"] = float(grace)
|
||||
except (TypeError, ValueError):
|
||||
pass
|
||||
|
||||
count = values.get("count")
|
||||
if count is not None:
|
||||
try:
|
||||
|
||||
Reference in New Issue
Block a user