Commit Graph

20 Commits

Author SHA1 Message Date
Andreas Wrede ab0132a38d fix: correct THRESHOLD_DEFAULTS metric keys and add missing defaults
- Rename memory_monitor threshold key from 'percent' to 'memory_percent'
  so it matches exactly rather than relying on suffix stripping, which was
  causing swap_percent to be evaluated against the memory threshold
- Add swap_percent default thresholds (warning: 40%, critical: 75%)
- Add zfs_monitor pool capacity default thresholds (warning: 80%, critical: 90%)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-10 14:03:44 -04:00
andreas c9f15a3f1c fix: correct grace comment in config defaults — additional wait time, not a multiplier 2026-05-09 12:14:47 -04:00
andreas b95f1a5bb7 fix: agree: zpool ONLINE=OK, DEGRADED=WARNING, all else is CRITICAL 2026-05-08 17:18:41 -04:00
andreas 12f7eb722b fix: typo 2026-05-08 17:03:32 -04:00
andreas 217bba1b76 fix: change health_ok to status 2026-05-08 16:57:45 -04:00
andreas b9db0c552e feat: alert CRITICAL on degraded or suspended ZFS pools 2026-05-08 16:23:49 -04:00
andreas 73c697b6c5 feat: add oauth module skeleton and is_enabled()
Add hbd/server/oauth.py with OAuthError, _gitea_cfg(), and is_enabled()
to detect when all three required Gitea OAuth2 config keys are present.
Add "oauth": {} default to SERVER_DEFAULTS in config.py.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 13:24:27 -04:00
andreas b3aa7b585f udp/config: fall back to default_owner when os_info has no owner; log debug
- When os_info arrives with no owner field, apply default_owner from server config
- Stop applying default_owner unconditionally in get_host_access (now deferred to os_info handling)
- os_info plugin logs debug message when injecting owner from client config

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 08:49:42 -04:00
andreas a534c06b26 feat: nagios operator for direct exit-code severity mapping
Add ComparisonOperator.NAGIOS ("nagios") that maps Nagios exit codes
directly to alert levels (0=OK 1=WARNING 2=CRITICAL 3=UNKNOWN) without
requiring numeric warning/critical thresholds. Hysteresis is bypassed for
discrete codes. Display template defaults to "{check_name}: {output}".
_format_display() handles None threshold_value gracefully.

Add nagios_runner.status_code as a built-in default threshold config so
nagios checks alert out of the box.

Also: fix alerts.html scrolling (override html,body), make hostname a link
to /plugins#<hostname>, remove overall_status/overall_status_code/plugin_count
from nagios_runner and hbc_mini, replace with computed worst-status in
plugins.html via nagiosWorstStatus() helper.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-05 12:26:56 -04:00
Andreas Wrede 691f62aa69 feat: host-level watch flag suppresses notifications; filter dashboard/overview by owner/manager; add ZFS monitor plugin
- watch: true (default) per host; watch: false suppresses all notifications
  for that host in udp.py and threshold.py
- Live Dashboard and Host Overview now show only hosts where the logged-in
  user is owner or manager (admins see all); WebSocket broadcasts filtered
  per-connection by the same rule
- Add hbd/client/plugins/zfs_monitor.py: collects per-pool health, capacity,
  fragmentation, dedup ratio, and cumulative I/O ops/bandwidth via zpool(8)

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-05-02 12:42:35 -04:00
Andreas Wrede 24b0e362fb provide cli function stop, restart and reload for hbd
Thought for 1s
2026-04-12 12:06:07 -04:00
Andreas Wrede 0199ca4693 re-factor notifications, add sms and matrix as channels 2026-04-12 11:21:21 -04:00
Andreas Wrede 2468386f24 adjust default log, pick and config locations. renotify on critical only, make user sessions persistem 2026-04-10 13:24:57 -04:00
Andreas Wrede 2f72cf0118 typo 2026-04-10 09:17:57 -04:00
Andreas Wrede ba27d2e300 Add count to rtt threshold 2026-04-10 08:07:50 -04:00
Andreas Wrede d281ac5a70 provide defaults for threshold_configs 2026-04-10 07:47:39 -04:00
andreas d77277857f Add user management and a settings page 2026-04-08 16:21:55 -04:00
Andreas Wrede c5770006f7 hbc proper termination, hbd config reloadable 2026-04-02 07:17:00 -04:00
Andreas Wrede 460d2be9e9 Fix rtt, including bug in time compute 2026-04-01 19:41:53 -04:00
Andreas Wrede 0543266c92 Major refactoring of the codebase, including restructuring of files and directories, renaming of modules and classes, and improvements to the overall organization and readability of the code. This refactoring aims to enhance maintainability, scalability, and clarity of the codebase while preserving existing functionality. The changes include:
- Restructuring of the project directory into client and server components
- Renaming of modules and classes to better reflect their purpose and functionality
- Moving common utilities and configurations to a shared location
- Updating import statements to reflect the new structure
- Adding new documentation files for better clarity on various aspects of the project
- Removing deprecated or unused code to streamline the codebase
- Ensuring that all existing functionality is preserved and that the codebase remains functional after the refactoring.
2026-03-29 11:13:40 -04:00