Commit Graph

107 Commits

Author SHA1 Message Date
andreas 0e8250362e feat: multi-provider OAuth2 login page and generic routes
Replace hardcoded Gitea OAuth handlers with generic {name}-parameterized
routes and update the login page to render a button for each configured
provider via oauth_mod.get_providers().

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-09 08:48:23 -04:00
andreas 2f5da9fc5e fix: coerce malformed profile JSON to OAuthError; add redirect_uri assertion
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-09 08:46:19 -04:00
andreas 87aeec5999 feat: generic build_auth_url/exchange_code/fetch_user for multi-provider OAuth2
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-09 08:38:11 -04:00
andreas f24500a6b5 fix: copy field_map/profile_data_path in get_providers; improve caplog assertions 2026-05-09 08:34:48 -04:00
andreas 8207cd7b5f feat: add PROVIDER_DEFS, ResolvedProvider, get_providers() to oauth.py
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-09 08:29:07 -04:00
andreas aef9e7769b fix: zfs_monitor alerts dropped on restart with wildcard pool thresholds
purge_stale_alerts used _find_threshold to validate alert state keys,
but _find_threshold has no wildcard matching. A threshold configured as
"zfs_monitor.*.status" never matched the concrete alert state key
"zfs_monitor.tank.status", so every restart silently purged active ZFS
pool alert states and reset the grace period from scratch.

Also fix _check_pending_or_renotify to set last_notification after the
grace-period notification fires, so the re-notification interval is
anchored to when the alert was actually sent rather than the next PLG cycle.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-09 07:42:09 -04:00
andreas 2e8bcb630d fix: show human-readable duration in re-notification messages
Replace raw seconds with d h m s format in "ongoing for ..." strings.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-09 06:53:41 -04:00
andreas 338711181b feat: alerts host-filter field with URL query param and notify URL
- Add regex filter input to the Alerts dashboard that filters displayed
  hosts on every keystroke; invalid regex turns the border red
- Initialise the filter from ?filter= in the URL query string
- Change _build_url() to produce /alerts?filter=<hostname> so
  notification links (Pushover, email, Matrix, etc.) land on the
  alerts page pre-filtered to the alerting host

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-09 06:46:13 -04:00
andreas 43487f17e7 feat: optional logo on Gitea OAuth login button
Reads oauth.gitea.logo from config and, when set, renders an <img>
inside the button with flex alignment.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-09 06:24:27 -04:00
andreas b95f1a5bb7 fix: agree: zpool ONLINE=OK, DEGRADED=WARNING, all else is CRITICAL 2026-05-08 17:18:41 -04:00
andreas 12f7eb722b fix: typo 2026-05-08 17:03:32 -04:00
andreas 217bba1b76 fix: change health_ok to status 2026-05-08 16:57:45 -04:00
andreas 967e05ed74 threshold: synthesize health_ok server-side for older ZFS clients
Older hbd clients send zfs_monitor data with a `health` string but no
`health_ok` numeric field (added in a recent plugin update). Without
health_ok in the data, the wildcard threshold check found nothing and
no CRITICAL alert was raised for DEGRADED/SUSPENDED pools.

Synthesize health_ok from the health string in the server's nested-
metric loop so alerts fire regardless of client version.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 16:39:16 -04:00
andreas c20245b0ab docs: document ZFS pool health alerting; fix pushover sound+url_title 2026-05-08 16:25:55 -04:00
andreas b9db0c552e feat: alert CRITICAL on degraded or suspended ZFS pools 2026-05-08 16:23:49 -04:00
andreas 05045bafa2 fix: use base_url config for OAuth redirect URI to handle reverse proxy 2026-05-08 14:11:09 -04:00
andreas b06de6fdd3 fix: remove dead helper, add state logging, add integration-style oauth tests
- Remove unused `_gitea_cfg_url` module-level helper from http.py
- Add logger.warning on invalid/expired state in oauth_gitea_callback
- Add test_callback_invalid_state_rejects and test_full_oauth_flow_chain to tests/test_oauth.py (21 tests total, all passing)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 13:53:57 -04:00
andreas 940d0af35e fix: use error variable in login page template instead of hardcoded string 2026-05-08 13:50:02 -04:00
andreas d6d31aa2e3 feat: add Sign in with Gitea button to login page
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 13:48:28 -04:00
andreas 76edfe7577 feat: add Gitea OAuth2 redirect and callback routes 2026-05-08 13:44:12 -04:00
andreas d190029728 fix: guard unconfigured oauth calls; add missing test coverage; clean imports
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 13:42:21 -04:00
andreas b8307e7a9d feat: add authorization_url, exchange_code, fetch_user to oauth module
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 13:37:21 -04:00
andreas a2fdf091f5 fix: preserve OAuth users across config reload; fix test isolation
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 13:34:57 -04:00
andreas 1914e6f28e feat: add provision_oauth_user() to users module
Creates or updates a user from an OAuth2 provider: new users are
inserted with an empty password_hash (OAuth-only login); existing users
have their display name and avatar refreshed while all other attributes
(admin flag, password_hash, notification_channels) are preserved.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 13:32:08 -04:00
andreas 82cbce9615 test: fix shared state leak and fragile expiry assertion in oauth tests 2026-05-08 13:30:16 -04:00
andreas dbb779b013 feat: add OAuth2 CSRF state management
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 13:28:18 -04:00
andreas ca908ee967 fix: remove unused imports from oauth module and tests 2026-05-08 13:26:51 -04:00
andreas 73c697b6c5 feat: add oauth module skeleton and is_enabled()
Add hbd/server/oauth.py with OAuthError, _gitea_cfg(), and is_enabled()
to detect when all three required Gitea OAuth2 config keys are present.
Add "oauth": {} default to SERVER_DEFAULTS in config.py.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 13:24:27 -04:00
andreas b81a0d2a6c plugins: persist owner chip in glance strip across JS updates
Store owner in data-owner attribute; updateHostHeader always prepends it
so it survives innerHTML replacement. Render it immediately on page load
before JS fetches plugin data.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 09:57:58 -04:00
andreas 1a19088cfe udp: resolve host owner from config, default_owner, or os_info on each PLG
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 09:50:42 -04:00
andreas 172f6e950f plugins: show host owner in glance strip for admin users
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 09:12:02 -04:00
andreas b3aa7b585f udp/config: fall back to default_owner when os_info has no owner; log debug
- When os_info arrives with no owner field, apply default_owner from server config
- Stop applying default_owner unconditionally in get_host_access (now deferred to os_info handling)
- os_info plugin logs debug message when injecting owner from client config

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 08:49:42 -04:00
andreas 88a3c09b51 hbc/server: request InfoPlugin refresh when host has no plugin data; update docs
- Server sets request_update=1 in ACK when host.plugin_data is empty
- hbc: AsyncConnection.request_info_event; handle_ack sets it on request_update
- hbc: _info_plugin_refresh_loop clears InfoPlugin caches and resends on demand
- hbc_mini: same via _request_info event and _info_refresh_loop
- docs/USERS.md: document client-declared owner config key
- docs/PLUGIN_DEVELOPMENT.md: document server-initiated InfoPlugin refresh

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 07:37:41 -04:00
andreas 0504402a8a hbc/hbc_mini: add owner config; include in os_info; server applies to host
- owner: optional top-level config key in ~/.hbc.yaml / ~/.hbc.json
- Propagated into plugin configs at load time so os_info can include it
- os_info PLG data carries owner field when set
- udp: sets host.owner from os_info if not already configured server-side
- live.html: format event log timestamps as YYYY-MM-DD HH:MM:SS (24-hour)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 07:25:47 -04:00
andreas ca58c18802 eventlog: store structured dicts; filter by user; clock: fix minute hand step
- eventlog() now stores {ts, host, level, service, message} dicts instead of strings
- WebSocket sends/broadcasts filter event log messages by the user's managed hosts
- live.html renders structured log entries with level-coloured spans
- Swiss railway clock minute hand now holds until second hand reaches 12, then steps

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 07:00:17 -04:00
andreas 1ddc4b8132 threshold/alerts: strip _status_code suffix from displayed metric names
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 06:19:16 -04:00
andreas 5e1720ed32 notify: use plain URL in Mattermost plugin metrics link
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-07 10:43:18 -04:00
andreas 7ab17e26e2 hbc/hbc_mini: log name and version at startup; ui: bump alert-metric font size
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-07 10:15:03 -04:00
andreas 28f5fa951c ui: show metric name inline with hostname in alerts and notifications
Alerts page: move metric name into the header row alongside hostname.
Notifications: include metric name in title (hostname  metric) and
strip the metric prefix from the body so it contains only value/detail.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-07 06:26:27 -04:00
andreas ca8ba84e65 fix: silence aiohttp.access log and strip plugin prefix in alerts UI
- main: disable aiohttp.access propagation unless --debug is active
- alerts.html: strip plugin-name prefix from metric_path display
  (nagios_runner.check_disk_root_status_code → check_disk_root_status_code)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-06 07:39:55 -04:00
andreas 1e4263b793 fix: threshold and logging improvements
- threshold: fix crash when display is None (_format_display now falls
  back to default format string instead of calling None.format())
- threshold: shorten notification messages by stripping plugin-name prefix
  from metric_path (cpu_percent instead of cpu_monitor.cpu_percent)
- main: demote aiohttp.access log records from INFO to DEBUG
- udp: replace debug print with proper logger.info for new host sign-on

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-06 07:06:56 -04:00
andreas 1824f637b4 fix: always show THRESHOLD_DEFAULTS in Settings threshold config
Seed threshold_configs["default"] from THRESHOLD_DEFAULTS at the start
of _parse_config() so the Settings page displays built-in defaults
regardless of whether the server config uses the multi-config format,
the legacy thresholds: format, or has no threshold config at all.
_parse_multi_config() overwrites the seed with the fully-merged
effective defaults when a threshold_configs section is present.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-05 13:02:28 -04:00
andreas a534c06b26 feat: nagios operator for direct exit-code severity mapping
Add ComparisonOperator.NAGIOS ("nagios") that maps Nagios exit codes
directly to alert levels (0=OK 1=WARNING 2=CRITICAL 3=UNKNOWN) without
requiring numeric warning/critical thresholds. Hysteresis is bypassed for
discrete codes. Display template defaults to "{check_name}: {output}".
_format_display() handles None threshold_value gracefully.

Add nagios_runner.status_code as a built-in default threshold config so
nagios checks alert out of the box.

Also: fix alerts.html scrolling (override html,body), make hostname a link
to /plugins#<hostname>, remove overall_status/overall_status_code/plugin_count
from nagios_runner and hbc_mini, replace with computed worst-status in
plugins.html via nagiosWorstStatus() helper.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-05 12:26:56 -04:00
andreas ae447ac4a6 feat: nagios_runner improvements and alerts page fixes
- nagios_runner: remove overall_status/overall_status_code/plugin_count fields;
  each command still reports its own <name>_status and <name>_status_code
- threshold: expose {output} and {status} aliases in display templates for
  nagios_runner generic matches (mapped from <check_name>_output/status)
- alerts.html: fix scrolling by overriding html,body height/overflow (style.css
  sets both); make hostname a link to /plugins/<hostname>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-05 11:05:45 -04:00
andreas b1985d0eb2 feat: generic threshold matching for nagios_runner with {check_name} display support
_find_threshold() now returns the stripped prefix ("check_name") alongside
the ThresholdConfig, enabling a single generic entry (e.g. nagios_runner.status_code)
to cover all per-command metrics (check_disk_root_status_code, check_load_status_code,
…). The prefix is threaded through to _format_display() as {check_name}, with
{metric_name} also available in display templates. purge_stale_alerts() updated
to use generic matching so it does not incorrectly drop alerts on generic-matched
metrics. README updated with Display Format Templates and Generic Threshold
Matching sections.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-05 10:48:17 -04:00
andreas de778f680f fix: reduce default hysteresis 10%→2%; show recovery threshold in alerts UI
The 10% default hysteresis created an unreasonably wide recovery band:
a 95% threshold would only clear once the value dropped below 85.5%,
causing alerts to linger long after the metric was well below the
trigger level.

Change default hysteresis to 2% across all threshold parsers (plugin
metrics, partitions, RTT). For a 95% threshold, recovery is now at
93.1% instead of 85.5%.

Add AlertState.hysteresis field (set on every check, cleared on OK) and
expose recovery_threshold in to_dict() so the Alerts dashboard can
display "recovers < 93.1" alongside the trigger threshold, making the
hysteresis band visible to the user. Pickle backward-compatible via
__setstate__.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-04 14:47:50 -04:00
andreas c93dbdc0f4 fix: settings thresholds show correct per-config metrics; misc hbc fixes
Settings page: pass threshold_checker to http.start so the Threshold
Configurations section has data. Use threshold_checker's already-parsed
ThresholdConfig objects instead of re-parsing the raw nested YAML.
Named (non-default) configs now display only their explicit overrides
via threshold_raw_configs, not the full merged set with defaults.

hbc/hbc_mini: send boot and shutdown messages on first connection only
to avoid duplicate packets when multiple servers are configured.
Replace print("Daemonizing...") with logging.info so output goes to
syslog in daemon mode.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-04 09:12:39 -04:00
andreas 3a546a1e5c feat: fetch-based Update/Delete buttons with toast notification on Host Overview
Replace href navigation with fetch() so the server response is captured
and displayed in a slide-up toast at the bottom of the page. Delete also
removes the host card from the DOM on success without a page reload.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-04 08:16:54 -04:00
andreas 3301dbfe34 feat: owner Update/Delete buttons on Host Overview; purge stale alerts on reload
Host Overview (plugins.html): show Update and Delete buttons in the
host-right zone when the logged-in user is the host owner (or admin /
unauthenticated mode). Buttons link to /u?h=<host> and /d?h=<host>
with stopPropagation so they don't toggle the accordion; Delete prompts
for confirmation first.

ThresholdChecker.purge_stale_alerts(): removes alert states whose
metric_path has no matching threshold in the current config. Called
after startup pickle restore and after every SIGHUP config reload so
alerts orphaned by upgrades or config changes do not persist
indefinitely.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-04 08:03:46 -04:00
andreas d00d903e7d fix: make Alerts page scrollable
Override the global style.css body height/overflow that locks all pages
to the viewport height (a remnant of the old drawer-menu layout).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-04 13:33:08 +02:00