Commit Graph

458 Commits

Author SHA1 Message Date
andreas b06de6fdd3 fix: remove dead helper, add state logging, add integration-style oauth tests
- Remove unused `_gitea_cfg_url` module-level helper from http.py
- Add logger.warning on invalid/expired state in oauth_gitea_callback
- Add test_callback_invalid_state_rejects and test_full_oauth_flow_chain to tests/test_oauth.py (21 tests total, all passing)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 13:53:57 -04:00
andreas 940d0af35e fix: use error variable in login page template instead of hardcoded string 2026-05-08 13:50:02 -04:00
andreas d6d31aa2e3 feat: add Sign in with Gitea button to login page
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 13:48:28 -04:00
andreas 76edfe7577 feat: add Gitea OAuth2 redirect and callback routes 2026-05-08 13:44:12 -04:00
andreas d190029728 fix: guard unconfigured oauth calls; add missing test coverage; clean imports
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 13:42:21 -04:00
andreas b8307e7a9d feat: add authorization_url, exchange_code, fetch_user to oauth module
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 13:37:21 -04:00
andreas a2fdf091f5 fix: preserve OAuth users across config reload; fix test isolation
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 13:34:57 -04:00
andreas 1914e6f28e feat: add provision_oauth_user() to users module
Creates or updates a user from an OAuth2 provider: new users are
inserted with an empty password_hash (OAuth-only login); existing users
have their display name and avatar refreshed while all other attributes
(admin flag, password_hash, notification_channels) are preserved.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 13:32:08 -04:00
andreas 82cbce9615 test: fix shared state leak and fragile expiry assertion in oauth tests 2026-05-08 13:30:16 -04:00
andreas dbb779b013 feat: add OAuth2 CSRF state management
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 13:28:18 -04:00
andreas ca908ee967 fix: remove unused imports from oauth module and tests 2026-05-08 13:26:51 -04:00
andreas 73c697b6c5 feat: add oauth module skeleton and is_enabled()
Add hbd/server/oauth.py with OAuthError, _gitea_cfg(), and is_enabled()
to detect when all three required Gitea OAuth2 config keys are present.
Add "oauth": {} default to SERVER_DEFAULTS in config.py.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 13:24:27 -04:00
andreas 3e2357380b docs: add Gitea OAuth2 design spec
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 13:11:50 -04:00
andreas cc4a103bae scripts/c: add .gitignore for build outputs
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 12:24:29 -04:00
andreas 53fb10fdf5 scripts/c: remove committed binary
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 12:24:05 -04:00
andreas 2df2ad18c9 scripts/c: add single-file C port of hbc_mini
hbc_mini.c is a full port of scripts/hbc_mini.py requiring only zlib,
pthreads, and a C11 compiler. Supports Linux, FreeBSD, NetBSD, and
DragonFly BSD with platform-specific plugin backends:

  - cpu_monitor:    /proc/stat (Linux) or kern.cp_time sysctl (BSD)
  - memory_monitor: /proc/meminfo (Linux), vm.stats.vm.* (FreeBSD),
                    struct uvmexp (NetBSD)
  - network_monitor:/proc/net/dev (Linux) or getifaddrs()+if_data (BSD)
  - disk_monitor:   df -P (all platforms)
  - ping_monitor:   ping subprocess (all platforms)
  - nagios_runner:  shell commands with perfdata parsing (all platforms)
  - os_info:        uname() + /etc/os-release (Linux) or kern.osrelease (BSD)

Build: cc -O2 -o hbc_mini hbc_mini.c -lz -lpthread -lm

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 12:23:45 -04:00
andreas b81a0d2a6c plugins: persist owner chip in glance strip across JS updates
Store owner in data-owner attribute; updateHostHeader always prepends it
so it survives innerHTML replacement. Render it immediately on page load
before JS fetches plugin data.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 09:57:58 -04:00
andreas 1a19088cfe udp: resolve host owner from config, default_owner, or os_info on each PLG
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 09:50:42 -04:00
andreas 172f6e950f plugins: show host owner in glance strip for admin users
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 09:12:02 -04:00
andreas 4349ae217a version 5.2.4
Release / release (push) Successful in 5s
v5.2.4
2026-05-08 08:50:06 -04:00
andreas b3aa7b585f udp/config: fall back to default_owner when os_info has no owner; log debug
- When os_info arrives with no owner field, apply default_owner from server config
- Stop applying default_owner unconditionally in get_host_access (now deferred to os_info handling)
- os_info plugin logs debug message when injecting owner from client config

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 08:49:42 -04:00
andreas 88a3c09b51 hbc/server: request InfoPlugin refresh when host has no plugin data; update docs
- Server sets request_update=1 in ACK when host.plugin_data is empty
- hbc: AsyncConnection.request_info_event; handle_ack sets it on request_update
- hbc: _info_plugin_refresh_loop clears InfoPlugin caches and resends on demand
- hbc_mini: same via _request_info event and _info_refresh_loop
- docs/USERS.md: document client-declared owner config key
- docs/PLUGIN_DEVELOPMENT.md: document server-initiated InfoPlugin refresh

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 07:37:41 -04:00
andreas 0504402a8a hbc/hbc_mini: add owner config; include in os_info; server applies to host
- owner: optional top-level config key in ~/.hbc.yaml / ~/.hbc.json
- Propagated into plugin configs at load time so os_info can include it
- os_info PLG data carries owner field when set
- udp: sets host.owner from os_info if not already configured server-side
- live.html: format event log timestamps as YYYY-MM-DD HH:MM:SS (24-hour)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 07:25:47 -04:00
andreas ca58c18802 eventlog: store structured dicts; filter by user; clock: fix minute hand step
- eventlog() now stores {ts, host, level, service, message} dicts instead of strings
- WebSocket sends/broadcasts filter event log messages by the user's managed hosts
- live.html renders structured log entries with level-coloured spans
- Swiss railway clock minute hand now holds until second hand reaches 12, then steps

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 07:00:17 -04:00
andreas 1ddc4b8132 threshold/alerts: strip _status_code suffix from displayed metric names
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 06:19:16 -04:00
andreas 5e1720ed32 notify: use plain URL in Mattermost plugin metrics link
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-07 10:43:18 -04:00
andreas 77f127fe60 hbc/hbc_mini: consolidate startup log into single line
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-07 10:33:31 -04:00
andreas 54fbd8d73d version 5.2.3
Release / release (push) Successful in 5s
v5.2.3
2026-05-07 10:15:11 -04:00
andreas 7ab17e26e2 hbc/hbc_mini: log name and version at startup; ui: bump alert-metric font size
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-07 10:15:03 -04:00
andreas 28f5fa951c ui: show metric name inline with hostname in alerts and notifications
Alerts page: move metric name into the header row alongside hostname.
Notifications: include metric name in title (hostname  metric) and
strip the metric prefix from the body so it contains only value/detail.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-07 06:26:27 -04:00
andreas 37f1c58969 docs: remove dead warning/critical keys from ping_monitor config example
These fields were never read by the plugin; thresholds are configured
server-side. Also document the -b flag in README.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-07 06:12:15 -04:00
andreas f006077a71 send shutdown msg only if we sent a boot msg. Don't send eithe when restarting. 2026-05-06 11:57:43 -04:00
andreas d9fc8d632f send shutdown msg only if we sent a boot msg. Don't send eithe when restarting. 2026-05-06 11:54:09 -04:00
andreas f640574e4f version 5.2.2
Release / release (push) Successful in 5s
v5.2.2
2026-05-06 09:57:43 -04:00
andreas 9a19424279 fix: retry connection on network error instead of permanently dropping it
error_received() no longer sets _dead=True; it just closes the transport
so the existing retry loop in heartbeat_sender (hbc) and sendto (hbc_mini)
reopens the connection on the next interval. This allows hbc to recover
when it starts before network connectivity is established.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-06 09:57:32 -04:00
andreas ca8ba84e65 fix: silence aiohttp.access log and strip plugin prefix in alerts UI
- main: disable aiohttp.access propagation unless --debug is active
- alerts.html: strip plugin-name prefix from metric_path display
  (nagios_runner.check_disk_root_status_code → check_disk_root_status_code)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-06 07:39:55 -04:00
andreas f3d08d1c9e version 5.2.1
Release / release (push) Successful in 5s
v5.2.1
2026-05-06 07:07:01 -04:00
andreas 1e4263b793 fix: threshold and logging improvements
- threshold: fix crash when display is None (_format_display now falls
  back to default format string instead of calling None.format())
- threshold: shorten notification messages by stripping plugin-name prefix
  from metric_path (cpu_percent instead of cpu_monitor.cpu_percent)
- main: demote aiohttp.access log records from INFO to DEBUG
- udp: replace debug print with proper logger.info for new host sign-on

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-06 07:06:56 -04:00
andreas e931acb9f5 version 5.2.0
Release / release (push) Successful in 5s
v5.2.0
2026-05-05 13:47:46 -04:00
andreas 018409e71d docs: correct README inaccuracies found during code audit
- Add ping_monitor to built-in plugins list
- Update cpu_monitor (uptime) and memory_monitor (ZFS ARC) descriptions
- Replace "aggregated status" bullet with accurate per-check reporting note
- Fix RTT hysteresis default: 0.1 → 0.02
- Fix client YAML config: remove non-existent server:/port: keys, use hb_port:
- Fix nagios_runner commands format: plain strings → {name:, command:} dicts
- Fix Supported Metrics: exit_code → actual <name>_status_code/<name>_status/<name>_output fields

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-05 13:45:43 -04:00
andreas 1824f637b4 fix: always show THRESHOLD_DEFAULTS in Settings threshold config
Seed threshold_configs["default"] from THRESHOLD_DEFAULTS at the start
of _parse_config() so the Settings page displays built-in defaults
regardless of whether the server config uses the multi-config format,
the legacy thresholds: format, or has no threshold config at all.
_parse_multi_config() overwrites the seed with the fully-merged
effective defaults when a threshold_configs section is present.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-05 13:02:28 -04:00
andreas a534c06b26 feat: nagios operator for direct exit-code severity mapping
Add ComparisonOperator.NAGIOS ("nagios") that maps Nagios exit codes
directly to alert levels (0=OK 1=WARNING 2=CRITICAL 3=UNKNOWN) without
requiring numeric warning/critical thresholds. Hysteresis is bypassed for
discrete codes. Display template defaults to "{check_name}: {output}".
_format_display() handles None threshold_value gracefully.

Add nagios_runner.status_code as a built-in default threshold config so
nagios checks alert out of the box.

Also: fix alerts.html scrolling (override html,body), make hostname a link
to /plugins#<hostname>, remove overall_status/overall_status_code/plugin_count
from nagios_runner and hbc_mini, replace with computed worst-status in
plugins.html via nagiosWorstStatus() helper.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-05 12:26:56 -04:00
andreas d7b5c97a4e version 5.1.21
Release / release (push) Successful in 6s
v5.1.21
2026-05-05 11:05:48 -04:00
andreas ae447ac4a6 feat: nagios_runner improvements and alerts page fixes
- nagios_runner: remove overall_status/overall_status_code/plugin_count fields;
  each command still reports its own <name>_status and <name>_status_code
- threshold: expose {output} and {status} aliases in display templates for
  nagios_runner generic matches (mapped from <check_name>_output/status)
- alerts.html: fix scrolling by overriding html,body height/overflow (style.css
  sets both); make hostname a link to /plugins/<hostname>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-05 11:05:45 -04:00
andreas d44ce3d124 version 5.1.20
Release / release (push) Successful in 6s
v5.1.20
2026-05-05 10:48:24 -04:00
andreas b1985d0eb2 feat: generic threshold matching for nagios_runner with {check_name} display support
_find_threshold() now returns the stripped prefix ("check_name") alongside
the ThresholdConfig, enabling a single generic entry (e.g. nagios_runner.status_code)
to cover all per-command metrics (check_disk_root_status_code, check_load_status_code,
…). The prefix is threaded through to _format_display() as {check_name}, with
{metric_name} also available in display templates. purge_stale_alerts() updated
to use generic matching so it does not incorrectly drop alerts on generic-matched
metrics. README updated with Display Format Templates and Generic Threshold
Matching sections.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-05 10:48:17 -04:00
andreas de778f680f fix: reduce default hysteresis 10%→2%; show recovery threshold in alerts UI
The 10% default hysteresis created an unreasonably wide recovery band:
a 95% threshold would only clear once the value dropped below 85.5%,
causing alerts to linger long after the metric was well below the
trigger level.

Change default hysteresis to 2% across all threshold parsers (plugin
metrics, partitions, RTT). For a 95% threshold, recovery is now at
93.1% instead of 85.5%.

Add AlertState.hysteresis field (set on every check, cleared on OK) and
expose recovery_threshold in to_dict() so the Alerts dashboard can
display "recovers < 93.1" alongside the trigger threshold, making the
hysteresis band visible to the user. Pickle backward-compatible via
__setstate__.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-04 14:47:50 -04:00
andreas d7b368c7c6 version 5.1.19
Release / release (push) Successful in 5s
v5.1.19
2026-05-04 12:10:01 -04:00
andreas e790663f9f feat: exclude ZFS ARC from memory_percent; add uptime_seconds to cpu_monitor
memory_monitor / hbc_mini: ZFS ARC is reclaimable but not reflected in
MemAvailable by the Linux kernel (not in SReclaimable). Read ARC size
from /proc/spl/kstat/zfs/arcstats and add it to available memory before
computing memory_percent and memory_used. No-op on systems without ZFS.

cpu_monitor: report uptime_seconds via psutil.boot_time() (full client)
and /proc/uptime (hbc_mini).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-04 12:09:58 -04:00
andreas 475319e248 fix: send boot/shutdown on first open connection, not blindly first in list
Replace break-after-first-iteration with next(c for c in connections if
c.transport) so the message goes to the first connection that actually
has an open transport. Falls back to connections[0] if none are open
yet (sendto will attempt reopen), avoiding silent message loss when the
leading connection is still connecting.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-04 09:59:30 -04:00