Add hbd/server/oauth.py with OAuthError, _gitea_cfg(), and is_enabled()
to detect when all three required Gitea OAuth2 config keys are present.
Add "oauth": {} default to SERVER_DEFAULTS in config.py.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Store owner in data-owner attribute; updateHostHeader always prepends it
so it survives innerHTML replacement. Render it immediately on page load
before JS fetches plugin data.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- When os_info arrives with no owner field, apply default_owner from server config
- Stop applying default_owner unconditionally in get_host_access (now deferred to os_info handling)
- os_info plugin logs debug message when injecting owner from client config
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Server sets request_update=1 in ACK when host.plugin_data is empty
- hbc: AsyncConnection.request_info_event; handle_ack sets it on request_update
- hbc: _info_plugin_refresh_loop clears InfoPlugin caches and resends on demand
- hbc_mini: same via _request_info event and _info_refresh_loop
- docs/USERS.md: document client-declared owner config key
- docs/PLUGIN_DEVELOPMENT.md: document server-initiated InfoPlugin refresh
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- owner: optional top-level config key in ~/.hbc.yaml / ~/.hbc.json
- Propagated into plugin configs at load time so os_info can include it
- os_info PLG data carries owner field when set
- udp: sets host.owner from os_info if not already configured server-side
- live.html: format event log timestamps as YYYY-MM-DD HH:MM:SS (24-hour)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- eventlog() now stores {ts, host, level, service, message} dicts instead of strings
- WebSocket sends/broadcasts filter event log messages by the user's managed hosts
- live.html renders structured log entries with level-coloured spans
- Swiss railway clock minute hand now holds until second hand reaches 12, then steps
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Alerts page: move metric name into the header row alongside hostname.
Notifications: include metric name in title (hostname metric) and
strip the metric prefix from the body so it contains only value/detail.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- threshold: fix crash when display is None (_format_display now falls
back to default format string instead of calling None.format())
- threshold: shorten notification messages by stripping plugin-name prefix
from metric_path (cpu_percent instead of cpu_monitor.cpu_percent)
- main: demote aiohttp.access log records from INFO to DEBUG
- udp: replace debug print with proper logger.info for new host sign-on
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Seed threshold_configs["default"] from THRESHOLD_DEFAULTS at the start
of _parse_config() so the Settings page displays built-in defaults
regardless of whether the server config uses the multi-config format,
the legacy thresholds: format, or has no threshold config at all.
_parse_multi_config() overwrites the seed with the fully-merged
effective defaults when a threshold_configs section is present.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add ComparisonOperator.NAGIOS ("nagios") that maps Nagios exit codes
directly to alert levels (0=OK 1=WARNING 2=CRITICAL 3=UNKNOWN) without
requiring numeric warning/critical thresholds. Hysteresis is bypassed for
discrete codes. Display template defaults to "{check_name}: {output}".
_format_display() handles None threshold_value gracefully.
Add nagios_runner.status_code as a built-in default threshold config so
nagios checks alert out of the box.
Also: fix alerts.html scrolling (override html,body), make hostname a link
to /plugins#<hostname>, remove overall_status/overall_status_code/plugin_count
from nagios_runner and hbc_mini, replace with computed worst-status in
plugins.html via nagiosWorstStatus() helper.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- nagios_runner: remove overall_status/overall_status_code/plugin_count fields;
each command still reports its own <name>_status and <name>_status_code
- threshold: expose {output} and {status} aliases in display templates for
nagios_runner generic matches (mapped from <check_name>_output/status)
- alerts.html: fix scrolling by overriding html,body height/overflow (style.css
sets both); make hostname a link to /plugins/<hostname>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
_find_threshold() now returns the stripped prefix ("check_name") alongside
the ThresholdConfig, enabling a single generic entry (e.g. nagios_runner.status_code)
to cover all per-command metrics (check_disk_root_status_code, check_load_status_code,
…). The prefix is threaded through to _format_display() as {check_name}, with
{metric_name} also available in display templates. purge_stale_alerts() updated
to use generic matching so it does not incorrectly drop alerts on generic-matched
metrics. README updated with Display Format Templates and Generic Threshold
Matching sections.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The 10% default hysteresis created an unreasonably wide recovery band:
a 95% threshold would only clear once the value dropped below 85.5%,
causing alerts to linger long after the metric was well below the
trigger level.
Change default hysteresis to 2% across all threshold parsers (plugin
metrics, partitions, RTT). For a 95% threshold, recovery is now at
93.1% instead of 85.5%.
Add AlertState.hysteresis field (set on every check, cleared on OK) and
expose recovery_threshold in to_dict() so the Alerts dashboard can
display "recovers < 93.1" alongside the trigger threshold, making the
hysteresis band visible to the user. Pickle backward-compatible via
__setstate__.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Settings page: pass threshold_checker to http.start so the Threshold
Configurations section has data. Use threshold_checker's already-parsed
ThresholdConfig objects instead of re-parsing the raw nested YAML.
Named (non-default) configs now display only their explicit overrides
via threshold_raw_configs, not the full merged set with defaults.
hbc/hbc_mini: send boot and shutdown messages on first connection only
to avoid duplicate packets when multiple servers are configured.
Replace print("Daemonizing...") with logging.info so output goes to
syslog in daemon mode.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace href navigation with fetch() so the server response is captured
and displayed in a slide-up toast at the bottom of the page. Delete also
removes the host card from the DOM on success without a page reload.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Host Overview (plugins.html): show Update and Delete buttons in the
host-right zone when the logged-in user is the host owner (or admin /
unauthenticated mode). Buttons link to /u?h=<host> and /d?h=<host>
with stopPropagation so they don't toggle the accordion; Delete prompts
for confirmation first.
ThresholdChecker.purge_stale_alerts(): removes alert states whose
metric_path has no matching threshold in the current config. Called
after startup pickle restore and after every SIGHUP config reload so
alerts orphaned by upgrades or config changes do not persist
indefinitely.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Override the global style.css body height/overflow that locks all pages
to the viewport height (a remnant of the old drawer-menu layout).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Show a colour-coded pie chart (red=critical, yellow=warning, green=ok)
to the left of the clock in the nav bar. Backed by a new
GET /api/0/alert_summary endpoint that counts hosts per alert level
for the current user's visible hosts.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- threshold.py: add _find_threshold() with suffix fallback so thresholds
like ping_monitor.rtt_avg match ping_monitor.8_8_8_8_rtt_avg etc.;
each pinged host keeps its own alert state
- hbdclass.py: format RTT as integer ms (round())
- live.html: JS RTT display rounded to nearest ms (Math.round)
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Hostnames in the live dashboard table are now links to /plugins#hostname,
which expands and scrolls to that host's card in the Host Overview page.
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Only notify on worsening transitions (OK→WARNING, OK→CRITICAL,
WARNING→CRITICAL) and recovery (any→OK). De-escalation within alert
states no longer sends a duplicate notification since the metric never
recovered.
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Add renderZfsTables() to plugins.html with health/capacity/frag/dedup
table and cumulative I/O table; colour-code health and capacity thresholds;
add zfs_monitor to plugin_order and summary/render dispatch.
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
- watch: true (default) per host; watch: false suppresses all notifications
for that host in udp.py and threshold.py
- Live Dashboard and Host Overview now show only hosts where the logged-in
user is owner or manager (admins see all); WebSocket broadcasts filtered
per-connection by the same rule
- Add hbd/client/plugins/zfs_monitor.py: collects per-pool health, capacity,
fragmentation, dedup ratio, and cumulative I/O ops/bandwidth via zpool(8)
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
threshold_config in the hosts section now accepts a list of named
configs applied left-to-right on top of the defaults, so focused
override profiles can be mixed without duplication. Single-string
and legacy host_threshold_mapping forms are unchanged.
- Add threshold_raw_configs to store per-config overrides separately
- Normalise threshold_config to list on parse (string or list)
- get_thresholds_for_host folds the list over the default base
- Update README and docs/THRESHOLD_ALERTING.md with examples
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
- fix: matrix/sms_voipms notifications blocked the event loop on timeout;
make send_notification async, dispatch all channel drivers as non-blocking
tasks (asyncio.to_thread for sync drivers, asyncio.wait_for for async);
update all call sites to fire-and-forget via create_task
- feat: add /about page with version, runtime, uptime counter, and repo link
- fix: hbc_mini plugin data format now matches full hbc client so Host
Overview displays memory, disk, and network metrics correctly
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Server now sends a bare UPD command; client runs hb_install.sh to
reinstall from the package registry, then restarts. hb_install.sh
also copies itself alongside hbc on client installs.
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Replace pill-tab plugin view with an accordion layout that shows key
metrics (CPU%, MEM%, top disk%, net delta, nagios status) at a glance
in each host card header. Plugin sections expand as structured tables.
- Rename page to "Host Overview" (URL /plugins unchanged)
- Three-wave parallel data loading: glance plugins on host expand,
on-demand fetch for filesystem_info and extras
- Per-plugin table renderers with inline percent bars and threshold
colour coding
- Add escHtml() for XSS-safe rendering of all field values
- Remove stale planning docs (REFACTORING.md, hbd/Plan.md)
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Threshold alerts (plugin metrics, RTT) were firing immediately on the
first breach. Now every state transition to WARNING/CRITICAL starts a
grace-period timer (grace_seconds from the 'grace' config key). The
notification is deferred until the next heartbeat after grace_seconds
have elapsed. If the metric recovers within the grace window, both the
alert and the recovery are suppressed — no spurious pages for transient
spikes.
Two helper methods added to ThresholdChecker:
- _apply_grace: handles the state-change path (defer or suppress)
- _check_pending_or_renotify: handles the stable-alert path (fire
deferred notification once grace expires, or fall through to reminders)
The overdue case is unchanged — on_overdue already fires only after
interval+grace seconds of silence, which is equivalent behaviour.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
threshold.py was emitting level="RECOVERED" for metric recoveries, which
failed the is_recover check in send_notification (which only matched "RECOVER"),
bypassing _alerted_channels routing and the min_level bypass added in the
previous commit. Changed to "RECOVER" so all recovery paths are consistent.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- AlertState.update() now resets last_notification when the alert level
changes, so a WARNING→CRITICAL escalation restarts the reminder interval
rather than inheriting a nearly-expired timer.
- _dispatch_to_channel() bypasses min_level for RECOVER, so recovery
notifications are delivered even after a server restart when
_alerted_channels is empty and the fallback dispatch path is used.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>