Commit Graph

95 Commits

Author SHA1 Message Date
Andreas Wrede 1f5e7465a3 fix nav bar position 2026-05-01 04:32:04 -04:00
Andreas Wrede b6dcce4f35 simplify eventlog usage, fix arguments 2026-04-30 15:38:46 -04:00
Andreas Wrede c5ce41762e feat: update hbc via hb_install.sh instead of code patching
Server now sends a bare UPD command; client runs hb_install.sh to
reinstall from the package registry, then restarts. hb_install.sh
also copies itself alongside hbc on client installs.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-04-30 13:55:15 -04:00
Andreas Wrede ddf7067d13 feat: redesign Plugin Metrics page as Host Overview
Replace pill-tab plugin view with an accordion layout that shows key
metrics (CPU%, MEM%, top disk%, net delta, nagios status) at a glance
in each host card header. Plugin sections expand as structured tables.

- Rename page to "Host Overview" (URL /plugins unchanged)
- Three-wave parallel data loading: glance plugins on host expand,
  on-demand fetch for filesystem_info and extras
- Per-plugin table renderers with inline percent bars and threshold
  colour coding
- Add escHtml() for XSS-safe rendering of all field values
- Remove stale planning docs (REFACTORING.md, hbd/Plan.md)

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-04-30 08:12:07 -04:00
andreas 990c658e65 Apply grace period to all threshold alerts before logging/notifying
Threshold alerts (plugin metrics, RTT) were firing immediately on the
first breach. Now every state transition to WARNING/CRITICAL starts a
grace-period timer (grace_seconds from the 'grace' config key). The
notification is deferred until the next heartbeat after grace_seconds
have elapsed. If the metric recovers within the grace window, both the
alert and the recovery are suppressed — no spurious pages for transient
spikes.

Two helper methods added to ThresholdChecker:
- _apply_grace: handles the state-change path (defer or suppress)
- _check_pending_or_renotify: handles the stable-alert path (fire
  deferred notification once grace expires, or fall through to reminders)

The overdue case is unchanged — on_overdue already fires only after
interval+grace seconds of silence, which is equivalent behaviour.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 12:00:40 +02:00
andreas b78d6ac0fe Fix RECOVER routing: use consistent level name and route via alerted channel
threshold.py was emitting level="RECOVERED" for metric recoveries, which
failed the is_recover check in send_notification (which only matched "RECOVER"),
bypassing _alerted_channels routing and the min_level bypass added in the
previous commit. Changed to "RECOVER" so all recovery paths are consistent.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 11:29:04 +02:00
andreas afd5060f59 Fix early reminder notifications and lost recovery notifications
- AlertState.update() now resets last_notification when the alert level
  changes, so a WARNING→CRITICAL escalation restarts the reminder interval
  rather than inheriting a nearly-expired timer.
- _dispatch_to_channel() bypasses min_level for RECOVER, so recovery
  notifications are delivered even after a server restart when
  _alerted_channels is empty and the fallback dispatch path is used.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-22 18:11:22 +02:00
Andreas Wrede 5c382d2b8d One more nit 2026-04-13 09:31:35 -04:00
Andreas Wrede 35bba451f5 Various formating nits 2026-04-13 09:27:51 -04:00
Andreas Wrede 80edfba0c0 fix inconsistencies in page layout, add swiss clock 2026-04-13 08:45:50 -04:00
Andreas Wrede 6bc8de192e fix non-alerting of overdue hosts 2026-04-12 18:44:36 -04:00
Andreas Wrede d0c8c186f4 Fix typo 2026-04-12 13:04:17 -04:00
Andreas Wrede 19f7c8312e Mkae columns sortabel agian, check hbc version, provide modile html pages 2026-04-12 12:53:00 -04:00
Andreas Wrede 24b0e362fb provide cli function stop, restart and reload for hbd
Thought for 1s
2026-04-12 12:06:07 -04:00
Andreas Wrede 3a030548c0 Fix profile not updating 2026-04-12 11:57:12 -04:00
Andreas Wrede 094cb7ed9d Merge branch 'master' of git.wrede.ca:andreas/heartbeat 2026-04-12 11:23:28 -04:00
Andreas Wrede 0199ca4693 re-factor notifications, add sms and matrix as channels 2026-04-12 11:21:21 -04:00
Andreas Wrede 75344ebbbd re-factor notifications, add sms and matrix as channels 2026-04-12 11:04:00 -04:00
Andreas Wrede 7f049a4e26 accept websocket connection on http:.../ws 2026-04-12 06:44:32 -04:00
Andreas Wrede 6217f7a124 fix bogus notification on new clients 2026-04-10 13:39:18 -04:00
Andreas Wrede 2468386f24 adjust default log, pick and config locations. renotify on critical only, make user sessions persistem 2026-04-10 13:24:57 -04:00
Andreas Wrede 2015195112 Grace interval on restart of hbd, fix SIGHUP processing 2026-04-10 12:58:38 -04:00
Andreas Wrede 3426185383 Set SO_TIMESTAMP correctly for the various platforms 2026-04-10 11:19:47 -04:00
Andreas Wrede 9eedbafe97 Show overdue in alerts instead of null 2026-04-10 09:20:28 -04:00
Andreas Wrede a5f31c5cb5 update picked data strucures 2026-04-10 09:18:38 -04:00
Andreas Wrede 2f72cf0118 typo 2026-04-10 09:17:57 -04:00
Andreas Wrede ba27d2e300 Add count to rtt threshold 2026-04-10 08:07:50 -04:00
Andreas Wrede 381e37efce fix log-section height 2026-04-10 08:01:22 -04:00
Andreas Wrede 97dfc08f4d fix log level settiung 2026-04-10 08:00:51 -04:00
Andreas Wrede d281ac5a70 provide defaults for threshold_configs 2026-04-10 07:47:39 -04:00
andreas d77277857f Add user management and a settings page 2026-04-08 16:21:55 -04:00
Andreas Wrede 8421f472f2 there is only one __version__ 2026-04-07 11:00:22 -04:00
Andreas Wrede 51f9bdc2b5 use SO_TIMESTAMP, works on Linux, FreeBSD and macOS 2026-04-07 10:46:54 -04:00
andreas 02bc42fbf0 get rtt time differently 2026-04-07 10:40:12 -04:00
andreas 832a8b0bda save state to pickle file, restart timers on restart 2026-04-06 17:24:59 -04:00
Andreas Wrede 73aa89f8f4 fix web page issues 2026-04-04 12:43:30 -04:00
Andreas Wrede 941f3ea4b0 display and acknowledge alerts 2026-04-03 06:35:45 -04:00
Andreas Wrede c5770006f7 hbc proper termination, hbd config reloadable 2026-04-02 07:17:00 -04:00
Andreas Wrede 84c1aef51f removal of cver 2026-04-01 20:47:29 -04:00
Andreas Wrede 460d2be9e9 Fix rtt, including bug in time compute 2026-04-01 19:41:53 -04:00
Andreas Wrede 090d341244 per-client threshold config 2026-04-01 15:22:42 -04:00
Andreas Wrede 079e84f729 display tag fro alterts, cleanup udp 2026-04-01 11:49:55 -04:00
Andreas Wrede dd23d9d163 refactor monitor, add threshold rtesting 2026-03-31 12:22:03 -04:00
Andreas Wrede ad7178ebcb Move threshhold to server, move eventlog to notify 2026-03-29 20:29:33 -04:00
Andreas Wrede 0543266c92 Major refactoring of the codebase, including restructuring of files and directories, renaming of modules and classes, and improvements to the overall organization and readability of the code. This refactoring aims to enhance maintainability, scalability, and clarity of the codebase while preserving existing functionality. The changes include:
- Restructuring of the project directory into client and server components
- Renaming of modules and classes to better reflect their purpose and functionality
- Moving common utilities and configurations to a shared location
- Updating import statements to reflect the new structure
- Adding new documentation files for better clarity on various aspects of the project
- Removing deprecated or unused code to streamline the codebase
- Ensuring that all existing functionality is preserved and that the codebase remains functional after the refactoring.
2026-03-29 11:13:40 -04:00