Compare commits
110 Commits
| SHA1 | Author | Date | |
|---|---|---|---|
| 40205bf5c7 | |||
| b95f1a5bb7 | |||
| 12f7eb722b | |||
| 217bba1b76 | |||
| 967e05ed74 | |||
| c20245b0ab | |||
| b9db0c552e | |||
| 05045bafa2 | |||
| 39f1b5de30 | |||
| b06de6fdd3 | |||
| 940d0af35e | |||
| d6d31aa2e3 | |||
| 76edfe7577 | |||
| d190029728 | |||
| b8307e7a9d | |||
| a2fdf091f5 | |||
| 1914e6f28e | |||
| 82cbce9615 | |||
| dbb779b013 | |||
| ca908ee967 | |||
| 73c697b6c5 | |||
| 3e2357380b | |||
| cc4a103bae | |||
| 53fb10fdf5 | |||
| 2df2ad18c9 | |||
| b81a0d2a6c | |||
| 1a19088cfe | |||
| 172f6e950f | |||
| 4349ae217a | |||
| b3aa7b585f | |||
| 88a3c09b51 | |||
| 0504402a8a | |||
| ca58c18802 | |||
| 1ddc4b8132 | |||
| 5e1720ed32 | |||
| 77f127fe60 | |||
| 54fbd8d73d | |||
| 7ab17e26e2 | |||
| 28f5fa951c | |||
| 37f1c58969 | |||
| f006077a71 | |||
| d9fc8d632f | |||
| f640574e4f | |||
| 9a19424279 | |||
| ca8ba84e65 | |||
| f3d08d1c9e | |||
| 1e4263b793 | |||
| e931acb9f5 | |||
| 018409e71d | |||
| 1824f637b4 | |||
| a534c06b26 | |||
| d7b5c97a4e | |||
| ae447ac4a6 | |||
| d44ce3d124 | |||
| b1985d0eb2 | |||
| de778f680f | |||
| d7b368c7c6 | |||
| e790663f9f | |||
| 475319e248 | |||
| ca5ef384a8 | |||
| c93dbdc0f4 | |||
| 3a546a1e5c | |||
| 74c89d098c | |||
| 3301dbfe34 | |||
| d00d903e7d | |||
| babb5d61aa | |||
| 11d1c718b3 | |||
| a99b6b54c7 | |||
| 8da3d550eb | |||
| a76d0fc840 | |||
| 94cbb31c48 | |||
| ae60844a8a | |||
| 49fa310361 | |||
| 28e2180f7b | |||
| ce0590f015 | |||
| f50acca509 | |||
| 72fc82b91f | |||
| 46f8c32c0b | |||
| 691f62aa69 | |||
| cffc9805f9 | |||
| 917d6a401b | |||
| 2bd3a9beb6 | |||
| 5523c60866 | |||
| ab37ac7194 | |||
| f811a19d80 | |||
| 6239825f43 | |||
| b56245bb23 | |||
| 331c4e804d | |||
| 9fd945a481 | |||
| 26df08eeff | |||
| 5819dd6b25 | |||
| 6fb67f8615 | |||
| e70ae6f176 | |||
| a77f6d380c | |||
| 6aae2a1dab | |||
| 85ee0e1040 | |||
| c4f09e9ced | |||
| 64710fd4cd | |||
| 1f5e7465a3 | |||
| b290b21e23 | |||
| 65c4267847 | |||
| 462a445235 | |||
| 368e178f93 | |||
| 6905bf266a | |||
| b6dcce4f35 | |||
| e6436fc236 | |||
| c5ce41762e | |||
| 26ca0c095f | |||
| 1eecd67594 | |||
| caf3c2c0ac |
@@ -27,6 +27,7 @@ A lightweight daemon that listens for UDP heartbeat messages and acts on them: k
|
|||||||
- Configurable retention and backup management
|
- Configurable retention and backup management
|
||||||
- **Plugin system for extensible monitoring** ✅
|
- **Plugin system for extensible monitoring** ✅
|
||||||
- Collect system metrics (CPU, memory, disk, network)
|
- Collect system metrics (CPU, memory, disk, network)
|
||||||
|
- Monitor ZFS pool health, capacity, and I/O via `zpool(8)`
|
||||||
- Execute existing Nagios monitoring plugins
|
- Execute existing Nagios monitoring plugins
|
||||||
- Create custom plugins with simple Python classes
|
- Create custom plugins with simple Python classes
|
||||||
- **Threshold alerting system** ✅
|
- **Threshold alerting system** ✅
|
||||||
@@ -34,6 +35,8 @@ A lightweight daemon that listens for UDP heartbeat messages and acts on them: k
|
|||||||
- Hysteresis to prevent alert flapping
|
- Hysteresis to prevent alert flapping
|
||||||
- Automatic notifications on state changes
|
- Automatic notifications on state changes
|
||||||
- Re-notification for ongoing alerts
|
- Re-notification for ongoing alerts
|
||||||
|
- **Per-host watch flag** — set `watch: false` on any host to silence all notifications for that host without removing its configuration ✅
|
||||||
|
- **Role-filtered dashboards** — Live Dashboard and Host Overview show only hosts where the logged-in user is owner or manager (admins see all) ✅
|
||||||
- Modular codebase suitable for unit testing and CI ✅
|
- Modular codebase suitable for unit testing and CI ✅
|
||||||
|
|
||||||
---
|
---
|
||||||
@@ -55,21 +58,26 @@ Heartbeat includes a comprehensive plugin architecture that extends monitoring b
|
|||||||
### Built-in Plugins
|
### Built-in Plugins
|
||||||
|
|
||||||
- `os_info`: Collects OS, kernel, distribution, and architecture information
|
- `os_info`: Collects OS, kernel, distribution, and architecture information
|
||||||
- `cpu_monitor`: Monitors CPU usage, load average, frequency, and process counts
|
- `cpu_monitor`: Monitors CPU usage, load average, frequency, process counts, and uptime
|
||||||
- `memory_monitor`: Monitors RAM and swap usage, available memory
|
- `memory_monitor`: Monitors RAM and swap usage, available memory (ZFS ARC-aware)
|
||||||
- `disk_monitor`: Monitors disk usage, I/O statistics, and filesystem metrics
|
- `disk_monitor`: Monitors disk usage, I/O statistics, and filesystem metrics
|
||||||
- `network_monitor`: Monitors network interface statistics, bandwidth, and connections
|
- `network_monitor`: Monitors network interface statistics, bandwidth, and connections
|
||||||
|
- `ping_monitor`: Measures round-trip latency to configured hosts
|
||||||
- `filesystem_info`: Collects mounted filesystem information (physical filesystems only by default)
|
- `filesystem_info`: Collects mounted filesystem information (physical filesystems only by default)
|
||||||
- `nagios_runner`: Executes Nagios monitoring plugins (check_disk, check_load, check_http, etc.)
|
- `nagios_runner`: Executes Nagios monitoring plugins (check_disk, check_load, check_http, etc.)
|
||||||
|
- `zfs_monitor`: Monitors ZFS pool health, capacity, fragmentation, dedup ratio, and cumulative I/O via `zpool(8)`
|
||||||
|
|
||||||
### Nagios Integration
|
### Nagios Integration
|
||||||
|
|
||||||
The `nagios_runner` plugin integrates with the existing Nagios plugin ecosystem. You can run any Nagios-compatible plugin and have its results parsed and stored automatically:
|
The `nagios_runner` plugin integrates with the existing Nagios plugin ecosystem. You can run any Nagios-compatible plugin and have its results parsed and stored automatically:
|
||||||
|
|
||||||
- Executes plugins via subprocess with timeout protection
|
- Executes plugins asynchronously (non-blocking) with timeout protection
|
||||||
|
- Captures both stdout and stderr; if stdout is empty, stderr is used as the status message
|
||||||
|
- Handles signal-killed processes (negative exit code → UNKNOWN status)
|
||||||
|
- Validates absolute command paths at startup and warns on missing or non-executable files
|
||||||
- Parses exit codes (OK/WARNING/CRITICAL/UNKNOWN)
|
- Parses exit codes (OK/WARNING/CRITICAL/UNKNOWN)
|
||||||
- Extracts performance data with thresholds
|
- Extracts performance data with thresholds
|
||||||
- Reports aggregated status across all configured checks
|
- Reports per-check status, exit code, and output; no aggregate rollup field
|
||||||
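The behaviour in the list above can be sketched roughly as follows. This is a simplified, synchronous illustration and not the plugin's actual code (the real plugin is described as asynchronous); `run_check` and the keys of the returned dict are hypothetical names chosen for this example.

```python
import shlex
import subprocess

# Standard Nagios plugin exit codes.
STATUS_NAMES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def run_check(command: str, timeout: float = 10.0) -> dict:
    """Run one Nagios-style check and return its status code, status name, and output."""
    try:
        proc = subprocess.run(
            shlex.split(command),
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return {"status_code": 3, "status": "UNKNOWN", "output": "check timed out"}
    except OSError as exc:  # missing or non-executable file
        return {"status_code": 3, "status": "UNKNOWN", "output": str(exc)}

    # A negative return code means the process was killed by a signal;
    # it, like any code outside 0..3, is reported as UNKNOWN here.
    code = proc.returncode if proc.returncode in STATUS_NAMES else 3
    # Prefer stdout; fall back to stderr when stdout is empty.
    output = proc.stdout.strip() or proc.stderr.strip()
    return {"status_code": code, "status": STATUS_NAMES[code], "output": output}
```

For example, `run_check("/usr/lib/nagios/plugins/check_load -w 5,4,3 -c 10,8,6")` would return one dict for that check; no aggregate across checks is computed.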
|
|
||||||
See [docs/NAGIOS_INTEGRATION.md](docs/NAGIOS_INTEGRATION.md) for the complete integration guide, including configuration examples and custom plugin development.
|
See [docs/NAGIOS_INTEGRATION.md](docs/NAGIOS_INTEGRATION.md) for the complete integration guide, including configuration examples and custom plugin development.
|
||||||
|
|
||||||
@@ -147,9 +155,11 @@ Heartbeat includes a sophisticated threshold alerting system that monitors plugi
|
|||||||
- **Multi-level alerts**: WARNING and CRITICAL severity levels
|
- **Multi-level alerts**: WARNING and CRITICAL severity levels
|
||||||
- **Flexible operators**: Support for >, >=, <, <=, ==, != comparisons
|
- **Flexible operators**: Support for >, >=, <, <=, ==, != comparisons
|
||||||
- **Hysteresis**: Prevents alert flapping with configurable recovery thresholds
|
- **Hysteresis**: Prevents alert flapping with configurable recovery thresholds
|
||||||
- **Smart notifications**: Alerts only on state changes, not every check
|
- **Smart notifications**: Alerts only on state changes, not every check; de-escalations (e.g. CRITICAL → WARNING) do not generate a notification
|
||||||
- **Re-notifications**: Periodic reminders for ongoing alerts
|
- **Re-notifications**: Periodic reminders for ongoing alerts
|
||||||
|
- **Short-duration suppression**: Recovery notifications are suppressed for down events under 4 seconds (avoids noise from transient blips)
|
||||||
- **Journal integration**: All threshold events logged for audit trail
|
- **Journal integration**: All threshold events logged for audit trail
|
||||||
|
- **`ping_monitor` thresholds**: Latency and packet-loss thresholds use the same format as all other plugin metrics
|
||||||
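The notification rules in the list above can be summarised in a small decision function. This is a minimal sketch, not the server's code: `should_notify` and its parameters are hypothetical, it assumes the 4-second rule applies to the time spent in the previous non-OK state, and it leaves periodic re-notification to a separate timer.

```python
# Severity ranking used to tell escalations from de-escalations.
SEVERITY = {"OK": 0, "WARNING": 1, "CRITICAL": 2}

# Recoveries from events shorter than this are not announced (value from the list above).
MIN_DOWN_SECONDS = 4.0


def should_notify(previous: str, current: str, seconds_in_previous: float) -> bool:
    """Decide whether a transition from `previous` to `current` sends a notification."""
    if current == previous:
        # No state change; re-notifications are assumed to be handled elsewhere.
        return False
    if current == "OK":
        # Recovery: suppressed when the preceding event was a transient blip.
        return seconds_in_previous >= MIN_DOWN_SECONDS
    # Escalations notify; de-escalations such as CRITICAL -> WARNING do not.
    return SEVERITY[current] > SEVERITY[previous]
```

With these rules, `should_notify("CRITICAL", "WARNING", 120.0)` is `False`, while `should_notify("WARNING", "CRITICAL", 120.0)` is `True`.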
|
|
||||||
### Configuration
|
### Configuration
|
||||||
|
|
||||||
@@ -172,7 +182,8 @@ thresholds:
|
|||||||
warning: 80.0 # Warn when CPU > 80%
|
warning: 80.0 # Warn when CPU > 80%
|
||||||
critical: 90.0 # Critical when CPU > 90%
|
critical: 90.0 # Critical when CPU > 90%
|
||||||
operator: ">"
|
operator: ">"
|
||||||
hysteresis: 0.1 # 10% hysteresis to prevent flapping
|
hysteresis: 0.02 # 2% hysteresis to prevent flapping
|
||||||
|
display: "(threshold: {op_symbol} {threshold_value}%)" # optional
|
||||||
|
|
||||||
memory_monitor:
|
memory_monitor:
|
||||||
percent:
|
percent:
|
||||||
@@ -214,7 +225,7 @@ thresholds:
|
|||||||
<hostname>:
|
<hostname>:
|
||||||
warning: <milliseconds> # Warn when RTT > this value
|
warning: <milliseconds> # Warn when RTT > this value
|
||||||
critical: <milliseconds> # Critical when RTT > this value
|
critical: <milliseconds> # Critical when RTT > this value
|
||||||
hysteresis: 0.1 # Optional: 10% hysteresis (default)
|
hysteresis: 0.02 # Optional: 2% hysteresis (default)
|
||||||
```
|
```
|
||||||
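One plausible reading of the `hysteresis` value is a relative band below the threshold that the metric must clear before the alert recovers. The sketch below works under that assumption for the `>` operator only; the exact formula is not given here, and `is_alerting` is a hypothetical helper.

```python
def is_alerting(value: float, threshold: float, hysteresis: float, was_alerting: bool) -> bool:
    """Evaluate a '>' threshold with a relative hysteresis band (assumed semantics)."""
    if was_alerting:
        # Stay in alert until the value drops to the recovery level or below.
        recovery_level = threshold * (1.0 - hysteresis)
        return value > recovery_level
    # Not yet alerting: the plain threshold applies.
    return value > threshold
```

Under this assumption, a warning threshold of 80 with `hysteresis: 0.02` triggers above 80 and clears only once the value falls to 78.4 or lower, so a metric hovering around 80 does not flap.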
|
|
||||||
**Example alerts:**
|
**Example alerts:**
|
||||||
@@ -265,7 +276,94 @@ All plugin metrics can be thresholded:
|
|||||||
- **Memory**: percent, available_mb, swap_percent
|
- **Memory**: percent, available_mb, swap_percent
|
||||||
- **Disk**: Per-partition percent, free_gb, free_mb
|
- **Disk**: Per-partition percent, free_gb, free_mb
|
||||||
- **Network**: errors_total, dropped packets, connection counts
|
- **Network**: errors_total, dropped packets, connection counts
|
||||||
- **Nagios**: exit_code mapping (0=OK, 1=WARNING, 2=CRITICAL)
|
- **Nagios**: Any field emitted by `nagios_runner` (`<name>_status_code`, `<name>_status`, `<name>_output`, performance data fields)
|
||||||
|
|
||||||
|
### Display Format Templates
|
||||||
|
|
||||||
|
Each threshold entry accepts an optional `display` field — a Python format string shown in notifications and on the Alerts dashboard:
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
nagios_runner:
|
||||||
|
status_code:
|
||||||
|
warning: 1
|
||||||
|
critical: 2
|
||||||
|
operator: ">="
|
||||||
|
display: "{check_name}: exit {value} (expected < {threshold_value})"
|
||||||
|
```
|
||||||
|
|
||||||
|
Available variables:
|
||||||
|
|
||||||
|
| Variable | Description |
|
||||||
|
|---|---|
|
||||||
|
| `{value}` | Current metric value |
|
||||||
|
| `{threshold_value}` | Threshold that was crossed |
|
||||||
|
| `{op_symbol}` | Comparison operator (`>`, `<`, `>=`, …); `"nagios"` for the nagios operator |
|
||||||
|
| `{check_name}` | Prefix stripped by generic matching (see below) |
|
||||||
|
| `{metric_name}` | Full field name within the plugin data |
|
||||||
|
| `{output}` | For `nagios_runner` generic matches: the matched check's status text (alias for `{check_name}_output`) |
|
||||||
|
| `{status}` | For `nagios_runner` generic matches: the matched check's status name — OK/WARNING/CRITICAL/UNKNOWN (alias for `{check_name}_status`) |
|
||||||
|
| any plugin field | Any other field present in the plugin's data |
|
||||||
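Because `display` is a Python format string, rendering it amounts to a `str.format_map` call over the variables in the table. The helper below is an illustrative sketch; `render_display` is a hypothetical name, and how the server treats a template that references a missing field is not specified here (this version raises `KeyError`).

```python
def render_display(template: str, plugin_data: dict, **context) -> str:
    """Fill a display template from plugin fields plus per-alert variables."""
    fields = dict(plugin_data)
    # Per-alert variables (value, threshold_value, op_symbol, ...) take
    # precedence over plugin fields with the same name.
    fields.update(context)
    return template.format_map(fields)
```

For instance, `render_display("(threshold: {op_symbol} {threshold_value}%)", {}, op_symbol=">", threshold_value=80.0)` yields `(threshold: > 80.0%)`.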
|
|
||||||
|
### Generic Threshold Matching
|
||||||
|
|
||||||
|
When a metric name has no exact threshold entry, the server progressively strips leading underscore-separated segments and re-tries the lookup. This lets a single generic entry cover an entire family of metrics.
|
||||||
|
|
||||||
|
The classic use case is `nagios_runner`, which names each metric after the command that produced it:
|
||||||
|
|
||||||
|
```
|
||||||
|
nagios_runner.check_disk_root_status_code → no exact match
|
||||||
|
nagios_runner.disk_root_status_code → no match
|
||||||
|
nagios_runner.root_status_code → no match
|
||||||
|
nagios_runner.status_code → matched ✓
|
||||||
|
```
|
||||||
|
|
||||||
|
Configure the generic threshold once using the `nagios` operator, which maps exit codes directly to alert severity without requiring numeric warning/critical values:
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
nagios_runner:
|
||||||
|
status_code:
|
||||||
|
operator: "nagios" # 0=OK 1=WARNING 2=CRITICAL 3=UNKNOWN
|
||||||
|
display: "{check_name}: {output}"
|
||||||
|
```
|
||||||
|
|
||||||
|
The stripped prefix (`check_disk_root` in the example above) is available as `{check_name}` in the display template, so you can identify which check triggered the alert without writing a separate threshold entry per command.
|
||||||
|
|
||||||
|
Exact matches always take priority. A generic entry only applies when no specific one is defined.
|
||||||
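The lookup described in this section can be sketched in a few lines. This is an illustration of the stripping rule, not the server's implementation; `find_threshold` is a hypothetical name, and `plugin_thresholds` stands for the threshold entries configured for one plugin.

```python
from typing import Optional, Tuple


def find_threshold(plugin_thresholds: dict, metric_name: str) -> Optional[Tuple[dict, str]]:
    """Return (threshold_entry, check_name), or None when nothing matches.

    The exact name is tried first; then leading underscore-separated
    segments are stripped one at a time and the lookup is retried.
    """
    segments = metric_name.split("_")
    for i in range(len(segments)):
        candidate = "_".join(segments[i:])
        if candidate in plugin_thresholds:
            # The stripped prefix becomes {check_name}; it is empty for an exact match.
            check_name = "_".join(segments[:i])
            return plugin_thresholds[candidate], check_name
    return None
```

Calling it with `{"status_code": {"operator": "nagios"}}` and `"check_disk_root_status_code"` walks the same four candidates shown above and returns the generic entry together with `"check_disk_root"`.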
|
|
||||||
|
### Per-Host Threshold Profiles
|
||||||
|
|
||||||
|
Named threshold configurations let different hosts use different limits. A host's `threshold_config` can be a single name or a **list** — lists are applied left-to-right so profiles compose without duplication:
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
threshold_configs:
|
||||||
|
default:
|
||||||
|
thresholds:
|
||||||
|
cpu_monitor:
|
||||||
|
cpu_percent: {warning: 80, critical: 90}
|
||||||
|
memory_monitor:
|
||||||
|
memory_percent: {warning: 85, critical: 95}
|
||||||
|
|
||||||
|
tight_cpu: # override CPU limits only
|
||||||
|
thresholds:
|
||||||
|
cpu_monitor:
|
||||||
|
cpu_percent: {warning: 60, critical: 75}
|
||||||
|
|
||||||
|
db_disk: # add a database partition check
|
||||||
|
thresholds:
|
||||||
|
disk_monitor:
|
||||||
|
partitions:
|
||||||
|
/var/lib/postgresql:
|
||||||
|
percent: {warning: 75, critical: 88}
|
||||||
|
|
||||||
|
hosts:
|
||||||
|
web-01:
|
||||||
|
threshold_config: default # single profile
|
||||||
|
|
||||||
|
db-01:
|
||||||
|
threshold_config: [tight_cpu, db_disk] # layered: CPU override + extra disk check
|
||||||
|
```
|
||||||
|
|
||||||
|
Each named config's overrides are applied in order on top of the defaults. Metrics not mentioned in a profile are inherited unchanged.
|
||||||
|
|
||||||
See [docs/THRESHOLD_ALERTING.md](docs/THRESHOLD_ALERTING.md) for comprehensive documentation including best practices, troubleshooting, and advanced configuration.
|
See [docs/THRESHOLD_ALERTING.md](docs/THRESHOLD_ALERTING.md) for comprehensive documentation including best practices, troubleshooting, and advanced configuration.
|
||||||
|
|
||||||
@@ -328,9 +426,10 @@ Heartbeat includes a built-in HTTP/WebSocket server that provides both a REST AP
|
|||||||
### Web Dashboards
|
### Web Dashboards
|
||||||
|
|
||||||
- **Login** (`/login`): Browser login form (shown automatically when auth is configured)
|
- **Login** (`/login`): Browser login form (shown automatically when auth is configured)
|
||||||
- **Live View** (`/live`): Real-time host connectivity, latency, and messages
|
- **Live View** (`/live`): Real-time host connectivity, latency, and messages; hostnames link directly to the Host Overview page
|
||||||
- **Plugin Metrics** (`/plugins`): Browse and visualize metrics from all plugins
|
- **Host Overview** (`/plugins/<host>`): Per-host plugin metrics with ZFS pool visualization; filtered to hosts where the logged-in user is owner or manager (admins see all)
|
||||||
- **Alerts Dashboard** (`/alerts`): Monitor active alerts with severity filtering
|
- **Alerts Dashboard** (`/alerts`): Monitor active alerts with severity filtering; alert count pie chart shown in the navigation bar
|
||||||
|
- **Settings** (`/settings`): Server configuration, user management, and threshold configuration viewer
|
||||||
|
|
||||||
### API Endpoints
|
### API Endpoints
|
||||||
|
|
||||||
@@ -377,7 +476,7 @@ This project now declares its dependencies in `pyproject.toml`. Instead
|
|||||||
of the old `requirements.txt` flow, install the package into a virtualenv
|
of the old `requirements.txt` flow, install the package into a virtualenv
|
||||||
using `pip`:
|
using `pip`:
|
||||||
|
|
||||||
See `scripts/install.sh` for a way to install.
|
See `scripts/hb_install.sh` for a scripted installation.
|
||||||
|
|
||||||
Run the daemon (example):
|
Run the daemon (example):
|
||||||
|
|
||||||
@@ -408,6 +507,9 @@ hbc --boot your-server.example.com
|
|||||||
|
|
||||||
# Verbose output
|
# Verbose output
|
||||||
hbc -v your-server.example.com
|
hbc -v your-server.example.com
|
||||||
|
|
||||||
|
# Send 'boot' and 'shutdown' messages on start and exit
|
||||||
|
hbc -b your-server.example.com
|
||||||
```
|
```
|
||||||
|
|
||||||
You can also run it via the module entrypoint:
|
You can also run it via the module entrypoint:
|
||||||
@@ -416,12 +518,11 @@ You can also run it via the module entrypoint:
|
|||||||
python -m hbd.client.main your-server.example.com
|
python -m hbd.client.main your-server.example.com
|
||||||
```
|
```
|
||||||
|
|
||||||
Client configuration can also be specified in YAML:
|
Client configuration can also be specified in YAML (`~/.hbc.yaml`):
|
||||||
|
|
||||||
```yaml
|
```yaml
|
||||||
server: hbd.example.com
|
hb_port: 50003 # Server port (default: 50003)
|
||||||
port: 50003
|
interval: 30 # Heartbeat interval in seconds
|
||||||
interval: 30
|
|
||||||
plugins:
|
plugins:
|
||||||
cpu_monitor:
|
cpu_monitor:
|
||||||
interval: 300 # Check every 5 minutes (default)
|
interval: 300 # Check every 5 minutes (default)
|
||||||
@@ -435,12 +536,84 @@ plugins:
|
|||||||
nagios_runner:
|
nagios_runner:
|
||||||
interval: 300 # Check every 5 minutes (default)
|
interval: 300 # Check every 5 minutes (default)
|
||||||
commands:
|
commands:
|
||||||
- /usr/lib/nagios/plugins/check_load -w 5,4,3 -c 10,8,6
|
- name: check_load
|
||||||
- /usr/lib/nagios/plugins/check_disk -w 20% -c 10% -p /
|
command: /usr/lib/nagios/plugins/check_load -w 5,4,3 -c 10,8,6
|
||||||
|
- name: check_disk
|
||||||
|
command: /usr/lib/nagios/plugins/check_disk -w 20% -c 10% -p /
|
||||||
```
|
```
|
||||||
|
|
||||||
|
The server hostname is always passed as a positional command-line argument; there is no `server:` config key.
|
||||||
|
|
||||||
All monitoring plugins default to 5-minute (300 second) intervals, but can be customized as needed.
|
All monitoring plugins default to 5-minute (300 second) intervals, but can be customized as needed.
|
||||||
|
|
||||||
|
**Connection retry:** If a server is temporarily unreachable, `hbc` retries `open()` indefinitely on every heartbeat interval. IPv6 connections that never succeeded during early startup are dropped after 3 consecutive failures (to handle hosts without IPv6 routing), while IPv4 connections always retry.
|
||||||
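The retry policy can be pictured as a small per-connection state tracker. The sketch below is an assumption-laden illustration: `Connection` and `record` are hypothetical names, and "early startup" is simplified to "has never connected successfully".

```python
import socket
from dataclasses import dataclass

# An IPv6 connection that has never succeeded is dropped after this many failures in a row.
MAX_EARLY_IPV6_FAILURES = 3


@dataclass
class Connection:
    family: int  # socket.AF_INET or socket.AF_INET6
    ever_connected: bool = False
    consecutive_failures: int = 0

    def record(self, success: bool) -> bool:
        """Record one open() attempt; return False when the connection should be dropped."""
        if success:
            self.ever_connected = True
            self.consecutive_failures = 0
            return True
        self.consecutive_failures += 1
        if (
            self.family == socket.AF_INET6
            and not self.ever_connected
            and self.consecutive_failures >= MAX_EARLY_IPV6_FAILURES
        ):
            return False  # host probably has no IPv6 routing
        return True  # keep retrying on the next heartbeat interval
```

An IPv4 connection, or an IPv6 connection that has succeeded at least once, never returns `False` and is therefore retried indefinitely.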
|
|
||||||
|
**Daemon logging:** When running with `-d`, `hbc` routes all log output to syslog (`LOG_DAEMON` facility) after daemonizing. Without `-d`, logs go to stderr as usual.
|
||||||
|
|
||||||
|
### hbc_mini — single-file client (no external dependencies)
|
||||||
|
|
||||||
|
`scripts/hbc_mini.py` is a self-contained version of the heartbeat client that requires only Python 3.8+ and no external packages. Copy it to any host and run it directly — no virtualenv, no `pip install`.
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Basic usage
|
||||||
|
python3 hbc_mini.py your-server.example.com
|
||||||
|
|
||||||
|
# Run as daemon
|
||||||
|
python3 hbc_mini.py -d your-server.example.com
|
||||||
|
|
||||||
|
# Send a boot message
|
||||||
|
python3 hbc_mini.py -b your-server.example.com
|
||||||
|
|
||||||
|
# Send a one-off message
|
||||||
|
python3 hbc_mini.py -m "maintenance starting" your-server.example.com
|
||||||
|
```
|
||||||
|
|
||||||
|
**Config:** `~/.hbc.json` (same keys as `~/.hbc.yaml`, JSON format). Example:
|
||||||
|
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"hb_port": 50003,
|
||||||
|
"interval": 30,
|
||||||
|
"plugins": {
|
||||||
|
"ping_monitor": {
|
||||||
|
"interval": 60,
|
||||||
|
"hosts": ["8.8.8.8", "192.168.1.1"]
|
||||||
|
},
|
||||||
|
"nagios_runner": {
|
||||||
|
"interval": 300,
|
||||||
|
"commands": [
|
||||||
|
{"name": "check_load", "command": "/usr/lib/nagios/plugins/check_load -w 5,4,3 -c 10,8,6"}
|
||||||
|
]
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
**Plugin availability:**
|
||||||
|
|
||||||
|
| Plugin | Platform | Data source |
|
||||||
|
|---|---|---|
|
||||||
|
| `os_info` | all | `platform` stdlib |
|
||||||
|
| `ping_monitor` | all | `ping` subprocess |
|
||||||
|
| `nagios_runner` | all (not Windows) | subprocess |
|
||||||
|
| `cpu_monitor` | Linux | `/proc/stat` |
|
||||||
|
| `memory_monitor` | Linux | `/proc/meminfo` |
|
||||||
|
| `disk_monitor` | Linux, macOS, BSD | `df -P` subprocess |
|
||||||
|
| `network_monitor` | Linux | `/proc/net/dev` |
|
||||||
|
|
||||||
|
**What is not available compared to the full `hbc`:**
|
||||||
|
|
||||||
|
- No YAML config (use JSON instead)
|
||||||
|
- No `filesystem_info` plugin
|
||||||
|
- No `zfs_monitor` plugin (requires `zpool(8)` and the full plugin loader)
|
||||||
|
- `cpu_monitor` does not report per-core usage or CPU frequency (no psutil)
|
||||||
|
- Plugins cannot be loaded from external `.py` files — all plugins are compiled in
|
||||||
|
- No IPv6 early-fail protection — connections that fail to open at startup are silently skipped rather than retried
|
||||||
|
|
||||||
|
Everything else (heartbeat protocol, ACK/CMD/UPD handling, `hb_install.sh`-based self-update, daemonization, syslog logging) is identical to the full client.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
## 🐞 Debugging in VS Code
|
## 🐞 Debugging in VS Code
|
||||||
|
|
||||||
This repository includes a ready-to-use `.vscode/launch.json` with configurations to run or attach the VS Code debugger to `hbd`.
|
This repository includes a ready-to-use `.vscode/launch.json` with configurations to run or attach the VS Code debugger to `hbd`.
|
||||||
|
|||||||
@@ -104,11 +104,6 @@ The `nagios_runner` plugin collects:
|
|||||||
- `{name}_{metric}_min` - Minimum value (if present)
|
- `{name}_{metric}_min` - Minimum value (if present)
|
||||||
- `{name}_{metric}_max` - Maximum value (if present)
|
- `{name}_{metric}_max` - Maximum value (if present)
|
||||||
|
|
||||||
**Overall:**
|
|
||||||
- `overall_status` - Worst status from all commands
|
|
||||||
- `overall_status_code` - Worst status code
|
|
||||||
- `plugin_count` - Number of Nagios plugins executed
|
|
||||||
|
|
||||||
## Configuration Options
|
## Configuration Options
|
||||||
|
|
||||||
```yaml
|
```yaml
|
||||||
|
|||||||
@@ -8,6 +8,7 @@ This guide explains how to create custom plugins for the Heartbeat monitoring sy
|
|||||||
- [Plugin Types](#plugin-types)
|
- [Plugin Types](#plugin-types)
|
||||||
- [Creating a Plugin](#creating-a-plugin)
|
- [Creating a Plugin](#creating-a-plugin)
|
||||||
- [Plugin Lifecycle](#plugin-lifecycle)
|
- [Plugin Lifecycle](#plugin-lifecycle)
|
||||||
|
- [Server-initiated InfoPlugin refresh](#server-initiated-infoplugin-refresh)
|
||||||
- [Configuration](#configuration)
|
- [Configuration](#configuration)
|
||||||
- [Best Practices](#best-practices)
|
- [Best Practices](#best-practices)
|
||||||
- [Examples](#examples)
|
- [Examples](#examples)
|
||||||
@@ -250,6 +251,28 @@ Understanding the plugin lifecycle helps you implement plugins correctly:
|
|||||||
└─> Plugin releases resources, closes connections
|
└─> Plugin releases resources, closes connections
|
||||||
```
|
```
|
||||||
|
|
||||||
|
## Server-initiated InfoPlugin refresh
|
||||||
|
|
||||||
|
When a heartbeat packet arrives from a host the server has no plugin data for (e.g. after a server restart), the server sets `request_update = 1` in the ACK reply. The client detects this flag and immediately re-runs all InfoPlugins — clearing their cached results first — then resends the data as PLG messages.
|
||||||
|
|
||||||
|
This means InfoPlugin data reaches the server on the first heartbeat exchange after the server loses it, without requiring a client restart. No action is needed from plugin authors: the framework handles cache invalidation and re-collection automatically.
|
||||||
|
|
||||||
|
The lifecycle for this case looks like:
|
||||||
|
|
||||||
|
```
|
||||||
|
Server restarts, host reconnects
|
||||||
|
└─> hbd receives HTB with no existing plugin_data for host
|
||||||
|
└─> hbd sets request_update=1 in ACK
|
||||||
|
|
||||||
|
Client receives ACK
|
||||||
|
└─> Detects request_update flag
|
||||||
|
└─> Clears _cache on every registered InfoPlugin
|
||||||
|
└─> Calls collect() on each InfoPlugin
|
||||||
|
└─> Sends fresh PLG messages to server
|
||||||
|
```
|
||||||
|
|
||||||
|
If you write an `InfoPlugin` with side effects in `_collect_info()` (opening connections, writing files, etc.), be aware it may be called more than once per client session when this mechanism triggers.
|
||||||
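The client-side half of this flow might look like the sketch below. It is a simplified model rather than the framework's code: the `InfoPlugin` class here is a stand-in with only the pieces named above (`_cache`, `collect()`, `_collect_info()`), and the ACK is represented as a plain dict, which is an assumption about its shape.

```python
class InfoPlugin:
    """Stand-in for the framework's InfoPlugin base class (illustrative only)."""

    name = "info"

    def __init__(self):
        self._cache = None

    def _collect_info(self) -> dict:
        raise NotImplementedError

    def collect(self) -> dict:
        # Results are cached, so _collect_info() normally runs once per session.
        if self._cache is None:
            self._cache = self._collect_info()
        return self._cache


def handle_ack(ack: dict, info_plugins: list, send_plg) -> int:
    """Re-run and resend all InfoPlugins when the server asks for an update.

    Returns the number of plugins that were refreshed.
    """
    if not ack.get("request_update"):
        return 0
    for plugin in info_plugins:
        plugin._cache = None  # invalidate the cached result first
        send_plg(plugin.name, plugin.collect())  # triggers _collect_info() again
    return len(info_plugins)
```

The sketch also shows why side effects matter: every refresh calls `_collect_info()` again, so anything it opens or writes happens once per refresh, not once per session.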
|
|
||||||
## Configuration
|
## Configuration
|
||||||
|
|
||||||
### Plugin-Specific Configuration
|
### Plugin-Specific Configuration
|
||||||
|
|||||||
+231
-72
@@ -256,6 +256,56 @@ disk_monitor:
|
|||||||
operator: "<"
|
operator: "<"
|
||||||
```
|
```
|
||||||
|
|
||||||
|
### ZFS Monitor
|
||||||
|
|
||||||
|
ZFS pool health is checked automatically for every pool. A pool in any state
|
||||||
|
other than `ONLINE` (e.g. `DEGRADED`, `SUSPENDED`, `FAULTED`, `UNAVAIL`) raises
|
||||||
|
a **CRITICAL** alert by default — no configuration required.
|
||||||
|
|
||||||
|
The default threshold is equivalent to:
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
zfs_monitor:
|
||||||
|
pools:
|
||||||
|
'*':
|
||||||
|
status:
|
||||||
|
warning: 1
|
||||||
|
critical: 2
|
||||||
|
operator: ">"
|
||||||
|
hysteresis: 0.0
|
||||||
|
display: "ZFS pool {pool_name} is {health}"
|
||||||
|
```
|
||||||
|
|
||||||
|
`'*'` matches every pool on the host. The notification message includes the pool
|
||||||
|
name and its current health string, e.g. `ZFS pool tank is DEGRADED`.
|
||||||
|
|
||||||
|
**Override for specific pools** — named pool entries take priority over `'*'`:
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
zfs_monitor:
|
||||||
|
pools:
|
||||||
|
# Suppress health alerts for a scratch pool (not mission-critical)
|
||||||
|
scratch:
|
||||||
|
status:
|
||||||
|
enabled: false
|
||||||
|
|
||||||
|
# Capacity threshold for a specific pool
|
||||||
|
tank:
|
||||||
|
capacity:
|
||||||
|
warning: 75.0
|
||||||
|
critical: 90.0
|
||||||
|
operator: ">"
|
||||||
|
hysteresis: 0.05
|
||||||
|
```
|
||||||
|
|
||||||
|
**Alert state paths** follow the pattern `zfs_monitor.<pool_name>.status`,
|
||||||
|
so acknowledgements and silences target individual pools:
|
||||||
|
|
||||||
|
```
|
||||||
|
zfs_monitor.tank.status
|
||||||
|
zfs_monitor.backup.status
|
||||||
|
```
|
||||||
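The wildcard lookup and the per-pool state paths can be sketched as follows. This is an illustration under assumptions: `pool_threshold` and `alert_state_path` are hypothetical helpers, and the fallback to `'*'` is assumed to happen per metric, so a pool with its own `capacity` entry still inherits the wildcard `status` entry.

```python
from typing import Optional


def pool_threshold(pools_cfg: dict, pool_name: str, metric: str) -> Optional[dict]:
    """Find the threshold entry for one pool metric; named pools win over '*'."""
    for key in (pool_name, "*"):
        entry = pools_cfg.get(key, {}).get(metric)
        if entry is not None:
            if entry.get("enabled", True) is False:
                return None  # explicitly disabled: do not fall back to '*'
            return entry
    return None


def alert_state_path(pool_name: str, metric: str = "status") -> str:
    """Build the alert state path used for acknowledgements and silences."""
    return f"zfs_monitor.{pool_name}.{metric}"
```

With the override example above, `scratch` gets no health threshold at all, while `tank` keeps the wildcard health check and gains its own capacity threshold.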
|
|
||||||
### Network Monitor
|
### Network Monitor
|
||||||
|
|
||||||
```yaml
|
```yaml
|
||||||
@@ -814,42 +864,39 @@ Planned features:
|
|||||||
|
|
||||||
## Multi-Threshold Configuration
|
## Multi-Threshold Configuration
|
||||||
|
|
||||||
**New in version 2.0**: Support for multiple named threshold configurations with per-host mapping.
|
Heartbeat supports multiple named threshold configurations, with per-host mapping and composable layering.
|
||||||
|
|
||||||
### Overview
|
### Overview
|
||||||
|
|
||||||
The multi-threshold feature allows you to:
|
The multi-threshold feature allows you to:
|
||||||
- Define multiple sets of threshold configurations
|
- Define multiple named threshold configurations
|
||||||
- Map different hosts to different threshold sets
|
- Assign one or more configurations to each host
|
||||||
|
- Compose configurations by layering — each named config's overrides are applied in order on top of the defaults
|
||||||
- Use different sensitivity levels for different environments
|
- Use different sensitivity levels for different environments
|
||||||
- Maintain a default configuration for unmapped hosts
|
|
||||||
|
|
||||||
### Configuration Structure
|
### Configuration Structure
|
||||||
|
|
||||||
|
Named configurations are defined under `threshold_configs`. Each host selects which ones to use via `threshold_config` in the `hosts` section (a string for a single config, or a list to layer multiple):
|
||||||
|
|
||||||
```yaml
|
```yaml
|
||||||
# Optional: Set the default configuration name (defaults to "default")
|
# Optional: set the default configuration name (defaults to "default")
|
||||||
default_threshold_config: "default"
|
default_threshold_config: "default"
|
||||||
|
|
||||||
# Define multiple named threshold configurations
|
|
||||||
threshold_configs:
|
threshold_configs:
|
||||||
# Configuration name 1
|
|
||||||
default:
|
default:
|
||||||
thresholds:
|
thresholds:
|
||||||
# Standard threshold definitions
|
|
||||||
cpu_monitor:
|
cpu_monitor:
|
||||||
cpu_percent:
|
cpu_percent:
|
||||||
warning: 80.0
|
warning: 80.0
|
||||||
critical: 90.0
|
critical: 90.0
|
||||||
|
|
||||||
# Configuration name 2
|
|
||||||
high_sensitivity:
|
high_sensitivity:
|
||||||
thresholds:
|
thresholds:
|
||||||
cpu_monitor:
|
cpu_monitor:
|
||||||
cpu_percent:
|
cpu_percent:
|
||||||
warning: 60.0
|
warning: 60.0
|
||||||
critical: 75.0
|
critical: 75.0
|
||||||
|
|
||||||
# Configuration name 3
|
|
||||||
low_sensitivity:
|
low_sensitivity:
|
||||||
thresholds:
|
thresholds:
|
||||||
cpu_monitor:
|
cpu_monitor:
|
||||||
@@ -857,14 +904,77 @@ threshold_configs:
|
|||||||
warning: 90.0
|
warning: 90.0
|
||||||
critical: 95.0
|
critical: 95.0
|
||||||
|
|
||||||
# Map specific hosts to specific configurations
|
hosts:
|
||||||
host_threshold_mapping:
|
prod-web-01:
|
||||||
prod-web-01: high_sensitivity
|
threshold_config: high_sensitivity # single config
|
||||||
prod-web-02: high_sensitivity
|
|
||||||
dev-server-01: low_sensitivity
|
dev-server-01:
|
||||||
# Unmapped hosts use default_threshold_config
|
threshold_config: low_sensitivity
|
||||||
|
|
||||||
|
# Hosts with no threshold_config use default_threshold_config
|
||||||
```
|
```
|
||||||
|
|
||||||
|
### Composable Configurations (list form)
|
||||||
|
|
||||||
|
`threshold_config` can be a list. Configs are applied **left to right**: the defaults are the base, then each named config's overrides are layered on top. Later entries in the list win on any metric they define.
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
threshold_configs:
|
||||||
|
default:
|
||||||
|
thresholds:
|
||||||
|
cpu_monitor:
|
||||||
|
cpu_percent: {warning: 80, critical: 90}
|
||||||
|
memory_monitor:
|
||||||
|
memory_percent: {warning: 85, critical: 95}
|
||||||
|
disk_monitor:
|
||||||
|
partitions:
|
||||||
|
/:
|
||||||
|
percent: {warning: 80, critical: 90}
|
||||||
|
|
||||||
|
# Tighter CPU limits for busy servers
|
||||||
|
high_cpu_load:
|
||||||
|
thresholds:
|
||||||
|
cpu_monitor:
|
||||||
|
cpu_percent: {warning: 60, critical: 75}
|
||||||
|
|
||||||
|
# Tighter disk limits for data-heavy servers
|
||||||
|
busy_disk:
|
||||||
|
thresholds:
|
||||||
|
disk_monitor:
|
||||||
|
partitions:
|
||||||
|
/:
|
||||||
|
percent: {warning: 70, critical: 85}
|
||||||
|
|
||||||
|
hosts:
|
||||||
|
# Gets default thresholds only
|
||||||
|
web-01:
|
||||||
|
threshold_config: default
|
||||||
|
|
||||||
|
# Gets tighter CPU limits, default memory and disk
|
||||||
|
build-server:
|
||||||
|
threshold_config: high_cpu_load
|
||||||
|
|
||||||
|
# Layers both: tighter CPU AND tighter disk, default memory
|
||||||
|
db-01:
|
||||||
|
threshold_config: [high_cpu_load, busy_disk]
|
||||||
|
|
||||||
|
# Three layers: busy_disk overrides high_cpu_load if they conflict
|
||||||
|
storage-01:
|
||||||
|
threshold_config: [default, high_cpu_load, busy_disk]
|
||||||
|
```
|
||||||
|
|
||||||
|
**How layering works:**
|
||||||
|
|
||||||
|
Starting from the `default` thresholds:
|
||||||
|
|
||||||
|
| Layer | Applied config | Effect |
|-------|----------------|--------|
| Base | `default` | all default thresholds |
| +1 | `high_cpu_load` | cpu_percent overridden to 60/75 |
| +2 | `busy_disk` | disk percent overridden to 70/85; cpu_percent stays at 60/75 |
|
||||||
|
|
||||||
|
Each named config only overrides the metrics it explicitly defines. Metrics not mentioned in a config inherit from the layers beneath.
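
The layering rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's implementation: `deep_merge` and `resolve_thresholds` are hypothetical names, the merge is assumed to recurse down to individual `warning`/`critical` values, and a plain string is treated as a one-element list.

```python
from copy import deepcopy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a copy of base with override's values layered on top."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are sections: recurse so untouched metrics survive.
            merged[key] = deep_merge(merged[key], value)
        else:
            # Leaf value (or new section): the later layer wins.
            merged[key] = deepcopy(value)
    return merged


def resolve_thresholds(threshold_configs: dict, names, base: str = "default") -> dict:
    """Layer the named configs left to right on top of the base config.

    Hypothetical helper: `names` may be a string or a list of config names.
    """
    if isinstance(names, str):
        names = [names]
    result = deepcopy(threshold_configs[base]["thresholds"])
    for name in names:
        result = deep_merge(result, threshold_configs[name]["thresholds"])
    return result


# The same configs as the YAML example, written as a Python dict.
THRESHOLD_CONFIGS = {
    "default": {"thresholds": {
        "cpu_monitor": {"cpu_percent": {"warning": 80, "critical": 90}},
        "memory_monitor": {"memory_percent": {"warning": 85, "critical": 95}},
        "disk_monitor": {"partitions": {"/": {"percent": {"warning": 80, "critical": 90}}}},
    }},
    "high_cpu_load": {"thresholds": {
        "cpu_monitor": {"cpu_percent": {"warning": 60, "critical": 75}},
    }},
    "busy_disk": {"thresholds": {
        "disk_monitor": {"partitions": {"/": {"percent": {"warning": 70, "critical": 85}}}},
    }},
}

db_01 = resolve_thresholds(THRESHOLD_CONFIGS, ["high_cpu_load", "busy_disk"])
```

With these inputs, `db_01` carries the 60/75 CPU limits, the 70/85 disk limits, and the default 85/95 memory limits, matching the table above.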
|
||||||
|
|
||||||
### Use Cases
|
### Use Cases
|
||||||
|
|
||||||
#### 1. Environment-Based Thresholds
|
#### 1. Environment-Based Thresholds
|
||||||
@@ -879,7 +989,7 @@ threshold_configs:
|
|||||||
cpu_percent:
|
cpu_percent:
|
||||||
warning: 70.0 # Alert earlier in production
|
warning: 70.0 # Alert earlier in production
|
||||||
critical: 85.0
|
critical: 85.0
|
||||||
|
|
||||||
development:
|
development:
|
||||||
thresholds:
|
thresholds:
|
||||||
cpu_monitor:
|
cpu_monitor:
|
||||||
@@ -887,11 +997,15 @@ threshold_configs:
|
|||||||
warning: 90.0 # More relaxed for dev
|
warning: 90.0 # More relaxed for dev
|
||||||
critical: 98.0
|
critical: 98.0
|
||||||
|
|
||||||
host_threshold_mapping:
|
hosts:
|
||||||
prod-web-01: production
|
prod-web-01:
|
||||||
prod-web-02: production
|
threshold_config: production
|
||||||
dev-web-01: development
|
prod-web-02:
|
||||||
dev-web-02: development
|
threshold_config: production
|
||||||
|
dev-web-01:
|
||||||
|
threshold_config: development
|
||||||
|
dev-web-02:
|
||||||
|
threshold_config: development
|
||||||
```
|
```
|
||||||
|
|
||||||
#### 2. Server Role-Based Thresholds
|
#### 2. Server Role-Based Thresholds
|
||||||
@@ -906,7 +1020,7 @@ threshold_configs:
|
|||||||
cpu_percent:
|
cpu_percent:
|
||||||
warning: 80.0
|
warning: 80.0
|
||||||
critical: 90.0
|
critical: 90.0
|
||||||
|
|
||||||
database:
|
database:
|
||||||
thresholds:
|
thresholds:
|
||||||
cpu_monitor:
|
cpu_monitor:
|
||||||
@@ -914,7 +1028,7 @@ threshold_configs:
|
|||||||
warning: 70.0
|
warning: 70.0
|
||||||
critical: 85.0
|
critical: 85.0
|
||||||
memory_monitor:
|
memory_monitor:
|
||||||
percent:
|
memory_percent:
|
||||||
warning: 90.0 # Databases can use high memory
|
warning: 90.0 # Databases can use high memory
|
||||||
critical: 97.0
|
critical: 97.0
|
||||||
disk_monitor:
|
disk_monitor:
|
||||||
@@ -923,21 +1037,27 @@ threshold_configs:
|
|||||||
percent:
|
percent:
|
||||||
warning: 75.0
|
warning: 75.0
|
||||||
critical: 85.0
|
critical: 85.0
|
||||||
|
|
||||||
cache:
|
cache:
|
||||||
thresholds:
|
thresholds:
|
||||||
memory_monitor:
|
memory_monitor:
|
||||||
percent:
|
memory_percent:
|
||||||
warning: 95.0 # Redis/Memcached can use very high memory
|
warning: 95.0 # Redis/Memcached can use very high memory
|
||||||
critical: 99.0
|
critical: 99.0
|
||||||
|
|
||||||
host_threshold_mapping:
|
hosts:
|
||||||
web-01: webserver
|
web-01:
|
||||||
web-02: webserver
|
threshold_config: webserver
|
||||||
db-01: database
|
web-02:
|
||||||
db-02: database
|
threshold_config: webserver
|
||||||
redis-01: cache
|
db-01:
|
||||||
memcached-01: cache
|
threshold_config: database
|
||||||
|
db-02:
|
||||||
|
threshold_config: database
|
||||||
|
redis-01:
|
||||||
|
threshold_config: cache
|
||||||
|
memcached-01:
|
||||||
|
threshold_config: cache
|
||||||
```
|
```
|
||||||
|
|
||||||
#### 3. Sensitivity Levels
|
#### 3. Sensitivity Levels
|
||||||
@@ -952,10 +1072,10 @@ threshold_configs:
|
|||||||
partitions:
|
partitions:
|
||||||
/:
|
/:
|
||||||
percent:
|
percent:
|
||||||
warning: 70.0 # Very sensitive
|
warning: 70.0
|
||||||
critical: 80.0
|
critical: 80.0
|
||||||
hysteresis: 0.15
|
hysteresis: 0.15
|
||||||
|
|
||||||
standard:
|
standard:
|
||||||
thresholds:
|
thresholds:
|
||||||
disk_monitor:
|
disk_monitor:
|
||||||
@@ -965,7 +1085,7 @@ threshold_configs:
|
|||||||
warning: 85.0
|
warning: 85.0
|
||||||
critical: 95.0
|
critical: 95.0
|
||||||
hysteresis: 0.1
|
hysteresis: 0.1
|
||||||
|
|
||||||
relaxed:
|
relaxed:
|
||||||
thresholds:
|
thresholds:
|
||||||
disk_monitor:
|
disk_monitor:
|
||||||
@@ -976,52 +1096,91 @@ threshold_configs:
|
|||||||
critical: 98.0
|
critical: 98.0
|
||||||
hysteresis: 0.05
|
hysteresis: 0.05
|
||||||
|
|
||||||
host_threshold_mapping:
|
hosts:
|
||||||
payment-gateway: critical
|
payment-gateway:
|
||||||
auth-server: critical
|
threshold_config: critical
|
||||||
web-01: standard
|
auth-server:
|
||||||
web-02: standard
|
threshold_config: critical
|
||||||
test-server: relaxed
|
web-01:
|
||||||
|
threshold_config: standard
|
||||||
|
web-02:
|
||||||
|
threshold_config: standard
|
||||||
|
test-server:
|
||||||
|
threshold_config: relaxed
|
||||||
```
|
```
|
||||||
|
|
||||||
### Backward Compatibility
|
#### 4. Composable Profiles
|
||||||
|
|
||||||
The legacy single threshold configuration is fully supported:
|
Build host-specific thresholds by combining small, focused configs:
|
||||||
|
|
||||||
```yaml
|
```yaml
|
||||||
# Old format - still works
|
|
||||||
thresholds:
|
|
||||||
cpu_monitor:
|
|
||||||
cpu_percent:
|
|
||||||
warning: 80.0
|
|
||||||
critical: 90.0
|
|
||||||
```
|
|
||||||
|
|
||||||
This is equivalent to:
|
|
||||||
|
|
||||||
```yaml
|
|
||||||
# New format
|
|
||||||
threshold_configs:
|
threshold_configs:
|
||||||
|
# Baseline — everything at default levels
|
||||||
default:
|
default:
|
||||||
thresholds:
|
thresholds:
|
||||||
cpu_monitor:
|
cpu_monitor:
|
||||||
cpu_percent:
|
cpu_percent: {warning: 80, critical: 90}
|
||||||
warning: 80.0
|
memory_monitor:
|
||||||
critical: 90.0
|
memory_percent: {warning: 85, critical: 95}
|
||||||
```
|
|
||||||
|
|
||||||
|
# Overlay: tighter CPU only
|
||||||
|
tight_cpu:
|
||||||
|
thresholds:
|
||||||
|
cpu_monitor:
|
||||||
|
cpu_percent: {warning: 60, critical: 75}
|
||||||
|
|
||||||
|
# Overlay: tighter memory only
|
||||||
|
tight_memory:
|
||||||
|
thresholds:
|
||||||
|
memory_monitor:
|
||||||
|
memory_percent: {warning: 70, critical: 85}
|
||||||
|
|
||||||
|
# Overlay: extra disk partition for database servers
|
||||||
|
db_disk:
|
||||||
|
thresholds:
|
||||||
|
disk_monitor:
|
||||||
|
partitions:
|
||||||
|
/var/lib/postgresql:
|
||||||
|
percent: {warning: 75, critical: 88}
|
||||||
|
|
||||||
|
hosts:
|
||||||
|
# Plain web server
|
||||||
|
web-01:
|
||||||
|
threshold_config: default
|
||||||
|
|
||||||
|
# Build server: tight CPU, default memory and disk
|
||||||
|
build-01:
|
||||||
|
threshold_config: tight_cpu
|
||||||
|
|
||||||
|
# Database: tight CPU + tight memory + extra disk partition
|
||||||
|
db-01:
|
||||||
|
threshold_config: [tight_cpu, tight_memory, db_disk]
|
||||||
|
|
||||||
|
# Replica database: tight memory + extra disk, normal CPU
|
||||||
|
db-02:
|
||||||
|
threshold_config: [tight_memory, db_disk]
|
||||||
|
```
|
||||||
### Configuration Priority
|
### Configuration Priority
|
||||||
|
|
||||||
1. **Host-specific mapping**: If host is in `host_threshold_mapping`, use that config
|
1. **Host `threshold_config` (list)**: Layer each named config's overrides left-to-right on top of the defaults
|
||||||
2. **Default config**: Use `default_threshold_config`
|
2. **Host `threshold_config` (string)**: Use that single named config directly
|
||||||
3. **First alphabetically**: If default not found, use first config alphabetically
|
3. **`host_threshold_mapping`** (legacy): Same as above, string only
|
||||||
4. **Legacy fallback**: If `threshold_configs` not present, use `thresholds`
|
4. **`default_threshold_config`**: Used for hosts with no mapping
|
||||||
|
5. **First alphabetically**: If the default config is not found, use the first config alphabetically
|
||||||
|
6. **Legacy `thresholds` section**: Used when `threshold_configs` is absent entirely
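
One way to read the priority order above is as a lookup that returns the config names to apply for a host. The sketch below is a hedged illustration only: `select_config_names` is a hypothetical function, the config is assumed to be an already-parsed dict, and an empty result stands for "fall back to the legacy `thresholds` section".

```python
def select_config_names(host: str, config: dict) -> list:
    """Return the threshold config names for host, in layering order.

    Hypothetical helper following the documented priority order; an empty
    list means the legacy flat `thresholds` section should be used.
    """
    named = config.get("threshold_configs")
    if not named:
        return []  # step 6: no named configs at all

    # Steps 1 and 2: per-host `threshold_config` (list or string).
    host_entry = (config.get("hosts") or {}).get(host) or {}
    choice = host_entry.get("threshold_config")

    # Step 3: legacy `host_threshold_mapping` (string only).
    if choice is None:
        choice = (config.get("host_threshold_mapping") or {}).get(host)

    # Step 4: `default_threshold_config`; step 5: first name alphabetically.
    if choice is None:
        choice = config.get("default_threshold_config")
        if choice not in named:
            choice = sorted(named)[0]

    return [choice] if isinstance(choice, str) else list(choice)


example = {
    "threshold_configs": {"standard": {}, "critical": {}, "relaxed": {}},
    "hosts": {"db-01": {"threshold_config": ["critical", "relaxed"]}},
    "host_threshold_mapping": {"web-01": "standard"},
    "default_threshold_config": "relaxed",
}
names_for_db = select_config_names("db-01", example)
names_for_unknown = select_config_names("new-host", example)
```

The sketch does not validate that every returned name exists in `threshold_configs`; a real implementation would probably want to report unknown names.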
|
||||||
|
|
||||||
### Example: Complete Multi-Threshold Setup
|
### Backward Compatibility
|
||||||
|
|
||||||
See `hbd/config_multi_threshold_example.yaml` for a complete example with:
|
The legacy `host_threshold_mapping` top-level key and the flat `thresholds` section are still fully supported:
|
||||||
- 4 named configurations (default, high_sensitivity, low_sensitivity, database)
|
|
||||||
- Host-to-config mappings for production, development, and test systems
|
```yaml
|
||||||
- Specialized database server thresholds
|
# Still works — equivalent to hosts: {prod-web-01: {threshold_config: high_sensitivity}}
|
||||||
- Custom display messages with plugin data
|
host_threshold_mapping:
|
||||||
|
prod-web-01: high_sensitivity
|
||||||
|
|
||||||
|
# Still works — equivalent to threshold_configs: {default: {thresholds: ...}}
|
||||||
|
thresholds:
|
||||||
|
cpu_monitor:
|
||||||
|
cpu_percent: {warning: 80, critical: 90}
|
||||||
|
```
|
||||||
|
|
||||||
|
|||||||
@@ -46,6 +46,24 @@ default_owner: andreas # owns hosts with no explicit owner
|
|||||||
# falls back to the first admin user if omitted
|
# falls back to the first admin user if omitted
|
||||||
```
|
```
|
||||||
|
|
||||||
|
### Client-declared host ownership
|
||||||
|
|
||||||
|
A host can declare its own owner directly in the hbc or hbc_mini client configuration. This is useful for hosts that are not listed in the server config, or during initial setup before a server-side config entry has been created.
|
||||||
|
|
||||||
|
**`~/.hbc.yaml`** (hbc):
|
||||||
|
```yaml
|
||||||
|
owner: andreas
|
||||||
|
```
|
||||||
|
|
||||||
|
**`~/.hbc.json`** (hbc_mini):
|
||||||
|
```json
|
||||||
|
{ "owner": "andreas" }
|
||||||
|
```
|
||||||
|
|
||||||
|
When set, the value is included in the `os_info` plugin data sent to the server. The server applies it as `host.owner` the first time `os_info` arrives, provided no owner has been configured server-side for that host. Server-configured ownership always takes precedence.
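
The precedence rule above can be illustrated with a small sketch. The names here (`Host`, `apply_client_owner`) are hypothetical and do not come from the server code; the sketch assumes that "configured server-side" simply means the host already has an owner, and it does not model `default_owner`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Host:
    """Minimal stand-in for the server's host record (hypothetical)."""
    name: str
    owner: Optional[str] = None


def apply_client_owner(host: Host, os_info: dict) -> bool:
    """Apply a client-declared owner unless the host already has one.

    Returns True when the owner was taken from the client's os_info data.
    """
    declared = os_info.get("owner")
    if not declared:
        return False  # client did not declare an owner
    if host.owner is not None:
        return False  # an existing owner always wins
    host.owner = declared
    return True


fresh = Host("edge-01")
applied = apply_client_owner(fresh, {"owner": "andreas"})
```

Because the owner is only written when none is set, a later `os_info` message with a different `owner` value leaves the host unchanged in this sketch.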
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
### Assigning roles to hosts
|
### Assigning roles to hosts
|
||||||
|
|
||||||
```yaml
|
```yaml
|
||||||
|
|||||||
@@ -0,0 +1,781 @@
|
|||||||
|
# Gitea OAuth2 Authentication Implementation Plan
|
||||||
|
|
||||||
|
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
|
||||||
|
|
||||||
|
**Goal:** Add Gitea as an OAuth2 login provider that coexists with password auth, auto-provisioning new users on first login.
|
||||||
|
|
||||||
|
**Architecture:** A new `oauth.py` module owns all Gitea-specific logic (CSRF state, URL building, token exchange, user-info fetch). `users.py` gains one function to upsert an OAuth-sourced user. `http.py` gets two new route handlers and a small login-page change. No new dependencies — `aiohttp.ClientSession` is already used in the codebase.
|
||||||
|
|
||||||
|
**Tech Stack:** Python 3.12, aiohttp 3.x, pytest, pytest-asyncio
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## File Map
|
||||||
|
|
||||||
|
| Action | Path | Responsibility |
|--------|------|----------------|
| Modify | `hbd/server/config.py` | Add `"oauth": {}` default |
| Create | `hbd/server/oauth.py` | CSRF state, URL builder, token exchange, user-info fetch |
| Modify | `hbd/server/users.py` | Add `provision_oauth_user()` |
| Modify | `hbd/server/http.py` | Import oauth, two new routes, login page button |
| Create | `tests/test_oauth.py` | All new unit tests |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Task 1: Add config default and `is_enabled()`
|
||||||
|
|
||||||
|
**Files:**
|
||||||
|
- Modify: `hbd/server/config.py:34` (after the `"users"` line)
|
||||||
|
- Create: `hbd/server/oauth.py`
|
||||||
|
- Create: `tests/test_oauth.py`
|
||||||
|
|
||||||
|
- [ ] **Step 1: Write the failing test**
|
||||||
|
|
||||||
|
Create `tests/test_oauth.py`:
|
||||||
|
|
||||||
|
```python
import pytest
from hbd.server import oauth


CFG_OFF = {}
CFG_ON = {
    "oauth": {
        "gitea": {
            "url": "https://git.example.com",
            "client_id": "cid",
            "client_secret": "csec",
        }
    }
}
CFG_PARTIAL = {"oauth": {"gitea": {"url": "https://git.example.com"}}}


def test_is_enabled_when_all_keys_present():
    assert oauth.is_enabled(CFG_ON) is True


def test_is_enabled_false_when_no_oauth_key():
    assert oauth.is_enabled(CFG_OFF) is False


def test_is_enabled_false_when_partial_config():
    assert oauth.is_enabled(CFG_PARTIAL) is False
```
|
||||||
|
|
||||||
|
- [ ] **Step 2: Run to confirm failure**
|
||||||
|
|
||||||
|
```
|
||||||
|
pytest tests/test_oauth.py -v
|
||||||
|
```
|
||||||
|
|
||||||
|
Expected: `ModuleNotFoundError: No module named 'hbd.server.oauth'`
|
||||||
|
|
||||||
|
- [ ] **Step 3: Add config default**
|
||||||
|
|
||||||
|
In `hbd/server/config.py`, add after the `"default_owner"` line (currently line 35):
|
||||||
|
|
||||||
|
```python
|
||||||
|
# OAuth2 providers
|
||||||
|
"oauth": {}, # oauth.gitea.{url,client_id,client_secret}
|
||||||
|
```
|
||||||
|
|
||||||
|
- [ ] **Step 4: Create `hbd/server/oauth.py` with `is_enabled`**
|
||||||
|
|
||||||
|
```python
"""Gitea OAuth2 support.

Config shape (in ~/.hb.yaml):

    oauth:
      gitea:
        url: https://git.example.com
        client_id: <client-id>
        client_secret: <client-secret>

Register a Gitea OAuth2 application at:
    Gitea → Settings → Applications → OAuth2
Set the redirect URI to:
    https://<hbd-host>/login/oauth/gitea/callback
"""

import logging
import secrets
import time

import aiohttp

logger = logging.getLogger(__name__)

STATE_TTL = 600  # 10 minutes

# state_token -> expiry timestamp
_states: dict[str, float] = {}


class OAuthError(Exception):
    """Raised when the OAuth2 flow fails for any reason."""


def _gitea_cfg(config: dict) -> dict:
    """Return the gitea sub-dict, or {} if it is absent."""
    return config.get("oauth", {}).get("gitea", {})


def is_enabled(config: dict) -> bool:
    """Return True when all three required Gitea OAuth keys are present."""
    g = _gitea_cfg(config)
    return bool(g.get("url") and g.get("client_id") and g.get("client_secret"))
```
|
||||||
|
|
||||||
|
- [ ] **Step 5: Run to confirm tests pass**
|
||||||
|
|
||||||
|
```
|
||||||
|
pytest tests/test_oauth.py -v
|
||||||
|
```
|
||||||
|
|
||||||
|
Expected: 3 passed
|
||||||
|
|
||||||
|
- [ ] **Step 6: Commit**
|
||||||
|
|
||||||
|
```bash
|
||||||
|
git add hbd/server/config.py hbd/server/oauth.py tests/test_oauth.py
|
||||||
|
git commit -m "feat: add oauth module skeleton and is_enabled()"
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Task 2: CSRF state management
|
||||||
|
|
||||||
|
**Files:**
|
||||||
|
- Modify: `hbd/server/oauth.py` (add `make_state`, `validate_state`)
|
||||||
|
- Modify: `tests/test_oauth.py` (add state tests)
|
||||||
|
|
||||||
|
- [ ] **Step 1: Write the failing tests**
|
||||||
|
|
||||||
|
Append to `tests/test_oauth.py`:
|
||||||
|
|
||||||
|
```python
import time as time_mod


def test_make_state_returns_unique_tokens():
    s1 = oauth.make_state()
    s2 = oauth.make_state()
    assert s1 != s2
    assert len(s1) == 64  # 32 bytes hex


def test_validate_state_valid():
    state = oauth.make_state()
    assert oauth.validate_state(state) is True


def test_validate_state_consumed_on_use():
    state = oauth.make_state()
    oauth.validate_state(state)
    assert oauth.validate_state(state) is False  # replay rejected


def test_validate_state_unknown():
    assert oauth.validate_state("notastate") is False


def test_validate_state_expired(monkeypatch):
    state = oauth.make_state()
    # Wind expiry into the past
    monkeypatch.setitem(oauth._states, state, time_mod.time() - 1)
    assert oauth.validate_state(state) is False
```
|
||||||
|
|
||||||
|
- [ ] **Step 2: Run to confirm failure**
|
||||||
|
|
||||||
|
```
|
||||||
|
pytest tests/test_oauth.py -v -k "state"
|
||||||
|
```
|
||||||
|
|
||||||
|
Expected: `AttributeError: module 'hbd.server.oauth' has no attribute 'make_state'`
|
||||||
|
|
||||||
|
- [ ] **Step 3: Implement state functions**
|
||||||
|
|
||||||
|
Add to `hbd/server/oauth.py` after the `_states` dict definition:
|
||||||
|
|
||||||
|
```python
def make_state() -> str:
    """Generate a CSRF state token, store it with TTL, and return it."""
    _purge_states()
    token = secrets.token_hex(32)
    _states[token] = time.time() + STATE_TTL
    return token


def validate_state(state: str) -> bool:
    """Return True if *state* is known and unexpired; always removes it."""
    expiry = _states.pop(state, None)
    if expiry is None:
        return False
    return time.time() < expiry


def _purge_states() -> None:
    """Drop every stored state token whose expiry has passed."""
    now = time.time()
    expired = [k for k, exp in list(_states.items()) if exp < now]
    for k in expired:
        del _states[k]
```
|
||||||
|
|
||||||
|
- [ ] **Step 4: Run to confirm tests pass**
|
||||||
|
|
||||||
|
```
|
||||||
|
pytest tests/test_oauth.py -v
|
||||||
|
```
|
||||||
|
|
||||||
|
Expected: 8 passed
|
||||||
|
|
||||||
|
- [ ] **Step 5: Commit**
|
||||||
|
|
||||||
|
```bash
|
||||||
|
git add hbd/server/oauth.py tests/test_oauth.py
|
||||||
|
git commit -m "feat: add OAuth2 CSRF state management"
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Task 3: `provision_oauth_user` in users.py
|
||||||
|
|
||||||
|
**Files:**
|
||||||
|
- Modify: `hbd/server/users.py` (add `provision_oauth_user`)
|
||||||
|
- Modify: `tests/test_oauth.py` (add provisioning tests)
|
||||||
|
|
||||||
|
- [ ] **Step 1: Write the failing tests**
|
||||||
|
|
||||||
|
Append to `tests/test_oauth.py`:
|
||||||
|
|
||||||
|
```python
from hbd.server import users as users_mod
from hbd.server.users import User


def _reset_users(entries=None):
    users_mod.users = entries or {}


def test_provision_oauth_user_new():
    _reset_users()
    user = users_mod.provision_oauth_user("gituser", "Git User", "https://example.com/avatar.png")
    assert user.username == "gituser"
    assert user.full_name == "Git User"
    assert user.avatar == "https://example.com/avatar.png"
    assert user.admin is False
    assert user.password_hash == ""
    assert "gituser" in users_mod.users


def test_provision_oauth_user_no_password_login():
    _reset_users()
    user = users_mod.provision_oauth_user("gituser", "Git User", "")
    assert user.check_password("anything") is False


def test_provision_oauth_user_existing_updates_profile():
    existing = User(
        username="alice",
        full_name="Old Name",
        avatar="old.png",
        password_hash="pbkdf2:sha256:1:salt:abc",
        admin=True,
        notification_channels=["chan1"],
    )
    _reset_users({"alice": existing})
    user = users_mod.provision_oauth_user("alice", "New Name", "new.png")
    assert user.full_name == "New Name"
    assert user.avatar == "new.png"
    # Preserved
    assert user.admin is True
    assert user.password_hash == "pbkdf2:sha256:1:salt:abc"
    assert user.notification_channels == ["chan1"]


def test_provision_oauth_user_does_not_overwrite_with_empty():
    existing = User(username="bob", full_name="Bob", avatar="bob.png")
    _reset_users({"bob": existing})
    user = users_mod.provision_oauth_user("bob", "", "")
    assert user.full_name == "Bob"
    assert user.avatar == "bob.png"
```
|
||||||
|
|
||||||
|
- [ ] **Step 2: Run to confirm failure**
|
||||||
|
|
||||||
|
```
|
||||||
|
pytest tests/test_oauth.py -v -k "provision"
|
||||||
|
```
|
||||||
|
|
||||||
|
Expected: `AttributeError: module 'hbd.server.users' has no attribute 'provision_oauth_user'`
|
||||||
|
|
||||||
|
- [ ] **Step 3: Implement `provision_oauth_user`**
|
||||||
|
|
||||||
|
Add to `hbd/server/users.py` after the `authenticate()` function (after line 187):
|
||||||
|
|
||||||
|
```python
def provision_oauth_user(username: str, full_name: str, avatar: str) -> "User":
    """Create or update a user sourced from an OAuth2 provider.

    New users are inserted with no password_hash, so they can only authenticate
    via OAuth. Existing users (e.g. defined in config with a password) have
    their display name and avatar refreshed; all other attributes are preserved.
    """
    user = users.get(username)
    if user is None:
        user = User(username=username, full_name=full_name, avatar=avatar)
        users[username] = user
        logger.info("Provisioned OAuth user %r", username)
    else:
        # Only overwrite profile fields when the provider sent a value.
        if full_name:
            user.full_name = full_name
        if avatar:
            user.avatar = avatar
    return user
```
|
||||||
|
|
||||||
|
- [ ] **Step 4: Run to confirm tests pass**
|
||||||
|
|
||||||
|
```
|
||||||
|
pytest tests/test_oauth.py -v
|
||||||
|
```
|
||||||
|
|
||||||
|
Expected: 12 passed
|
||||||
|
|
||||||
|
- [ ] **Step 5: Commit**
|
||||||
|
|
||||||
|
```bash
|
||||||
|
git add hbd/server/users.py tests/test_oauth.py
|
||||||
|
git commit -m "feat: add provision_oauth_user() to users module"
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Task 4: URL builder, token exchange, and user-info fetch
|
||||||
|
|
||||||
|
**Files:**
|
||||||
|
- Modify: `hbd/server/oauth.py` (add `authorization_url`, `exchange_code`, `fetch_user`)
|
||||||
|
- Modify: `tests/test_oauth.py` (add async tests with mocked HTTP)
|
||||||
|
|
||||||
|
- [ ] **Step 1: Write the failing tests**
|
||||||
|
|
||||||
|
Append to `tests/test_oauth.py`:
|
||||||
|
|
||||||
|
```python
import pytest
from unittest.mock import AsyncMock, MagicMock, patch
from urllib.parse import urlparse, parse_qs


def test_authorization_url_shape():
    state = "teststate"
    redirect_uri = "https://hbd.example.com/login/oauth/gitea/callback"
    url = oauth.authorization_url(CFG_ON, state, redirect_uri)
    parsed = urlparse(url)
    qs = parse_qs(parsed.query)
    assert parsed.scheme == "https"
    assert parsed.netloc == "git.example.com"
    assert parsed.path == "/login/oauth/authorize"
    assert qs["client_id"] == ["cid"]
    assert qs["state"] == ["teststate"]
    assert qs["redirect_uri"] == [redirect_uri]
    assert qs["scope"] == ["user:email"]
    assert qs["response_type"] == ["code"]


@pytest.mark.asyncio
async def test_exchange_code_returns_token():
    redirect_uri = "https://hbd.example.com/login/oauth/gitea/callback"
    mock_response = AsyncMock()
    mock_response.status = 200
    mock_response.json = AsyncMock(return_value={"access_token": "tok123"})

    mock_session = MagicMock()
    mock_session.post = MagicMock(return_value=AsyncMock(
        __aenter__=AsyncMock(return_value=mock_response),
        __aexit__=AsyncMock(return_value=False),
    ))

    with patch("hbd.server.oauth.aiohttp.ClientSession", return_value=AsyncMock(
        __aenter__=AsyncMock(return_value=mock_session),
        __aexit__=AsyncMock(return_value=False),
    )):
        token = await oauth.exchange_code(CFG_ON, "mycode", redirect_uri)
    assert token == "tok123"


@pytest.mark.asyncio
async def test_exchange_code_raises_on_error_status():
    redirect_uri = "https://hbd.example.com/login/oauth/gitea/callback"
    mock_response = AsyncMock()
    mock_response.status = 401
    mock_response.text = AsyncMock(return_value="unauthorized")

    mock_session = MagicMock()
    mock_session.post = MagicMock(return_value=AsyncMock(
        __aenter__=AsyncMock(return_value=mock_response),
        __aexit__=AsyncMock(return_value=False),
    ))

    with patch("hbd.server.oauth.aiohttp.ClientSession", return_value=AsyncMock(
        __aenter__=AsyncMock(return_value=mock_session),
        __aexit__=AsyncMock(return_value=False),
    )):
        with pytest.raises(oauth.OAuthError):
            await oauth.exchange_code(CFG_ON, "badcode", redirect_uri)


@pytest.mark.asyncio
async def test_fetch_user_returns_profile():
    mock_response = AsyncMock()
    mock_response.status = 200
    mock_response.json = AsyncMock(return_value={
        "login": "alice",
        "full_name": "Alice Smith",
        "avatar_url": "https://git.example.com/avatars/alice.png",
    })

    mock_session = MagicMock()
    mock_session.get = MagicMock(return_value=AsyncMock(
        __aenter__=AsyncMock(return_value=mock_response),
        __aexit__=AsyncMock(return_value=False),
    ))

    with patch("hbd.server.oauth.aiohttp.ClientSession", return_value=AsyncMock(
        __aenter__=AsyncMock(return_value=mock_session),
        __aexit__=AsyncMock(return_value=False),
    )):
        profile = await oauth.fetch_user(CFG_ON, "tok123")
    assert profile == {
        "login": "alice",
        "full_name": "Alice Smith",
        "avatar_url": "https://git.example.com/avatars/alice.png",
    }
```
|
||||||
|
|
||||||
|
- [ ] **Step 2: Run to confirm failure**
|
||||||
|
|
||||||
|
```
|
||||||
|
pytest tests/test_oauth.py -v -k "url or exchange or fetch"
|
||||||
|
```
|
||||||
|
|
||||||
|
Expected: `AttributeError: module 'hbd.server.oauth' has no attribute 'authorization_url'`
|
||||||
|
|
||||||
|
- [ ] **Step 3: Implement the three functions**
|
||||||
|
|
||||||
|
Add to `hbd/server/oauth.py`:
|
||||||
|
|
||||||
|
```python
import asyncio
import urllib.parse


def authorization_url(config: dict, state: str, redirect_uri: str) -> str:
    """Return the Gitea OAuth2 authorization URL to redirect the browser to."""
    g = _gitea_cfg(config)
    params = urllib.parse.urlencode({
        "client_id": g["client_id"],
        "redirect_uri": redirect_uri,
        "response_type": "code",
        "scope": "user:email",
        "state": state,
    })
    return f"{g['url'].rstrip('/')}/login/oauth/authorize?{params}"


async def exchange_code(config: dict, code: str, redirect_uri: str) -> str:
    """Exchange an authorization *code* for a Gitea access token.

    Returns the access token string. Raises OAuthError on any failure.
    """
    g = _gitea_cfg(config)
    url = f"{g['url'].rstrip('/')}/login/oauth/access_token"
    payload = {
        "client_id": g["client_id"],
        "client_secret": g["client_secret"],
        "code": code,
        "grant_type": "authorization_code",
        "redirect_uri": redirect_uri,
    }
    timeout = aiohttp.ClientTimeout(total=10)
    try:
        async with aiohttp.ClientSession(timeout=timeout) as session:
            async with session.post(url, json=payload, headers={"Accept": "application/json"}) as resp:
                if resp.status != 200:
                    text = await resp.text()
                    raise OAuthError(f"Token exchange failed ({resp.status}): {text}")
                data = await resp.json()
    # A total-timeout expiry raises asyncio.TimeoutError, not ClientError.
    except (aiohttp.ClientError, asyncio.TimeoutError) as exc:
        raise OAuthError(f"Token exchange network error: {exc}") from exc
    token = data.get("access_token")
    if not token:
        raise OAuthError(f"No access_token in response: {data}")
    return token


async def fetch_user(config: dict, token: str) -> dict:
    """Fetch the authenticated user's profile from Gitea.

    Returns a dict with keys: login, full_name, avatar_url.
    Raises OAuthError on any failure.
    """
    g = _gitea_cfg(config)
    url = f"{g['url'].rstrip('/')}/api/v1/user"
    timeout = aiohttp.ClientTimeout(total=10)
    try:
        async with aiohttp.ClientSession(timeout=timeout) as session:
            async with session.get(url, headers={"Authorization": f"token {token}"}) as resp:
                if resp.status != 200:
                    text = await resp.text()
                    raise OAuthError(f"User fetch failed ({resp.status}): {text}")
                data = await resp.json()
    # A total-timeout expiry raises asyncio.TimeoutError, not ClientError.
    except (aiohttp.ClientError, asyncio.TimeoutError) as exc:
        raise OAuthError(f"User fetch network error: {exc}") from exc
    return {
        "login": data.get("login", ""),
        "full_name": data.get("full_name", ""),
        "avatar_url": data.get("avatar_url", ""),
    }
```

Move the `import asyncio` and `import urllib.parse` lines to the top of `oauth.py` (alongside the existing imports).
|
||||||
|
|
||||||
|
- [ ] **Step 4: Run to confirm tests pass**
|
||||||
|
|
||||||
|
```
|
||||||
|
pytest tests/test_oauth.py -v
|
||||||
|
```
|
||||||
|
|
||||||
|
Expected: 16 passed
|
||||||
|
|
||||||
|
- [ ] **Step 5: Commit**
|
||||||
|
|
||||||
|
```bash
|
||||||
|
git add hbd/server/oauth.py tests/test_oauth.py
|
||||||
|
git commit -m "feat: add authorization_url, exchange_code, fetch_user to oauth module"
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Task 5: HTTP routes — redirect and callback
|
||||||
|
|
||||||
|
**Files:**
|
||||||
|
- Modify: `hbd/server/http.py`
|
||||||
|
|
||||||
|
`http.py` defines all handlers inside `async def start(...)`. The two new handlers go in the same block, just before the `app = web.Application()` line (~line 900). The import goes at the top of the file.
|
||||||
|
|
||||||
|
- [ ] **Step 1: Add the import**
|
||||||
|
|
||||||
|
In `hbd/server/http.py`, add after the existing local imports (after `from . import users as users_mod`):
|
||||||
|
|
||||||
|
```python
|
||||||
|
from . import oauth as oauth_mod
|
||||||
|
```
|
||||||
|
|
||||||
|
- [ ] **Step 2: Add the two route handlers**
|
||||||
|
|
||||||
|
In `hbd/server/http.py`, add the two handlers immediately before the `app = web.Application()` line:
|
||||||
|
|
||||||
|
```python
    # Defined inside `async def start(...)`, hence the extra indent level.
    async def oauth_gitea_redirect(request):
        """GET /login/oauth/gitea: kick off the Gitea OAuth2 flow."""
        if not oauth_mod.is_enabled(config):
            return web.Response(status=404, text="OAuth not configured")
        state = oauth_mod.make_state()
        redirect_uri = f"{request.url.origin()}/login/oauth/gitea/callback"
        raise web.HTTPFound(oauth_mod.authorization_url(config, state, redirect_uri))

    async def oauth_gitea_callback(request):
        """GET /login/oauth/gitea/callback: handle Gitea's redirect back."""
        if not oauth_mod.is_enabled(config):
            return web.Response(status=404, text="OAuth not configured")
        code = request.rel_url.query.get("code", "")
        state = request.rel_url.query.get("state", "")
        if not code or not state:
            return web.Response(status=400, text="Missing code or state")
        if not oauth_mod.validate_state(state):
            raise web.HTTPFound("/login?error=1")
        redirect_uri = f"{request.url.origin()}/login/oauth/gitea/callback"
        try:
            token = await oauth_mod.exchange_code(config, code, redirect_uri)
            profile = await oauth_mod.fetch_user(config, token)
        except oauth_mod.OAuthError as exc:
            logger.warning("OAuth error: %s", exc)
            raise web.HTTPFound("/login?error=1")
        user = users_mod.provision_oauth_user(
            profile["login"],
            profile["full_name"],
            profile["avatar_url"],
        )
        session_token = users_mod.create_session(user.username)
        resp = web.HTTPFound("/")
        resp.set_cookie(
            SESSION_COOKIE,
            session_token,
            max_age=users_mod.SESSION_TTL,
            httponly=True,
            samesite="Lax",
        )
        raise resp
```
|
||||||
|
|
||||||
|
- [ ] **Step 3: Register the routes**
|
||||||
|
|
||||||
|
In `hbd/server/http.py`, add to the route list after the existing auth routes (after `web.post("/api/0/auth/logout", api_logout)`):
|
||||||
|
|
||||||
|
```python
|
||||||
|
web.get("/login/oauth/gitea", oauth_gitea_redirect),
|
||||||
|
web.get("/login/oauth/gitea/callback", oauth_gitea_callback),
|
||||||
|
```
|
||||||
|
|
||||||
|
- [ ] **Step 4: Manual smoke test**
|
||||||
|
|
||||||
|
Start the server locally with OAuth configured in `~/.hb.yaml`:
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
oauth:
  gitea:
    url: https://your-gitea-instance.example.com
    client_id: your-client-id
    client_secret: your-client-secret
|
||||||
|
```
|
||||||
|
|
||||||
|
Visit `http://localhost:50004/login/oauth/gitea` — confirm you are redirected to Gitea's authorization page.
|
||||||
|
|
||||||
|
- [ ] **Step 5: Commit**
|
||||||
|
|
||||||
|
```bash
|
||||||
|
git add hbd/server/http.py
|
||||||
|
git commit -m "feat: add Gitea OAuth2 redirect and callback routes"
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Task 6: Login page — "Sign in with Gitea" button
|
||||||
|
|
||||||
|
**Files:**
|
||||||
|
- Modify: `hbd/server/http.py` (update `login_page` handler, ~line 625)
|
||||||
|
|
||||||
|
- [ ] **Step 1: Replace the login page HTML**
|
||||||
|
|
||||||
|
In `hbd/server/http.py`, find the `html = f"""` block inside `login_page` and replace it with:
|
||||||
|
|
||||||
|
```python
|
||||||
|
gitea_button = ""
if oauth_mod.is_enabled(config):
    gitea_url = _gitea_cfg_url(config)
    gitea_button = f"""
    <div class="divider">or</div>
    <a href="/login/oauth/gitea" class="gitea-btn">
      Sign in with Gitea
    </a>"""
|
||||||
|
|
||||||
|
html = f"""<!DOCTYPE html>
|
||||||
|
<html>
|
||||||
|
<head>
|
||||||
|
<meta charset="utf-8">
|
||||||
|
<title>Heartbeat — Login</title>
|
||||||
|
<style>
|
||||||
|
body {{ font-family: sans-serif; background: #f5f5f5; display: flex;
|
||||||
|
justify-content: center; align-items: center; height: 100vh; margin: 0; }}
|
||||||
|
.box {{ background: #fff; padding: 2em 2.5em; border-radius: 8px;
|
||||||
|
box-shadow: 0 2px 12px rgba(0,0,0,.15); min-width: 300px; }}
|
||||||
|
h2 {{ margin: 0 0 1.2em; color: #333; font-size: 1.4em; }}
|
||||||
|
label {{ display: block; margin-bottom: .3em; font-size: .9em; color: #555; }}
|
||||||
|
input {{ width: 100%; padding: .5em .7em; border: 1px solid #ccc;
|
||||||
|
border-radius: 4px; font-size: 1em; box-sizing: border-box; }}
|
||||||
|
button {{ margin-top: 1.2em; width: 100%; padding: .6em; background: #0066cc;
|
||||||
|
color: #fff; border: none; border-radius: 4px; font-size: 1em; cursor: pointer; }}
|
||||||
|
button:hover {{ background: #0055aa; }}
|
||||||
|
.error {{ color: #c00; font-size: .9em; margin-bottom: .8em; }}
|
||||||
|
.field {{ margin-bottom: .9em; }}
|
||||||
|
.divider {{ text-align: center; margin: 1.2em 0 .8em; color: #999;
|
||||||
|
font-size: .85em; border-top: 1px solid #eee; padding-top: .8em; }}
|
||||||
|
.gitea-btn {{ display: block; width: 100%; padding: .6em; background: #609926;
|
||||||
|
color: #fff; border-radius: 4px; font-size: 1em; text-align: center;
|
||||||
|
text-decoration: none; box-sizing: border-box; }}
|
||||||
|
.gitea-btn:hover {{ background: #4e7d1e; }}
|
||||||
|
</style>
|
||||||
|
</head>
|
||||||
|
<body>
|
||||||
|
<div class="box">
|
||||||
|
<h2>Heartbeat</h2>
|
||||||
|
{'<p class="error">Invalid username, password, or OAuth error.</p>' if error else ''}
|
||||||
|
<form method="post">
|
||||||
|
<div class="field"><label>Username</label><input name="username" autofocus></div>
|
||||||
|
<div class="field"><label>Password</label><input name="password" type="password"></div>
|
||||||
|
<button type="submit">Sign in</button>
|
||||||
|
</form>{gitea_button}
|
||||||
|
</div>
|
||||||
|
</body>
|
||||||
|
</html>"""
|
||||||
|
```
|
||||||
|
|
||||||
|
- [ ] **Step 2: Add the `_gitea_cfg_url` helper**
|
||||||
|
|
||||||
|
Add this small helper in `hbd/server/http.py` just before the `login_page` handler (around line 600) so the template can read the Gitea display URL without importing internal oauth details:
|
||||||
|
|
||||||
|
```python
|
||||||
|
def _gitea_cfg_url(config: dict) -> str:
    return config.get("oauth", {}).get("gitea", {}).get("url", "")
|
||||||
|
```
|
||||||
|
|
||||||
|
Also update the `login_page` handler's `error` logic to show the error when the `?error=1` query param is present (set by the callback on OAuth failure):
|
||||||
|
|
||||||
|
```python
|
||||||
|
async def login_page(request):
    """GET /login: show login form; POST /login: process and redirect."""
    if not users_mod.users_enabled():
        raise web.HTTPFound("/")

    error = ""
    if request.method == "POST":
        form = await request.post()
        username = form.get("username", "")
        password = form.get("password", "")
        user = users_mod.authenticate(username, password)
        if user:
            token = users_mod.create_session(username)
            redirect_to = request.rel_url.query.get("next", "/")
            resp = web.HTTPFound(redirect_to)
            resp.set_cookie(
                SESSION_COOKIE,
                token,
                max_age=users_mod.SESSION_TTL,
                httponly=True,
                samesite="Lax",
            )
            raise resp
        error = "Invalid username or password."
    elif request.rel_url.query.get("error"):
        # Set by the OAuth callback when the Gitea sign-in fails.
        error = "Sign-in failed. Please try again."
|
||||||
|
```
|
||||||
|
|
||||||
|
- [ ] **Step 3: Manual verification**
|
||||||
|
|
||||||
|
Start the server with OAuth configured. Visit `/login`. Confirm:
|
||||||
|
- The "Sign in with Gitea" button appears (green, below a divider)
|
||||||
|
- Clicking it redirects to Gitea
|
||||||
|
- After authorising on Gitea, you are redirected back and land on `/` with a valid session cookie
|
||||||
|
|
||||||
|
Without OAuth configured, confirm the button does not appear.
|
||||||
|
|
||||||
|
- [ ] **Step 4: Commit**
|
||||||
|
|
||||||
|
```bash
|
||||||
|
git add hbd/server/http.py
|
||||||
|
git commit -m "feat: add Sign in with Gitea button to login page"
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Self-Review Notes
|
||||||
|
|
||||||
|
- All 5 spec requirements covered: coexist ✓, auto-provision ✓, regular user ✓, any Gitea user ✓, config-driven ✓
|
||||||
|
- `exchange_code` signature in Task 4 matches usage in Task 5 (`config, code, redirect_uri`) ✓
|
||||||
|
- `fetch_user` returns `{login, full_name, avatar_url}` — matched in callback handler ✓
|
||||||
|
- `validate_state` removes state on use (replay protection) ✓
|
||||||
|
- `provision_oauth_user` skips empty strings so existing avatar/name aren't erased ✓
|
||||||
|
- `_gitea_cfg_url` is a plain `def`, not `async` — safe to call in template prep ✓
|
||||||
@@ -0,0 +1,184 @@
|
|||||||
|
# Gitea OAuth2 Authentication — Design Spec
|
||||||
|
|
||||||
|
Date: 2026-05-08
|
||||||
|
|
||||||
|
## Overview
|
||||||
|
|
||||||
|
Add Gitea as an OAuth2 login provider alongside the existing username/password
|
||||||
|
authentication. Any user on the configured Gitea instance can sign in; their
|
||||||
|
local account is auto-provisioned on first login as a regular (non-admin) user.
|
||||||
|
Password login continues to work unchanged.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Config
|
||||||
|
|
||||||
|
A new optional `oauth.gitea` block in `~/.hb.yaml`. OAuth is disabled when the
|
||||||
|
block is absent or any of the three required keys is missing.
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
oauth:
  gitea:
    url: https://git.example.com        # Gitea base URL, no trailing slash
    client_id: <gitea-app-client-id>
    client_secret: <gitea-app-client-secret>
|
||||||
|
```
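
The enabled check described above could look roughly like the following. This is a sketch under assumptions, not the module's actual code: it assumes `config` is the plain dict loaded from `~/.hb.yaml`, and the `REQUIRED_KEYS` name is made up for illustration.

```python
REQUIRED_KEYS = ("url", "client_id", "client_secret")  # hypothetical constant name


def is_enabled(config: dict) -> bool:
    """Return True only when all three Gitea settings are present and non-empty."""
    # Tolerate a missing or null `oauth` / `gitea` block.
    gitea = (config.get("oauth") or {}).get("gitea") or {}
    return all(gitea.get(key) for key in REQUIRED_KEYS)
```

Treating an empty string the same as a missing key means a half-filled config block leaves OAuth disabled instead of failing at redirect time.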
|
||||||
|
|
||||||
|
**Gitea setup:** Create an OAuth2 application in Gitea under
|
||||||
|
*Settings → Applications → OAuth2*. Set the redirect URI to
|
||||||
|
`https://<hbd-host>/login/oauth/gitea/callback`.
|
||||||
|
|
||||||
|
`config.py` default:
|
||||||
|
|
||||||
|
```python
|
||||||
|
"oauth": {},
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## New module: `hbd/server/oauth.py`
|
||||||
|
|
||||||
|
Owns all OAuth2 logic. No new dependencies — uses `aiohttp.ClientSession`
|
||||||
|
already present in the codebase.
|
||||||
|
|
||||||
|
### CSRF state store
|
||||||
|
|
||||||
|
```python
|
||||||
|
# state -> expires (float)
|
||||||
|
_states: dict[str, float] = {}
|
||||||
|
STATE_TTL = 600 # 10 minutes
|
||||||
|
```
|
||||||
|
|
||||||
|
`_states` is an in-memory dict. Entries are created on redirect and deleted on
|
||||||
|
use or expiry. A purge runs on every new state generation.
|
||||||
|
|
||||||
|
### Public API
|
||||||
|
|
||||||
|
| Function | Description |
|---|---|
| `is_enabled(config)` | Returns `True` when url, client_id, and client_secret are all set |
| `make_state()` | Generates a random state token, stores it with TTL, returns it |
| `validate_state(state)` | Returns `True` and removes the state if valid and unexpired |
| `authorization_url(config, state, redirect_uri)` | Builds the Gitea `/login/oauth/authorize` redirect URL with `client_id`, `redirect_uri`, `scope=user:email`, `state` |
| `exchange_code(config, code, redirect_uri)` async | POSTs to Gitea `/login/oauth/access_token` with code and redirect_uri, returns the access token string or raises `OAuthError` |
| `fetch_user(config, token)` async | GETs Gitea `/api/v1/user` with Bearer token, returns `{"login", "full_name", "avatar_url"}` or raises `OAuthError` |
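
As an illustration of the `authorization_url` row, the URL could be assembled with the standard library as sketched below. The parameter list follows the table; `response_type=code` is not listed there and is added on the assumption that the endpoint follows the standard OAuth2 authorization-code grant, which requires it. The `rstrip("/")` is likewise a defensive assumption.

```python
from urllib.parse import urlencode


def authorization_url(config: dict, state: str, redirect_uri: str) -> str:
    """Build the URL the browser is redirected to on the Gitea instance."""
    gitea = config["oauth"]["gitea"]
    query = urlencode({
        "client_id": gitea["client_id"],
        "redirect_uri": redirect_uri,
        "response_type": "code",  # assumption: standard authorization-code grant
        "scope": "user:email",
        "state": state,
    })
    # Strip a trailing slash so a misconfigured base URL does not yield "//login".
    return f"{gitea['url'].rstrip('/')}/login/oauth/authorize?{query}"
```

`urlencode` percent-encodes the callback URL and the `:` in the scope, so the values survive the round trip through the browser intact.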
|
||||||
|
|
||||||
|
### Error handling
|
||||||
|
|
||||||
|
`OAuthError(message)` is a module-level exception. The callback route catches it
|
||||||
|
and renders the login page with an error message — identical to an invalid
|
||||||
|
password error in UX terms.
|
||||||
|
|
||||||
|
Network timeouts use a 10-second `aiohttp` timeout. Any non-2xx response from
|
||||||
|
Gitea raises `OAuthError`.
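
The non-2xx rule can be sketched independently of the HTTP client. `parse_token_response` below is a hypothetical helper, not a function named in this spec; in the real module the status and JSON body would come from the `aiohttp` response inside `exchange_code`, with the 10-second limit presumably set through `aiohttp.ClientTimeout(total=10)`.

```python
class OAuthError(Exception):
    """Raised when any step of the OAuth2 exchange fails."""


def parse_token_response(status: int, payload: dict) -> str:
    """Hypothetical helper: map a token-endpoint reply to a token or an OAuthError."""
    if not 200 <= status < 300:
        raise OAuthError(f"token endpoint returned HTTP {status}")
    token = payload.get("access_token")
    if not token:
        # A 2xx reply without a token is still a failed exchange.
        raise OAuthError("token endpoint response has no access_token")
    return token
```

Funnelling every failure into one exception type is what lets the callback handler treat all OAuth problems the same way.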
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Change: `hbd/server/users.py`
|
||||||
|
|
||||||
|
One new function added to the public API:
|
||||||
|
|
||||||
|
```python
|
||||||
|
def provision_oauth_user(username: str, full_name: str, avatar: str) -> User:
|
||||||
|
```
|
||||||
|
|
||||||
|
- If the username does not exist in the live `users` dict, creates a `User`
|
||||||
|
with no `password_hash` (so password login is impossible for this account)
|
||||||
|
and inserts it.
|
||||||
|
- If the username already exists (e.g. was defined in config with a password),
|
||||||
|
updates `full_name` and `avatar` from the OAuth profile and returns the
|
||||||
|
existing user unchanged in all other respects (preserving admin flag,
|
||||||
|
notification channels, etc.).
|
||||||
|
- Logs a one-line INFO message on first provision.
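
The provisioning rules above can be sketched as follows. The `User` dataclass here is a stand-in with assumed field names (`admin`, `notification_channels`, and so on), and the logger name is hypothetical; the real class in `users.py` may differ. Skipping empty profile values is an assumption taken from the plan's self-review notes.

```python
import logging
from dataclasses import dataclass, field
from typing import Dict, List, Optional

logger = logging.getLogger("hbd.users")  # hypothetical logger name


@dataclass
class User:  # stand-in for the real User class; field names are assumptions
    username: str
    full_name: str = ""
    avatar: str = ""
    password_hash: Optional[str] = None
    admin: bool = False
    notification_channels: List[str] = field(default_factory=list)


users: Dict[str, User] = {}  # stand-in for the live users dict


def provision_oauth_user(username: str, full_name: str, avatar: str) -> User:
    """Create the user on first OAuth login, otherwise refresh profile fields."""
    user = users.get(username)
    if user is None:
        # No password_hash, so password login cannot succeed for this account.
        user = User(username=username)
        users[username] = user
        logger.info("provisioned OAuth user %s", username)
    # Skip empty values so an existing name or avatar is not erased.
    if full_name:
        user.full_name = full_name
    if avatar:
        user.avatar = avatar
    return user
```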
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Changes: `hbd/server/http.py`
|
||||||
|
|
||||||
|
### Two new route handlers
|
||||||
|
|
||||||
|
**`GET /login/oauth/gitea`**
|
||||||
|
|
||||||
|
1. Checks `oauth.is_enabled(config)` — returns 404 if not.
|
||||||
|
2. Calls `oauth.make_state()`.
|
||||||
|
3. Constructs `redirect_uri` as `{request.url.origin()}/login/oauth/gitea/callback`. In aiohttp, `request.url` is a yarl `URL`, and `origin()` keeps only the scheme, host, and port.
|
||||||
|
4. Redirects the browser to `oauth.authorization_url(config, state, redirect_uri)`.
|
||||||
|
|
||||||
|
**`GET /login/oauth/gitea/callback`**
|
||||||
|
|
||||||
|
1. Reads `code` and `state` query params; returns 400 if either is missing.
|
||||||
|
2. Calls `oauth.validate_state(state)` — redirects to `/login` with error if
|
||||||
|
invalid (CSRF or replay protection).
|
||||||
|
3. Reconstructs the same `redirect_uri` as the redirect handler (required by OAuth2 spec for token exchange).
|
||||||
|
4. Calls `await oauth.exchange_code(config, code, redirect_uri)` to get the access token.
|
||||||
|
5. Calls `await oauth.fetch_user(config, token)` to get the Gitea user profile.
6. Calls `users_mod.provision_oauth_user(login, full_name, avatar_url)`.
7. Calls `users_mod.create_session(username)` to get a session token.
8. Sets `hbd_session` cookie (same flags as password login: httponly, Lax,
   24h TTL).
9. Redirects to `/`.
10. Any `OAuthError` re-renders the login page with a generic error message.
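
To make the cookie flags in the list above concrete, the sketch below builds the equivalent `Set-Cookie` value with the standard library. It is for illustration only: the handler itself sets the cookie through aiohttp's `set_cookie`, `session_cookie_header` is a hypothetical name, and `Path=/` is an assumption.

```python
from http.cookies import SimpleCookie

SESSION_COOKIE = "hbd_session"
SESSION_TTL = 24 * 60 * 60  # 24 hours, in seconds


def session_cookie_header(token: str) -> str:
    """Return the Set-Cookie value for a session token (illustrative helper)."""
    cookie = SimpleCookie()
    cookie[SESSION_COOKIE] = token
    morsel = cookie[SESSION_COOKIE]
    morsel["max-age"] = SESSION_TTL
    morsel["httponly"] = True    # not readable from page JavaScript
    morsel["samesite"] = "Lax"   # still sent on the top-level redirect back from Gitea
    morsel["path"] = "/"         # assumption: cookie applies to the whole site
    return morsel.OutputString()
```

`SameSite=Lax` rather than `Strict` matters here because the session begins right after a cross-site redirect from Gitea back to hbd.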
|
||||||
|
|
||||||
|
### Login page change
|
||||||
|
|
||||||
|
When `oauth.is_enabled(config)` is `True`, the existing login form gains a
|
||||||
|
separator and a "Sign in with Gitea" link button pointing to
|
||||||
|
`/login/oauth/gitea`. The password form is always rendered regardless.
|
||||||
|
|
||||||
|
### Route registration
|
||||||
|
|
||||||
|
```python
|
||||||
|
web.get("/login/oauth/gitea", oauth_gitea_redirect),
web.get("/login/oauth/gitea/callback", oauth_gitea_callback),
|
||||||
|
```
|
||||||
|
|
||||||
|
Added alongside the existing `/login` and `/logout` routes.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Data flow
|
||||||
|
|
||||||
|
```
|
||||||
|
Browser                    hbd                        Gitea
|                          |                           |
|-- GET /login ----------->|                           |
|<- login page (+ button) -|                           |
|                          |                           |
|-- GET /login/oauth/gitea>|                           |
|<- 302 Gitea /authorize --|                           |
|                          |                           |
|-- GET /login/oauth/authorize ----------------------->|
|<- 302 /login/oauth/gitea/callback?code=..&state=.. --|
|                          |                           |
|-- GET /callback -------->|                           |
|                          |-- POST /access_token ---->|
|                          |<- {access_token} ---------|
|                          |-- GET /api/v1/user ------>|
|                          |<- {login, name, avatar} --|
|                          |   provision_oauth_user()  |
|                          |   create_session()        |
|<- 302 / (set cookie) ----|                           |
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Testing
|
||||||
|
|
||||||
|
- `test_oauth_state`: `make_state` + `validate_state` happy path; expired state
|
||||||
|
returns False; replay (double-use) returns False.
|
||||||
|
- `test_provision_oauth_user_new`: new username creates User with no password.
|
||||||
|
- `test_provision_oauth_user_existing`: existing config user updates name/avatar,
|
||||||
|
preserves admin flag and notification_channels.
|
||||||
|
- `test_oauth_callback_invalid_state`: callback with bad state redirects to login.
|
||||||
|
- Integration: mock Gitea endpoints with `aiohttp_client` fixture; full
|
||||||
|
redirect → callback → session cookie flow.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Out of scope
|
||||||
|
|
||||||
|
- Restricting login to specific Gitea organisations or teams.
|
||||||
|
- Making OAuth users admin automatically.
|
||||||
|
- Multiple OAuth providers.
|
||||||
|
- Token refresh (Gitea access tokens are long-lived; the hbd session TTL governs
|
||||||
|
re-authentication).
|
||||||
+1
-1
@@ -14,4 +14,4 @@ Install options:
|
|||||||
"""
|
"""
|
||||||
|
|
||||||
__all__ = ["__version__"]
|
__all__ = ["__version__"]
|
||||||
__version__ = "5.1.4"
|
__version__ = "5.2.5"
|
||||||
|
|||||||
@@ -15,12 +15,15 @@ CLIENT_DEFAULTS = {
|
|||||||
# Network settings
|
# Network settings
|
||||||
"hb_port": 50003, # Port where hbd servers listen
|
"hb_port": 50003, # Port where hbd servers listen
|
||||||
"interval": 10, # Heartbeat interval in seconds
|
"interval": 10, # Heartbeat interval in seconds
|
||||||
|
|
||||||
|
# Host identity
|
||||||
|
"owner": None, # Optional username to set as this host's owner on the server
|
||||||
|
|
||||||
# Runtime flags
|
# Runtime flags
|
||||||
"foreground": False,
|
"foreground": False,
|
||||||
"verbose": False,
|
"verbose": False,
|
||||||
"debug": 0,
|
"debug": 0,
|
||||||
|
|
||||||
# Plugin configuration
|
# Plugin configuration
|
||||||
"plugins": {}, # Per-plugin configuration
|
"plugins": {}, # Per-plugin configuration
|
||||||
"thresholds": {}, # Threshold configuration for monitoring
|
"thresholds": {}, # Threshold configuration for monitoring
|
||||||
|
|||||||
+163
-97
@@ -14,7 +14,6 @@ import signal
|
|||||||
import socket
|
import socket
|
||||||
import sys
|
import sys
|
||||||
import time
|
import time
|
||||||
from hashlib import md5
|
|
||||||
from logging.handlers import SysLogHandler
|
from logging.handlers import SysLogHandler
|
||||||
from pathlib import Path
|
from pathlib import Path
|
||||||
from typing import Dict, List, Optional
|
from typing import Dict, List, Optional
|
||||||
@@ -22,6 +21,7 @@ from typing import Dict, List, Optional
|
|||||||
# Import protocol and config
|
# Import protocol and config
|
||||||
from .config import load_config
|
from .config import load_config
|
||||||
from ..common.proto import dicttos, stodict
|
from ..common.proto import dicttos, stodict
|
||||||
|
from .. import __version__
|
||||||
|
|
||||||
# Import plugin system
|
# Import plugin system
|
||||||
from .plugin import PluginRegistry, PluginLoader, InfoPlugin, MonitorPlugin
|
from .plugin import PluginRegistry, PluginLoader, InfoPlugin, MonitorPlugin
|
||||||
@@ -56,23 +56,28 @@ class AsyncConnection:
|
|||||||
|
|
||||||
self.transport: Optional[asyncio.DatagramTransport] = None
|
self.transport: Optional[asyncio.DatagramTransport] = None
|
||||||
self.protocol: Optional[asyncio.DatagramProtocol] = None
|
self.protocol: Optional[asyncio.DatagramProtocol] = None
|
||||||
|
self._dead = False
|
||||||
|
self._ever_opened = False
|
||||||
|
self._open_fail_count = 0 # consecutive failures before first success
|
||||||
|
self.request_info_event: asyncio.Event = asyncio.Event()
|
||||||
|
|
||||||
self.logger = logging.getLogger(f"hbc.conn.{addr}")
|
self.logger = logging.getLogger(f"hbc.conn.{addr}")
|
||||||
|
|
||||||
async def open(self) -> bool:
|
async def open(self) -> bool:
|
||||||
"""Open the UDP connection.
|
"""Open the UDP connection.
|
||||||
|
|
||||||
Returns:
|
Returns:
|
||||||
True if successful, False otherwise
|
True if successful, False otherwise
|
||||||
"""
|
"""
|
||||||
try:
|
try:
|
||||||
loop = asyncio.get_event_loop()
|
loop = asyncio.get_event_loop()
|
||||||
|
|
||||||
# Create datagram endpoint
|
# Create datagram endpoint
|
||||||
self.transport, self.protocol = await loop.create_datagram_endpoint(
|
self.transport, self.protocol = await loop.create_datagram_endpoint(
|
||||||
lambda: HeartbeatProtocol(self),
|
lambda: HeartbeatProtocol(self),
|
||||||
family=self.af
|
family=self.af
|
||||||
)
|
)
|
||||||
|
self._ever_opened = True
|
||||||
self.logger.debug(f"Opened connection to {self.addr}:{self.port}")
|
self.logger.debug(f"Opened connection to {self.addr}:{self.port}")
|
||||||
return True
|
return True
|
||||||
except Exception as e:
|
except Exception as e:
|
||||||
@@ -93,9 +98,12 @@ class AsyncConnection:
|
|||||||
msg: Message dictionary
|
msg: Message dictionary
|
||||||
msg_id: Message ID (HTB, PLG, etc.)
|
msg_id: Message ID (HTB, PLG, etc.)
|
||||||
"""
|
"""
|
||||||
|
if self._dead:
|
||||||
|
return
|
||||||
|
|
||||||
if not self.transport:
|
if not self.transport:
|
||||||
await self.open()
|
await self.open()
|
||||||
|
|
||||||
if not self.transport:
|
if not self.transport:
|
||||||
self.logger.error("Cannot send - no transport")
|
self.logger.error("Cannot send - no transport")
|
||||||
return
|
return
|
||||||
@@ -131,6 +139,9 @@ class AsyncConnection:
|
|||||||
|
|
||||||
self.ackcount += 1
|
self.ackcount += 1
|
||||||
self.logger.debug(f"ACK received, RTT: {rtt:.1f}ms")
|
self.logger.debug(f"ACK received, RTT: {rtt:.1f}ms")
|
||||||
|
if msg.get("request_update"):
|
||||||
|
self.logger.info("server requested plugin info refresh")
|
||||||
|
self.request_info_event.set()
|
||||||
|
|
||||||
|
|
||||||
class HeartbeatProtocol(asyncio.DatagramProtocol):
|
class HeartbeatProtocol(asyncio.DatagramProtocol):
|
||||||
@@ -166,8 +177,9 @@ class HeartbeatProtocol(asyncio.DatagramProtocol):
|
|||||||
self.logger.error(f"Error processing datagram: {e}", exc_info=True)
|
self.logger.error(f"Error processing datagram: {e}", exc_info=True)
|
||||||
|
|
||||||
def error_received(self, exc):
|
def error_received(self, exc):
|
||||||
"""Handle protocol errors."""
|
"""Handle protocol errors — close transport so the heartbeat sender retries."""
|
||||||
self.logger.error(f"Protocol error: {exc}")
|
self.logger.warning(f"Protocol error on {self.connection.addr}: {exc} — will retry")
|
||||||
|
self.connection.close()
|
||||||
|
|
||||||
|
|
||||||
async def handle_command(conn: AsyncConnection, msg: dict):
|
async def handle_command(conn: AsyncConnection, msg: dict):
|
||||||
@@ -204,55 +216,52 @@ async def handle_command(conn: AsyncConnection, msg: dict):
|
|||||||
await conn.sendto(response)
|
await conn.sendto(response)
|
||||||
|
|
||||||
|
|
||||||
async def handle_update(conn: AsyncConnection, msg: dict):
|
async def handle_update(conn: AsyncConnection, _msg: dict): # pyright: ignore[reportUnusedParameter]
|
||||||
"""Handle self-update from server."""
|
"""Handle self-update by running hb_install.sh."""
|
||||||
import codecs
|
|
||||||
import shutil
|
import shutil
|
||||||
|
|
||||||
logger = logging.getLogger("hbc.update")
|
logger = logging.getLogger("hbc.update")
|
||||||
|
|
||||||
|
installer = shutil.which("hb_install.sh")
|
||||||
|
if installer is None:
|
||||||
|
candidate = Path(sys.argv[0]).parent / "hb_install.sh"
|
||||||
|
if candidate.exists():
|
||||||
|
installer = str(candidate)
|
||||||
|
|
||||||
|
if installer is None:
|
||||||
|
error = "hb_install.sh not found in PATH or alongside hbc"
|
||||||
|
logger.error(error)
|
||||||
|
await conn.sendto({"service": "update", "msg": error})
|
||||||
|
return
|
||||||
|
|
||||||
|
logger.info(f"Running installer: {installer}")
|
||||||
try:
|
try:
|
||||||
code = codecs.decode(msg["code"], "base64").decode()
|
proc = await asyncio.create_subprocess_exec(
|
||||||
csum = msg["csum"]
|
installer, "client",
|
||||||
|
stdout=asyncio.subprocess.PIPE,
|
||||||
|
stderr=asyncio.subprocess.STDOUT,
|
||||||
|
)
|
||||||
|
out, _ = await asyncio.wait_for(proc.communicate(), timeout=120)
|
||||||
|
except asyncio.TimeoutError:
|
||||||
|
error = "Installer timed out"
|
||||||
|
logger.error(error)
|
||||||
|
await conn.sendto({"service": "update", "msg": error})
|
||||||
|
return
|
||||||
except Exception as e:
|
except Exception as e:
|
||||||
error = f"Missing code/csum: {e}"
|
error = f"Installer failed: {e}"
|
||||||
logger.error(error)
|
logger.error(error)
|
||||||
await conn.sendto({"service": "update", "msg": error})
|
await conn.sendto({"service": "update", "msg": error})
|
||||||
return
|
return
|
||||||
|
|
||||||
# Verify checksum
|
if proc.returncode != 0:
|
||||||
m = md5()
|
error = f"Installer exited {proc.returncode}: {out.decode().strip()}"
|
||||||
m.update(code.encode())
|
|
||||||
if m.hexdigest() != csum:
|
|
||||||
error = "Checksum mismatch"
|
|
||||||
logger.error(error)
|
logger.error(error)
|
||||||
await conn.sendto({"service": "update", "msg": error})
|
await conn.sendto({"service": "update", "msg": error})
|
||||||
return
|
return
|
||||||
|
|
||||||
# Backup current file
|
|
||||||
fn = sys.argv[0]
|
|
||||||
ofn = f"{fn}.sav"
|
|
||||||
try:
|
|
||||||
shutil.copy2(fn, ofn)
|
|
||||||
except Exception as e:
|
|
||||||
error = f"Backup failed: {e}"
|
|
||||||
logger.error(error)
|
|
||||||
await conn.sendto({"service": "update", "msg": error})
|
|
||||||
return
|
|
||||||
|
|
||||||
# Write new code
|
|
||||||
try:
|
|
||||||
with open(fn, "w") as fh:
|
|
||||||
fh.write(code)
|
|
||||||
except Exception as e:
|
|
||||||
error = f"Write failed: {e}"
|
|
||||||
logger.error(error)
|
|
||||||
await conn.sendto({"service": "update", "msg": error})
|
|
||||||
return
|
|
||||||
|
|
||||||
logger.info("Update successful, restart required")
|
logger.info("Update successful, restart required")
|
||||||
await conn.sendto({"service": "update", "msg": "OK"})
|
await conn.sendto({"service": "update", "msg": "OK"})
|
||||||
|
|
||||||
# Trigger restart
|
# Trigger restart
|
||||||
global dorestart
|
global dorestart
|
||||||
dorestart = True
|
dorestart = True
|
||||||
@@ -260,15 +269,51 @@ async def handle_update(conn: AsyncConnection, msg: dict):
|
|||||||
|
|
||||||
|
|
||||||
async def heartbeat_sender(conn: AsyncConnection, interval: int):
|
async def heartbeat_sender(conn: AsyncConnection, interval: int):
|
||||||
"""Send periodic heartbeats.
|
"""Send periodic heartbeats, retrying the connection if it is not open.
|
||||||
|
|
||||||
|
IPv6 connections that fail to open before their first successful send are
|
||||||
|
dropped after IPV6_EARLY_FAIL_LIMIT attempts so that a network without IPv6
|
||||||
|
does not keep a dead sender alive. IPv4 connections are retried indefinitely.
|
||||||
|
|
||||||
Args:
|
Args:
|
||||||
conn: Connection to send on
|
conn: Connection to send on
|
||||||
interval: Heartbeat interval in seconds
|
interval: Heartbeat interval in seconds
|
||||||
"""
|
"""
|
||||||
logger = logging.getLogger("hbc.heartbeat")
|
logger = logging.getLogger("hbc.heartbeat")
|
||||||
|
IPV6_EARLY_FAIL_LIMIT = 3
|
||||||
while running:
|
|
||||||
|
while running and not conn._dead:
|
||||||
|
# Ensure transport is open before attempting to send.
|
||||||
|
if not conn.transport:
|
||||||
|
opened = await conn.open()
|
||||||
|
if opened:
|
||||||
|
conn._open_fail_count = 0
|
||||||
|
else:
|
||||||
|
conn._open_fail_count += 1
|
||||||
|
# Drop an IPv6 connection that has never come up within the
|
||||||
|
# first few attempts — it is likely unavailable on this network.
|
||||||
|
if (not conn._ever_opened
|
||||||
|
and conn.af == socket.AF_INET6
|
||||||
|
and conn._open_fail_count >= IPV6_EARLY_FAIL_LIMIT):
|
||||||
|
logger.warning(
|
||||||
|
f"IPv6 connection to {conn.addr} unreachable after "
|
||||||
|
f"{conn._open_fail_count} attempts, disabling"
|
||||||
|
)
|
||||||
|
conn._dead = True
|
||||||
|
break
|
||||||
|
# Retry after the normal interval; IPv4 retries forever.
|
||||||
|
try:
|
||||||
|
if shutdown_event:
|
||||||
|
await asyncio.wait_for(shutdown_event.wait(), timeout=interval)
|
||||||
|
break
|
||||||
|
else:
|
||||||
|
await asyncio.sleep(interval)
|
||||||
|
except asyncio.TimeoutError:
|
||||||
|
pass
|
||||||
|
except asyncio.CancelledError:
|
||||||
|
raise
|
||||||
|
continue
|
||||||
|
|
||||||
try:
|
try:
|
||||||
msg = {
|
msg = {
|
||||||
"acks": conn.ackcount,
|
"acks": conn.ackcount,
|
||||||
@@ -276,20 +321,17 @@ async def heartbeat_sender(conn: AsyncConnection, interval: int):
|
|||||||
"interval": interval
|
"interval": interval
|
||||||
}
|
}
|
||||||
await conn.sendto(msg, "HTB")
|
await conn.sendto(msg, "HTB")
|
||||||
|
|
||||||
except Exception as e:
|
|
||||||
logger.error(f"Error sending heartbeat: {e}", exc_info=True)
|
|
||||||
except asyncio.CancelledError:
|
except asyncio.CancelledError:
|
||||||
logger.debug("Heartbeat sender cancelled")
|
logger.debug("Heartbeat sender cancelled")
|
||||||
raise
|
raise
|
||||||
|
except Exception as e:
|
||||||
|
logger.error(f"Error sending heartbeat: {e}", exc_info=True)
|
||||||
|
|
||||||
# Wait for next interval or shutdown event
|
# Wait for next interval or shutdown event
|
||||||
try:
|
try:
|
||||||
if shutdown_event:
|
if shutdown_event:
|
||||||
await asyncio.wait_for(
|
await asyncio.wait_for(shutdown_event.wait(), timeout=interval)
|
||||||
shutdown_event.wait(),
|
|
||||||
timeout=interval
|
|
||||||
)
|
|
||||||
break
|
break
|
||||||
else:
|
else:
|
||||||
await asyncio.sleep(interval)
|
await asyncio.sleep(interval)
|
||||||
@@ -300,15 +342,35 @@ async def heartbeat_sender(conn: AsyncConnection, interval: int):
|
|||||||
raise
|
raise
|
||||||
|
|
||||||
|
|
||||||
|
async def _info_plugin_refresh_loop(conn: AsyncConnection, info_plugins: List):
|
||||||
|
"""Wait for server requests to re-send InfoPlugin data."""
|
||||||
|
logger = logging.getLogger("hbc.plugins")
|
||||||
|
while running:
|
||||||
|
await conn.request_info_event.wait()
|
||||||
|
if not running:
|
||||||
|
break
|
||||||
|
conn.request_info_event.clear()
|
||||||
|
logger.info("refreshing InfoPlugins on server request")
|
||||||
|
for plugin in info_plugins:
|
||||||
|
plugin._cache = None
|
||||||
|
try:
|
||||||
|
data = await plugin.collect()
|
||||||
|
if data:
|
||||||
|
await conn.sendto({"plugin": plugin.name, **data}, "PLG")
|
||||||
|
logger.info(f"Resent {plugin.name} data")
|
||||||
|
except Exception as e:
|
||||||
|
logger.error(f"Error re-collecting {plugin.name}: {e}", exc_info=True)
|
||||||
|
|
||||||
|
|
||||||
async def plugin_collector(conn: AsyncConnection, registry: PluginRegistry):
|
async def plugin_collector(conn: AsyncConnection, registry: PluginRegistry):
|
||||||
"""Collect and send plugin data.
|
"""Collect and send plugin data.
|
||||||
|
|
||||||
Args:
|
Args:
|
||||||
conn: Connection to send on
|
conn: Connection to send on
|
||||||
registry: Plugin registry
|
registry: Plugin registry
|
||||||
"""
|
"""
|
||||||
logger = logging.getLogger("hbc.plugins")
|
logger = logging.getLogger("hbc.plugins")
|
||||||
|
|
||||||
# Collect InfoPlugins once at startup
|
# Collect InfoPlugins once at startup
|
||||||
info_plugins = registry.get_by_type(InfoPlugin)
|
info_plugins = registry.get_by_type(InfoPlugin)
|
||||||
for plugin in info_plugins:
|
for plugin in info_plugins:
|
||||||
@@ -321,34 +383,31 @@ async def plugin_collector(conn: AsyncConnection, registry: PluginRegistry):
|
|||||||
logger.info(f"Sent {plugin.name} data")
|
logger.info(f"Sent {plugin.name} data")
|
||||||
except Exception as e:
|
except Exception as e:
|
||||||
logger.error(f"Error collecting {plugin.name}: {e}", exc_info=True)
|
logger.error(f"Error collecting {plugin.name}: {e}", exc_info=True)
|
||||||
|
|
||||||
# Schedule MonitorPlugins
|
# Schedule MonitorPlugins
|
||||||
# Group plugins by interval
|
# Group plugins by interval
|
||||||
from collections import defaultdict
|
from collections import defaultdict
|
||||||
by_interval = defaultdict(list)
|
by_interval = defaultdict(list)
|
||||||
|
|
||||||
monitor_plugins = registry.get_by_type(MonitorPlugin)
|
monitor_plugins = registry.get_by_type(MonitorPlugin)
|
||||||
for plugin in monitor_plugins:
|
for plugin in monitor_plugins:
|
||||||
by_interval[plugin.interval].append(plugin)
|
by_interval[plugin.interval].append(plugin)
|
||||||
|
|
||||||
# Create tasks for each interval
|
# Create tasks for each interval; always include the info-refresh watcher
|
||||||
tasks = []
|
tasks = [asyncio.create_task(_info_plugin_refresh_loop(conn, info_plugins))]
|
||||||
for interval, plugins in by_interval.items():
|
for interval, plugins in by_interval.items():
|
||||||
task = asyncio.create_task(
|
tasks.append(asyncio.create_task(
|
||||||
plugin_collector_interval(conn, plugins, interval)
|
plugin_collector_interval(conn, plugins, interval)
|
||||||
)
|
))
|
||||||
tasks.append(task)
|
|
||||||
|
try:
|
||||||
# Wait for all tasks
|
await asyncio.gather(*tasks, return_exceptions=True)
|
||||||
if tasks:
|
except asyncio.CancelledError:
|
||||||
try:
|
logger.debug("Plugin collector cancelled, cancelling sub-tasks")
|
||||||
await asyncio.gather(*tasks, return_exceptions=True)
|
for task in tasks:
|
||||||
except asyncio.CancelledError:
|
if not task.done():
|
||||||
logger.debug("Plugin collector cancelled, cancelling sub-tasks")
|
task.cancel()
|
||||||
for task in tasks:
|
raise
|
||||||
if not task.done():
|
|
||||||
task.cancel()
|
|
||||||
raise
|
|
||||||
|
|
||||||
|
|
||||||
async def plugin_collector_interval(
|
async def plugin_collector_interval(
|
||||||
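The cancellation handling in the hunk above can be tried in isolation. This is a minimal sketch rather than the project's code: `worker`, `supervisor` and `demo` are hypothetical stand-ins (for `plugin_collector_interval` and `plugin_collector`), and the only thing assumed is standard `asyncio` behaviour.

```python
import asyncio


async def worker(name, log):
    """Hypothetical stand-in for one per-interval collector loop."""
    try:
        while True:
            await asyncio.sleep(0.01)
    except asyncio.CancelledError:
        log.append(name)  # record that this worker saw the cancellation
        raise


async def supervisor(log):
    """Same shape as the new plugin_collector: gather, cancel children, re-raise."""
    tasks = [asyncio.create_task(worker(f"w{i}", log)) for i in range(3)]
    try:
        await asyncio.gather(*tasks, return_exceptions=True)
    except asyncio.CancelledError:
        for task in tasks:
            if not task.done():
                task.cancel()
        raise


async def demo():
    log = []
    sup = asyncio.create_task(supervisor(log))
    await asyncio.sleep(0.05)  # let the workers start
    sup.cancel()
    try:
        await sup
    except asyncio.CancelledError:
        pass
    return sorted(log)


result = asyncio.run(demo())
print(result)
```

Re-raising `CancelledError` after cancelling the children keeps the supervisor task itself marked as cancelled, which is what a caller awaiting it would normally expect.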
@@ -425,16 +484,13 @@ async def cleanup(connections: List[AsyncConnection]):
|
|||||||
logger = logging.getLogger("hbc.cleanup")
|
logger = logging.getLogger("hbc.cleanup")
|
||||||
logger.info("Cleaning up connections")
|
logger.info("Cleaning up connections")
|
||||||
|
|
||||||
for conn in connections:
|
target = next((c for c in connections if c.transport), connections[0] if connections else None)
|
||||||
|
if target and send_shutdown:
|
||||||
try:
|
try:
|
||||||
msg = {
|
await target.sendto({"shutdown": 1, "acks": target.ackcount})
|
||||||
"shutdown": 1,
|
|
||||||
"acks": conn.ackcount
|
|
||||||
}
|
|
||||||
await conn.sendto(msg)
|
|
||||||
except Exception as e:
|
except Exception as e:
|
||||||
logger.error(f"Error sending shutdown: {e}")
|
logger.error(f"Error sending shutdown: {e}")
|
||||||
|
for conn in connections:
|
||||||
conn.close()
|
conn.close()
|
||||||
|
|
||||||
# Give messages time to send
|
# Give messages time to send
|
||||||
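The `next(..., default)` expression used in the new `cleanup` picks a single connection instead of looping over all of them. A small sketch of that selection rule, assuming only that a connection exposes a truthy `transport` once it is open; `FakeConn` and `pick_target` are hypothetical names used for illustration:

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class FakeConn:
    """Hypothetical stand-in for AsyncConnection; only `transport` matters here."""
    name: str
    transport: Optional[object] = None


def pick_target(connections: Sequence[FakeConn]) -> Optional[FakeConn]:
    # First connection with an open transport, else the first connection,
    # else None when the list is empty.
    return next(
        (c for c in connections if c.transport),
        connections[0] if connections else None,
    )


closed = FakeConn("a")
opened = FakeConn("b", transport=object())

print(pick_target([closed, opened]).name)
print(pick_target([closed]).name)
print(pick_target([]))
```

The default argument is evaluated before `next` runs, so the `if connections else None` guard is what keeps an empty list from raising `IndexError`.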
@@ -443,7 +499,7 @@ async def cleanup(connections: List[AsyncConnection]):
|
|||||||
|
|
||||||
async def async_main(args, config):
|
async def async_main(args, config):
|
||||||
"""Async main function."""
|
"""Async main function."""
|
||||||
global running, shutdown_event, active_tasks
|
global running, shutdown_event, active_tasks, send_shutdown
|
||||||
|
|
||||||
# Create shutdown event
|
# Create shutdown event
|
||||||
shutdown_event = asyncio.Event()
|
shutdown_event = asyncio.Event()
|
||||||
@@ -460,8 +516,7 @@ async def async_main(args, config):
|
|||||||
hb_port = config.get("hb_port", PORT)
|
hb_port = config.get("hb_port", PORT)
|
||||||
interval = config.get("interval", INTERVAL)
|
interval = config.get("interval", INTERVAL)
|
||||||
|
|
||||||
logger.info(f"Starting hbc for {iam} -> {hb_hosts}")
|
logger.info(f"hbc {__version__} on {iam} -> {hb_hosts} port={hb_port}, interval={interval}s")
|
||||||
logger.info(f"Port: {hb_port}, Interval: {interval}s")
|
|
||||||
|
|
||||||
# Create connections
|
# Create connections
|
||||||
connections = []
|
connections = []
|
||||||
@@ -477,30 +532,34 @@ async def async_main(args, config):
|
|||||||
for addr_info in addrs:
|
for addr_info in addrs:
|
||||||
af = addr_info[0]
|
af = addr_info[0]
|
||||||
addr = addr_info[4][0]
|
addr = addr_info[4][0]
|
||||||
|
|
||||||
conn = AsyncConnection(conn_id, addr, hb_port, af, iam)
|
conn = AsyncConnection(conn_id, addr, hb_port, af, iam)
|
||||||
if await conn.open():
|
if not await conn.open():
|
||||||
connections.append(conn)
|
logger.warning(f"Initial open to {addr} failed, heartbeat sender will retry")
|
||||||
conn_id += 1
|
connections.append(conn)
|
||||||
|
conn_id += 1
|
||||||
|
|
||||||
if not connections:
|
if not connections:
|
||||||
logger.error("No connections established")
|
logger.error("No connections established (DNS resolution failed for all hosts)")
|
||||||
return 1
|
return 1
|
||||||
|
|
||||||
logger.info(f"Created {len(connections)} connections")
|
logger.info(f"Created {len(connections)} connections")
|
||||||
|
|
||||||
# Send boot/message if requested
|
# Send boot/message if requested
|
||||||
|
send_shutdown = False
|
||||||
if args.boot or args.message:
|
if args.boot or args.message:
|
||||||
boot_msg = {}
|
boot_msg = {}
|
||||||
if args.boot:
|
if args.boot:
|
||||||
boot_msg["boot"] = 1
|
boot_msg["boot"] = 1
|
||||||
|
args.boot = False # Clear boot flag so we don't send it again in main loop
|
||||||
|
send_shutdown = True
|
||||||
if args.message:
|
if args.message:
|
||||||
boot_msg["service"] = "service"
|
boot_msg["service"] = "service"
|
||||||
boot_msg["msg"] = args.message
|
boot_msg["msg"] = args.message
|
||||||
|
|
||||||
boot_msg["acks"] = 0
|
boot_msg["acks"] = 0
|
||||||
for conn in connections:
|
target = next((c for c in connections if c.transport), connections[0])
|
||||||
await conn.sendto(boot_msg)
|
await target.sendto(boot_msg)
|
||||||
|
|
||||||
if args.message and not args.daemon:
|
if args.message and not args.daemon:
|
||||||
# Message-only mode
|
# Message-only mode
|
||||||
@@ -522,6 +581,13 @@ async def async_main(args, config):
|
|||||||
loop = asyncio.get_event_loop()
|
loop = asyncio.get_event_loop()
|
||||||
for sig in (signal.SIGTERM, signal.SIGINT):
|
for sig in (signal.SIGTERM, signal.SIGINT):
|
||||||
loop.add_signal_handler(sig, stop)
|
loop.add_signal_handler(sig, stop)
|
||||||
|
|
||||||
|
def _sighup():
|
||||||
|
global dorestart
|
||||||
|
dorestart = True
|
||||||
|
stop()
|
||||||
|
|
||||||
|
loop.add_signal_handler(signal.SIGHUP, _sighup)
|
||||||
|
|
||||||
# Start async tasks
|
# Start async tasks
|
||||||
# Heartbeat senders (one per connection)
|
# Heartbeat senders (one per connection)
|
||||||
@@ -693,7 +759,7 @@ def main(argv=None):
|
|||||||
|
|
||||||
# Daemonize if requested
|
# Daemonize if requested
|
||||||
if args.daemon:
|
if args.daemon:
|
||||||
print("Daemonizing...")
|
logging.info("Daemonizing...")
|
||||||
daemonize()
|
daemonize()
|
||||||
_reconfigure_logging_for_daemon(log_level)
|
_reconfigure_logging_for_daemon(log_level)
|
||||||
logging.info(f"hbc starting, sending heartbeat to {', '.join(args.hosts)}")
|
logging.info(f"hbc starting, sending heartbeat to {', '.join(args.hosts)}")
|
||||||
|
|||||||
@@ -364,7 +364,10 @@ class PluginLoader:
|
|||||||
|
|
||||||
# Instantiate plugin with config — check plugins subdict first,
|
# Instantiate plugin with config — check plugins subdict first,
|
||||||
# then top-level keys (e.g. nagios_runner: ... at root of config).
|
# then top-level keys (e.g. nagios_runner: ... at root of config).
|
||||||
plugin_instance_config = plugins_subconfig.get(obj.name) or raw_config.get(obj.name, {})
|
plugin_instance_config = dict(plugins_subconfig.get(obj.name) or raw_config.get(obj.name) or {})
|
||||||
|
# Propagate top-level owner so os_info (and any future plugin) can report it.
|
||||||
|
if "owner" in raw_config and "owner" not in plugin_instance_config:
|
||||||
|
plugin_instance_config["owner"] = raw_config["owner"]
|
||||||
plugin = obj(config=plugin_instance_config)
|
plugin = obj(config=plugin_instance_config)
|
||||||
|
|
||||||
# Initialize plugin
|
# Initialize plugin
|
||||||
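The config lookup in this hunk (plugins subdict first, then top-level key, then an empty dict, with `owner` inherited from the root) can be sketched as a standalone function. `resolve_plugin_config` is a hypothetical helper, and the `"plugins"` key name is an assumption taken from the comment in the diff:

```python
from typing import Any, Dict


def resolve_plugin_config(name: str, raw_config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a private copy of the config for plugin `name`."""
    plugins_subconfig = raw_config.get("plugins") or {}
    # dict(...) copies, so adding "owner" below never mutates raw_config.
    cfg = dict(plugins_subconfig.get(name) or raw_config.get(name) or {})
    if "owner" in raw_config and "owner" not in cfg:
        cfg["owner"] = raw_config["owner"]
    return cfg


raw = {
    "owner": "alice",
    "plugins": {"os_info": {"interval": 60}},
    "nagios_runner": {"commands": []},
}

print(resolve_plugin_config("os_info", raw))
print(resolve_plugin_config("nagios_runner", raw))
print(resolve_plugin_config("missing", raw))
```

Copying with `dict(...)` matters because the same sub-dictionary would otherwise gain an `owner` key as a side effect of loading the plugin.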
|
|||||||
@@ -118,6 +118,13 @@ class CPUMonitorPlugin(MonitorPlugin):
|
|||||||
data["cpu_iowait"] = round(cpu_times.iowait, 1)
|
data["cpu_iowait"] = round(cpu_times.iowait, 1)
|
||||||
except Exception as e:
|
except Exception as e:
|
||||||
self.logger.debug(f"Could not get CPU times: {e}")
|
self.logger.debug(f"Could not get CPU times: {e}")
|
||||||
|
|
||||||
|
# Uptime in seconds
|
||||||
|
try:
|
||||||
|
import time
|
||||||
|
data["uptime_seconds"] = int(time.time() - self.psutil.boot_time())
|
||||||
|
except Exception as e:
|
||||||
|
self.logger.debug(f"Could not get uptime: {e}")
|
||||||
|
|
||||||
self.logger.debug(
|
self.logger.debug(
|
||||||
f"Collected CPU metrics: {data.get('cpu_percent', 'N/A')}% usage"
|
f"Collected CPU metrics: {data.get('cpu_percent', 'N/A')}% usage"
|
||||||
|
|||||||
@@ -14,6 +14,24 @@ except ImportError:
|
|||||||
|
|
||||||
from hbd.client.plugin import MonitorPlugin
|
from hbd.client.plugin import MonitorPlugin
|
||||||
|
|
||||||
|
|
||||||
|
def _zfs_arc_bytes() -> int:
|
||||||
|
"""Return current ZFS ARC size in bytes, or 0 if ZFS is not present.
|
||||||
|
|
||||||
|
ZFS ARC is reclaimable but is not included in MemAvailable by the Linux
|
||||||
|
kernel (it is not in SReclaimable), so it would otherwise be counted as
|
||||||
|
used memory.
|
||||||
|
"""
|
||||||
|
try:
|
||||||
|
with open("/proc/spl/kstat/zfs/arcstats") as fh:
|
||||||
|
for line in fh:
|
||||||
|
parts = line.split()
|
||||||
|
if len(parts) >= 3 and parts[0] == "size":
|
||||||
|
return int(parts[2])
|
||||||
|
except (OSError, ValueError):
|
||||||
|
pass
|
||||||
|
return 0
|
||||||
|
|
||||||
logger = logging.getLogger(__name__)
|
logger = logging.getLogger(__name__)
|
||||||
|
|
||||||
|
|
||||||
@@ -101,11 +119,21 @@ class MemoryMonitorPlugin(MonitorPlugin):
|
|||||||
|
|
||||||
# Virtual (physical) memory statistics
|
# Virtual (physical) memory statistics
|
||||||
vmem = psutil.virtual_memory()
|
vmem = psutil.virtual_memory()
|
||||||
|
|
||||||
|
# psutil's available already excludes page cache / file buffers
|
||||||
|
# (uses MemAvailable on Linux). Add ZFS ARC on top because the kernel
|
||||||
|
# does not include it in SReclaimable / MemAvailable even though it is
|
||||||
|
# reclaimable.
|
||||||
|
arc_bytes = _zfs_arc_bytes()
|
||||||
|
available = min(vmem.available + arc_bytes, vmem.total)
|
||||||
|
used = vmem.total - available
|
||||||
|
percent = round(used / vmem.total * 100, 1) if vmem.total else 0.0
|
||||||
|
|
||||||
metrics['memory_total'] = vmem.total
|
metrics['memory_total'] = vmem.total
|
||||||
metrics['memory_available'] = vmem.available
|
metrics['memory_available'] = available
|
||||||
metrics['memory_used'] = vmem.used
|
metrics['memory_used'] = used
|
||||||
metrics['memory_free'] = vmem.free
|
metrics['memory_free'] = vmem.free
|
||||||
metrics['memory_percent'] = vmem.percent
|
metrics['memory_percent'] = percent
|
||||||
|
|
||||||
# Platform-specific memory details
|
# Platform-specific memory details
|
||||||
if hasattr(vmem, 'active'):
|
if hasattr(vmem, 'active'):
|
||||||
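The ARC adjustment above is easy to check with plain numbers. The sketch below separates the parsing and the arithmetic so both can be exercised without ZFS or psutil installed; `parse_arc_size`, `adjusted_memory` and the `SAMPLE` text are hypothetical, and the sample only imitates the column layout the diff's parser expects (name, type, value).

```python
from typing import Tuple


def parse_arc_size(arcstats_text: str) -> int:
    """Return the 'size' value from arcstats-style text, or 0 if absent."""
    for line in arcstats_text.splitlines():
        parts = line.split()
        if len(parts) >= 3 and parts[0] == "size":
            try:
                return int(parts[2])
            except ValueError:
                return 0
    return 0


def adjusted_memory(total: int, available: int, arc_bytes: int) -> Tuple[int, int, float]:
    """Count ARC as available, clamped so available never exceeds total."""
    available = min(available + arc_bytes, total)
    used = total - available
    percent = round(used / total * 100, 1) if total else 0.0
    return available, used, percent


# Hypothetical excerpt; real files contain many more rows.
SAMPLE = """\
name                            type data
hits                            4    123456
size                            4    4294967296
"""

GIB = 1024 ** 3
arc = parse_arc_size(SAMPLE)
print(adjusted_memory(16 * GIB, 8 * GIB, arc))
```

With 16 GiB total, 8 GiB reported available and a 4 GiB ARC, the reported usage drops from 50% to 25%.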
|
|||||||
@@ -31,16 +31,13 @@ from hbd.client.plugin import MonitorPlugin
|
|||||||
|
|
||||||
|
|
||||||
# Nagios exit codes
|
# Nagios exit codes
|
||||||
NAGIOS_OK = 0
|
|
||||||
NAGIOS_WARNING = 1
|
|
||||||
NAGIOS_CRITICAL = 2
|
|
||||||
NAGIOS_UNKNOWN = 3
|
NAGIOS_UNKNOWN = 3
|
||||||
|
|
||||||
STATUS_NAMES = {
|
STATUS_NAMES = {
|
||||||
NAGIOS_OK: "OK",
|
0: "OK",
|
||||||
NAGIOS_WARNING: "WARNING",
|
1: "WARNING",
|
||||||
NAGIOS_CRITICAL: "CRITICAL",
|
2: "CRITICAL",
|
||||||
NAGIOS_UNKNOWN: "UNKNOWN"
|
3: "UNKNOWN",
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
@@ -128,52 +125,39 @@ class NagiosRunnerPlugin(MonitorPlugin):
|
|||||||
Dictionary with results from all plugins
|
Dictionary with results from all plugins
|
||||||
"""
|
"""
|
||||||
results = {}
|
results = {}
|
||||||
|
|
||||||
# Track overall status (worst status wins)
|
|
||||||
worst_status = NAGIOS_OK
|
|
||||||
|
|
||||||
for cmd_config in self.commands:
|
for cmd_config in self.commands:
|
||||||
name = cmd_config.get("name")
|
name = cmd_config.get("name")
|
||||||
command = cmd_config.get("command")
|
command = cmd_config.get("command")
|
||||||
|
|
||||||
if not name or not command:
|
if not name or not command:
|
||||||
self.logger.warning("Skipping command with missing name or command")
|
self.logger.warning("Skipping command with missing name or command")
|
||||||
continue
|
continue
|
||||||
|
|
||||||
# Execute plugin
|
# Execute plugin
|
||||||
try:
|
try:
|
||||||
status_code, output, perfdata = await self._run_nagios_plugin(command)
|
status_code, output, perfdata = await self._run_nagios_plugin(command)
|
||||||
|
|
||||||
# Store results
|
# Store results
|
||||||
results[f"{name}_status"] = STATUS_NAMES.get(status_code, "UNKNOWN")
|
results[f"{name}_status"] = STATUS_NAMES.get(status_code, "UNKNOWN")
|
||||||
results[f"{name}_status_code"] = status_code
|
results[f"{name}_status_code"] = status_code
|
||||||
results[f"{name}_output"] = output
|
results[f"{name}_output"] = output
|
||||||
|
|
||||||
# Track worst status
|
|
||||||
if status_code > worst_status:
|
|
||||||
worst_status = status_code
|
|
||||||
|
|
||||||
# Parse and add performance data
|
# Parse and add performance data
|
||||||
if perfdata:
|
if perfdata:
|
||||||
for metric_name, metric_value in perfdata.items():
|
for metric_name, metric_value in perfdata.items():
|
||||||
results[f"{name}_{metric_name}"] = metric_value
|
results[f"{name}_{metric_name}"] = metric_value
|
||||||
|
|
||||||
self.logger.info(
|
self.logger.info(
|
||||||
f"Executed {name}: {STATUS_NAMES.get(status_code, 'UNKNOWN')} - {output[:50]}"
|
f"Executed {name}: {STATUS_NAMES.get(status_code, 'UNKNOWN')} - {output[:50]}"
|
||||||
)
|
)
|
||||||
|
|
||||||
except Exception as e:
|
except Exception as e:
|
||||||
self.logger.error(f"Error running {name}: {e}", exc_info=True)
|
self.logger.error(f"Error running {name}: {e}", exc_info=True)
|
||||||
results[f"{name}_status"] = "ERROR"
|
results[f"{name}_status"] = "ERROR"
|
||||||
results[f"{name}_status_code"] = NAGIOS_UNKNOWN
|
results[f"{name}_status_code"] = NAGIOS_UNKNOWN
|
||||||
results[f"{name}_output"] = str(e)
|
results[f"{name}_output"] = str(e)
|
||||||
worst_status = NAGIOS_UNKNOWN
|
|
||||||
|
|
||||||
# Add overall status
|
|
||||||
results["overall_status"] = STATUS_NAMES.get(worst_status, "UNKNOWN")
|
|
||||||
results["overall_status_code"] = worst_status
|
|
||||||
results["plugin_count"] = len(self.commands)
|
|
||||||
|
|
||||||
return results
|
return results
|
||||||
|
|
||||||
async def _run_nagios_plugin(
|
async def _run_nagios_plugin(
|
||||||
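This change removes `overall_status`, `overall_status_code` and `plugin_count` from the payload. A consumer that still wants a roll-up could derive it from the per-check `<name>_status_code` keys. The following is a sketch under that assumption; `overall_status` is a hypothetical helper, and it keeps the removed code's rule that the numerically highest exit code wins.

```python
from typing import Any, Dict, Tuple

STATUS_NAMES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def overall_status(results: Dict[str, Any]) -> Tuple[int, str]:
    """Worst (highest) status code across all '<name>_status_code' entries."""
    codes = [
        value
        for key, value in results.items()
        if key.endswith("_status_code") and isinstance(value, int)
    ]
    worst = max(codes, default=0)  # no checks at all counts as OK
    return worst, STATUS_NAMES.get(worst, "UNKNOWN")


# Hypothetical payload in the shape the plugin emits.
payload = {
    "disk_status": "OK",
    "disk_status_code": 0,
    "disk_output": "DISK OK",
    "load_status": "CRITICAL",
    "load_status_code": 2,
    "load_output": "CRITICAL - load average: 9.1",
}

print(overall_status(payload))
```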
|
|||||||
@@ -60,7 +60,11 @@ class OSInfoPlugin(InfoPlugin):
|
|||||||
"python_version": platform.python_version(),
|
"python_version": platform.python_version(),
|
||||||
"python_implementation": platform.python_implementation(),
|
"python_implementation": platform.python_implementation(),
|
||||||
"hbc_version": hbc_version,
|
"hbc_version": hbc_version,
|
||||||
|
"hbc_type": "full",
|
||||||
}
|
}
|
||||||
|
if self.config.get("owner"):
|
||||||
|
self.logger.debug(f"Adding owner from config: {self.config['owner']}")
|
||||||
|
data["owner"] = self.config["owner"]
|
||||||
|
|
||||||
# Add Linux-specific distribution info
|
# Add Linux-specific distribution info
|
||||||
if platform.system() == "Linux":
|
if platform.system() == "Linux":
|
||||||
|
|||||||
@@ -13,12 +13,8 @@ plugins:
|
|||||||
count: 3 # ICMP packets per ping run (default 3)
|
count: 3 # ICMP packets per ping run (default 3)
|
||||||
timeout: 5 # seconds before a host is considered unreachable (default 5)
|
timeout: 5 # seconds before a host is considered unreachable (default 5)
|
||||||
hosts:
|
hosts:
|
||||||
8.8.8.8:
|
- 8.8.8.8
|
||||||
warning: 20.0 # ms
|
- 192.168.1.1
|
||||||
critical: 100.0 # ms
|
|
||||||
192.168.1.1:
|
|
||||||
warning: 5.0
|
|
||||||
critical: 20.0
|
|
||||||
```
|
```
|
||||||
|
|
||||||
Reported metrics per host (metric key uses the hostname with dots/colons replaced
|
Reported metrics per host (metric key uses the hostname with dots/colons replaced
|
||||||
|
|||||||
@@ -0,0 +1,140 @@
|
|||||||
|
"""
|
||||||
|
ZFS pool monitoring plugin for Heartbeat.
|
||||||
|
|
||||||
|
Collects per-pool health, capacity, and cumulative I/O statistics via zpool(8).
|
||||||
|
"""
|
||||||
|
|
||||||
|
import asyncio
|
||||||
|
import logging
|
||||||
|
import shutil
|
||||||
|
from typing import Any, Dict, List, Optional
|
||||||
|
|
||||||
|
from hbd.client.plugin import MonitorPlugin
|
||||||
|
|
||||||
|
logger = logging.getLogger(__name__)
|
||||||
|
|
||||||
|
|
||||||
|
def _int(s: str) -> Optional[int]:
|
||||||
|
try:
|
||||||
|
return int(s.strip().rstrip("KMGTkBkmgt%x"))
|
||||||
|
except (ValueError, AttributeError):
|
||||||
|
return None
|
||||||
|
|
||||||
|
|
||||||
|
def _float(s: str) -> Optional[float]:
|
||||||
|
try:
|
||||||
|
return float(s.strip().rstrip("%x"))
|
||||||
|
except (ValueError, AttributeError):
|
||||||
|
return None
|
||||||
|
|
||||||
|
|
||||||
|
class ZFSMonitorPlugin(MonitorPlugin):
|
||||||
|
"""Monitor ZFS pool health, capacity, and I/O statistics.
|
||||||
|
|
||||||
|
Collects per pool:
|
||||||
|
- health: ONLINE, DEGRADED, FAULTED, etc.
|
||||||
|
- size / alloc / free: total, allocated and free bytes
|
||||||
|
- capacity: percentage used (0-100)
|
||||||
|
- frag: fragmentation percentage
|
||||||
|
- dedup: deduplication ratio
|
||||||
|
- read_ops / write_ops: cumulative I/O operations since last boot/clear
|
||||||
|
- read_bw / write_bw: cumulative bytes transferred since last boot/clear
|
||||||
|
|
||||||
|
Configuration:
|
||||||
|
interval: collection interval in seconds (default: 300)
|
||||||
|
pools: list of pool names to monitor (default: all)
|
||||||
|
"""
|
||||||
|
|
||||||
|
name = "zfs_monitor"
|
||||||
|
description = "ZFS pool health, capacity, and I/O statistics"
|
||||||
|
interval = 300
|
||||||
|
|
||||||
|
def __init__(self, config: Optional[Dict[str, Any]] = None):
|
||||||
|
super().__init__(config)
|
||||||
|
self.interval = self.config.get("interval", 300)
|
||||||
|
self._pools_filter: Optional[List[str]] = self.config.get("pools", None)
|
||||||
|
|
||||||
|
async def initialize(self) -> bool:
|
||||||
|
if not shutil.which("zpool"):
|
||||||
|
self.skip_reason = "zpool not found"
|
||||||
|
return False
|
||||||
|
logger.info("ZFS monitor initialized (interval: %ds)", self.interval)
|
||||||
|
return True
|
||||||
|
|
||||||
|
async def _run(self, *args: str) -> List[str]:
|
||||||
|
"""Run a command and return its stdout lines, or [] on error."""
|
||||||
|
try:
|
||||||
|
proc = await asyncio.create_subprocess_exec(
|
||||||
|
*args,
|
||||||
|
stdout=asyncio.subprocess.PIPE,
|
||||||
|
stderr=asyncio.subprocess.DEVNULL,
|
||||||
|
)
|
||||||
|
stdout, _ = await asyncio.wait_for(proc.communicate(), timeout=15)
|
||||||
|
return stdout.decode(errors="replace").splitlines()
|
||||||
|
except (FileNotFoundError, asyncio.TimeoutError) as exc:
|
||||||
|
logger.warning("zfs_monitor: %s: %s", args[0], exc)
|
||||||
|
return []
|
||||||
|
|
||||||
|
async def _zpool_list(self) -> Dict[str, Dict]:
|
||||||
|
"""Return per-pool health and capacity from `zpool list`."""
|
||||||
|
lines = await self._run(
|
||||||
|
"zpool", "list", "-H", "-p",
|
||||||
|
"-o", "name,health,size,alloc,free,cap,frag,dedup",
|
||||||
|
)
|
||||||
|
pools: Dict[str, Dict] = {}
|
||||||
|
for line in lines:
|
||||||
|
parts = line.split("\t")
|
||||||
|
if len(parts) < 8:
|
||||||
|
continue
|
||||||
|
name = parts[0].strip()
|
||||||
|
if self._pools_filter and name not in self._pools_filter:
|
||||||
|
continue
|
||||||
|
health = parts[1].strip()
|
||||||
|
if health == "ONLINE":
|
||||||
|
status = 0
|
||||||
|
elif health in ("DEGRADED", "ONLINE with errors"):
|
||||||
|
status = 1
|
||||||
|
elif health in ("FAULTED", "OFFLINE", "UNAVAIL"):
|
||||||
|
status = 2
|
||||||
|
else:
|
||||||
|
status = 3 # unknown status
|
||||||
|
pools[name] = {
|
||||||
|
"health": health,
|
||||||
|
"status": status,
|
||||||
|
"size": _int(parts[2]),
|
||||||
|
"alloc": _int(parts[3]),
|
||||||
|
"free": _int(parts[4]),
|
||||||
|
"capacity": _float(parts[5]),
|
||||||
|
"frag": _float(parts[6]),
|
||||||
|
"dedup": _float(parts[7]),
|
||||||
|
}
|
||||||
|
return pools
|
||||||
|
|
||||||
|
async def _zpool_iostat(self) -> Dict[str, Dict]:
|
||||||
|
"""Return per-pool cumulative I/O counters from `zpool iostat`."""
|
||||||
|
lines = await self._run("zpool", "iostat", "-H", "-p")
|
||||||
|
io: Dict[str, Dict] = {}
|
||||||
|
for line in lines:
|
||||||
|
parts = line.split("\t")
|
||||||
|
if len(parts) < 7:
|
||||||
|
continue
|
||||||
|
name = parts[0].strip()
|
||||||
|
if not name or name.startswith(" "):
|
||||||
|
continue
|
||||||
|
io[name] = {
|
||||||
|
"read_ops": _int(parts[3]),
|
||||||
|
"write_ops": _int(parts[4]),
|
||||||
|
"read_bw": _int(parts[5]),
|
||||||
|
"write_bw": _int(parts[6]),
|
||||||
|
}
|
||||||
|
return io
|
||||||
|
|
||||||
|
async def _collect_metrics(self) -> Dict[str, Any]:
|
||||||
|
pools, io = await asyncio.gather(self._zpool_list(), self._zpool_iostat())
|
||||||
|
for name, stats in io.items():
|
||||||
|
if name in pools:
|
||||||
|
pools[name].update(stats)
|
||||||
|
return {"pools": pools}
|
||||||
|
|
||||||
|
|
||||||
|
plugin = ZFSMonitorPlugin
|
||||||
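The parsing in `_zpool_list` can be exercised without a ZFS host by feeding it text directly. This is a simplified sketch, not the plugin itself: `parse_zpool_list`, `_num` and `SAMPLE` are hypothetical, and the sample rows are made-up tab-separated values in the column order requested by `-o name,health,size,alloc,free,cap,frag,dedup`.

```python
from typing import Any, Callable, Dict, List, Optional

# Same mapping as the plugin: anything not listed becomes 3 (unknown).
HEALTH_STATUS = {
    "ONLINE": 0,
    "DEGRADED": 1,
    "ONLINE with errors": 1,
    "FAULTED": 2,
    "OFFLINE": 2,
    "UNAVAIL": 2,
}


def _num(text: str, cast: Callable[[str], Any]) -> Optional[Any]:
    """Convert a field to a number, returning None for placeholders like '-'."""
    try:
        return cast(text.strip().rstrip("%x"))
    except ValueError:
        return None


def parse_zpool_list(text: str, pools_filter: Optional[List[str]] = None) -> Dict[str, Dict]:
    pools: Dict[str, Dict] = {}
    for line in text.splitlines():
        parts = line.split("\t")
        if len(parts) < 8:
            continue  # skip malformed or empty lines
        name = parts[0].strip()
        if pools_filter and name not in pools_filter:
            continue
        health = parts[1].strip()
        pools[name] = {
            "health": health,
            "status": HEALTH_STATUS.get(health, 3),
            "size": _num(parts[2], int),
            "alloc": _num(parts[3], int),
            "free": _num(parts[4], int),
            "capacity": _num(parts[5], float),
            "frag": _num(parts[6], float),
            "dedup": _num(parts[7], float),
        }
    return pools


# Made-up rows for illustration only.
SAMPLE = (
    "tank\tONLINE\t1000\t250\t750\t25\t3\t1.00\n"
    "backup\tDEGRADED\t2000\t1000\t1000\t50\t-\t1.00\n"
)

parsed = parse_zpool_list(SAMPLE)
print(parsed["tank"]["status"], parsed["backup"]["status"], parsed["backup"]["frag"])
```

Returning `None` for unparseable fields keeps a single odd column from discarding the whole pool entry.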
@@ -134,6 +134,30 @@ thresholds:
|
|||||||
hysteresis: 0.1
|
hysteresis: 0.1
|
||||||
enabled: true
|
enabled: true
|
||||||
|
|
||||||
|
# ----------------------------------------------------------------------------
|
||||||
|
# ZFS Monitor Thresholds
|
||||||
|
# ----------------------------------------------------------------------------
|
||||||
|
zfs_monitor:
|
||||||
|
# Pool health check — built-in default; shown here for reference/override.
|
||||||
|
# status is 0 (ONLINE), 1 (DEGRADED), 2 (FAULTED, OFFLINE, UNAVAIL) or 3 (any other state).
|
||||||
|
# Use '*' to apply the same rule to every pool, or name a specific pool.
|
||||||
|
pools:
|
||||||
|
'*':
|
||||||
|
status:
|
||||||
|
warning: 1 # Alert WARNING when pool is DEGRADED
|
||||||
|
critical: 2 # Alert CRITICAL when pool is FAULTED/OFFLINE/UNAVAIL
|
||||||
|
operator: ">"
|
||||||
|
hysteresis: 0.0 # No hysteresis: a degraded or faulted pool should always alert
|
||||||
|
display: "ZFS pool {pool_name} is {health}"
|
||||||
|
|
||||||
|
# Per-pool capacity thresholds (optional; add pools you care about)
|
||||||
|
# tank:
|
||||||
|
# capacity:
|
||||||
|
# warning: 75.0 # Warn at 75% used
|
||||||
|
# critical: 90.0 # Critical at 90% used
|
||||||
|
# operator: ">"
|
||||||
|
# hysteresis: 0.05
|
||||||
|
|
||||||
# ----------------------------------------------------------------------------
|
# ----------------------------------------------------------------------------
|
||||||
# Network Monitor Thresholds
|
# Network Monitor Thresholds
|
||||||
# ----------------------------------------------------------------------------
|
# ----------------------------------------------------------------------------
|
||||||
|
|||||||
+5
-6
@@ -144,17 +144,16 @@ def cmd_notify(args):
|
|||||||
url=f"{base_url}/plugins" if base_url else "",
|
url=f"{base_url}/plugins" if base_url else "",
|
||||||
)
|
)
|
||||||
|
|
||||||
# Bypass min_level for explicit test sends; run async channels directly
|
|
||||||
import asyncio
|
import asyncio
|
||||||
|
from .notify import _send_matrix_async, _send_sms_voipms_async, _DRIVERS
|
||||||
ch_type = channel_cfg.get("type", "")
|
ch_type = channel_cfg.get("type", "")
|
||||||
print(f"Sending via {args.channel} ({ch_type}): {title} — {args.message}")
|
print(f"Sending via {args.channel} ({ch_type}): {title} — {args.message}")
|
||||||
|
|
||||||
if ch_type in ("matrix", "sms_voipms"):
|
if ch_type == "matrix":
|
||||||
from .notify import _send_matrix_async, _send_sms_voipms_async
|
ok = asyncio.run(_send_matrix_async(channel_cfg, notif))
|
||||||
driver_async = _send_matrix_async if ch_type == "matrix" else _send_sms_voipms_async
|
elif ch_type == "sms_voipms":
|
||||||
ok = asyncio.run(driver_async(channel_cfg, notif))
|
ok = asyncio.run(_send_sms_voipms_async(channel_cfg, notif))
|
||||||
else:
|
else:
|
||||||
from .notify import _DRIVERS
|
|
||||||
driver = _DRIVERS.get(ch_type)
|
driver = _DRIVERS.get(ch_type)
|
||||||
if driver is None:
|
if driver is None:
|
||||||
print(f"Error: unknown channel type '{ch_type}'", file=sys.stderr)
|
print(f"Error: unknown channel type '{ch_type}'", file=sys.stderr)
|
||||||
|
|||||||
+25
-3
@@ -34,6 +34,9 @@ SERVER_DEFAULTS = {
|
|||||||
"users": {}, # username -> {full_name, avatar, password, admin, notification_channels}
|
"users": {}, # username -> {full_name, avatar, password, admin, notification_channels}
|
||||||
"default_owner": None, # Username that owns hosts with no explicit owner
|
"default_owner": None, # Username that owns hosts with no explicit owner
|
||||||
|
|
||||||
|
# OAuth2 providers
|
||||||
|
"oauth": {}, # oauth.gitea.{url,client_id,client_secret}
|
||||||
|
|
||||||
# Host management
|
# Host management
|
||||||
"hosts": {}, # Unified host definitions
|
"hosts": {}, # Unified host definitions
|
||||||
"dyndnshosts": [], # Hosts with dynamic DNS (legacy)
|
"dyndnshosts": [], # Hosts with dynamic DNS (legacy)
|
||||||
@@ -95,7 +98,26 @@ THRESHOLD_DEFAULTS = {
|
|||||||
'warning': 200,
|
'warning': 200,
|
||||||
'critical': 250.0,
|
'critical': 250.0,
|
||||||
'count': 3 # Optional: number of consecutive breaches before alerting
|
'count': 3 # Optional: number of consecutive breaches before alerting
|
||||||
}
|
},
|
||||||
|
'nagios_runner': {
|
||||||
|
'status_code': {
|
||||||
|
'display': '{check_name} {output}',
|
||||||
|
'operator': "nagios"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
'zfs_monitor': {
|
||||||
|
'pools': {
|
||||||
|
'*': {
|
||||||
|
'status': {
|
||||||
|
'warning': 1,
|
||||||
|
'critical': 2,
|
||||||
|
'operator': '>',
|
||||||
|
'hysteresis': 0.0,
|
||||||
|
'display': 'ZFS pool {pool_name} is {health}'
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
},
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
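The `'*'` entry under `zfs_monitor.pools` is described as applying to every pool, with named pools as an alternative. The diff does not show how the server resolves the two, so the sketch below is only one plausible reading: a named pool's metrics override the wildcard's, metric by metric. `pool_thresholds` is a hypothetical helper, the `tank` override is invented for the example, and rendering `display` with `str.format` is likewise an assumption.

```python
from typing import Any, Dict

# Default copied from the hunk above, plus a made-up per-pool override.
ZFS_THRESHOLDS: Dict[str, Any] = {
    "pools": {
        "*": {
            "status": {
                "warning": 1,
                "critical": 2,
                "operator": ">",
                "hysteresis": 0.0,
                "display": "ZFS pool {pool_name} is {health}",
            }
        },
        "tank": {
            "capacity": {"warning": 75.0, "critical": 90.0, "operator": ">"}
        },
    }
}


def pool_thresholds(cfg: Dict[str, Any], pool_name: str) -> Dict[str, Any]:
    """Wildcard rules first, then rules for the named pool layered on top."""
    pools = cfg.get("pools", {})
    merged = dict(pools.get("*", {}))
    merged.update(pools.get(pool_name, {}))
    return merged


tank_rules = pool_thresholds(ZFS_THRESHOLDS, "tank")
other_rules = pool_thresholds(ZFS_THRESHOLDS, "scratch")
message = tank_rules["status"]["display"].format(pool_name="tank", health="DEGRADED")
print(sorted(tank_rules), sorted(other_rules), message)
```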
@@ -225,7 +247,7 @@ def get_watchhosts(config):
|
|||||||
hosts_config = config.get("hosts", {})
|
hosts_config = config.get("hosts", {})
|
||||||
if isinstance(hosts_config, dict):
|
if isinstance(hosts_config, dict):
|
||||||
for host_name, host_attrs in hosts_config.items():
|
for host_name, host_attrs in hosts_config.items():
|
||||||
if isinstance(host_attrs, dict) and host_attrs.get("watch", False):
|
if isinstance(host_attrs, dict) and host_attrs.get("watch", True):
|
||||||
watchhosts.append(host_name)
|
watchhosts.append(host_name)
|
||||||
return watchhosts
|
return watchhosts
|
||||||
|
|
||||||
@@ -303,7 +325,7 @@ def get_host_access(config, hostname) -> dict:
|
|||||||
"""
|
"""
|
||||||
host_cfg = get_host_config(config, hostname)
|
host_cfg = get_host_config(config, hostname)
|
||||||
|
|
||||||
owner = host_cfg.get("owner") or get_default_owner(config)
|
owner = host_cfg.get("owner") # or get_default_owner(config)
|
||||||
|
|
||||||
managers = host_cfg.get("managers", [])
|
managers = host_cfg.get("managers", [])
|
||||||
if isinstance(managers, str):
|
if isinstance(managers, str):
|
||||||
|
|||||||
@@ -95,7 +95,7 @@ class Connection:
|
|||||||
if not Null:
|
if not Null:
|
||||||
d["addr"] = self.addr
|
d["addr"] = self.addr
|
||||||
if self.rtts[-1]:
|
if self.rtts[-1]:
|
||||||
d["rtt"] = "%0.1f" % self.rtts[-1]
|
d["rtt"] = "%d" % round(self.rtts[-1])
|
||||||
elif self.state == Connection.UNKNOWN:
|
elif self.state == Connection.UNKNOWN:
|
||||||
d["rtt"] = ""
|
d["rtt"] = ""
|
||||||
else:
|
else:
|
||||||
@@ -286,7 +286,7 @@ class Host:
|
|||||||
Host.hosts[name] = self
|
Host.hosts[name] = self
|
||||||
self.num = num
|
self.num = num
|
||||||
self.dyn = False
|
self.dyn = False
|
||||||
self.watched = False
|
self.watched = True
|
||||||
self.upcount = 0
|
self.upcount = 0
|
||||||
self.interval = 0
|
self.interval = 0
|
||||||
self.doesack = -1
|
self.doesack = -1
|
||||||
@@ -304,6 +304,7 @@ class Host:
|
|||||||
|
|
||||||
def statedict(self):
|
def statedict(self):
|
||||||
d = {}
|
d = {}
|
||||||
|
d["raw_name"] = self.name
|
||||||
d["name"] = self.name
|
d["name"] = self.name
|
||||||
if self.dyn:
|
if self.dyn:
|
||||||
d["name"] += "*"
|
d["name"] += "*"
|
||||||
|
|||||||
+143
-13
@@ -1,7 +1,11 @@
|
|||||||
"""HTTP server implementation using aiohttp and jinja2."""
|
"""HTTP server implementation using aiohttp and jinja2."""
|
||||||
|
|
||||||
import asyncio
|
import asyncio
|
||||||
|
import datetime
|
||||||
import json
|
import json
|
||||||
|
import platform
|
||||||
|
import socket
|
||||||
|
import sys
|
||||||
import time
|
import time
|
||||||
import urllib.parse
|
import urllib.parse
|
||||||
import os
|
import os
|
||||||
@@ -12,6 +16,7 @@ from . import data
|
|||||||
from . import notify as notify_mod
|
from . import notify as notify_mod
|
||||||
from . import settings as settings_mod
|
from . import settings as settings_mod
|
||||||
from . import users as users_mod
|
from . import users as users_mod
|
||||||
|
from . import oauth as oauth_mod
|
||||||
from . import ws as ws_mod
|
from . import ws as ws_mod
|
||||||
|
|
||||||
logger = logging.getLogger(__name__)
|
logger = logging.getLogger(__name__)
|
||||||
@@ -111,6 +116,7 @@ async def start(
|
|||||||
This function is intended to be awaited inside the main asyncio event loop.
|
This function is intended to be awaited inside the main asyncio event loop.
|
||||||
"""
|
"""
|
||||||
get_now = get_now or (lambda: time.time())
|
get_now = get_now or (lambda: time.time())
|
||||||
|
_start_epoch = time.time()
|
||||||
|
|
||||||
async def old_index(request):
|
async def old_index(request):
|
||||||
_require_auth_redirect(request)
|
_require_auth_redirect(request)
|
||||||
@@ -149,6 +155,25 @@ async def start(
|
|||||||
lst = [h.jsons() for h in hosts]
|
lst = [h.jsons() for h in hosts]
|
||||||
return web.json_response(json.loads("[" + ",".join(lst) + "]"))
|
return web.json_response(json.loads("[" + ",".join(lst) + "]"))
|
||||||
|
|
||||||
|
async def api_alert_summary(request):
|
||||||
|
"""GET /api/0/alert_summary — counts of ok/warning/critical hosts visible to caller."""
|
||||||
|
user, err = _require_auth(request)
|
||||||
|
if err:
|
||||||
|
return err
|
||||||
|
from .threshold import AlertLevel
|
||||||
|
critical = warning = ok = 0
|
||||||
|
for host in hbdclass.Host.hosts.values():
|
||||||
|
if not _can_operate_host(user, host):
|
||||||
|
continue
|
||||||
|
levels = {s.level for s in host.alert_states.values()}
|
||||||
|
if AlertLevel.CRITICAL in levels:
|
||||||
|
critical += 1
|
||||||
|
elif AlertLevel.WARNING in levels:
|
||||||
|
warning += 1
|
||||||
|
else:
|
||||||
|
ok += 1
|
||||||
|
return web.json_response({"critical": critical, "warning": warning, "ok": ok})
|
||||||
|
|
||||||
async def api_messages(request):
|
async def api_messages(request):
|
||||||
lst = data.msgs[-30:]
|
lst = data.msgs[-30:]
|
||||||
return web.json_response(lst)
|
return web.json_response(lst)
|
||||||
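The counting rule in `api_alert_summary` classifies each host by its worst alert level, so a host with both a warning and a critical alert is counted once, as critical. A self-contained sketch of that rule follows; the `AlertLevel` enum here is a hypothetical stand-in for the one imported from `.threshold`, and `summarize` is an illustrative helper.

```python
from enum import Enum
from typing import Dict, Iterable


class AlertLevel(Enum):
    """Hypothetical stand-in; the real enum lives in the server's threshold module."""
    OK = "ok"
    WARNING = "warning"
    CRITICAL = "critical"


def summarize(hosts: Iterable[Iterable[AlertLevel]]) -> Dict[str, int]:
    """Count hosts by worst level; each host contributes to exactly one bucket."""
    counts = {"critical": 0, "warning": 0, "ok": 0}
    for host_levels in hosts:
        levels = set(host_levels)
        if AlertLevel.CRITICAL in levels:
            counts["critical"] += 1
        elif AlertLevel.WARNING in levels:
            counts["warning"] += 1
        else:
            counts["ok"] += 1  # includes hosts with no alert state at all
    return counts


summary = summarize([
    [AlertLevel.OK, AlertLevel.CRITICAL],
    [AlertLevel.WARNING],
    [],
    [AlertLevel.WARNING, AlertLevel.CRITICAL],
])
print(summary)
```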
@@ -210,15 +235,11 @@ async def start(
|
|||||||
return err
|
return err
|
||||||
qa = request.rel_url.query
|
qa = request.rel_url.query
|
||||||
uname = urllib.parse.unquote(qa.get("h", ""))
|
uname = urllib.parse.unquote(qa.get("h", ""))
|
||||||
ucode = qa.get("c")
|
if not uname:
|
||||||
if not ucode or not uname:
|
return web.Response(status=400, text="need h= argument")
|
||||||
return web.Response(status=400, text="need h= and c= arguments")
|
|
||||||
if uname != "All" and uname not in hbdclass.Host.hosts:
|
if uname != "All" and uname not in hbdclass.Host.hosts:
|
||||||
return web.Response(status=400, text=f"h={uname} not found")
|
return web.Response(status=400, text=f"h={uname} not found")
|
||||||
if uname != "All":
|
names = [uname] if uname != "All" else list(hbdclass.Host.hosts)
|
||||||
names = [uname]
|
|
||||||
else:
|
|
||||||
names = [n for n in hbdclass.Host.hosts]
|
|
||||||
out = []
|
out = []
|
||||||
for n in names:
|
for n in names:
|
||||||
host = hbdclass.Host.hosts[n]
|
host = hbdclass.Host.hosts[n]
|
||||||
@@ -227,8 +248,7 @@ async def start(
|
|||||||
continue
|
continue
|
||||||
op_err = None
|
op_err = None
|
||||||
try:
|
try:
|
||||||
r = {"csum": None, "code": ucode}
|
host.cmds.append(("UPD", {}))
|
||||||
host.cmds.append(("UPD", r))
|
|
||||||
except Exception as e:
|
except Exception as e:
|
||||||
op_err = str(e)
|
op_err = str(e)
|
||||||
out.append(f"update started for {n}: {op_err if op_err else 'OK'}")
|
out.append(f"update started for {n}: {op_err if op_err else 'OK'}")
|
||||||
@@ -258,7 +278,9 @@ async def start(
|
|||||||
extra_scripts=extra_scripts,
|
extra_scripts=extra_scripts,
|
||||||
hbd_version=hbd_version,
|
hbd_version=hbd_version,
|
||||||
hosts=[
|
hosts=[
|
||||||
hbdclass.Host.hosts[h].stateinfo() for h in sorted(hbdclass.Host.hosts)
|
hbdclass.Host.hosts[h].stateinfo()
|
||||||
|
for h in sorted(hbdclass.Host.hosts)
|
||||||
|
if _can_operate_host(current_user, hbdclass.Host.hosts[h])
|
||||||
],
|
],
|
||||||
messages=data.msgs[-30:],
|
messages=data.msgs[-30:],
|
||||||
current_user=current_user.to_dict() if current_user else None,
|
current_user=current_user.to_dict() if current_user else None,
|
||||||
@@ -510,12 +532,14 @@ async def start(
|
|||||||
hosts_with_plugins = []
|
hosts_with_plugins = []
|
||||||
for hostname in sorted(hbdclass.Host.hosts.keys()):
|
for hostname in sorted(hbdclass.Host.hosts.keys()):
|
||||||
host = hbdclass.Host.hosts[hostname]
|
host = hbdclass.Host.hosts[hostname]
|
||||||
if not _can_view_host(current_user, host):
|
if not _can_operate_host(current_user, host):
|
||||||
continue
|
continue
|
||||||
if host.plugin_data:
|
if host.plugin_data:
|
||||||
hosts_with_plugins.append({
|
hosts_with_plugins.append({
|
||||||
"name": hostname,
|
"name": hostname,
|
||||||
"plugins": list(host.plugin_data.keys()),
|
"plugins": list(host.plugin_data.keys()),
|
||||||
|
"is_owner": _can_own_host(current_user, host),
|
||||||
|
"owner": host.owner,
|
||||||
})
|
})
|
||||||
|
|
||||||
tmpl = env.get_template("plugins.html")
|
tmpl = env.get_template("plugins.html")
|
||||||
@@ -598,6 +622,16 @@ async def start(
|
|||||||
)
|
)
|
||||||
raise resp
|
raise resp
|
||||||
error = "Invalid username or password."
|
error = "Invalid username or password."
|
||||||
|
elif request.rel_url.query.get("error"):
|
||||||
|
error = "Sign-in failed. Please try again."
|
||||||
|
|
||||||
|
gitea_button = ""
|
||||||
|
if oauth_mod.is_enabled(config):
|
||||||
|
gitea_button = f"""
|
||||||
|
<div class="divider">or</div>
|
||||||
|
<a href="/login/oauth/gitea" class="gitea-btn">
|
||||||
|
Sign in with Gitea
|
||||||
|
</a>"""
|
||||||
|
|
||||||
html = f"""<!DOCTYPE html>
|
html = f"""<!DOCTYPE html>
|
||||||
<html>
|
<html>
|
||||||
@@ -618,6 +652,12 @@ async def start(
|
|||||||
button:hover {{ background: #0055aa; }}
|
button:hover {{ background: #0055aa; }}
|
||||||
.error {{ color: #c00; font-size: .9em; margin-bottom: .8em; }}
|
.error {{ color: #c00; font-size: .9em; margin-bottom: .8em; }}
|
||||||
.field {{ margin-bottom: .9em; }}
|
.field {{ margin-bottom: .9em; }}
|
||||||
|
.divider {{ text-align: center; margin: 1.2em 0 .8em; color: #999;
|
||||||
|
font-size: .85em; border-top: 1px solid #eee; padding-top: .8em; }}
|
||||||
|
.gitea-btn {{ display: block; width: 100%; padding: .6em; background: #609926;
|
||||||
|
color: #fff; border-radius: 4px; font-size: 1em; text-align: center;
|
||||||
|
text-decoration: none; box-sizing: border-box; }}
|
||||||
|
.gitea-btn:hover {{ background: #4e7d1e; }}
|
||||||
</style>
|
</style>
|
||||||
</head>
|
</head>
|
||||||
<body>
|
<body>
|
||||||
@@ -628,7 +668,7 @@ async def start(
|
|||||||
<div class="field"><label>Username</label><input name="username" autofocus></div>
|
<div class="field"><label>Username</label><input name="username" autofocus></div>
|
||||||
<div class="field"><label>Password</label><input name="password" type="password"></div>
|
<div class="field"><label>Password</label><input name="password" type="password"></div>
|
||||||
<button type="submit">Sign in</button>
|
<button type="submit">Sign in</button>
|
||||||
</form>
|
</form>{gitea_button}
|
||||||
</div>
|
</div>
|
||||||
</body>
|
</body>
|
||||||
</html>"""
|
</html>"""
|
||||||
@@ -811,6 +851,48 @@ async def start(
|
|||||||
)
|
)
|
||||||
return web.Response(text=body, content_type="text/html")
|
return web.Response(text=body, content_type="text/html")
|
||||||
|
|
||||||
|
# -------------------------------------------------------------------------
|
||||||
|
# About page
|
||||||
|
# -------------------------------------------------------------------------
|
||||||
|
|
||||||
|
async def about_page(request):
|
||||||
|
"""GET /about — version, runtime, and project information."""
|
||||||
|
current_user, _ = _require_auth_redirect(request)
|
||||||
|
pkg_dir = os.path.dirname(__file__)
|
||||||
|
templates_dir = config.get("templates_dir", os.path.join(pkg_dir, "templates"))
|
||||||
|
env = jinja2.Environment(loader=jinja2.FileSystemLoader(templates_dir))
|
||||||
|
from hbd import __version__ as hbd_version
|
||||||
|
|
||||||
|
uptime_secs = int(time.time() - _start_epoch)
|
||||||
|
days, rem = divmod(uptime_secs, 86400)
|
||||||
|
hours, rem = divmod(rem, 3600)
|
||||||
|
mins, secs = divmod(rem, 60)
|
||||||
|
if days:
|
||||||
|
uptime_str = f"{days}d {hours}h {mins}m"
|
||||||
|
elif hours:
|
||||||
|
uptime_str = f"{hours}h {mins}m {secs}s"
|
||||||
|
else:
|
||||||
|
uptime_str = f"{mins}m {secs}s"
|
||||||
|
|
||||||
|
start_dt = datetime.datetime.fromtimestamp(_start_epoch)
|
||||||
|
start_time_str = start_dt.strftime("%Y-%m-%d %H:%M:%S")
|
||||||
|
|
||||||
|
tmpl = env.get_template("about.html")
|
||||||
|
body = tmpl.render(
|
||||||
|
title="About - Heartbeat",
|
||||||
|
header="About",
|
||||||
|
hbd_version=hbd_version,
|
||||||
|
python_version=f"{sys.version_info.major}.{sys.version_info.minor}.{sys.version_info.micro} ({platform.python_implementation()})",
|
||||||
|
server_hostname=socket.gethostname(),
|
||||||
|
start_epoch=int(_start_epoch),
|
||||||
|
start_time_str=start_time_str,
|
||||||
|
uptime_str=uptime_str,
|
||||||
|
host_count=len(hbdclass.Host.hosts),
|
||||||
|
current_user=current_user.to_dict() if current_user else None,
|
||||||
|
active_page="about",
|
||||||
|
)
|
||||||
|
return web.Response(text=body, content_type="text/html")
|
||||||
|
|
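The uptime string in `about_page` drops the least significant unit as the duration grows. The same logic can be isolated for testing, as in this sketch; `format_uptime` is a hypothetical helper name, not a function in the diff.

```python
def format_uptime(uptime_secs: int) -> str:
    """Format a duration in seconds the way about_page does."""
    days, rem = divmod(uptime_secs, 86400)
    hours, rem = divmod(rem, 3600)
    mins, secs = divmod(rem, 60)
    if days:
        # Seconds are omitted once the server has been up for a day or more.
        return f"{days}d {hours}h {mins}m"
    if hours:
        return f"{hours}h {mins}m {secs}s"
    return f"{mins}m {secs}s"
```

Pulling the formatting out of the handler would make the boundary cases (59 seconds, exactly one hour, exactly one day) easy to cover without starting the web server.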
||||||
# -------------------------------------------------------------------------
|
# -------------------------------------------------------------------------
|
||||||
# Settings page (admin only)
|
# Settings page (admin only)
|
||||||
# -------------------------------------------------------------------------
|
# -------------------------------------------------------------------------
|
||||||
@@ -826,12 +908,56 @@ async def start(
|
|||||||
tmpl = env.get_template("settings.html")
|
tmpl = env.get_template("settings.html")
|
||||||
body = tmpl.render(
|
body = tmpl.render(
|
||||||
title="Settings - Heartbeat",
|
title="Settings - Heartbeat",
|
||||||
sections=settings_mod.get_settings_sections(config),
|
sections=settings_mod.get_settings_sections(config, threshold_checker=threshold_checker),
|
||||||
current_user=current_user.to_dict() if current_user else None,
|
current_user=current_user.to_dict() if current_user else None,
|
||||||
active_page="settings",
|
active_page="settings",
|
||||||
)
|
)
|
||||||
return web.Response(text=body, content_type="text/html")
|
return web.Response(text=body, content_type="text/html")
|
||||||
|
|
||||||
|
def _oauth_redirect_uri(request) -> str:
|
||||||
|
base = config.get("base_url", "").rstrip("/") or str(request.url.origin())
|
||||||
|
return f"{base}/login/oauth/gitea/callback"
|
||||||
|
|
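`_oauth_redirect_uri` prefers the configured `base_url` and falls back to the request origin only when it is empty. The sketch below restates that rule as a pure function; the name `build_redirect_uri` and its two string parameters are assumptions made for illustration.

```python
def build_redirect_uri(base_url: str, request_origin: str) -> str:
    """Return the OAuth callback URL.

    base_url is the configured public URL (may be empty);
    request_origin is the scheme://host[:port] of the incoming request.
    """
    base = base_url.rstrip("/") or request_origin
    return f"{base}/login/oauth/gitea/callback"
```

OAuth2 providers generally require the redirect URI to match the registered one, so setting `base_url` explicitly is likely the safer choice behind a reverse proxy, where the origin seen by the server may differ from the public address.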
||||||
|
async def oauth_gitea_redirect(request):
|
||||||
|
"""GET /login/oauth/gitea — kick off the Gitea OAuth2 flow."""
|
||||||
|
if not oauth_mod.is_enabled(config):
|
||||||
|
return web.Response(status=404, text="OAuth not configured")
|
||||||
|
state = oauth_mod.make_state()
|
||||||
|
raise web.HTTPFound(oauth_mod.authorization_url(config, state, _oauth_redirect_uri(request)))
|
||||||
|
|
||||||
|
async def oauth_gitea_callback(request):
|
||||||
|
"""GET /login/oauth/gitea/callback — handle Gitea's redirect back."""
|
||||||
|
if not oauth_mod.is_enabled(config):
|
||||||
|
return web.Response(status=404, text="OAuth not configured")
|
||||||
|
code = request.rel_url.query.get("code", "")
|
||||||
|
state = request.rel_url.query.get("state", "")
|
||||||
|
if not code or not state:
|
||||||
|
return web.Response(status=400, text="Missing code or state")
|
||||||
|
if not oauth_mod.validate_state(state):
|
||||||
|
logger.warning("OAuth: invalid or expired state token from %s", request.remote)
|
||||||
|
raise web.HTTPFound("/login?error=1")
|
||||||
|
try:
|
||||||
|
token = await oauth_mod.exchange_code(config, code, _oauth_redirect_uri(request))
|
||||||
|
profile = await oauth_mod.fetch_user(config, token)
|
||||||
|
except oauth_mod.OAuthError as exc:
|
||||||
|
logger.warning("OAuth error: %s", exc)
|
||||||
|
raise web.HTTPFound("/login?error=1")
|
||||||
|
user = users_mod.provision_oauth_user(
|
||||||
|
profile["login"],
|
||||||
|
profile["full_name"],
|
||||||
|
profile["avatar_url"],
|
||||||
|
)
|
||||||
|
session_token = users_mod.create_session(user.username)
|
||||||
|
resp = web.HTTPFound("/")
|
||||||
|
resp.set_cookie(
|
||||||
|
SESSION_COOKIE,
|
||||||
|
session_token,
|
||||||
|
max_age=users_mod.SESSION_TTL,
|
||||||
|
httponly=True,
|
||||||
|
samesite="Lax",
|
||||||
|
)
|
||||||
|
raise resp
|
||||||
|
|
||||||
app = web.Application()
|
app = web.Application()
|
||||||
app.add_routes(
|
app.add_routes(
|
||||||
[
|
[
|
||||||
@@ -843,12 +969,15 @@ async def start(
|
|||||||
web.get("/logout", web_logout),
|
web.get("/logout", web_logout),
|
||||||
web.post("/api/0/auth/login", api_login),
|
web.post("/api/0/auth/login", api_login),
|
||||||
web.post("/api/0/auth/logout", api_logout),
|
web.post("/api/0/auth/logout", api_logout),
|
||||||
|
web.get("/login/oauth/gitea", oauth_gitea_redirect),
|
||||||
|
web.get("/login/oauth/gitea/callback", oauth_gitea_callback),
|
||||||
# Users
|
# Users
|
||||||
web.get("/api/0/users", api_users),
|
web.get("/api/0/users", api_users),
|
||||||
web.get("/api/0/users/me", api_user_self),
|
web.get("/api/0/users/me", api_user_self),
|
||||||
web.get("/api/0/users/{username}/avatar", api_user_avatar),
|
web.get("/api/0/users/{username}/avatar", api_user_avatar),
|
||||||
# Hosts
|
# Hosts
|
||||||
web.get("/api/0/hosts", api_hosts),
|
web.get("/api/0/hosts", api_hosts),
|
||||||
|
web.get("/api/0/alert_summary", api_alert_summary),
|
||||||
web.get("/api/0/messages", api_messages),
|
web.get("/api/0/messages", api_messages),
|
||||||
web.get("/api/0/hosts/{hostname}/plugins", api_host_plugins),
|
web.get("/api/0/hosts/{hostname}/plugins", api_host_plugins),
|
||||||
web.get("/api/0/hosts/{hostname}/plugins/{plugin_name}", api_host_plugin_detail),
|
web.get("/api/0/hosts/{hostname}/plugins/{plugin_name}", api_host_plugin_detail),
|
||||||
@@ -864,6 +993,7 @@ async def start(
|
|||||||
web.get("/live", live),
|
web.get("/live", live),
|
||||||
web.get("/plugins", plugins_page),
|
web.get("/plugins", plugins_page),
|
||||||
web.get("/alerts", alerts_page),
|
web.get("/alerts", alerts_page),
|
||||||
|
web.get("/about", about_page),
|
||||||
web.get("/profile", profile_page),
|
web.get("/profile", profile_page),
|
||||||
web.get("/settings", settings_page),
|
web.get("/settings", settings_page),
|
||||||
web.get("/static/{path:.*}", static),
|
web.get("/static/{path:.*}", static),
|
||||||
|
|||||||
+9
-3
@@ -101,9 +101,10 @@ async def reload_configuration(config_obj, config_path, components):
|
|||||||
access = config_mod.get_host_access(new_config, hostname)
|
access = config_mod.get_host_access(new_config, hostname)
|
||||||
host.apply_access(access["owner"], access["managers"], access["monitors"])
|
host.apply_access(access["owner"], access["managers"], access["monitors"])
|
||||||
|
|
||||||
# Reload threshold checker
|
# Reload threshold checker and prune alerts orphaned by the new config
|
||||||
if 'threshold_checker' in components:
|
if 'threshold_checker' in components:
|
||||||
components['threshold_checker'].reload(new_config)
|
components['threshold_checker'].reload(new_config)
|
||||||
|
components['threshold_checker'].purge_stale_alerts(hbdclass)
|
||||||
|
|
||||||
# Note: Changes to the following require restart:
|
# Note: Changes to the following require restart:
|
||||||
# - hb_port, hbd_port, ws_port (already bound)
|
# - hb_port, hbd_port, ws_port (already bound)
|
||||||
@@ -210,7 +211,6 @@ async def _run_async(config, config_path=None):
|
|||||||
ctx = dict(
|
ctx = dict(
|
||||||
config=config,
|
config=config,
|
||||||
hbdclass=hbdclass,
|
hbdclass=hbdclass,
|
||||||
log=eventlog,
|
|
||||||
msg_to_websockets=msg_to_websockets,
|
msg_to_websockets=msg_to_websockets,
|
||||||
msg_journal=msg_journal,
|
msg_journal=msg_journal,
|
||||||
threshold_checker=threshold_checker,
|
threshold_checker=threshold_checker,
|
||||||
@@ -237,12 +237,15 @@ async def _run_async(config, config_path=None):
|
|||||||
restore_ctx = dict(
|
restore_ctx = dict(
|
||||||
config=config,
|
config=config,
|
||||||
hbdclass=hbdclass,
|
hbdclass=hbdclass,
|
||||||
log=eventlog,
|
|
||||||
msg_to_websockets=msg_to_websockets,
|
msg_to_websockets=msg_to_websockets,
|
||||||
threshold_checker=threshold_checker,
|
threshold_checker=threshold_checker,
|
||||||
)
|
)
|
||||||
udp.restore_connection_timers(hbdclass, restore_ctx)
|
udp.restore_connection_timers(hbdclass, restore_ctx)
|
||||||
|
|
||||||
|
# Drop alert states that no longer have a matching threshold (stale after
|
||||||
|
# upgrade or config change between runs).
|
||||||
|
threshold_checker.purge_stale_alerts(hbdclass)
|
||||||
|
|
||||||
# HTTP server (asyncio-based via aiohttp)
|
# HTTP server (asyncio-based via aiohttp)
|
||||||
try:
|
try:
|
||||||
http_task = asyncio.create_task(
|
http_task = asyncio.create_task(
|
||||||
@@ -252,6 +255,7 @@ async def _run_async(config, config_path=None):
|
|||||||
config=config,
|
config=config,
|
||||||
hbdclass=hbdclass,
|
hbdclass=hbdclass,
|
||||||
tcss=None,
|
tcss=None,
|
||||||
|
threshold_checker=threshold_checker,
|
||||||
verbose=config.get("verbose", False),
|
verbose=config.get("verbose", False),
|
||||||
get_now=lambda: time.time(),
|
get_now=lambda: time.time(),
|
||||||
VER="",
|
VER="",
|
||||||
@@ -471,6 +475,8 @@ def run(config, config_path=None):
|
|||||||
if config.get("debug", 0) > 0:
|
if config.get("debug", 0) > 0:
|
||||||
log_level = logging.DEBUG
|
log_level = logging.DEBUG
|
||||||
logging.basicConfig(level=log_level)
|
logging.basicConfig(level=log_level)
|
||||||
|
if not config.get("debug", 0):
|
||||||
|
logging.getLogger("aiohttp.access").propagate = False
|
||||||
load_pickled_hosts(config, hbdclass)
|
load_pickled_hosts(config, hbdclass)
|
||||||
|
|
||||||
notify_mod.initlog(logfile=config.get("logfile", "messages.log"))
|
notify_mod.initlog(logfile=config.get("logfile", "messages.log"))
|
||||||
|
|||||||
+42
-61
@@ -15,7 +15,6 @@ their own ``notification_channels`` list. When no users are configured the
|
|||||||
server runs silently (no notifications sent).
|
server runs silently (no notifications sent).
|
||||||
"""
|
"""
|
||||||
|
|
||||||
import asyncio
|
|
||||||
import asyncio
|
import asyncio
|
||||||
import logging
|
import logging
|
||||||
import smtplib
|
import smtplib
|
||||||
@@ -30,13 +29,10 @@ from . import ws as ws_mod
|
|||||||
|
|
||||||
logger = logging.getLogger(__name__)
|
logger = logging.getLogger(__name__)
|
||||||
|
|
||||||
logger = logging.getLogger(__name__)
|
|
||||||
|
|
||||||
msg_to_websockets = ws_mod.broadcast
|
msg_to_websockets = ws_mod.broadcast
|
||||||
|
|
||||||
# Module-level state set via setup()
|
# Module-level state set via setup()
|
||||||
_config: dict = {}
|
_config: dict = {}
|
||||||
_loop: Optional[asyncio.AbstractEventLoop] = None
|
|
||||||
|
|
||||||
# Tracks which channels fired a WARNING/CRITICAL per host.
|
# Tracks which channels fired a WARNING/CRITICAL per host.
|
||||||
# {host_name: set of channel_names} — used to route RECOVER to the same channels.
|
# {host_name: set of channel_names} — used to route RECOVER to the same channels.
|
||||||
@@ -73,11 +69,9 @@ class Notification:
|
|||||||
# ---------------------------------------------------------------------------
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
def setup(cfg: dict, loop: Optional[asyncio.AbstractEventLoop] = None):
|
def setup(cfg: dict, loop: Optional[asyncio.AbstractEventLoop] = None):
|
||||||
"""Initialize notifier from configuration dict and event loop."""
|
"""Initialize notifier from configuration dict."""
|
||||||
global _config, _loop
|
global _config
|
||||||
_config = dict(cfg)
|
_config = dict(cfg)
|
||||||
if loop is not None:
|
|
||||||
_loop = loop
|
|
||||||
|
|
||||||
|
|
||||||
def reload_config(cfg: dict):
|
def reload_config(cfg: dict):
|
||||||
@@ -112,11 +106,18 @@ def closelog():
|
|||||||
|
|
||||||
def eventlog(host, lvl, m, service=None):
|
def eventlog(host, lvl, m, service=None):
|
||||||
ts = time.time()
|
ts = time.time()
|
||||||
|
msg = {
|
||||||
|
"ts": ts,
|
||||||
|
"host": host or None,
|
||||||
|
"level": lvl,
|
||||||
|
"service": service,
|
||||||
|
"message": m,
|
||||||
|
}
|
||||||
|
data.msgs.append(msg)
|
||||||
s = f"{time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(ts))} {lvl} "
|
s = f"{time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(ts))} {lvl} "
|
||||||
if host:
|
if host:
|
||||||
s += f"{host} "
|
s += f"{host} "
|
||||||
s += m
|
s += m
|
||||||
data.msgs.append(s)
|
|
||||||
logger.info(s)
|
logger.info(s)
|
||||||
if logf:
|
if logf:
|
||||||
try:
|
try:
|
||||||
@@ -124,7 +125,7 @@ def eventlog(host, lvl, m, service=None):
|
|||||||
logf.flush()
|
logf.flush()
|
||||||
except Exception as e:
|
except Exception as e:
|
||||||
logger.warning("failed to write to logfile: %s", e)
|
logger.warning("failed to write to logfile: %s", e)
|
||||||
msg_to_websockets("message", s)
|
msg_to_websockets("message", msg)
|
||||||
|
|
||||||
|
|
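With this change `eventlog` stores and broadcasts a structured dict, while the log file still receives the formatted line. A rough sketch of the two shapes side by side follows; `build_event` and its `ts` parameter are hypothetical and exist only so the example is deterministic.

```python
import time


def build_event(host, lvl, m, service=None, ts=None):
    """Return (structured message, formatted log line) for one event."""
    ts = time.time() if ts is None else ts
    msg = {
        "ts": ts,
        "host": host or None,  # an empty host name is normalised to None
        "level": lvl,
        "service": service,
        "message": m,
    }
    line = f"{time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(ts))} {lvl} "
    if host:
        line += f"{host} "
    line += m
    return msg, line


msg, line = build_event("web1", "WARNING", "disk 91% full", service="disk", ts=0)
```

Because `data.msgs` now holds dicts rather than strings, any consumer of `/api/0/messages` or the `message` websocket event that assumed plain strings would presumably need to read the `message` field instead.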
||||||
# ---------------------------------------------------------------------------
|
# ---------------------------------------------------------------------------
|
||||||
@@ -140,9 +141,11 @@ def _send_pushover(channel_cfg: dict, notif: Notification) -> bool:
|
|||||||
logger.warning("pushover: missing token or user")
|
logger.warning("pushover: missing token or user")
|
||||||
return False
|
return False
|
||||||
params: dict = {"token": token, "user": user, "title": notif.title, "message": notif.body}
|
params: dict = {"token": token, "user": user, "title": notif.title, "message": notif.body}
|
||||||
|
if channel_cfg.get("sound"):
|
||||||
|
params["sound"] = channel_cfg["sound"]
|
||||||
if notif.url:
|
if notif.url:
|
||||||
params["url"] = notif.url
|
params["url"] = notif.url
|
||||||
params["url_title"] = "Plugin metrics"
|
params["url_title"] = "Heartbeat"
|
||||||
conn = http.client.HTTPSConnection("api.pushover.net:443")
|
conn = http.client.HTTPSConnection("api.pushover.net:443")
|
||||||
try:
|
try:
|
||||||
conn.request(
|
conn.request(
|
||||||
@@ -215,7 +218,7 @@ def _send_mattermost(channel_cfg: dict, notif: Notification) -> bool:
|
|||||||
return False
|
return False
|
||||||
text = f"**{notif.title}**\n{notif.body}"
|
text = f"**{notif.title}**\n{notif.body}"
|
||||||
if notif.url:
|
if notif.url:
|
||||||
text += f"\n[Plugin metrics]({notif.url})"
|
text += f"\n[Plugin metrics] {notif.url}"
|
||||||
ses = {"url": host, "scheme": "http", "basepath": "/api/v4", "port": 8065}
|
ses = {"url": host, "scheme": "http", "basepath": "/api/v4", "port": 8065}
|
||||||
mm = Driver(ses)
|
mm = Driver(ses)
|
||||||
payload: dict = {"text": text, "channel": channel, "username": channel_cfg.get("username", "hbd")}
|
payload: dict = {"text": text, "channel": channel, "username": channel_cfg.get("username", "hbd")}
|
||||||
@@ -299,17 +302,6 @@ async def _send_sms_voipms_async(channel_cfg: dict, notif: Notification) -> bool
|
|||||||
return False
|
return False
|
||||||
|
|
||||||
|
|
||||||
def _send_sms_voipms(channel_cfg: dict, notif: Notification) -> bool:
|
|
||||||
"""Dispatch voip.ms SMS send onto the shared event loop."""
|
|
||||||
if _loop is None:
|
|
||||||
logger.warning("sms_voipms: event loop not available")
|
|
||||||
return False
|
|
||||||
future = asyncio.run_coroutine_threadsafe(_send_sms_voipms_async(channel_cfg, notif), _loop)
|
|
||||||
try:
|
|
||||||
return future.result(timeout=15)
|
|
||||||
except Exception as e:
|
|
||||||
logger.error("sms_voipms send timed out or failed: %s", e)
|
|
||||||
return False
|
|
||||||
|
|
||||||
|
|
||||||
async def _send_matrix_async(channel_cfg: dict, notif: Notification) -> bool:
|
async def _send_matrix_async(channel_cfg: dict, notif: Notification) -> bool:
|
||||||
@@ -357,40 +349,23 @@ async def _send_matrix_async(channel_cfg: dict, notif: Notification) -> bool:
|
|||||||
await client.close()
|
await client.close()
|
||||||
|
|
||||||
|
|
||||||
def _send_matrix(channel_cfg: dict, notif: Notification) -> bool:
|
|
||||||
"""Dispatch matrix send onto the shared event loop."""
|
|
||||||
if _loop is None:
|
|
||||||
logger.warning("matrix: event loop not available")
|
|
||||||
return False
|
|
||||||
future = asyncio.run_coroutine_threadsafe(_send_matrix_async(channel_cfg, notif), _loop)
|
|
||||||
try:
|
|
||||||
return future.result(timeout=15)
|
|
||||||
except Exception as e:
|
|
||||||
logger.error("matrix send timed out or failed: %s", e)
|
|
||||||
return False
|
|
||||||
|
|
||||||
|
|
||||||
# ---------------------------------------------------------------------------
|
# ---------------------------------------------------------------------------
|
||||||
# Channel dispatcher
|
# Channel dispatcher (all async — sync drivers run in a thread executor)
|
||||||
# ---------------------------------------------------------------------------
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
# Sync drivers kept for `hbd notify` CLI usage (asyncio.run wraps them there).
|
||||||
_DRIVERS = {
|
_DRIVERS = {
|
||||||
"pushover": _send_pushover,
|
"pushover": _send_pushover,
|
||||||
"email": _send_email,
|
"email": _send_email,
|
||||||
"mattermost": _send_mattermost,
|
"mattermost": _send_mattermost,
|
||||||
"signal": _send_signal,
|
"signal": _send_signal,
|
||||||
"sms_voipms": _send_sms_voipms,
|
|
||||||
"matrix": _send_matrix,
|
|
||||||
}
|
}
|
||||||
|
|
||||||
|
_TIMEOUT = 15 # seconds per channel send
|
||||||
|
|
||||||
def _dispatch_to_channel(channel_name: str, channel_cfg: dict, notif: Notification) -> bool:
|
|
||||||
"""Send *notif* to a single named channel, honouring min_level.
|
|
||||||
|
|
||||||
RECOVER always bypasses min_level — a recovery is always relevant if the
|
async def _dispatch_to_channel(channel_name: str, channel_cfg: dict, notif: Notification) -> bool:
|
||||||
channel was configured for any alerting (handles the restart-then-recover case
|
"""Send *notif* to a single named channel, honouring min_level (RECOVER bypasses it)."""
|
||||||
where _alerted_channels is empty and we fall through to the normal loop).
|
|
||||||
"""
|
|
||||||
level = notif.level.upper()
|
level = notif.level.upper()
|
||||||
if level != "RECOVER":
|
if level != "RECOVER":
|
||||||
min_level = channel_cfg.get("min_level", "WARNING").upper()
|
min_level = channel_cfg.get("min_level", "WARNING").upper()
|
||||||
@@ -398,14 +373,24 @@ def _dispatch_to_channel(channel_name: str, channel_cfg: dict, notif: Notificati
|
|||||||
logger.debug(
|
logger.debug(
|
||||||
"channel '%s': skipping level %s (min_level=%s)", channel_name, level, min_level
|
"channel '%s': skipping level %s (min_level=%s)", channel_name, level, min_level
|
||||||
)
|
)
|
||||||
return True # not an error — filtered intentionally
|
return True # filtered intentionally
|
||||||
|
|
||||||
ch_type = channel_cfg.get("type", "")
|
ch_type = channel_cfg.get("type", "")
|
||||||
driver = _DRIVERS.get(ch_type)
|
try:
|
||||||
if driver is None:
|
if ch_type == "matrix":
|
||||||
logger.warning("unknown channel type '%s' for channel '%s'", ch_type, channel_name)
|
return await asyncio.wait_for(_send_matrix_async(channel_cfg, notif), timeout=_TIMEOUT)
|
||||||
|
if ch_type == "sms_voipms":
|
||||||
|
return await asyncio.wait_for(_send_sms_voipms_async(channel_cfg, notif), timeout=_TIMEOUT)
|
||||||
|
sync_driver = _DRIVERS.get(ch_type)
|
||||||
|
if sync_driver is None:
|
||||||
|
logger.warning("unknown channel type '%s' for channel '%s'", ch_type, channel_name)
|
||||||
|
return False
|
||||||
|
return await asyncio.wait_for(
|
||||||
|
asyncio.to_thread(sync_driver, channel_cfg, notif), timeout=_TIMEOUT
|
||||||
|
)
|
||||||
|
except asyncio.TimeoutError:
|
||||||
|
logger.error("channel '%s' timed out after %ds", channel_name, _TIMEOUT)
|
||||||
return False
|
return False
|
||||||
return driver(channel_cfg, notif)
|
|
||||||
|
|
||||||
|
|
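The new dispatcher awaits async drivers directly and runs blocking drivers in a worker thread, with one timeout around either path. The sketch below shows only the blocking path; `slow_sync_driver`, the `delay` key, and the short timeout are assumptions for illustration (the diff uses `_TIMEOUT = 15`).

```python
import asyncio
import time


def slow_sync_driver(channel_cfg: dict, notif: str) -> bool:
    """Hypothetical blocking driver; sleeps to simulate a slow network call."""
    time.sleep(channel_cfg.get("delay", 0))
    return True


async def dispatch(channel_cfg: dict, notif: str, timeout: float = 0.2) -> bool:
    """Run a blocking driver off the event loop, giving up after timeout."""
    try:
        return await asyncio.wait_for(
            asyncio.to_thread(slow_sync_driver, channel_cfg, notif),
            timeout=timeout,
        )
    except asyncio.TimeoutError:
        return False


fast = asyncio.run(dispatch({"delay": 0}, "test"))
slow = asyncio.run(dispatch({"delay": 0.5}, "test"))
```

One caveat worth keeping in mind: `asyncio.wait_for` stops waiting on timeout, but it cannot interrupt the worker thread, so a stuck blocking driver keeps its thread until the underlying call returns.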
||||||
# ---------------------------------------------------------------------------
|
# ---------------------------------------------------------------------------
|
||||||
@@ -419,7 +404,7 @@ def _build_url(host_name: str) -> str:
|
|||||||
return f"{base_url}/plugins#{host_name}"
|
return f"{base_url}/plugins#{host_name}"
|
||||||
|
|
||||||
|
|
||||||
def send_notification(host_name: str, notif: Notification) -> dict:
|
async def send_notification(host_name: str, notif: Notification) -> dict:
|
||||||
"""Dispatch *notif* to all managers/owner of *host_name*.
|
"""Dispatch *notif* to all managers/owner of *host_name*.
|
||||||
|
|
||||||
Looks up the host's owner + managers, resolves each user's
|
Looks up the host's owner + managers, resolves each user's
|
||||||
@@ -469,16 +454,12 @@ def send_notification(host_name: str, notif: Notification) -> dict:
|
|||||||
if not channel_cfg:
|
if not channel_cfg:
|
||||||
continue
|
continue
|
||||||
try:
|
try:
|
||||||
ch_type = channel_cfg.get("type", "")
|
ok = await _dispatch_to_channel(channel_name, channel_cfg, notif)
|
||||||
driver = _DRIVERS.get(ch_type)
|
results[channel_name] = ok
|
||||||
if driver:
|
if ok:
|
||||||
ok = driver(channel_cfg, notif)
|
logger.info("recover sent to channel '%s': %s", channel_name, notif.title)
|
||||||
results[channel_name] = ok
|
|
||||||
if ok:
|
|
||||||
logger.info("recover sent to channel '%s': %s", channel_name, notif.title)
|
|
||||||
except Exception as e:
|
except Exception as e:
|
||||||
logger.error("error sending recover to channel '%s': %s", channel_name, e)
|
logger.error("error sending recover to channel '%s': %s", channel_name, e)
|
||||||
# Clear the alerted set once recovery is delivered
|
|
||||||
del _alerted_channels[host_name]
|
del _alerted_channels[host_name]
|
||||||
return results
|
return results
|
||||||
|
|
||||||
@@ -489,14 +470,14 @@ def send_notification(host_name: str, notif: Notification) -> dict:
|
|||||||
continue
|
continue
|
||||||
for channel_name in user.notification_channels:
|
for channel_name in user.notification_channels:
|
||||||
if channel_name in results:
|
if channel_name in results:
|
||||||
continue # already dispatched to this channel this notification
|
continue
|
||||||
channel_cfg = global_channels.get(channel_name)
|
channel_cfg = global_channels.get(channel_name)
|
||||||
if not channel_cfg:
|
if not channel_cfg:
|
||||||
logger.warning("channel '%s' not defined in notification_channels", channel_name)
|
logger.warning("channel '%s' not defined in notification_channels", channel_name)
|
||||||
results[channel_name] = False
|
results[channel_name] = False
|
||||||
continue
|
continue
|
||||||
try:
|
try:
|
||||||
ok = _dispatch_to_channel(channel_name, channel_cfg, notif)
|
ok = await _dispatch_to_channel(channel_name, channel_cfg, notif)
|
||||||
results[channel_name] = ok
|
results[channel_name] = ok
|
||||||
if ok:
|
if ok:
|
||||||
logger.info("notification sent to channel '%s': %s", channel_name, notif.title)
|
logger.info("notification sent to channel '%s': %s", channel_name, notif.title)
|
||||||
|
|||||||
@@ -0,0 +1,142 @@
|
|||||||
|
"""Gitea OAuth2 support.
|
||||||
|
|
||||||
|
Config shape (in ~/.hb.yaml):
|
||||||
|
|
||||||
|
oauth:
|
||||||
|
gitea:
|
||||||
|
url: https://git.example.com
|
||||||
|
client_id: <client-id>
|
||||||
|
client_secret: <client-secret>
|
||||||
|
|
||||||
|
Register a Gitea OAuth2 application at:
|
||||||
|
Gitea → Settings → Applications → OAuth2
|
||||||
|
Set the redirect URI to:
|
||||||
|
https://<hbd-host>/login/oauth/gitea/callback
|
||||||
|
"""
|
||||||
|
|
||||||
|
import logging
|
||||||
|
import secrets
|
||||||
|
import time
|
||||||
|
import urllib.parse
|
||||||
|
|
||||||
|
import aiohttp
|
||||||
|
|
||||||
|
logger = logging.getLogger(__name__)
|
||||||
|
|
||||||
|
STATE_TTL = 600 # 10 minutes
|
||||||
|
|
||||||
|
# state_token -> expiry timestamp
|
||||||
|
_states: dict[str, float] = {}
|
||||||
|
|
||||||
|
|
||||||
|
def make_state() -> str:
|
||||||
|
"""Generate a CSRF state token, store it with TTL, and return it."""
|
||||||
|
_purge_states()
|
||||||
|
token = secrets.token_hex(32)
|
||||||
|
_states[token] = time.time() + STATE_TTL
|
||||||
|
return token
|
||||||
|
|
||||||
|
|
||||||
|
def validate_state(state: str) -> bool:
|
||||||
|
"""Return True if *state* is known and unexpired; always removes it."""
|
||||||
|
expiry = _states.pop(state, None)
|
||||||
|
if expiry is None:
|
||||||
|
return False
|
||||||
|
return time.time() < expiry
|
||||||
|
|
||||||
|
|
||||||
|
def _purge_states() -> None:
|
||||||
|
"""Remove all expired CSRF state tokens from the in-memory store."""
|
||||||
|
now = time.time()
|
||||||
|
expired = [k for k, exp in list(_states.items()) if exp < now]
|
||||||
|
for k in expired:
|
||||||
|
del _states[k]
|
||||||
|
|
||||||
|
|
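The state helpers above implement single-use tokens with a time limit. A self-contained sketch of the same idea follows; the `now` parameter is a hypothetical addition so expiry can be exercised without sleeping, and is not part of the functions in the diff.

```python
import secrets
import time

STATE_TTL = 600  # seconds, matching the value in the diff

# state token -> expiry timestamp
_states = {}


def make_state(now=None):
    """Create a random state token and remember when it expires."""
    now = time.time() if now is None else now
    token = secrets.token_hex(32)  # 32 random bytes -> 64 hex characters
    _states[token] = now + STATE_TTL
    return token


def validate_state(state, now=None):
    """Return True once for a known, unexpired token; always forget it."""
    now = time.time() if now is None else now
    expiry = _states.pop(state, None)
    if expiry is None:
        return False
    return now < expiry
```

Popping the token on every validation attempt means a replayed callback fails even inside the ten-minute window. Since the store is a module-level dict, tokens would not survive a restart or be shared between worker processes.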
||||||
|
class OAuthError(Exception):
|
||||||
|
"""Raised when the OAuth2 flow fails for any reason."""
|
||||||
|
|
||||||
|
|
||||||
|
def _gitea_cfg(config: dict) -> dict:
|
||||||
|
"""Return the gitea sub-dict, or {} if absent (completeness is checked by is_enabled)."""
|
||||||
|
return config.get("oauth", {}).get("gitea", {})
|
||||||
|
|
||||||
|
|
||||||
|
def is_enabled(config: dict) -> bool:
|
||||||
|
"""Return True when all three required Gitea OAuth keys are present."""
|
||||||
|
g = _gitea_cfg(config)
|
||||||
|
return bool(g.get("url") and g.get("client_id") and g.get("client_secret"))
|
||||||
|
|
||||||
|
|
||||||
|
def authorization_url(config: dict, state: str, redirect_uri: str) -> str:
|
||||||
|
"""Return the Gitea OAuth2 authorization URL to redirect the browser to."""
|
||||||
|
g = _gitea_cfg(config)
|
||||||
|
if not (g.get("url") and g.get("client_id") and g.get("client_secret")):
|
||||||
|
raise OAuthError("Gitea OAuth2 is not configured")
|
||||||
|
params = urllib.parse.urlencode({
|
||||||
|
"client_id": g["client_id"],
|
||||||
|
"redirect_uri": redirect_uri,
|
||||||
|
"response_type": "code",
|
||||||
|
"scope": "user:email",
|
||||||
|
"state": state,
|
||||||
|
})
|
||||||
|
return f"{g['url'].rstrip('/')}/login/oauth/authorize?{params}"
|
||||||
|
|
||||||
|
|
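To make the shape of the authorization URL concrete, here is a simplified sketch that takes the Gitea settings as plain arguments rather than the config dict; the function name `build_authorization_url` and the example values are assumptions for illustration.

```python
import urllib.parse


def build_authorization_url(gitea_url, client_id, state, redirect_uri):
    """Build the URL the browser is redirected to for the OAuth2 code flow."""
    params = urllib.parse.urlencode({
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "response_type": "code",
        "scope": "user:email",
        "state": state,
    })
    return f"{gitea_url.rstrip('/')}/login/oauth/authorize?{params}"


url = build_authorization_url(
    "https://git.example.com/",
    "my-client-id",
    "abc123",
    "https://hb.example.com/login/oauth/gitea/callback",
)
parsed = urllib.parse.urlparse(url)
query = urllib.parse.parse_qs(parsed.query)
```

`urllib.parse.urlencode` percent-encodes the redirect URI and the colon in the scope, so the values round-trip cleanly when the query string is parsed again.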
||||||
|
async def exchange_code(config: dict, code: str, redirect_uri: str) -> str:
|
||||||
|
"""Exchange an authorization *code* for a Gitea access token.
|
||||||
|
|
||||||
|
Returns the access token string. Raises OAuthError on any failure.
|
||||||
|
"""
|
||||||
|
g = _gitea_cfg(config)
|
||||||
|
if not (g.get("url") and g.get("client_id") and g.get("client_secret")):
|
||||||
|
raise OAuthError("Gitea OAuth2 is not configured")
|
||||||
|
url = f"{g['url'].rstrip('/')}/login/oauth/access_token"
|
||||||
|
payload = {
|
||||||
|
"client_id": g["client_id"],
|
||||||
|
"client_secret": g["client_secret"],
|
||||||
|
"code": code,
|
||||||
|
"grant_type": "authorization_code",
|
||||||
|
"redirect_uri": redirect_uri,
|
||||||
|
}
|
||||||
|
timeout = aiohttp.ClientTimeout(total=10)
|
||||||
|
try:
|
||||||
|
async with aiohttp.ClientSession(timeout=timeout) as session:
|
||||||
|
async with session.post(url, json=payload, headers={"Accept": "application/json"}) as resp:
|
||||||
|
if resp.status != 200:
|
||||||
|
text = await resp.text()
|
||||||
|
raise OAuthError(f"Token exchange failed ({resp.status}): {text}")
|
||||||
|
data = await resp.json()
|
||||||
|
token = data.get("access_token")
|
||||||
|
if not token:
|
||||||
|
raise OAuthError(f"No access_token in response: {data}")
|
||||||
|
except aiohttp.ClientError as exc:
|
||||||
|
raise OAuthError(f"Token exchange network error: {exc}") from exc
|
||||||
|
return token
|
||||||
|
|
||||||
|
|
||||||
|
async def fetch_user(config: dict, token: str) -> dict:
|
||||||
|
"""Fetch the authenticated user's profile from Gitea.
|
||||||
|
|
||||||
|
Returns a dict with keys: login, full_name, avatar_url.
|
||||||
|
Raises OAuthError on any failure.
|
||||||
|
"""
|
||||||
|
g = _gitea_cfg(config)
|
||||||
|
if not (g.get("url") and g.get("client_id") and g.get("client_secret")):
|
||||||
|
raise OAuthError("Gitea OAuth2 is not configured")
|
||||||
|
url = f"{g['url'].rstrip('/')}/api/v1/user"
|
||||||
|
timeout = aiohttp.ClientTimeout(total=10)
|
||||||
|
try:
|
||||||
|
async with aiohttp.ClientSession(timeout=timeout) as session:
|
||||||
|
async with session.get(url, headers={"Authorization": f"token {token}"}) as resp:
|
||||||
|
if resp.status != 200:
|
||||||
|
text = await resp.text()
|
||||||
|
raise OAuthError(f"User fetch failed ({resp.status}): {text}")
|
||||||
|
data = await resp.json()
|
||||||
|
except aiohttp.ClientError as exc:
|
||||||
|
raise OAuthError(f"User fetch network error: {exc}") from exc
|
||||||
|
return {
|
||||||
|
"login": data.get("login", ""),
|
||||||
|
"full_name": data.get("full_name", ""),
|
||||||
|
"avatar_url": data.get("avatar_url", ""),
|
||||||
|
}
|
||||||
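Reviewer note: the `state` argument of `authorization_url` is what protects the callback against cross-site request forgery, but the code that creates and checks it is not part of this diff. The following is a minimal sketch of how a caller might generate the value and verify it when Gitea redirects back. The helper names (`new_state`, `state_matches`, `build_authorize_url`) and the example hosts are hypothetical and are not taken from the repository; only the query parameters mirror the function above.

```python
import hmac
import secrets
import urllib.parse


def new_state() -> str:
    """Hypothetical helper: random, URL-safe value to store in the session."""
    return secrets.token_urlsafe(32)


def state_matches(expected: str, received: str) -> bool:
    """Hypothetical helper: constant-time comparison of the two state values."""
    return hmac.compare_digest(expected.encode(), received.encode())


def build_authorize_url(base_url: str, client_id: str,
                        redirect_uri: str, state: str) -> str:
    """Stand-in that builds the same query string as authorization_url()."""
    params = urllib.parse.urlencode({
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "response_type": "code",
        "scope": "user:email",
        "state": state,
    })
    return f"{base_url.rstrip('/')}/login/oauth/authorize?{params}"


# Example values are placeholders, not real endpoints.
state = new_state()
url = build_authorize_url(
    "https://gitea.example.com/",
    "my-client-id",
    "https://hb.example.com/oauth/callback",
    state,
)
# On the callback, the received ?state=... must equal the stored one.
query = urllib.parse.parse_qs(urllib.parse.urlsplit(url).query)
```

In a real handler the stored state would live in the user's session and be discarded after one use; that part depends on the web framework and is left out here.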
+48
-3
@@ -24,7 +24,7 @@ sensitive bool True when the raw value must never be shown
 # Credential field names that should always be masked.
 _SECRET_KEYS = frozenset({
     "password", "token", "user_key", "api_key", "secret",
-    "smtp_password", "smtp_user",
+    "smtp_password", "smtp_user", "api_password", "access_token",
 })

 _CHANNEL_TYPE_LABELS = {
@@ -88,7 +88,7 @@ def _sanitize_channel(name, cfg):
 # Public API
 # ---------------------------------------------------------------------------

-def get_settings_sections(config: dict) -> list:
+def get_settings_sections(config: dict, threshold_checker=None) -> list:
     """Return ordered list of setting sections for the settings page.

     Each section:
@@ -181,6 +181,41 @@ def get_settings_sections(config: dict) -> list:
             "notification_channels": attrs.get("notification_channels", []),
         })

+    # ---- Threshold configurations -----------------------------------------
+    def _tc_to_row(tc):
+        return {
+            "metric": tc.metric_path,
+            "operator": tc.operator.value,
+            "warning": tc.warning,
+            "critical": tc.critical,
+            "hysteresis": tc.hysteresis,
+            "count": tc.count,
+            "enabled": tc.enabled,
+        }
+
+    threshold_config_list = []
+    if threshold_checker is not None:
+        if threshold_checker.threshold_configs:
+            for cfg_name, cfg_metrics in sorted(threshold_checker.threshold_configs.items()):
+                # For the default config use the merged effective set;
+                # for named overrides use only the explicitly defined metrics
+                # (threshold_raw_configs) so inherited defaults are not repeated.
+                if cfg_name == "default":
+                    display_metrics = cfg_metrics
+                else:
+                    display_metrics = threshold_checker.threshold_raw_configs.get(cfg_name, cfg_metrics)
+                metrics = sorted(
+                    [_tc_to_row(tc) for tc in display_metrics.values()],
+                    key=lambda m: m["metric"],
+                )
+                threshold_config_list.append({"name": cfg_name, "metrics": metrics})
+        elif threshold_checker.thresholds:
+            metrics = sorted(
+                [_tc_to_row(tc) for tc in threshold_checker.thresholds.values()],
+                key=lambda m: m["metric"],
+            )
+            threshold_config_list.append({"name": "default", "metrics": metrics})
+
     # ---- Hosts summary ----------------------------------------------------
     hosts_list = []
     for hname, hcfg in (config.get("hosts") or {}).items():
@@ -188,7 +223,7 @@ def get_settings_sections(config: dict) -> list:
             continue
         hosts_list.append({
             "name": hname,
-            "watch": bool(hcfg.get("watch", False)),
+            "watch": bool(hcfg.get("watch", True)),
             "dyndns": bool(hcfg.get("dyndns", False)),
             "owner": hcfg.get("owner", ""),
             "managers": hcfg.get("managers", []),
@@ -312,6 +347,16 @@ def get_settings_sections(config: dict) -> list:
             "hosts": hosts_list,
             "fields": [],
         },
+        {
+            "id": "thresholds",
+            "title": "Threshold Configurations",
+            "description": "Named alert threshold sets. Each defines warning/critical levels per metric.",
+            "threshold_configs": threshold_config_list,
+            "fields": [
+                field("default_threshold_config", "Default config", "text",
+                      "Threshold config used for hosts with no explicit mapping."),
+            ],
+        },
         {
             "id": "runtime",
             "title": "Runtime",
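Reviewer note: the threshold section added to `get_settings_sections` shows the merged set for the `default` config but only the explicitly defined metrics for named overrides. The sketch below illustrates that selection rule in isolation. The `Threshold` dataclass, the `db-servers` config name, and the metric names are made up for the example; the real objects also carry `operator`, `hysteresis`, `count`, and `enabled`.

```python
from dataclasses import dataclass


@dataclass
class Threshold:
    """Hypothetical stand-in for the checker's per-metric threshold object."""
    metric_path: str
    warning: float
    critical: float


def display_metrics(name: str, merged: dict, raw: dict) -> dict:
    """Pick what to show: merged set for 'default', raw overrides otherwise."""
    if name == "default":
        return merged[name]
    # Fall back to the merged set if no raw definition is recorded.
    return raw.get(name, merged[name])


default = {
    "cpu.load": Threshold("cpu.load", 4, 8),
    "mem.used_pct": Threshold("mem.used_pct", 80, 95),
}
override = {"cpu.load": Threshold("cpu.load", 16, 32)}

# merged plays the role of threshold_configs, raw of threshold_raw_configs.
merged = {"default": default, "db-servers": {**default, **override}}
raw = {"db-servers": override}

rows = {
    name: sorted(t.metric_path for t in display_metrics(name, merged, raw).values())
    for name in sorted(merged)
}
```

With these inputs the `db-servers` entry lists only `cpu.load`, so the inherited `mem.used_pct` default is not repeated under the override.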
@@ -0,0 +1,199 @@
+<!DOCTYPE html>
+<html>
+{% include 'head.html' %}
+
+<style>
+    html, body { overflow: visible; }
+
+    .container {
+        max-width: 700px;
+        margin: 0 auto;
+    }
+
+    h1 {
+        color: #333;
+        margin-bottom: 4px;
+        font-size: 1.5em;
+    }
+
+    .subtitle {
+        color: #666;
+        margin-bottom: 24px;
+        font-size: 0.9em;
+    }
+
+    .section {
+        background: #fff;
+        border-radius: 8px;
+        box-shadow: 0 1px 6px rgba(0,0,0,0.1);
+        padding: 20px 24px;
+        margin-bottom: 20px;
+    }
+
+    .section h2 {
+        font-size: 1em;
+        font-weight: 700;
+        color: #333;
+        margin: 0 0 16px;
+        padding-bottom: 10px;
+        border-bottom: 1px solid #eee;
+        text-transform: uppercase;
+        letter-spacing: 0.5px;
+    }
+
+    .info-row {
+        display: flex;
+        align-items: baseline;
+        padding: 8px 0;
+        border-bottom: 1px solid #f5f5f5;
+        font-size: 0.9em;
+    }
+    .info-row:last-child { border-bottom: none; }
+
+    .info-label {
+        width: 160px;
+        flex-shrink: 0;
+        color: #666;
+        font-size: 0.88em;
+    }
+
+    .info-value {
+        color: #222;
+        word-break: break-all;
+    }
+
+    .info-value a {
+        color: #0066cc;
+        text-decoration: none;
+    }
+    .info-value a:hover { text-decoration: underline; }
+
+    .version-badge {
+        display: inline-block;
+        padding: 3px 12px;
+        background: #e8f0fe;
+        color: #1a73e8;
+        border-radius: 12px;
+        font-size: 0.85em;
+        font-weight: 600;
+        font-family: monospace;
+    }
+
+    .hb-logo {
+        font-size: 2.5em;
+        font-weight: 700;
+        color: #0066cc;
+        letter-spacing: -1px;
+        margin-bottom: 6px;
+    }
+
+    .hb-tagline {
+        color: #555;
+        font-size: 0.95em;
+    }
+
+    .logo-section {
+        display: flex;
+        align-items: center;
+        gap: 20px;
+        padding: 8px 0 4px;
+    }
+
+    .logo-text { flex: 1; }
+</style>
+
+<body>
+{% include 'nav.html' %}
+
+<div class="container">
+    <h1>{{ header }}</h1>
+    <p class="subtitle">Heartbeat monitoring system</p>
+
+    <div class="section">
+        <div class="logo-section">
+            <div class="logo-text">
+                <div class="hb-logo">Heartbeat</div>
+                <div class="hb-tagline">Lightweight host monitoring over UDP</div>
+            </div>
+            <span class="version-badge">v{{ hbd_version }}</span>
+        </div>
+    </div>
+
+    <div class="section">
+        <h2>Version</h2>
+        <div class="info-row">
+            <span class="info-label">Server version</span>
+            <span class="info-value">{{ hbd_version }}</span>
+        </div>
+        <div class="info-row">
+            <span class="info-label">Python</span>
+            <span class="info-value">{{ python_version }}</span>
+        </div>
+        <div class="info-row">
+            <span class="info-label">License</span>
+            <span class="info-value">MIT</span>
+        </div>
+    </div>
+
+    <div class="section">
+        <h2>Runtime</h2>
+        <div class="info-row">
+            <span class="info-label">Host</span>
+            <span class="info-value">{{ server_hostname }}</span>
+        </div>
+        <div class="info-row">
+            <span class="info-label">Started</span>
+            <span class="info-value">{{ start_time_str }}</span>
+        </div>
+        <div class="info-row">
+            <span class="info-label">Uptime</span>
+            <span class="info-value" id="uptime-value">{{ uptime_str }}</span>
+        </div>
+        <div class="info-row">
+            <span class="info-label">Hosts monitored</span>
+            <span class="info-value">{{ host_count }}</span>
+        </div>
+    </div>
+
+    <div class="section">
+        <h2>Contact & Source</h2>
+        <div class="info-row">
+            <span class="info-label">Author</span>
+            <span class="info-value">Andreas Wrede</span>
+        </div>
+        <div class="info-row">
+            <span class="info-label">Email</span>
+            <span class="info-value"><a href="mailto:aew@wrede.ca">aew@wrede.ca</a></span>
+        </div>
+        <div class="info-row">
+            <span class="info-label">Repository</span>
+            <span class="info-value"><a href="https://git.wrede.ca/andreas/heartbeat" target="_blank" rel="noopener">git.wrede.ca/andreas/heartbeat</a></span>
+        </div>
+    </div>
+
+</div>
+
+<script>
+(function() {
+    var startEpoch = {{ start_epoch }};
+    var el = document.getElementById('uptime-value');
+    if (!el) return;
+    function fmt(s) {
+        var d = Math.floor(s / 86400);
+        var h = Math.floor((s % 86400) / 3600);
+        var m = Math.floor((s % 3600) / 60);
+        var sec = s % 60;
+        if (d > 0) return d + 'd ' + h + 'h ' + m + 'm';
+        if (h > 0) return h + 'h ' + m + 'm ' + sec + 's';
+        return m + 'm ' + sec + 's';
+    }
+    function tick() {
+        var up = Math.floor(Date.now() / 1000 - startEpoch);
+        el.textContent = fmt(up);
+    }
+    tick();
+    setInterval(tick, 1000);
+})();
+</script>
+</body>
+</html>
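Reviewer note: the template receives an initial `uptime_str` from the server and then overwrites it client-side every second. The server-side formatter is not shown in this diff; the sketch below is one possible Python counterpart that produces the same three formats as the JavaScript `fmt` above, so the value does not visibly change shape on the first tick. The function name `format_uptime` is an assumption, not the project's actual helper.

```python
def format_uptime(seconds: int) -> str:
    """Format a duration the same way as the page's client-side fmt()."""
    days, rem = divmod(seconds, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, secs = divmod(rem, 60)
    if days > 0:
        # Seconds are dropped once the uptime exceeds a day.
        return f"{days}d {hours}h {minutes}m"
    if hours > 0:
        return f"{hours}h {minutes}m {secs}s"
    return f"{minutes}m {secs}s"


short = format_uptime(59)
medium = format_uptime(3661)
long_running = format_uptime(90061)
```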
@@ -4,12 +4,17 @@

 <style>

+    html, body {
+        height: auto;
+        overflow-y: auto;
+    }
+
     .container {
         max-width: 1400px;
         margin: 0 auto;
     }

-    h1 { color: #333; margin-bottom: 10px; font-size: 1.5em; }
+    h1 { color: #333; margin-bottom: 5px; margin-top: 15px; font-size: 1.5em; }

     .subtitle {
         color: #666;
@@ -170,14 +175,18 @@

     .alert-hostname {
         font-weight: bold;
-        color: #333;
+        color: #0066cc;
         font-size: 1.1em;
+        text-decoration: none;
+    }
+    .alert-hostname:hover {
+        text-decoration: underline;
     }

     .alert-metric {
-        color: #666;
-        font-family: 'Courier New', monospace;
-        font-size: 0.9em;
+        color: #0066cc;
+        font-size: 1.1em;
+        font-weight: normal;
     }

     .alert-details {
@@ -400,6 +409,10 @@
         } else if (alert.threshold_value !== undefined && alert.threshold_value !== null && alert.operator) {
             valueText += ` <span class="threshold-info">(threshold: ${alert.operator} ${formatValue(alert.threshold_value)})</span>`;
         }
+        if (alert.recovery_threshold !== undefined && alert.recovery_threshold !== null) {
+            const recOp = (alert.operator === '>' || alert.operator === '>=') ? '<' : '>';
+            valueText += ` <span class="threshold-info" style="color:#888">(recovers ${recOp} ${formatValue(alert.recovery_threshold)})</span>`;
+        }

         // Build actions section
         let actionsHtml = '';
@@ -424,9 +437,9 @@
             <div class="alert-main">
                 <div class="alert-header">
                     <span class="alert-level ${level}">${alert.level}</span>
-                    <span class="alert-hostname">${alert.hostname}</span>
+                    <a class="alert-hostname" href="/plugins#${alert.hostname}">${alert.hostname}</a>
+                    <span class="alert-metric">${(alert.metric_path.includes('.') ? alert.metric_path.slice(alert.metric_path.indexOf('.') + 1) : alert.metric_path).replace(/_status_code$/, '')}</span>
                 </div>
-                <div class="alert-metric">${alert.metric_path}</div>
                 <div class="alert-details">
                     <span>${valueText}</span>
                     <span class="alert-duration">Active for ${duration}</span>
@@ -15,6 +15,7 @@
 body {
     margin: 0;
     padding: 10px;
+    padding-top: 60px;
     background: #f5f5f5;
 }
 h1 { font-size: 1.5em; color: #333; margin: 0 0 5px; }
@@ -23,11 +24,14 @@

 /* Navigation bar — shared across all pages */
 .nav {
+    position: fixed;
+    top: 0;
+    left: 0;
+    right: 0;
+    z-index: 200;
     background: #fff;
     padding: 6px 12px;
-    margin-bottom: 10px;
     box-shadow: 0 2px 4px rgba(0,0,0,.1);
-    border-radius: 4px;
     display: flex;
     align-items: center;
     justify-content: space-between;
@@ -122,11 +126,17 @@
 }

 /* Swiss railway clock — nav */
-.nav-clock {
+.nav-pie {
     flex-shrink: 0;
     line-height: 0;
     margin-left: auto;
     padding: 4px 4px 4px 0;
+}
+#alert-pie { display: block; cursor: default; }
+.nav-clock {
+    flex-shrink: 0;
+    line-height: 0;
+    padding: 4px 4px 4px 0;
     cursor: pointer;
 }
 #swiss-clock { display: block; }
@@ -204,7 +214,7 @@
         ctx.restore();
     }

-    hand((m + s / 60) / 60 * Math.PI * 2 - Math.PI / 2,
+    hand((sFrac >= 58.5 ? m + 1 : m) / 60 * Math.PI * 2 - Math.PI / 2,
          R * 0.88, -R * 0.12, SIZE * 0.027, '#222'); /* minute */
     hand((h + m / 60) / 12 * Math.PI * 2 - Math.PI / 2,
          R * 0.58, -R * 0.12, SIZE * 0.039, '#222'); /* hour */
@@ -45,6 +45,7 @@
     h1 {
         color: #333;
         margin-bottom: 5px;
+        margin-top: 15px;
         font-size: 1.5em;
     }

@@ -182,11 +183,24 @@
         line-height: 1.0;
     }

-    #messages div {
+    #messages .log-entry {
         padding: 5px 0;
         border-bottom: 1px solid #f0f0f0;
+        display: flex;
+        gap: 0.5em;
+        align-items: baseline;
     }

+    .log-ts { color: #888; white-space: nowrap; }
+    .log-level { font-weight: bold; min-width: 6em; }
+    .log-host { font-weight: 600; }
+    .log-service { color: #888; }
+
+    .log-warning .log-level { color: #b8860b; }
+    .log-critical .log-level { color: #c00; }
+    .log-recover .log-level { color: #2a7a2a; }
+    .log-info .log-level { color: #555; }
+
     /* Modal for connection status messages */
     .connection-modal {
         display: none;
@@ -235,6 +249,8 @@
         color: #ff9800;
         font-weight: 700;
     }
+    #ntable a.host-link { color: inherit; text-decoration: none; }
+    #ntable a.host-link:hover { text-decoration: underline; }
 </style>
 <script type="text/javascript">
     var cnt = 0;
@@ -244,11 +260,13 @@
     var HBD_VERSION = "{{ hbd_version }}";

     function hostNameHtml(data) {
+        var rawName = data.raw_name || data.name.replace(/<[^>]+>/g, '').replace('*', '').trim();
         var nameHtml = data.name;
         if (!data.hbc_version || data.hbc_version !== HBD_VERSION) {
             nameHtml += ' 🥀';
         }
-        return data.dyn ? '<b>' + nameHtml + '</b>' : nameHtml;
+        var display = data.dyn ? '<b>' + nameHtml + '</b>' : nameHtml;
+        return '<a class="host-link" href="/plugins#' + encodeURIComponent(rawName) + '">' + display + '</a>';
     }

     function setup() {
@@ -403,7 +421,7 @@
                 );
                 if (data.connections[i].state == "up") {
                     state = '<span class="state-up">up</span>';
-                    latency = Number.parseFloat(data.connections[i].rtts[0]).toFixed(2);
+                    latency = String(Math.round(Number.parseFloat(data.connections[i].rtts[0])));
                 } else {
                     if (data.connections[i].state == "unknown") {
                         state = "";
@@ -455,7 +473,20 @@
                 update_table(state.data);
             } else if (state.type == "message") {
                 var msgs = document.getElementById("messages");
-                msgs.insertAdjacentHTML("afterbegin", "<div>" + state.data + "</div>");
+                var msg = state.data;
+                var _d = new Date(msg.ts * 1000);
+                function _p(n) { return n < 10 ? '0' + n : '' + n; }
+                var ts_str = _d.getFullYear() + '-' + _p(_d.getMonth()+1) + '-' + _p(_d.getDate())
+                    + ' ' + _p(_d.getHours()) + ':' + _p(_d.getMinutes()) + ':' + _p(_d.getSeconds());
+                var lvl = (msg.level || "INFO").toLowerCase();
+                var html = '<div class="log-entry log-' + lvl + '">';
+                html += '<span class="log-ts">' + ts_str + '</span>';
+                html += '<span class="log-level">' + (msg.level || "") + '</span>';
+                if (msg.host) html += '<span class="log-host">' + msg.host + '</span>';
+                if (msg.service) html += '<span class="log-service">' + msg.service + '</span>';
+                html += '<span class="log-msg">' + msg.message + '</span>';
+                html += '</div>';
+                msgs.insertAdjacentHTML("afterbegin", html);
             }
             cnt++;
         };
@@ -510,7 +541,7 @@
         <tbody id="ntablebody">
         {% for host in hosts %}
         <tr class="{% if host.alert_critical_unacked > 0 or host.alert_critical_acked > 0 %}row-critical{% elif host.alert_warning_unacked > 0 or host.alert_warning_acked > 0 %}row-warning{% endif %}">
-            <td data-name="{{ host.name }}">{{ host.name }}{% if not host.hbc_version or host.hbc_version != hbd_version %} 🥀{% endif %}</td>
+            <td data-name="{{ host.name }}"><a class="host-link" href="/plugins#{{ host.raw_name | urlencode }}">{{ host.name }}{% if not host.hbc_version or host.hbc_version != hbd_version %} 🥀{% endif %}</a></td>
             <td style="text-align: center; color: #ff9800; font-weight: bold;">
                 {%- set warning_unacked = host.alert_warning_unacked -%}
                 {%- set warning_acked = host.alert_warning_acked -%}
@@ -9,6 +9,10 @@
         {% if current_user and current_user.admin %}
         <a href="/settings"{% if active_page == "settings" %} class="active"{% endif %}>Settings</a>
         {% endif %}
+        <a href="/about"{% if active_page == "about" %} class="active"{% endif %}>About</a>
+    </div>
+    <div class="nav-pie" title="Host alert status">
+        <canvas id="alert-pie" width="44" height="44"></canvas>
     </div>
     <div class="nav-clock" title="Click for full-screen clock">
         <canvas id="swiss-clock" width="44" height="44"></canvas>
@@ -41,4 +45,52 @@
         });
     }
 })();
+
+function drawAlertPie(critical, warning, ok) {
+    var canvas = document.getElementById('alert-pie');
+    if (!canvas) return;
+    var ctx = canvas.getContext('2d');
+    var SIZE = canvas.width;
+    var R = SIZE / 2;
+    ctx.clearRect(0, 0, SIZE, SIZE);
+    var total = critical + warning + ok;
+    if (total === 0) {
+        ctx.beginPath();
+        ctx.arc(R, R, R - 1, 0, Math.PI * 2);
+        ctx.fillStyle = '#ccc';
+        ctx.fill();
+        return;
+    }
+    var slices = [
+        { value: critical, color: '#e53935' },
+        { value: warning, color: '#ffb300' },
+        { value: ok, color: '#43a047' }
+    ];
+    var start = -Math.PI / 2;
+    slices.forEach(function(s) {
+        if (s.value === 0) return;
+        var sweep = (s.value / total) * Math.PI * 2;
+        ctx.beginPath();
+        ctx.moveTo(R, R);
+        ctx.arc(R, R, R - 1, start, start + sweep);
+        ctx.closePath();
+        ctx.fillStyle = s.color;
+        ctx.fill();
+        start += sweep;
+    });
+}
+
+function updateAlertPie() {
+    fetch('/api/0/alert_summary').then(function(r) {
+        if (!r.ok) return;
+        return r.json();
+    }).then(function(d) {
+        if (d) drawAlertPie(d.critical || 0, d.warning || 0, d.ok || 0);
+    }).catch(function() {});
+}
+
+document.addEventListener('DOMContentLoaded', function() {
+    updateAlertPie();
+    setInterval(updateAlertPie, 30000);
+});
 </script>
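Reviewer note: `drawAlertPie` turns the three counts from `/api/0/alert_summary` into consecutive arcs that start at 12 o'clock and run clockwise. The Python sketch below reproduces only the angle arithmetic so it can be checked without a canvas. The function name `pie_slices` is hypothetical; the colours and the start angle are copied from the JavaScript above.

```python
import math


def pie_slices(critical: int, warning: int, ok: int) -> list:
    """Return (color, start_angle, end_angle) tuples, skipping empty slices."""
    total = critical + warning + ok
    if total == 0:
        # The page draws a single grey disc in this case.
        return []
    start = -math.pi / 2  # 12 o'clock in canvas coordinates
    slices = []
    for color, value in (("#e53935", critical),
                         ("#ffb300", warning),
                         ("#43a047", ok)):
        if value == 0:
            continue
        sweep = value / total * 2 * math.pi
        slices.append((color, start, start + sweep))
        start += sweep
    return slices


example = pie_slices(1, 1, 2)
```

With one critical, one warning, and two healthy hosts, the first two slices each cover a quarter of the circle and the last covers half.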
@@ -16,6 +16,7 @@
     h1 {
         color: #333;
         margin-bottom: 5px;
+        margin-top: 15px;
         font-size: 1.5em;
     }

@@ -130,6 +131,52 @@
         text-overflow: ellipsis;
     }

+    .host-action-btn {
+        font-size: 0.75em;
+        font-weight: bold;
+        padding: 3px 10px;
+        border-radius: 4px;
+        border: none;
+        cursor: pointer;
+        text-decoration: none;
+        white-space: nowrap;
+    }
+    .host-action-btn.update-btn {
+        background: #e3f2fd;
+        color: #1565c0;
+    }
+    .host-action-btn.update-btn:hover { background: #bbdefb; }
+    .host-action-btn.delete-btn {
+        background: #ffebee;
+        color: #c62828;
+    }
+    .host-action-btn.delete-btn:hover { background: #ffcdd2; }
+
+    /* ── Action result toast ───────────────────────────────────── */
+    #action-toast {
+        position: fixed;
+        bottom: 24px;
+        left: 50%;
+        transform: translateX(-50%) translateY(20px);
+        background: #323232;
+        color: #fff;
+        padding: 12px 22px;
+        border-radius: 6px;
+        font-size: 0.9em;
+        max-width: 480px;
+        text-align: center;
+        opacity: 0;
+        pointer-events: none;
+        transition: opacity 0.25s, transform 0.25s;
+        z-index: 9000;
+        white-space: pre-wrap;
+    }
+    #action-toast.show {
+        opacity: 1;
+        transform: translateX(-50%) translateY(0);
+    }
+    #action-toast.error { background: #c62828; }
+
     /* ── Host body ──────────────────────────────────────────────── */

     .host-body {
@@ -369,7 +416,8 @@
                 <span class="host-name">{{ host.name }}</span>
             </div>

-            <div class="glance-strip" id="glance-{{ host.name }}">
+            <div class="glance-strip" id="glance-{{ host.name }}" data-owner="{{ host.owner or '' }}">
+                {% if current_user and current_user.admin and host.owner %}<span class="glance-chip neutral">{{ host.owner }}</span>{% endif %}
                 <span class="glance-loading">—</span>
             </div>

@@ -378,11 +426,17 @@
                 <span class="nagios-badge" id="nagios-badge-{{ host.name }}">—</span>
                 {% endif %}
                 <span class="os-label" id="os-label-{{ host.name }}"></span>
+                {% if host.is_owner %}
+                <button class="host-action-btn update-btn"
+                        onclick="event.stopPropagation(); hostAction(this, '/u?h={{ host.name }}')">Update</button>
+                <button class="host-action-btn delete-btn"
+                        onclick="event.stopPropagation(); hostDelete(this, '{{ host.name }}')">Delete</button>
+                {% endif %}
             </div>
         </div>

         <div class="host-body">
-            {% set plugin_order = ['os_info','cpu_monitor','memory_monitor','disk_monitor','network_monitor','nagios_runner','filesystem_info'] %}
+            {% set plugin_order = ['os_info','cpu_monitor','memory_monitor','disk_monitor','network_monitor','zfs_monitor','nagios_runner','filesystem_info'] %}
             {% for plugin in plugin_order if plugin in host.plugins %}
             <div class="plugin-accordion collapsed"
                  data-hostname="{{ host.name }}"
@@ -427,6 +481,7 @@
 const GLANCE_PLUGINS = ['cpu_monitor','memory_monitor','disk_monitor',
                         'network_monitor','nagios_runner','os_info'];
 const SKIP_FIELDS = new Set(['id','name']);
+const CURRENT_USER_ADMIN = {{ 'true' if current_user and current_user.admin else 'false' }};

 // ── Cache ───────────────────────────────────────────────────────────────

@@ -446,6 +501,17 @@
     return pluginCache[hostname]?.[pluginName] ?? null;
 }

+// Return worst nagios exit code (0-3) found in a nagios_runner data object.
+function nagiosWorstStatus(data) {
+    let worst = 0;
+    for (const [k, v] of Object.entries(data || {})) {
+        if (k.endsWith('_status_code') && typeof v === 'number' && v > worst) {
+            worst = v;
+        }
+    }
+    return worst;
+}
+
 // ── Fetch helpers ───────────────────────────────────────────────────────

 async function fetchPlugin(hostname, pluginName) {
@@ -494,6 +560,12 @@
|
|||||||
|
|
||||||
const chips = [];
|
const chips = [];
|
||||||
|
|
||||||
|
// Owner (admin only, static from server)
|
||||||
|
const owner = strip.dataset.owner;
|
||||||
|
if (CURRENT_USER_ADMIN && owner) {
|
||||||
|
chips.push(`<span class="glance-chip neutral">${escHtml(owner)}</span>`);
|
||||||
|
}
|
||||||
|
|
||||||
// CPU
|
// CPU
|
||||||
const cpu = getCache(hostname, 'cpu_monitor');
|
const cpu = getCache(hostname, 'cpu_monitor');
|
||||||
if (cpu) {
|
if (cpu) {
|
||||||
@@ -547,13 +619,13 @@
|
|||||||
? chips.join('')
|
? chips.join('')
|
||||||
: '<span class="glance-loading">—</span>';
|
: '<span class="glance-loading">—</span>';
|
||||||
|
|
||||||
// Nagios badge
|
// Nagios badge — derive worst status from individual check codes
|
||||||
const nagios = getCache(hostname, 'nagios_runner');
|
const nagios = getCache(hostname, 'nagios_runner');
|
||||||
if (nagosBadge && nagios) {
|
if (nagosBadge && nagios) {
|
||||||
const status = (nagios.data.overall_status || '—').toUpperCase();
|
const worst = nagiosWorstStatus(nagios.data);
|
||||||
const cls = status === 'OK' ? 'ok'
|
const names = {0:'OK', 1:'WARNING', 2:'CRITICAL', 3:'UNKNOWN'};
|
||||||
: status === 'WARNING' ? 'warning'
|
const status = names[worst] || '—';
|
||||||
: status === 'CRITICAL' ? 'critical' : '';
|
const cls = worst === 0 ? 'ok' : worst === 1 ? 'warning' : worst >= 2 ? 'critical' : '';
|
||||||
nagosBadge.className = `nagios-badge ${cls}`;
|
nagosBadge.className = `nagios-badge ${cls}`;
|
||||||
nagosBadge.textContent = status;
|
nagosBadge.textContent = status;
|
||||||
}
|
}
|
||||||
@@ -662,9 +734,10 @@
|
|||||||
break;
|
break;
|
||||||
}
|
}
|
||||||
case 'nagios_runner': {
|
case 'nagios_runner': {
|
||||||
const status = (d.overall_status || '?').toUpperCase();
|
const worst = nagiosWorstStatus(d);
|
||||||
const count = d.plugin_count;
|
const names = {0:'OK', 1:'WARNING', 2:'CRITICAL', 3:'UNKNOWN'};
|
||||||
text = status + (count != null ? ` — ${count} checks` : '');
|
const codes = Object.keys(d).filter(k => k.endsWith('_status_code'));
|
||||||
|
text = (names[worst] || '?') + (codes.length ? ` — ${codes.length} checks` : '');
|
||||||
break;
|
break;
|
||||||
}
|
}
|
||||||
case 'filesystem_info': {
|
case 'filesystem_info': {
|
||||||
@@ -672,6 +745,19 @@
|
|||||||
text = `${count} filesystem${count !== 1 ? 's' : ''}`;
|
text = `${count} filesystem${count !== 1 ? 's' : ''}`;
|
||||||
break;
|
break;
|
||||||
}
|
}
|
||||||
|
case 'zfs_monitor': {
|
||||||
|
const pools = d.pools || {};
|
||||||
|
const names = Object.keys(pools);
|
||||||
|
if (names.length === 0) { text = 'No pools'; break; }
|
||||||
|
const degraded = names.filter(n => pools[n].health && pools[n].health !== 'ONLINE');
|
||||||
|
text = names.map(n => {
|
||||||
|
const p = pools[n];
|
||||||
|
const cap = p.capacity != null ? ` ${p.capacity.toFixed(0)}%` : '';
|
||||||
|
return `${n}${cap}`;
|
||||||
|
}).join(' · ');
|
||||||
|
if (degraded.length) text += ` ⚠ ${degraded.map(n => pools[n].health).join(',')}`;
|
||||||
|
break;
|
||||||
|
}
|
||||||
default:
|
default:
|
||||||
text = 'Loaded';
|
text = 'Loaded';
|
||||||
}
|
}
|
||||||
@@ -693,6 +779,7 @@
|
|||||||
case 'memory_monitor': html = renderMemoryTable(cached.data); break;
|
case 'memory_monitor': html = renderMemoryTable(cached.data); break;
|
||||||
case 'disk_monitor': html = renderDiskTables(cached.data); break;
|
case 'disk_monitor': html = renderDiskTables(cached.data); break;
|
||||||
case 'network_monitor':html = renderNetworkTables(cached.data); break;
|
case 'network_monitor':html = renderNetworkTables(cached.data); break;
|
||||||
|
case 'zfs_monitor': html = renderZfsTables(cached.data); break;
|
||||||
case 'nagios_runner': html = renderNagiosTable(cached.data); break;
|
case 'nagios_runner': html = renderNagiosTable(cached.data); break;
|
||||||
case 'filesystem_info':html = renderFilesystemTable(cached.data); break;
|
case 'filesystem_info':html = renderFilesystemTable(cached.data); break;
|
||||||
default: html = renderGenericTable(cached.data); break;
|
default: html = renderGenericTable(cached.data); break;
|
||||||
@@ -1023,6 +1110,66 @@
|
|||||||
return html;
|
return html;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
function renderZfsTables(d) {
|
||||||
|
const pools = d.pools || {};
|
||||||
|
const names = Object.keys(pools);
|
||||||
|
if (names.length === 0) return '<div class="no-data">No ZFS pools found</div>';
|
||||||
|
|
||||||
|
const healthCls = h => {
|
||||||
|
if (!h || h === 'ONLINE') return 'pct-ok';
|
||||||
|
if (h === 'DEGRADED') return 'pct-warn';
|
||||||
|
return 'pct-crit';
|
||||||
|
};
|
||||||
|
|
||||||
|
let pt = '<table class="data-table"><thead><tr>'
|
||||||
|
+ '<th>Pool</th><th>Health</th>'
|
||||||
|
+ '<th class="num">Size</th><th class="num">Used</th>'
|
||||||
|
+ '<th class="num">Free</th><th class="num">Cap %</th>'
|
||||||
|
+ '<th class="num">Frag %</th><th class="num">Dedup</th>'
|
||||||
|
+ '</tr></thead><tbody>';
|
||||||
|
for (const name of names) {
|
||||||
|
const p = pools[name];
|
||||||
|
const cap = p.capacity != null ? p.capacity : 0;
|
||||||
|
const capCls = cap > 90 ? 'pct-crit' : cap > 75 ? 'pct-warn' : 'pct-ok';
|
||||||
|
pt += `<tr>
|
||||||
|
<td class="iface-name">${escHtml(name)}</td>
|
||||||
|
<td class="${healthCls(p.health)}">${escHtml(p.health || '—')}</td>
|
||||||
|
<td class="num">${formatBytes(p.size || 0)}</td>
|
||||||
|
<td class="num">${formatBytes(p.alloc || 0)}</td>
|
||||||
|
<td class="num">${formatBytes(p.free || 0)}</td>
|
||||||
|
<td class="num ${capCls}">${cap.toFixed(1)}%</td>
|
||||||
|
<td class="num">${p.frag != null ? p.frag.toFixed(1) + '%' : '—'}</td>
|
||||||
|
<td class="num">${p.dedup != null ? p.dedup.toFixed(2) + 'x' : '—'}</td>
|
||||||
|
</tr>`;
|
||||||
|
}
|
||||||
|
pt += '</tbody></table>';
|
||||||
|
|
||||||
|
const hasIo = names.some(n => pools[n].read_ops != null);
|
||||||
|
if (!hasIo) return pt;
|
||||||
|
|
||||||
|
let iot = '<table class="data-table"><thead><tr>'
|
||||||
|
+ '<th>Pool</th>'
|
||||||
|
+ '<th class="num">Read ops</th><th class="num">Write ops</th>'
|
||||||
|
+ '<th class="num">Read BW</th><th class="num">Write BW</th>'
|
||||||
|
+ '</tr></thead><tbody>';
|
||||||
|
for (const name of names) {
|
||||||
|
const p = pools[name];
|
||||||
|
iot += `<tr>
|
||||||
|
<td class="iface-name">${escHtml(name)}</td>
|
||||||
|
<td class="num">${p.read_ops != null ? p.read_ops.toLocaleString() : '—'}</td>
|
||||||
|
<td class="num">${p.write_ops != null ? p.write_ops.toLocaleString() : '—'}</td>
|
||||||
|
<td class="num">${p.read_bw != null ? formatBytes(p.read_bw) : '—'}</td>
|
||||||
|
<td class="num">${p.write_bw != null ? formatBytes(p.write_bw) : '—'}</td>
|
||||||
|
</tr>`;
|
||||||
|
}
|
||||||
|
iot += '</tbody></table>';
|
||||||
|
|
||||||
|
return `<div class="flex-tables">
|
||||||
|
<div><div class="table-section-label">Pools</div>${pt}</div>
|
||||||
|
<div><div class="table-section-label">I/O (cumulative)</div>${iot}</div>
|
||||||
|
</div>`;
|
||||||
|
}
|
||||||
|
|
||||||
function renderGenericTable(d) {
|
function renderGenericTable(d) {
|
||||||
let html = '<table class="data-table"><thead><tr><th>Field</th><th>Value</th></tr></thead><tbody>';
|
let html = '<table class="data-table"><thead><tr><th>Field</th><th>Value</th></tr></thead><tbody>';
|
||||||
for (const [k, v] of Object.entries(d)) {
|
for (const [k, v] of Object.entries(d)) {
|
||||||
@@ -1081,12 +1228,68 @@
|
|||||||
// ── Init ────────────────────────────────────────────────────────────────
|
// ── Init ────────────────────────────────────────────────────────────────
|
||||||
|
|
||||||
document.addEventListener('DOMContentLoaded', () => {
|
document.addEventListener('DOMContentLoaded', () => {
|
||||||
|
// If a host fragment is in the URL, expand and scroll to that host;
|
||||||
|
// otherwise expand the first host as before.
|
||||||
|
const hash = window.location.hash;
|
||||||
|
if (hash) {
|
||||||
|
const hostname = decodeURIComponent(hash.slice(1));
|
||||||
|
const card = document.querySelector(`.host-card[data-hostname="${hostname}"]`);
|
||||||
|
if (card) {
|
||||||
|
card.classList.remove('collapsed');
|
||||||
|
fetchHostGlance(hostname);
|
||||||
|
setTimeout(() => card.scrollIntoView({ behavior: 'smooth', block: 'start' }), 150);
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
}
|
||||||
const first = document.querySelector('.host-card');
|
const first = document.querySelector('.host-card');
|
||||||
if (first) {
|
if (first) {
|
||||||
first.classList.remove('collapsed');
|
first.classList.remove('collapsed');
|
||||||
fetchHostGlance(first.dataset.hostname);
|
fetchHostGlance(first.dataset.hostname);
|
||||||
}
|
}
|
||||||
});
|
});
|
||||||
|
// ── Host action helpers ──────────────────────────────────────
|
||||||
|
|
||||||
|
let _toastTimer = null;
|
||||||
|
function showToast(msg, isError) {
|
||||||
|
const t = document.getElementById('action-toast');
|
||||||
|
t.textContent = msg;
|
||||||
|
t.classList.toggle('error', !!isError);
|
||||||
|
t.classList.add('show');
|
||||||
|
clearTimeout(_toastTimer);
|
||||||
|
_toastTimer = setTimeout(() => t.classList.remove('show'), 4000);
|
||||||
|
}
|
||||||
|
|
||||||
|
async function hostAction(btn, url) {
|
||||||
|
btn.disabled = true;
|
||||||
|
try {
|
||||||
|
const res = await fetch(url);
|
||||||
|
const text = await res.text();
|
||||||
|
showToast(text, !res.ok);
|
||||||
|
} catch (e) {
|
||||||
|
showToast('Request failed: ' + e.message, true);
|
||||||
|
} finally {
|
||||||
|
btn.disabled = false;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
async function hostDelete(btn, hostname) {
|
||||||
|
if (!confirm('Delete host ' + hostname + '?')) return;
|
||||||
|
btn.disabled = true;
|
||||||
|
try {
|
||||||
|
const res = await fetch('/d?h=' + encodeURIComponent(hostname));
|
||||||
|
const text = await res.text();
|
||||||
|
showToast(text, !res.ok);
|
||||||
|
if (res.ok) {
|
||||||
|
const card = document.querySelector(`.host-card[data-hostname="${hostname}"]`);
|
||||||
|
if (card) card.remove();
|
||||||
|
}
|
||||||
|
} catch (e) {
|
||||||
|
showToast('Request failed: ' + e.message, true);
|
||||||
|
btn.disabled = false;
|
||||||
|
}
|
||||||
|
}
|
||||||
</script>
|
</script>
|
||||||
|
|
||||||
|
<div id="action-toast"></div>
|
||||||
</body>
|
</body>
|
||||||
</html>
|
</html>
|
||||||
|
|||||||
@@ -9,7 +9,7 @@
|
|||||||
max-width: 960px;
|
max-width: 960px;
|
||||||
}
|
}
|
||||||
|
|
||||||
h1 { color: #333; margin-bottom: 4px; font-size: 1.5em; }
|
h1 { color: #333; margin-bottom: 5px; margin-top: 15px; font-size: 1.5em; }
|
||||||
.subtitle { color: #666; margin-bottom: 24px; font-size: 0.9em; }
|
.subtitle { color: #666; margin-bottom: 24px; font-size: 0.9em; }
|
||||||
|
|
||||||
/* ---- Sidebar + content layout ---- */
|
/* ---- Sidebar + content layout ---- */
|
||||||
@@ -23,7 +23,7 @@
|
|||||||
width: 180px;
|
width: 180px;
|
||||||
flex-shrink: 0;
|
flex-shrink: 0;
|
||||||
position: sticky;
|
position: sticky;
|
||||||
top: 20px;
|
top: 60px;
|
||||||
}
|
}
|
||||||
|
|
||||||
.sidebar-nav a {
|
.sidebar-nav a {
|
||||||
@@ -254,6 +254,17 @@
|
|||||||
.host-bool { text-align: center; }
|
.host-bool { text-align: center; }
|
||||||
.dot-yes { color: #2e7d32; font-size: 1.1em; }
|
.dot-yes { color: #2e7d32; font-size: 1.1em; }
|
||||||
.dot-no { color: #ddd; font-size: 1.1em; }
|
.dot-no { color: #ddd; font-size: 1.1em; }
|
||||||
|
|
||||||
|
/* ---- Threshold configurations ---- */
|
||||||
|
.thresh-config { margin: 12px 20px 20px; }
|
||||||
|
.thresh-config-name {
|
||||||
|
font-weight: 600; font-size: 0.9em; color: #1a237e;
|
||||||
|
margin-bottom: 6px;
|
||||||
|
}
|
||||||
|
.mini-table .warn { color: #e65100; font-weight: 600; }
|
||||||
|
.mini-table .crit { color: #b71c1c; font-weight: 600; }
|
||||||
|
.mini-table .dim { color: #aaa; }
|
||||||
|
.mini-table .metric-path { font-family: monospace; font-size: 0.88em; }
|
||||||
</style>
|
</style>
|
||||||
|
|
||||||
<body>
|
<body>
|
||||||
@@ -394,6 +405,49 @@
|
|||||||
{% endif %}
|
{% endif %}
|
||||||
{% endif %}
|
{% endif %}
|
||||||
|
|
||||||
|
{# ---- Threshold configurations section ---- #}
|
||||||
|
{% if section.id == "thresholds" %}
|
||||||
|
{% if section.threshold_configs %}
|
||||||
|
{% for tc in section.threshold_configs %}
|
||||||
|
<div class="thresh-config">
|
||||||
|
<div class="thresh-config-name">{{ tc.name }}</div>
|
||||||
|
{% if tc.metrics %}
|
||||||
|
<div style="overflow-x: auto;">
|
||||||
|
<table class="mini-table">
|
||||||
|
<thead>
|
||||||
|
<tr>
|
||||||
|
<th>Metric</th>
|
||||||
|
<th>Op</th>
|
||||||
|
<th>Warning</th>
|
||||||
|
<th>Critical</th>
|
||||||
|
<th>Hysteresis</th>
|
||||||
|
<th>Count</th>
|
||||||
|
</tr>
|
||||||
|
</thead>
|
||||||
|
<tbody>
|
||||||
|
{% for m in tc.metrics %}
|
||||||
|
<tr {% if not m.enabled %} style="opacity:0.45"{% endif %}>
|
||||||
|
<td class="metric-path">{{ m.metric }}</td>
|
||||||
|
<td>{{ m.operator or '>' }}</td>
|
||||||
|
<td class="warn">{{ m.warning if m.warning is not none else '—' }}</td>
|
||||||
|
<td class="crit">{{ m.critical if m.critical is not none else '—' }}</td>
|
||||||
|
<td class="dim">{{ '%.0f%%' % (m.hysteresis * 100) if m.hysteresis else '—' }}</td>
|
||||||
|
<td class="dim">{{ m.count }}</td>
|
||||||
|
</tr>
|
||||||
|
{% endfor %}
|
||||||
|
</tbody>
|
||||||
|
</table>
|
||||||
|
</div>
|
||||||
|
{% else %}
|
||||||
|
<span class="val-empty">No thresholds defined.</span>
|
||||||
|
{% endif %}
|
||||||
|
</div>
|
||||||
|
{% endfor %}
|
||||||
|
{% else %}
|
||||||
|
<div class="field-row"><span class="val-empty">No threshold configurations defined.</span></div>
|
||||||
|
{% endif %}
|
||||||
|
{% endif %}
|
||||||
|
|
||||||
{# ---- Hosts section ---- #}
|
{# ---- Hosts section ---- #}
|
||||||
{% if section.id == "hosts" %}
|
{% if section.id == "hosts" %}
|
||||||
{% if section.hosts %}
|
{% if section.hosts %}
|
||||||
|
|||||||
+445
-161
@@ -9,10 +9,11 @@ This module provides a flexible threshold checking system that:
|
|||||||
- Supports multiple comparison operators
|
- Supports multiple comparison operators
|
||||||
"""
|
"""
|
||||||
|
|
||||||
|
import asyncio
|
||||||
import logging
|
import logging
|
||||||
import time
|
import time
|
||||||
from enum import Enum
|
from enum import Enum
|
||||||
from typing import Dict, Any, Optional, Tuple, Callable
|
from typing import Dict, List, Any, Optional, Tuple, Callable
|
||||||
from . import notify as notify_mod
|
from . import notify as notify_mod
|
||||||
from .config import THRESHOLD_DEFAULTS
|
from .config import THRESHOLD_DEFAULTS
|
||||||
|
|
||||||
@@ -29,12 +30,13 @@ class AlertLevel(Enum):
|
|||||||
|
|
||||||
class ComparisonOperator(Enum):
|
class ComparisonOperator(Enum):
|
||||||
"""Supported comparison operators for threshold checks."""
|
"""Supported comparison operators for threshold checks."""
|
||||||
GT = ">" # Greater than
|
GT = ">" # Greater than
|
||||||
GTE = ">=" # Greater than or equal
|
GTE = ">=" # Greater than or equal
|
||||||
LT = "<" # Less than
|
LT = "<" # Less than
|
||||||
LTE = "<=" # Less than or equal
|
LTE = "<=" # Less than or equal
|
||||||
EQ = "==" # Equal to
|
EQ = "==" # Equal to
|
||||||
NEQ = "!=" # Not equal to
|
NEQ = "!=" # Not equal to
|
||||||
|
NAGIOS = "nagios" # Nagios exit-code semantics: 0=OK 1=WARNING 2=CRITICAL 3=UNKNOWN
|
||||||
|
|
||||||
|
|
||||||
class AlertState:
|
class AlertState:
|
||||||
@@ -56,6 +58,7 @@ class AlertState:
|
|||||||
self.last_notification = None
|
self.last_notification = None
|
||||||
self.threshold_value = None # The threshold value that triggered alert
|
self.threshold_value = None # The threshold value that triggered alert
|
||||||
self.operator = None # The comparison operator (>, <, >=, etc.)
|
self.operator = None # The comparison operator (>, <, >=, etc.)
|
||||||
|
self.hysteresis: Optional[float] = None # Hysteresis fraction used for recovery
|
||||||
self.formatted_message = None # Formatted display message for UI
|
self.formatted_message = None # Formatted display message for UI
|
||||||
self.acknowledged = False # Whether alert has been acknowledged
|
self.acknowledged = False # Whether alert has been acknowledged
|
||||||
self.acknowledged_at = None # Timestamp when acknowledged
|
self.acknowledged_at = None # Timestamp when acknowledged
|
||||||
@@ -150,7 +153,16 @@ class AlertState:
|
|||||||
result["operator"] = self.operator
|
result["operator"] = self.operator
|
||||||
if self.formatted_message is not None:
|
if self.formatted_message is not None:
|
||||||
result["formatted_message"] = self.formatted_message
|
result["formatted_message"] = self.formatted_message
|
||||||
|
|
||||||
|
# Compute and expose the recovery threshold so the UI can display it
|
||||||
|
if (self.hysteresis and self.threshold_value is not None
|
||||||
|
and self.operator is not None):
|
||||||
|
ha = abs(self.threshold_value * self.hysteresis)
|
||||||
|
if self.operator in ('>', '>='):
|
||||||
|
result["recovery_threshold"] = round(self.threshold_value - ha, 4)
|
||||||
|
elif self.operator in ('<', '<='):
|
||||||
|
result["recovery_threshold"] = round(self.threshold_value + ha, 4)
|
||||||
|
|
||||||
return result
|
return result
|
||||||
|
|
||||||
def __setstate__(self, state):
|
def __setstate__(self, state):
|
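Reviewer note: the recovery-threshold arithmetic added to the serialised alert state in the hunk above can be checked in isolation. The sketch below is a standalone restatement, not the module's code; the function name `recovery_threshold` is hypothetical, and only the operators handled in the hunk are covered.

```python
from typing import Optional


def recovery_threshold(threshold_value: Optional[float],
                       hysteresis: Optional[float],
                       operator: Optional[str]) -> Optional[float]:
    """Return the value a metric must cross back over before the alert clears.

    Hypothetical helper mirroring the hunk: the margin is a fraction of the
    threshold, subtracted for '>'-style operators and added for '<'-style ones.
    """
    if not hysteresis or threshold_value is None or operator is None:
        return None
    margin = abs(threshold_value * hysteresis)
    if operator in (">", ">="):
        return round(threshold_value - margin, 4)
    if operator in ("<", "<="):
        return round(threshold_value + margin, 4)
    return None  # '==', '!=' and 'nagios' get no recovery band in the hunk


print(recovery_threshold(90, 0.02, ">"))
print(recovery_threshold(10, 0.1, "<"))
```

With a 2% hysteresis on a `>` threshold of 90, the alert would clear only once the value drops below the band, which is what the UI field `recovery_threshold` is meant to show.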
||||||
@@ -158,6 +170,8 @@ class AlertState:
|
|||||||
self.__dict__.update(state)
|
self.__dict__.update(state)
|
||||||
if not hasattr(self, 'consecutive_count'):
|
if not hasattr(self, 'consecutive_count'):
|
||||||
self.consecutive_count = 0
|
self.consecutive_count = 0
|
||||||
|
if not hasattr(self, 'hysteresis'):
|
||||||
|
self.hysteresis = None
|
||||||
|
|
||||||
def acknowledge(self):
|
def acknowledge(self):
|
||||||
"""Acknowledge this alert to stop reminder notifications."""
|
"""Acknowledge this alert to stop reminder notifications."""
|
||||||
@@ -216,33 +230,43 @@ class ThresholdConfig:
|
|||||||
def evaluate(self, value: float) -> AlertLevel:
|
def evaluate(self, value: float) -> AlertLevel:
|
||||||
"""
|
"""
|
||||||
Evaluate a value against this threshold.
|
Evaluate a value against this threshold.
|
||||||
|
|
||||||
Args:
|
Args:
|
||||||
value: Metric value to check
|
value: Metric value to check
|
||||||
|
|
||||||
Returns:
|
Returns:
|
||||||
AlertLevel indicating the severity
|
AlertLevel indicating the severity
|
||||||
"""
|
"""
|
||||||
if not self.enabled:
|
if not self.enabled:
|
||||||
return AlertLevel.OK
|
return AlertLevel.OK
|
||||||
|
|
||||||
|
# Nagios exit-code semantics: value IS the severity
|
||||||
|
if self.operator == ComparisonOperator.NAGIOS:
|
||||||
|
try:
|
||||||
|
code = int(value)
|
||||||
|
except (TypeError, ValueError):
|
||||||
|
return AlertLevel.UNKNOWN
|
||||||
|
return {0: AlertLevel.OK, 1: AlertLevel.WARNING, 2: AlertLevel.CRITICAL}.get(
|
||||||
|
code, AlertLevel.UNKNOWN
|
||||||
|
)
|
||||||
|
|
||||||
try:
|
try:
|
||||||
# Convert value to float for comparison
|
# Convert value to float for comparison
|
||||||
value = float(value)
|
value = float(value)
|
||||||
except (TypeError, ValueError):
|
except (TypeError, ValueError):
|
||||||
logger.warning("Cannot convert value %s to float for %s", value, self.metric_path)
|
logger.warning("Cannot convert value %s to float for %s", value, self.metric_path)
|
||||||
return AlertLevel.UNKNOWN
|
return AlertLevel.UNKNOWN
|
||||||
|
|
||||||
# Check critical threshold first
|
# Check critical threshold first
|
||||||
if self.critical is not None:
|
if self.critical is not None:
|
||||||
if self._compare(value, self.critical):
|
if self._compare(value, self.critical):
|
||||||
return AlertLevel.CRITICAL
|
return AlertLevel.CRITICAL
|
||||||
|
|
||||||
# Then check warning threshold
|
# Then check warning threshold
|
||||||
if self.warning is not None:
|
if self.warning is not None:
|
||||||
if self._compare(value, self.warning):
|
if self._compare(value, self.warning):
|
||||||
return AlertLevel.WARNING
|
return AlertLevel.WARNING
|
||||||
|
|
||||||
return AlertLevel.OK
|
return AlertLevel.OK
|
||||||
|
|
||||||
def evaluate_with_hysteresis(
|
def evaluate_with_hysteresis(
|
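Reviewer note: the `nagios` operator branch added to `evaluate()` treats the metric value as the severity itself. A minimal standalone sketch of that mapping follows; the `AlertLevel` member names come from the diff, but the enum values and the function name `nagios_level` are assumptions made for illustration.

```python
from enum import Enum


class AlertLevel(Enum):
    # Member names appear in the diff; these string values are placeholders.
    OK = "ok"
    WARNING = "warning"
    CRITICAL = "critical"
    UNKNOWN = "unknown"


_NAGIOS_LEVELS = {0: AlertLevel.OK, 1: AlertLevel.WARNING, 2: AlertLevel.CRITICAL}


def nagios_level(value) -> AlertLevel:
    """Map a Nagios exit code to an alert level, as the new branch does."""
    try:
        code = int(value)
    except (TypeError, ValueError):
        # Non-numeric or missing values cannot be interpreted as an exit code.
        return AlertLevel.UNKNOWN
    # Exit code 3, and anything outside 0-2, falls through to UNKNOWN.
    return _NAGIOS_LEVELS.get(code, AlertLevel.UNKNOWN)


print(nagios_level(2))
print(nagios_level("down"))
```

Because the codes are discrete, no warning or critical numbers are needed for this operator, which is why the later hunks skip both the "no thresholds defined" check and hysteresis for it.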
||||||
@@ -261,7 +285,11 @@ class ThresholdConfig:
|
|||||||
New alert level considering hysteresis
|
New alert level considering hysteresis
|
||||||
"""
|
"""
|
||||||
new_level = self.evaluate(value)
|
new_level = self.evaluate(value)
|
||||||
|
|
||||||
|
# Nagios exit codes are discrete integers — hysteresis doesn't apply
|
||||||
|
if self.operator == ComparisonOperator.NAGIOS:
|
||||||
|
return new_level
|
||||||
|
|
||||||
# If no hysteresis, return new level
|
# If no hysteresis, return new level
|
||||||
if self.hysteresis == 0.0:
|
if self.hysteresis == 0.0:
|
||||||
return new_level
|
return new_level
|
||||||
@@ -328,15 +356,18 @@ class ThresholdChecker:
|
|||||||
renotify_interval: Seconds between repeat notifications (default: 1 hour)
|
renotify_interval: Seconds between repeat notifications (default: 1 hour)
|
||||||
journal: Optional MessageJournal instance for logging threshold events
|
journal: Optional MessageJournal instance for logging threshold events
|
||||||
"""
|
"""
|
||||||
# Named threshold configurations: {config_name: {metric_path: ThresholdConfig}}
|
# Named threshold configurations (pre-merged: defaults + overrides): {config_name: {metric_path: ThresholdConfig}}
|
||||||
self.threshold_configs = {}
|
self.threshold_configs = {}
|
||||||
|
|
||||||
|
# Raw overrides only for each named config (no defaults baked in): {config_name: {metric_path: ThresholdConfig}}
|
||||||
|
self.threshold_raw_configs: Dict[str, Dict[str, ThresholdConfig]] = {}
|
||||||
|
|
||||||
# Single threshold set for backward compatibility: {metric_path: ThresholdConfig}
|
# Single threshold set for backward compatibility: {metric_path: ThresholdConfig}
|
||||||
self.thresholds = {}
|
self.thresholds = {}
|
||||||
|
|
||||||
# Host to config name mapping: {host_name: config_name}
|
# Host to ordered list of config names: {host_name: [config_name, ...]}
|
||||||
self.host_config_mapping = {}
|
self.host_config_mapping: Dict[str, List[str]] = {}
|
||||||
|
|
||||||
# Default config name to use when no mapping exists
|
# Default config name to use when no mapping exists
|
||||||
self.default_config = "default"
|
self.default_config = "default"
|
||||||
|
|
||||||
@@ -372,6 +403,7 @@ class ThresholdChecker:
|
|||||||
|
|
||||||
# Clear old configuration
|
# Clear old configuration
|
||||||
self.threshold_configs.clear()
|
self.threshold_configs.clear()
|
||||||
|
self.threshold_raw_configs.clear()
|
||||||
self.thresholds.clear()
|
self.thresholds.clear()
|
||||||
self.host_config_mapping.clear()
|
self.host_config_mapping.clear()
|
||||||
self.grace_seconds = float(config.get("grace", 2))
|
self.grace_seconds = float(config.get("grace", 2))
|
||||||
@@ -387,14 +419,28 @@ class ThresholdChecker:
|
|||||||
|
|
||||||
def _parse_config(self, config: Dict[str, Any]):
|
def _parse_config(self, config: Dict[str, Any]):
|
||||||
"""Parse threshold configuration from YAML structure.
|
"""Parse threshold configuration from YAML structure.
|
||||||
|
|
||||||
Supports two formats:
|
Supports two formats:
|
||||||
1. Legacy format with direct 'thresholds' section
|
1. Legacy format with direct 'thresholds' section
|
||||||
2. New format with 'threshold_configs' and 'host_threshold_mapping'
|
2. New format with 'threshold_configs' and 'host_threshold_mapping'
|
||||||
|
|
||||||
|
In all cases, THRESHOLD_DEFAULTS are seeded into threshold_configs["default"]
|
||||||
|
so the Settings page always shows the built-in defaults.
|
||||||
|
_parse_multi_config() overwrites this with the fully-merged effective defaults.
|
||||||
"""
|
"""
|
||||||
|
# Always expose built-in defaults through threshold_configs["default"] so
|
||||||
|
# the Settings page has something to display even in legacy/no-config mode.
|
||||||
|
seed: Dict[str, ThresholdConfig] = {}
|
||||||
|
for plugin_name, plugin_thresholds in THRESHOLD_DEFAULTS.get("thresholds", {}).items():
|
||||||
|
if isinstance(plugin_thresholds, dict):
|
||||||
|
self._parse_plugin_thresholds(plugin_name, plugin_thresholds, target_dict=seed)
|
||||||
|
if seed:
|
||||||
|
self.threshold_configs["default"] = seed
|
||||||
|
self.threshold_raw_configs["default"] = {}
|
||||||
|
|
||||||
# Check for new multi-config format
|
# Check for new multi-config format
|
||||||
if "threshold_configs" in config:
|
if "threshold_configs" in config:
|
||||||
self._parse_multi_config(config)
|
self._parse_multi_config(config) # overwrites threshold_configs["default"]
|
||||||
elif "thresholds" in config:
|
elif "thresholds" in config:
|
||||||
# Legacy single threshold configuration
|
# Legacy single threshold configuration
|
||||||
self._parse_legacy_config(config)
|
self._parse_legacy_config(config)
|
||||||
@@ -424,9 +470,10 @@ class ThresholdChecker:
|
|||||||
self._parse_plugin_thresholds(plugin_name, plugin_thresholds, target_dict=effective_defaults)
|
self._parse_plugin_thresholds(plugin_name, plugin_thresholds, target_dict=effective_defaults)
|
||||||
|
|
||||||
self.threshold_configs["default"] = dict(effective_defaults)
|
self.threshold_configs["default"] = dict(effective_defaults)
|
||||||
|
self.threshold_raw_configs["default"] = {}
|
||||||
logger.info("Registered 'default' threshold config with %d metrics", len(effective_defaults))
|
logger.info("Registered 'default' threshold config with %d metrics", len(effective_defaults))
|
||||||
|
|
||||||
# Parse each named configuration, seeding it with effective_defaults first
|
# Parse each named configuration
|
||||||
for config_name, config_data in threshold_configs.items():
|
for config_name, config_data in threshold_configs.items():
|
||||||
if config_name == "default":
|
if config_name == "default":
|
||||||
continue # already handled above
|
continue # already handled above
|
||||||
@@ -440,33 +487,41 @@ class ThresholdChecker:
|
|||||||
continue
|
continue
|
||||||
|
|
||||||
logger.info("Parsing threshold configuration: %s", config_name)
|
logger.info("Parsing threshold configuration: %s", config_name)
|
||||||
self.threshold_configs[config_name] = dict(effective_defaults)
|
|
||||||
|
|
||||||
|
# Raw overrides only (used for multi-config layering)
|
||||||
|
raw_overrides: Dict[str, ThresholdConfig] = {}
|
||||||
thresholds_config = config_data["thresholds"]
|
thresholds_config = config_data["thresholds"]
|
||||||
for plugin_name, plugin_thresholds in thresholds_config.items():
|
for plugin_name, plugin_thresholds in thresholds_config.items():
|
||||||
if not isinstance(plugin_thresholds, dict):
|
if isinstance(plugin_thresholds, dict):
|
||||||
continue
|
self._parse_plugin_thresholds(plugin_name, plugin_thresholds, target_dict=raw_overrides)
|
||||||
|
self.threshold_raw_configs[config_name] = raw_overrides
|
||||||
|
|
||||||
self._parse_plugin_thresholds(
|
# Pre-merged version (defaults + overrides) for single-config fast path
|
||||||
plugin_name,
|
self.threshold_configs[config_name] = dict(effective_defaults)
|
||||||
plugin_thresholds,
|
self.threshold_configs[config_name].update(raw_overrides)
|
||||||
target_dict=self.threshold_configs[config_name]
|
|
||||||
)
|
# Parse host → config list mapping from two possible sources
|
||||||
|
|
||||||
# Parse host to config mapping from two possible sources
|
def _normalise(value) -> List[str]:
|
||||||
# 1. New format: hosts section with threshold_config attribute
|
"""Accept a string or list; always return a list."""
|
||||||
|
if isinstance(value, list):
|
||||||
|
return [str(v) for v in value]
|
||||||
|
return [str(value)]
|
||||||
|
|
||||||
|
# 1. hosts section with threshold_config attribute (string or list)
|
||||||
if "hosts" in config:
|
if "hosts" in config:
|
||||||
hosts_config = config["hosts"]
|
hosts_config = config["hosts"]
|
||||||
if isinstance(hosts_config, dict):
|
if isinstance(hosts_config, dict):
|
||||||
for host_name, host_attrs in hosts_config.items():
|
for host_name, host_attrs in hosts_config.items():
|
||||||
if isinstance(host_attrs, dict) and "threshold_config" in host_attrs:
|
if isinstance(host_attrs, dict) and "threshold_config" in host_attrs:
|
||||||
self.host_config_mapping[host_name] = host_attrs["threshold_config"]
|
self.host_config_mapping[host_name] = _normalise(host_attrs["threshold_config"])
|
||||||
|
|
||||||
# 2. Legacy format: host_threshold_mapping section (for backward compatibility)
|
# 2. Legacy host_threshold_mapping section (string values only)
|
||||||
if "host_threshold_mapping" in config:
|
if "host_threshold_mapping" in config:
|
||||||
legacy_mapping = config.get("host_threshold_mapping", {})
|
legacy_mapping = config.get("host_threshold_mapping", {})
|
||||||
if isinstance(legacy_mapping, dict):
|
if isinstance(legacy_mapping, dict):
|
||||||
self.host_config_mapping.update(legacy_mapping)
|
for host_name, value in legacy_mapping.items():
|
||||||
|
self.host_config_mapping[host_name] = _normalise(value)
|
||||||
|
|
||||||
# Set default config (first one alphabetically or explicitly set)
|
# Set default config (first one alphabetically or explicitly set)
|
||||||
self.default_config = config.get("default_threshold_config", "default")
|
self.default_config = config.get("default_threshold_config", "default")
|
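Reviewer note: this hunk does two things, keeping raw overrides separate from the pre-merged view and normalising host mappings to lists. The sketch below restates both with plain dictionaries standing in for `ThresholdConfig` objects; the metric paths, config names, and the `premerge` helper are hypothetical.

```python
from typing import Any, Dict, List


def normalise(value: Any) -> List[str]:
    """Accept a string or a list and always return a list of strings."""
    if isinstance(value, list):
        return [str(v) for v in value]
    return [str(value)]


def premerge(defaults: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Copy the defaults, then let the named config's overrides win."""
    merged = dict(defaults)
    merged.update(overrides)
    return merged


# Hypothetical metric paths and values, for illustration only.
effective_defaults = {"cpu_monitor.percent": 90, "memory_monitor.percent": 85}
raw_overrides = {"cpu_monitor.percent": 95}

merged = premerge(effective_defaults, raw_overrides)
print(merged)
print(normalise("database"))
print(normalise(["database", "zfs"]))
```

Keeping `raw_overrides` untouched means a host mapped to several configs can have them layered later without the defaults of one config masking the overrides of another.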
||||||
@@ -520,10 +575,13 @@ class ThresholdChecker:
|
|||||||
if not isinstance(threshold_config, dict):
|
if not isinstance(threshold_config, dict):
|
||||||
continue
|
continue
|
||||||
|
|
||||||
# Handle nested metrics (e.g., partitions./.percent)
|
# Handle nested metrics (e.g., partitions./.percent or pools.*.status)
|
||||||
if metric_name == "partitions":
|
if metric_name == "partitions":
|
||||||
self._parse_partition_thresholds(plugin_name, threshold_config, target_dict)
|
self._parse_partition_thresholds(plugin_name, threshold_config, target_dict)
|
||||||
continue
|
continue
|
||||||
|
if metric_name == "pools":
|
||||||
|
self._parse_pool_thresholds(plugin_name, threshold_config, target_dict)
|
||||||
|
continue
|
||||||
|
|
||||||
metric_path = f"{plugin_name}.{metric_name}"
|
metric_path = f"{plugin_name}.{metric_name}"
|
||||||
|
|
||||||
@@ -531,11 +589,14 @@ class ThresholdChecker:
|
|||||||
warning = threshold_config.get("warning")
|
warning = threshold_config.get("warning")
|
||||||
critical = threshold_config.get("critical")
|
critical = threshold_config.get("critical")
|
||||||
operator = threshold_config.get("operator", ">")
|
operator = threshold_config.get("operator", ">")
|
||||||
display = threshold_config.get("display", "(threshold: {op_symbol} {threshold_value})")
|
# Nagios operator maps exit codes directly; no numeric thresholds needed
|
||||||
hysteresis = threshold_config.get("hysteresis", 0.1) # 10% default
|
is_nagios_op = (operator == "nagios")
|
||||||
|
default_display = "{check_name}: {output}" if is_nagios_op else "(threshold: {op_symbol} {threshold_value})"
|
||||||
|
display = threshold_config.get("display", default_display)
|
||||||
|
hysteresis = threshold_config.get("hysteresis", 0.0 if is_nagios_op else 0.02)
|
||||||
enabled = threshold_config.get("enabled", True)
|
enabled = threshold_config.get("enabled", True)
|
||||||
|
|
||||||
if warning is None and critical is None:
|
if warning is None and critical is None and not is_nagios_op:
|
||||||
logger.warning("No thresholds defined for %s, skipping", metric_path)
|
logger.warning("No thresholds defined for %s, skipping", metric_path)
|
||||||
continue
|
continue
|
||||||
|
|
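Reviewer note: the defaults chosen in this hunk depend on whether the operator is `nagios`. The standalone sketch below reproduces that decision so the combinations can be tested; `resolve_settings` is a hypothetical name, and returning `None` stands in for the `continue` that skips the metric.

```python
from typing import Any, Dict, Optional, Tuple


def resolve_settings(threshold_config: Dict[str, Any]) -> Optional[Tuple[str, float]]:
    """Return (display, hysteresis), or None when the metric would be skipped."""
    operator = threshold_config.get("operator", ">")
    is_nagios_op = operator == "nagios"
    default_display = (
        "{check_name}: {output}" if is_nagios_op
        else "(threshold: {op_symbol} {threshold_value})"
    )
    display = threshold_config.get("display", default_display)
    hysteresis = threshold_config.get("hysteresis", 0.0 if is_nagios_op else 0.02)
    has_levels = (threshold_config.get("warning") is not None
                  or threshold_config.get("critical") is not None)
    if not has_levels and not is_nagios_op:
        return None  # numeric operators need at least one threshold
    return display, hysteresis


print(resolve_settings({"operator": "nagios"}))
print(resolve_settings({"warning": 80}))
print(resolve_settings({}))
```

Note that the default hysteresis for numeric operators changes from 10% to 2% in this hunk, so existing configs that relied on the implicit default will recover closer to the threshold than before.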
||||||
@@ -605,7 +666,57 @@ class ThresholdChecker:
|
|||||||
)
|
)
|
||||||
|
|
||||||
target_dict[metric_path] = threshold
|
target_dict[metric_path] = threshold
|
||||||
|
|
||||||
|
def _parse_pool_thresholds(
|
||||||
|
self,
|
||||||
|
plugin_name: str,
|
||||||
|
pools: Dict[str, Any],
|
||||||
|
target_dict: Optional[Dict[str, ThresholdConfig]] = None,
|
||||||
|
):
|
||||||
|
"""Parse ZFS pool thresholds. Pool names may be literal or '*' (all pools).
|
||||||
|
|
||||||
|
Config shape::
|
||||||
|
|
||||||
|
zfs_monitor:
|
||||||
|
pools:
|
||||||
|
'*':
|
||||||
|
status:
|
||||||
|
warning: 1
|
||||||
|
critical: 2
|
||||||
|
operator: '>'
|
||||||
|
tank:
|
||||||
|
capacity:
|
||||||
|
warning: 80
|
||||||
|
critical: 90
|
||||||
|
"""
|
||||||
|
if target_dict is None:
|
||||||
|
target_dict = self.thresholds
|
||||||
|
|
||||||
|
for pool_name, metrics in pools.items():
|
||||||
|
if not isinstance(metrics, dict):
|
||||||
|
continue
|
||||||
|
for metric_name, threshold_config in metrics.items():
|
||||||
|
if not isinstance(threshold_config, dict):
|
||||||
|
continue
|
||||||
|
metric_path = f"{plugin_name}.{pool_name}.{metric_name}"
|
||||||
|
warning = threshold_config.get("warning")
|
||||||
|
critical = threshold_config.get("critical")
|
||||||
|
operator = threshold_config.get("operator", ">")
|
||||||
|
hysteresis = threshold_config.get("hysteresis", 0.02)
|
||||||
|
enabled = threshold_config.get("enabled", True)
|
||||||
|
display = threshold_config.get("display")
|
||||||
|
if warning is None and critical is None:
|
||||||
|
continue
|
||||||
|
target_dict[metric_path] = ThresholdConfig(
|
||||||
|
metric_path=metric_path,
|
||||||
|
warning=warning,
|
||||||
|
critical=critical,
|
||||||
|
operator=operator,
|
||||||
|
hysteresis=hysteresis,
|
||||||
|
enabled=enabled,
|
||||||
|
display=display,
|
||||||
|
)
|
||||||
|
|
||||||
def _parse_rtt_thresholds(
|
def _parse_rtt_thresholds(
|
||||||
self,
|
self,
|
||||||
rtt_thresholds: Dict[str, Any],
|
rtt_thresholds: Dict[str, Any],
|
||||||
@@ -635,7 +746,7 @@ class ThresholdChecker:
|
|||||||
warning = rtt_thresholds.get("warning")
|
warning = rtt_thresholds.get("warning")
|
||||||
critical = rtt_thresholds.get("critical")
|
critical = rtt_thresholds.get("critical")
|
||||||
operator = rtt_thresholds.get("operator", ">")
|
operator = rtt_thresholds.get("operator", ">")
|
||||||
hysteresis = rtt_thresholds.get("hysteresis", 0.1) # 10% default
|
hysteresis = rtt_thresholds.get("hysteresis", 0.02) # 2% default
|
||||||
enabled = rtt_thresholds.get("enabled", True)
|
enabled = rtt_thresholds.get("enabled", True)
|
||||||
display = rtt_thresholds.get("display")
|
display = rtt_thresholds.get("display")
|
||||||
count = rtt_thresholds.get("count", 1)
|
count = rtt_thresholds.get("count", 1)
|
||||||
@@ -664,35 +775,55 @@ class ThresholdChecker:
|
|||||||
)
|
)
|
||||||
|
|
||||||
def get_thresholds_for_host(self, host_name: str) -> Dict[str, ThresholdConfig]:
|
def get_thresholds_for_host(self, host_name: str) -> Dict[str, ThresholdConfig]:
|
||||||
"""Get the appropriate threshold configuration for a host.
|
"""Get the effective threshold configuration for a host.
|
||||||
|
|
||||||
|
When threshold_config is a list, configs are applied left-to-right on top
|
||||||
|
of the default thresholds so earlier entries can be overridden by later ones.
|
||||||
|
|
||||||
Args:
|
Args:
|
||||||
host_name: Name of the host
|
host_name: Name of the host
|
||||||
|
|
||||||
Returns:
|
Returns:
|
||||||
Dictionary of thresholds for this host
|
Dictionary of thresholds for this host
|
||||||
"""
|
"""
|
||||||
# Legacy mode: single threshold set for all hosts
|
# Legacy mode: single threshold set for all hosts
|
||||||
if self.thresholds and not self.threshold_configs:
|
if self.thresholds and not self.threshold_configs:
|
||||||
return self.thresholds
|
return self.thresholds
|
||||||
|
|
||||||
# Multi-config mode: look up host-specific configuration
|
if not self.threshold_configs:
|
||||||
if self.threshold_configs:
|
return {}
|
||||||
config_name = self.host_config_mapping.get(host_name, self.default_config)
|
|
||||||
|
config_names = self.host_config_mapping.get(host_name)
|
||||||
if config_name in self.threshold_configs:
|
|
||||||
return self.threshold_configs[config_name]
|
# No host-specific mapping → return pre-merged default
|
||||||
else:
|
if not config_names:
|
||||||
|
return self.threshold_configs.get(self.default_config, {})
|
||||||
|
|
||||||
|
# Single config → fast path using pre-merged copy
|
||||||
|
if len(config_names) == 1:
|
||||||
|
name = config_names[0]
|
||||||
|
if name in self.threshold_configs:
|
||||||
|
return self.threshold_configs[name]
|
||||||
|
logger.warning(
|
||||||
|
"Threshold config '%s' not found for host '%s', using default '%s'",
|
||||||
|
name, host_name, self.default_config,
|
||||||
|
)
|
||||||
|
return self.threshold_configs.get(self.default_config, {})
|
||||||
|
|
||||||
|
# Multiple configs → start from defaults, layer raw overrides in order
|
||||||
|
result = dict(self.threshold_configs.get(self.default_config, {}))
|
||||||
|
for name in config_names:
|
||||||
|
if name == self.default_config:
|
||||||
|
continue # defaults already the base
|
||||||
|
raw = self.threshold_raw_configs.get(name)
|
||||||
|
if raw is None:
|
||||||
logger.warning(
|
logger.warning(
|
||||||
"Threshold config '%s' not found for host '%s', using default '%s'",
|
"Threshold config '%s' not found for host '%s', skipping",
|
||||||
config_name,
|
name, host_name,
|
||||||
host_name,
|
|
||||||
self.default_config
|
|
||||||
)
|
)
|
||||||
return self.threshold_configs.get(self.default_config, {})
|
else:
|
||||||
|
result.update(raw)
|
||||||
# No thresholds configured
|
return result
|
||||||
return {}
|
|
||||||
|
|
||||||
def check_value(
|
def check_value(
|
||||||
self,
|
self,
|
||||||
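Note on the hunk above: when a host maps to several config names, the new `get_thresholds_for_host` starts from the default thresholds and layers each named raw config on top, in order. The sketch below isolates that merge under simplifying assumptions: plain strings stand in for `ThresholdConfig` objects, and `merge_layers` is a hypothetical free function rather than the method in the diff.

```python
from typing import Dict, List


def merge_layers(
    default: Dict[str, str],
    raw_configs: Dict[str, Dict[str, str]],
    config_names: List[str],
    default_name: str = "default",
) -> Dict[str, str]:
    """Layer named configs over the defaults, left to right."""
    result = dict(default)  # copy so the shared default is never mutated
    for name in config_names:
        if name == default_name:
            continue  # defaults are already the base layer
        raw = raw_configs.get(name)
        if raw is None:
            continue  # unknown config name: skipped (the diff logs a warning)
        result.update(raw)  # later entries override earlier ones
    return result


default = {"cpu.load": "default-load", "disk.used": "default-disk"}
raw = {
    "web": {"cpu.load": "web-load"},
    "db": {"cpu.load": "db-load", "mem.used": "db-mem"},
}
merged = merge_layers(default, raw, ["web", "db"])
```

Because `dict.update` replaces whole entries, a later config overrides a metric's threshold as a unit; individual fields such as `warning` are not merged across layers.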
@@ -760,6 +891,12 @@ class ThresholdChecker:
|
|||||||
elif new_level == AlertLevel.WARNING and threshold.warning is not None:
|
elif new_level == AlertLevel.WARNING and threshold.warning is not None:
|
||||||
threshold_value = threshold.warning
|
threshold_value = threshold.warning
|
||||||
|
|
||||||
|
# Keep hysteresis on the state so the UI can show the recovery threshold
|
||||||
|
if new_level != AlertLevel.OK:
|
||||||
|
alert_state.hysteresis = threshold.hysteresis
|
||||||
|
else:
|
||||||
|
alert_state.hysteresis = None
|
||||||
|
|
||||||
# Update state and check for changes
|
# Update state and check for changes
|
||||||
old_level = alert_state.level
|
old_level = alert_state.level
|
||||||
if alert_state.update(new_level, value, threshold_value, threshold.operator.value):
|
if alert_state.update(new_level, value, threshold_value, threshold.operator.value):
|
||||||
@@ -769,6 +906,36 @@ class ThresholdChecker:
|
|||||||
self._check_pending_or_renotify(host_name, alert_state, metric_path, value, threshold, None)
|
self._check_pending_or_renotify(host_name, alert_state, metric_path, value, threshold, None)
|
||||||
|
|
||||||
return None
|
return None
|
||||||
|
def _find_threshold(
|
||||||
|
self, thresholds: Dict[str, "ThresholdConfig"], metric_path: str
|
||||||
|
) -> Tuple[Optional["ThresholdConfig"], Optional[str]]:
|
||||||
|
"""Return (threshold, check_name) for *metric_path*, falling back to suffix matches.
|
||||||
|
|
||||||
|
Allows generic thresholds like ``nagios_runner.status_code`` to match
|
||||||
|
fully-qualified paths like ``nagios_runner.check_disk_root_status_code``.
|
||||||
|
The exact match is always tried first; then successive leading
|
||||||
|
underscore-delimited segments are stripped from the field name until
|
||||||
|
a match is found or no segments remain.
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
(ThresholdConfig, None) for an exact match.
|
||||||
|
(ThresholdConfig, "check_disk_root") for a suffix match — the second
|
||||||
|
element is the stripped prefix, available as ``{check_name}`` in
|
||||||
|
display format templates.
|
||||||
|
(None, None) when no threshold is found.
|
||||||
|
"""
|
||||||
|
if metric_path in thresholds:
|
||||||
|
return thresholds[metric_path], None
|
||||||
|
plugin, sep, field = metric_path.partition(".")
|
||||||
|
if not sep:
|
||||||
|
return None, None
|
||||||
|
parts = field.split("_")
|
||||||
|
for i in range(1, len(parts)):
|
||||||
|
candidate = plugin + "." + "_".join(parts[i:])
|
||||||
|
if candidate in thresholds:
|
||||||
|
return thresholds[candidate], "_".join(parts[:i])
|
||||||
|
return None, None
|
||||||
|
|
||||||
def check_plugin_data(
|
def check_plugin_data(
|
||||||
self,
|
self,
|
||||||
host_name: str,
|
host_name: str,
|
||||||
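Note on the hunk above: `_find_threshold` tries the exact path first and then strips leading underscore-delimited segments from the field name until a configured threshold matches. The standalone sketch below follows the same steps, with strings standing in for `ThresholdConfig` values; `find_threshold` as a free function is an illustrative rewrite, not the method itself.

```python
from typing import Dict, Optional, Tuple


def find_threshold(
    thresholds: Dict[str, str], metric_path: str
) -> Tuple[Optional[str], Optional[str]]:
    """Return (threshold, check_name); check_name is None for an exact match."""
    if metric_path in thresholds:
        return thresholds[metric_path], None
    # Only the field part is split on "_"; the plugin name is kept whole.
    plugin, sep, field = metric_path.partition(".")
    if not sep:
        return None, None
    parts = field.split("_")
    for i in range(1, len(parts)):
        candidate = plugin + "." + "_".join(parts[i:])
        if candidate in thresholds:
            # The stripped prefix becomes {check_name} in display templates.
            return thresholds[candidate], "_".join(parts[:i])
    return None, None


thresholds = {"nagios_runner.status_code": "generic"}
match = find_threshold(thresholds, "nagios_runner.check_disk_root_status_code")
```

The loop strips one segment at a time from the left, so the longest configured suffix wins when several generic thresholds could match.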
@@ -796,38 +963,39 @@ class ThresholdChecker:
|
|||||||
# Check flat metrics
|
# Check flat metrics
|
||||||
for metric_name, value in data.items():
|
for metric_name, value in data.items():
|
||||||
metric_path = f"{plugin_name}.{metric_name}"
|
metric_path = f"{plugin_name}.{metric_name}"
|
||||||
|
|
||||||
if metric_path not in thresholds:
|
threshold, check_name = self._find_threshold(thresholds, metric_path)
|
||||||
|
if threshold is None:
|
||||||
continue
|
continue
|
||||||
|
|
||||||
threshold = thresholds[metric_path]
|
|
||||||
|
|
||||||
# Get or create alert state
|
# Get or create alert state
|
||||||
if metric_path not in alert_states:
|
if metric_path not in alert_states:
|
||||||
alert_states[metric_path] = AlertState(metric_path)
|
alert_states[metric_path] = AlertState(metric_path)
|
||||||
|
|
||||||
alert_state = alert_states[metric_path]
|
alert_state = alert_states[metric_path]
|
||||||
|
|
||||||
# Evaluate threshold with hysteresis
|
# Evaluate threshold with hysteresis
|
||||||
new_level = threshold.evaluate_with_hysteresis(
|
new_level = threshold.evaluate_with_hysteresis(
|
||||||
value,
|
value,
|
||||||
alert_state.level
|
alert_state.level
|
||||||
)
|
)
|
||||||
|
|
||||||
# Determine which threshold was exceeded
|
# Determine which threshold was exceeded
|
||||||
threshold_value = None
|
threshold_value = None
|
||||||
if new_level == AlertLevel.CRITICAL and threshold.critical is not None:
|
if new_level == AlertLevel.CRITICAL and threshold.critical is not None:
|
||||||
threshold_value = threshold.critical
|
threshold_value = threshold.critical
|
||||||
elif new_level == AlertLevel.WARNING and threshold.warning is not None:
|
elif new_level == AlertLevel.WARNING and threshold.warning is not None:
|
||||||
threshold_value = threshold.warning
|
threshold_value = threshold.warning
|
||||||
|
|
||||||
|
alert_state.hysteresis = threshold.hysteresis if new_level != AlertLevel.OK else None
|
||||||
|
|
||||||
# Update state and check for changes
|
# Update state and check for changes
|
||||||
old_level = alert_state.level
|
old_level = alert_state.level
|
||||||
if alert_state.update(new_level, value, threshold_value, threshold.operator.value):
|
if alert_state.update(new_level, value, threshold_value, threshold.operator.value):
|
||||||
state_changes.append((metric_path, old_level, new_level, value))
|
state_changes.append((metric_path, old_level, new_level, value))
|
||||||
self._apply_grace(host_name, alert_state, metric_path, old_level, new_level, value, threshold, data)
|
self._apply_grace(host_name, alert_state, metric_path, old_level, new_level, value, threshold, data, check_name=check_name, metric_name=metric_name)
|
||||||
elif new_level != AlertLevel.OK:
|
elif new_level != AlertLevel.OK:
|
||||||
self._check_pending_or_renotify(host_name, alert_state, metric_path, value, threshold, data)
|
self._check_pending_or_renotify(host_name, alert_state, metric_path, value, threshold, data, check_name=check_name, metric_name=metric_name)
|
||||||
|
|
||||||
# Check nested metrics (e.g., partition data in disk_monitor)
|
# Check nested metrics (e.g., partition data in disk_monitor)
|
||||||
self._check_nested_metrics(
|
self._check_nested_metrics(
|
||||||
@@ -852,6 +1020,44 @@ class ThresholdChecker:
|
|||||||
# Get host-specific thresholds
|
# Get host-specific thresholds
|
||||||
thresholds = self.get_thresholds_for_host(host_name)
|
thresholds = self.get_thresholds_for_host(host_name)
|
||||||
|
|
||||||
|
# ZFS pool health checks
|
||||||
|
if plugin_name == "zfs_monitor" and "pools" in data:
|
||||||
|
pools = data["pools"]
|
||||||
|
if isinstance(pools, dict):
|
||||||
|
for pool_name, pool_metrics in pools.items():
|
||||||
|
if not isinstance(pool_metrics, dict):
|
||||||
|
continue
|
||||||
|
# Synthesize status from health string for older clients
|
||||||
|
# that predate the status field.
|
||||||
|
pool_metrics_effective = dict(pool_metrics)
|
||||||
|
if "health" in pool_metrics and "status" not in pool_metrics:
|
||||||
|
pool_metrics_effective["status"] = 0 if pool_metrics["health"] == "ONLINE" else 1
|
||||||
|
for metric_name, value in pool_metrics_effective.items():
|
||||||
|
# Try specific pool name first, then wildcard '*'
|
||||||
|
metric_path = f"{plugin_name}.{pool_name}.{metric_name}"
|
||||||
|
wildcard_path = f"{plugin_name}.*.{metric_name}"
|
||||||
|
threshold = thresholds.get(metric_path) or thresholds.get(wildcard_path)
|
||||||
|
if threshold is None:
|
||||||
|
continue
|
||||||
|
if metric_path not in alert_states:
|
||||||
|
alert_states[metric_path] = AlertState(metric_path)
|
||||||
|
alert_state = alert_states[metric_path]
|
||||||
|
new_level = threshold.evaluate_with_hysteresis(value, alert_state.level)
|
||||||
|
threshold_value = None
|
||||||
|
if new_level == AlertLevel.CRITICAL and threshold.critical is not None:
|
||||||
|
threshold_value = threshold.critical
|
||||||
|
elif new_level == AlertLevel.WARNING and threshold.warning is not None:
|
||||||
|
threshold_value = threshold.warning
|
||||||
|
alert_state.hysteresis = threshold.hysteresis if new_level != AlertLevel.OK else None
|
||||||
|
pool_context = dict(pool_metrics_effective)
|
||||||
|
pool_context["pool_name"] = pool_name
|
||||||
|
old_level = alert_state.level
|
||||||
|
if alert_state.update(new_level, value, threshold_value, threshold.operator.value):
|
||||||
|
state_changes.append((metric_path, old_level, new_level, value))
|
||||||
|
self._apply_grace(host_name, alert_state, metric_path, old_level, new_level, value, threshold, pool_context, metric_name=pool_name)
|
||||||
|
elif new_level != AlertLevel.OK:
|
||||||
|
self._check_pending_or_renotify(host_name, alert_state, metric_path, value, threshold, pool_context, metric_name=pool_name)
|
||||||
|
|
||||||
# Look for partition data in disk_monitor
|
# Look for partition data in disk_monitor
|
||||||
if plugin_name == "disk_monitor" and "partitions" in data:
|
if plugin_name == "disk_monitor" and "partitions" in data:
|
||||||
partitions = data["partitions"]
|
partitions = data["partitions"]
|
||||||
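Note on the hunk above: each pool metric is looked up under the literal pool name first and under the `'*'` wildcard second, after a `status` value has been synthesized from `health` for older clients. A minimal sketch of those two steps follows, with strings standing in for `ThresholdConfig` objects. Both function names are hypothetical.

```python
from typing import Any, Dict, Optional


def effective_pool_metrics(pool_metrics: Dict[str, Any]) -> Dict[str, Any]:
    """Copy the metrics and add status (0 = ONLINE, 1 = anything else) if absent."""
    effective = dict(pool_metrics)
    if "health" in pool_metrics and "status" not in pool_metrics:
        effective["status"] = 0 if pool_metrics["health"] == "ONLINE" else 1
    return effective


def lookup_pool_threshold(
    thresholds: Dict[str, str], plugin_name: str, pool_name: str, metric_name: str
) -> Optional[str]:
    """Prefer the pool-specific threshold, then fall back to the '*' entry."""
    specific = f"{plugin_name}.{pool_name}.{metric_name}"
    wildcard = f"{plugin_name}.*.{metric_name}"
    return thresholds.get(specific) or thresholds.get(wildcard)


thresholds = {
    "zfs_monitor.*.status": "any-pool-status",
    "zfs_monitor.tank.capacity": "tank-capacity",
}
```

One consequence worth checking against the docstring example in `_parse_pool_thresholds`: the synthesized status is at most 1, so with `warning: 1` and operator `'>'` a degraded pool reported by an older client would presumably not cross the warning threshold.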
@@ -886,7 +1092,9 @@ class ThresholdChecker:
|
|||||||
threshold_value = threshold.critical
|
threshold_value = threshold.critical
|
||||||
elif new_level == AlertLevel.WARNING and threshold.warning is not None:
|
elif new_level == AlertLevel.WARNING and threshold.warning is not None:
|
||||||
threshold_value = threshold.warning
|
threshold_value = threshold.warning
|
||||||
|
|
||||||
|
alert_state.hysteresis = threshold.hysteresis if new_level != AlertLevel.OK else None
|
||||||
|
|
||||||
old_level = alert_state.level
|
old_level = alert_state.level
|
||||||
if alert_state.update(new_level, value, threshold_value, threshold.operator.value):
|
if alert_state.update(new_level, value, threshold_value, threshold.operator.value):
|
||||||
state_changes.append((metric_path, old_level, new_level, value))
|
state_changes.append((metric_path, old_level, new_level, value))
|
||||||
@@ -903,6 +1111,8 @@ class ThresholdChecker:
|
|||||||
value: Any,
|
value: Any,
|
||||||
threshold: ThresholdConfig,
|
threshold: ThresholdConfig,
|
||||||
plugin_data: Optional[Dict[str, Any]] = None,
|
plugin_data: Optional[Dict[str, Any]] = None,
|
||||||
|
check_name: Optional[str] = None,
|
||||||
|
metric_name: Optional[str] = None,
|
||||||
):
|
):
|
||||||
"""Trigger a notification for an alert state change.
|
"""Trigger a notification for an alert state change.
|
||||||
|
|
||||||
@@ -924,56 +1134,54 @@ class ThresholdChecker:
|
|||||||
|
|
||||||
# Format operator symbol
|
# Format operator symbol
|
||||||
op_symbol = threshold.operator.value
|
op_symbol = threshold.operator.value
|
||||||
|
|
||||||
|
# Short metric label: strip the plugin-name prefix and _status_code suffix
|
||||||
|
short_path = (metric_path.partition(".")[2] or metric_path).removesuffix("_status_code")
|
||||||
|
|
||||||
# Use a display-friendly value (inf is the sentinel for "overdue")
|
# Use a display-friendly value (inf is the sentinel for "overdue")
|
||||||
import math
|
import math
|
||||||
display_value = "overdue" if isinstance(value, float) and math.isinf(value) else value
|
display_value = "overdue" if isinstance(value, float) and math.isinf(value) else value
|
||||||
|
|
||||||
# Format message
|
# Format message — for the nagios operator there is no numeric threshold_value;
|
||||||
if new_level == AlertLevel.OK:
|
# render the display template whenever one is available.
|
||||||
lvl = "RECOVER"
|
has_display = threshold_value is not None or threshold.operator == ComparisonOperator.NAGIOS
|
||||||
message = f"{metric_path} = {display_value} ({old_level.name} -> OK)"
|
|
||||||
elif new_level == AlertLevel.WARNING:
|
def _fmt():
|
||||||
lvl = "WARNING"
|
return self._format_display(
|
||||||
if threshold_value is not None:
|
|
||||||
threshold_info = self._format_display(
|
|
||||||
threshold.display,
|
|
||||||
value=display_value,
|
|
||||||
threshold_value=threshold_value,
|
|
||||||
op_symbol=op_symbol,
|
|
||||||
plugin_data=plugin_data
|
|
||||||
)
|
|
||||||
message = f"{metric_path} = {display_value} {threshold_info}"
|
|
||||||
else:
|
|
||||||
message = f"{metric_path} = {display_value}"
|
|
||||||
elif new_level == AlertLevel.CRITICAL:
|
|
||||||
lvl = "CRITICAL"
|
|
||||||
if threshold_value is not None:
|
|
||||||
threshold_info = self._format_display(
|
|
||||||
threshold.display,
|
|
||||||
value=display_value,
|
|
||||||
threshold_value=threshold_value,
|
|
||||||
op_symbol=op_symbol,
|
|
||||||
plugin_data=plugin_data
|
|
||||||
)
|
|
||||||
message = f"{metric_path} = {display_value} {threshold_info}"
|
|
||||||
else:
|
|
||||||
message = f"{metric_path} = {display_value}"
|
|
||||||
else:
|
|
||||||
lvl = "UNKNOWN"
|
|
||||||
message = f"{metric_path} = {display_value}"
|
|
||||||
|
|
||||||
# Return the formatted threshold info for storing in AlertState
|
|
||||||
formatted_threshold_msg = None
|
|
||||||
if threshold_value is not None and new_level != AlertLevel.OK:
|
|
||||||
formatted_threshold_msg = self._format_display(
|
|
||||||
threshold.display,
|
threshold.display,
|
||||||
value=display_value,
|
value=display_value,
|
||||||
threshold_value=threshold_value,
|
threshold_value=threshold_value,
|
||||||
op_symbol=op_symbol,
|
op_symbol=op_symbol,
|
||||||
plugin_data=plugin_data
|
plugin_data=plugin_data,
|
||||||
|
check_name=check_name,
|
||||||
|
metric_name=metric_name,
|
||||||
)
|
)
|
||||||
|
|
||||||
|
if new_level == AlertLevel.OK:
|
||||||
|
lvl = "RECOVER"
|
||||||
|
message = f"{short_path} = {display_value} ({old_level.name} -> OK)"
|
||||||
|
elif new_level == AlertLevel.WARNING:
|
||||||
|
lvl = "WARNING"
|
||||||
|
if has_display:
|
||||||
|
message = f"{short_path} = {display_value} {_fmt()}"
|
||||||
|
else:
|
||||||
|
message = f"{short_path} = {display_value}"
|
||||||
|
elif new_level == AlertLevel.CRITICAL:
|
||||||
|
lvl = "CRITICAL"
|
||||||
|
if has_display:
|
||||||
|
message = f"{short_path} = {display_value} {_fmt()}"
|
||||||
|
else:
|
||||||
|
message = f"{short_path} = {display_value}"
|
||||||
|
else:
|
||||||
|
lvl = "UNKNOWN"
|
||||||
|
if has_display:
|
||||||
|
message = f"{short_path} = {display_value} {_fmt()}"
|
||||||
|
else:
|
||||||
|
message = f"{short_path} = {display_value}"
|
||||||
|
|
||||||
|
# Formatted threshold info stored on AlertState for the UI
|
||||||
|
formatted_threshold_msg = _fmt() if has_display and new_level != AlertLevel.OK else None
|
||||||
|
|
||||||
return lvl, message, formatted_threshold_msg
|
return lvl, message, formatted_threshold_msg
|
||||||
|
|
||||||
def _send_notification(
|
def _send_notification(
|
||||||
@@ -987,23 +1195,28 @@ class ThresholdChecker:
|
|||||||
value: Any,
|
value: Any,
|
||||||
):
|
):
|
||||||
"""Send notification and log to journal/eventlog."""
|
"""Send notification and log to journal/eventlog."""
|
||||||
try:
|
from . import hbdclass
|
||||||
notify_mod.send_notification(
|
host = hbdclass.Host.hosts.get(host_name)
|
||||||
host_name,
|
if host is not None and not host.watched:
|
||||||
notify_mod.Notification(
|
eventlog(host_name, lvl, message, service="threshold")
|
||||||
title=f"[{lvl}] {host_name}",
|
return
|
||||||
body=message,
|
short_path = (metric_path.partition(".")[2] or metric_path).removesuffix("_status_code")
|
||||||
level=lvl,
|
title = f"[{lvl}] {host_name} {short_path}"
|
||||||
),
|
# Strip the "metric = " prefix from message so body is just the value/detail
|
||||||
)
|
prefix = short_path + " = "
|
||||||
logger.info("Notification sent: %s", message)
|
body = message[len(prefix):] if message.startswith(prefix) else message
|
||||||
except Exception as e:
|
asyncio.get_event_loop().create_task(notify_mod.send_notification(
|
||||||
logger.error("Failed to send notification: %s", e)
|
host_name,
|
||||||
|
notify_mod.Notification(
|
||||||
|
title=title,
|
||||||
|
body=body,
|
||||||
|
level=lvl,
|
||||||
|
),
|
||||||
|
))
|
||||||
|
|
||||||
# Log to journal
|
# Log to journal
|
||||||
if self.journal is not None:
|
if self.journal is not None:
|
||||||
try:
|
try:
|
||||||
import asyncio
|
|
||||||
loop = asyncio.get_event_loop()
|
loop = asyncio.get_event_loop()
|
||||||
loop.create_task(self.journal.log_threshold_event(
|
loop.create_task(self.journal.log_threshold_event(
|
||||||
host_name=host_name,
|
host_name=host_name,
|
||||||
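Note on the hunk above: the notification title now carries a short metric label, and the body drops the `metric = ` prefix from the message. The sketch below reproduces that string handling in isolation. `short_metric_path` and `split_title_body` are hypothetical helper names, and `str.removesuffix` requires Python 3.9 or later.

```python
from typing import Tuple


def short_metric_path(metric_path: str) -> str:
    """Strip the plugin-name prefix and a trailing _status_code suffix."""
    # partition(".")[2] is empty when there is no dot, so fall back to the input.
    return (metric_path.partition(".")[2] or metric_path).removesuffix("_status_code")


def split_title_body(
    lvl: str, host_name: str, metric_path: str, message: str
) -> Tuple[str, str]:
    """Build the notification title and a body without the 'metric = ' prefix."""
    short_path = short_metric_path(metric_path)
    title = f"[{lvl}] {host_name} {short_path}"
    prefix = short_path + " = "
    body = message[len(prefix):] if message.startswith(prefix) else message
    return title, body


title, body = split_title_body(
    "CRITICAL",
    "web01",
    "disk_monitor.used_percent",
    "used_percent = 97 (threshold: > 90)",
)
```

If the message does not start with the expected prefix, the body is the full message, so the metric name is never lost.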
@@ -1021,32 +1234,61 @@ class ThresholdChecker:
|
|||||||
self,
|
self,
|
||||||
display_format: str,
|
display_format: str,
|
||||||
value: Any,
|
value: Any,
|
||||||
threshold_value: float,
|
threshold_value: Optional[float],
|
||||||
op_symbol: str,
|
op_symbol: str,
|
||||||
plugin_data: Optional[Dict[str, Any]] = None,
|
plugin_data: Optional[Dict[str, Any]] = None,
|
||||||
|
check_name: Optional[str] = None,
|
||||||
|
metric_name: Optional[str] = None,
|
||||||
) -> str:
|
) -> str:
|
||||||
"""Format the display string using available data.
|
"""Format the display string using available data.
|
||||||
|
|
||||||
Args:
|
Available template variables:
|
||||||
display_format: Format string from threshold config
|
{value} - current metric value
|
||||||
value: Current metric value
|
{threshold_value} - threshold that was exceeded
|
||||||
threshold_value: Threshold value that was exceeded
|
{op_symbol} - comparison operator (>, <, >=, <=, ==, !=)
|
||||||
op_symbol: Comparison operator symbol
|
{check_name} - prefix stripped for generic threshold match
|
||||||
plugin_data: Optional dictionary of plugin data fields
|
(e.g. "check_disk_root" when metric
|
||||||
|
"check_disk_root_status_code" matched generic
|
||||||
|
threshold "status_code")
|
||||||
|
{metric_name} - field name within the plugin data dict
|
||||||
|
Any key from plugin_data is also available.
|
||||||
|
|
||||||
Returns:
|
Returns:
|
||||||
Formatted display string
|
Formatted display string
|
||||||
"""
|
"""
|
||||||
|
if not display_format:
|
||||||
|
display_format = "(threshold: {op_symbol} {threshold_value})" if threshold_value is not None else ""
|
||||||
|
|
||||||
# Build format context with standard variables
|
# Build format context with standard variables
|
||||||
format_context = {
|
format_context = {
|
||||||
'value': value,
|
'value': value,
|
||||||
'threshold_value': threshold_value,
|
|
||||||
'op_symbol': op_symbol,
|
'op_symbol': op_symbol,
|
||||||
}
|
}
|
||||||
|
if threshold_value is not None:
|
||||||
|
format_context['threshold_value'] = threshold_value
|
||||||
|
|
||||||
|
# Add generic-match context variables when available
|
||||||
|
if check_name is not None:
|
||||||
|
format_context['check_name'] = check_name
|
||||||
|
if metric_name is not None:
|
||||||
|
format_context['metric_name'] = metric_name
|
||||||
|
|
||||||
# Add all plugin data fields if available
|
# Add all plugin data fields if available
|
||||||
if plugin_data:
|
if plugin_data:
|
||||||
format_context.update(plugin_data)
|
format_context.update(plugin_data)
|
||||||
|
|
||||||
|
# For nagios_runner generic matches, expose the matched check's output
|
||||||
|
# and status as short aliases {output} and {status} so display templates
|
||||||
|
# don't need to use the full {check_disk_root_output} form.
|
||||||
|
if check_name and plugin_data:
|
||||||
|
if 'output' not in format_context:
|
||||||
|
output = plugin_data.get(f"{check_name}_output")
|
||||||
|
if output is not None:
|
||||||
|
format_context['output'] = output
|
||||||
|
if 'status' not in format_context:
|
||||||
|
status = plugin_data.get(f"{check_name}_status")
|
||||||
|
if status is not None:
|
||||||
|
format_context['status'] = status
|
||||||
|
|
||||||
try:
|
try:
|
||||||
# Format the display string
|
# Format the display string
|
||||||
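Note on the hunk above: `_format_display` now builds its template context in layers: standard variables, optional `check_name` and `metric_name`, all plugin data, and finally the short `{output}` and `{status}` aliases for generic matches. The sketch below covers only the context construction and is an approximation. `build_format_context` is a hypothetical name, and the error handling inside the `try:` block is not visible in this hunk, so it is left out.

```python
from typing import Any, Dict, Optional


def build_format_context(
    value: Any,
    threshold_value: Optional[float],
    op_symbol: str,
    plugin_data: Optional[Dict[str, Any]] = None,
    check_name: Optional[str] = None,
    metric_name: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble the variables available to a display template."""
    context: Dict[str, Any] = {"value": value, "op_symbol": op_symbol}
    if threshold_value is not None:
        context["threshold_value"] = threshold_value
    if check_name is not None:
        context["check_name"] = check_name
    if metric_name is not None:
        context["metric_name"] = metric_name
    if plugin_data:
        context.update(plugin_data)
    # Short aliases: {output} -> {<check_name>_output}, {status} -> {<check_name>_status}.
    # An existing key of the same name is never overwritten.
    if check_name and plugin_data:
        for alias in ("output", "status"):
            if alias not in context:
                aliased = plugin_data.get(f"{check_name}_{alias}")
                if aliased is not None:
                    context[alias] = aliased
    return context


ctx = build_format_context(
    2,
    None,
    "nagios",
    plugin_data={
        "check_disk_root_output": "DISK CRITICAL",
        "check_disk_root_status": "CRITICAL",
    },
    check_name="check_disk_root",
)
rendered = "{check_name}: {output}".format(**ctx)
```

Because `plugin_data` is merged after the standard variables, a plugin field named `value` or `op_symbol` would shadow the built-in one.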
@@ -1077,17 +1319,22 @@ class ThresholdChecker:
|
|||||||
value: Any,
|
value: Any,
|
||||||
threshold: ThresholdConfig,
|
threshold: ThresholdConfig,
|
||||||
plugin_data: Optional[Dict[str, Any]],
|
plugin_data: Optional[Dict[str, Any]],
|
||||||
|
check_name: Optional[str] = None,
|
||||||
|
metric_name: Optional[str] = None,
|
||||||
) -> None:
|
) -> None:
|
||||||
"""Handle a state-change transition with grace-period logic.
|
"""Handle a state-change transition with grace-period logic.
|
||||||
|
|
||||||
Transitioning INTO alert: defers the notification for grace_seconds.
|
Transitioning INTO alert (worsening): defers the notification for grace_seconds.
|
||||||
|
De-escalation within alert states (e.g. CRITICAL→WARNING): no new notification;
|
||||||
|
the metric is still alerting so no RECOVER was sent.
|
||||||
Transitioning TO OK:
|
Transitioning TO OK:
|
||||||
- Still in grace window (pending_since set): suppresses both the alert
|
- Still in grace window (pending_since set): suppresses both the alert
|
||||||
and the recovery — the spike never warranted a page.
|
and the recovery — the spike never warranted a page.
|
||||||
- Past grace: fires the RECOVER notification normally.
|
- Past grace: fires the RECOVER notification normally.
|
||||||
"""
|
"""
|
||||||
lvl, message, formatted_msg = self._trigger_notification(
|
lvl, message, formatted_msg = self._trigger_notification(
|
||||||
host_name, metric_path, old_level, new_level, value, threshold, plugin_data
|
host_name, metric_path, old_level, new_level, value, threshold, plugin_data,
|
||||||
|
check_name=check_name, metric_name=metric_name,
|
||||||
)
|
)
|
||||||
alert_state.formatted_message = formatted_msg
|
alert_state.formatted_message = formatted_msg
|
||||||
|
|
||||||
@@ -1100,12 +1347,20 @@ class ThresholdChecker:
|
|||||||
alert_state.pending_since = None
|
alert_state.pending_since = None
|
||||||
else:
|
else:
|
||||||
self._send_notification(host_name, lvl, message, metric_path, old_level, new_level, value)
|
self._send_notification(host_name, lvl, message, metric_path, old_level, new_level, value)
|
||||||
else:
|
elif new_level.value > old_level.value:
|
||||||
|
# Worsening (OK→WARNING, OK→CRITICAL, WARNING→CRITICAL): schedule notification.
|
||||||
alert_state.pending_since = time.time()
|
alert_state.pending_since = time.time()
|
||||||
logger.debug(
|
logger.debug(
|
||||||
"Alert deferred (%.0fs grace): %s on %s = %s",
|
"Alert deferred (%.0fs grace): %s on %s = %s",
|
||||||
self.grace_seconds, metric_path, host_name, value,
|
self.grace_seconds, metric_path, host_name, value,
|
||||||
)
|
)
|
||||||
|
else:
|
||||||
|
# De-escalation within alert states (e.g. CRITICAL→WARNING): metric is still
|
||||||
|
# alerting but did not recover, so no new notification.
|
||||||
|
logger.debug(
|
||||||
|
"De-escalation %s→%s for %s on %s, no notification",
|
||||||
|
old_level.name, new_level.name, metric_path, host_name,
|
||||||
|
)
|
||||||
|
|
||||||
def _check_pending_or_renotify(
|
def _check_pending_or_renotify(
|
||||||
self,
|
self,
|
||||||
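Note on the hunk above: `_apply_grace` now distinguishes three kinds of level change by comparing `new_level.value` with `old_level.value`. The sketch below shows that classification on its own. The `AlertLevel` enum here is a stand-in, and its numeric ordering (OK < WARNING < CRITICAL) is an assumption that the comparison in the diff relies on but that is not shown in this section.

```python
from enum import Enum


class AlertLevel(Enum):
    # Stand-in enum; the real values are assumed to be ordered by severity.
    OK = 0
    WARNING = 1
    CRITICAL = 2


def classify_transition(old_level: AlertLevel, new_level: AlertLevel) -> str:
    """Classify a level change (callers only invoke this when the level changed)."""
    if new_level == AlertLevel.OK:
        return "recover"  # RECOVER, or suppressed if still in the grace window
    if new_level.value > old_level.value:
        return "worsening"  # start the grace timer, notify once it expires
    return "de-escalation"  # still alerting, no new notification
```

With this split, a CRITICAL to WARNING change no longer restarts the grace timer, which the previous catch-all `else:` branch did.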
@@ -1115,6 +1370,8 @@ class ThresholdChecker:
|
|||||||
value: Any,
|
value: Any,
|
||||||
threshold: ThresholdConfig,
|
threshold: ThresholdConfig,
|
||||||
plugin_data: Optional[Dict[str, Any]],
|
plugin_data: Optional[Dict[str, Any]],
|
||||||
|
check_name: Optional[str] = None,
|
||||||
|
metric_name: Optional[str] = None,
|
||||||
) -> None:
|
) -> None:
|
||||||
"""Called when alert level is unchanged and non-OK.
|
"""Called when alert level is unchanged and non-OK.
|
||||||
|
|
||||||
@@ -1124,7 +1381,8 @@ class ThresholdChecker:
|
|||||||
if alert_state.pending_since is not None:
|
if alert_state.pending_since is not None:
|
||||||
if time.time() - alert_state.pending_since >= self.grace_seconds:
|
if time.time() - alert_state.pending_since >= self.grace_seconds:
|
||||||
lvl, message, formatted_msg = self._trigger_notification(
|
lvl, message, formatted_msg = self._trigger_notification(
|
||||||
host_name, metric_path, AlertLevel.OK, alert_state.level, value, threshold, plugin_data
|
host_name, metric_path, AlertLevel.OK, alert_state.level, value, threshold, plugin_data,
|
||||||
|
check_name=check_name, metric_name=metric_name,
|
||||||
)
|
)
|
||||||
alert_state.formatted_message = formatted_msg
|
alert_state.formatted_message = formatted_msg
|
||||||
self._send_notification(
|
self._send_notification(
|
||||||
@@ -1133,7 +1391,7 @@ class ThresholdChecker:
|
|||||||
alert_state.pending_since = None
|
alert_state.pending_since = None
|
||||||
# else: still within grace window, do nothing
|
# else: still within grace window, do nothing
|
||||||
else:
|
else:
|
||||||
self._check_renotify(host_name, alert_state, metric_path, value, threshold, plugin_data)
|
self._check_renotify(host_name, alert_state, metric_path, value, threshold, plugin_data, check_name=check_name, metric_name=metric_name)
|
||||||
|
|
||||||
def _check_renotify(
|
def _check_renotify(
|
||||||
self,
|
self,
|
||||||
@@ -1143,6 +1401,8 @@ class ThresholdChecker:
|
|||||||
value: Any,
|
value: Any,
|
||||||
threshold: ThresholdConfig,
|
threshold: ThresholdConfig,
|
||||||
plugin_data: Optional[Dict[str, Any]] = None,
|
plugin_data: Optional[Dict[str, Any]] = None,
|
||||||
|
check_name: Optional[str] = None,
|
||||||
|
metric_name: Optional[str] = None,
|
||||||
):
|
):
|
||||||
"""Check if we should send a repeat notification.
|
"""Check if we should send a repeat notification.
|
||||||
|
|
||||||
@@ -1180,7 +1440,8 @@ class ThresholdChecker:
|
|||||||
|
|
||||||
# Format operator symbol
|
# Format operator symbol
|
||||||
op_symbol = threshold.operator.value
|
op_symbol = threshold.operator.value
|
||||||
|
short_path = (metric_path.partition(".")[2] or metric_path).removesuffix("_status_code")
|
||||||
|
|
||||||
# Time to re-notify
|
# Time to re-notify
|
||||||
if threshold_value is not None:
|
if threshold_value is not None:
|
||||||
# Use display format string
|
# Use display format string
|
||||||
@@ -1189,27 +1450,50 @@ class ThresholdChecker:
|
|||||||
value=value,
|
value=value,
|
||||||
threshold_value=threshold_value,
|
threshold_value=threshold_value,
|
||||||
op_symbol=op_symbol,
|
op_symbol=op_symbol,
|
||||||
plugin_data=plugin_data
|
plugin_data=plugin_data,
|
||||||
|
check_name=check_name,
|
||||||
|
metric_name=metric_name,
|
||||||
)
|
)
|
||||||
message = f"REMINDER ({alert_state.level.name}): {host_name} - {metric_path} = {value} {threshold_info}, ongoing for {int(now - alert_state.since)}s"
|
body = f"{value} {threshold_info}, ongoing for {int(now - alert_state.since)}s"
|
||||||
else:
|
else:
|
||||||
message = f"REMINDER ({alert_state.level.name}): {host_name} - {metric_path} = {value} (ongoing for {int(now - alert_state.since)}s)"
|
body = f"{value} (ongoing for {int(now - alert_state.since)}s)"
|
||||||
|
message = f"REMINDER ({alert_state.level.name}): {host_name} - {short_path} = {body}"
|
||||||
try:
|
|
||||||
notify_mod.send_notification(
|
from . import hbdclass
|
||||||
|
host = hbdclass.Host.hosts.get(host_name)
|
||||||
|
if host is None or host.watched:
|
||||||
|
asyncio.get_event_loop().create_task(notify_mod.send_notification(
|
||||||
host_name,
|
host_name,
|
||||||
notify_mod.Notification(
|
notify_mod.Notification(
|
||||||
title=f"[REMINDER/{alert_state.level.name}] {host_name}",
|
title=f"[REMINDER/{alert_state.level.name}] {host_name} {short_path}",
|
||||||
body=message,
|
body=body,
|
||||||
level=alert_state.level.name,
|
level=alert_state.level.name,
|
||||||
),
|
),
|
||||||
)
|
))
|
||||||
alert_state.last_notification = now
|
|
||||||
alert_state.notification_count += 1
|
|
||||||
logger.info("Re-notification sent: %s", message)
|
logger.info("Re-notification sent: %s", message)
|
||||||
except Exception as e:
|
alert_state.last_notification = now
|
||||||
logger.error("Failed to send re-notification: %s", e)
|
alert_state.notification_count += 1
|
||||||
|
|
||||||
|
def purge_stale_alerts(self, hbdclass) -> None:
|
||||||
|
"""Remove alert states that have no matching threshold configuration.
|
||||||
|
|
||||||
|
Called after startup (pickle restore) and after each config reload so
|
||||||
|
that alerts orphaned by configuration changes do not linger forever.
|
||||||
|
Alerts whose metric_path is not present in the current threshold config
|
||||||
|
for that host are silently dropped.
|
||||||
|
"""
|
||||||
|
for hostname, host in hbdclass.Host.hosts.items():
|
||||||
|
if not host.alert_states:
|
||||||
|
continue
|
||||||
|
configured = self.get_thresholds_for_host(hostname)
|
||||||
|
stale = [mp for mp in host.alert_states if self._find_threshold(configured, mp)[0] is None]
|
||||||
|
for mp in stale:
|
||||||
|
logger.info(
|
||||||
|
"Purging stale alert state for %s / %s (no threshold configured)",
|
||||||
|
hostname, mp,
|
||||||
|
)
|
||||||
|
del host.alert_states[mp]
|
||||||
|
|
||||||
def get_active_alerts(self, alert_states: Dict[str, AlertState]) -> list:
|
def get_active_alerts(self, alert_states: Dict[str, AlertState]) -> list:
|
||||||
"""
|
"""
|
||||||
Get all currently active (non-OK) alerts.
|
Get all currently active (non-OK) alerts.
|
||||||
|
|||||||
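The `short_path` expression added to `_check_renotify` can be read in isolation. The following is a minimal sketch of that one-liner as a standalone function; the function name `short_metric_path` and the sample metric paths are hypothetical and do not appear in the diff.

```python
def short_metric_path(metric_path: str) -> str:
    """Drop the leading dotted segment and a trailing '_status_code' suffix.

    Mirrors the expression used in the diff:
        (metric_path.partition(".")[2] or metric_path).removesuffix("_status_code")
    """
    # partition(".") returns (head, sep, tail). When there is no dot, tail is
    # the empty string, so the `or` falls back to the full path.
    tail = metric_path.partition(".")[2] or metric_path
    # str.removesuffix is available from Python 3.9 onwards.
    return tail.removesuffix("_status_code")


if __name__ == "__main__":
    # Hypothetical metric paths, for illustration only.
    for path in ("nagios.disk_status_code", "cpu.load.avg", "uptime"):
        print(path, "->", short_metric_path(path))
```

Only the first dotted segment is removed, so a path with several dots keeps everything after the first one.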
+44
-29
@@ -211,10 +211,11 @@ def _make_timer_callbacks(uname, host, ctx):
|
|||||||
connection.newstate(connection.__class__.OVERDUE, now, cfg.get("grace", 2))
|
connection.newstate(connection.__class__.OVERDUE, now, cfg.get("grace", 2))
|
||||||
msg = f"{connection.afam} overdue"
|
msg = f"{connection.afam} overdue"
|
||||||
eventlog(uname, "CRITICAL", msg)
|
eventlog(uname, "CRITICAL", msg)
|
||||||
notify_mod.send_notification(
|
if host.watched:
|
||||||
uname,
|
asyncio.create_task(notify_mod.send_notification(
|
||||||
notify_mod.Notification(title=f"[CRITICAL] {uname}", body=msg, level="CRITICAL"),
|
uname,
|
||||||
)
|
notify_mod.Notification(title=f"[CRITICAL] {uname}", body=msg, level="CRITICAL"),
|
||||||
|
))
|
||||||
# Track in alert_states so the Alerts Dashboard shows this
|
# Track in alert_states so the Alerts Dashboard shows this
|
||||||
_set_connectivity_alert(host, connection.afam, "CRITICAL")
|
_set_connectivity_alert(host, connection.afam, "CRITICAL")
|
||||||
if threshold_checker:
|
if threshold_checker:
|
||||||
@@ -315,7 +316,6 @@ def handle_datagram(msg: dict, addr, transport, ctx: dict):
|
|||||||
|
|
||||||
cfg = ctx.get("config", {})
|
cfg = ctx.get("config", {})
|
||||||
hbdcls = ctx.get("hbdclass")
|
hbdcls = ctx.get("hbdclass")
|
||||||
log = ctx.get("log")
|
|
||||||
msg_to_websockets = ctx.get("msg_to_websockets")
|
msg_to_websockets = ctx.get("msg_to_websockets")
|
||||||
DEBUG = ctx.get("DEBUG", 0)
|
DEBUG = ctx.get("DEBUG", 0)
|
||||||
verbose = ctx.get("verbose", False)
|
verbose = ctx.get("verbose", False)
|
||||||
@@ -336,8 +336,7 @@ def handle_datagram(msg: dict, addr, transport, ctx: dict):
|
|||||||
# Apply user-access settings from config
|
# Apply user-access settings from config
|
||||||
access = config_mod.get_host_access(cfg, uname)
|
access = config_mod.get_host_access(cfg, uname)
|
||||||
host.apply_access(access["owner"], access["managers"], access["monitors"])
|
host.apply_access(access["owner"], access["managers"], access["monitors"])
|
||||||
if verbose:
|
logger.info("New host signed on: %s (dyn=%s, access=%s)", uname, host.dyn, access)
|
||||||
print(("XX: New host, num now %s" % (len(hbdcls.Host.hosts))))
|
|
||||||
newh = True
|
newh = True
|
||||||
else:
|
else:
|
||||||
host = hbdcls.Host.hosts[uname]
|
host = hbdcls.Host.hosts[uname]
|
||||||
@@ -351,8 +350,10 @@ def handle_datagram(msg: dict, addr, transport, ctx: dict):
|
|||||||
|
|
||||||
if msg.get("ID") == "HTB":
|
if msg.get("ID") == "HTB":
|
||||||
host.doesack = msg.get("acks", -1)
|
host.doesack = msg.get("acks", -1)
|
||||||
# send ACK back
|
# send ACK back; ask client to resend plugin info when we have none yet
|
||||||
rmsg = {"time": time.time()}
|
rmsg = {"time": time.time()}
|
||||||
|
if not host.plugin_data:
|
||||||
|
rmsg["request_update"] = 1
|
||||||
opkt = dicttos("ACK", rmsg)
|
opkt = dicttos("ACK", rmsg)
|
||||||
try:
|
try:
|
||||||
transport.sendto(opkt, addr)
|
transport.sendto(opkt, addr)
|
||||||
@@ -369,6 +370,14 @@ def handle_datagram(msg: dict, addr, transport, ctx: dict):
|
|||||||
if k not in ("ID", "plugin", "id", "name")}
|
if k not in ("ID", "plugin", "id", "name")}
|
||||||
# Store plugin data with timestamp
|
# Store plugin data with timestamp
|
||||||
host.add_plugin_data(plugin_name, plugin_data, timestamp=now)
|
host.add_plugin_data(plugin_name, plugin_data, timestamp=now)
|
||||||
|
|
||||||
|
# Prefer the owner reported by os_info; fall back to the configured owner, then the default owner
|
||||||
|
if plugin_name == "os_info":
|
||||||
|
config_owner = config_mod.get_host_access(cfg, uname).get("owner")
|
||||||
|
default_owner = config_mod.get_default_owner(cfg)
|
||||||
|
inferred_owner = plugin_data.get("owner", config_owner or default_owner)
|
||||||
|
host.owner = inferred_owner
|
||||||
|
logger.info("owner for %s is '%s'", uname, host.owner)
|
||||||
if DEBUG > 1:
|
if DEBUG > 1:
|
||||||
print(f"Stored plugin data for {uname}: {plugin_name}")
|
print(f"Stored plugin data for {uname}: {plugin_name}")
|
||||||
|
|
||||||
@@ -408,10 +417,11 @@ def handle_datagram(msg: dict, addr, transport, ctx: dict):
|
|||||||
|
|
||||||
if res:
|
if res:
|
||||||
eventlog(uname, "WARNING", res)
|
eventlog(uname, "WARNING", res)
|
||||||
notify_mod.send_notification(
|
if host.watched:
|
||||||
uname,
|
asyncio.create_task(notify_mod.send_notification(
|
||||||
notify_mod.Notification(title=f"[WARNING] {uname}", body=res, level="WARNING"),
|
uname,
|
||||||
)
|
notify_mod.Notification(title=f"[WARNING] {uname}", body=res, level="WARNING"),
|
||||||
|
))
|
||||||
|
|
||||||
interval = int(msg.get("interval", 0) or 0)
|
interval = int(msg.get("interval", 0) or 0)
|
||||||
shutdown = msg.get("shutdown", 0)
|
shutdown = msg.get("shutdown", 0)
|
||||||
@@ -421,10 +431,11 @@ def handle_datagram(msg: dict, addr, transport, ctx: dict):
|
|||||||
|
|
||||||
if boot:
|
if boot:
|
||||||
eventlog(uname, "INFO", "booted")
|
eventlog(uname, "INFO", "booted")
|
||||||
notify_mod.send_notification(
|
if host.watched:
|
||||||
uname,
|
asyncio.create_task(notify_mod.send_notification(
|
||||||
notify_mod.Notification(title=f"[INFO] {uname}", body=f"{host.name} booted", level="INFO"),
|
uname,
|
||||||
)
|
notify_mod.Notification(title=f"[INFO] {uname}", body=f"{host.name} booted", level="INFO"),
|
||||||
|
))
|
||||||
if message:
|
if message:
|
||||||
eventlog(uname, "INFO", "msg: %s" % message, service=service)
|
eventlog(uname, "INFO", "msg: %s" % message, service=service)
|
||||||
|
|
||||||
@@ -438,13 +449,18 @@ def handle_datagram(msg: dict, addr, transport, ctx: dict):
|
|||||||
if not newh:
|
if not newh:
|
||||||
if d == 0 or lasts == "unknown":
|
if d == 0 or lasts == "unknown":
|
||||||
m = "%s is up" % (conn.afam)
|
m = "%s is up" % (conn.afam)
|
||||||
|
elif d < 4:
|
||||||
|
# Transient blip (likely client restart) — skip log and notification
|
||||||
|
m = None
|
||||||
else:
|
else:
|
||||||
m = "%s back after being %s for %s" % (conn.afam, lasts, dur(d))
|
m = "%s back after being %s for %s" % (conn.afam, lasts, dur(d))
|
||||||
eventlog(uname, "RECOVER", m)
|
if m:
|
||||||
notify_mod.send_notification(
|
eventlog(uname, "RECOVER", m)
|
||||||
uname,
|
if host.watched:
|
||||||
notify_mod.Notification(title=f"[RECOVER] {uname}", body=m, level="RECOVER"),
|
asyncio.create_task(notify_mod.send_notification(
|
||||||
)
|
uname,
|
||||||
|
notify_mod.Notification(title=f"[RECOVER] {uname}", body=m, level="RECOVER"),
|
||||||
|
))
|
||||||
|
|
||||||
if boot or newh:
|
if boot or newh:
|
||||||
host.upcount = host.doesack
|
host.upcount = host.doesack
|
||||||
@@ -454,10 +470,11 @@ def handle_datagram(msg: dict, addr, transport, ctx: dict):
|
|||||||
if shutdown:
|
if shutdown:
|
||||||
m = "%s shutdown" % conn.afam
|
m = "%s shutdown" % conn.afam
|
||||||
eventlog(uname, "INFO", m)
|
eventlog(uname, "INFO", m)
|
||||||
notify_mod.send_notification(
|
if host.watched:
|
||||||
uname,
|
asyncio.create_task(notify_mod.send_notification(
|
||||||
notify_mod.Notification(title=f"[INFO] {uname}", body=m, level="INFO"),
|
uname,
|
||||||
)
|
notify_mod.Notification(title=f"[INFO] {uname}", body=m, level="INFO"),
|
||||||
|
))
|
||||||
conn.newstate(hbdcls.Connection.DOWN, now)
|
conn.newstate(hbdcls.Connection.DOWN, now)
|
||||||
_set_connectivity_alert(host, conn.afam, "CRITICAL")
|
_set_connectivity_alert(host, conn.afam, "CRITICAL")
|
||||||
|
|
||||||
@@ -491,12 +508,10 @@ def handle_datagram(msg: dict, addr, transport, ctx: dict):
|
|||||||
op, rmsg = host.cmds[0]
|
op, rmsg = host.cmds[0]
|
||||||
if op == "CMD":
|
if op == "CMD":
|
||||||
del host.cmds[0]
|
del host.cmds[0]
|
||||||
if log:
|
eventlog(uname, "INFO", "command sent")
|
||||||
log(uname, "command sent")
|
|
||||||
elif op == "UPD":
|
elif op == "UPD":
|
||||||
del host.cmds[0]
|
del host.cmds[0]
|
||||||
if log:
|
eventlog(uname, "INFO", "update initiated")
|
||||||
log(uname, "update initiated")
|
|
||||||
opkt = dicttos(op, rmsg)
|
opkt = dicttos(op, rmsg)
|
||||||
try:
|
try:
|
||||||
transport.sendto(opkt, addr)
|
transport.sendto(opkt, addr)
|
||||||
|
|||||||
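The recovery branch in `handle_datagram` now distinguishes three cases: a first contact, a short blip that is ignored, and a real recovery. Below is a rough sketch of that decision as a pure function, assuming the downtime is given in seconds. The name `recovery_message` is hypothetical, and the plain `int(...)s` formatting stands in for the diff's `dur()` helper, whose implementation is not shown.

```python
from typing import Optional


def recovery_message(afam: str, last_state: str, down_seconds: float) -> Optional[str]:
    """Return the RECOVER text, or None when the event should be skipped."""
    if down_seconds == 0 or last_state == "unknown":
        return f"{afam} is up"
    if down_seconds < 4:
        # Transient blip (likely a client restart): no log entry, no notification.
        return None
    # Stand-in for dur(d) from the diff; the real formatting may differ.
    return f"{afam} back after being {last_state} for {int(down_seconds)}s"


if __name__ == "__main__":
    for args in (("ipv4", "unknown", 30), ("ipv4", "overdue", 2), ("ipv4", "down", 90)):
        print(args, "->", recovery_message(*args))
```

Returning `None` for the skipped case matches the `if m:` guard in the diff, which wraps both the event log call and the notification.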
@@ -146,9 +146,14 @@ def load_users(config: dict) -> dict:
|
|||||||
Returns the new ``users`` dict.
|
Returns the new ``users`` dict.
|
||||||
"""
|
"""
|
||||||
global users
|
global users
|
||||||
|
old_users = dict(users) # snapshot before rebuild
|
||||||
users_cfg = config.get("users", {})
|
users_cfg = config.get("users", {})
|
||||||
if not isinstance(users_cfg, dict):
|
if not isinstance(users_cfg, dict):
|
||||||
users = {}
|
users = {}
|
||||||
|
# Preserve OAuth-provisioned users (password_hash == "") that aren't in config.
|
||||||
|
for username, existing_user in old_users.items():
|
||||||
|
if not existing_user.password_hash and username not in users:
|
||||||
|
users[username] = existing_user
|
||||||
return users
|
return users
|
||||||
|
|
||||||
result: dict = {}
|
result: dict = {}
|
||||||
@@ -166,6 +171,10 @@ def load_users(config: dict) -> dict:
|
|||||||
)
|
)
|
||||||
|
|
||||||
users = result
|
users = result
|
||||||
|
# Preserve OAuth-provisioned users (password_hash == "") that aren't in config.
|
||||||
|
for username, existing_user in old_users.items():
|
||||||
|
if not existing_user.password_hash and username not in users:
|
||||||
|
users[username] = existing_user
|
||||||
logger.info("Loaded %d user(s) from config", len(users))
|
logger.info("Loaded %d user(s) from config", len(users))
|
||||||
return users
|
return users
|
||||||
|
|
||||||
@@ -187,6 +196,26 @@ def authenticate(username: str, password: str) -> "User | None":
|
|||||||
return None
|
return None
|
||||||
|
|
||||||
|
|
||||||
|
def provision_oauth_user(username: str, full_name: str, avatar: str) -> "User":
|
||||||
|
"""Create or update a user sourced from an OAuth2 provider.
|
||||||
|
|
||||||
|
New users are inserted with no password_hash — they can only authenticate
|
||||||
|
via OAuth. Existing users (e.g. defined in config with a password) have
|
||||||
|
their display name and avatar refreshed; all other attributes are preserved.
|
||||||
|
"""
|
||||||
|
user = users.get(username)
|
||||||
|
if user is None:
|
||||||
|
user = User(username=username, full_name=full_name, avatar=avatar)
|
||||||
|
users[username] = user
|
||||||
|
logger.info("Provisioned OAuth user %r", username)
|
||||||
|
else:
|
||||||
|
if full_name:
|
||||||
|
user.full_name = full_name
|
||||||
|
if avatar:
|
||||||
|
user.avatar = avatar
|
||||||
|
return user
|
||||||
|
|
||||||
|
|
||||||
# ---------------------------------------------------------------------------
|
# ---------------------------------------------------------------------------
|
||||||
# Session management
|
# Session management
|
||||||
# ---------------------------------------------------------------------------
|
# ---------------------------------------------------------------------------
|
||||||
|
|||||||
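The `load_users` change keeps OAuth-provisioned accounts alive across a config reload. A minimal sketch of that merge rule follows, under the assumption that an empty `password_hash` marks an OAuth-only user; the `User` dataclass here is a simplified stand-in and `merge_oauth_users` is a hypothetical helper, not a function from the diff.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class User:
    # Simplified stand-in for the project's User class.
    username: str
    password_hash: str = ""  # empty string: OAuth-only account in this sketch


def merge_oauth_users(old_users: Dict[str, User], config_users: Dict[str, User]) -> Dict[str, User]:
    """Rebuild the user table from config, carrying over OAuth-only users."""
    merged = dict(config_users)  # entries from config always win
    for username, existing in old_users.items():
        if not existing.password_hash and username not in merged:
            merged[username] = existing
    return merged


if __name__ == "__main__":
    old = {"alice": User("alice"), "bob": User("bob", "hash-1")}
    cfg = {"carol": User("carol", "hash-2")}
    print(sorted(merge_oauth_users(old, cfg)))
```

A password-backed user that disappears from the config is dropped, while an OAuth-only user survives until the process restarts, which is the behaviour the two added loops in the diff appear to aim for.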
+59
-12
@@ -13,7 +13,8 @@ from . import data
|
|||||||
|
|
||||||
logger = logging.getLogger(__name__)
|
logger = logging.getLogger(__name__)
|
||||||
|
|
||||||
_connections: set = set()
|
# Map of WebSocket → User object (or None when auth is disabled)
|
||||||
|
_connections: dict = {}
|
||||||
_loop: Optional[asyncio.AbstractEventLoop] = None
|
_loop: Optional[asyncio.AbstractEventLoop] = None
|
||||||
_get_hosts: Optional[Callable[[], Iterable]] = None
|
_get_hosts: Optional[Callable[[], Iterable]] = None
|
||||||
_verbose: bool = False
|
_verbose: bool = False
|
||||||
@@ -34,31 +35,63 @@ def setup(
|
|||||||
_verbose = verbose
|
_verbose = verbose
|
||||||
|
|
||||||
|
|
||||||
|
def _user_can_see_host(user, host_name: str) -> bool:
|
||||||
|
"""Return True if *user* may see updates for *host_name* (manager or higher)."""
|
||||||
|
from . import hbdclass, users as users_mod
|
||||||
|
if user is None or not users_mod.users_enabled():
|
||||||
|
return True
|
||||||
|
if user.admin:
|
||||||
|
return True
|
||||||
|
host = hbdclass.Host.hosts.get(host_name)
|
||||||
|
if host is None:
|
||||||
|
return False
|
||||||
|
return host.is_manager(user.username)
|
||||||
|
|
||||||
|
|
||||||
|
def _get_token(request) -> str:
|
||||||
|
"""Extract session token from request (mirrors logic in http.py)."""
|
||||||
|
auth = request.headers.get("Authorization", "")
|
||||||
|
if auth.startswith("Bearer "):
|
||||||
|
return auth[7:].strip()
|
||||||
|
token = request.headers.get("X-Auth-Token", "")
|
||||||
|
if token:
|
||||||
|
return token
|
||||||
|
return request.cookies.get("hbd_session", "")
|
||||||
|
|
||||||
|
|
||||||
async def handler(request):
|
async def handler(request):
|
||||||
"""aiohttp WebSocket upgrade handler — register as GET /ws."""
|
"""aiohttp WebSocket upgrade handler — register as GET /ws."""
|
||||||
from aiohttp import web
|
from aiohttp import web
|
||||||
|
from . import users as users_mod
|
||||||
|
|
||||||
ws = web.WebSocketResponse()
|
ws = web.WebSocketResponse()
|
||||||
await ws.prepare(request)
|
await ws.prepare(request)
|
||||||
|
|
||||||
_connections.add(ws)
|
token = _get_token(request)
|
||||||
|
user = users_mod.get_session_user(token) if token else None
|
||||||
|
|
||||||
|
_connections[ws] = user
|
||||||
remote = request.remote
|
remote = request.remote
|
||||||
logger.info("WebSocket connected from %s", remote)
|
logger.info("WebSocket connected from %s", remote)
|
||||||
|
|
||||||
try:
|
try:
|
||||||
# Send current host state to the new client
|
# Send current host state, filtered to hosts this user may see
|
||||||
if _get_hosts:
|
if _get_hosts:
|
||||||
try:
|
try:
|
||||||
for h in list(_get_hosts()):
|
for h in list(_get_hosts()):
|
||||||
await ws.send_str(json.dumps({"type": "host", "data": h}))
|
host_name = h.get("raw_name") or h.get("name", "")
|
||||||
|
if _user_can_see_host(user, host_name):
|
||||||
|
await ws.send_str(json.dumps({"type": "host", "data": h}))
|
||||||
except Exception as e:
|
except Exception as e:
|
||||||
logger.error("Error sending initial hosts: %s", e)
|
logger.error("Error sending initial hosts: %s", e)
|
||||||
|
|
||||||
# Send recent messages
|
# Send recent messages, filtered to hosts this user may see
|
||||||
if data.msgs:
|
if data.msgs:
|
||||||
try:
|
try:
|
||||||
for m in data.msgs:
|
for m in data.msgs:
|
||||||
await ws.send_str(json.dumps({"type": "message", "data": m}))
|
host_name = m.get("host") if isinstance(m, dict) else None
|
||||||
|
if not host_name or _user_can_see_host(user, host_name):
|
||||||
|
await ws.send_str(json.dumps({"type": "message", "data": m}))
|
||||||
except Exception as e:
|
except Exception as e:
|
||||||
logger.error("Error sending initial messages: %s", e)
|
logger.error("Error sending initial messages: %s", e)
|
||||||
|
|
||||||
@@ -74,7 +107,7 @@ async def handler(request):
|
|||||||
except Exception as e:
|
except Exception as e:
|
||||||
logger.exception("WebSocket handler error from %s: %s", remote, e)
|
logger.exception("WebSocket handler error from %s: %s", remote, e)
|
||||||
finally:
|
finally:
|
||||||
_connections.discard(ws)
|
_connections.pop(ws, None)
|
||||||
logger.info("WebSocket disconnected from %s", remote)
|
logger.info("WebSocket disconnected from %s", remote)
|
||||||
|
|
||||||
return ws
|
return ws
|
||||||
@@ -83,25 +116,39 @@ async def handler(request):
|
|||||||
def broadcast(typ: str, payload) -> bool:
|
def broadcast(typ: str, payload) -> bool:
|
||||||
"""Thread-safe broadcast to all connected WebSocket clients.
|
"""Thread-safe broadcast to all connected WebSocket clients.
|
||||||
|
|
||||||
|
For host and plugin updates, only sends to clients whose user has
|
||||||
|
manager-or-higher access to that host. Other message types are
|
||||||
|
broadcast to all clients.
|
||||||
|
|
||||||
Can be called from any thread; schedules sends on the event loop.
|
Can be called from any thread; schedules sends on the event loop.
|
||||||
Returns False if the loop is not running yet.
|
Returns False if the loop is not running yet.
|
||||||
"""
|
"""
|
||||||
if not _loop:
|
if not _loop:
|
||||||
return False
|
return False
|
||||||
|
|
||||||
|
# Determine the host name for access-filtered message types
|
||||||
|
host_name: Optional[str] = None
|
||||||
|
if typ in ("host", "plugin"):
|
||||||
|
host_name = payload.get("raw_name") or payload.get("host") or payload.get("name")
|
||||||
|
elif typ == "message" and isinstance(payload, dict):
|
||||||
|
host_name = payload.get("host")
|
||||||
|
|
||||||
jmsg = json.dumps({"type": typ, "data": payload})
|
jmsg = json.dumps({"type": typ, "data": payload})
|
||||||
|
|
||||||
async def _send_all():
|
async def _send_all():
|
||||||
dead = set()
|
dead = set()
|
||||||
for ws in list(_connections):
|
for ws, user in list(_connections.items()):
|
||||||
try:
|
try:
|
||||||
if not ws.closed:
|
if ws.closed:
|
||||||
await ws.send_str(jmsg)
|
|
||||||
else:
|
|
||||||
dead.add(ws)
|
dead.add(ws)
|
||||||
|
continue
|
||||||
|
if host_name is not None and not _user_can_see_host(user, host_name):
|
||||||
|
continue
|
||||||
|
await ws.send_str(jmsg)
|
||||||
except Exception:
|
except Exception:
|
||||||
dead.add(ws)
|
dead.add(ws)
|
||||||
for ws in dead:
|
for ws in dead:
|
||||||
_connections.discard(ws)
|
_connections.pop(ws, None)
|
||||||
|
|
||||||
asyncio.run_coroutine_threadsafe(_send_all(), _loop)
|
asyncio.run_coroutine_threadsafe(_send_all(), _loop)
|
||||||
return True
|
return True
|
||||||
|
|||||||
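The reworked `broadcast` filters recipients per host when the payload names one and sends to everyone otherwise. The sketch below isolates that selection step without any network I/O. `select_recipients` and the toy `can_see` predicate are hypothetical; in the diff the check is done by `_user_can_see_host`, and the keys of `_connections` are WebSocket objects rather than strings.

```python
from typing import Callable, Dict, Hashable, List, Optional


def select_recipients(
    connections: Dict[Hashable, Optional[str]],
    host_name: Optional[str],
    can_see: Callable[[Optional[str], str], bool],
) -> List[Hashable]:
    """Return the connections that should receive a message about host_name."""
    if host_name is None:
        # Message types without a host are sent to every client.
        return list(connections)
    return [ws for ws, user in connections.items() if can_see(user, host_name)]


if __name__ == "__main__":
    # Toy access table: user -> hosts they manage (illustrative only).
    managed = {"alice": {"web1"}, "bob": {"db1"}}

    def can_see(user: Optional[str], host: str) -> bool:
        # A connection with no user (auth disabled) sees everything.
        return user is None or host in managed.get(user, set())

    conns = {"ws-a": "alice", "ws-b": "bob", "ws-anon": None}
    print(select_recipients(conns, "web1", can_see))
    print(select_recipients(conns, None, can_see))
```

Keeping the selection separate from the send loop makes the access rule easy to test without an event loop.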
+7
-1
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
|
|||||||
|
|
||||||
[project]
|
[project]
|
||||||
name = "hbd"
|
name = "hbd"
|
||||||
version = "5.1.4"
|
version = "5.2.5"
|
||||||
description = "Heartbeat monitoring system — client (hbc) and server (hbd)"
|
description = "Heartbeat monitoring system — client (hbc) and server (hbd)"
|
||||||
readme = "README.md"
|
readme = "README.md"
|
||||||
requires-python = ">=3.11"
|
requires-python = ">=3.11"
|
||||||
@@ -34,6 +34,9 @@ server = [
|
|||||||
"matrix-nio>=0.24",
|
"matrix-nio>=0.24",
|
||||||
]
|
]
|
||||||
|
|
||||||
|
# Minimal client — hbc_mini only, no external dependencies
|
||||||
|
mini = []
|
||||||
|
|
||||||
# Install both client and server
|
# Install both client and server
|
||||||
all = [
|
all = [
|
||||||
"hbd[client,server]",
|
"hbd[client,server]",
|
||||||
@@ -54,6 +57,9 @@ dev = [
|
|||||||
hbd = "hbd.server.cli:main"
|
hbd = "hbd.server.cli:main"
|
||||||
hbc = "hbd.client.main:main"
|
hbc = "hbd.client.main:main"
|
||||||
|
|
||||||
|
[tool.setuptools]
|
||||||
|
script-files = ["scripts/hb_install.sh", "scripts/hbc_mini.py"]
|
||||||
|
|
||||||
[tool.setuptools.packages.find]
|
[tool.setuptools.packages.find]
|
||||||
where = ["."]
|
where = ["."]
|
||||||
include = ["hbd*"]
|
include = ["hbd*"]
|
||||||
|
|||||||
@@ -4,12 +4,14 @@ set -e
|
|||||||
uv version --bump patch
|
uv version --bump patch
|
||||||
VER=$(uv version --short)
|
VER=$(uv version --short)
|
||||||
sed -i".bak" "s/__version__ = \"[0-9.]*\"\(.*\)$/__version__ = \"$VER\"\1/" hbd/__init__.py
|
sed -i".bak" "s/__version__ = \"[0-9.]*\"\(.*\)$/__version__ = \"$VER\"\1/" hbd/__init__.py
|
||||||
|
sed -i".bak" "s/__version__ = \"[0-9.]*\"\(.*\)$/__version__ = \"$VER\"\1/" scripts/hbc_mini.py
|
||||||
|
|
||||||
# commit pyproject.toml
|
# commit pyproject.toml
|
||||||
git commit -m "version $VER" pyproject.toml hbd/__init__.py
|
git commit -m "version $VER" pyproject.toml hbd/__init__.py scripts/hbc_mini.py
|
||||||
git push
|
git push
|
||||||
# tag version
|
# tag version
|
||||||
git tag -a v$VER -m "Version $VER"
|
git tag -a v$VER -m "Version $VER"
|
||||||
git push --tags
|
git push --tags
|
||||||
|
|
||||||
rm hbd/__init__.py.bak
|
rm hbd/__init__.py.bak
|
||||||
|
rm scripts/hbc_mini.py.bak
|
||||||
|
|||||||
@@ -0,0 +1,2 @@
|
|||||||
|
hbc_mini
|
||||||
|
hbc_mini_dbg
|
||||||
@@ -0,0 +1,21 @@
|
|||||||
|
CC ?= cc
|
||||||
|
CFLAGS = -O2 -Wall -Wextra -std=c11
|
||||||
|
LDFLAGS = -lz -lpthread -lm
|
||||||
|
TARGET = hbc_mini
|
||||||
|
SRC = hbc_mini.c
|
||||||
|
|
||||||
|
# FreeBSD/NetBSD keep zlib in base; no extra flags needed.
|
||||||
|
# On some NetBSD installs pthreads may need -lpthread from pkgsrc.
|
||||||
|
|
||||||
|
.PHONY: all clean debug
|
||||||
|
|
||||||
|
all: $(TARGET)
|
||||||
|
|
||||||
|
$(TARGET): $(SRC)
|
||||||
|
$(CC) $(CFLAGS) -o $@ $< $(LDFLAGS)
|
||||||
|
|
||||||
|
debug: $(SRC)
|
||||||
|
$(CC) -g -fsanitize=address,undefined -o $(TARGET)_dbg $< $(LDFLAGS)
|
||||||
|
|
||||||
|
clean:
|
||||||
|
rm -f $(TARGET) $(TARGET)_dbg
|
||||||
File diff suppressed because it is too large
Executable
+115
@@ -0,0 +1,115 @@
|
|||||||
|
#!/bin/sh
|
||||||
|
|
||||||
|
# Helper script to install the heartbeat tools. By default, it will only
|
||||||
|
# install the heartbeat client, hbc. The server is installed when the arg 'server' is passed
|
||||||
|
# to the script. The script will install the heartbeat tools in a python
|
||||||
|
# virtual environment in ~/venvs/hbd. The hbd and hbc commands will be
|
||||||
|
# installed from the wheel and symlinked to ~/bin/hbd and ~/bin/hbc,
|
||||||
|
# respectively. If the virtual environment already exists, it will be
|
||||||
|
# reused. The script will also remove any existing symlinks for hbd and hbc
|
||||||
|
# in ~/bin before creating new ones.
|
||||||
|
|
||||||
|
set -e
|
||||||
|
what=$1
|
||||||
|
on_ha=0
|
||||||
|
where=""
|
||||||
|
venv=""
|
||||||
|
[ "$2" = "HA" ] && on_ha=1
|
||||||
|
[ -z "$what" ] && what="client"
|
||||||
|
|
||||||
|
if [ -d /homeassistant ]; then # if running from HA command line
|
||||||
|
echo "HA, running \"docker exec homeassistant /config/bin/hb_install.sh $@\""
|
||||||
|
docker exec homeassistant /config/bin/hb_install.sh "$what" HA
|
||||||
|
rc=$?
|
||||||
|
if [ $rc -ne 0 ]; then
|
||||||
|
echo "Failed to install heartbeat in HA, please check the logs for more details"
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
exit 0
|
||||||
|
fi
|
||||||
|
|
||||||
|
if [ $on_ha -eq 1 ] || [ -r /.dockerenv ] && [ -d /config/bin ]; then
|
||||||
|
# Installing under docker on Home Assistant OS, using /config/bin for executables and /config/venvs for virtual environments
|
||||||
|
echo "Home Assistant OS detected, installing under docker"
|
||||||
|
where="/config/bin"
|
||||||
|
venv="/config/venvs"
|
||||||
|
else
|
||||||
|
if [ ! -d $HOME/.local/bin ] && [ ! -d $HOME/bin ]; then
|
||||||
|
echo "No suitable bin directory found in PATH, please add either $HOME/.local/bin or $HOME/bin to your PATH"
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
for where in $HOME/bin $HOME/.local/bin notset ; do
|
||||||
|
if echo ":$PATH:" | grep -q ":$where:" ; then
|
||||||
|
break
|
||||||
|
fi
|
||||||
|
done
|
||||||
|
if [ "$where" = "notset" ]; then
|
||||||
|
echo "No suitable bin directory found in PATH, please add either $HOME/.local/bin or $HOME/bin to your PATH"
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
if [ "$what" = "mini" ]; then
|
||||||
|
venv=""
|
||||||
|
else
|
||||||
|
venv="$HOME/venvs"
|
||||||
|
fi
|
||||||
|
fi
|
||||||
|
echo "Installing $what to $where"
|
||||||
|
if [ ! -z "$venv" ]; then
|
||||||
|
echo "Using virtual environment at $venv/hbd"
|
||||||
|
fi
|
||||||
|
|
||||||
|
if [ "$venv" != "" ] && [ ! -d $venv/hbd ]; then
|
||||||
|
arg=""
|
||||||
|
have_pip=$(python3 -c "import pip" >/dev/null 2>&1 && echo "Installed" || echo "Not Installed")
|
||||||
|
if [ "$have_pip" = "Not Installed" ]; then
|
||||||
|
# some systems do not have pip installed by default, so we need to fetch get-pip.py and install pip
|
||||||
|
echo "pip is not installed, fetching get-pip.py and installing pip"
|
||||||
|
arg="--without-pip"
|
||||||
|
fi
|
||||||
|
mkdir -p $venv
|
||||||
|
have_venv=$(python3 -c "import venv" >/dev/null 2>&1 && echo "Installed" || echo "Not Installed")
|
||||||
|
if [ "$have_venv" = "Not Installed" ]; then
|
||||||
|
if [ "$have_pip" = "Not Installed" ]; then
|
||||||
|
echo "python has no venv, and no pip to install virtualenv, cannot continue"
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
echo "python venv module not found, installing virtualenv"
|
||||||
|
python3 -m pip install --user virtualenv
|
||||||
|
python3 -m virtualenv $venv/hbd --system-site-packages $arg
|
||||||
|
else
|
||||||
|
python3 -m venv $venv/hbd --system-site-packages $arg
|
||||||
|
fi
|
||||||
|
. $venv/hbd/bin/activate
|
||||||
|
if [ -n "$arg" ]; then
|
||||||
|
curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py && python3 get-pip.py
|
||||||
|
fi
|
||||||
|
deactivate
|
||||||
|
fi
|
||||||
|
|
||||||
|
if [ ! -z "$venv" ]; then
|
||||||
|
. $venv/hbd/bin/activate
|
||||||
|
fi
|
||||||
|
if [ "$what" = "mini" ]; then
|
||||||
|
curl -s -o $where/hbc_mini https://git.wrede.ca/andreas/heartbeat/raw/branch/master/scripts/hbc_mini.py
|
||||||
|
chmod +x $where/hbc_mini
|
||||||
|
else
|
||||||
|
python3 -mpip install --upgrade --index-url https://git.wrede.ca/api/packages/andreas/pypi/simple/ --extra-index-url https://pypi.org/simple hbd[$what]
|
||||||
|
fi
|
||||||
|
|
||||||
|
if [ ! -z "$venv" ]; then
|
||||||
|
echo "linking executables to $where"
|
||||||
|
if [ "$what" = "server" ]; then
|
||||||
|
rm -f $where/hbd
|
||||||
|
ln -sf $(which hbd) $where/hbd
|
||||||
|
elif [ "$what" = "client" ]; then
|
||||||
|
rm -f $where/hbc
|
||||||
|
ln -sf $(which hbc) $where/hbc
|
||||||
|
fi
|
||||||
|
rm -f $where/hb_install.sh
|
||||||
|
ln -sf $(which hb_install.sh) $where/hb_install.sh
|
||||||
|
fi
|
||||||
|
echo "Installation complete. To upgrade, run the following:"
|
||||||
|
echo " $where/hb_install.sh $what"
|
||||||
|
echo "To install on another machine, obtain the install script"
|
||||||
|
echo "from https://git.wrede.ca/andreas/heartbeat/raw/branch/master/scripts/hb_install.sh"
|
||||||
|
echo "and then run sh hb_install.sh [mini|client]"
|
||||||
Executable
+1192
File diff suppressed because it is too large
@@ -1,88 +0,0 @@
|
|||||||
#!/bin/sh
|
|
||||||
|
|
||||||
# install the heartbeat client, hbc. The server is installed when the arg 'server' is passed
|
|
||||||
# install the heartbeat client, hbc. The server is installed when the arg 'server' is passed
|
|
||||||
# to the script. The script will install the heartbeat tools in a python
|
|
||||||
# virtual environment in ~/venvs/hbd. The hbd and hbc commands will be
|
|
||||||
# installed from the wheel and symlinked to ~/bin/hbd and ~/bin/hbc,
|
|
||||||
# respectively. If the virtual environment already exists, it will be
|
|
||||||
# reused. The script will also remove any existing symlinks for hbd and hbc
|
|
||||||
# in ~/bin before creating new ones.
|
|
||||||
|
|
||||||
|
|
||||||
# hbd/hbc from wheel and create symlinks for hbd and hbc in ~/bin
|
|
||||||
|
|
||||||
set -e
|
|
||||||
what=$1
|
|
||||||
on_ha=0
|
|
||||||
[ -z "$what" ] && what="client"
|
|
||||||
|
|
||||||
if [ -d /homeassistant ]; then
|
|
||||||
echo "cannot install in HA, run \"docker exec -it homeassistant $0 $@\""
|
|
||||||
exit 1
|
|
||||||
fi
|
|
||||||
if [ -d /config ]; then
|
|
||||||
echo "Installing on HA"
|
|
||||||
where="/config/bin"
|
|
||||||
venv="/config/venvs"
|
|
||||||
on_ha=1
|
|
||||||
else
|
|
||||||
if [ ! -d $HOME/.local/bin ] && [ ! -d $HOME/bin ]; then
|
|
||||||
echo "No suitable bin directory found in PATH, please add either $HOME/.local/bin or $HOME/bin to your PATH"
|
|
||||||
exit 1
|
|
||||||
fi
|
|
||||||
for where in $HOME/bin $HOME/.local/bin notset ; do
|
|
||||||
if echo ":$PATH:" | grep -q ":$where:" ; then
|
            break
        fi
    done
    if [ "$where" = "notset" ]; then
        echo "No suitable bin directory found in PATH, please add either $HOME/.local/bin or $HOME/bin to your PATH"
        exit 1
    fi
    venv="$HOME/venvs"
fi

echo "Installing heartbeat $what"

if [ ! -d $venv/hbd ]; then
    python3 -m pip --version > /dev/null 2>&1
    if [ $? -ne 0 ]; then
        # truenas does not have pip installed by default, so we need to fetch get-pip.py and install pip
        echo "pip is not installed, fetching get-pip.py and installing pip"
        arg="--without-pip"
    fi
    mkdir -p $venv
    have_venv=$(python3 -c "import venv" &> /dev/null && echo "Installed" || echo "Not Installed")
    if [ "$have_venv" = "Not Installed" ]; then
        echo "python venv module not found, installing virtualenv"
        python3 -m pip install --user virtualenv
        python3 -m virtualenv $venv/hbd --system-site-packages $arg
    else
        python3 -m venv $venv/hbd --system-site-packages $arg
    fi
    . $venv/hbd/bin/activate
    if [ -n "$arg" ]; then
        curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py && python3 get-pip.py
    fi
    deactivate
fi

. $venv/hbd/bin/activate
python3 -mpip install --upgrade --index-url https://git.wrede.ca/api/packages/andreas/pypi/simple/ --extra-index-url https://pypi.org/simple hbd[$what]

if [ "$what" = "server" ]; then
    rm -f $where/hbd
    ln -sf $(which hbd) $where/hbd
    echo "hbd installed, you can run it with \"$where/hbd\" or \"hbd\" if $where is in your PATH"
else
    rm -f $where/hbc
    ln -sf $(which hbc) $where/hbc
    if [ $on_ha -eq 1 ]; then
        echo "restarting hbc "
        job=$(grep run_hbc configuration.yaml | sed 's/run_hbc://')
        $job
    else
        echo "hbc installed, you can run it with \"$where/hbc\" or \"hbc\" if $where is in your PATH"
    fi
fi
||||||
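The loop that sets `$where` is cut off at the top of this script. Judging only from the error message, it appears to look for `$HOME/.local/bin` or `$HOME/bin` among the `PATH` entries. A rough Python rendering of that check is sketched below; the function name `find_user_bin` and the order in which the two candidates are tried are assumptions, not something the script confirms.

```python
import os


def find_user_bin(path_env, home):
    """Return the first per-user bin directory listed in path_env, or None.

    Hypothetical helper: the candidate list mirrors the two directories
    named in the script's error message.
    """
    candidates = [
        os.path.join(home, ".local", "bin"),
        os.path.join(home, "bin"),
    ]
    # Normalise entries so "/home/u/bin/" and "/home/u/bin" compare equal.
    entries = {os.path.normpath(p) for p in path_env.split(os.pathsep) if p}
    for candidate in candidates:
        if os.path.normpath(candidate) in entries:
            return candidate
    return None
```

A caller would pass `os.environ.get("PATH", "")` and `os.path.expanduser("~")`, and treat a `None` result the way the script treats `where=notset`.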
+1
-2
@@ -68,8 +68,7 @@ async def test_nagios_runner():
     print(f" ✓ Collected {len(data)} data points")

     print(f"\n4. Results:")
-    print(f" Overall Status: {data.get('overall_status')} (code: {data.get('overall_status_code')})")
-    print(f" Plugins Executed: {data.get('plugin_count')}")
+    print(f" Data points collected: {len(data)}")

     # Show individual plugin results
     print(f"\n5. Individual Plugin Results:")
@@ -0,0 +1,324 @@
import time as time_mod
from unittest.mock import AsyncMock, MagicMock, patch
from urllib.parse import urlparse, parse_qs

import pytest

from hbd.server import oauth
from hbd.server import users as users_mod
from hbd.server.users import User


CFG_OFF = {}
CFG_ON = {
    "oauth": {
        "gitea": {
            "url": "https://git.example.com",
            "client_id": "cid",
            "client_secret": "csec",
        }
    }
}
CFG_PARTIAL = {"oauth": {"gitea": {"url": "https://git.example.com"}}}


@pytest.fixture(autouse=True)
def clear_oauth_states():
    oauth._states.clear()
    yield
    oauth._states.clear()


@pytest.fixture(autouse=True)
def reset_users_dict():
    original = dict(users_mod.users)
    yield
    users_mod.users = original


def test_is_enabled_when_all_keys_present():
    assert oauth.is_enabled(CFG_ON) is True


def test_is_enabled_false_when_no_oauth_key():
    assert oauth.is_enabled(CFG_OFF) is False


def test_is_enabled_false_when_partial_config():
    assert oauth.is_enabled(CFG_PARTIAL) is False


def test_make_state_returns_unique_tokens():
    s1 = oauth.make_state()
    s2 = oauth.make_state()
    assert s1 != s2
    assert len(s1) == 64  # 32 bytes hex


def test_validate_state_valid():
    state = oauth.make_state()
    assert oauth.validate_state(state) is True


def test_validate_state_consumed_on_use():
    state = oauth.make_state()
    oauth.validate_state(state)
    assert oauth.validate_state(state) is False  # replay rejected


def test_validate_state_unknown():
    assert oauth.validate_state("notastate") is False


def test_validate_state_expired(monkeypatch):
    state = oauth.make_state()
    # Wind expiry into the past
    monkeypatch.setitem(oauth._states, state, time_mod.time() - 1000)
    assert oauth.validate_state(state) is False


def _reset_users(entries=None):
    users_mod.users = entries or {}


def test_provision_oauth_user_new():
    _reset_users()
    user = users_mod.provision_oauth_user("gituser", "Git User", "https://example.com/avatar.png")
    assert user.username == "gituser"
    assert user.full_name == "Git User"
    assert user.avatar == "https://example.com/avatar.png"
    assert user.admin is False
    assert user.password_hash == ""
    assert "gituser" in users_mod.users


def test_provision_oauth_user_no_password_login():
    _reset_users()
    user = users_mod.provision_oauth_user("gituser", "Git User", "")
    assert user.check_password("anything") is False


def test_provision_oauth_user_existing_updates_profile():
    existing = User(
        username="alice",
        full_name="Old Name",
        avatar="old.png",
        password_hash="pbkdf2:sha256:1:salt:abc",
        admin=True,
        notification_channels=["chan1"],
    )
    _reset_users({"alice": existing})
    user = users_mod.provision_oauth_user("alice", "New Name", "new.png")
    assert user.full_name == "New Name"
    assert user.avatar == "new.png"
    # Preserved
    assert user.admin is True
    assert user.password_hash == "pbkdf2:sha256:1:salt:abc"
    assert user.notification_channels == ["chan1"]


def test_provision_oauth_user_does_not_overwrite_with_empty():
    existing = User(username="bob", full_name="Bob", avatar="bob.png")
    _reset_users({"bob": existing})
    user = users_mod.provision_oauth_user("bob", "", "")
    assert user.full_name == "Bob"
    assert user.avatar == "bob.png"


def test_provision_oauth_user_survives_config_reload():
    _reset_users()
    users_mod.provision_oauth_user("oauthonly", "OAuth Only", "https://example.com/a.png")
    assert "oauthonly" in users_mod.users
    # Reload with empty config; OAuth user should survive
    users_mod.load_users({})
    assert "oauthonly" in users_mod.users


def test_authorization_url_shape():
    state = "teststate"
    redirect_uri = "https://hbd.example.com/login/oauth/gitea/callback"
    url = oauth.authorization_url(CFG_ON, state, redirect_uri)
    parsed = urlparse(url)
    qs = parse_qs(parsed.query)
    assert parsed.scheme == "https"
    assert parsed.netloc == "git.example.com"
    assert parsed.path == "/login/oauth/authorize"
    assert qs["client_id"] == ["cid"]
    assert qs["state"] == ["teststate"]
    assert qs["redirect_uri"] == [redirect_uri]
    assert qs["scope"] == ["user:email"]
    assert qs["response_type"] == ["code"]


@pytest.mark.asyncio
async def test_exchange_code_returns_token():
    redirect_uri = "https://hbd.example.com/login/oauth/gitea/callback"
    mock_response = AsyncMock()
    mock_response.status = 200
    mock_response.json = AsyncMock(return_value={"access_token": "tok123"})

    mock_session = MagicMock()
    mock_session.post = MagicMock(return_value=AsyncMock(
        __aenter__=AsyncMock(return_value=mock_response),
        __aexit__=AsyncMock(return_value=False),
    ))

    with patch("hbd.server.oauth.aiohttp.ClientSession", return_value=AsyncMock(
        __aenter__=AsyncMock(return_value=mock_session),
        __aexit__=AsyncMock(return_value=False),
    )):
        token = await oauth.exchange_code(CFG_ON, "mycode", redirect_uri)
    assert token == "tok123"


@pytest.mark.asyncio
async def test_exchange_code_raises_on_error_status():
    redirect_uri = "https://hbd.example.com/login/oauth/gitea/callback"
    mock_response = AsyncMock()
    mock_response.status = 401
    mock_response.text = AsyncMock(return_value="unauthorized")

    mock_session = MagicMock()
    mock_session.post = MagicMock(return_value=AsyncMock(
        __aenter__=AsyncMock(return_value=mock_response),
        __aexit__=AsyncMock(return_value=False),
    ))

    with patch("hbd.server.oauth.aiohttp.ClientSession", return_value=AsyncMock(
        __aenter__=AsyncMock(return_value=mock_session),
        __aexit__=AsyncMock(return_value=False),
    )):
        with pytest.raises(oauth.OAuthError):
            await oauth.exchange_code(CFG_ON, "badcode", redirect_uri)


@pytest.mark.asyncio
async def test_fetch_user_returns_profile():
    mock_response = AsyncMock()
    mock_response.status = 200
    mock_response.json = AsyncMock(return_value={
        "login": "alice",
        "full_name": "Alice Smith",
        "avatar_url": "https://git.example.com/avatars/alice.png",
    })

    mock_session = MagicMock()
    mock_session.get = MagicMock(return_value=AsyncMock(
        __aenter__=AsyncMock(return_value=mock_response),
        __aexit__=AsyncMock(return_value=False),
    ))

    with patch("hbd.server.oauth.aiohttp.ClientSession", return_value=AsyncMock(
        __aenter__=AsyncMock(return_value=mock_session),
        __aexit__=AsyncMock(return_value=False),
    )):
        profile = await oauth.fetch_user(CFG_ON, "tok123")
    assert profile == {
        "login": "alice",
        "full_name": "Alice Smith",
        "avatar_url": "https://git.example.com/avatars/alice.png",
    }


@pytest.mark.asyncio
async def test_exchange_code_raises_when_no_access_token():
    redirect_uri = "https://hbd.example.com/login/oauth/gitea/callback"
    mock_response = AsyncMock()
    mock_response.status = 200
    mock_response.json = AsyncMock(return_value={"error": "bad_request"})

    mock_session = MagicMock()
    mock_session.post = MagicMock(return_value=AsyncMock(
        __aenter__=AsyncMock(return_value=mock_response),
        __aexit__=AsyncMock(return_value=False),
    ))

    with patch("hbd.server.oauth.aiohttp.ClientSession", return_value=AsyncMock(
        __aenter__=AsyncMock(return_value=mock_session),
        __aexit__=AsyncMock(return_value=False),
    )):
        with pytest.raises(oauth.OAuthError):
            await oauth.exchange_code(CFG_ON, "mycode", redirect_uri)


@pytest.mark.asyncio
async def test_fetch_user_raises_on_error_status():
    mock_response = AsyncMock()
    mock_response.status = 401
    mock_response.text = AsyncMock(return_value="unauthorized")

    mock_session = MagicMock()
    mock_session.get = MagicMock(return_value=AsyncMock(
        __aenter__=AsyncMock(return_value=mock_response),
        __aexit__=AsyncMock(return_value=False),
    ))

    with patch("hbd.server.oauth.aiohttp.ClientSession", return_value=AsyncMock(
        __aenter__=AsyncMock(return_value=mock_session),
        __aexit__=AsyncMock(return_value=False),
    )):
        with pytest.raises(oauth.OAuthError):
            await oauth.fetch_user(CFG_ON, "tok123")


# ---------------------------------------------------------------------------
# Integration-style tests: callback logic chain
# ---------------------------------------------------------------------------


@pytest.mark.asyncio
async def test_callback_invalid_state_rejects():
    """Verify validate_state returns False for unknown state tokens."""
    fake_state = "this-is-not-a-real-state"
    assert oauth.validate_state(fake_state) is False


@pytest.mark.asyncio
async def test_full_oauth_flow_chain():
    """Integration-style test: state → exchange → fetch → provision chain."""
    redirect_uri = "https://hbd.example.com/login/oauth/gitea/callback"

    # Step 1: create a state token
    state = oauth.make_state()
    assert oauth.validate_state(state) is True  # consumed; replay would return False

    # Step 2: exchange code → token (mocked)
    mock_token_response = AsyncMock()
    mock_token_response.status = 200
    mock_token_response.json = AsyncMock(return_value={"access_token": "flow_token"})

    mock_user_response = AsyncMock()
    mock_user_response.status = 200
    mock_user_response.json = AsyncMock(return_value={
        "login": "flowuser",
        "full_name": "Flow User",
        "avatar_url": "https://git.example.com/avatars/flow.png",
    })

    mock_session = MagicMock()
    mock_session.post = MagicMock(return_value=AsyncMock(
        __aenter__=AsyncMock(return_value=mock_token_response),
        __aexit__=AsyncMock(return_value=False),
    ))
    mock_session.get = MagicMock(return_value=AsyncMock(
        __aenter__=AsyncMock(return_value=mock_user_response),
        __aexit__=AsyncMock(return_value=False),
    ))

    with patch("hbd.server.oauth.aiohttp.ClientSession", return_value=AsyncMock(
        __aenter__=AsyncMock(return_value=mock_session),
        __aexit__=AsyncMock(return_value=False),
    )):
        token = await oauth.exchange_code(CFG_ON, "authcode", redirect_uri)
        profile = await oauth.fetch_user(CFG_ON, token)

    assert token == "flow_token"
    assert profile["login"] == "flowuser"

    # Step 3: provision user
    _reset_users()
    user = users_mod.provision_oauth_user(
        profile["login"], profile["full_name"], profile["avatar_url"]
    )
    assert user.username == "flowuser"
    assert user.check_password("anything") is False
|
||||||
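Every async test above rebuilds the same nested mock: a session context manager whose `post` or `get` returns a response context manager. One way to factor that out is sketched here. The helper names `make_async_cm` and `make_mock_session` are hypothetical and not part of hbd, and the demo drives the mocks directly rather than going through `aiohttp`.

```python
import asyncio
from unittest.mock import AsyncMock, MagicMock


def make_async_cm(value):
    """Return a mock usable as `async with cm as x`, where x is `value`."""
    cm = MagicMock()
    cm.__aenter__ = AsyncMock(return_value=value)
    cm.__aexit__ = AsyncMock(return_value=False)
    return cm


def make_mock_session(response, method="post"):
    """Return a stand-in for a client session context manager.

    Entering it yields a session whose `method` call ("post" or "get")
    returns an async context manager yielding `response`.
    """
    session = MagicMock()
    getattr(session, method).return_value = make_async_cm(response)
    return make_async_cm(session)


async def demo():
    # Build a fake 200 response carrying a token, as in the tests above.
    response = MagicMock(status=200)
    response.json = AsyncMock(return_value={"access_token": "tok123"})
    async with make_mock_session(response) as session:
        async with session.post("https://git.example.com/token") as resp:
            return resp.status, await resp.json()


result = asyncio.run(demo())
```

In a test, the return value of `make_mock_session(...)` would be passed as `return_value` to `patch("hbd.server.oauth.aiohttp.ClientSession", ...)`, replacing the inline `AsyncMock(__aenter__=..., __aexit__=...)` construction.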