Compare commits
21 Commits
| Author | SHA1 | Date | |
|---|---|---|---|
| 94cbb31c48 | |||
| ae60844a8a | |||
| 49fa310361 | |||
| 28e2180f7b | |||
| ce0590f015 | |||
| f50acca509 | |||
| 72fc82b91f | |||
| 46f8c32c0b | |||
| 691f62aa69 | |||
| cffc9805f9 | |||
| 917d6a401b | |||
| 2bd3a9beb6 | |||
| 5523c60866 | |||
| ab37ac7194 | |||
| f811a19d80 | |||
| 6239825f43 | |||
| b56245bb23 | |||
| 331c4e804d | |||
| 9fd945a481 | |||
| 26df08eeff | |||
| 5819dd6b25 |
@@ -267,6 +267,41 @@ All plugin metrics can be thresholded:
|
|||||||
- **Network**: errors_total, dropped packets, connection counts
|
- **Network**: errors_total, dropped packets, connection counts
|
||||||
- **Nagios**: exit_code mapping (0=OK, 1=WARNING, 2=CRITICAL)
|
- **Nagios**: exit_code mapping (0=OK, 1=WARNING, 2=CRITICAL)
|
||||||
|
|
||||||
|
### Per-Host Threshold Profiles
|
||||||
|
|
||||||
|
Named threshold configurations let different hosts use different limits. A host's `threshold_config` can be a single name or a **list** — lists are applied left-to-right so profiles compose without duplication:
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
threshold_configs:
|
||||||
|
default:
|
||||||
|
thresholds:
|
||||||
|
cpu_monitor:
|
||||||
|
cpu_percent: {warning: 80, critical: 90}
|
||||||
|
memory_monitor:
|
||||||
|
memory_percent: {warning: 85, critical: 95}
|
||||||
|
|
||||||
|
tight_cpu: # override CPU limits only
|
||||||
|
thresholds:
|
||||||
|
cpu_monitor:
|
||||||
|
cpu_percent: {warning: 60, critical: 75}
|
||||||
|
|
||||||
|
db_disk: # add a database partition check
|
||||||
|
thresholds:
|
||||||
|
disk_monitor:
|
||||||
|
partitions:
|
||||||
|
/var/lib/postgresql:
|
||||||
|
percent: {warning: 75, critical: 88}
|
||||||
|
|
||||||
|
hosts:
|
||||||
|
web-01:
|
||||||
|
threshold_config: default # single profile
|
||||||
|
|
||||||
|
db-01:
|
||||||
|
threshold_config: [tight_cpu, db_disk] # layered: CPU override + extra disk check
|
||||||
|
```
|
||||||
|
|
||||||
|
Each named config's overrides are applied in order on top of the defaults. Metrics not mentioned in a profile are inherited unchanged.
|
||||||
|
|
||||||
See [docs/THRESHOLD_ALERTING.md](docs/THRESHOLD_ALERTING.md) for comprehensive documentation including best practices, troubleshooting, and advanced configuration.
|
See [docs/THRESHOLD_ALERTING.md](docs/THRESHOLD_ALERTING.md) for comprehensive documentation including best practices, troubleshooting, and advanced configuration.
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|||||||
+183
-47
@@ -814,34 +814,32 @@ Planned features:
|
|||||||
|
|
||||||
## Multi-Threshold Configuration
|
## Multi-Threshold Configuration
|
||||||
|
|
||||||
**New in version 2.0**: Support for multiple named threshold configurations with per-host mapping.
|
Support for multiple named threshold configurations with per-host mapping and composable layering.
|
||||||
|
|
||||||
### Overview
|
### Overview
|
||||||
|
|
||||||
The multi-threshold feature allows you to:
|
The multi-threshold feature allows you to:
|
||||||
- Define multiple sets of threshold configurations
|
- Define multiple named threshold configurations
|
||||||
- Map different hosts to different threshold sets
|
- Assign one or more configurations to each host
|
||||||
|
- Compose configurations by layering — each named config's overrides are applied in order on top of the defaults
|
||||||
- Use different sensitivity levels for different environments
|
- Use different sensitivity levels for different environments
|
||||||
- Maintain a default configuration for unmapped hosts
|
|
||||||
|
|
||||||
### Configuration Structure
|
### Configuration Structure
|
||||||
|
|
||||||
|
Named configurations are defined under `threshold_configs`. Each host selects which ones to use via `threshold_config` in the `hosts` section (a string for a single config, or a list to layer multiple):
|
||||||
|
|
||||||
```yaml
|
```yaml
|
||||||
# Optional: Set the default configuration name (defaults to "default")
|
# Optional: set the default configuration name (defaults to "default")
|
||||||
default_threshold_config: "default"
|
default_threshold_config: "default"
|
||||||
|
|
||||||
# Define multiple named threshold configurations
|
|
||||||
threshold_configs:
|
threshold_configs:
|
||||||
# Configuration name 1
|
|
||||||
default:
|
default:
|
||||||
thresholds:
|
thresholds:
|
||||||
# Standard threshold definitions
|
|
||||||
cpu_monitor:
|
cpu_monitor:
|
||||||
cpu_percent:
|
cpu_percent:
|
||||||
warning: 80.0
|
warning: 80.0
|
||||||
critical: 90.0
|
critical: 90.0
|
||||||
|
|
||||||
# Configuration name 2
|
|
||||||
high_sensitivity:
|
high_sensitivity:
|
||||||
thresholds:
|
thresholds:
|
||||||
cpu_monitor:
|
cpu_monitor:
|
||||||
@@ -849,7 +847,6 @@ threshold_configs:
|
|||||||
warning: 60.0
|
warning: 60.0
|
||||||
critical: 75.0
|
critical: 75.0
|
||||||
|
|
||||||
# Configuration name 3
|
|
||||||
low_sensitivity:
|
low_sensitivity:
|
||||||
thresholds:
|
thresholds:
|
||||||
cpu_monitor:
|
cpu_monitor:
|
||||||
@@ -857,14 +854,77 @@ threshold_configs:
|
|||||||
warning: 90.0
|
warning: 90.0
|
||||||
critical: 95.0
|
critical: 95.0
|
||||||
|
|
||||||
# Map specific hosts to specific configurations
|
hosts:
|
||||||
host_threshold_mapping:
|
prod-web-01:
|
||||||
prod-web-01: high_sensitivity
|
threshold_config: high_sensitivity # single config
|
||||||
prod-web-02: high_sensitivity
|
|
||||||
dev-server-01: low_sensitivity
|
dev-server-01:
|
||||||
# Unmapped hosts use default_threshold_config
|
threshold_config: low_sensitivity
|
||||||
|
|
||||||
|
# Hosts with no threshold_config use default_threshold_config
|
||||||
```
|
```
|
||||||
|
|
||||||
|
### Composable Configurations (list form)
|
||||||
|
|
||||||
|
`threshold_config` can be a list. Configs are applied **left to right**: the defaults are the base, then each named config's overrides are layered on top. Later entries in the list win on any metric they define.
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
threshold_configs:
|
||||||
|
default:
|
||||||
|
thresholds:
|
||||||
|
cpu_monitor:
|
||||||
|
cpu_percent: {warning: 80, critical: 90}
|
||||||
|
memory_monitor:
|
||||||
|
memory_percent: {warning: 85, critical: 95}
|
||||||
|
disk_monitor:
|
||||||
|
partitions:
|
||||||
|
/:
|
||||||
|
percent: {warning: 80, critical: 90}
|
||||||
|
|
||||||
|
# Tighter CPU limits for busy servers
|
||||||
|
high_cpu_load:
|
||||||
|
thresholds:
|
||||||
|
cpu_monitor:
|
||||||
|
cpu_percent: {warning: 60, critical: 75}
|
||||||
|
|
||||||
|
# Tighter disk limits for data-heavy servers
|
||||||
|
busy_disk:
|
||||||
|
thresholds:
|
||||||
|
disk_monitor:
|
||||||
|
partitions:
|
||||||
|
/:
|
||||||
|
percent: {warning: 70, critical: 85}
|
||||||
|
|
||||||
|
hosts:
|
||||||
|
# Gets default thresholds only
|
||||||
|
web-01:
|
||||||
|
threshold_config: default
|
||||||
|
|
||||||
|
# Gets tighter CPU limits, default memory and disk
|
||||||
|
build-server:
|
||||||
|
threshold_config: high_cpu_load
|
||||||
|
|
||||||
|
# Layers both: tighter CPU AND tighter disk, default memory
|
||||||
|
db-01:
|
||||||
|
threshold_config: [high_cpu_load, busy_disk]
|
||||||
|
|
||||||
|
# Three layers: busy_disk overrides high_cpu_load if they conflict
|
||||||
|
storage-01:
|
||||||
|
threshold_config: [default, high_cpu_load, busy_disk]
|
||||||
|
```
|
||||||
|
|
||||||
|
**How layering works:**
|
||||||
|
|
||||||
|
Starting from the `default` thresholds:
|
||||||
|
|
||||||
|
| Layer | Applied config | Effect |
|
||||||
|
|-------|---------------|--------|
|
||||||
|
| Base | `default` | all default thresholds |
|
||||||
|
| +1 | `high_cpu_load` | cpu_percent overridden to 60/75 |
|
||||||
|
| +2 | `busy_disk` | disk percent overridden to 70/85; cpu_percent stays at 60/75 |
|
||||||
|
|
||||||
|
Each named config only overrides the metrics it explicitly defines. Metrics not mentioned in a config inherit from the layers beneath.
|
||||||
|
|
||||||
### Use Cases
|
### Use Cases
|
||||||
|
|
||||||
#### 1. Environment-Based Thresholds
|
#### 1. Environment-Based Thresholds
|
||||||
@@ -887,11 +947,15 @@ threshold_configs:
|
|||||||
warning: 90.0 # More relaxed for dev
|
warning: 90.0 # More relaxed for dev
|
||||||
critical: 98.0
|
critical: 98.0
|
||||||
|
|
||||||
host_threshold_mapping:
|
hosts:
|
||||||
prod-web-01: production
|
prod-web-01:
|
||||||
prod-web-02: production
|
threshold_config: production
|
||||||
dev-web-01: development
|
prod-web-02:
|
||||||
dev-web-02: development
|
threshold_config: production
|
||||||
|
dev-web-01:
|
||||||
|
threshold_config: development
|
||||||
|
dev-web-02:
|
||||||
|
threshold_config: development
|
||||||
```
|
```
|
||||||
|
|
||||||
#### 2. Server Role-Based Thresholds
|
#### 2. Server Role-Based Thresholds
|
||||||
@@ -914,7 +978,7 @@ threshold_configs:
|
|||||||
warning: 70.0
|
warning: 70.0
|
||||||
critical: 85.0
|
critical: 85.0
|
||||||
memory_monitor:
|
memory_monitor:
|
||||||
percent:
|
memory_percent:
|
||||||
warning: 90.0 # Databases can use high memory
|
warning: 90.0 # Databases can use high memory
|
||||||
critical: 97.0
|
critical: 97.0
|
||||||
disk_monitor:
|
disk_monitor:
|
||||||
@@ -927,17 +991,23 @@ threshold_configs:
|
|||||||
cache:
|
cache:
|
||||||
thresholds:
|
thresholds:
|
||||||
memory_monitor:
|
memory_monitor:
|
||||||
percent:
|
memory_percent:
|
||||||
warning: 95.0 # Redis/Memcached can use very high memory
|
warning: 95.0 # Redis/Memcached can use very high memory
|
||||||
critical: 99.0
|
critical: 99.0
|
||||||
|
|
||||||
host_threshold_mapping:
|
hosts:
|
||||||
web-01: webserver
|
web-01:
|
||||||
web-02: webserver
|
threshold_config: webserver
|
||||||
db-01: database
|
web-02:
|
||||||
db-02: database
|
threshold_config: webserver
|
||||||
redis-01: cache
|
db-01:
|
||||||
memcached-01: cache
|
threshold_config: database
|
||||||
|
db-02:
|
||||||
|
threshold_config: database
|
||||||
|
redis-01:
|
||||||
|
threshold_config: cache
|
||||||
|
memcached-01:
|
||||||
|
threshold_config: cache
|
||||||
```
|
```
|
||||||
|
|
||||||
#### 3. Sensitivity Levels
|
#### 3. Sensitivity Levels
|
||||||
@@ -952,7 +1022,7 @@ threshold_configs:
|
|||||||
partitions:
|
partitions:
|
||||||
/:
|
/:
|
||||||
percent:
|
percent:
|
||||||
warning: 70.0 # Very sensitive
|
warning: 70.0
|
||||||
critical: 80.0
|
critical: 80.0
|
||||||
hysteresis: 0.15
|
hysteresis: 0.15
|
||||||
|
|
||||||
@@ -976,12 +1046,69 @@ threshold_configs:
|
|||||||
critical: 98.0
|
critical: 98.0
|
||||||
hysteresis: 0.05
|
hysteresis: 0.05
|
||||||
|
|
||||||
host_threshold_mapping:
|
hosts:
|
||||||
payment-gateway: critical
|
payment-gateway:
|
||||||
auth-server: critical
|
threshold_config: critical
|
||||||
web-01: standard
|
auth-server:
|
||||||
web-02: standard
|
threshold_config: critical
|
||||||
test-server: relaxed
|
web-01:
|
||||||
|
threshold_config: standard
|
||||||
|
web-02:
|
||||||
|
threshold_config: standard
|
||||||
|
test-server:
|
||||||
|
threshold_config: relaxed
|
||||||
|
```
|
||||||
|
|
||||||
|
#### 4. Composable Profiles
|
||||||
|
|
||||||
|
Build host-specific thresholds by combining small, focused configs:
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
threshold_configs:
|
||||||
|
# Baseline — everything at default levels
|
||||||
|
default:
|
||||||
|
thresholds:
|
||||||
|
cpu_monitor:
|
||||||
|
cpu_percent: {warning: 80, critical: 90}
|
||||||
|
memory_monitor:
|
||||||
|
memory_percent: {warning: 85, critical: 95}
|
||||||
|
|
||||||
|
# Overlay: tighter CPU only
|
||||||
|
tight_cpu:
|
||||||
|
thresholds:
|
||||||
|
cpu_monitor:
|
||||||
|
cpu_percent: {warning: 60, critical: 75}
|
||||||
|
|
||||||
|
# Overlay: tighter memory only
|
||||||
|
tight_memory:
|
||||||
|
thresholds:
|
||||||
|
memory_monitor:
|
||||||
|
memory_percent: {warning: 70, critical: 85}
|
||||||
|
|
||||||
|
# Overlay: extra disk partition for database servers
|
||||||
|
db_disk:
|
||||||
|
thresholds:
|
||||||
|
disk_monitor:
|
||||||
|
partitions:
|
||||||
|
/var/lib/postgresql:
|
||||||
|
percent: {warning: 75, critical: 88}
|
||||||
|
|
||||||
|
hosts:
|
||||||
|
# Plain web server
|
||||||
|
web-01:
|
||||||
|
threshold_config: default
|
||||||
|
|
||||||
|
# Build server: tight CPU, default memory and disk
|
||||||
|
build-01:
|
||||||
|
threshold_config: tight_cpu
|
||||||
|
|
||||||
|
# Database: tight CPU + tight memory + extra disk partition
|
||||||
|
db-01:
|
||||||
|
threshold_config: [tight_cpu, tight_memory, db_disk]
|
||||||
|
|
||||||
|
# Replica database: tight memory + extra disk, normal CPU
|
||||||
|
db-02:
|
||||||
|
threshold_config: [tight_memory, db_disk]
|
||||||
```
|
```
|
||||||
|
|
||||||
### Backward Compatibility
|
### Backward Compatibility
|
||||||
@@ -1012,16 +1139,25 @@ threshold_configs:
|
|||||||
|
|
||||||
### Configuration Priority
|
### Configuration Priority
|
||||||
|
|
||||||
1. **Host-specific mapping**: If host is in `host_threshold_mapping`, use that config
|
1. **Host `threshold_config` (list)**: Layer each named config's overrides left-to-right on top of the defaults
|
||||||
2. **Default config**: Use `default_threshold_config`
|
2. **Host `threshold_config` (string)**: Use that single named config directly
|
||||||
3. **First alphabetically**: If default not found, use first config alphabetically
|
3. **`host_threshold_mapping`** (legacy): Same as above, string only
|
||||||
4. **Legacy fallback**: If `threshold_configs` not present, use `thresholds`
|
4. **`default_threshold_config`**: Used for hosts with no mapping
|
||||||
|
5. **First alphabetically**: If the default config is not found, use the first config alphabetically
|
||||||
|
6. **Legacy `thresholds` section**: Used when `threshold_configs` is absent entirely
|
||||||
|
|
||||||
### Example: Complete Multi-Threshold Setup
|
### Backward Compatibility
|
||||||
|
|
||||||
See `hbd/config_multi_threshold_example.yaml` for a complete example with:
|
The legacy `host_threshold_mapping` top-level key and the flat `thresholds` section are still fully supported:
|
||||||
- 4 named configurations (default, high_sensitivity, low_sensitivity, database)
|
|
||||||
- Host-to-config mappings for production, development, and test systems
|
```yaml
|
||||||
- Specialized database server thresholds
|
# Still works — equivalent to hosts: {prod-web-01: {threshold_config: high_sensitivity}}
|
||||||
- Custom display messages with plugin data
|
host_threshold_mapping:
|
||||||
|
prod-web-01: high_sensitivity
|
||||||
|
|
||||||
|
# Still works — equivalent to threshold_configs: {default: {thresholds: ...}}
|
||||||
|
thresholds:
|
||||||
|
cpu_monitor:
|
||||||
|
cpu_percent: {warning: 80, critical: 90}
|
||||||
|
```
|
||||||
|
|
||||||
|
|||||||
+1
-1
@@ -14,4 +14,4 @@ Install options:
|
|||||||
"""
|
"""
|
||||||
|
|
||||||
__all__ = ["__version__"]
|
__all__ = ["__version__"]
|
||||||
__version__ = "5.1.10"
|
__version__ = "5.1.15"
|
||||||
|
|||||||
@@ -525,6 +525,13 @@ async def async_main(args, config):
|
|||||||
for sig in (signal.SIGTERM, signal.SIGINT):
|
for sig in (signal.SIGTERM, signal.SIGINT):
|
||||||
loop.add_signal_handler(sig, stop)
|
loop.add_signal_handler(sig, stop)
|
||||||
|
|
||||||
|
def _sighup():
|
||||||
|
global dorestart
|
||||||
|
dorestart = True
|
||||||
|
stop()
|
||||||
|
|
||||||
|
loop.add_signal_handler(signal.SIGHUP, _sighup)
|
||||||
|
|
||||||
# Start async tasks
|
# Start async tasks
|
||||||
# Heartbeat senders (one per connection)
|
# Heartbeat senders (one per connection)
|
||||||
for conn in connections:
|
for conn in connections:
|
||||||
|
|||||||
@@ -0,0 +1,130 @@
|
|||||||
|
"""
|
||||||
|
ZFS pool monitoring plugin for Heartbeat.
|
||||||
|
|
||||||
|
Collects per-pool health, capacity, and cumulative I/O statistics via zpool(8).
|
||||||
|
"""
|
||||||
|
|
||||||
|
import asyncio
|
||||||
|
import logging
|
||||||
|
import shutil
|
||||||
|
from typing import Any, Dict, List, Optional
|
||||||
|
|
||||||
|
from hbd.client.plugin import MonitorPlugin
|
||||||
|
|
||||||
|
logger = logging.getLogger(__name__)
|
||||||
|
|
||||||
|
|
||||||
|
def _int(s: str) -> Optional[int]:
|
||||||
|
try:
|
||||||
|
return int(s.strip().rstrip("KMGTkBkmgt%x"))
|
||||||
|
except (ValueError, AttributeError):
|
||||||
|
return None
|
||||||
|
|
||||||
|
|
||||||
|
def _float(s: str) -> Optional[float]:
|
||||||
|
try:
|
||||||
|
return float(s.strip().rstrip("%x"))
|
||||||
|
except (ValueError, AttributeError):
|
||||||
|
return None
|
||||||
|
|
||||||
|
|
||||||
|
class ZFSMonitorPlugin(MonitorPlugin):
|
||||||
|
"""Monitor ZFS pool health, capacity, and I/O statistics.
|
||||||
|
|
||||||
|
Collects per pool:
|
||||||
|
- health: ONLINE, DEGRADED, FAULTED, etc.
|
||||||
|
- size / alloc / free: total, allocated and free bytes
|
||||||
|
- capacity: percentage used (0-100)
|
||||||
|
- frag: fragmentation percentage
|
||||||
|
- dedup: deduplication ratio
|
||||||
|
- read_ops / write_ops: cumulative I/O operations since last boot/clear
|
||||||
|
- read_bw / write_bw: cumulative bytes transferred since last boot/clear
|
||||||
|
|
||||||
|
Configuration:
|
||||||
|
interval: collection interval in seconds (default: 300)
|
||||||
|
pools: list of pool names to monitor (default: all)
|
||||||
|
"""
|
||||||
|
|
||||||
|
name = "zfs_monitor"
|
||||||
|
description = "ZFS pool health, capacity, and I/O statistics"
|
||||||
|
interval = 300
|
||||||
|
|
||||||
|
def __init__(self, config: Optional[Dict[str, Any]] = None):
|
||||||
|
super().__init__(config)
|
||||||
|
self.interval = self.config.get("interval", 300)
|
||||||
|
self._pools_filter: Optional[List[str]] = self.config.get("pools", None)
|
||||||
|
|
||||||
|
async def initialize(self) -> bool:
|
||||||
|
if not shutil.which("zpool"):
|
||||||
|
self.skip_reason = "zpool not found"
|
||||||
|
return False
|
||||||
|
logger.info("ZFS monitor initialized (interval: %ds)", self.interval)
|
||||||
|
return True
|
||||||
|
|
||||||
|
async def _run(self, *args: str) -> List[str]:
|
||||||
|
"""Run a command and return its stdout lines, or [] on error."""
|
||||||
|
try:
|
||||||
|
proc = await asyncio.create_subprocess_exec(
|
||||||
|
*args,
|
||||||
|
stdout=asyncio.subprocess.PIPE,
|
||||||
|
stderr=asyncio.subprocess.DEVNULL,
|
||||||
|
)
|
||||||
|
stdout, _ = await asyncio.wait_for(proc.communicate(), timeout=15)
|
||||||
|
return stdout.decode(errors="replace").splitlines()
|
||||||
|
except (FileNotFoundError, asyncio.TimeoutError) as exc:
|
||||||
|
logger.warning("zfs_monitor: %s: %s", args[0], exc)
|
||||||
|
return []
|
||||||
|
|
||||||
|
async def _zpool_list(self) -> Dict[str, Dict]:
|
||||||
|
"""Return per-pool health and capacity from `zpool list`."""
|
||||||
|
lines = await self._run(
|
||||||
|
"zpool", "list", "-H", "-p",
|
||||||
|
"-o", "name,health,size,alloc,free,cap,frag,dedup",
|
||||||
|
)
|
||||||
|
pools: Dict[str, Dict] = {}
|
||||||
|
for line in lines:
|
||||||
|
parts = line.split("\t")
|
||||||
|
if len(parts) < 8:
|
||||||
|
continue
|
||||||
|
name = parts[0].strip()
|
||||||
|
if self._pools_filter and name not in self._pools_filter:
|
||||||
|
continue
|
||||||
|
pools[name] = {
|
||||||
|
"health": parts[1].strip(),
|
||||||
|
"size": _int(parts[2]),
|
||||||
|
"alloc": _int(parts[3]),
|
||||||
|
"free": _int(parts[4]),
|
||||||
|
"capacity": _float(parts[5]),
|
||||||
|
"frag": _float(parts[6]),
|
||||||
|
"dedup": _float(parts[7]),
|
||||||
|
}
|
||||||
|
return pools
|
||||||
|
|
||||||
|
async def _zpool_iostat(self) -> Dict[str, Dict]:
|
||||||
|
"""Return per-pool cumulative I/O counters from `zpool iostat`."""
|
||||||
|
lines = await self._run("zpool", "iostat", "-H", "-p")
|
||||||
|
io: Dict[str, Dict] = {}
|
||||||
|
for line in lines:
|
||||||
|
parts = line.split("\t")
|
||||||
|
if len(parts) < 7:
|
||||||
|
continue
|
||||||
|
name = parts[0].strip()
|
||||||
|
if not name or name.startswith(" "):
|
||||||
|
continue
|
||||||
|
io[name] = {
|
||||||
|
"read_ops": _int(parts[3]),
|
||||||
|
"write_ops": _int(parts[4]),
|
||||||
|
"read_bw": _int(parts[5]),
|
||||||
|
"write_bw": _int(parts[6]),
|
||||||
|
}
|
||||||
|
return io
|
||||||
|
|
||||||
|
async def _collect_metrics(self) -> Dict[str, Any]:
|
||||||
|
pools, io = await asyncio.gather(self._zpool_list(), self._zpool_iostat())
|
||||||
|
for name, stats in io.items():
|
||||||
|
if name in pools:
|
||||||
|
pools[name].update(stats)
|
||||||
|
return {"pools": pools}
|
||||||
|
|
||||||
|
|
||||||
|
plugin = ZFSMonitorPlugin
|
||||||
@@ -225,7 +225,7 @@ def get_watchhosts(config):
|
|||||||
hosts_config = config.get("hosts", {})
|
hosts_config = config.get("hosts", {})
|
||||||
if isinstance(hosts_config, dict):
|
if isinstance(hosts_config, dict):
|
||||||
for host_name, host_attrs in hosts_config.items():
|
for host_name, host_attrs in hosts_config.items():
|
||||||
if isinstance(host_attrs, dict) and host_attrs.get("watch", False):
|
if isinstance(host_attrs, dict) and host_attrs.get("watch", True):
|
||||||
watchhosts.append(host_name)
|
watchhosts.append(host_name)
|
||||||
return watchhosts
|
return watchhosts
|
||||||
|
|
||||||
|
|||||||
@@ -286,7 +286,7 @@ class Host:
|
|||||||
Host.hosts[name] = self
|
Host.hosts[name] = self
|
||||||
self.num = num
|
self.num = num
|
||||||
self.dyn = False
|
self.dyn = False
|
||||||
self.watched = False
|
self.watched = True
|
||||||
self.upcount = 0
|
self.upcount = 0
|
||||||
self.interval = 0
|
self.interval = 0
|
||||||
self.doesack = -1
|
self.doesack = -1
|
||||||
@@ -304,6 +304,7 @@ class Host:
|
|||||||
|
|
||||||
def statedict(self):
|
def statedict(self):
|
||||||
d = {}
|
d = {}
|
||||||
|
d["raw_name"] = self.name
|
||||||
d["name"] = self.name
|
d["name"] = self.name
|
||||||
if self.dyn:
|
if self.dyn:
|
||||||
d["name"] += "*"
|
d["name"] += "*"
|
||||||
|
|||||||
+4
-2
@@ -258,7 +258,9 @@ async def start(
|
|||||||
extra_scripts=extra_scripts,
|
extra_scripts=extra_scripts,
|
||||||
hbd_version=hbd_version,
|
hbd_version=hbd_version,
|
||||||
hosts=[
|
hosts=[
|
||||||
hbdclass.Host.hosts[h].stateinfo() for h in sorted(hbdclass.Host.hosts)
|
hbdclass.Host.hosts[h].stateinfo()
|
||||||
|
for h in sorted(hbdclass.Host.hosts)
|
||||||
|
if _can_operate_host(current_user, hbdclass.Host.hosts[h])
|
||||||
],
|
],
|
||||||
messages=data.msgs[-30:],
|
messages=data.msgs[-30:],
|
||||||
current_user=current_user.to_dict() if current_user else None,
|
current_user=current_user.to_dict() if current_user else None,
|
||||||
@@ -510,7 +512,7 @@ async def start(
|
|||||||
hosts_with_plugins = []
|
hosts_with_plugins = []
|
||||||
for hostname in sorted(hbdclass.Host.hosts.keys()):
|
for hostname in sorted(hbdclass.Host.hosts.keys()):
|
||||||
host = hbdclass.Host.hosts[hostname]
|
host = hbdclass.Host.hosts[hostname]
|
||||||
if not _can_view_host(current_user, host):
|
if not _can_operate_host(current_user, host):
|
||||||
continue
|
continue
|
||||||
if host.plugin_data:
|
if host.plugin_data:
|
||||||
hosts_with_plugins.append({
|
hosts_with_plugins.append({
|
||||||
|
|||||||
+54
-2
@@ -24,7 +24,7 @@ sensitive bool True when the raw value must never be shown
|
|||||||
# Credential field names that should always be masked.
|
# Credential field names that should always be masked.
|
||||||
_SECRET_KEYS = frozenset({
|
_SECRET_KEYS = frozenset({
|
||||||
"password", "token", "user_key", "api_key", "secret",
|
"password", "token", "user_key", "api_key", "secret",
|
||||||
"smtp_password", "smtp_user",
|
"smtp_password", "smtp_user", "api_password", "access_token",
|
||||||
})
|
})
|
||||||
|
|
||||||
_CHANNEL_TYPE_LABELS = {
|
_CHANNEL_TYPE_LABELS = {
|
||||||
@@ -181,6 +181,48 @@ def get_settings_sections(config: dict) -> list:
|
|||||||
"notification_channels": attrs.get("notification_channels", []),
|
"notification_channels": attrs.get("notification_channels", []),
|
||||||
})
|
})
|
||||||
|
|
||||||
|
# ---- Threshold configurations -----------------------------------------
|
||||||
|
def _parse_metric_row(metric_path, metric_cfg):
|
||||||
|
if not isinstance(metric_cfg, dict):
|
||||||
|
return None
|
||||||
|
return {
|
||||||
|
"metric": metric_path,
|
||||||
|
"operator": metric_cfg.get("operator", ">"),
|
||||||
|
"warning": metric_cfg.get("warning"),
|
||||||
|
"critical": metric_cfg.get("critical"),
|
||||||
|
"hysteresis": metric_cfg.get("hysteresis"),
|
||||||
|
"count": metric_cfg.get("count", 1),
|
||||||
|
"enabled": metric_cfg.get("enabled", True),
|
||||||
|
}
|
||||||
|
|
||||||
|
threshold_config_list = []
|
||||||
|
raw_tconfigs = config.get("threshold_configs") or {}
|
||||||
|
if raw_tconfigs:
|
||||||
|
for cfg_name, cfg_data in sorted(raw_tconfigs.items()):
|
||||||
|
if not isinstance(cfg_data, dict):
|
||||||
|
continue
|
||||||
|
metrics = [
|
||||||
|
r for r in (
|
||||||
|
_parse_metric_row(mp, mc)
|
||||||
|
for mp, mc in (cfg_data.get("thresholds") or {}).items()
|
||||||
|
) if r
|
||||||
|
]
|
||||||
|
threshold_config_list.append({
|
||||||
|
"name": cfg_name,
|
||||||
|
"metrics": sorted(metrics, key=lambda m: m["metric"]),
|
||||||
|
})
|
||||||
|
elif config.get("thresholds"):
|
||||||
|
metrics = [
|
||||||
|
r for r in (
|
||||||
|
_parse_metric_row(mp, mc)
|
||||||
|
for mp, mc in config["thresholds"].items()
|
||||||
|
) if r
|
||||||
|
]
|
||||||
|
threshold_config_list.append({
|
||||||
|
"name": "default",
|
||||||
|
"metrics": sorted(metrics, key=lambda m: m["metric"]),
|
||||||
|
})
|
||||||
|
|
||||||
# ---- Hosts summary ----------------------------------------------------
|
# ---- Hosts summary ----------------------------------------------------
|
||||||
hosts_list = []
|
hosts_list = []
|
||||||
for hname, hcfg in (config.get("hosts") or {}).items():
|
for hname, hcfg in (config.get("hosts") or {}).items():
|
||||||
@@ -188,7 +230,7 @@ def get_settings_sections(config: dict) -> list:
|
|||||||
continue
|
continue
|
||||||
hosts_list.append({
|
hosts_list.append({
|
||||||
"name": hname,
|
"name": hname,
|
||||||
"watch": bool(hcfg.get("watch", False)),
|
"watch": bool(hcfg.get("watch", True)),
|
||||||
"dyndns": bool(hcfg.get("dyndns", False)),
|
"dyndns": bool(hcfg.get("dyndns", False)),
|
||||||
"owner": hcfg.get("owner", ""),
|
"owner": hcfg.get("owner", ""),
|
||||||
"managers": hcfg.get("managers", []),
|
"managers": hcfg.get("managers", []),
|
||||||
@@ -312,6 +354,16 @@ def get_settings_sections(config: dict) -> list:
|
|||||||
"hosts": hosts_list,
|
"hosts": hosts_list,
|
||||||
"fields": [],
|
"fields": [],
|
||||||
},
|
},
|
||||||
|
{
|
||||||
|
"id": "thresholds",
|
||||||
|
"title": "Threshold Configurations",
|
||||||
|
"description": "Named alert threshold sets. Each defines warning/critical levels per metric.",
|
||||||
|
"threshold_configs": threshold_config_list,
|
||||||
|
"fields": [
|
||||||
|
field("default_threshold_config", "Default config", "text",
|
||||||
|
"Threshold config used for hosts with no explicit mapping."),
|
||||||
|
],
|
||||||
|
},
|
||||||
{
|
{
|
||||||
"id": "runtime",
|
"id": "runtime",
|
||||||
"title": "Runtime",
|
"title": "Runtime",
|
||||||
|
|||||||
@@ -236,6 +236,8 @@
|
|||||||
color: #ff9800;
|
color: #ff9800;
|
||||||
font-weight: 700;
|
font-weight: 700;
|
||||||
}
|
}
|
||||||
|
#ntable a.host-link { color: inherit; text-decoration: none; }
|
||||||
|
#ntable a.host-link:hover { text-decoration: underline; }
|
||||||
</style>
|
</style>
|
||||||
<script type="text/javascript">
|
<script type="text/javascript">
|
||||||
var cnt = 0;
|
var cnt = 0;
|
||||||
@@ -245,11 +247,13 @@
|
|||||||
var HBD_VERSION = "{{ hbd_version }}";
|
var HBD_VERSION = "{{ hbd_version }}";
|
||||||
|
|
||||||
function hostNameHtml(data) {
|
function hostNameHtml(data) {
|
||||||
|
var rawName = data.raw_name || data.name.replace(/<[^>]+>/g, '').replace('*', '').trim();
|
||||||
var nameHtml = data.name;
|
var nameHtml = data.name;
|
||||||
if (!data.hbc_version || data.hbc_version !== HBD_VERSION) {
|
if (!data.hbc_version || data.hbc_version !== HBD_VERSION) {
|
||||||
nameHtml += ' 🥀';
|
nameHtml += ' 🥀';
|
||||||
}
|
}
|
||||||
return data.dyn ? '<b>' + nameHtml + '</b>' : nameHtml;
|
var display = data.dyn ? '<b>' + nameHtml + '</b>' : nameHtml;
|
||||||
|
return '<a class="host-link" href="/plugins#' + encodeURIComponent(rawName) + '">' + display + '</a>';
|
||||||
}
|
}
|
||||||
|
|
||||||
function setup() {
|
function setup() {
|
||||||
@@ -511,7 +515,7 @@
|
|||||||
<tbody id="ntablebody">
|
<tbody id="ntablebody">
|
||||||
{% for host in hosts %}
|
{% for host in hosts %}
|
||||||
<tr class="{% if host.alert_critical_unacked > 0 or host.alert_critical_acked > 0 %}row-critical{% elif host.alert_warning_unacked > 0 or host.alert_warning_acked > 0 %}row-warning{% endif %}">
|
<tr class="{% if host.alert_critical_unacked > 0 or host.alert_critical_acked > 0 %}row-critical{% elif host.alert_warning_unacked > 0 or host.alert_warning_acked > 0 %}row-warning{% endif %}">
|
||||||
<td data-name="{{ host.name }}">{{ host.name }}{% if not host.hbc_version or host.hbc_version != hbd_version %} 🥀{% endif %}</td>
|
<td data-name="{{ host.name }}"><a class="host-link" href="/plugins#{{ host.raw_name | urlencode }}">{{ host.name }}{% if not host.hbc_version or host.hbc_version != hbd_version %} 🥀{% endif %}</a></td>
|
||||||
<td style="text-align: center; color: #ff9800; font-weight: bold;">
|
<td style="text-align: center; color: #ff9800; font-weight: bold;">
|
||||||
{%- set warning_unacked = host.alert_warning_unacked -%}
|
{%- set warning_unacked = host.alert_warning_unacked -%}
|
||||||
{%- set warning_acked = host.alert_warning_acked -%}
|
{%- set warning_acked = host.alert_warning_acked -%}
|
||||||
|
|||||||
@@ -383,7 +383,7 @@
|
|||||||
</div>
|
</div>
|
||||||
|
|
||||||
<div class="host-body">
|
<div class="host-body">
|
||||||
{% set plugin_order = ['os_info','cpu_monitor','memory_monitor','disk_monitor','network_monitor','nagios_runner','filesystem_info'] %}
|
{% set plugin_order = ['os_info','cpu_monitor','memory_monitor','disk_monitor','network_monitor','zfs_monitor','nagios_runner','filesystem_info'] %}
|
||||||
{% for plugin in plugin_order if plugin in host.plugins %}
|
{% for plugin in plugin_order if plugin in host.plugins %}
|
||||||
<div class="plugin-accordion collapsed"
|
<div class="plugin-accordion collapsed"
|
||||||
data-hostname="{{ host.name }}"
|
data-hostname="{{ host.name }}"
|
||||||
@@ -673,6 +673,19 @@
|
|||||||
text = `${count} filesystem${count !== 1 ? 's' : ''}`;
|
text = `${count} filesystem${count !== 1 ? 's' : ''}`;
|
||||||
break;
|
break;
|
||||||
}
|
}
|
||||||
|
case 'zfs_monitor': {
|
||||||
|
const pools = d.pools || {};
|
||||||
|
const names = Object.keys(pools);
|
||||||
|
if (names.length === 0) { text = 'No pools'; break; }
|
||||||
|
const degraded = names.filter(n => pools[n].health && pools[n].health !== 'ONLINE');
|
||||||
|
text = names.map(n => {
|
||||||
|
const p = pools[n];
|
||||||
|
const cap = p.capacity != null ? ` ${p.capacity.toFixed(0)}%` : '';
|
||||||
|
return `${n}${cap}`;
|
||||||
|
}).join(' · ');
|
||||||
|
if (degraded.length) text += ` ⚠ ${degraded.map(n => pools[n].health).join(',')}`;
|
||||||
|
break;
|
||||||
|
}
|
||||||
default:
|
default:
|
||||||
text = 'Loaded';
|
text = 'Loaded';
|
||||||
}
|
}
|
||||||
@@ -694,6 +707,7 @@
|
|||||||
case 'memory_monitor': html = renderMemoryTable(cached.data); break;
|
case 'memory_monitor': html = renderMemoryTable(cached.data); break;
|
||||||
case 'disk_monitor': html = renderDiskTables(cached.data); break;
|
case 'disk_monitor': html = renderDiskTables(cached.data); break;
|
||||||
case 'network_monitor':html = renderNetworkTables(cached.data); break;
|
case 'network_monitor':html = renderNetworkTables(cached.data); break;
|
||||||
|
case 'zfs_monitor': html = renderZfsTables(cached.data); break;
|
||||||
case 'nagios_runner': html = renderNagiosTable(cached.data); break;
|
case 'nagios_runner': html = renderNagiosTable(cached.data); break;
|
||||||
case 'filesystem_info':html = renderFilesystemTable(cached.data); break;
|
case 'filesystem_info':html = renderFilesystemTable(cached.data); break;
|
||||||
default: html = renderGenericTable(cached.data); break;
|
default: html = renderGenericTable(cached.data); break;
|
||||||
@@ -1024,6 +1038,66 @@
|
|||||||
return html;
|
return html;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
function renderZfsTables(d) {
|
||||||
|
const pools = d.pools || {};
|
||||||
|
const names = Object.keys(pools);
|
||||||
|
if (names.length === 0) return '<div class="no-data">No ZFS pools found</div>';
|
||||||
|
|
||||||
|
const healthCls = h => {
|
||||||
|
if (!h || h === 'ONLINE') return 'pct-ok';
|
||||||
|
if (h === 'DEGRADED') return 'pct-warn';
|
||||||
|
return 'pct-crit';
|
||||||
|
};
|
||||||
|
|
||||||
|
let pt = '<table class="data-table"><thead><tr>'
|
||||||
|
+ '<th>Pool</th><th>Health</th>'
|
||||||
|
+ '<th class="num">Size</th><th class="num">Used</th>'
|
||||||
|
+ '<th class="num">Free</th><th class="num">Cap %</th>'
|
||||||
|
+ '<th class="num">Frag %</th><th class="num">Dedup</th>'
|
||||||
|
+ '</tr></thead><tbody>';
|
||||||
|
for (const name of names) {
|
||||||
|
const p = pools[name];
|
||||||
|
const cap = p.capacity != null ? p.capacity : 0;
|
||||||
|
const capCls = cap > 90 ? 'pct-crit' : cap > 75 ? 'pct-warn' : 'pct-ok';
|
||||||
|
pt += `<tr>
|
||||||
|
<td class="iface-name">${escHtml(name)}</td>
|
||||||
|
<td class="${healthCls(p.health)}">${escHtml(p.health || '—')}</td>
|
||||||
|
<td class="num">${formatBytes(p.size || 0)}</td>
|
||||||
|
<td class="num">${formatBytes(p.alloc || 0)}</td>
|
||||||
|
<td class="num">${formatBytes(p.free || 0)}</td>
|
||||||
|
<td class="num ${capCls}">${cap.toFixed(1)}%</td>
|
||||||
|
<td class="num">${p.frag != null ? p.frag.toFixed(1) + '%' : '—'}</td>
|
||||||
|
<td class="num">${p.dedup != null ? p.dedup.toFixed(2) + 'x' : '—'}</td>
|
||||||
|
</tr>`;
|
||||||
|
}
|
||||||
|
pt += '</tbody></table>';
|
||||||
|
|
||||||
|
const hasIo = names.some(n => pools[n].read_ops != null);
|
||||||
|
if (!hasIo) return pt;
|
||||||
|
|
||||||
|
let iot = '<table class="data-table"><thead><tr>'
|
||||||
|
+ '<th>Pool</th>'
|
||||||
|
+ '<th class="num">Read ops</th><th class="num">Write ops</th>'
|
||||||
|
+ '<th class="num">Read BW</th><th class="num">Write BW</th>'
|
||||||
|
+ '</tr></thead><tbody>';
|
||||||
|
for (const name of names) {
|
||||||
|
const p = pools[name];
|
||||||
|
iot += `<tr>
|
||||||
|
<td class="iface-name">${escHtml(name)}</td>
|
||||||
|
<td class="num">${p.read_ops != null ? p.read_ops.toLocaleString() : '—'}</td>
|
||||||
|
<td class="num">${p.write_ops != null ? p.write_ops.toLocaleString() : '—'}</td>
|
||||||
|
<td class="num">${p.read_bw != null ? formatBytes(p.read_bw) : '—'}</td>
|
||||||
|
<td class="num">${p.write_bw != null ? formatBytes(p.write_bw) : '—'}</td>
|
||||||
|
</tr>`;
|
||||||
|
}
|
||||||
|
iot += '</tbody></table>';
|
||||||
|
|
||||||
|
return `<div class="flex-tables">
|
||||||
|
<div><div class="table-section-label">Pools</div>${pt}</div>
|
||||||
|
<div><div class="table-section-label">I/O (cumulative)</div>${iot}</div>
|
||||||
|
</div>`;
|
||||||
|
}
|
||||||
|
|
||||||
function renderGenericTable(d) {
|
function renderGenericTable(d) {
|
||||||
let html = '<table class="data-table"><thead><tr><th>Field</th><th>Value</th></tr></thead><tbody>';
|
let html = '<table class="data-table"><thead><tr><th>Field</th><th>Value</th></tr></thead><tbody>';
|
||||||
for (const [k, v] of Object.entries(d)) {
|
for (const [k, v] of Object.entries(d)) {
|
||||||
@@ -1082,6 +1156,19 @@
|
|||||||
// ── Init ────────────────────────────────────────────────────────────────
|
// ── Init ────────────────────────────────────────────────────────────────
|
||||||
|
|
||||||
document.addEventListener('DOMContentLoaded', () => {
|
document.addEventListener('DOMContentLoaded', () => {
|
||||||
|
// If a host fragment is in the URL, expand and scroll to that host;
|
||||||
|
// otherwise expand the first host as before.
|
||||||
|
const hash = window.location.hash;
|
||||||
|
if (hash) {
|
||||||
|
const hostname = decodeURIComponent(hash.slice(1));
|
||||||
|
const card = document.querySelector(`.host-card[data-hostname="${hostname}"]`);
|
||||||
|
if (card) {
|
||||||
|
card.classList.remove('collapsed');
|
||||||
|
fetchHostGlance(hostname);
|
||||||
|
setTimeout(() => card.scrollIntoView({ behavior: 'smooth', block: 'start' }), 150);
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
}
|
||||||
const first = document.querySelector('.host-card');
|
const first = document.querySelector('.host-card');
|
||||||
if (first) {
|
if (first) {
|
||||||
first.classList.remove('collapsed');
|
first.classList.remove('collapsed');
|
||||||
|
|||||||
@@ -254,6 +254,17 @@
|
|||||||
.host-bool { text-align: center; }
|
.host-bool { text-align: center; }
|
||||||
.dot-yes { color: #2e7d32; font-size: 1.1em; }
|
.dot-yes { color: #2e7d32; font-size: 1.1em; }
|
||||||
.dot-no { color: #ddd; font-size: 1.1em; }
|
.dot-no { color: #ddd; font-size: 1.1em; }
|
||||||
|
|
||||||
|
/* ---- Threshold configurations ---- */
|
||||||
|
.thresh-config { margin: 12px 20px 20px; }
|
||||||
|
.thresh-config-name {
|
||||||
|
font-weight: 600; font-size: 0.9em; color: #1a237e;
|
||||||
|
margin-bottom: 6px;
|
||||||
|
}
|
||||||
|
.mini-table .warn { color: #e65100; font-weight: 600; }
|
||||||
|
.mini-table .crit { color: #b71c1c; font-weight: 600; }
|
||||||
|
.mini-table .dim { color: #aaa; }
|
||||||
|
.mini-table .metric-path { font-family: monospace; font-size: 0.88em; }
|
||||||
</style>
|
</style>
|
||||||
|
|
||||||
<body>
|
<body>
|
||||||
@@ -394,6 +405,49 @@
|
|||||||
{% endif %}
|
{% endif %}
|
||||||
{% endif %}
|
{% endif %}
|
||||||
|
|
||||||
|
{# ---- Threshold configurations section ---- #}
|
||||||
|
{% if section.id == "thresholds" %}
|
||||||
|
{% if section.threshold_configs %}
|
||||||
|
{% for tc in section.threshold_configs %}
|
||||||
|
<div class="thresh-config">
|
||||||
|
<div class="thresh-config-name">{{ tc.name }}</div>
|
||||||
|
{% if tc.metrics %}
|
||||||
|
<div style="overflow-x: auto;">
|
||||||
|
<table class="mini-table">
|
||||||
|
<thead>
|
||||||
|
<tr>
|
||||||
|
<th>Metric</th>
|
||||||
|
<th>Op</th>
|
||||||
|
<th>Warning</th>
|
||||||
|
<th>Critical</th>
|
||||||
|
<th>Hysteresis</th>
|
||||||
|
<th>Count</th>
|
||||||
|
</tr>
|
||||||
|
</thead>
|
||||||
|
<tbody>
|
||||||
|
{% for m in tc.metrics %}
|
||||||
|
<tr {% if not m.enabled %} style="opacity:0.45"{% endif %}>
|
||||||
|
<td class="metric-path">{{ m.metric }}</td>
|
||||||
|
<td>{{ m.operator or '>' }}</td>
|
||||||
|
<td class="warn">{{ m.warning if m.warning is not none else '—' }}</td>
|
||||||
|
<td class="crit">{{ m.critical if m.critical is not none else '—' }}</td>
|
||||||
|
<td class="dim">{{ '%.0f%%' % (m.hysteresis * 100) if m.hysteresis else '—' }}</td>
|
||||||
|
<td class="dim">{{ m.count }}</td>
|
||||||
|
</tr>
|
||||||
|
{% endfor %}
|
||||||
|
</tbody>
|
||||||
|
</table>
|
||||||
|
</div>
|
||||||
|
{% else %}
|
||||||
|
<span class="val-empty">No thresholds defined.</span>
|
||||||
|
{% endif %}
|
||||||
|
</div>
|
||||||
|
{% endfor %}
|
||||||
|
{% else %}
|
||||||
|
<div class="field-row"><span class="val-empty">No threshold configurations defined.</span></div>
|
||||||
|
{% endif %}
|
||||||
|
{% endif %}
|
||||||
|
|
||||||
{# ---- Hosts section ---- #}
|
{# ---- Hosts section ---- #}
|
||||||
{% if section.id == "hosts" %}
|
{% if section.id == "hosts" %}
|
||||||
{% if section.hosts %}
|
{% if section.hosts %}
|
||||||
|
|||||||
+85
-34
@@ -9,10 +9,11 @@ This module provides a flexible threshold checking system that:
|
|||||||
- Supports multiple comparison operators
|
- Supports multiple comparison operators
|
||||||
"""
|
"""
|
||||||
|
|
||||||
|
import asyncio
|
||||||
import logging
|
import logging
|
||||||
import time
|
import time
|
||||||
from enum import Enum
|
from enum import Enum
|
||||||
from typing import Dict, Any, Optional, Tuple, Callable
|
from typing import Dict, List, Any, Optional, Tuple, Callable
|
||||||
from . import notify as notify_mod
|
from . import notify as notify_mod
|
||||||
from .config import THRESHOLD_DEFAULTS
|
from .config import THRESHOLD_DEFAULTS
|
||||||
|
|
||||||
@@ -328,14 +329,17 @@ class ThresholdChecker:
|
|||||||
renotify_interval: Seconds between repeat notifications (default: 1 hour)
|
renotify_interval: Seconds between repeat notifications (default: 1 hour)
|
||||||
journal: Optional MessageJournal instance for logging threshold events
|
journal: Optional MessageJournal instance for logging threshold events
|
||||||
"""
|
"""
|
||||||
# Named threshold configurations: {config_name: {metric_path: ThresholdConfig}}
|
# Named threshold configurations (pre-merged: defaults + overrides): {config_name: {metric_path: ThresholdConfig}}
|
||||||
self.threshold_configs = {}
|
self.threshold_configs = {}
|
||||||
|
|
||||||
|
# Raw overrides only for each named config (no defaults baked in): {config_name: {metric_path: ThresholdConfig}}
|
||||||
|
self.threshold_raw_configs: Dict[str, Dict[str, ThresholdConfig]] = {}
|
||||||
|
|
||||||
# Single threshold set for backward compatibility: {metric_path: ThresholdConfig}
|
# Single threshold set for backward compatibility: {metric_path: ThresholdConfig}
|
||||||
self.thresholds = {}
|
self.thresholds = {}
|
||||||
|
|
||||||
# Host to config name mapping: {host_name: config_name}
|
# Host to ordered list of config names: {host_name: [config_name, ...]}
|
||||||
self.host_config_mapping = {}
|
self.host_config_mapping: Dict[str, List[str]] = {}
|
||||||
|
|
||||||
# Default config name to use when no mapping exists
|
# Default config name to use when no mapping exists
|
||||||
self.default_config = "default"
|
self.default_config = "default"
|
||||||
@@ -372,6 +376,7 @@ class ThresholdChecker:
|
|||||||
|
|
||||||
# Clear old configuration
|
# Clear old configuration
|
||||||
self.threshold_configs.clear()
|
self.threshold_configs.clear()
|
||||||
|
self.threshold_raw_configs.clear()
|
||||||
self.thresholds.clear()
|
self.thresholds.clear()
|
||||||
self.host_config_mapping.clear()
|
self.host_config_mapping.clear()
|
||||||
self.grace_seconds = float(config.get("grace", 2))
|
self.grace_seconds = float(config.get("grace", 2))
|
||||||
@@ -424,9 +429,10 @@ class ThresholdChecker:
|
|||||||
self._parse_plugin_thresholds(plugin_name, plugin_thresholds, target_dict=effective_defaults)
|
self._parse_plugin_thresholds(plugin_name, plugin_thresholds, target_dict=effective_defaults)
|
||||||
|
|
||||||
self.threshold_configs["default"] = dict(effective_defaults)
|
self.threshold_configs["default"] = dict(effective_defaults)
|
||||||
|
self.threshold_raw_configs["default"] = {}
|
||||||
logger.info("Registered 'default' threshold config with %d metrics", len(effective_defaults))
|
logger.info("Registered 'default' threshold config with %d metrics", len(effective_defaults))
|
||||||
|
|
||||||
# Parse each named configuration, seeding it with effective_defaults first
|
# Parse each named configuration
|
||||||
for config_name, config_data in threshold_configs.items():
|
for config_name, config_data in threshold_configs.items():
|
||||||
if config_name == "default":
|
if config_name == "default":
|
||||||
continue # already handled above
|
continue # already handled above
|
||||||
@@ -440,33 +446,41 @@ class ThresholdChecker:
|
|||||||
continue
|
continue
|
||||||
|
|
||||||
logger.info("Parsing threshold configuration: %s", config_name)
|
logger.info("Parsing threshold configuration: %s", config_name)
|
||||||
self.threshold_configs[config_name] = dict(effective_defaults)
|
|
||||||
|
|
||||||
|
# Raw overrides only (used for multi-config layering)
|
||||||
|
raw_overrides: Dict[str, ThresholdConfig] = {}
|
||||||
thresholds_config = config_data["thresholds"]
|
thresholds_config = config_data["thresholds"]
|
||||||
for plugin_name, plugin_thresholds in thresholds_config.items():
|
for plugin_name, plugin_thresholds in thresholds_config.items():
|
||||||
if not isinstance(plugin_thresholds, dict):
|
if isinstance(plugin_thresholds, dict):
|
||||||
continue
|
self._parse_plugin_thresholds(plugin_name, plugin_thresholds, target_dict=raw_overrides)
|
||||||
|
self.threshold_raw_configs[config_name] = raw_overrides
|
||||||
|
|
||||||
self._parse_plugin_thresholds(
|
# Pre-merged version (defaults + overrides) for single-config fast path
|
||||||
plugin_name,
|
self.threshold_configs[config_name] = dict(effective_defaults)
|
||||||
plugin_thresholds,
|
self.threshold_configs[config_name].update(raw_overrides)
|
||||||
target_dict=self.threshold_configs[config_name]
|
|
||||||
)
|
|
||||||
|
|
||||||
# Parse host to config mapping from two possible sources
|
# Parse host → config list mapping from two possible sources
|
||||||
# 1. New format: hosts section with threshold_config attribute
|
|
||||||
|
def _normalise(value) -> List[str]:
|
||||||
|
"""Accept a string or list; always return a list."""
|
||||||
|
if isinstance(value, list):
|
||||||
|
return [str(v) for v in value]
|
||||||
|
return [str(value)]
|
||||||
|
|
||||||
|
# 1. hosts section with threshold_config attribute (string or list)
|
||||||
if "hosts" in config:
|
if "hosts" in config:
|
||||||
hosts_config = config["hosts"]
|
hosts_config = config["hosts"]
|
||||||
if isinstance(hosts_config, dict):
|
if isinstance(hosts_config, dict):
|
||||||
for host_name, host_attrs in hosts_config.items():
|
for host_name, host_attrs in hosts_config.items():
|
||||||
if isinstance(host_attrs, dict) and "threshold_config" in host_attrs:
|
if isinstance(host_attrs, dict) and "threshold_config" in host_attrs:
|
||||||
self.host_config_mapping[host_name] = host_attrs["threshold_config"]
|
self.host_config_mapping[host_name] = _normalise(host_attrs["threshold_config"])
|
||||||
|
|
||||||
# 2. Legacy format: host_threshold_mapping section (for backward compatibility)
|
# 2. Legacy host_threshold_mapping section (string values only)
|
||||||
if "host_threshold_mapping" in config:
|
if "host_threshold_mapping" in config:
|
||||||
legacy_mapping = config.get("host_threshold_mapping", {})
|
legacy_mapping = config.get("host_threshold_mapping", {})
|
||||||
if isinstance(legacy_mapping, dict):
|
if isinstance(legacy_mapping, dict):
|
||||||
self.host_config_mapping.update(legacy_mapping)
|
for host_name, value in legacy_mapping.items():
|
||||||
|
self.host_config_mapping[host_name] = _normalise(value)
|
||||||
|
|
||||||
# Set default config (first one alphabetically or explicitly set)
|
# Set default config (first one alphabetically or explicitly set)
|
||||||
self.default_config = config.get("default_threshold_config", "default")
|
self.default_config = config.get("default_threshold_config", "default")
|
||||||
@@ -664,7 +678,10 @@ class ThresholdChecker:
|
|||||||
)
|
)
|
||||||
|
|
||||||
def get_thresholds_for_host(self, host_name: str) -> Dict[str, ThresholdConfig]:
|
def get_thresholds_for_host(self, host_name: str) -> Dict[str, ThresholdConfig]:
|
||||||
"""Get the appropriate threshold configuration for a host.
|
"""Get the effective threshold configuration for a host.
|
||||||
|
|
||||||
|
When threshold_config is a list, configs are applied left-to-right on top
|
||||||
|
of the default thresholds so earlier entries can be overridden by later ones.
|
||||||
|
|
||||||
Args:
|
Args:
|
||||||
host_name: Name of the host
|
host_name: Name of the host
|
||||||
@@ -676,23 +693,40 @@ class ThresholdChecker:
|
|||||||
if self.thresholds and not self.threshold_configs:
|
if self.thresholds and not self.threshold_configs:
|
||||||
return self.thresholds
|
return self.thresholds
|
||||||
|
|
||||||
# Multi-config mode: look up host-specific configuration
|
if not self.threshold_configs:
|
||||||
if self.threshold_configs:
|
return {}
|
||||||
config_name = self.host_config_mapping.get(host_name, self.default_config)
|
|
||||||
|
|
||||||
if config_name in self.threshold_configs:
|
config_names = self.host_config_mapping.get(host_name)
|
||||||
return self.threshold_configs[config_name]
|
|
||||||
else:
|
# No host-specific mapping → return pre-merged default
|
||||||
|
if not config_names:
|
||||||
|
return self.threshold_configs.get(self.default_config, {})
|
||||||
|
|
||||||
|
# Single config → fast path using pre-merged copy
|
||||||
|
if len(config_names) == 1:
|
||||||
|
name = config_names[0]
|
||||||
|
if name in self.threshold_configs:
|
||||||
|
return self.threshold_configs[name]
|
||||||
logger.warning(
|
logger.warning(
|
||||||
"Threshold config '%s' not found for host '%s', using default '%s'",
|
"Threshold config '%s' not found for host '%s', using default '%s'",
|
||||||
config_name,
|
name, host_name, self.default_config,
|
||||||
host_name,
|
|
||||||
self.default_config
|
|
||||||
)
|
)
|
||||||
return self.threshold_configs.get(self.default_config, {})
|
return self.threshold_configs.get(self.default_config, {})
|
||||||
|
|
||||||
# No thresholds configured
|
# Multiple configs → start from defaults, layer raw overrides in order
|
||||||
return {}
|
result = dict(self.threshold_configs.get(self.default_config, {}))
|
||||||
|
for name in config_names:
|
||||||
|
if name == self.default_config:
|
||||||
|
continue # defaults already the base
|
||||||
|
raw = self.threshold_raw_configs.get(name)
|
||||||
|
if raw is None:
|
||||||
|
logger.warning(
|
||||||
|
"Threshold config '%s' not found for host '%s', skipping",
|
||||||
|
name, host_name,
|
||||||
|
)
|
||||||
|
else:
|
||||||
|
result.update(raw)
|
||||||
|
return result
|
||||||
|
|
||||||
def check_value(
|
def check_value(
|
||||||
self,
|
self,
|
||||||
@@ -987,6 +1021,11 @@ class ThresholdChecker:
|
|||||||
value: Any,
|
value: Any,
|
||||||
):
|
):
|
||||||
"""Send notification and log to journal/eventlog."""
|
"""Send notification and log to journal/eventlog."""
|
||||||
|
from . import hbdclass
|
||||||
|
host = hbdclass.Host.hosts.get(host_name)
|
||||||
|
if host is not None and not host.watched:
|
||||||
|
eventlog(host_name, lvl, message, service="threshold")
|
||||||
|
return
|
||||||
asyncio.get_event_loop().create_task(notify_mod.send_notification(
|
asyncio.get_event_loop().create_task(notify_mod.send_notification(
|
||||||
host_name,
|
host_name,
|
||||||
notify_mod.Notification(
|
notify_mod.Notification(
|
||||||
@@ -999,7 +1038,6 @@ class ThresholdChecker:
|
|||||||
# Log to journal
|
# Log to journal
|
||||||
if self.journal is not None:
|
if self.journal is not None:
|
||||||
try:
|
try:
|
||||||
import asyncio
|
|
||||||
loop = asyncio.get_event_loop()
|
loop = asyncio.get_event_loop()
|
||||||
loop.create_task(self.journal.log_threshold_event(
|
loop.create_task(self.journal.log_threshold_event(
|
||||||
host_name=host_name,
|
host_name=host_name,
|
||||||
@@ -1076,7 +1114,9 @@ class ThresholdChecker:
|
|||||||
) -> None:
|
) -> None:
|
||||||
"""Handle a state-change transition with grace-period logic.
|
"""Handle a state-change transition with grace-period logic.
|
||||||
|
|
||||||
Transitioning INTO alert: defers the notification for grace_seconds.
|
Transitioning INTO alert (worsening): defers the notification for grace_seconds.
|
||||||
|
De-escalation within alert states (e.g. CRITICAL→WARNING): no new notification;
|
||||||
|
the metric is still alerting so no RECOVER was sent.
|
||||||
Transitioning TO OK:
|
Transitioning TO OK:
|
||||||
- Still in grace window (pending_since set): suppresses both the alert
|
- Still in grace window (pending_since set): suppresses both the alert
|
||||||
and the recovery — the spike never warranted a page.
|
and the recovery — the spike never warranted a page.
|
||||||
@@ -1096,12 +1136,20 @@ class ThresholdChecker:
|
|||||||
alert_state.pending_since = None
|
alert_state.pending_since = None
|
||||||
else:
|
else:
|
||||||
self._send_notification(host_name, lvl, message, metric_path, old_level, new_level, value)
|
self._send_notification(host_name, lvl, message, metric_path, old_level, new_level, value)
|
||||||
else:
|
elif new_level.value > old_level.value:
|
||||||
|
# Worsening (OK→WARNING, OK→CRITICAL, WARNING→CRITICAL): schedule notification.
|
||||||
alert_state.pending_since = time.time()
|
alert_state.pending_since = time.time()
|
||||||
logger.debug(
|
logger.debug(
|
||||||
"Alert deferred (%.0fs grace): %s on %s = %s",
|
"Alert deferred (%.0fs grace): %s on %s = %s",
|
||||||
self.grace_seconds, metric_path, host_name, value,
|
self.grace_seconds, metric_path, host_name, value,
|
||||||
)
|
)
|
||||||
|
else:
|
||||||
|
# De-escalation within alert states (e.g. CRITICAL→WARNING): metric is still
|
||||||
|
# alerting but did not recover, so no new notification.
|
||||||
|
logger.debug(
|
||||||
|
"De-escalation %s→%s for %s on %s, no notification",
|
||||||
|
old_level.name, new_level.name, metric_path, host_name,
|
||||||
|
)
|
||||||
|
|
||||||
def _check_pending_or_renotify(
|
def _check_pending_or_renotify(
|
||||||
self,
|
self,
|
||||||
@@ -1191,6 +1239,9 @@ class ThresholdChecker:
|
|||||||
else:
|
else:
|
||||||
message = f"REMINDER ({alert_state.level.name}): {host_name} - {metric_path} = {value} (ongoing for {int(now - alert_state.since)}s)"
|
message = f"REMINDER ({alert_state.level.name}): {host_name} - {metric_path} = {value} (ongoing for {int(now - alert_state.since)}s)"
|
||||||
|
|
||||||
|
from . import hbdclass
|
||||||
|
host = hbdclass.Host.hosts.get(host_name)
|
||||||
|
if host is None or host.watched:
|
||||||
asyncio.get_event_loop().create_task(notify_mod.send_notification(
|
asyncio.get_event_loop().create_task(notify_mod.send_notification(
|
||||||
host_name,
|
host_name,
|
||||||
notify_mod.Notification(
|
notify_mod.Notification(
|
||||||
@@ -1199,9 +1250,9 @@ class ThresholdChecker:
|
|||||||
level=alert_state.level.name,
|
level=alert_state.level.name,
|
||||||
),
|
),
|
||||||
))
|
))
|
||||||
|
logger.info("Re-notification sent: %s", message)
|
||||||
alert_state.last_notification = now
|
alert_state.last_notification = now
|
||||||
alert_state.notification_count += 1
|
alert_state.notification_count += 1
|
||||||
logger.info("Re-notification sent: %s", message)
|
|
||||||
|
|
||||||
def get_active_alerts(self, alert_states: Dict[str, AlertState]) -> list:
|
def get_active_alerts(self, alert_states: Dict[str, AlertState]) -> list:
|
||||||
"""
|
"""
|
||||||
|
|||||||
@@ -211,6 +211,7 @@ def _make_timer_callbacks(uname, host, ctx):
|
|||||||
connection.newstate(connection.__class__.OVERDUE, now, cfg.get("grace", 2))
|
connection.newstate(connection.__class__.OVERDUE, now, cfg.get("grace", 2))
|
||||||
msg = f"{connection.afam} overdue"
|
msg = f"{connection.afam} overdue"
|
||||||
eventlog(uname, "CRITICAL", msg)
|
eventlog(uname, "CRITICAL", msg)
|
||||||
|
if host.watched:
|
||||||
asyncio.create_task(notify_mod.send_notification(
|
asyncio.create_task(notify_mod.send_notification(
|
||||||
uname,
|
uname,
|
||||||
notify_mod.Notification(title=f"[CRITICAL] {uname}", body=msg, level="CRITICAL"),
|
notify_mod.Notification(title=f"[CRITICAL] {uname}", body=msg, level="CRITICAL"),
|
||||||
@@ -407,6 +408,7 @@ def handle_datagram(msg: dict, addr, transport, ctx: dict):
|
|||||||
|
|
||||||
if res:
|
if res:
|
||||||
eventlog(uname, "WARNING", res)
|
eventlog(uname, "WARNING", res)
|
||||||
|
if host.watched:
|
||||||
asyncio.create_task(notify_mod.send_notification(
|
asyncio.create_task(notify_mod.send_notification(
|
||||||
uname,
|
uname,
|
||||||
notify_mod.Notification(title=f"[WARNING] {uname}", body=res, level="WARNING"),
|
notify_mod.Notification(title=f"[WARNING] {uname}", body=res, level="WARNING"),
|
||||||
@@ -420,6 +422,7 @@ def handle_datagram(msg: dict, addr, transport, ctx: dict):
|
|||||||
|
|
||||||
if boot:
|
if boot:
|
||||||
eventlog(uname, "INFO", "booted")
|
eventlog(uname, "INFO", "booted")
|
||||||
|
if host.watched:
|
||||||
asyncio.create_task(notify_mod.send_notification(
|
asyncio.create_task(notify_mod.send_notification(
|
||||||
uname,
|
uname,
|
||||||
notify_mod.Notification(title=f"[INFO] {uname}", body=f"{host.name} booted", level="INFO"),
|
notify_mod.Notification(title=f"[INFO] {uname}", body=f"{host.name} booted", level="INFO"),
|
||||||
@@ -437,9 +440,14 @@ def handle_datagram(msg: dict, addr, transport, ctx: dict):
|
|||||||
if not newh:
|
if not newh:
|
||||||
if d == 0 or lasts == "unknown":
|
if d == 0 or lasts == "unknown":
|
||||||
m = "%s is up" % (conn.afam)
|
m = "%s is up" % (conn.afam)
|
||||||
|
elif d < 4:
|
||||||
|
# Transient blip (likely client restart) — skip log and notification
|
||||||
|
m = None
|
||||||
else:
|
else:
|
||||||
m = "%s back after being %s for %s" % (conn.afam, lasts, dur(d))
|
m = "%s back after being %s for %s" % (conn.afam, lasts, dur(d))
|
||||||
|
if m:
|
||||||
eventlog(uname, "RECOVER", m)
|
eventlog(uname, "RECOVER", m)
|
||||||
|
if host.watched:
|
||||||
asyncio.create_task(notify_mod.send_notification(
|
asyncio.create_task(notify_mod.send_notification(
|
||||||
uname,
|
uname,
|
||||||
notify_mod.Notification(title=f"[RECOVER] {uname}", body=m, level="RECOVER"),
|
notify_mod.Notification(title=f"[RECOVER] {uname}", body=m, level="RECOVER"),
|
||||||
@@ -453,6 +461,7 @@ def handle_datagram(msg: dict, addr, transport, ctx: dict):
|
|||||||
if shutdown:
|
if shutdown:
|
||||||
m = "%s shutdown" % conn.afam
|
m = "%s shutdown" % conn.afam
|
||||||
eventlog(uname, "INFO", m)
|
eventlog(uname, "INFO", m)
|
||||||
|
if host.watched:
|
||||||
asyncio.create_task(notify_mod.send_notification(
|
asyncio.create_task(notify_mod.send_notification(
|
||||||
uname,
|
uname,
|
||||||
notify_mod.Notification(title=f"[INFO] {uname}", body=m, level="INFO"),
|
notify_mod.Notification(title=f"[INFO] {uname}", body=m, level="INFO"),
|
||||||
|
|||||||
+52
-9
@@ -13,7 +13,8 @@ from . import data
|
|||||||
|
|
||||||
logger = logging.getLogger(__name__)
|
logger = logging.getLogger(__name__)
|
||||||
|
|
||||||
_connections: set = set()
|
# Map of WebSocket → User object (or None when auth is disabled)
|
||||||
|
_connections: dict = {}
|
||||||
_loop: Optional[asyncio.AbstractEventLoop] = None
|
_loop: Optional[asyncio.AbstractEventLoop] = None
|
||||||
_get_hosts: Optional[Callable[[], Iterable]] = None
|
_get_hosts: Optional[Callable[[], Iterable]] = None
|
||||||
_verbose: bool = False
|
_verbose: bool = False
|
||||||
@@ -34,22 +35,52 @@ def setup(
|
|||||||
_verbose = verbose
|
_verbose = verbose
|
||||||
|
|
||||||
|
|
||||||
|
def _user_can_see_host(user, host_name: str) -> bool:
|
||||||
|
"""Return True if *user* may see updates for *host_name* (manager or higher)."""
|
||||||
|
from . import hbdclass, users as users_mod
|
||||||
|
if user is None or not users_mod.users_enabled():
|
||||||
|
return True
|
||||||
|
if user.admin:
|
||||||
|
return True
|
||||||
|
host = hbdclass.Host.hosts.get(host_name)
|
||||||
|
if host is None:
|
||||||
|
return False
|
||||||
|
return host.is_manager(user.username)
|
||||||
|
|
||||||
|
|
||||||
|
def _get_token(request) -> str:
|
||||||
|
"""Extract session token from request (mirrors logic in http.py)."""
|
||||||
|
auth = request.headers.get("Authorization", "")
|
||||||
|
if auth.startswith("Bearer "):
|
||||||
|
return auth[7:].strip()
|
||||||
|
token = request.headers.get("X-Auth-Token", "")
|
||||||
|
if token:
|
||||||
|
return token
|
||||||
|
return request.cookies.get("hbd_session", "")
|
||||||
|
|
||||||
|
|
||||||
async def handler(request):
|
async def handler(request):
|
||||||
"""aiohttp WebSocket upgrade handler — register as GET /ws."""
|
"""aiohttp WebSocket upgrade handler — register as GET /ws."""
|
||||||
from aiohttp import web
|
from aiohttp import web
|
||||||
|
from . import users as users_mod
|
||||||
|
|
||||||
ws = web.WebSocketResponse()
|
ws = web.WebSocketResponse()
|
||||||
await ws.prepare(request)
|
await ws.prepare(request)
|
||||||
|
|
||||||
_connections.add(ws)
|
token = _get_token(request)
|
||||||
|
user = users_mod.get_session_user(token) if token else None
|
||||||
|
|
||||||
|
_connections[ws] = user
|
||||||
remote = request.remote
|
remote = request.remote
|
||||||
logger.info("WebSocket connected from %s", remote)
|
logger.info("WebSocket connected from %s", remote)
|
||||||
|
|
||||||
try:
|
try:
|
||||||
# Send current host state to the new client
|
# Send current host state, filtered to hosts this user may see
|
||||||
if _get_hosts:
|
if _get_hosts:
|
||||||
try:
|
try:
|
||||||
for h in list(_get_hosts()):
|
for h in list(_get_hosts()):
|
||||||
|
host_name = h.get("raw_name") or h.get("name", "")
|
||||||
|
if _user_can_see_host(user, host_name):
|
||||||
await ws.send_str(json.dumps({"type": "host", "data": h}))
|
await ws.send_str(json.dumps({"type": "host", "data": h}))
|
||||||
except Exception as e:
|
except Exception as e:
|
||||||
logger.error("Error sending initial hosts: %s", e)
|
logger.error("Error sending initial hosts: %s", e)
|
||||||
@@ -74,7 +105,7 @@ async def handler(request):
|
|||||||
except Exception as e:
|
except Exception as e:
|
||||||
logger.exception("WebSocket handler error from %s: %s", remote, e)
|
logger.exception("WebSocket handler error from %s: %s", remote, e)
|
||||||
finally:
|
finally:
|
||||||
_connections.discard(ws)
|
_connections.pop(ws, None)
|
||||||
logger.info("WebSocket disconnected from %s", remote)
|
logger.info("WebSocket disconnected from %s", remote)
|
||||||
|
|
||||||
return ws
|
return ws
|
||||||
@@ -83,25 +114,37 @@ async def handler(request):
|
|||||||
def broadcast(typ: str, payload) -> bool:
|
def broadcast(typ: str, payload) -> bool:
|
||||||
"""Thread-safe broadcast to all connected WebSocket clients.
|
"""Thread-safe broadcast to all connected WebSocket clients.
|
||||||
|
|
||||||
|
For host and plugin updates, only sends to clients whose user has
|
||||||
|
manager-or-higher access to that host. Other message types are
|
||||||
|
broadcast to all clients.
|
||||||
|
|
||||||
Can be called from any thread; schedules sends on the event loop.
|
Can be called from any thread; schedules sends on the event loop.
|
||||||
Returns False if the loop is not running yet.
|
Returns False if the loop is not running yet.
|
||||||
"""
|
"""
|
||||||
if not _loop:
|
if not _loop:
|
||||||
return False
|
return False
|
||||||
|
|
||||||
|
# Determine the host name for access-filtered message types
|
||||||
|
host_name: Optional[str] = None
|
||||||
|
if typ in ("host", "plugin"):
|
||||||
|
host_name = payload.get("raw_name") or payload.get("host") or payload.get("name")
|
||||||
|
|
||||||
jmsg = json.dumps({"type": typ, "data": payload})
|
jmsg = json.dumps({"type": typ, "data": payload})
|
||||||
|
|
||||||
async def _send_all():
|
async def _send_all():
|
||||||
dead = set()
|
dead = set()
|
||||||
for ws in list(_connections):
|
for ws, user in list(_connections.items()):
|
||||||
try:
|
try:
|
||||||
if not ws.closed:
|
if ws.closed:
|
||||||
await ws.send_str(jmsg)
|
|
||||||
else:
|
|
||||||
dead.add(ws)
|
dead.add(ws)
|
||||||
|
continue
|
||||||
|
if host_name is not None and not _user_can_see_host(user, host_name):
|
||||||
|
continue
|
||||||
|
await ws.send_str(jmsg)
|
||||||
except Exception:
|
except Exception:
|
||||||
dead.add(ws)
|
dead.add(ws)
|
||||||
for ws in dead:
|
for ws in dead:
|
||||||
_connections.discard(ws)
|
_connections.pop(ws, None)
|
||||||
|
|
||||||
asyncio.run_coroutine_threadsafe(_send_all(), _loop)
|
asyncio.run_coroutine_threadsafe(_send_all(), _loop)
|
||||||
return True
|
return True
|
||||||
|
|||||||
+1
-1
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
|
|||||||
|
|
||||||
[project]
|
[project]
|
||||||
name = "hbd"
|
name = "hbd"
|
||||||
version = "5.1.10"
|
version = "5.1.15"
|
||||||
description = "Heartbeat monitoring system — client (hbc) and server (hbd)"
|
description = "Heartbeat monitoring system — client (hbc) and server (hbd)"
|
||||||
readme = "README.md"
|
readme = "README.md"
|
||||||
requires-python = ">=3.11"
|
requires-python = ">=3.11"
|
||||||
|
|||||||
+27
-44
@@ -14,13 +14,12 @@ what=$1
|
|||||||
on_ha=0
|
on_ha=0
|
||||||
where=""
|
where=""
|
||||||
venv=""
|
venv=""
|
||||||
prog=$(realpath $0)
|
|
||||||
[ "$2" = "HA" ] && on_ha=1
|
[ "$2" = "HA" ] && on_ha=1
|
||||||
[ -z "$what" ] && what="client"
|
[ -z "$what" ] && what="client"
|
||||||
|
|
||||||
if [ -d /homeassistant ]; then
|
if [ -d /homeassistant ]; then # if running from HA command line
|
||||||
echo "HA, running \"docker exec homeassistant $prog $@\""
|
echo "HA, running \"docker exec homeassistant /config/bin/hb_install.sh $@\""
|
||||||
docker exec homeassistant $prog $@ HA
|
docker exec homeassistant /config/bin/hb_install.sh $@ HA
|
||||||
rc=$?
|
rc=$?
|
||||||
if [ $rc -ne 0 ]; then
|
if [ $rc -ne 0 ]; then
|
||||||
echo "Failed to install heartbeat in HA, please check the logs for more details"
|
echo "Failed to install heartbeat in HA, please check the logs for more details"
|
||||||
@@ -29,8 +28,9 @@ if [ -d /homeassistant ]; then
|
|||||||
exit 0
|
exit 0
|
||||||
fi
|
fi
|
||||||
|
|
||||||
if [ $on_ha -eq 1 ]; then
|
if [ $on_ha -eq 1 ] || [ -r /.dockerenv ] && [ -d /config/bin ]; then
|
||||||
echo "Installing under docker on Home Assistant OS, using /config/bin for executables and /config/venvs for virtual environments "
|
# Installing under docker on Home Assistant OS, using /config/bin for executables and /config/venvs for virtual environments
|
||||||
|
echo "Home Assistant OS detected, installing under docker"
|
||||||
where="/config/bin"
|
where="/config/bin"
|
||||||
venv="/config/venvs"
|
venv="/config/venvs"
|
||||||
else
|
else
|
||||||
@@ -53,23 +53,26 @@ else
|
|||||||
venv="$HOME/venvs"
|
venv="$HOME/venvs"
|
||||||
fi
|
fi
|
||||||
fi
|
fi
|
||||||
|
echo "Installing $what to $where"
|
||||||
echo "Installing heartbeat $what"
|
if [ ! -z "$venv" ]; then
|
||||||
|
echo "Using virtual environment at $venv/hbd"
|
||||||
|
fi
|
||||||
|
|
||||||
if [ "$venv" != "" ] && [ ! -d $venv/hbd ]; then
|
if [ "$venv" != "" ] && [ ! -d $venv/hbd ]; then
|
||||||
set +e
|
|
||||||
python3 -m pip --version > /dev/null 2>&1
|
|
||||||
rc=$?
|
|
||||||
set -e
|
|
||||||
arg=""
|
arg=""
|
||||||
if [ $rc -ne 0 ]; then
|
have_pip=$(python3 -c "import pip" 2>/dev/null &> /dev/null && echo "Installed" || echo "Not Installed")
|
||||||
|
if [ "$have_pip" = "Not Installed" ]; then
|
||||||
# some systems do not have pip installed by default, so we need to fetch get-pip.py and install pip
|
# some systems do not have pip installed by default, so we need to fetch get-pip.py and install pip
|
||||||
echo "pip is not installed, fetching get-pip.py and installing pip"
|
echo "pip is not installed, fetching get-pip.py and installing pip"
|
||||||
arg="--without-pip"
|
arg="--without-pip"
|
||||||
fi
|
fi
|
||||||
mkdir -p $venv
|
mkdir -p $venv
|
||||||
have_venv=$(python3 -c "import venv" &> /dev/null && echo "Installed" || echo "Not Installed")
|
have_venv=$(python3 -c "import venv" 2>/dev/null &> /dev/null && echo "Installed" || echo "Not Installed")
|
||||||
if [ "$have_venv" = "Not Installed" ]; then
|
if [ "$have_venv" = "Not Installed" ]; then
|
||||||
|
if [ "$have_pip" = "Not Installed" ]; then
|
||||||
|
echo "python has no venv, and no pip to install virtualenv, cannot continue"
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
echo "python venv module not found, installing virtualenv"
|
echo "python venv module not found, installing virtualenv"
|
||||||
python3 -m pip install --user virtualenv
|
python3 -m pip install --user virtualenv
|
||||||
python3 -m virtualenv $venv/hbd --system-site-packages $arg
|
python3 -m virtualenv $venv/hbd --system-site-packages $arg
|
||||||
@@ -83,50 +86,30 @@ if [ "$venv" != "" ] && [ ! -d $venv/hbd ]; then
|
|||||||
deactivate
|
deactivate
|
||||||
fi
|
fi
|
||||||
|
|
||||||
if [ -z "$venv" ]; then
|
if [ ! -z "$venv" ]; then
|
||||||
echo "Installing heartbeat $what globally"
|
|
||||||
else
|
|
||||||
echo "Installing heartbeat $what in virtual environment $venv/hbd"
|
|
||||||
. $venv/hbd/bin/activate
|
. $venv/hbd/bin/activate
|
||||||
fi
|
fi
|
||||||
if [ "$what" = "mini" ]; then
|
if [ "$what" = "mini" ]; then
|
||||||
echo "Installing hbc mini, which has no external dependencies and is meant for quick setup and testing. For the full client with all features, please run this script with the 'client' argument."
|
|
||||||
curl -s -o $where/hbc_mini https://git.wrede.ca/andreas/heartbeat/raw/branch/master/scripts/hbc_mini.py
|
curl -s -o $where/hbc_mini https://git.wrede.ca/andreas/heartbeat/raw/branch/master/scripts/hbc_mini.py
|
||||||
chmod +x $where/hbc_mini
|
chmod +x $where/hbc_mini
|
||||||
else
|
else
|
||||||
echo "Installing heartbeat $what, which includes the full client with all features. If you want to install the minimal client with no external dependencies, please run this script with the 'mini' argument."
|
|
||||||
python3 -mpip install --upgrade --index-url https://git.wrede.ca/api/packages/andreas/pypi/simple/ --extra-index-url https://pypi.org/simple hbd[$what]
|
python3 -mpip install --upgrade --index-url https://git.wrede.ca/api/packages/andreas/pypi/simple/ --extra-index-url https://pypi.org/simple hbd[$what]
|
||||||
fi
|
fi
|
||||||
|
|
||||||
|
if [ ! -z "$venv" ]; then
|
||||||
|
echo "linking executables to $where"
|
||||||
if [ "$what" = "server" ]; then
|
if [ "$what" = "server" ]; then
|
||||||
rm -f $where/hbd
|
rm -f $where/hbd
|
||||||
ln -sf $(which hbd) $where/hbd
|
ln -sf $(which hbd) $where/hbd
|
||||||
echo "hbd installed, you can run it with \"$where/hbd\" or \"hbd\" if $where is in your PATH"
|
|
||||||
elif [ "$what" = "client" ]; then
|
elif [ "$what" = "client" ]; then
|
||||||
hbc_path=$(which hbc)
|
|
||||||
if [ -z "$hbc_path" ]; then
|
|
||||||
echo "hbc not found in PATH, installation failed"
|
|
||||||
exit 1
|
|
||||||
fi
|
|
||||||
if [ "$hbc_path" != "$where/hbc" ]; then
|
|
||||||
rm -f $where/hbc
|
rm -f $where/hbc
|
||||||
ln -sf $(which hbc) $where/hbc
|
ln -sf $(which hbc) $where/hbc
|
||||||
fi
|
fi
|
||||||
if [ "$prog" != "$where/hb_install.sh" ]; then
|
rm -f $where/hb_install.sh
|
||||||
cp "$prog" $where/hb_install.sh
|
ln -sf $(which hb_install.sh) $where/hb_install.sh
|
||||||
chmod +x $where/hb_install.sh
|
|
||||||
fi
|
|
||||||
if [ $on_ha -eq 1 ]; then
|
|
||||||
echo "restarting hbc "
|
|
||||||
job=$(grep run_hbc configuration.yaml | sed 's/run_hbc://')
|
|
||||||
$job
|
|
||||||
else
|
|
||||||
echo "hbc installed, you can run it with \"$where/hbc\" or \"hbc\" if $where is in your PATH"
|
|
||||||
fi
|
|
||||||
elif [ "$what" = "mini" ]; then
|
|
||||||
hbc_path=$(which hbc_mini)
|
|
||||||
if [ "$hbc_path" != "$where/hbc_mini" ]; then
|
|
||||||
ln -sf $hbc_path $where/hbc_mini
|
|
||||||
fi
|
|
||||||
echo "hbc mini installed, you can run it with \"$where/hbc_mini\" or \"hbc_mini\" if $where is in your PATH"
|
|
||||||
fi
|
fi
|
||||||
|
echo "Installation complete. To upgrade, run the following:"
|
||||||
|
echo " $where/hb_install.sh $what"
|
||||||
|
echo "To install on another machine, run the following obtain the install script and run it:"
|
||||||
|
echo "from https://git.wrede.ca/andreas/heartbeat/raw/branch/master/scripts/hb_install.sh"
|
||||||
|
echo "and then run sh hb_install.sh [mini|client]"
|
||||||
+8
-1
@@ -41,7 +41,7 @@ from pathlib import Path
|
|||||||
from typing import Any, Dict, List, Optional, Tuple
|
from typing import Any, Dict, List, Optional, Tuple
|
||||||
|
|
||||||
# updated by scripts/bumpminor.sh
|
# updated by scripts/bumpminor.sh
|
||||||
__version__ = "5.1.10"
|
__version__ = "5.1.15"
|
||||||
|
|
||||||
# ---------------------------------------------------------------------------
|
# ---------------------------------------------------------------------------
|
||||||
# Protocol (mirrors hbd/common/proto.py)
|
# Protocol (mirrors hbd/common/proto.py)
|
||||||
@@ -1066,6 +1066,13 @@ async def _async_main(args, cfg: Dict[str, Any]) -> int:
|
|||||||
for sig in (signal.SIGTERM, signal.SIGINT):
|
for sig in (signal.SIGTERM, signal.SIGINT):
|
||||||
loop.add_signal_handler(sig, _stop)
|
loop.add_signal_handler(sig, _stop)
|
||||||
|
|
||||||
|
def _sighup():
|
||||||
|
global _dorestart
|
||||||
|
_dorestart = True
|
||||||
|
_stop()
|
||||||
|
|
||||||
|
loop.add_signal_handler(signal.SIGHUP, _sighup)
|
||||||
|
|
||||||
for conn in connections:
|
for conn in connections:
|
||||||
_active_tasks.append(asyncio.create_task(_heartbeat_sender(conn, interval)))
|
_active_tasks.append(asyncio.create_task(_heartbeat_sender(conn, interval)))
|
||||||
|
|
||||||
|
|||||||
Reference in New Issue
Block a user