re-factor notifications, add sms and matrix as channels

This commit is contained in:
Andreas Wrede
2026-04-12 11:04:00 -04:00
parent 7f049a4e26
commit 0199ca4693
11 changed files with 887 additions and 864 deletions
+237 -475
View File
@@ -2,532 +2,294 @@
## Overview
The Heartbeat Monitoring System includes a flexible notification system that can send alerts through multiple channels including Email, Pushover, Signal, and Mattermost. The system supports centralized channel definitions with per-host routing, allowing fine-grained control over notification delivery.
Notifications are dispatched to the **owner and managers** of a host, each via their own configured notification channels. Channel definitions are global; users reference them by name. No users configured → no notifications sent.
## Architecture
### Components
```
Alert event (udp.py / threshold.py)
└─ notify.send_notification(host_name, Notification)
├─ look up host.owner + host.managers
├─ for each user → user.notification_channels
└─ for each channel → _dispatch_to_channel (filtered by min_level)
```
1. **Notification Channels** (`notification_channels` in config)
- Centralized definitions of notification providers
- Each channel has a type and type-specific credentials
- Reusable across multiple hosts
2. **Channel Dispatcher** (`hbd/server/notify.py`)
- `pushmsg_for_host(hostname, message)`: Main entry point for host-specific notifications
- `_dispatch_to_channel(channel_name, channel_config, message)`: Routes to specific provider
- Provider functions: `pushover()`, `pushsignal()`, `pushmattermost()`, `send_email()`
3. **Configuration Utilities** (`hbd/server/config.py`)
- `get_notification_channels_for_host(config, hostname)`: Retrieves channel names for a host
- `get_notification_channels_config(config, hostname)`: Retrieves full channel configurations
- `get_channel_config(config, channel_name)`: Gets configuration for a specific channel
4. **Integration Points**
- **Threshold alerts**: `threshold.py` calls `notify_mod.pushmsg_for_host()`
- **Heartbeat events**: `udp.py` calls `notify_mod.pushmsg_for_host()` for boot/shutdown/overdue
- **Custom alerts**: Any code can call `notify_mod.pushmsg_for_host(hostname, message)`
Every notification carries:
- **title** — `[LEVEL] hostname` (e.g. `[CRITICAL] webserver01`)
- **body** — detail message (metric value, threshold, duration)
- **url** — link to the plugin metrics page (`{base_url}/plugins#{hostname}`)
- **level** — `RECOVER | WARNING | CRITICAL | INFO`
## Configuration
### Centralized Channel Definitions
### Base URL
Define notification channels once in your configuration file:
Set `base_url` so notification links point to your hbd instance:
```yaml
base_url: https://hbd.example.com
```
### Global channel definitions
Define channels once; reference them by name from user configs:
```yaml
notification_channels:
# Signal notifications
signal_ops:
type: signal
cli_path: /usr/local/bin/signal-cli
user: +1234567890 # Your Signal number
recipient: +1234567890 # Recipient number
signal_oncall:
type: signal
cli_path: /usr/local/bin/signal-cli
user: +1234567890
recipient: +0987654321 # Different recipient
# Email notifications
pushover_ops:
type: pushover
token: your-app-token
user: your-user-key
min_level: WARNING # optional, default: WARNING
email_ops:
type: email
recipients:
- ops@example.com
- alerts@example.com
sender: heartbeat@example.com
recipients: [ops@example.com]
sender: hbd@example.com
smtp_server: smtp.example.com
smtp_port: 587
smtp_user: heartbeat@example.com
smtp_password: your-smtp-password
email_devteam:
type: email
recipients: [dev-alerts@example.com]
sender: heartbeat-dev@example.com
smtp_server: smtp.example.com
smtp_port: 587
smtp_user: heartbeat-dev@example.com
smtp_password: your-smtp-password
# Pushover notifications
pushover_urgent:
type: pushover
token: your-pushover-app-token
user: your-pushover-user-key
pushover_normal:
type: pushover
token: your-pushover-app-token
user: another-user-key
# Mattermost notifications
mattermost_devops:
type: mattermost
host: mattermost.example.com
token: your-webhook-token
channel: devops-alerts
username: heartbeat-bot
icon: https://example.com/heartbeat-icon.png
```
smtp_user: hbd@example.com
smtp_password: secret
min_level: WARNING
### Default Notification Channels
matrix_oncall:
type: matrix
homeserver: https://matrix.example.org
access_token: syt_xxx
room_id: "!abc:matrix.example.org"
min_level: CRITICAL # only send critical alerts to this room
Specify default channels for hosts that don't have specific channel assignments:
sms_oncall:
type: sms_voipms
api_user: me@example.com
api_password: secret
did: "5551234567" # your voip.ms DID number
dst: "5559876543" # destination number
min_level: CRITICAL
```yaml
default_notification_channels:
- email_ops
- mattermost_devops
```
Hosts without `notification_channels` defined will use these defaults.
### Per-Host Channel Assignment
Assign specific channels to each host in the `hosts` section:
```yaml
hosts:
# Critical production web server - multiple channels for redundancy
prod-web-01:
threshold_config: high_sensitivity
watch: true
notification_channels:
- signal_oncall # Immediate mobile notification
- pushover_urgent # Secondary mobile notification
- email_ops # Email for record keeping
dyndns: false
# Database server - ops team notifications only
prod-db-01:
threshold_config: database
watch: true
notification_channels:
- signal_ops
- email_ops
dyndns: false
# Development server - email only, no urgent notifications
dev-server-01:
threshold_config: low_sensitivity
watch: false
notification_channels:
- email_devteam
dyndns: false
# Test server - uses default_notification_channels
test-server-01:
threshold_config: default
watch: false
dyndns: false
# No notification_channels specified = uses default_notification_channels
```
## Channel Types
### Email
Sends notifications via SMTP.
**Configuration fields:**
```yaml
type: email
recipients: [email1@example.com, email2@example.com] # Required: List of recipients
sender: heartbeat@example.com # Required: From address
smtp_server: smtp.example.com # Required: SMTP server hostname
smtp_port: 587 # Optional: Default 587
smtp_user: heartbeat@example.com # Optional: For authenticated SMTP
smtp_password: your-password # Optional: For authenticated SMTP
```
**Features:**
- Supports multiple recipients
- TLS/STARTTLS support on port 587
- Authenticated and unauthenticated SMTP
**Example:**
```yaml
notification_channels:
email_critical:
type: email
recipients: [admin@example.com, oncall@example.com]
sender: alerts@example.com
smtp_server: smtp.fastmail.com
smtp_port: 587
smtp_user: alerts@example.com
smtp_password: app-specific-password
```
### Pushover
Sends push notifications to mobile devices via Pushover API.
**Configuration fields:**
```yaml
type: pushover
token: your-application-token # Required: Your Pushover app token
user: your-user-key # Required: Recipient's user key
```
**Features:**
- Instant mobile push notifications
- Works on iOS and Android
- Supports delivery confirmations
**Setup:**
1. Create a Pushover account at https://pushover.net
2. Create an application to get your app token
3. Note your user key from your account dashboard
**Example:**
```yaml
notification_channels:
pushover_admin:
type: pushover
token: azGDORePK8gMaC0QOYAMyEEuzJnyUi
user: uQiRzpo4DXghDmr9QzzfQu27cmVRsG
```
### Signal
Sends notifications via Signal messenger using signal-cli.
**Configuration fields:**
```yaml
type: signal
cli_path: /usr/local/bin/signal-cli # Optional: Path to signal-cli binary
user: +1234567890 # Required: Your Signal phone number
recipient: +0987654321 # Required: Recipient phone number
```
**Prerequisites:**
1. Install signal-cli: https://github.com/AsamK/signal-cli
2. Register signal-cli with your phone number:
```bash
signal-cli -u +1234567890 register
signal-cli -u +1234567890 verify CODE
```
3. Ensure signal-cli is in PATH or specify full path in config
**Features:**
- End-to-end encrypted messaging
- Works without phone being online
- No API fees or rate limits
**Example:**
```yaml
notification_channels:
signal_admin:
signal_ops:
type: signal
cli_path: /usr/local/bin/signal-cli
user: +12025551234
recipient: +12025559999
```
### Mattermost
Sends notifications to Mattermost team chat via incoming webhooks.
**Configuration fields:**
```yaml
type: mattermost
host: mattermost.example.com # Required: Mattermost server hostname
token: your-webhook-token # Required: Incoming webhook token
channel: channel-name # Required: Target channel name
username: heartbeat-bot # Optional: Bot display name
icon: https://example.com/icon.png # Optional: Bot icon URL
```
**Prerequisites:**
1. Enable incoming webhooks in Mattermost
2. Create an incoming webhook for your team
3. Note the webhook token from the webhook URL
**Features:**
- Team-wide visibility
- Rich formatting support
- Message threading
**Example:**
```yaml
notification_channels:
mattermost_ops:
mattermost_devops:
type: mattermost
host: chat.example.com
token: abc123def456ghi789
channel: infrastructure-alerts
username: heartbeat-monitor
icon: https://example.com/heartbeat-icon.png
host: mattermost.example.com
token: webhook-token
channel: devops-alerts
username: heartbeat-bot
```
## Notification Events
### Users with notification channels
The system sends notifications for various events:
Each user lists which global channels they receive notifications on:
### Threshold Alerts
```yaml
users:
alice:
full_name: Alice Smith
password: pbkdf2:sha256:...
admin: true
notification_channels: [pushover_ops, email_ops]
When monitored metrics exceed configured thresholds:
- **State changes**: OK → WARNING, WARNING → CRITICAL, CRITICAL → OK
- **Format**: `{LEVEL}: {hostname} - {metric_path} = {value} {threshold_info}`
- **Example**: `CRITICAL: prod-web-01 - cpu_monitor.cpu_percent = 95.2 (threshold: > 90.0)`
- **Re-notifications**: Periodic reminders for ongoing alerts (default: hourly)
### Heartbeat Events
Host lifecycle events:
- **Host boot**: `{hostname} booted`
- **Host shutdown**: `{hostname} {connection_type} shutdown`
- **Host recovery**: `{hostname} {connection_type} is back`
- **Connection issues**: `{hostname} {message}`
- **Host overdue**: `{hostname} {connection_type} overdue`
Only hosts with `watch: true` send heartbeat event notifications.
### Custom Alerts
Application code can send custom notifications:
```python
from hbd.server import notify as notify_mod
# Send to host-specific channels
notify_mod.pushmsg_for_host("prod-web-01", "Custom alert message")
# Send using global config
notify_mod.pushmsg_from_config("Global notification")
# Send to specific config
notify_mod.pushmsg(custom_config_dict, "Targeted notification")
bob:
full_name: Bob Jones
password: pbkdf2:sha256:...
notification_channels: [sms_oncall, matrix_oncall]
```
## Design Principles
### Host access — owner and managers
The notification system follows these core principles:
- **Centralization**: Define notification providers once, reference them by name
- **Flexibility**: Each host can use different channels for different notification needs
- **Redundancy**: Critical hosts can specify multiple channels for failover
- **Clarity**: Clean separation between channel definition and channel assignment
- **Type Safety**: Provider-specific validation at configuration time
## Best Practices
### Channel Organization
- **Create purpose-specific channels**: `email_ops`, `signal_oncall`, `pushover_urgent`
- **Separate by team/role**: `email_devteam`, `signal_dbateam`, `mattermost_security`
- **Use descriptive names**: Channel names appear in logs and debugging
### Redundancy
For critical hosts, use multiple notification channels:
Notifications for a host go to its owner and all managers:
```yaml
hosts:
critical-db:
notification_channels:
- signal_oncall # Primary: Mobile alert
- pushover_urgent # Backup: Different mobile platform
- email_ops # Tertiary: Email for record-keeping
webserver01:
owner: alice # receives all notifications for this host
managers: [bob] # also receives notifications
threshold_config: default
watch: true # bold in dashboard (cosmetic only)
dyndns: false
dbserver01:
owner: alice
managers: [bob]
threshold_config: database
dyndns: false
```
### Notification Fatigue Prevention
`watch: true` only affects display (bold name in the live dashboard). Notifications are now controlled entirely by owner/managers.
- **Use `watch: false`** for non-critical hosts
- **Configure appropriate thresholds** to avoid false positives
- **Set different channels for different severities**
- **Use `default_notification_channels`** for baseline, add more for critical systems
## Channel Types
### Security
### `min_level` filtering
- **Protect credentials**: Use file permissions to protect config files with passwords/tokens
- **Rotate tokens**: Periodically rotate API tokens and passwords
- **Use app-specific passwords**: For email, use app-specific passwords instead of main account password
- **Separate accounts**: Consider separate notification accounts for different environments (prod vs dev)
Every channel accepts an optional `min_level` field:
### Testing
| Value | Channels receive |
|---|---|
| `WARNING` (default) | WARNING, CRITICAL, RECOVER |
| `CRITICAL` | CRITICAL only (and RECOVER) |
Test notification channels before relying on them:
`RECOVER` is always passed through — you don't want to miss a recovery.
### pushover
Sends push notifications via [Pushover](https://pushover.net). Includes title, body, and a clickable URL.
```yaml
type: pushover
token: your-app-token # Required: Pushover application token
user: your-user-key # Required: Recipient's user key
min_level: WARNING
```
### email
Sends via SMTP. Subject = title, body = message + URL on final line.
```yaml
type: email
recipients: [ops@example.com, oncall@example.com]
sender: hbd@example.com
smtp_server: smtp.example.com
smtp_port: 587 # 587 = STARTTLS (default), 465 = SSL
smtp_user: hbd@example.com
smtp_password: secret
min_level: WARNING
```
### matrix
Sends a formatted HTML message to a Matrix room via [matrix-nio](https://github.com/poljar/matrix-nio).
```yaml
type: matrix
homeserver: https://matrix.example.org
access_token: syt_xxx # Bot account access token
room_id: "!abc:matrix.example.org"
min_level: WARNING
```
**Setup:**
1. Create a bot Matrix account
2. Obtain its access token (Element → Settings → Help & About → Access Token)
3. Invite the bot to the target room and note the room ID
### sms_voipms
Sends SMS via the [voip.ms REST API](https://voip.ms/api/v1/rest.php). Message is truncated to 160 characters.
```yaml
type: sms_voipms
api_user: me@example.com # voip.ms account email
api_password: secret # voip.ms API password
did: "5551234567" # Your voip.ms DID (sending number)
dst: "5559876543" # Destination number
min_level: CRITICAL
```
### signal
Sends via [signal-cli](https://github.com/AsamK/signal-cli).
```yaml
type: signal
cli_path: /usr/local/bin/signal-cli
user: +12025551234 # Your registered Signal number
recipient: +12025559999 # Recipient number
min_level: WARNING
```
**Setup:**
```bash
# Test signal-cli directly
signal-cli -u +1234567890 send -m "Test message" +0987654321
# Test SMTP
echo "Test" | mail -s "Test Subject" admin@example.com
# Test through heartbeat system (Python REPL)
from hbd.server import notify as notify_mod, config as config_mod
cfg = config_mod.load_config(".hb.yaml")
notify_mod.setup(cfg)
notify_mod.pushmsg_for_host("test-host", "Test notification")
signal-cli -u +12025551234 register
signal-cli -u +12025551234 verify CODE
```
### mattermost
Sends via Mattermost incoming webhook. Message is formatted as Markdown.
```yaml
type: mattermost
host: mattermost.example.com
token: your-webhook-token
channel: devops-alerts
username: heartbeat-bot # Optional: display name
icon: https://…/icon.png # Optional: bot icon URL
min_level: WARNING
```
## Notification events
| Source | Level | Title example | Body example |
|---|---|---|---|
| Host overdue | CRITICAL | `[CRITICAL] webserver01` | `IPv4 overdue` |
| Host recover | RECOVER | `[RECOVER] webserver01` | `IPv4 back after being overdue for 5:23` |
| Host boot | INFO | `[INFO] webserver01` | `webserver01 booted` |
| Host shutdown | INFO | `[INFO] webserver01` | `IPv4 shutdown` |
| Threshold breach | WARNING/CRITICAL | `[CRITICAL] webserver01` | `cpu_percent = 95.2 (threshold: > 90.0)` |
| Threshold reminder | CRITICAL | `[REMINDER/CRITICAL] webserver01` | `REMINDER (CRITICAL): … ongoing for 3600s` |
| Connection issue | WARNING | `[WARNING] webserver01` | `new address detected …` |
Reminder notifications (re-notify) are sent only for CRITICAL level alerts.
## API reference
### `send_notification(host_name, notif) -> dict`
Main entry point. Dispatches to owner + managers.
```python
from hbd.server.notify import send_notification, Notification
send_notification(
"webserver01",
Notification(
title="[CRITICAL] webserver01",
body="cpu_percent = 95.2 (threshold: > 90.0)",
level="CRITICAL",
url="https://hbd.example.com/plugins#webserver01",
),
)
```
Returns `{channel_name: bool}` for each channel dispatched.
### `setup(cfg, loop=None)`
Called once at startup from `main.py`. Pass the running asyncio event loop so Matrix sends work correctly.
## Troubleshooting
### Notifications Not Sending
**No notifications sent:**
- Check that users are configured (`users:` section in yaml)
- Check that the host has an `owner` or `managers` set
- Check that users have `notification_channels` listed
- Check that the channel names in user config match keys under `notification_channels:`
1. **Check logs**: Look for "Failed to send notification" errors
2. **Verify host is watched**: Ensure `watch: true` in host definition
3. **Check channel configuration**: Verify credentials and settings
4. **Test channel directly**: Use command-line tools to test provider
5. **Check network**: Ensure server can reach notification endpoints
**min_level filtering too aggressive:**
- Default is `WARNING` — both WARNING and CRITICAL are sent
- Set `min_level: WARNING` explicitly if you were expecting warnings but set CRITICAL
### Signal Issues
**Matrix sends time out:**
- Verify the access token is valid and the bot is in the room
- `matrix-nio` must be installed: `pip install matrix-nio`
- **signal-cli not found**: Specify full path in `cli_path`
- **Not registered**: Run `signal-cli -u +NUMBER register` and verify
- **Trust issues**: Run `signal-cli -u +NUMBER receive` to sync trust store
- **Recipient not found**: Ensure recipient is in your Signal contacts
**voip.ms SMS fails:**
- Enable the API in your voip.ms account (Account → API)
- Verify the DID is SMS-capable in your voip.ms account
### Email Issues
**Signal not found:**
- Specify full `cli_path`
- Run `signal-cli -u +NUMBER receive` to sync trust store
- **Authentication failed**: Check SMTP username/password
- **TLS errors**: Verify SMTP port (587 for STARTTLS, 465 for SSL)
- **Relay denied**: Ensure SMTP server allows relay from your IP
- **Timeout**: Check firewall rules for SMTP ports
**Email authentication failed:**
- Use app-specific passwords for Gmail/Fastmail
- Verify port: 587 for STARTTLS, 465 for SSL
### Pushover Issues
- **Invalid token/user**: Verify token and user key from Pushover dashboard
- **API rate limits**: Pushover has monthly message limits on free tier
- **HTTP errors**: Check Pushover API status page
### Mattermost Issues
- **Webhook not found**: Verify webhook token and ensure webhook is enabled
- **Channel not found**: Check channel name spelling and permissions
- **Driver import error**: Install mattermostdriver: `pip install mattermostdriver`
## API Reference
### Main Functions
#### `pushmsg_for_host(hostname: str, msg: str, debug: int = 0) -> dict`
Send notification to host-specific channels.
**Parameters:**
- `hostname`: Name of the host (used to look up notification channels)
- `msg`: Message to send
- `debug`: Debug level (0=no debug, 1+=debug output)
**Returns:** Dictionary of results per channel: `{"signal_ops": True, "email_ops": False}`
**Example:**
```python
from hbd.server import notify as notify_mod
notify_mod.pushmsg_for_host("prod-web-01", "Server CPU at 95%")
```
**Behavior:**
1. Looks up notification channels configured for the host
2. If no host-specific channels, uses `default_notification_channels`
3. Dispatches to each channel in parallel
4. Returns dict of results keyed by channel name
5. Logs success/failure for each channel
## Examples
### Complete Configuration Example
```yaml
# Notification channel definitions
notification_channels:
signal_oncall:
type: signal
cli_path: /usr/local/bin/signal-cli
user: +12025551234
recipient: +12025555678
email_ops:
type: email
recipients: [ops@example.com, alerts@example.com]
sender: heartbeat@example.com
smtp_server: smtp.fastmail.com
smtp_port: 587
smtp_user: heartbeat@example.com
smtp_password: app-password-here
# Default channels
default_notification_channels: [email_ops]
# Host definitions with channel assignments
hosts:
prod-web-01:
threshold_config: high_sensitivity
watch: true
notification_channels: [signal_oncall, email_ops]
dyndns: false
dev-server-01:
threshold_config: low_sensitivity
watch: false
notification_channels: [email_ops]
dyndns: false
```
### Multiple Environments Example
```yaml
notification_channels:
# Production channels
signal_prod_oncall:
type: signal
user: +12025551234
recipient: +12025551111 # On-call phone
email_prod_ops:
type: email
recipients: [prod-ops@example.com]
sender: prod-heartbeat@example.com
smtp_server: smtp.example.com
# Staging channels
email_staging:
type: email
recipients: [staging-alerts@example.com]
sender: staging-heartbeat@example.com
smtp_server: smtp.example.com
# Development channels
mattermost_dev:
type: mattermost
host: chat.example.com
token: dev-webhook-token
channel: dev-alerts
hosts:
prod-api-01:
notification_channels: [signal_prod_oncall, email_prod_ops]
staging-api-01:
notification_channels: [email_staging]
dev-api-01:
notification_channels: [mattermost_dev]
```
**Pushover `400` errors:**
- Double-check `token` (app) and `user` (user key) — they are different values