# Notification System

## Overview

Notifications are dispatched to the **owner and managers** of a host, each via their own configured notification channels. Channel definitions are global; users reference them by name. If no users are configured, no notifications are sent.

## Architecture

```
Alert event (udp.py / threshold.py)
  └─ notify.send_notification(host_name, Notification)
       ├─ look up host.owner + host.managers
       ├─ for each user → user.notification_channels
       └─ for each channel → _dispatch_to_channel (filtered by min_level)
```
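
The fan-out above can be sketched in plain Python. This is an illustration under assumptions, not hbd's `notify.py`: the config is a plain dict shaped like the YAML in this document, `dispatch` is a hypothetical stand-in for `_dispatch_to_channel` (which would also apply `min_level` filtering), and visiting a user or channel only once when it appears twice is an assumption.

```python
from typing import Callable

# Hypothetical stand-in for _dispatch_to_channel: receives the channel name,
# its config dict and the notification; returns True on success.
Dispatcher = Callable[[str, dict, dict], bool]


def send_notification_sketch(cfg: dict, host_name: str, notif: dict,
                             dispatch: Dispatcher) -> dict[str, bool]:
    """Fan one notification out to the owner and managers of host_name."""
    host = (cfg.get("hosts") or {}).get(host_name) or {}
    users = cfg.get("users") or {}
    channels = cfg.get("notification_channels") or {}

    # Owner first, then managers; a user listed twice is visited once (assumption).
    recipients = []
    for name in [host.get("owner"), *(host.get("managers") or [])]:
        if name and name not in recipients:
            recipients.append(name)

    results: dict[str, bool] = {}
    for user_name in recipients:
        user = users.get(user_name) or {}
        for ch_name in user.get("notification_channels") or []:
            if ch_name in results or ch_name not in channels:
                continue  # already sent on this channel, or channel not defined
            results[ch_name] = dispatch(ch_name, channels[ch_name], notif)
    return results
```

With no `users:` section or no `owner`/`managers` on the host, `recipients` stays empty and the result is `{}`, matching the "no users, no notifications" rule.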

Every notification carries:

- **title**: `[LEVEL] hostname` (e.g. `[CRITICAL] webserver01`)
- **body**: detail message (metric value, threshold, duration)
- **url**: link to the plugin metrics page (`{base_url}/plugins#{hostname}`)
- **level**: one of `RECOVER`, `WARNING`, `CRITICAL`, `INFO`
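
The field conventions in the list above fit in a small dataclass. This is a sketch: `SketchNotification` and `make_notification` are hypothetical names, not the real `Notification` class from `hbd.server.notify`, and stripping a trailing slash from `base_url` is an assumption.

```python
from dataclasses import dataclass

LEVELS = ("RECOVER", "WARNING", "CRITICAL", "INFO")


@dataclass(frozen=True)
class SketchNotification:
    title: str
    body: str
    url: str
    level: str


def make_notification(base_url: str, hostname: str, level: str, body: str) -> SketchNotification:
    """Build a notification using the title and url conventions described above."""
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level!r}")
    # Avoid "https://host//plugins" when base_url ends with a slash (assumption).
    url = f"{base_url.rstrip('/')}/plugins#{hostname}"
    return SketchNotification(title=f"[{level}] {hostname}", body=body, url=url, level=level)
```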

## Configuration

### Base URL

Set `base_url` so notification links point to your hbd instance:

```yaml
base_url: https://hbd.example.com
```

### Global channel definitions

Define channels once; reference them by name from user configs:

```yaml
notification_channels:

  pushover_ops:
    type: pushover
    token: your-app-token
    user: your-user-key
    min_level: WARNING # optional, default: WARNING

  email_ops:
    type: email
    recipients: [ops@example.com]
    sender: hbd@example.com
    smtp_server: smtp.example.com
    smtp_port: 587
    smtp_user: hbd@example.com
    smtp_password: secret
    min_level: WARNING

  matrix_oncall:
    type: matrix
    homeserver: https://matrix.example.org
    access_token: syt_xxx
    room_id: "!abc:matrix.example.org"
    min_level: CRITICAL # only send critical alerts to this room

  sms_oncall:
    type: sms_voipms
    api_user: me@example.com
    api_password: secret
    did: "5551234567" # your voip.ms DID number
    dst: "5559876543" # destination number
    min_level: CRITICAL

  signal_ops:
    type: signal
    cli_path: /usr/local/bin/signal-cli
    user: "+12025551234"
    recipient: "+12025559999"

  mattermost_devops:
    type: mattermost
    host: mattermost.example.com
    token: webhook-token
    channel: devops-alerts
    username: heartbeat-bot
```
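
Because each channel is selected by its `type`, a loader can reject unknown types early. A minimal sketch, assuming the YAML has already been parsed into a dict (for example with PyYAML); `check_channel_types` is a hypothetical helper, and the set of types is the six documented under Channel Types.

```python
KNOWN_TYPES = {"pushover", "email", "matrix", "sms_voipms", "signal", "mattermost"}


def check_channel_types(channels: dict) -> list[str]:
    """Return human-readable problems found in a notification_channels mapping."""
    problems = []
    for name, ch in channels.items():
        ch_type = (ch or {}).get("type")  # a bare "name:" key parses as None
        if ch_type is None:
            problems.append(f"{name}: missing 'type'")
        elif ch_type not in KNOWN_TYPES:
            problems.append(f"{name}: unknown type {ch_type!r}")
    return problems
```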

### Users with notification channels

Each user lists which global channels they receive notifications on:

```yaml
users:
  alice:
    full_name: Alice Smith
    password: pbkdf2:sha256:...
    admin: true
    notification_channels: [pushover_ops, email_ops]

  bob:
    full_name: Bob Jones
    password: pbkdf2:sha256:...
    notification_channels: [sms_oncall, matrix_oncall]
```
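
A common misconfiguration is a user listing a channel name that is not defined under `notification_channels:`. A sketch of a cross-check, assuming the parsed config mirrors the YAML above; `undefined_channel_refs` is a hypothetical helper, not part of hbd.

```python
def undefined_channel_refs(cfg: dict) -> dict[str, list[str]]:
    """Map each user to the channel names they reference that are not defined globally."""
    defined = set(cfg.get("notification_channels") or {})
    missing = {}
    for user_name, user in (cfg.get("users") or {}).items():
        refs = (user or {}).get("notification_channels") or []
        bad = [ref for ref in refs if ref not in defined]
        if bad:
            missing[user_name] = bad
    return missing
```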

### Host access: owner and managers

Notifications for a host go to its owner and all managers:

```yaml
hosts:
  webserver01:
    owner: alice # receives all notifications for this host
    managers: [bob] # also receives notifications
    threshold_config: default
    watch: true # bold in dashboard (cosmetic only)
    dyndns: false

  dbserver01:
    owner: alice
    managers: [bob]
    threshold_config: database
    dyndns: false
```

`watch: true` only affects display (the host name is shown in bold in the live dashboard). Which users are notified is controlled entirely by `owner` and `managers`.

## Channel Types

### `min_level` filtering

Every channel accepts an optional `min_level` field:

| `min_level` | Levels delivered |
|---|---|
| `WARNING` (default) | WARNING, CRITICAL, RECOVER |
| `CRITICAL` | CRITICAL, RECOVER |

`RECOVER` is always passed through, so you never miss a recovery.
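
The filtering rule amounts to a severity comparison with a RECOVER bypass. This sketch assumes the ordering INFO < WARNING < CRITICAL; the table above does not say how INFO events are treated, so dropping INFO under the default `WARNING` threshold is an assumption. `passes_min_level` is a hypothetical name.

```python
SEVERITY = {"INFO": 0, "WARNING": 1, "CRITICAL": 2}  # assumed ordering


def passes_min_level(level: str, min_level: str = "WARNING") -> bool:
    """Return True if a notification at `level` should reach a channel with `min_level`."""
    if level == "RECOVER":
        return True  # recoveries are always delivered
    return SEVERITY[level] >= SEVERITY[min_level]
```

A dispatcher would call it as `passes_min_level(notif_level, channel_cfg.get("min_level", "WARNING"))`, so an absent key falls back to the documented default.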

### pushover

Sends push notifications via [Pushover](https://pushover.net). Includes title, body, and a clickable URL.

```yaml
type: pushover
token: your-app-token # Required: Pushover application token
user: your-user-key # Required: Recipient's user key
min_level: WARNING
```
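
For reference, a standard-library sketch of what a Pushover send involves. It is not hbd's implementation: the endpoint and form fields (`token`, `user`, `title`, `message`, `url`) come from Pushover's public message API, while `pushover_payload` and `send_pushover` are hypothetical names.

```python
import urllib.error
import urllib.parse
import urllib.request

PUSHOVER_URL = "https://api.pushover.net/1/messages.json"


def pushover_payload(ch: dict, title: str, body: str, url: str) -> dict:
    """Form fields for Pushover's message API, taken from a channel config."""
    return {
        "token": ch["token"],  # application token
        "user": ch["user"],    # recipient user key
        "title": title,
        "message": body,
        "url": url,            # shown as a clickable link
    }


def send_pushover(ch: dict, title: str, body: str, url: str, timeout: float = 10.0) -> bool:
    data = urllib.parse.urlencode(pushover_payload(ch, title, body, url)).encode()
    req = urllib.request.Request(PUSHOVER_URL, data=data)  # passing data makes it a POST
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False  # includes HTTPError, e.g. a 400 for a bad token/user pair
```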

### email

Sends via SMTP. The subject is the notification title; the body is the message, with the URL on the final line.

```yaml
type: email
recipients: [ops@example.com, oncall@example.com]
sender: hbd@example.com
smtp_server: smtp.example.com
smtp_port: 587 # 587 = STARTTLS (default), 465 = SSL
smtp_user: hbd@example.com
smtp_password: secret
min_level: WARNING
```
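
The message layout and the 587/465 port convention map onto the standard library as follows. A sketch only: `build_email` and `send_email` are hypothetical helpers, and choosing implicit TLS for port 465 and STARTTLS for everything else is an assumption that mirrors the comment in the config above.

```python
import smtplib
import ssl
from email.message import EmailMessage


def build_email(ch: dict, title: str, body: str, url: str) -> EmailMessage:
    msg = EmailMessage()
    msg["Subject"] = title
    msg["From"] = ch["sender"]
    msg["To"] = ", ".join(ch["recipients"])
    msg.set_content(f"{body}\n{url}")  # URL on the final line
    return msg


def send_email(ch: dict, msg: EmailMessage, timeout: float = 30.0) -> None:
    host, port = ch["smtp_server"], int(ch.get("smtp_port", 587))
    context = ssl.create_default_context()
    if port == 465:
        # Implicit TLS from the first byte.
        with smtplib.SMTP_SSL(host, port, context=context, timeout=timeout) as smtp:
            smtp.login(ch["smtp_user"], ch["smtp_password"])
            smtp.send_message(msg)
    else:
        # Plain connection upgraded with STARTTLS (port 587).
        with smtplib.SMTP(host, port, timeout=timeout) as smtp:
            smtp.starttls(context=context)
            smtp.login(ch["smtp_user"], ch["smtp_password"])
            smtp.send_message(msg)
```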

### matrix

Sends a formatted HTML message to a Matrix room via [matrix-nio](https://github.com/poljar/matrix-nio).

```yaml
type: matrix
homeserver: https://matrix.example.org
access_token: syt_xxx # Bot account access token
room_id: "!abc:matrix.example.org"
min_level: WARNING
```

**Setup:**

1. Create a bot Matrix account
2. Obtain its access token (Element → Settings → Help & About → Access Token)
3. Invite the bot to the target room and note the room ID
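
Once the bot is in the room, a send with matrix-nio looks roughly like this. A hedged sketch assuming `matrix-nio` is installed: the content dict follows the Matrix `m.room.message` format with `org.matrix.custom.html`, while the HTML layout and the names `matrix_content` and `send_matrix` are hypothetical rather than hbd's code.

```python
import html


def matrix_content(title: str, body: str, url: str) -> dict:
    """m.room.message content with a plain-text fallback and an HTML body."""
    t, b, u = html.escape(title), html.escape(body), html.escape(url)
    return {
        "msgtype": "m.text",
        "body": f"{title}\n{body}\n{url}",  # plain-text fallback
        "format": "org.matrix.custom.html",
        "formatted_body": f'<b>{t}</b><br>{b}<br><a href="{u}">{u}</a>',
    }


async def send_matrix(ch: dict, title: str, body: str, url: str) -> bool:
    # Imported lazily so the rest of the module works without matrix-nio.
    from nio import AsyncClient, RoomSendResponse

    client = AsyncClient(ch["homeserver"])
    client.access_token = ch["access_token"]
    try:
        resp = await client.room_send(
            room_id=ch["room_id"],
            message_type="m.room.message",
            content=matrix_content(title, body, url),
        )
        return isinstance(resp, RoomSendResponse)
    finally:
        await client.close()
```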

### sms_voipms

Sends SMS via the [voip.ms REST API](https://voip.ms/api/v1/rest.php). The message is truncated to 160 characters.

```yaml
type: sms_voipms
api_user: me@example.com # voip.ms account email
api_password: secret # voip.ms API password
did: "5551234567" # Your voip.ms DID (sending number)
dst: "5559876543" # Destination number
min_level: CRITICAL
```
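
The request behind this channel can be sketched with the standard library. This assumes voip.ms's `sendSMS` method with `api_username`, `api_password`, `did`, `dst` and `message` query parameters and a JSON reply whose `status` is `"success"`; check these against the voip.ms API documentation. `voipms_sms_url` and `send_voipms_sms` are hypothetical names.

```python
import json
import urllib.parse
import urllib.request

VOIPMS_URL = "https://voip.ms/api/v1/rest.php"
SMS_LIMIT = 160


def voipms_sms_url(ch: dict, text: str) -> str:
    params = {
        "api_username": ch["api_user"],
        "api_password": ch["api_password"],
        "method": "sendSMS",
        "did": ch["did"],
        "dst": ch["dst"],
        "message": text[:SMS_LIMIT],  # truncate to a single 160-character SMS
    }
    return f"{VOIPMS_URL}?{urllib.parse.urlencode(params)}"


def send_voipms_sms(ch: dict, text: str, timeout: float = 15.0) -> bool:
    try:
        with urllib.request.urlopen(voipms_sms_url(ch, text), timeout=timeout) as resp:
            reply = json.load(resp)
    except (OSError, ValueError):  # network/HTTP errors, or a non-JSON reply
        return False
    return reply.get("status") == "success"
```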

### signal

Sends via [signal-cli](https://github.com/AsamK/signal-cli).

```yaml
type: signal
cli_path: /usr/local/bin/signal-cli
user: "+12025551234" # Your registered Signal number (quoted so YAML keeps the leading +)
recipient: "+12025559999" # Recipient number
min_level: WARNING
```

**Setup:**

```bash
signal-cli -u +12025551234 register
signal-cli -u +12025551234 verify CODE
```
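
Sending then comes down to one `signal-cli ... send -m MESSAGE RECIPIENT` call per notification. A sketch assuming the same `-u` account flag used in the setup commands; `signal_argv` and `send_signal` are hypothetical, and falling back to `signal-cli` on `PATH` when `cli_path` is absent is an assumption. Quoting the numbers in YAML matters here: unquoted, `+12025551234` parses as an integer and `str()` drops the `+`.

```python
import subprocess


def signal_argv(ch: dict, text: str) -> list[str]:
    return [
        ch.get("cli_path", "signal-cli"),  # PATH lookup if cli_path is unset (assumption)
        "-u", str(ch["user"]),
        "send",
        "-m", text,
        str(ch["recipient"]),
    ]


def send_signal(ch: dict, text: str, timeout: float = 60.0) -> bool:
    try:
        result = subprocess.run(signal_argv(ch, text), capture_output=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired):
        return False  # binary not found (see Troubleshooting) or the call hung
    return result.returncode == 0
```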

### mattermost

Sends via a Mattermost incoming webhook. The message is formatted as Markdown.

```yaml
type: mattermost
host: mattermost.example.com
token: your-webhook-token
channel: devops-alerts
username: heartbeat-bot # Optional: display name
icon: https://…/icon.png # Optional: bot icon URL
min_level: WARNING
```
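
An incoming webhook is a JSON POST to `https://{host}/hooks/{token}` with `text`, `channel`, `username` and `icon_url` fields. The sketch below builds that request; mapping this document's `icon` key to `icon_url`, the Markdown layout of `text`, and the names `mattermost_request` and `send_mattermost` are assumptions for illustration.

```python
import json
import urllib.error
import urllib.request


def mattermost_request(ch: dict, title: str, body: str, url: str) -> urllib.request.Request:
    payload = {
        "channel": ch["channel"],
        "text": f"#### {title}\n{body}\n[Open in hbd]({url})",  # Markdown message
    }
    if "username" in ch:
        payload["username"] = ch["username"]
    if "icon" in ch:
        payload["icon_url"] = ch["icon"]
    return urllib.request.Request(
        f"https://{ch['host']}/hooks/{ch['token']}",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_mattermost(ch: dict, title: str, body: str, url: str, timeout: float = 10.0) -> bool:
    try:
        with urllib.request.urlopen(mattermost_request(ch, title, body, url), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False
```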

## Notification events

| Source | Level | Title example | Body example |
|---|---|---|---|
| Host overdue | CRITICAL | `[CRITICAL] webserver01` | `IPv4 overdue` |
| Host recover | RECOVER | `[RECOVER] webserver01` | `IPv4 back after being overdue for 5:23` |
| Host boot | INFO | `[INFO] webserver01` | `webserver01 booted` |
| Host shutdown | INFO | `[INFO] webserver01` | `IPv4 shutdown` |
| Threshold breach | WARNING/CRITICAL | `[CRITICAL] webserver01` | `cpu_percent = 95.2 (threshold: > 90.0)` |
| Threshold reminder | CRITICAL | `[REMINDER/CRITICAL] webserver01` | `REMINDER (CRITICAL): … ongoing for 3600s` |
| Connection issue | WARNING | `[WARNING] webserver01` | `new address detected …` |

Reminder notifications (re-notifications) are sent only for CRITICAL-level alerts.
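
Since reminders apply only to CRITICAL alerts, the decision reduces to a level check plus a timer. A sketch under assumptions: this document does not say how often reminders repeat, so `interval_s` is a hypothetical parameter, and the title/body strings simply mirror the example row in the table. `build_reminder` is a hypothetical helper.

```python
from typing import Optional


def build_reminder(host: str, level: str, detail: str, started_s: float,
                   last_sent_s: float, now_s: float,
                   interval_s: float) -> Optional[tuple[str, str]]:
    """Return (title, body) if a reminder is due for this alert, else None."""
    if level != "CRITICAL":
        return None  # reminders are only sent for CRITICAL alerts
    if now_s - last_sent_s < interval_s:
        return None  # not enough time since the last notification
    ongoing = int(now_s - started_s)
    return (f"[REMINDER/CRITICAL] {host}",
            f"REMINDER (CRITICAL): {detail} ongoing for {ongoing}s")
```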

## API reference

### `send_notification(host_name, notif) -> dict`

Main entry point. Dispatches to the host's owner and managers.

```python
from hbd.server.notify import send_notification, Notification

send_notification(
    "webserver01",
    Notification(
        title="[CRITICAL] webserver01",
        body="cpu_percent = 95.2 (threshold: > 90.0)",
        level="CRITICAL",
        url="https://hbd.example.com/plugins#webserver01",
    ),
)
```

Returns a `{channel_name: bool}` dict with one entry per channel dispatched.

### `setup(cfg, loop=None)`

Called once at startup from `main.py`. Pass the running asyncio event loop so Matrix sends work correctly.
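
Why the loop matters: matrix-nio is asyncio-based, but alerts may be raised from code that is not running on the event loop. A common pattern, and only an assumption about what hbd does internally, is to hand the coroutine to the stored loop with `asyncio.run_coroutine_threadsafe`. The self-contained sketch below runs a loop in a background thread to show the mechanics; `fake_matrix_send` and `submit_from_sync_code` are hypothetical stand-ins.

```python
import asyncio
import threading


async def fake_matrix_send(text: str) -> bool:
    await asyncio.sleep(0.01)  # stands in for network I/O
    return bool(text)


def submit_from_sync_code(loop: asyncio.AbstractEventLoop, text: str,
                          timeout: float = 5.0) -> bool:
    """Schedule a coroutine on `loop` from another thread and wait for its result."""
    future = asyncio.run_coroutine_threadsafe(fake_matrix_send(text), loop)
    return future.result(timeout=timeout)  # a concurrent.futures.Future


if __name__ == "__main__":
    loop = asyncio.new_event_loop()
    worker = threading.Thread(target=loop.run_forever, daemon=True)
    worker.start()
    try:
        print(submit_from_sync_code(loop, "[CRITICAL] webserver01"))
    finally:
        loop.call_soon_threadsafe(loop.stop)
        worker.join()
        loop.close()
```

Without a reference to a running loop, sync code has nowhere to schedule the coroutine, which is one plausible reason Matrix sends would fail if `loop` is left as `None`.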

## Troubleshooting

**No notifications sent:**
- Check that users are configured (`users:` section in the YAML config)
- Check that the host has an `owner` or `managers` set
- Check that users have `notification_channels` listed
- Check that the channel names in user config match keys under `notification_channels:`

**min_level filtering too aggressive:**
- The default is `WARNING`: WARNING, CRITICAL, and RECOVER are all sent
- If you expected warnings on a channel set to `min_level: CRITICAL`, change it to `WARNING` or remove the key to use the default

**Matrix sends time out:**
- Verify the access token is valid and the bot is in the room
- `matrix-nio` must be installed: `pip install matrix-nio`

**voip.ms SMS fails:**
- Enable the API in your voip.ms account (Account → API)
- Verify the DID is SMS-capable in your voip.ms account

**Signal not found:**
- Specify the full `cli_path`
- Run `signal-cli -u +NUMBER receive` to sync the trust store

**Email authentication failed:**
- Use app-specific passwords for Gmail/Fastmail
- Verify the port: 587 for STARTTLS, 465 for SSL

**Pushover `400` errors:**
- Double-check `token` (app token) and `user` (user key); they are different values