Files
heartbeat/docs/NOTIFICATIONS.md
T
2026-04-01 19:41:53 -04:00

534 lines
15 KiB
Markdown

# Notification System
## Overview
The Heartbeat Monitoring System includes a flexible notification system that can send alerts through multiple channels including Email, Pushover, Signal, and Mattermost. The system supports centralized channel definitions with per-host routing, allowing fine-grained control over notification delivery.
## Architecture
### Components
1. **Notification Channels** (`notification_channels` in config)
- Centralized definitions of notification providers
- Each channel has a type and type-specific credentials
- Reusable across multiple hosts
2. **Channel Dispatcher** (`hbd/server/notify.py`)
- `pushmsg_for_host(hostname, message)`: Main entry point for host-specific notifications
- `_dispatch_to_channel(channel_name, channel_config, message)`: Routes to specific provider
- Provider functions: `pushover()`, `pushsignal()`, `pushmattermost()`, `send_email()`
3. **Configuration Utilities** (`hbd/server/config.py`)
- `get_notification_channels_for_host(config, hostname)`: Retrieves channel names for a host
- `get_notification_channels_config(config, hostname)`: Retrieves full channel configurations
- `get_channel_config(config, channel_name)`: Gets configuration for a specific channel
4. **Integration Points**
- **Threshold alerts**: `threshold.py` calls `notify_mod.pushmsg_for_host()`
- **Heartbeat events**: `udp.py` calls `notify_mod.pushmsg_for_host()` for boot/shutdown/overdue
- **Custom alerts**: Any code can call `notify_mod.pushmsg_for_host(hostname, message)`
## Configuration
### Centralized Channel Definitions
Define notification channels once in your configuration file:
```yaml
notification_channels:
# Signal notifications
signal_ops:
type: signal
cli_path: /usr/local/bin/signal-cli
user: +1234567890 # Your Signal number
recipient: +1234567890 # Recipient number
signal_oncall:
type: signal
cli_path: /usr/local/bin/signal-cli
user: +1234567890
recipient: +0987654321 # Different recipient
# Email notifications
email_ops:
type: email
recipients:
- ops@example.com
- alerts@example.com
sender: heartbeat@example.com
smtp_server: smtp.example.com
smtp_port: 587
smtp_user: heartbeat@example.com
smtp_password: your-smtp-password
email_devteam:
type: email
recipients: [dev-alerts@example.com]
sender: heartbeat-dev@example.com
smtp_server: smtp.example.com
smtp_port: 587
smtp_user: heartbeat-dev@example.com
smtp_password: your-smtp-password
# Pushover notifications
pushover_urgent:
type: pushover
token: your-pushover-app-token
user: your-pushover-user-key
pushover_normal:
type: pushover
token: your-pushover-app-token
user: another-user-key
# Mattermost notifications
mattermost_devops:
type: mattermost
host: mattermost.example.com
token: your-webhook-token
channel: devops-alerts
username: heartbeat-bot
icon: https://example.com/heartbeat-icon.png
```
### Default Notification Channels
Specify default channels for hosts that don't have specific channel assignments:
```yaml
default_notification_channels:
- email_ops
- mattermost_devops
```
Hosts without `notification_channels` defined will use these defaults.
### Per-Host Channel Assignment
Assign specific channels to each host in the `hosts` section:
```yaml
hosts:
# Critical production web server - multiple channels for redundancy
prod-web-01:
threshold_config: high_sensitivity
watch: true
notification_channels:
- signal_oncall # Immediate mobile notification
- pushover_urgent # Secondary mobile notification
- email_ops # Email for record keeping
dyndns: false
# Database server - ops team notifications only
prod-db-01:
threshold_config: database
watch: true
notification_channels:
- signal_ops
- email_ops
dyndns: false
# Development server - email only, no urgent notifications
dev-server-01:
threshold_config: low_sensitivity
watch: false
notification_channels:
- email_devteam
dyndns: false
# Test server - uses default_notification_channels
test-server-01:
threshold_config: default
watch: false
dyndns: false
# No notification_channels specified = uses default_notification_channels
```
## Channel Types
### Email
Sends notifications via SMTP.
**Configuration fields:**
```yaml
type: email
recipients: [email1@example.com, email2@example.com] # Required: List of recipients
sender: heartbeat@example.com # Required: From address
smtp_server: smtp.example.com # Required: SMTP server hostname
smtp_port: 587 # Optional: Default 587
smtp_user: heartbeat@example.com # Optional: For authenticated SMTP
smtp_password: your-password # Optional: For authenticated SMTP
```
**Features:**
- Supports multiple recipients
- TLS/STARTTLS support on port 587
- Authenticated and unauthenticated SMTP
**Example:**
```yaml
notification_channels:
email_critical:
type: email
recipients: [admin@example.com, oncall@example.com]
sender: alerts@example.com
smtp_server: smtp.fastmail.com
smtp_port: 587
smtp_user: alerts@example.com
smtp_password: app-specific-password
```
### Pushover
Sends push notifications to mobile devices via Pushover API.
**Configuration fields:**
```yaml
type: pushover
token: your-application-token # Required: Your Pushover app token
user: your-user-key # Required: Recipient's user key
```
**Features:**
- Instant mobile push notifications
- Works on iOS and Android
- Supports delivery confirmations
**Setup:**
1. Create a Pushover account at https://pushover.net
2. Create an application to get your app token
3. Note your user key from your account dashboard
**Example:**
```yaml
notification_channels:
pushover_admin:
type: pushover
token: azGDORePK8gMaC0QOYAMyEEuzJnyUi
user: uQiRzpo4DXghDmr9QzzfQu27cmVRsG
```
### Signal
Sends notifications via Signal messenger using signal-cli.
**Configuration fields:**
```yaml
type: signal
cli_path: /usr/local/bin/signal-cli # Optional: Path to signal-cli binary
user: +1234567890 # Required: Your Signal phone number
recipient: +0987654321 # Required: Recipient phone number
```
**Prerequisites:**
1. Install signal-cli: https://github.com/AsamK/signal-cli
2. Register signal-cli with your phone number:
```bash
signal-cli -u +1234567890 register
signal-cli -u +1234567890 verify CODE
```
3. Ensure signal-cli is in PATH or specify full path in config
**Features:**
- End-to-end encrypted messaging
- Works without phone being online
- No API fees or rate limits
**Example:**
```yaml
notification_channels:
signal_admin:
type: signal
cli_path: /usr/local/bin/signal-cli
user: +12025551234
recipient: +12025559999
```
### Mattermost
Sends notifications to Mattermost team chat via incoming webhooks.
**Configuration fields:**
```yaml
type: mattermost
host: mattermost.example.com # Required: Mattermost server hostname
token: your-webhook-token # Required: Incoming webhook token
channel: channel-name # Required: Target channel name
username: heartbeat-bot # Optional: Bot display name
icon: https://example.com/icon.png # Optional: Bot icon URL
```
**Prerequisites:**
1. Enable incoming webhooks in Mattermost
2. Create an incoming webhook for your team
3. Note the webhook token from the webhook URL
**Features:**
- Team-wide visibility
- Rich formatting support
- Message threading
**Example:**
```yaml
notification_channels:
mattermost_ops:
type: mattermost
host: chat.example.com
token: abc123def456ghi789
channel: infrastructure-alerts
username: heartbeat-monitor
icon: https://example.com/heartbeat-icon.png
```
## Notification Events
The system sends notifications for various events:
### Threshold Alerts
When monitored metrics exceed configured thresholds:
- **State changes**: OK → WARNING, WARNING → CRITICAL, CRITICAL → OK
- **Format**: `{LEVEL}: {hostname} - {metric_path} = {value} {threshold_info}`
- **Example**: `CRITICAL: prod-web-01 - cpu_monitor.cpu_percent = 95.2 (threshold: > 90.0)`
- **Re-notifications**: Periodic reminders for ongoing alerts (default: hourly)
### Heartbeat Events
Host lifecycle events:
- **Host boot**: `{hostname} booted`
- **Host shutdown**: `{hostname} {connection_type} shutdown`
- **Host recovery**: `{hostname} {connection_type} is back`
- **Connection issues**: `{hostname} {message}`
- **Host overdue**: `{hostname} {connection_type} overdue`
Only hosts with `watch: true` send heartbeat event notifications.
### Custom Alerts
Application code can send custom notifications:
```python
from hbd.server import notify as notify_mod
# Send to host-specific channels
notify_mod.pushmsg_for_host("prod-web-01", "Custom alert message")
# Send using global config
notify_mod.pushmsg_from_config("Global notification")
# Send to specific config
notify_mod.pushmsg(custom_config_dict, "Targeted notification")
```
## Design Principles
The notification system follows these core principles:
- **Centralization**: Define notification providers once, reference them by name
- **Flexibility**: Each host can use different channels for different notification needs
- **Redundancy**: Critical hosts can specify multiple channels for failover
- **Clarity**: Clean separation between channel definition and channel assignment
- **Type Safety**: Provider-specific validation at configuration time
## Best Practices
### Channel Organization
- **Create purpose-specific channels**: `email_ops`, `signal_oncall`, `pushover_urgent`
- **Separate by team/role**: `email_devteam`, `signal_dbateam`, `mattermost_security`
- **Use descriptive names**: Channel names appear in logs and debugging
### Redundancy
For critical hosts, use multiple notification channels:
```yaml
hosts:
critical-db:
notification_channels:
- signal_oncall # Primary: Mobile alert
- pushover_urgent # Backup: Different mobile platform
- email_ops # Tertiary: Email for record-keeping
```
### Notification Fatigue Prevention
- **Use `watch: false`** for non-critical hosts
- **Configure appropriate thresholds** to avoid false positives
- **Set different channels for different severities**
- **Use `default_notification_channels`** for baseline, add more for critical systems
### Security
- **Protect credentials**: Use file permissions to protect config files with passwords/tokens
- **Rotate tokens**: Periodically rotate API tokens and passwords
- **Use app-specific passwords**: For email, use app-specific passwords instead of main account password
- **Separate accounts**: Consider separate notification accounts for different environments (prod vs dev)
### Testing
Test notification channels before relying on them:
```bash
# Test signal-cli directly
signal-cli -u +1234567890 send -m "Test message" +0987654321
# Test SMTP
echo "Test" | mail -s "Test Subject" admin@example.com
# Test through heartbeat system (Python REPL)
from hbd.server import notify as notify_mod, config as config_mod
cfg = config_mod.load_config(".hb.yaml")
notify_mod.setup(cfg)
notify_mod.pushmsg_for_host("test-host", "Test notification")
```
## Troubleshooting
### Notifications Not Sending
1. **Check logs**: Look for "Failed to send notification" errors
2. **Verify host is watched**: Ensure `watch: true` in host definition
3. **Check channel configuration**: Verify credentials and settings
4. **Test channel directly**: Use command-line tools to test provider
5. **Check network**: Ensure server can reach notification endpoints
### Signal Issues
- **signal-cli not found**: Specify full path in `cli_path`
- **Not registered**: Run `signal-cli -u +NUMBER register` and verify
- **Trust issues**: Run `signal-cli -u +NUMBER receive` to sync trust store
- **Recipient not found**: Ensure recipient is in your Signal contacts
### Email Issues
- **Authentication failed**: Check SMTP username/password
- **TLS errors**: Verify SMTP port (587 for STARTTLS, 465 for SSL)
- **Relay denied**: Ensure SMTP server allows relay from your IP
- **Timeout**: Check firewall rules for SMTP ports
### Pushover Issues
- **Invalid token/user**: Verify token and user key from Pushover dashboard
- **API rate limits**: Pushover has monthly message limits on free tier
- **HTTP errors**: Check Pushover API status page
### Mattermost Issues
- **Webhook not found**: Verify webhook token and ensure webhook is enabled
- **Channel not found**: Check channel name spelling and permissions
- **Driver import error**: Install mattermostdriver: `pip install mattermostdriver`
## API Reference
### Main Functions
#### `pushmsg_for_host(hostname: str, msg: str, debug: int = 0) -> dict`
Send notification to host-specific channels.
**Parameters:**
- `hostname`: Name of the host (used to look up notification channels)
- `msg`: Message to send
- `debug`: Debug level (0=no debug, 1+=debug output)
**Returns:** Dictionary of results per channel: `{"signal_ops": True, "email_ops": False}`
**Example:**
```python
from hbd.server import notify as notify_mod
notify_mod.pushmsg_for_host("prod-web-01", "Server CPU at 95%")
```
**Behavior:**
1. Looks up notification channels configured for the host
2. If no host-specific channels, uses `default_notification_channels`
3. Dispatches to each channel in parallel
4. Returns dict of results keyed by channel name
5. Logs success/failure for each channel
## Examples
### Complete Configuration Example
```yaml
# Notification channel definitions
notification_channels:
signal_oncall:
type: signal
cli_path: /usr/local/bin/signal-cli
user: +12025551234
recipient: +12025555678
email_ops:
type: email
recipients: [ops@example.com, alerts@example.com]
sender: heartbeat@example.com
smtp_server: smtp.fastmail.com
smtp_port: 587
smtp_user: heartbeat@example.com
smtp_password: app-password-here
# Default channels
default_notification_channels: [email_ops]
# Host definitions with channel assignments
hosts:
prod-web-01:
threshold_config: high_sensitivity
watch: true
notification_channels: [signal_oncall, email_ops]
dyndns: false
dev-server-01:
threshold_config: low_sensitivity
watch: false
notification_channels: [email_ops]
dyndns: false
```
### Multiple Environments Example
```yaml
notification_channels:
# Production channels
signal_prod_oncall:
type: signal
user: +12025551234
recipient: +12025551111 # On-call phone
email_prod_ops:
type: email
recipients: [prod-ops@example.com]
sender: prod-heartbeat@example.com
smtp_server: smtp.example.com
# Staging channels
email_staging:
type: email
recipients: [staging-alerts@example.com]
sender: staging-heartbeat@example.com
smtp_server: smtp.example.com
# Development channels
mattermost_dev:
type: mattermost
host: chat.example.com
token: dev-webhook-token
channel: dev-alerts
hosts:
prod-api-01:
notification_channels: [signal_prod_oncall, email_prod_ops]
staging-api-01:
notification_channels: [email_staging]
dev-api-01:
notification_channels: [mattermost_dev]
```