# Notification System

## Overview

The Heartbeat Monitoring System includes a flexible notification system that can send alerts through multiple channels, including Email, Pushover, Signal, and Mattermost. The system supports centralized channel definitions with per-host routing, allowing fine-grained control over notification delivery.

## Architecture

### Components

1. **Notification Channels** (`notification_channels` in config)
   - Centralized definitions of notification providers
   - Each channel has a type and type-specific credentials
   - Reusable across multiple hosts

2. **Channel Dispatcher** (`hbd/server/notify.py`)
   - `pushmsg_for_host(hostname, message)`: Main entry point for host-specific notifications
   - `_dispatch_to_channel(channel_name, channel_config, message)`: Routes a message to the matching provider
   - Provider functions: `pushover()`, `pushsignal()`, `pushmattermost()`, `send_email()`

3. **Configuration Utilities** (`hbd/server/config.py`)
   - `get_notification_channels_for_host(config, hostname)`: Retrieves channel names for a host
   - `get_notification_channels_config(config, hostname)`: Retrieves full channel configurations for a host
   - `get_channel_config(config, channel_name)`: Gets the configuration for a specific channel

4. **Integration Points**
   - **Threshold alerts**: `threshold.py` calls `notify_mod.pushmsg_for_host()`
   - **Heartbeat events**: `udp.py` calls `notify_mod.pushmsg_for_host()` for boot/shutdown/overdue events
   - **Custom alerts**: Any code can call `notify_mod.pushmsg_for_host(hostname, message)`
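
To make the dispatcher's role concrete, here is a minimal sketch of type-based routing, assuming each channel definition carries a `type` field as described under Configuration. `PROVIDERS`, `dispatch_to_channel`, and the recording stub providers are hypothetical stand-ins for `_dispatch_to_channel()` and the real provider functions in `hbd/server/notify.py`; the stubs record calls instead of sending anything.

```python
import logging
from typing import Callable, Dict, List, Tuple

log = logging.getLogger("notify-sketch")

# Provider signature assumed by this sketch: (channel_config, message) -> success flag.
Provider = Callable[[dict, str], bool]

# Stand-in delivery log so the sketch runs without network access.
SENT: List[Tuple[str, str]] = []


def _recording_provider(kind: str) -> Provider:
    """Build a stub provider that records the call instead of sending."""
    def send(channel_config: dict, message: str) -> bool:
        SENT.append((kind, message))
        return True
    return send


# Hypothetical routing table keyed by each channel's `type` field.
PROVIDERS: Dict[str, Provider] = {
    kind: _recording_provider(kind)
    for kind in ("email", "pushover", "signal", "mattermost")
}


def dispatch_to_channel(channel_name: str, channel_config: dict, message: str) -> bool:
    """Route one message to the provider selected by channel_config['type']."""
    channel_type = channel_config.get("type")
    provider = PROVIDERS.get(channel_type)
    if provider is None:
        log.error("%s: unknown channel type %r", channel_name, channel_type)
        return False
    try:
        return bool(provider(channel_config, message))
    except Exception:  # a broken provider must not stop the other channels
        log.exception("%s: Failed to send notification", channel_name)
        return False
```

Catching exceptions per channel is the design choice worth keeping: one misconfigured provider returns `False` instead of preventing delivery to the host's other channels.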

## Configuration

### Centralized Channel Definitions

Define notification channels once in your configuration file:

```yaml
notification_channels:
  # Signal notifications
  signal_ops:
    type: signal
    cli_path: /usr/local/bin/signal-cli
    user: "+1234567890"       # Your Signal number (quoted so YAML keeps the "+")
    recipient: "+1234567890"  # Recipient number

  signal_oncall:
    type: signal
    cli_path: /usr/local/bin/signal-cli
    user: "+1234567890"
    recipient: "+0987654321"  # Different recipient

  # Email notifications
  email_ops:
    type: email
    recipients:
      - ops@example.com
      - alerts@example.com
    sender: heartbeat@example.com
    smtp_server: smtp.example.com
    smtp_port: 587
    smtp_user: heartbeat@example.com
    smtp_password: your-smtp-password

  email_devteam:
    type: email
    recipients: [dev-alerts@example.com]
    sender: heartbeat-dev@example.com
    smtp_server: smtp.example.com
    smtp_port: 587
    smtp_user: heartbeat-dev@example.com
    smtp_password: your-smtp-password

  # Pushover notifications
  pushover_urgent:
    type: pushover
    token: your-pushover-app-token
    user: your-pushover-user-key

  pushover_normal:
    type: pushover
    token: your-pushover-app-token
    user: another-user-key

  # Mattermost notifications
  mattermost_devops:
    type: mattermost
    host: mattermost.example.com
    token: your-webhook-token
    channel: devops-alerts
    username: heartbeat-bot
    icon: https://example.com/heartbeat-icon.png
```

### Default Notification Channels

Specify default channels for hosts that don't have specific channel assignments:

```yaml
default_notification_channels:
  - email_ops
  - mattermost_devops
```

Hosts without `notification_channels` defined will use these defaults.

### Per-Host Channel Assignment

Assign specific channels to each host in the `hosts` section:

```yaml
hosts:
  # Critical production web server - multiple channels for redundancy
  prod-web-01:
    threshold_config: high_sensitivity
    watch: true
    notification_channels:
      - signal_oncall    # Immediate mobile notification
      - pushover_urgent  # Secondary mobile notification
      - email_ops        # Email for record keeping
    dyndns: false

  # Database server - ops team notifications only
  prod-db-01:
    threshold_config: database
    watch: true
    notification_channels:
      - signal_ops
      - email_ops
    dyndns: false

  # Development server - email only, no urgent notifications
  dev-server-01:
    threshold_config: low_sensitivity
    watch: false
    notification_channels:
      - email_devteam
    dyndns: false

  # Test server - uses default_notification_channels
  test-server-01:
    threshold_config: default
    watch: false
    dyndns: false
    # No notification_channels specified = uses default_notification_channels
```
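
The lookup order above (host-specific list first, then `default_notification_channels`) can be sketched as follows. `resolve_channel_names` and `resolve_channel_configs` are hypothetical helpers that loosely mirror `get_notification_channels_for_host()` and `get_notification_channels_config()`; treating an empty list like a missing key, and skipping undefined channel names, are assumptions rather than documented behavior.

```python
from typing import Dict, List


def resolve_channel_names(config: dict, hostname: str) -> List[str]:
    """Host-specific channels if set, otherwise default_notification_channels."""
    host_cfg = (config.get("hosts") or {}).get(hostname) or {}
    channels = host_cfg.get("notification_channels")
    if channels:  # assumption: an empty list behaves like a missing key
        return list(channels)
    return list(config.get("default_notification_channels") or [])


def resolve_channel_configs(config: dict, hostname: str) -> Dict[str, dict]:
    """Map each resolved channel name to its definition in notification_channels."""
    defined = config.get("notification_channels") or {}
    # Assumption: names without a definition are skipped here; a real
    # implementation might log them or reject the config instead.
    return {
        name: defined[name]
        for name in resolve_channel_names(config, hostname)
        if name in defined
    }
```

With the example configuration, `test-server-01` has no `notification_channels` key, so it resolves to the default channels.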

## Channel Types

### Email

Sends notifications via SMTP.

**Configuration fields:**

```yaml
type: email
recipients: [email1@example.com, email2@example.com]  # Required: list of recipients
sender: heartbeat@example.com                        # Required: From address
smtp_server: smtp.example.com                        # Required: SMTP server hostname
smtp_port: 587                                       # Optional: defaults to 587
smtp_user: heartbeat@example.com                     # Optional: for authenticated SMTP
smtp_password: your-password                         # Optional: for authenticated SMTP
```

**Features:**

- Supports multiple recipients
- TLS/STARTTLS support on port 587
- Authenticated and unauthenticated SMTP

**Example:**

```yaml
notification_channels:
  email_critical:
    type: email
    recipients: [admin@example.com, oncall@example.com]
    sender: alerts@example.com
    smtp_server: smtp.fastmail.com
    smtp_port: 587
    smtp_user: alerts@example.com
    smtp_password: app-specific-password
```
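
For reference, sending one notification with the fields above could look like the following stdlib `smtplib` sketch. It is not the project's `send_email()`: the subject line (first line of the message), the 10-second timeout, and using implicit TLS when `smtp_port` is 465 are assumptions. Port 587 upgrades the connection with STARTTLS before logging in; other ports are used without TLS.

```python
import smtplib
import ssl
from email.message import EmailMessage


def build_email(channel: dict, message: str) -> EmailMessage:
    """Turn a notification into an email using the channel's sender/recipients."""
    msg = EmailMessage()
    msg["From"] = channel["sender"]
    msg["To"] = ", ".join(channel["recipients"])
    # Assumption: use the first line of the alert (max 78 chars) as the subject.
    first_line = message.splitlines()[0] if message else "Heartbeat notification"
    msg["Subject"] = first_line[:78]
    msg.set_content(message)
    return msg


def send_email(channel: dict, message: str, timeout: float = 10.0) -> None:
    """Deliver one notification; raises smtplib/OSError exceptions on failure."""
    msg = build_email(channel, message)
    host = channel["smtp_server"]
    port = int(channel.get("smtp_port", 587))
    context = ssl.create_default_context()
    if port == 465:
        # Implicit TLS from the first byte (assumption: 465 means SMTPS).
        server = smtplib.SMTP_SSL(host, port, timeout=timeout, context=context)
    else:
        server = smtplib.SMTP(host, port, timeout=timeout)
    with server:
        if port == 587:
            server.starttls(context=context)  # upgrade before sending credentials
        # Other ports (e.g. a local relay on 25) are used as-is, without TLS.
        if channel.get("smtp_user"):
            server.login(channel["smtp_user"], channel["smtp_password"])
        server.send_message(msg)
```

Splitting message construction (`build_email`) from delivery keeps the formatting testable without an SMTP server.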

### Pushover

Sends push notifications to mobile devices via the Pushover API.

**Configuration fields:**

```yaml
type: pushover
token: your-application-token  # Required: your Pushover app token
user: your-user-key            # Required: recipient's user key
```

**Features:**

- Instant mobile push notifications
- Works on iOS and Android
- Supports delivery confirmations

**Setup:**

1. Create a Pushover account at https://pushover.net
2. Create an application to get your app token
3. Note your user key from your account dashboard

**Example:**

```yaml
notification_channels:
  pushover_admin:
    type: pushover
    token: azGDORePK8gMaC0QOYAMyEEuzJnyUi
    user: uQiRzpo4DXghDmr9QzzfQu27cmVRsG
```
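
Under the hood, a Pushover notification is a single HTTPS POST to the Messages API endpoint `https://api.pushover.net/1/messages.json` with `token`, `user`, and `message` form fields. The sketch below uses only the standard library; `build_pushover_request` and `pushover_send` are hypothetical names, not the project's `pushover()` function, and the timeout is an arbitrary choice.

```python
import json
import urllib.parse
import urllib.request

PUSHOVER_URL = "https://api.pushover.net/1/messages.json"


def build_pushover_request(channel: dict, message: str) -> urllib.request.Request:
    """Form-encode the three required Pushover fields into a POST request."""
    form = {"token": channel["token"], "user": channel["user"], "message": message}
    data = urllib.parse.urlencode(form).encode("utf-8")
    return urllib.request.Request(PUSHOVER_URL, data=data, method="POST")


def pushover_send(channel: dict, message: str, timeout: float = 10.0) -> bool:
    """Send one message; Pushover reports success as {"status": 1, ...}."""
    request = build_pushover_request(channel, message)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        body = json.load(resp)
    return body.get("status") == 1
```

For an invalid token or user key Pushover answers with an HTTP 4xx status, which `urlopen` raises as `urllib.error.HTTPError`; the caller's logging is where "Invalid token/user" problems surface.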

### Signal

Sends notifications via Signal messenger using signal-cli.

**Configuration fields:**

```yaml
type: signal
cli_path: /usr/local/bin/signal-cli  # Optional: path to the signal-cli binary
user: "+1234567890"                  # Required: your Signal phone number
recipient: "+0987654321"             # Required: recipient phone number
```

Quote phone numbers. Unquoted, YAML loaders such as PyYAML read `+1234567890` as the integer `1234567890`, dropping the leading `+`.

**Prerequisites:**

1. Install signal-cli: https://github.com/AsamK/signal-cli
2. Register signal-cli with your phone number (replace `CODE` with the verification code you receive):
   ```bash
   signal-cli -u +1234567890 register
   signal-cli -u +1234567890 verify CODE
   ```
3. Ensure signal-cli is in `PATH` or specify its full path in `cli_path`

**Features:**

- End-to-end encrypted messaging
- Works without a phone being online (signal-cli is itself a Signal client)
- No per-message API fees (Signal's servers may still rate-limit heavy senders)

**Example:**

```yaml
notification_channels:
  signal_admin:
    type: signal
    cli_path: /usr/local/bin/signal-cli
    user: "+12025551234"
    recipient: "+12025559999"
```
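
Sending through signal-cli amounts to running the same `send` command used in the Testing section. This sketch builds that argument list from a channel definition and runs it with `subprocess.run`; `build_signal_command` and `signal_send` are hypothetical, the 60-second timeout is an assumption, and the `+` check catches numbers that a YAML loader turned into integers because they were written unquoted.

```python
import shutil
import subprocess
from typing import List


def build_signal_command(channel: dict, message: str) -> List[str]:
    """Build the argv for `signal-cli -u USER send -m MESSAGE RECIPIENT`."""
    user, recipient = channel["user"], channel["recipient"]
    for value in (user, recipient):
        # An unquoted +1234567890 in YAML loads as an int and loses its "+".
        if not isinstance(value, str) or not value.startswith("+"):
            raise ValueError(f"phone numbers must be quoted strings like '+1234567890', got {value!r}")
    # Fall back to a PATH lookup when cli_path is not configured.
    cli = channel.get("cli_path") or shutil.which("signal-cli") or "signal-cli"
    return [cli, "-u", user, "send", "-m", message, recipient]


def signal_send(channel: dict, message: str, timeout: float = 60.0) -> bool:
    """Run signal-cli without a shell; True if it exited successfully."""
    result = subprocess.run(
        build_signal_command(channel, message),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.returncode == 0
```

Passing arguments as a list (no shell) means the alert text is never interpreted by a shell, so quotes or `$` characters in a message are harmless.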

### Mattermost

Sends notifications to Mattermost team chat via incoming webhooks.

**Configuration fields:**

```yaml
type: mattermost
host: mattermost.example.com        # Required: Mattermost server hostname
token: your-webhook-token           # Required: incoming webhook token
channel: channel-name               # Required: target channel name
username: heartbeat-bot             # Optional: bot display name
icon: https://example.com/icon.png  # Optional: bot icon URL
```

**Prerequisites:**

1. Enable incoming webhooks in Mattermost
2. Create an incoming webhook for your team
3. Note the webhook token: it is the last path segment of the webhook URL (`https://<host>/hooks/<token>`)

**Features:**

- Team-wide visibility
- Rich formatting support
- Message threading

**Example:**

```yaml
notification_channels:
  mattermost_ops:
    type: mattermost
    host: chat.example.com
    token: abc123def456ghi789
    channel: infrastructure-alerts
    username: heartbeat-monitor
    icon: https://example.com/heartbeat-icon.png
```
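
Mattermost incoming webhooks accept a JSON POST to `https://<host>/hooks/<token>` with a `text` field and optional `channel`, `username`, and `icon_url` overrides. The project appears to use the `mattermostdriver` package (see Troubleshooting), so this stdlib sketch is an alternative illustration rather than its implementation; the function names are hypothetical. Mattermost only honors the `username` and `icon_url` overrides when the server allows integrations to override them.

```python
import json
import urllib.request


def build_mattermost_request(channel: dict, message: str) -> urllib.request.Request:
    """POST a JSON payload to the incoming webhook at https://<host>/hooks/<token>."""
    payload = {"text": message, "channel": channel["channel"]}
    if channel.get("username"):
        payload["username"] = channel["username"]
    if channel.get("icon"):
        payload["icon_url"] = channel["icon"]  # config key `icon` -> webhook field `icon_url`
    url = f"https://{channel['host']}/hooks/{channel['token']}"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def mattermost_send(channel: dict, message: str, timeout: float = 10.0) -> bool:
    """Send one message; urlopen raises HTTPError for 4xx/5xx responses."""
    request = build_mattermost_request(channel, message)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return 200 <= resp.status < 300
```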

## Notification Events

The system sends notifications for various events:

### Threshold Alerts

When monitored metrics exceed configured thresholds:

- **State changes**: OK → WARNING, WARNING → CRITICAL, CRITICAL → OK
- **Format**: `{LEVEL}: {hostname} - {metric_path} = {value} {threshold_info}`
- **Example**: `CRITICAL: prod-web-01 - cpu_monitor.cpu_percent = 95.2 (threshold: > 90.0)`
- **Re-notifications**: Periodic reminders for ongoing alerts (default: hourly)
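
The message format and the notify-or-not decision above can be sketched as two small functions. `format_threshold_alert` reproduces the documented format; `should_notify` is a hypothetical reading of the rules: notify on every state change, and while a WARNING or CRITICAL state persists, re-notify once the interval (3600 seconds, matching the hourly default) has elapsed. How `threshold.py` actually tracks state and timestamps is not shown here.

```python
from typing import Optional

RENOTIFY_INTERVAL = 3600.0  # seconds; matches the hourly default


def format_threshold_alert(level: str, hostname: str, metric_path: str,
                           value: float, threshold_info: str) -> str:
    """Render `{LEVEL}: {hostname} - {metric_path} = {value} {threshold_info}`."""
    return f"{level}: {hostname} - {metric_path} = {value} {threshold_info}"


def should_notify(prev_level: str, new_level: str, last_sent: Optional[float],
                  now: float, interval: float = RENOTIFY_INTERVAL) -> bool:
    """Notify on every state change; re-notify while an alert stays active."""
    if new_level != prev_level:
        return True  # e.g. OK -> WARNING, WARNING -> CRITICAL, CRITICAL -> OK
    if new_level == "OK":
        return False  # steady OK: nothing to report
    return last_sent is None or now - last_sent >= interval
```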

### Heartbeat Events

Host lifecycle events:

- **Host boot**: `{hostname} booted`
- **Host shutdown**: `{hostname} {connection_type} shutdown`
- **Host recovery**: `{hostname} {connection_type} is back`
- **Connection issues**: `{hostname} {message}`
- **Host overdue**: `{hostname} {connection_type} overdue`

Only hosts with `watch: true` send heartbeat event notifications.
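
The heartbeat templates and the `watch` gate fit in a small lookup table. The event keys (`boot`, `shutdown`, `recovery`, `connection`, `overdue`) and the `heartbeat_message` helper are hypothetical labels for the events listed above, and treating a missing `watch` key as `false` is an assumption.

```python
from typing import Optional

# Hypothetical event keys mapped to the documented message templates.
HEARTBEAT_TEMPLATES = {
    "boot": "{hostname} booted",
    "shutdown": "{hostname} {connection_type} shutdown",
    "recovery": "{hostname} {connection_type} is back",
    "connection": "{hostname} {message}",
    "overdue": "{hostname} {connection_type} overdue",
}


def heartbeat_message(host_cfg: dict, event: str, hostname: str,
                      connection_type: str = "", message: str = "") -> Optional[str]:
    """Return the text to send, or None when the host is not watched."""
    if not host_cfg.get("watch", False):  # assumption: missing `watch` means false
        return None
    return HEARTBEAT_TEMPLATES[event].format(
        hostname=hostname, connection_type=connection_type, message=message
    )
```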

### Custom Alerts

Application code can send custom notifications:

```python
from hbd.server import notify as notify_mod

# Send to host-specific channels
notify_mod.pushmsg_for_host("prod-web-01", "Custom alert message")

# Send using the global config
notify_mod.pushmsg_from_config("Global notification")

# Send using an explicit config; custom_config_dict is a placeholder
# for a configuration dict you supply
notify_mod.pushmsg(custom_config_dict, "Targeted notification")
```

## Design Principles

The notification system follows these core principles:

- **Centralization**: Define notification providers once, reference them by name
- **Flexibility**: Each host can use different channels for different notification needs
- **Redundancy**: Critical hosts can list multiple channels; every listed channel receives the alert, so a single failing provider does not silence it
- **Clarity**: Clean separation between channel definition and channel assignment
- **Type Safety**: Provider-specific validation at configuration time
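
As an illustration of configuration-time validation, the sketch below checks each channel against the required fields listed under Channel Types and flags host or default assignments that name an undefined channel. `REQUIRED_FIELDS` and `validate_notification_config` are hypothetical; the project's own validation may check more or less.

```python
from typing import Dict, List, Tuple

# Required fields per channel type, taken from the Channel Types reference.
REQUIRED_FIELDS: Dict[str, Tuple[str, ...]] = {
    "email": ("recipients", "sender", "smtp_server"),
    "pushover": ("token", "user"),
    "signal": ("user", "recipient"),
    "mattermost": ("host", "token", "channel"),
}


def validate_notification_config(config: dict) -> List[str]:
    """Return human-readable problems; an empty list means the config looks valid."""
    errors: List[str] = []
    channels = config.get("notification_channels") or {}
    for name, channel in channels.items():
        channel = channel or {}
        channel_type = channel.get("type")
        if channel_type not in REQUIRED_FIELDS:
            errors.append(f"{name}: unknown type {channel_type!r}")
            continue
        for field in REQUIRED_FIELDS[channel_type]:
            if not channel.get(field):
                errors.append(f"{name}: missing required field {field!r}")

    # Every channel name referenced by defaults or hosts must be defined.
    references = [("default_notification_channels", n)
                  for n in config.get("default_notification_channels") or []]
    for host, host_cfg in (config.get("hosts") or {}).items():
        for n in (host_cfg or {}).get("notification_channels") or []:
            references.append((f"hosts.{host}", n))
    for where, n in references:
        if n not in channels:
            errors.append(f"{where}: undefined channel {n!r}")
    return errors
```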

## Best Practices

### Channel Organization

- **Create purpose-specific channels**: `email_ops`, `signal_oncall`, `pushover_urgent`
- **Separate by team/role**: `email_devteam`, `signal_dbateam`, `mattermost_security`
- **Use descriptive names**: Channel names appear in logs and debug output

### Redundancy

For critical hosts, use multiple notification channels:

```yaml
hosts:
  critical-db:
    notification_channels:
      - signal_oncall    # Primary: mobile alert
      - pushover_urgent  # Backup: independent mobile provider
      - email_ops        # Tertiary: email for record keeping
```

### Notification Fatigue Prevention

- **Use `watch: false`** for non-critical hosts
- **Configure appropriate thresholds** to avoid false positives
- **Set different channels for different severities**
- **Use `default_notification_channels`** for baseline, add more for critical systems

### Security

- **Protect credentials**: Use file permissions to protect config files with passwords/tokens
- **Rotate tokens**: Periodically rotate API tokens and passwords
- **Use app-specific passwords**: For email, use app-specific passwords instead of the main account password
- **Separate accounts**: Consider separate notification accounts for different environments (prod vs dev)
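
One way to act on the first point: restrict the config file to its owner (for example `chmod 600 .hb.yaml`) and have a startup check warn when group or other users can access it. `config_is_private` and `warn_if_exposed` are hypothetical POSIX-only helpers, not part of the project.

```python
import os
import stat


def config_is_private(path: str) -> bool:
    """True if neither group nor others have any permission on the file (POSIX)."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0


def warn_if_exposed(path: str) -> None:
    """Hypothetical startup check: warn about group/world-accessible configs."""
    if not config_is_private(path):
        print(f"warning: {path} is accessible by group/other users; run: chmod 600 {path}")
```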

### Testing

Test notification channels before relying on them:

```bash
# Test signal-cli directly
signal-cli -u +1234567890 send -m "Test message" +0987654321

# Test local mail delivery (uses this machine's mail setup, not the smtp_server from the config)
echo "Test" | mail -s "Test Subject" admin@example.com

# Test through the heartbeat system (feeds the Python snippet to python3)
python3 - <<'EOF'
from hbd.server import notify as notify_mod, config as config_mod

cfg = config_mod.load_config(".hb.yaml")
notify_mod.setup(cfg)
print(notify_mod.pushmsg_for_host("test-host", "Test notification"))
EOF
```

## Troubleshooting

### Notifications Not Sending

1. **Check logs**: Look for "Failed to send notification" errors
2. **Verify host is watched**: Heartbeat event notifications require `watch: true` in the host definition
3. **Check channel configuration**: Verify credentials and settings
4. **Test channel directly**: Use command-line tools to test the provider
5. **Check network**: Ensure the server can reach the notification endpoints

### Signal Issues

- **signal-cli not found**: Specify the full path in `cli_path`
- **Not registered**: Run `signal-cli -u +NUMBER register`, then `verify`
- **Trust issues**: Run `signal-cli -u +NUMBER receive` to sync the trust store
- **Recipient not found**: Ensure the recipient number is registered with Signal and written in international format (`+` and country code)

### Email Issues

- **Authentication failed**: Check SMTP username/password
- **TLS errors**: Verify the SMTP port (587 for STARTTLS, 465 for implicit SSL/TLS)
- **Relay denied**: Ensure the SMTP server allows relay from your IP or authenticated user
- **Timeout**: Check firewall rules for SMTP ports

### Pushover Issues

- **Invalid token/user**: Verify the app token and user key in the Pushover dashboard
- **API rate limits**: Each Pushover application has a monthly message quota
- **HTTP errors**: Check the Pushover API status page

### Mattermost Issues

- **Webhook not found**: Verify the webhook token and ensure the webhook is enabled
- **Channel not found**: Check the channel name spelling and permissions
- **Driver import error**: Install mattermostdriver: `pip install mattermostdriver`

## API Reference

### Main Functions

#### `pushmsg_for_host(hostname: str, msg: str, debug: int = 0) -> dict`

Send a notification to the host's configured channels.

**Parameters:**

- `hostname`: Name of the host (used to look up notification channels)
- `msg`: Message to send
- `debug`: Debug level (0 = no debug, 1+ = debug output)

**Returns:** Dictionary of results per channel, e.g. `{"signal_ops": True, "email_ops": False}`

**Example:**

```python
from hbd.server import notify as notify_mod

notify_mod.pushmsg_for_host("prod-web-01", "Server CPU at 95%")
```

**Behavior:**

1. Looks up notification channels configured for the host
2. If the host has no channels of its own, uses `default_notification_channels`
3. Dispatches to each channel in parallel
4. Returns a dict of results keyed by channel name
5. Logs success/failure for each channel
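
Steps 3 to 5 can be sketched with a thread pool. `fan_out` is hypothetical: it takes already-resolved channel configurations and a `send(name, config, message)` callable, and returns the same `{channel_name: bool}` shape documented above. A thread pool is one reasonable way to parallelize I/O-bound providers; the real implementation may differ.

```python
import logging
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

log = logging.getLogger("notify-sketch")

# send(channel_name, channel_config, message) -> success flag
SendFn = Callable[[str, dict, str], bool]


def fan_out(channels: Dict[str, dict], message: str, send: SendFn) -> Dict[str, bool]:
    """Send to every channel concurrently and return {channel_name: success}."""
    if not channels:
        return {}
    with ThreadPoolExecutor(max_workers=len(channels)) as pool:
        futures = {name: pool.submit(send, name, cfg, message)
                   for name, cfg in channels.items()}
    # Leaving the `with` block waits for all sends to finish.
    results: Dict[str, bool] = {}
    for name, future in futures.items():
        try:
            ok = bool(future.result())
        except Exception:
            log.exception("%s: Failed to send notification", name)
            ok = False
        else:
            log.log(logging.INFO if ok else logging.WARNING, "%s: send returned %s", name, ok)
        results[name] = ok
    return results
```

Because each future is inspected separately, an exception in one provider becomes a `False` entry for that channel rather than aborting the whole call.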

## Examples

### Complete Configuration Example

```yaml
# Notification channel definitions
notification_channels:
  signal_oncall:
    type: signal
    cli_path: /usr/local/bin/signal-cli
    user: "+12025551234"
    recipient: "+12025555678"

  email_ops:
    type: email
    recipients: [ops@example.com, alerts@example.com]
    sender: heartbeat@example.com
    smtp_server: smtp.fastmail.com
    smtp_port: 587
    smtp_user: heartbeat@example.com
    smtp_password: app-password-here

# Default channels
default_notification_channels: [email_ops]

# Host definitions with channel assignments
hosts:
  prod-web-01:
    threshold_config: high_sensitivity
    watch: true
    notification_channels: [signal_oncall, email_ops]
    dyndns: false

  dev-server-01:
    threshold_config: low_sensitivity
    watch: false
    notification_channels: [email_ops]
    dyndns: false
```

### Multiple Environments Example

```yaml
notification_channels:
  # Production channels
  signal_prod_oncall:
    type: signal
    user: "+12025551234"
    recipient: "+12025551111"  # On-call phone

  email_prod_ops:
    type: email
    recipients: [prod-ops@example.com]
    sender: prod-heartbeat@example.com
    smtp_server: smtp.example.com

  # Staging channels
  email_staging:
    type: email
    recipients: [staging-alerts@example.com]
    sender: staging-heartbeat@example.com
    smtp_server: smtp.example.com

  # Development channels
  mattermost_dev:
    type: mattermost
    host: chat.example.com
    token: dev-webhook-token
    channel: dev-alerts

hosts:
  prod-api-01:
    notification_channels: [signal_prod_oncall, email_prod_ops]

  staging-api-01:
    notification_channels: [email_staging]

  dev-api-01:
    notification_channels: [mattermost_dev]
```