Fix rtt, including bug in time compute

This commit is contained in:
Andreas Wrede
2026-04-01 19:41:53 -04:00
parent 090d341244
commit 460d2be9e9
13 changed files with 1366 additions and 372 deletions
+533
View File
@@ -0,0 +1,533 @@
# Notification System
## Overview
The Heartbeat Monitoring System includes a flexible notification system that can send alerts through multiple channels including Email, Pushover, Signal, and Mattermost. The system supports centralized channel definitions with per-host routing, allowing fine-grained control over notification delivery.
## Architecture
### Components
1. **Notification Channels** (`notification_channels` in config)
- Centralized definitions of notification providers
- Each channel has a type and type-specific credentials
- Reusable across multiple hosts
2. **Channel Dispatcher** (`hbd/server/notify.py`)
- `pushmsg_for_host(hostname, message)`: Main entry point for host-specific notifications
- `_dispatch_to_channel(channel_name, channel_config, message)`: Routes to specific provider
- Provider functions: `pushover()`, `pushsignal()`, `pushmattermost()`, `send_email()`
3. **Configuration Utilities** (`hbd/server/config.py`)
- `get_notification_channels_for_host(config, hostname)`: Retrieves channel names for a host
- `get_notification_channels_config(config, hostname)`: Retrieves full channel configurations
- `get_channel_config(config, channel_name)`: Gets configuration for a specific channel
4. **Integration Points**
- **Threshold alerts**: `threshold.py` calls `notify_mod.pushmsg_for_host()`
- **Heartbeat events**: `udp.py` calls `notify_mod.pushmsg_for_host()` for boot/shutdown/overdue
- **Custom alerts**: Any code can call `notify_mod.pushmsg_for_host(hostname, message)`
## Configuration
### Centralized Channel Definitions
Define notification channels once in your configuration file:
```yaml
notification_channels:
# Signal notifications
signal_ops:
type: signal
cli_path: /usr/local/bin/signal-cli
user: +1234567890 # Your Signal number
recipient: +1234567890 # Recipient number
signal_oncall:
type: signal
cli_path: /usr/local/bin/signal-cli
user: +1234567890
recipient: +0987654321 # Different recipient
# Email notifications
email_ops:
type: email
recipients:
- ops@example.com
- alerts@example.com
sender: heartbeat@example.com
smtp_server: smtp.example.com
smtp_port: 587
smtp_user: heartbeat@example.com
smtp_password: your-smtp-password
email_devteam:
type: email
recipients: [dev-alerts@example.com]
sender: heartbeat-dev@example.com
smtp_server: smtp.example.com
smtp_port: 587
smtp_user: heartbeat-dev@example.com
smtp_password: your-smtp-password
# Pushover notifications
pushover_urgent:
type: pushover
token: your-pushover-app-token
user: your-pushover-user-key
pushover_normal:
type: pushover
token: your-pushover-app-token
user: another-user-key
# Mattermost notifications
mattermost_devops:
type: mattermost
host: mattermost.example.com
token: your-webhook-token
channel: devops-alerts
username: heartbeat-bot
icon: https://example.com/heartbeat-icon.png
```
### Default Notification Channels
Specify default channels for hosts that don't have specific channel assignments:
```yaml
default_notification_channels:
- email_ops
- mattermost_devops
```
Hosts without `notification_channels` defined will use these defaults.
### Per-Host Channel Assignment
Assign specific channels to each host in the `hosts` section:
```yaml
hosts:
# Critical production web server - multiple channels for redundancy
prod-web-01:
threshold_config: high_sensitivity
watch: true
notification_channels:
- signal_oncall # Immediate mobile notification
- pushover_urgent # Secondary mobile notification
- email_ops # Email for record keeping
dyndns: false
# Database server - ops team notifications only
prod-db-01:
threshold_config: database
watch: true
notification_channels:
- signal_ops
- email_ops
dyndns: false
# Development server - email only, no urgent notifications
dev-server-01:
threshold_config: low_sensitivity
watch: false
notification_channels:
- email_devteam
dyndns: false
# Test server - uses default_notification_channels
test-server-01:
threshold_config: default
watch: false
dyndns: false
# No notification_channels specified = uses default_notification_channels
```
## Channel Types
### Email
Sends notifications via SMTP.
**Configuration fields:**
```yaml
type: email
recipients: [email1@example.com, email2@example.com] # Required: List of recipients
sender: heartbeat@example.com # Required: From address
smtp_server: smtp.example.com # Required: SMTP server hostname
smtp_port: 587 # Optional: Default 587
smtp_user: heartbeat@example.com # Optional: For authenticated SMTP
smtp_password: your-password # Optional: For authenticated SMTP
```
**Features:**
- Supports multiple recipients
- TLS/STARTTLS support on port 587
- Authenticated and unauthenticated SMTP
**Example:**
```yaml
notification_channels:
email_critical:
type: email
recipients: [admin@example.com, oncall@example.com]
sender: alerts@example.com
smtp_server: smtp.fastmail.com
smtp_port: 587
smtp_user: alerts@example.com
smtp_password: app-specific-password
```
### Pushover
Sends push notifications to mobile devices via Pushover API.
**Configuration fields:**
```yaml
type: pushover
token: your-application-token # Required: Your Pushover app token
user: your-user-key # Required: Recipient's user key
```
**Features:**
- Instant mobile push notifications
- Works on iOS and Android
- Supports delivery confirmations
**Setup:**
1. Create a Pushover account at https://pushover.net
2. Create an application to get your app token
3. Note your user key from your account dashboard
**Example:**
```yaml
notification_channels:
pushover_admin:
type: pushover
token: azGDORePK8gMaC0QOYAMyEEuzJnyUi
user: uQiRzpo4DXghDmr9QzzfQu27cmVRsG
```
### Signal
Sends notifications via Signal messenger using signal-cli.
**Configuration fields:**
```yaml
type: signal
cli_path: /usr/local/bin/signal-cli # Optional: Path to signal-cli binary
user: +1234567890 # Required: Your Signal phone number
recipient: +0987654321 # Required: Recipient phone number
```
**Prerequisites:**
1. Install signal-cli: https://github.com/AsamK/signal-cli
2. Register signal-cli with your phone number:
```bash
signal-cli -u +1234567890 register
signal-cli -u +1234567890 verify CODE
```
3. Ensure signal-cli is in PATH or specify full path in config
**Features:**
- End-to-end encrypted messaging
- Works without phone being online
- No API fees or rate limits
**Example:**
```yaml
notification_channels:
signal_admin:
type: signal
cli_path: /usr/local/bin/signal-cli
user: +12025551234
recipient: +12025559999
```
### Mattermost
Sends notifications to Mattermost team chat via incoming webhooks.
**Configuration fields:**
```yaml
type: mattermost
host: mattermost.example.com # Required: Mattermost server hostname
token: your-webhook-token # Required: Incoming webhook token
channel: channel-name # Required: Target channel name
username: heartbeat-bot # Optional: Bot display name
icon: https://example.com/icon.png # Optional: Bot icon URL
```
**Prerequisites:**
1. Enable incoming webhooks in Mattermost
2. Create an incoming webhook for your team
3. Note the webhook token from the webhook URL
**Features:**
- Team-wide visibility
- Rich formatting support
- Message threading
**Example:**
```yaml
notification_channels:
mattermost_ops:
type: mattermost
host: chat.example.com
token: abc123def456ghi789
channel: infrastructure-alerts
username: heartbeat-monitor
icon: https://example.com/heartbeat-icon.png
```
## Notification Events
The system sends notifications for various events:
### Threshold Alerts
When monitored metrics exceed configured thresholds:
- **State changes**: OK → WARNING, WARNING → CRITICAL, CRITICAL → OK
- **Format**: `{LEVEL}: {hostname} - {metric_path} = {value} {threshold_info}`
- **Example**: `CRITICAL: prod-web-01 - cpu_monitor.cpu_percent = 95.2 (threshold: > 90.0)`
- **Re-notifications**: Periodic reminders for ongoing alerts (default: hourly)
### Heartbeat Events
Host lifecycle events:
- **Host boot**: `{hostname} booted`
- **Host shutdown**: `{hostname} {connection_type} shutdown`
- **Host recovery**: `{hostname} {connection_type} is back`
- **Connection issues**: `{hostname} {message}`
- **Host overdue**: `{hostname} {connection_type} overdue`
Only hosts with `watch: true` send heartbeat event notifications.
### Custom Alerts
Application code can send custom notifications:
```python
from hbd.server import notify as notify_mod
# Send to host-specific channels
notify_mod.pushmsg_for_host("prod-web-01", "Custom alert message")
# Send using global config
notify_mod.pushmsg_from_config("Global notification")
# Send to specific config
notify_mod.pushmsg(custom_config_dict, "Targeted notification")
```
## Design Principles
The notification system follows these core principles:
- **Centralization**: Define notification providers once, reference them by name
- **Flexibility**: Each host can use different channels for different notification needs
- **Redundancy**: Critical hosts can specify multiple channels for failover
- **Clarity**: Clean separation between channel definition and channel assignment
- **Type Safety**: Provider-specific validation at configuration time
## Best Practices
### Channel Organization
- **Create purpose-specific channels**: `email_ops`, `signal_oncall`, `pushover_urgent`
- **Separate by team/role**: `email_devteam`, `signal_dbateam`, `mattermost_security`
- **Use descriptive names**: Channel names appear in logs and debugging
### Redundancy
For critical hosts, use multiple notification channels:
```yaml
hosts:
critical-db:
notification_channels:
- signal_oncall # Primary: Mobile alert
- pushover_urgent # Backup: Different mobile platform
- email_ops # Tertiary: Email for record-keeping
```
### Notification Fatigue Prevention
- **Use `watch: false`** for non-critical hosts
- **Configure appropriate thresholds** to avoid false positives
- **Set different channels for different severities**
- **Use `default_notification_channels`** for baseline, add more for critical systems
### Security
- **Protect credentials**: Use file permissions to protect config files with passwords/tokens
- **Rotate tokens**: Periodically rotate API tokens and passwords
- **Use app-specific passwords**: For email, use app-specific passwords instead of main account password
- **Separate accounts**: Consider separate notification accounts for different environments (prod vs dev)
### Testing
Test notification channels before relying on them:
```bash
# Test signal-cli directly
signal-cli -u +1234567890 send -m "Test message" +0987654321
# Test SMTP
echo "Test" | mail -s "Test Subject" admin@example.com
# Test through heartbeat system (Python REPL)
from hbd.server import notify as notify_mod, config as config_mod
cfg = config_mod.load_config(".hb.yaml")
notify_mod.setup(cfg)
notify_mod.pushmsg_for_host("test-host", "Test notification")
```
## Troubleshooting
### Notifications Not Sending
1. **Check logs**: Look for "Failed to send notification" errors
2. **Verify host is watched**: Ensure `watch: true` in host definition
3. **Check channel configuration**: Verify credentials and settings
4. **Test channel directly**: Use command-line tools to test provider
5. **Check network**: Ensure server can reach notification endpoints
### Signal Issues
- **signal-cli not found**: Specify full path in `cli_path`
- **Not registered**: Run `signal-cli -u +NUMBER register` and verify
- **Trust issues**: Run `signal-cli -u +NUMBER receive` to sync trust store
- **Recipient not found**: Ensure recipient is in your Signal contacts
### Email Issues
- **Authentication failed**: Check SMTP username/password
- **TLS errors**: Verify SMTP port (587 for STARTTLS, 465 for SSL)
- **Relay denied**: Ensure SMTP server allows relay from your IP
- **Timeout**: Check firewall rules for SMTP ports
### Pushover Issues
- **Invalid token/user**: Verify token and user key from Pushover dashboard
- **API rate limits**: Pushover has monthly message limits on free tier
- **HTTP errors**: Check Pushover API status page
### Mattermost Issues
- **Webhook not found**: Verify webhook token and ensure webhook is enabled
- **Channel not found**: Check channel name spelling and permissions
- **Driver import error**: Install mattermostdriver: `pip install mattermostdriver`
## API Reference
### Main Functions
#### `pushmsg_for_host(hostname: str, msg: str, debug: int = 0) -> dict`
Send notification to host-specific channels.
**Parameters:**
- `hostname`: Name of the host (used to look up notification channels)
- `msg`: Message to send
- `debug`: Debug level (0=no debug, 1+=debug output)
**Returns:** Dictionary of results per channel: `{"signal_ops": True, "email_ops": False}`
**Example:**
```python
from hbd.server import notify as notify_mod
notify_mod.pushmsg_for_host("prod-web-01", "Server CPU at 95%")
```
**Behavior:**
1. Looks up notification channels configured for the host
2. If no host-specific channels, uses `default_notification_channels`
3. Dispatches to each channel in parallel
4. Returns dict of results keyed by channel name
5. Logs success/failure for each channel
## Examples
### Complete Configuration Example
```yaml
# Notification channel definitions
notification_channels:
signal_oncall:
type: signal
cli_path: /usr/local/bin/signal-cli
user: +12025551234
recipient: +12025555678
email_ops:
type: email
recipients: [ops@example.com, alerts@example.com]
sender: heartbeat@example.com
smtp_server: smtp.fastmail.com
smtp_port: 587
smtp_user: heartbeat@example.com
smtp_password: app-password-here
# Default channels
default_notification_channels: [email_ops]
# Host definitions with channel assignments
hosts:
prod-web-01:
threshold_config: high_sensitivity
watch: true
notification_channels: [signal_oncall, email_ops]
dyndns: false
dev-server-01:
threshold_config: low_sensitivity
watch: false
notification_channels: [email_ops]
dyndns: false
```
### Multiple Environments Example
```yaml
notification_channels:
# Production channels
signal_prod_oncall:
type: signal
user: +12025551234
recipient: +12025551111 # On-call phone
email_prod_ops:
type: email
recipients: [prod-ops@example.com]
sender: prod-heartbeat@example.com
smtp_server: smtp.example.com
# Staging channels
email_staging:
type: email
recipients: [staging-alerts@example.com]
sender: staging-heartbeat@example.com
smtp_server: smtp.example.com
# Development channels
mattermost_dev:
type: mattermost
host: chat.example.com
token: dev-webhook-token
channel: dev-alerts
hosts:
prod-api-01:
notification_channels: [signal_prod_oncall, email_prod_ops]
staging-api-01:
notification_channels: [email_staging]
dev-api-01:
notification_channels: [mattermost_dev]
```
+90 -22
View File
@@ -335,43 +335,111 @@ threshold_renotify_interval: 3600 # Re-notify every hour for ongoing alerts
### Notification Channels
Thresholds use the same notification infrastructure as heartbeat monitoring:
The system supports centralized notification channel definitions, allowing different hosts to use different notification providers and credentials. This provides fine-grained control over who gets notified about what.
#### Supported Channel Types
- **Email** (via SMTP)
- **Pushover** (mobile notifications)
- **Mattermost** (team chat)
- **Custom webhooks**
- **Signal** (via signal-cli)
- **Mattermost** (team chat webhooks)
Configuration:
#### Centralized Channel Configuration
Define notification channels once in the configuration file:
```yaml
# Email
toemail:
- admin@example.com
- oncall@example.com
fromemail: heartbeat@example.com
smtpserver: smtp.example.com
smtpport: 587
smtpuser: heartbeat@example.com
smtppassword: your-password
notification_channels:
# Signal notifications
signal_ops:
type: signal
cli_path: /usr/local/bin/signal-cli
user: +1234567890
recipient: +1234567890
# Email notifications
email_ops:
type: email
recipients: [ops@example.com, alerts@example.com]
sender: heartbeat@example.com
smtp_server: smtp.example.com
smtp_port: 587
smtp_user: heartbeat@example.com
smtp_password: your-smtp-password
# Pushover notifications
pushover_urgent:
type: pushover
token: your-pushover-app-token
user: your-pushover-user-key
# Mattermost notifications
mattermost_devops:
type: mattermost
host: mattermost.example.com
token: your-webhook-token
channel: devops-alerts
username: heartbeat-bot
icon: https://example.com/heartbeat-icon.png
# Pushover
pushover_token: your-app-token
pushover_user: your-user-key
# Default channels for hosts that don't specify channels
default_notification_channels: [email_ops]
```
#### Per-Host Channel Assignment
Assign notification channels to specific hosts in the `hosts` section:
```yaml
hosts:
# Critical server - multiple notification channels
prod-web-01:
threshold_config: high_sensitivity
watch: true
notification_channels: [signal_ops, pushover_urgent, email_ops]
dyndns: false
# Database server - ops team only
prod-db-01:
threshold_config: database
watch: true
notification_channels: [signal_ops, email_ops]
dyndns: false
# Development server - email only
dev-server-01:
threshold_config: low_sensitivity
watch: false
notification_channels: [email_ops]
dyndns: false
# Uses default_notification_channels if not specified
test-server-01:
threshold_config: default
watch: false
dyndns: false
```
### Watched Hosts
Only hosts in the `watchhosts` list will trigger notifications:
Only hosts with `watch: true` in the `hosts` section will trigger notifications:
```yaml
watchhosts:
- webserver01
- database01
- mailserver
hosts:
webserver01:
watch: true
notification_channels: [email_ops]
database01:
watch: true
notification_channels: [signal_ops, email_ops]
mailserver:
watch: true
notification_channels: [pushover_urgent]
```
Hosts not in this list will still have thresholds checked and alert states tracked, but won't send notifications.
Hosts not marked for watching will still have thresholds checked and alert states tracked, but won't send notifications.
## Alert State Tracking