Configuration Reload
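The heartbeat daemon (hbd) supports reloading its configuration at runtime, without a full restart. Notification channels, thresholds, host settings, and most runtime and DNS settings can be changed while the service keeps running; network ports, TLS, persistence, logging, and journal settings still require a restart.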
How to Reload Configuration
Send a SIGHUP signal to the running hbd process:
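```bash
# Find the process ID
ps aux | grep '[h]bd'    # the [h] keeps grep from matching itself

# Or use pidof/pgrep
pidof hbd
pgrep -f hbd

# Send the SIGHUP signal
kill -HUP <pid>

# Or, if hbd runs under systemd
systemctl reload heartbeat
```

`pidof` matches the executable name, whereas `pgrep -f` matches anywhere in the full command line and can also catch unrelated processes, such as an editor with an hbd file open. `systemctl reload heartbeat` works only if the heartbeat unit defines `ExecReload=`, for example `ExecReload=/bin/kill -HUP $MAINPID`.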
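For deployment scripts, the same reload can be triggered from Python. This is a sketch, not part of hbd: it assumes hbd runs on the same host, that the caller may signal it, and that `pgrep` is installed. The helpers `find_hbd_pids`, `request_reload`, and `reload_single_hbd` are hypothetical names.

```python
import os
import signal
import subprocess


def find_hbd_pids(pattern: str = "hbd") -> list[int]:
    """Return PIDs whose full command line contains `pattern`, like `pgrep -f`."""
    result = subprocess.run(
        ["pgrep", "-f", pattern], capture_output=True, text=True, check=False
    )
    # pgrep exits with status 1 when no process matches; that is not an error.
    if result.returncode not in (0, 1):
        raise RuntimeError(f"pgrep failed: {result.stderr.strip()}")
    return [int(pid) for pid in result.stdout.split()]


def request_reload(pid: int) -> None:
    """Send SIGHUP to one process, the equivalent of `kill -HUP <pid>`."""
    os.kill(pid, signal.SIGHUP)


def reload_single_hbd() -> int:
    """Signal the hbd process, refusing to act if the match is ambiguous."""
    pids = find_hbd_pids()
    # `pgrep -f hbd` can also match editors, shells, or this script itself
    # if their command lines contain "hbd", so require exactly one match.
    if len(pids) != 1:
        raise RuntimeError(f"expected exactly one hbd process, found {pids}")
    request_reload(pids[0])
    return pids[0]
```

Refusing to signal when several processes match is a deliberate choice: `kill -HUP $(pgrep -f hbd)` would signal all of them, and SIGHUP terminates most programs that do not handle it.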
What Can Be Reloaded
The following configuration sections can be reloaded without restarting:
✅ Fully Reloadable
- Notification Channels (`notification_channels`)
  - Add, remove, or modify notification channel definitions
  - Update tokens, API keys, and SMTP credentials
  - Change recipient lists
- Threshold Configurations (`threshold_configs`)
  - Modify warning and critical thresholds
  - Add or remove threshold rules
  - Change operators and hysteresis values
  - Update display formats
- Host Configuration (`hosts`)
  - Change watch status
  - Update notification channel assignments
  - Modify threshold config assignments
  - Change dyndns status
- Host Lists
  - `watchhosts` - hosts to monitor
  - `dyndnshosts` - hosts with dynamic DNS
  - `drophosts` - hosts to ignore
- Runtime Settings
  - `grace` - grace period multiplier
  - `interval` - expected heartbeat interval
  - `threshold_renotify_interval` - re-notification interval
  - `debug` - debug level
  - `verbose` - verbose output
- DNS Settings
  - `dyndomains` - dynamic DNS domains
  - `nsupdate_bin` - path to the nsupdate binary
  - `rndc_key` - path to the RNDC key
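Runtime settings like `interval`, `grace`, and `threshold_renotify_interval` reload cleanly when they are read at check time rather than cached. The sketch below shows one plausible reading of them, assuming both intervals are in seconds and that a host counts as late once it has been silent for longer than `interval * grace`; hbd's actual rule may differ, and `RuntimeSettings`, `is_overdue`, and `should_renotify` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class RuntimeSettings:
    interval: float                      # expected seconds between heartbeats (assumed unit)
    grace: float                         # multiplier applied to `interval` (assumed semantics)
    threshold_renotify_interval: float   # seconds between repeat alerts (assumed unit)


def is_overdue(last_seen: float, now: float, s: RuntimeSettings) -> bool:
    """A host is late once it has been silent for longer than interval * grace."""
    return now - last_seen > s.interval * s.grace


def should_renotify(last_notified: float, now: float, s: RuntimeSettings) -> bool:
    """Repeat an active alert only after the re-notification interval has passed."""
    return now - last_notified >= s.threshold_renotify_interval
```

Because both checks take the settings as an argument, swapping in a freshly loaded `RuntimeSettings` object changes behavior from the next check onward.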
⚠️ Requires Restart
The following settings cannot be reloaded and require a service restart:
- Network Ports
  - `hb_port` - UDP heartbeat port
  - `hbd_port` - HTTP API port
  - `ws_port` - WebSocket port
  - `wss_port` - secure WebSocket port
- SSL/TLS Settings
  - `cert_path` - SSL certificate path
  - `wss_pem` - SSL certificate file
  - `wss_key` - SSL key file
- Persistence
  - `pickfile` - pickle file path
- Logging
  - `logfile` - log file path
  - `logfmt` - log format
- Journal Settings
  - `journal_enabled` - enable/disable journaling
  - `journal_dir` - journal directory
  - `journal_file` - journal filename
  - `journal_max_size` - maximum journal size
  - `journal_max_backups` - number of backup files
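Before sending SIGHUP it can help to know whether an edit touches any of the settings above. A minimal sketch, assuming these settings are top-level keys in the YAML file (an assumption about the layout); `RESTART_REQUIRED` and `classify_changes` are hypothetical names that compare two already-parsed configurations.

```python
from typing import Any

# Settings listed above as requiring a restart. Treating them as top-level
# keys of the YAML file is an assumption about the config layout.
RESTART_REQUIRED = frozenset({
    "hb_port", "hbd_port", "ws_port", "wss_port",
    "cert_path", "wss_pem", "wss_key",
    "pickfile",
    "logfile", "logfmt",
    "journal_enabled", "journal_dir", "journal_file",
    "journal_max_size", "journal_max_backups",
})

_MISSING = object()  # sentinel so that "key removed" counts as a change


def classify_changes(
    old: dict[str, Any], new: dict[str, Any]
) -> tuple[list[str], list[str]]:
    """Split changed top-level keys into (reloadable, restart_required)."""
    changed = sorted(
        key for key in old.keys() | new.keys()
        if old.get(key, _MISSING) != new.get(key, _MISSING)
    )
    restart = [key for key in changed if key in RESTART_REQUIRED]
    reloadable = [key for key in changed if key not in RESTART_REQUIRED]
    return reloadable, restart
```

Keys outside the restart list are reported as reloadable, which may not hold for settings this page does not mention.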
Reload Process
When a SIGHUP signal is received:
- Configuration File Loading
  - The config file is re-read from disk
  - The YAML is parsed
  - Validation checks are run
- Component Updates
  - The notification system is updated with the new channel definitions
  - The threshold checker reloads all threshold configurations
  - Existing alert states are preserved so hysteresis is maintained
- Error Handling
  - If the reload fails, the previous configuration is kept
  - Error messages are logged
  - The service continues running with the old configuration
- Logging
  - The start and completion of the reload are logged
  - Each component reports its reload status
  - The total number of thresholds is reported
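The error-handling rule above amounts to "validate first, swap last". A minimal sketch of that pattern, assuming the reload is split into a loader, a validator, and one update hook per component; `ReloadableConfig` and its hooks are hypothetical, not hbd's code. The rollback of components already updated when a later one fails is an addition of this sketch; the list above only states that the previous configuration is kept.

```python
import logging
from typing import Any, Callable

log = logging.getLogger("hbd.reload")
Config = dict[str, Any]


class ReloadableConfig:
    """Holds the active config and replaces it only after a successful reload."""

    def __init__(self, initial: Config,
                 validate: Callable[[Config], None],
                 components: list[Callable[[Config], None]]):
        self.current = initial
        self._validate = validate      # raises an exception for a bad config
        self._components = components  # e.g. notifier and threshold checker hooks

    def reload(self, load: Callable[[], Config]) -> bool:
        log.info("Starting configuration reload...")
        try:
            candidate = load()          # re-read and parse the file
            self._validate(candidate)   # reject bad config before touching anything
        except Exception:
            log.exception("Reload failed; keeping previous configuration")
            return False
        applied = []
        try:
            for apply in self._components:
                apply(candidate)
                applied.append(apply)
        except Exception:
            log.exception("Component update failed; rolling back")
            for apply in applied:
                apply(self.current)     # restore components already updated
            return False
        self.current = candidate
        log.info("Configuration reload completed successfully")
        return True
```

With PyYAML installed, `load` could be `lambda: yaml.safe_load(open(path))`.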
Example Reload Session
```bash
# Terminal 1: watch the logs
tail -f /var/log/heartbeat.log

# Terminal 2: edit the configuration
vim /path/to/.hb.yaml
# Make changes to notification channels or thresholds, then save the file

# Terminal 3: trigger the reload
kill -HUP $(pgrep -f hbd)
```

Terminal 1 then shows the reload messages:

```
2026-04-01 12:34:56 INFO: Received SIGHUP, initiating config reload...
2026-04-01 12:34:56 INFO: ============================================================
2026-04-01 12:34:56 INFO: Starting configuration reload...
2026-04-01 12:34:56 INFO: ============================================================
2026-04-01 12:34:56 INFO: Configuration reloaded from /path/to/.hb.yaml
2026-04-01 12:34:56 INFO: Notification configuration reloaded
2026-04-01 12:34:56 INFO: Reloading threshold configuration...
2026-04-01 12:34:56 INFO: Threshold configuration reloaded: 42 total thresholds
2026-04-01 12:34:56 INFO: ============================================================
2026-04-01 12:34:56 INFO: Configuration reload completed successfully
2026-04-01 12:34:56 INFO: ============================================================
```
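Checking the log after each reload is easy to script. The sketch below finds the most recent reload in a chunk of log text and reports whether it completed and how many thresholds were loaded. The matched strings are copied from the sample output above and may change between hbd versions; `last_reload_result` is a hypothetical helper.

```python
import re
from typing import Optional

THRESHOLD_RE = re.compile(r"Threshold configuration reloaded: (\d+) total thresholds")


def last_reload_result(log_text: str) -> tuple[bool, Optional[int]]:
    """Return (succeeded, threshold_count) for the most recent reload in the log."""
    start = log_text.rfind("Starting configuration reload...")
    if start == -1:
        return False, None  # no reload in this excerpt
    tail = log_text[start:]  # only look at lines after the latest reload started
    succeeded = "Configuration reload completed successfully" in tail
    match = THRESHOLD_RE.search(tail)
    count = int(match.group(1)) if match else None
    return succeeded, count
```

A check script could read `/var/log/heartbeat.log`, call this function, and compare the count with the number of thresholds it expects.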
Common Use Cases
1. Update Notification Credentials
If you need to rotate API keys or update SMTP passwords:
```yaml
notification_channels:
  pushover_standard:
    type: pushover
    token: new-token-here      # Updated
    user: new-user-key-here    # Updated
```
Just edit the config file and send SIGHUP - no restart needed.
2. Adjust Threshold Values
Fine-tune alerting thresholds based on observed behavior:
```yaml
threshold_configs:
  default:
    thresholds:
      cpu_monitor:
        cpu_percent:
          warning: 85.0    # Increased from 80.0
          critical: 95.0   # Increased from 90.0
```
Send SIGHUP to apply the new thresholds immediately.
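Threshold rules carry warning and critical values plus an operator and a hysteresis value, and the reload preserves alert states so hysteresis keeps working. The sketch below is a generic hysteresis check, not hbd's implementation: it assumes `">"` means higher is worse, `"<"` means lower is worse, and that an active level clears only after the value moves back past its limit by the hysteresis margin. `Threshold` and `evaluate` are hypothetical names.

```python
from dataclasses import dataclass

LEVELS = ("ok", "warning", "critical")


@dataclass(frozen=True)
class Threshold:
    warning: float
    critical: float
    operator: str = ">"      # assumed: ">" higher is worse, "<" lower is worse
    hysteresis: float = 0.0  # assumed: margin a value must clear before an alert drops


def evaluate(value: float, t: Threshold, previous: str = "ok") -> str:
    """Return "ok", "warning", or "critical", applying hysteresis to active levels."""
    if t.operator not in (">", "<"):
        raise ValueError(f"unsupported operator {t.operator!r}")
    # Flip signs for "<" so the comparison below is always "greater is worse".
    sign = 1.0 if t.operator == ">" else -1.0
    v = sign * value
    previous_rank = LEVELS.index(previous)
    level = "ok"
    for rank, limit in ((1, t.warning), (2, t.critical)):
        limit *= sign
        # A level that is already active only clears once the value moves
        # back past its limit by the hysteresis margin; this stops flapping.
        if rank <= previous_rank:
            limit -= t.hysteresis
        if v > limit:
            level = LEVELS[rank]
    return level
```

The `previous` argument is the state a reload needs to keep: with `warning=85.0` and `hysteresis=5.0`, a value of 82 stays at "warning" if the previous state was "warning", but would read as "ok" if the state were reset on every SIGHUP.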
3. Add New Notification Channels
Add a new notification destination:
```yaml
notification_channels:
  email_oncall:
    type: email
    recipients: [oncall@example.com]
    sender: alerts@example.com
    smtp_server: smtp.example.com

hosts:
  critical_server:
    threshold_config: default
    watch: true
    notification_channels: [pushover_standard, email_oncall]  # Added
```
The new channel becomes active immediately after SIGHUP.
4. Update Watch List
Start or stop monitoring hosts without restart:
```yaml
hosts:
  new_server:
    threshold_config: default
    watch: true   # Start watching
    notification_channels: [pushover_standard]
```
Best Practices
- Test Configuration Before Reload
  - Validate YAML syntax before sending SIGHUP
  - Check for typos in channel names
  - Verify that threshold values are reasonable
- Monitor Reload Logs
  - Always check the logs after a reload to confirm success
  - Look for error messages if the reload fails
  - Verify that the expected number of thresholds was loaded
- Back Up Before Changes
  - Keep a backup of the working configuration
  - Keep config files under version control (git)
  - Document why changes were made
- Gradual Rollout
  - Test changes on a development server first
  - Apply changes to one production server at a time
  - Verify behavior before applying them everywhere
- Plan for Restart-Required Changes
  - Schedule downtime for port or SSL changes
  - Use a blue-green deployment if possible
  - Keep service downtime minimal
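The channel-name check under "Test Configuration Before Reload" can be scripted. A minimal pre-flight sketch, assuming the layout from the Common Use Cases examples (each host names its channels under `notification_channels` and its rules under `threshold_config`); `check_references` is a hypothetical helper that works on already-parsed YAML, so it needs no YAML library itself.

```python
from typing import Any


def check_references(cfg: dict[str, Any]) -> list[str]:
    """Return problems with channel and threshold_config names used by hosts."""
    channels = set(cfg.get("notification_channels") or {})
    threshold_configs = set(cfg.get("threshold_configs") or {})
    problems = []
    for host, settings in (cfg.get("hosts") or {}).items():
        settings = settings or {}
        # Every channel a host refers to must be defined by name.
        for channel in settings.get("notification_channels") or []:
            if channel not in channels:
                problems.append(f"host {host}: unknown notification channel {channel!r}")
        # The host's threshold_config, if set, must also exist.
        tc = settings.get("threshold_config")
        if tc is not None and tc not in threshold_configs:
            problems.append(f"host {host}: unknown threshold_config {tc!r}")
    return problems
```

With PyYAML installed, an empty result from `check_references(yaml.safe_load(open(".hb.yaml")))` means every name a host refers to is defined.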
Troubleshooting
Reload Doesn't Apply Changes
Check:
- Is the config file path correct?
- Did you save the file after editing?
- Are there YAML syntax errors?
- Do the logs show error messages?
Solution:
```bash
# Validate YAML syntax
python -c "import yaml; yaml.safe_load(open('.hb.yaml'))"

# Check the file modification time
ls -l .hb.yaml

# View logs
journalctl -u heartbeat -f
```
Partial Configuration Applied
Cause: Some sections reloaded, others didn't.
Solution: Check logs to see which components failed. Common issues:
- Invalid channel type
- Missing required threshold fields
- Invalid host references
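Missing threshold fields can be caught before reloading. A sketch, assuming every rule under `threshold_configs.<name>.thresholds.<monitor>.<metric>` (the nesting in the threshold example above) needs numeric `warning` and `critical` values, and that an optional `operator` of `">"` or `"<"` says whether higher or lower values are worse; the required fields and `check_threshold_rules` are assumptions, not hbd's schema. Channel types are not checked here because this page only shows `pushover` and `email`.

```python
from numbers import Real
from typing import Any

REQUIRED_FIELDS = ("warning", "critical")  # assumed minimum for a threshold rule


def check_threshold_rules(cfg: dict[str, Any]) -> list[str]:
    """Report rules with missing, non-numeric, or inverted warning/critical values."""
    problems = []
    for name, tc in (cfg.get("threshold_configs") or {}).items():
        for monitor, metrics in ((tc or {}).get("thresholds") or {}).items():
            for metric, rule in (metrics or {}).items():
                where = f"{name}/{monitor}/{metric}"
                rule = rule or {}
                missing = [f for f in REQUIRED_FIELDS if f not in rule]
                if missing:
                    problems.append(f"{where}: missing {', '.join(missing)}")
                    continue
                warn, crit = rule["warning"], rule["critical"]
                # bool is a subclass of int, so exclude it explicitly.
                if not all(isinstance(x, Real) and not isinstance(x, bool)
                           for x in (warn, crit)):
                    problems.append(f"{where}: warning/critical must be numbers")
                    continue
                # Assumed semantics: ">" higher is worse, "<" lower is worse.
                op = rule.get("operator", ">")
                if (op == ">" and warn > crit) or (op == "<" and warn < crit):
                    problems.append(f"{where}: warning {warn} is past critical {crit}")
    return problems
```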
Service Becomes Unresponsive
Cause: Malformed configuration caused an exception.
Solution:
- Revert to backup configuration
- Send SIGHUP again to reload the good config
- If service is completely stuck, restart it
Implementation Details
The reload mechanism uses:
- Signal Handling: SIGHUP triggers reload event
- Async-Safe Reloading: Configuration is loaded asynchronously
- Component Coordination: All affected components are updated atomically
- State Preservation: Alert states and hysteresis information are maintained
- Error Recovery: Failed reloads don't affect running configuration
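The signal-handling and async-loading points can be illustrated with asyncio. This is a sketch under assumptions, not hbd's code: it assumes an asyncio event loop on Unix, a blocking `load` callable that reads and parses the file, and an `apply` callable that swaps the result into the running components; `reload_on_sighup` is a hypothetical name.

```python
import asyncio
import logging
import signal
from typing import Callable, TypeVar

T = TypeVar("T")
log = logging.getLogger("hbd.reload")


async def reload_on_sighup(load: Callable[[], T], apply: Callable[[T], None]) -> None:
    """Reload once per SIGHUP until this task is cancelled (Unix only)."""
    loop = asyncio.get_running_loop()
    hup = asyncio.Event()
    # The loop calls hup.set from its own thread, so no asyncio object is
    # touched from inside a raw signal handler.
    loop.add_signal_handler(signal.SIGHUP, hup.set)
    try:
        while True:
            await hup.wait()
            hup.clear()  # several SIGHUPs during one reload collapse into one more pass
            log.info("SIGHUP received; reloading configuration")
            try:
                # Blocking file reads and parsing run in a worker thread.
                new_config = await asyncio.to_thread(load)
            except Exception:
                log.exception("Reload failed; keeping previous configuration")
                continue
            # apply() runs on the loop thread with no await inside it, so other
            # coroutines never observe a half-applied configuration.
            apply(new_config)
    finally:
        loop.remove_signal_handler(signal.SIGHUP)
```

Doing the swap in a single synchronous step on the loop thread is one way to get the "updated atomically" behavior in a cooperative scheduler; a multithreaded daemon would need a lock instead.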
See Also
- NOTIFICATIONS.md - Notification channel configuration
- THRESHOLD_ALERTING.md - Threshold configuration details
- Configuration examples in `hbd/config_*.yaml`