# Configuration Reload

The heartbeat daemon (hbd) supports runtime configuration reloading without requiring a full restart. This allows you to update certain configuration settings while the service continues running.

## How to Reload Configuration

Send a SIGHUP signal to the running hbd process:

```bash
# Find the process ID
ps aux | grep hbd

# Or use pidof/pgrep
pidof hbd
pgrep -f hbd

# Send SIGHUP signal (replace <pid> with the process ID found above)
kill -HUP <pid>

# Or if using systemd
systemctl reload heartbeat
```

## What Can Be Reloaded

The following configuration sections can be reloaded without restarting:

### ✅ Fully Reloadable

- **Notification Channels** (`notification_channels`)
  - Add, remove, or modify notification channel definitions
  - Update tokens, API keys, SMTP credentials
  - Change recipient lists
- **Threshold Configurations** (`threshold_configs`)
  - Modify warning and critical thresholds
  - Add or remove threshold rules
  - Change operators and hysteresis values
  - Update display formats
- **Host Configuration** (`hosts`)
  - Change watch status
  - Update notification channel assignments
  - Modify threshold config assignments
  - Change dyndns status
- **Host Lists**
  - `watchhosts` - hosts to monitor
  - `dyndnshosts` - hosts with dynamic DNS
  - `drophosts` - hosts to ignore
- **Runtime Settings**
  - `grace` - grace period multiplier
  - `interval` - expected heartbeat interval
  - `threshold_renotify_interval` - re-notification interval
  - `debug` - debug level
  - `verbose` - verbose output
- **DNS Settings**
  - `dyndomains` - dynamic DNS domains
  - `nsupdate_bin` - nsupdate binary path
  - `rndc_key` - RNDC key path

### ⚠️ Requires Restart

The following settings **cannot** be reloaded and require a service restart:

- **Network Ports**
  - `hb_port` - UDP heartbeat port
  - `hbd_port` - HTTP API port
  - `ws_port` - WebSocket port
  - `wss_port` - Secure WebSocket port
- **SSL/TLS Settings**
  - `cert_path` - SSL certificate path
  - `wss_pem` - SSL certificate file
  - `wss_key` - SSL key file
- **Persistence**
  - `pickfile` - Pickle file path
- **Logging**
  - `logfile` - Log file path
- **Journal Settings**
  - `journal_enabled` - Enable/disable journaling
  - `journal_dir` - Journal directory
  - `journal_file` - Journal filename
  - `journal_max_size` - Maximum journal size
  - `journal_max_backups` - Number of backup files

## Reload Process

When a SIGHUP signal is received:

1. **Configuration File Loading**
   - The config file is re-read from disk
   - YAML parsing is performed
   - Validation checks are run
2. **Component Updates**
   - Notification system is updated with new channel definitions
   - Threshold checker reloads all threshold configurations
   - Alert states are preserved to maintain hysteresis
3. **Error Handling**
   - If reload fails, the previous configuration is kept
   - Error messages are logged
   - Service continues running with old configuration
4. **Logging**
   - Reload start and completion are logged
   - Each component reports its reload status
   - Total number of thresholds is reported

## Example Reload Session

```bash
# Terminal 1: Watch the logs
tail -f /var/log/heartbeat.log

# Terminal 2: Edit configuration
vim /path/to/.hb.yaml
# Make changes to notification channels or thresholds
# Save the file

# Terminal 3: Trigger reload
kill -HUP $(pgrep -f hbd)

# Terminal 1: See reload messages
2026-04-01 12:34:56 INFO: Received SIGHUP, initiating config reload...
2026-04-01 12:34:56 INFO: ============================================================
2026-04-01 12:34:56 INFO: Starting configuration reload...
2026-04-01 12:34:56 INFO: ============================================================
2026-04-01 12:34:56 INFO: Configuration reloaded from /path/to/.hb.yaml
2026-04-01 12:34:56 INFO: Notification configuration reloaded
2026-04-01 12:34:56 INFO: Reloading threshold configuration...
2026-04-01 12:34:56 INFO: Threshold configuration reloaded: 42 total thresholds
2026-04-01 12:34:56 INFO: ============================================================
2026-04-01 12:34:56 INFO: Configuration reload completed successfully
2026-04-01 12:34:56 INFO: ============================================================
```

## Common Use Cases

### 1. Update Notification Credentials

If you need to rotate API keys or update SMTP passwords:

```yaml
notification_channels:
  pushover_standard:
    type: pushover
    token: new-token-here      # Updated
    user: new-user-key-here    # Updated
```

Just edit the config file and send SIGHUP - no restart needed.

### 2. Adjust Threshold Values

Fine-tune alerting thresholds based on observed behavior:

```yaml
threshold_configs:
  default:
    thresholds:
      cpu_monitor:
        cpu_percent:
          warning: 85.0   # Increased from 80.0
          critical: 95.0  # Increased from 90.0
```

Send SIGHUP to apply the new thresholds immediately.

### 3. Add New Notification Channels

Add a new notification destination:

```yaml
notification_channels:
  email_oncall:
    type: email
    recipients: [oncall@example.com]
    sender: alerts@example.com
    smtp_server: smtp.example.com

hosts:
  critical_server:
    threshold_config: default
    watch: true
    notification_channels: [pushover_standard, email_oncall]  # Added
```

The new channel becomes active immediately after SIGHUP.

### 4. Update Watch List

Start or stop monitoring hosts without restart:

```yaml
hosts:
  new_server:
    threshold_config: default
    watch: true  # Start watching
    notification_channels: [pushover_standard]
```

## Best Practices

1. **Test Configuration Before Reload**
   - Validate YAML syntax before sending SIGHUP
   - Check for typos in channel names
   - Verify threshold values are reasonable
2. **Monitor Reload Logs**
   - Always check logs after reload to confirm success
   - Look for error messages if reload fails
   - Verify expected number of thresholds loaded
3. **Backup Before Changes**
   - Keep a backup of working configuration
   - Use version control (git) for config files
   - Document why changes were made
4. **Gradual Rollout**
   - Test changes on development server first
   - Apply to one production server at a time
   - Verify behavior before applying everywhere
5. **Plan for Restart-Required Changes**
   - Schedule downtime for port or SSL changes
   - Use blue-green deployment if possible
   - Keep service downtime minimal

## Troubleshooting

### Reload Doesn't Apply Changes

**Check:**

- Is the config file path correct?
- Did you save the file after editing?
- Are there YAML syntax errors?
- Do the logs show error messages?

**Solution:**

```bash
# Validate YAML syntax
python -c "import yaml; yaml.safe_load(open('.hb.yaml'))"

# Check file modification time
ls -l .hb.yaml

# View logs
journalctl -u heartbeat -f
```

### Partial Configuration Applied

**Cause:** Some sections reloaded, others didn't.

**Solution:** Check logs to see which components failed. Common issues:

- Invalid channel type
- Missing required threshold fields
- Invalid host references

### Service Becomes Unresponsive

**Cause:** Malformed configuration caused an exception.

**Solution:**

1. Revert to backup configuration
2. Send SIGHUP again to reload the good config
3. If service is completely stuck, restart it

## Implementation Details

The reload mechanism uses:

- **Signal Handling**: SIGHUP triggers reload event
- **Async-Safe Reloading**: Configuration is loaded asynchronously
- **Component Coordination**: All affected components are updated atomically
- **State Preservation**: Alert states and hysteresis information are maintained
- **Error Recovery**: Failed reloads don't affect running configuration

## See Also

- [NOTIFICATIONS.md](NOTIFICATIONS.md) - Notification channel configuration
- [THRESHOLD_ALERTING.md](THRESHOLD_ALERTING.md) - Threshold configuration details
- Configuration examples in `hbd/config_*.yaml`
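
## Appendix: Reload With Fallback (Sketch)

The "Reload Process" and "Implementation Details" sections describe a reload that is triggered by SIGHUP and keeps the previous configuration if loading fails. The following is a minimal sketch of that pattern, not hbd's actual code. The names `ReloadableConfig`, `install_sighup_handler`, and the `loader` callable are hypothetical, and the loader stands in for whatever parsing and validation hbd performs.

```python
import signal
import threading
from typing import Any, Callable, Dict

Config = Dict[str, Any]


class ReloadableConfig:
    """Holds the active config and swaps it only after a successful load."""

    def __init__(self, loader: Callable[[], Config]) -> None:
        # Hypothetical loader: reads, parses, and validates the config file,
        # raising an exception on any problem.
        self._loader = loader
        self._lock = threading.Lock()
        self.current: Config = loader()  # initial load; errors propagate

    def reload(self) -> bool:
        """Try to load a new config; keep the previous one on any failure."""
        try:
            candidate = self._loader()
        except Exception as exc:
            print(f"ERROR: reload failed, keeping previous configuration: {exc}")
            return False
        with self._lock:
            self.current = candidate
        print("INFO: configuration reload completed successfully")
        return True


def install_sighup_handler() -> threading.Event:
    """Register a SIGHUP handler that only sets a flag.

    The handler does no parsing itself; a main loop is expected to notice
    the flag and call ReloadableConfig.reload(). Unix only.
    """
    reload_requested = threading.Event()

    def _on_sighup(signum, frame):
        reload_requested.set()

    signal.signal(signal.SIGHUP, _on_sighup)
    return reload_requested
```

In a main loop, one would check `reload_requested.is_set()`, clear the event, and call `reload()`. Keeping the signal handler to a single flag update avoids doing file I/O inside the handler, which is one common way to keep reloads safe; whether hbd does exactly this is an assumption.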
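
## Appendix: Detecting Restart-Required Changes (Sketch)

The "Requires Restart" list can be turned into a quick pre-reload check that reports which edited settings will not take effect on SIGHUP. This is an illustrative sketch only: the function name `restart_required_changes` is hypothetical, and it assumes the listed settings are top-level keys in the parsed configuration, which this document does not confirm.

```python
from typing import Any, Dict, List

# Keys listed under "Requires Restart" in this document.
RESTART_REQUIRED_KEYS = (
    "hb_port", "hbd_port", "ws_port", "wss_port",
    "cert_path", "wss_pem", "wss_key",
    "pickfile", "logfile",
    "journal_enabled", "journal_dir", "journal_file",
    "journal_max_size", "journal_max_backups",
)


def restart_required_changes(old: Dict[str, Any], new: Dict[str, Any]) -> List[str]:
    """Return the restart-only keys whose values differ between two configs.

    Assumes both arguments are already-parsed config mappings with the
    restart-only settings at the top level (an assumption, see above).
    """
    return [key for key in RESTART_REQUIRED_KEYS if old.get(key) != new.get(key)]


if __name__ == "__main__":
    # Illustrative values only; the port numbers are made up.
    running = {"hb_port": 50003, "grace": 2}
    edited = {"hb_port": 50004, "grace": 3}
    for key in restart_required_changes(running, edited):
        print(f"WARNING: '{key}' changed; a restart is required for it to apply")
```

A non-empty result suggests scheduling a restart instead of relying on SIGHUP; changes to reloadable keys such as `grace` are deliberately ignored by this check.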