Configuration Reload

The heartbeat daemon (hbd) can reload most of its configuration at runtime, without a full restart. Settings listed under "Fully Reloadable" take effect while the service keeps running; those listed under "Requires Restart" do not.

How to Reload Configuration

Send a SIGHUP signal to the running hbd process:

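# Find the process ID (the [h] keeps grep from matching its own process)
ps aux | grep '[h]bd'

# Or use pidof/pgrep
pidof hbd
pgrep -f hbd   # -f matches the full command line; confirm it returns only the hbd PID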

# Send SIGHUP signal
kill -HUP <pid>

# Or if using systemd (the unit file must define ExecReload=)
systemctl reload heartbeat

What Can Be Reloaded

The following configuration sections can be reloaded without restarting:

Fully Reloadable

  • Notification Channels (notification_channels)

    • Add, remove, or modify notification channel definitions
    • Update tokens, API keys, SMTP credentials
    • Change recipient lists
  • Threshold Configurations (threshold_configs)

    • Modify warning and critical thresholds
    • Add or remove threshold rules
    • Change operators and hysteresis values
    • Update display formats
  • Host Configuration (hosts)

    • Change watch status
    • Update notification channel assignments
    • Modify threshold config assignments
    • Change dyndns status
  • Host Lists

    • watchhosts - hosts to monitor
    • dyndnshosts - hosts with dynamic DNS
    • drophosts - hosts to ignore
  • Runtime Settings

    • grace - grace period multiplier
    • interval - expected heartbeat interval
    • threshold_renotify_interval - re-notification interval
    • debug - debug level
    • verbose - verbose output
  • DNS Settings

    • dyndomains - dynamic DNS domains
    • nsupdate_bin - nsupdate binary path
    • rndc_key - RNDC key path

⚠️ Requires Restart

The following settings cannot be reloaded and require a service restart:

  • Network Ports

    • hb_port - UDP heartbeat port
    • hbd_port - HTTP API port
    • ws_port - WebSocket port
    • wss_port - Secure WebSocket port
  • SSL/TLS Settings

    • cert_path - SSL certificate path
    • wss_pem - SSL certificate file
    • wss_key - SSL key file
  • Persistence

    • pickfile - Pickle file path
  • Logging

    • logfile - Log file path
  • Journal Settings

    • journal_enabled - Enable/disable journaling
    • journal_dir - Journal directory
    • journal_file - Journal filename
    • journal_max_size - Maximum journal size
    • journal_max_backups - Number of backup files
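
Before sending SIGHUP, it can help to know whether an edit touched any of the restart-only settings above. The following is a minimal sketch, not part of hbd: the function name `restart_required_changes` is hypothetical, it assumes these settings are top-level keys in the parsed config, and the port numbers are made-up sample values.

```python
# Keys listed under "Requires Restart" in this document.
RESTART_REQUIRED_KEYS = {
    "hb_port", "hbd_port", "ws_port", "wss_port",
    "cert_path", "wss_pem", "wss_key",
    "pickfile", "logfile",
    "journal_enabled", "journal_dir", "journal_file",
    "journal_max_size", "journal_max_backups",
}


def restart_required_changes(old_cfg: dict, new_cfg: dict) -> list[str]:
    """Return the restart-only keys whose values differ between two configs.

    Assumption: the settings are top-level keys of the parsed config.
    """
    return sorted(
        key for key in RESTART_REQUIRED_KEYS
        if old_cfg.get(key) != new_cfg.get(key)
    )


# Illustrative values only; these are not hbd defaults.
old = {"hb_port": 50003, "grace": 2, "logfile": "/var/log/heartbeat.log"}
new = {"hb_port": 50004, "grace": 3, "logfile": "/var/log/heartbeat.log"}
print(restart_required_changes(old, new))  # ['hb_port']
```

An empty result suggests SIGHUP is enough; a non-empty one names the settings that will only take effect after a restart.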

Reload Process

When a SIGHUP signal is received:

  1. Configuration File Loading

    • The config file is re-read from disk
    • YAML parsing is performed
    • Validation checks are run
  2. Component Updates

    • Notification system is updated with new channel definitions
    • Threshold checker reloads all threshold configurations
    • Alert states are preserved to maintain hysteresis
  3. Error Handling

    • If reload fails, the previous configuration is kept
    • Error messages are logged
    • Service continues running with old configuration
  4. Logging

    • Reload start and completion are logged
    • Each component reports its reload status
    • Total number of thresholds is reported
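
The load, validate, and fall-back behaviour described in the steps above can be sketched as follows. This is an illustration under stated assumptions, not hbd's actual code: `reload_config`, `load`, `validate`, and `require_interval` are hypothetical names, and the loader is passed in as a callable so the sketch does not depend on a particular YAML library.

```python
import logging
from typing import Callable

log = logging.getLogger("hbd.reload")  # hypothetical logger name


def reload_config(
    current: dict,
    load: Callable[[], dict],
    validate: Callable[[dict], None],
) -> dict:
    """Return the new config if it loads and validates, else `current`."""
    log.info("Starting configuration reload...")
    try:
        candidate = load()        # step 1: re-read and parse the file
        validate(candidate)       # step 1: run validation checks
    except Exception as exc:      # step 3: keep the previous configuration
        log.error("Configuration reload failed: %s", exc)
        return current
    log.info("Configuration reload completed successfully")
    return candidate              # step 2: components are updated from this


def require_interval(cfg: dict) -> None:
    """Example validation rule (an assumption): interval must be positive."""
    if cfg.get("interval", 0) <= 0:
        raise ValueError("interval must be positive")


running = {"interval": 30}
running = reload_config(running, lambda: {"interval": 0}, require_interval)
print(running)  # {'interval': 30}, the invalid candidate was rejected
```

The candidate replaces the running configuration only after every check has passed, so a bad file leaves the service on its old settings.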

Example Reload Session

# Terminal 1: Watch the logs
tail -f /var/log/heartbeat.log

# Terminal 2: Edit configuration
vim /path/to/.hb.yaml

# Make changes to notification channels or thresholds
# Save the file

# Terminal 3: Trigger reload (check first that pgrep -f hbd returns only the hbd PID)
kill -HUP $(pgrep -f hbd)

# Terminal 1: See reload messages
2026-04-01 12:34:56 INFO: Received SIGHUP, initiating config reload...
2026-04-01 12:34:56 INFO: ============================================================
2026-04-01 12:34:56 INFO: Starting configuration reload...
2026-04-01 12:34:56 INFO: ============================================================
2026-04-01 12:34:56 INFO: Configuration reloaded from /path/to/.hb.yaml
2026-04-01 12:34:56 INFO: Notification configuration reloaded
2026-04-01 12:34:56 INFO: Reloading threshold configuration...
2026-04-01 12:34:56 INFO: Threshold configuration reloaded: 42 total thresholds
2026-04-01 12:34:56 INFO: ============================================================
2026-04-01 12:34:56 INFO: Configuration reload completed successfully
2026-04-01 12:34:56 INFO: ============================================================
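
For scripted reloads, the same check can be automated by scanning the log for the start and completion messages shown above. A minimal sketch follows; the helper name `last_reload_succeeded` is hypothetical, and it assumes the log messages keep exactly the wording in this example.

```python
START_MARKER = "Starting configuration reload..."
SUCCESS_MARKER = "Configuration reload completed successfully"


def last_reload_succeeded(log_lines: list[str]) -> bool:
    """True if the most recent reload attempt in the log ended in success."""
    started = False
    succeeded = False
    for line in log_lines:
        if START_MARKER in line:
            # A new attempt resets the result of any earlier one.
            started, succeeded = True, False
        elif started and SUCCESS_MARKER in line:
            succeeded = True
    return succeeded


sample = [
    "2026-04-01 12:34:56 INFO: Starting configuration reload...",
    "2026-04-01 12:34:56 INFO: Threshold configuration reloaded: 42 total thresholds",
    "2026-04-01 12:34:56 INFO: Configuration reload completed successfully",
]
print(last_reload_succeeded(sample))  # True
```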

Common Use Cases

1. Update Notification Credentials

If you need to rotate API keys or update SMTP passwords:

notification_channels:
  pushover_standard:
    type: pushover
    token: new-token-here    # Updated
    user: new-user-key-here  # Updated

Edit the config file and send SIGHUP; no restart is needed.

2. Adjust Threshold Values

Fine-tune alerting thresholds based on observed behavior:

threshold_configs:
  default:
    thresholds:
      cpu_monitor:
        cpu_percent:
          warning: 85.0   # Increased from 80.0
          critical: 95.0  # Increased from 90.0

Send SIGHUP to apply the new thresholds immediately.
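
New thresholds are applied to hosts that may already be alerting, which is why preserved alert state matters for hysteresis. The sketch below illustrates one common interpretation and is not hbd's implementation: it assumes a "greater than or equal" operator and that an active alert clears only once the value drops to the threshold minus the hysteresis. The function name and the hysteresis value of 5.0 are hypothetical.

```python
def next_alert_state(value: float, threshold: float,
                     hysteresis: float, alerting: bool) -> bool:
    """Return the new alert state for one sample.

    Assumed semantics: raise at value >= threshold; once alerting, stay
    alerting until value <= threshold - hysteresis.
    """
    if alerting:
        return value > threshold - hysteresis
    return value >= threshold


# With warning: 85.0 and an assumed hysteresis of 5.0, the same reading
# gives a different result depending on the preserved state.
print(next_alert_state(83.0, 85.0, 5.0, alerting=True))   # True
print(next_alert_state(83.0, 85.0, 5.0, alerting=False))  # False
```

If the alert state were discarded on reload, a host hovering just below the threshold would clear and then re-alert, producing duplicate notifications.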

3. Add New Notification Channels

Add a new notification destination:

notification_channels:
  email_oncall:
    type: email
    recipients: [oncall@example.com]
    sender: alerts@example.com
    smtp_server: smtp.example.com

hosts:
  critical_server:
    threshold_config: default
    watch: true
    notification_channels: [pushover_standard, email_oncall]  # Added

The new channel becomes active immediately after SIGHUP.

4. Update Watch List

Start or stop monitoring hosts without a restart:

hosts:
  new_server:
    threshold_config: default
    watch: true           # Start watching
    notification_channels: [pushover_standard]

Best Practices

  1. Test Configuration Before Reload

    • Validate YAML syntax before sending SIGHUP
    • Check for typos in channel names
    • Verify threshold values are reasonable
  2. Monitor Reload Logs

    • Always check logs after reload to confirm success
    • Look for error messages if reload fails
    • Verify expected number of thresholds loaded
  3. Backup Before Changes

    • Keep a backup of working configuration
    • Use version control (git) for config files
    • Document why changes were made
  4. Gradual Rollout

    • Test changes on development server first
    • Apply to one production server at a time
    • Verify behavior before applying everywhere
  5. Plan for Restart-Required Changes

    • Schedule downtime for port or SSL changes
    • Use blue-green deployment if possible
    • Keep service downtime minimal
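
The checks in item 1 go beyond YAML syntax: a typo in a channel name parses fine but leaves a host pointing at a channel that does not exist. The following sketch shows one way to catch that before sending SIGHUP. It is an illustration only; `find_reference_errors` is a hypothetical helper, and it assumes the config has already been parsed into a dict with the `hosts`, `notification_channels`, and `threshold_configs` layout used in the examples in this document.

```python
def find_reference_errors(cfg: dict) -> list[str]:
    """List hosts that reference undefined channels or threshold configs."""
    channels = cfg.get("notification_channels") or {}
    threshold_configs = cfg.get("threshold_configs") or {}
    errors = []
    for host, settings in (cfg.get("hosts") or {}).items():
        settings = settings or {}  # tolerate a host entry with no body
        for name in settings.get("notification_channels", []):
            if name not in channels:
                errors.append(f"{host}: unknown notification channel '{name}'")
        tc = settings.get("threshold_config")
        if tc is not None and tc not in threshold_configs:
            errors.append(f"{host}: unknown threshold config '{tc}'")
    return errors


cfg = {
    "notification_channels": {"pushover_standard": {"type": "pushover"}},
    "threshold_configs": {"default": {}},
    "hosts": {
        "critical_server": {
            "threshold_config": "default",
            "watch": True,
            # "email_oncal" is a deliberate typo for this example.
            "notification_channels": ["pushover_standard", "email_oncal"],
        }
    },
}
print(find_reference_errors(cfg))
```

An empty list means every host reference resolves; anything else is worth fixing before the reload.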

Troubleshooting

Reload Doesn't Apply Changes

Check:

  • Is the config file path correct?
  • Did you save the file after editing?
  • Are there YAML syntax errors?
  • Do the logs show any error messages?

Solution:

# Validate YAML syntax (requires PyYAML; run from the directory containing .hb.yaml)
python3 -c "import yaml; yaml.safe_load(open('.hb.yaml'))"

# Check file modification time
ls -l .hb.yaml

# View logs
journalctl -u heartbeat -f

Partial Configuration Applied

Cause: One or more components failed to reload while others succeeded, so only some sections reflect the new configuration.

Solution: Check logs to see which components failed. Common issues:

  • Invalid channel type
  • Missing required threshold fields
  • Invalid host references

Service Becomes Unresponsive

Cause: A malformed configuration raised an exception during reload.

Solution:

  1. Revert to backup configuration
  2. Send SIGHUP again to reload the good config
  3. If service is completely stuck, restart it

Implementation Details

The reload mechanism uses:

  • Signal Handling: SIGHUP triggers reload event
  • Async-Safe Reloading: Configuration is loaded asynchronously
  • Component Coordination: All affected components are updated atomically
  • State Preservation: Alert states and hysteresis information are maintained
  • Error Recovery: Failed reloads don't affect running configuration

See Also