heartbeat/docs/THRESHOLD_ALERTING.md
andreas a534c06b26 feat: nagios operator for direct exit-code severity mapping
Add ComparisonOperator.NAGIOS ("nagios") that maps Nagios exit codes
directly to alert levels (0=OK 1=WARNING 2=CRITICAL 3=UNKNOWN) without
requiring numeric warning/critical thresholds. Hysteresis is bypassed for
discrete codes. Display template defaults to "{check_name}: {output}".
_format_display() handles None threshold_value gracefully.

Add nagios_runner.status_code as a built-in default threshold config so
nagios checks alert out of the box.

Also: fix alerts.html scrolling (override html,body), make hostname a link
to /plugins#<hostname>, remove overall_status/overall_status_code/plugin_count
from nagios_runner and hbc_mini, replace with computed worst-status in
plugins.html via nagiosWorstStatus() helper.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-05 12:26:56 -04:00

Threshold Alerting System

Overview

The Heartbeat Monitoring System includes a threshold alerting system that evaluates plugin metrics and sends notifications when a value crosses a configured threshold. This system is designed to:

  • Detect anomalies: Automatically identify when system metrics leave safe operating ranges
  • Prevent alert fatigue: Use hysteresis to prevent notification flapping
  • Escalate appropriately: Support WARNING and CRITICAL severity levels
  • Track state: Maintain alert history and state transitions per host
  • Integrate seamlessly: Work with existing notification infrastructure (email, pushover, etc.)

Architecture

Components

  1. ThresholdChecker (hbd/threshold.py)

    • Main threshold checking engine
    • Parses configuration
    • Evaluates metrics against thresholds
    • Triggers notifications on state changes
  2. ThresholdConfig

    • Individual threshold configuration
    • Supports multiple comparison operators
    • Implements hysteresis logic
  3. AlertState

    • Tracks current alert state per metric
    • Records state transitions
    • Manages notification timing
  4. Integration Points

    • UDP handler: Checks thresholds when plugin data arrives
    • Host objects: Store alert states per host
    • Notification system: Sends alerts via configured channels

Alert Levels

  • OK: Metric is within normal range
  • WARNING: Metric has crossed the warning threshold (first-level concern)
  • CRITICAL: Metric has crossed the critical threshold (requires immediate attention)
  • UNKNOWN: Metric value cannot be evaluated (e.g., non-numeric data)

Configuration

Basic Structure

Thresholds are configured in the YAML configuration file under the thresholds section:

thresholds:
  plugin_name:
    metric_name:
      warning: 80.0
      critical: 90.0
      operator: ">"
      hysteresis: 0.1
      display: "display format"
      enabled: true

Configuration Parameters

Required Parameters

  • warning: Warning threshold value (numeric)
  • critical: Critical threshold value (numeric)

Note: At least one of warning or critical must be specified.

Optional Parameters

  • operator: Comparison operator (default: ">")

    • ">" - Greater than
    • ">=" - Greater than or equal
    • "<" - Less than
    • "<=" - Less than or equal
    • "==" - Equal to
    • "!=" - Not equal to
  • hysteresis: Hysteresis margin, given as a fraction of the threshold, to prevent flapping (default: 0.1 = 10%)

    • Range: 0.0 to 1.0
    • Prevents rapid state transitions when value hovers near threshold
  • display: Format-string template for alert messages; placeholders in braces are filled in when the alert is rendered

    • Defaults to "(threshold: {op_symbol} {threshold_value})"
  • enabled: Whether this threshold is active (default: true)
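The parameters and defaults listed above can be modelled as a small data class. This is a minimal sketch, not the actual ThresholdConfig from hbd/threshold.py: the class name ThresholdSketch, the helper parse_threshold, and the validation messages are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_DISPLAY = "(threshold: {op_symbol} {threshold_value})"
VALID_OPERATORS = {">", ">=", "<", "<=", "==", "!="}


@dataclass
class ThresholdSketch:
    """Hypothetical stand-in for one metric's threshold entry."""
    warning: Optional[float] = None
    critical: Optional[float] = None
    operator: str = ">"
    hysteresis: float = 0.1
    display: str = DEFAULT_DISPLAY
    enabled: bool = True

    def __post_init__(self):
        # At least one of warning or critical must be specified.
        if self.warning is None and self.critical is None:
            raise ValueError("at least one of warning or critical is required")
        if self.operator not in VALID_OPERATORS:
            raise ValueError(f"unsupported operator: {self.operator!r}")
        if not 0.0 <= self.hysteresis <= 1.0:
            raise ValueError("hysteresis must be between 0.0 and 1.0")


def parse_threshold(raw: dict) -> ThresholdSketch:
    """Build a ThresholdSketch from one metric's mapping in the YAML file."""
    return ThresholdSketch(**raw)


cfg = parse_threshold({"warning": 80.0, "critical": 90.0})
print(cfg.operator, cfg.hysteresis, cfg.enabled)
print(cfg.display.format(op_symbol=cfg.operator, threshold_value=cfg.critical))
```

Omitted keys fall back to the documented defaults, and the display template is filled in with str.format, which is why it is described as a format string rather than an f-string.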

Comparison Operators

Greater Than (>, >=)

Used for metrics where higher values are problematic:

cpu_monitor:
  cpu_percent:
    warning: 80.0      # Alert when CPU > 80%
    critical: 90.0     # Alert when CPU > 90%
    operator: ">"

Examples:

  • CPU usage percentage
  • Memory usage percentage
  • Disk usage percentage
  • Load average
  • Error counters

Less Than (<, <=)

Used for metrics where lower values are problematic:

memory_monitor:
  available_mb:
    warning: 1000      # Alert when available memory < 1GB
    critical: 500      # Alert when available memory < 500MB
    operator: "<"

Examples:

  • Available memory
  • Free disk space
  • Connection pool availability
  • Battery level
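How an operator turns a value into an alert level can be sketched as follows. This is an illustration under assumptions, not the code in hbd/threshold.py: the function name evaluate and the rule that the critical threshold is tested before the warning threshold are both hypothetical, and hysteresis is left out.

```python
import operator
from typing import Optional

# Map the configured operator string to a comparison function.
OPERATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
}


def evaluate(value, warning: Optional[float], critical: Optional[float], op: str = ">") -> str:
    """Return "OK", "WARNING", "CRITICAL", or "UNKNOWN" for one metric value."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return "UNKNOWN"  # non-numeric data cannot be compared
    compare = OPERATORS[op]
    # Assumption: critical is tested first so the more severe level wins.
    if critical is not None and compare(value, critical):
        return "CRITICAL"
    if warning is not None and compare(value, warning):
        return "WARNING"
    return "OK"


print(evaluate(85.0, 80.0, 90.0, ">"))  # CPU at 85% with warning 80, critical 90
print(evaluate(400, 1000, 500, "<"))    # 400 MB available with warning 1000, critical 500
print(evaluate("n/a", 80.0, 90.0))      # non-numeric value
```

With the "<" operator the warning value is numerically larger than the critical value, which is why available_mb uses warning: 1000 and critical: 500.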

Hysteresis

Hysteresis prevents alert flapping by requiring values to improve by a certain amount before recovering from an alert state.

How It Works

When a metric crosses a threshold (e.g., CPU goes from 85% to 91%, triggering CRITICAL), hysteresis is applied when the value improves:

Threshold: 90
Hysteresis: 0.1 (10%)
Recovery threshold: 90 - (90 * 0.1) = 81

Value 91 -> CRITICAL (threshold crossed)
Value 89 -> CRITICAL (still above recovery threshold of 81)
Value 85 -> CRITICAL (still above recovery threshold)
Value 80 -> WARNING or OK (below recovery threshold, re-evaluated normally)
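The recovery arithmetic above can be sketched in a few lines. The ">" case follows the documented formula; the mirrored "<" case and the function names recovery_threshold and holds_alert are assumptions, since this document only spells out the ">" example.

```python
def recovery_threshold(threshold: float, hysteresis: float, op: str = ">") -> float:
    """Value the metric must pass before an alert level is released."""
    margin = abs(threshold) * hysteresis
    if op in (">", ">="):
        return threshold - margin  # documented case: 90 - (90 * 0.1) = 81
    if op in ("<", "<="):
        return threshold + margin  # assumption: mirror image for "low is bad"
    return threshold  # assumption: no margin for "==" and "!="


def holds_alert(value: float, threshold: float, hysteresis: float, op: str = ">") -> bool:
    """True while a previously raised alert should stay at its level."""
    release_at = recovery_threshold(threshold, hysteresis, op)
    if op in (">", ">="):
        return value >= release_at  # must drop below the recovery threshold
    if op in ("<", "<="):
        return value <= release_at  # must rise above the recovery threshold
    return False


for value in (91, 89, 85, 80):
    state = "held" if holds_alert(value, 90, 0.1) else "released"
    print(value, state)
```

For the sequence in the example, 91, 89 and 85 are held at CRITICAL and 80 is released for normal re-evaluation.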

Configuration Recommendations

  • Stable metrics (CPU, memory): 10-15% hysteresis

    hysteresis: 0.1
    
  • Very stable metrics (disk usage): 5% hysteresis

    hysteresis: 0.05
    
  • Counter metrics (errors, packets): 20% hysteresis

    hysteresis: 0.2
    
  • Discrete states (exit codes): No hysteresis

    hysteresis: 0.0
    

Plugin-Specific Configuration

CPU Monitor

cpu_monitor:
  cpu_percent:
    warning: 80.0
    critical: 90.0
    operator: ">"
    hysteresis: 0.1
  
  load_1min:
    warning: 4.0
    critical: 8.0
    operator: ">"
    hysteresis: 0.15
  
  load_5min:
    warning: 3.0
    critical: 6.0
    operator: ">"
  
  load_15min:
    warning: 2.0
    critical: 4.0
    operator: ">"

Memory Monitor

memory_monitor:
  # Percentage-based threshold
  percent:
    warning: 85.0
    critical: 95.0
    operator: ">"
  
  # Absolute value threshold (inverse - alert when LOW)
  available_mb:
    warning: 1000
    critical: 500
    operator: "<"
  
  # Swap usage
  swap_percent:
    warning: 50.0
    critical: 80.0
    operator: ">"

Disk Monitor

Disk thresholds support partition-specific configuration:

disk_monitor:
  partitions:
    /:
      percent:
        warning: 80.0
        critical: 90.0
        operator: ">"
        hysteresis: 0.05
      
      free_gb:
        warning: 10.0
        critical: 5.0
        operator: "<"
    
    /home:
      percent:
        warning: 85.0
        critical: 95.0
        operator: ">"
    
    /var:
      percent:
        warning: 80.0
        critical: 90.0
        operator: ">"
      
      free_gb:
        warning: 5.0
        critical: 2.0
        operator: "<"
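Partition-specific thresholds end up as dotted metric paths such as disk_monitor./.percent, the form used for alert state keys elsewhere in this document. A minimal sketch of that flattening, assuming a hypothetical helper named flatten_thresholds (the real parser may work differently):

```python
def flatten_thresholds(plugin: str, section: dict) -> dict:
    """Map dotted metric paths to their threshold settings."""
    flat = {}
    for key, value in section.items():
        if key == "partitions":
            # One extra nesting level: mount point, then metric name.
            for mount, metrics in value.items():
                for metric, settings in metrics.items():
                    flat[f"{plugin}.{mount}.{metric}"] = settings
        else:
            flat[f"{plugin}.{key}"] = value
    return flat


disk_section = {
    "partitions": {
        "/": {
            "percent": {"warning": 80.0, "critical": 90.0, "operator": ">"},
            "free_gb": {"warning": 10.0, "critical": 5.0, "operator": "<"},
        },
        "/home": {
            "percent": {"warning": 85.0, "critical": 95.0, "operator": ">"},
        },
    }
}

for path in sorted(flatten_thresholds("disk_monitor", disk_section)):
    print(path)
```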

Network Monitor

network_monitor:
  # Error counters
  errors_total:
    warning: 100
    critical: 1000
    operator: ">"
    hysteresis: 0.2
  
  # Dropped packets
  dropin_total:
    warning: 50
    critical: 200
    operator: ">"
  
  dropout_total:
    warning: 50
    critical: 200
    operator: ">"
  
  # Connection states
  connections_TIME_WAIT:
    warning: 1000
    critical: 5000
    operator: ">"
  
  connections_ESTABLISHED:
    warning: 500
    critical: 1000
    operator: ">"

Nagios Runner

The Nagios plugin runner reports plugin exit codes (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN) that can be thresholded:

nagios_runner:
  exit_code:
    warning: 1       # Map Nagios WARNING to our WARNING
    critical: 2      # Map Nagios CRITICAL to our CRITICAL (with ">=", exit code 3 also matches)
    operator: ">="
    hysteresis: 0.0  # No hysteresis for exit codes

Notification Behavior

When Notifications Are Sent

Notifications are triggered on state changes:

  1. Escalation: OK → WARNING, OK → CRITICAL, WARNING → CRITICAL

    WARNING: webserver01 - cpu_monitor.cpu_percent = 85.0
    
  2. Recovery: CRITICAL → WARNING, CRITICAL → OK, WARNING → OK

    RECOVERED: webserver01 - cpu_monitor.cpu_percent = 70.0 (CRITICAL -> OK)
    
  3. Re-notifications: Periodic reminders for ongoing alerts

    REMINDER (CRITICAL): webserver01 - cpu_monitor.cpu_percent = 95.0 (ongoing for 3600s)
    

Notification Frequency

  • State changes: Immediate notification
  • Re-notifications: Controlled by threshold_renotify_interval (default: 3600 seconds = 1 hour)

threshold_renotify_interval: 3600  # Re-notify every hour for ongoing alerts
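The notification rules above (immediate on state change, periodic reminders otherwise) can be sketched as a single decision function. The name notification_kind, the returned labels, and the handling of only OK, WARNING and CRITICAL are assumptions for illustration; the real ThresholdChecker may decide differently, in particular for UNKNOWN.

```python
from typing import Optional

SEVERITY = {"OK": 0, "WARNING": 1, "CRITICAL": 2}


def notification_kind(previous: str, current: str, now: float,
                      last_notification: Optional[float],
                      renotify_interval: float = 3600.0) -> Optional[str]:
    """Return "ESCALATION", "RECOVERY", "REMINDER", or None (send nothing)."""
    if SEVERITY[current] > SEVERITY[previous]:
        return "ESCALATION"  # e.g. OK -> WARNING, WARNING -> CRITICAL
    if SEVERITY[current] < SEVERITY[previous]:
        return "RECOVERY"    # e.g. CRITICAL -> OK
    if current == "OK":
        return None          # nothing to remind about
    # Same non-OK level as before: remind once the interval has elapsed.
    if last_notification is not None and now - last_notification >= renotify_interval:
        return "REMINDER"
    return None


print(notification_kind("OK", "WARNING", now=100.0, last_notification=None))
print(notification_kind("CRITICAL", "CRITICAL", now=3600.0, last_notification=0.0))
print(notification_kind("CRITICAL", "CRITICAL", now=1800.0, last_notification=0.0))
```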

Notification Channels

The system supports centralized notification channel definitions, allowing different hosts to use different notification providers and credentials. This provides fine-grained control over who gets notified about what.

Supported Channel Types

  • Email (via SMTP)
  • Pushover (mobile notifications)
  • Signal (via signal-cli)
  • Mattermost (team chat webhooks)

Centralized Channel Configuration

Define notification channels once in the configuration file:

notification_channels:
  # Signal notifications
  signal_ops:
    type: signal
    cli_path: /usr/local/bin/signal-cli
    user: +1234567890
    recipient: +1234567890
  
  # Email notifications
  email_ops:
    type: email
    recipients: [ops@example.com, alerts@example.com]
    sender: heartbeat@example.com
    smtp_server: smtp.example.com
    smtp_port: 587
    smtp_user: heartbeat@example.com
    smtp_password: your-smtp-password
  
  # Pushover notifications
  pushover_urgent:
    type: pushover
    token: your-pushover-app-token
    user: your-pushover-user-key
  
  # Mattermost notifications
  mattermost_devops:
    type: mattermost
    host: mattermost.example.com
    token: your-webhook-token
    channel: devops-alerts
    username: heartbeat-bot
    icon: https://example.com/heartbeat-icon.png

# Default channels for hosts that don't specify channels
default_notification_channels: [email_ops]

Per-Host Channel Assignment

Assign notification channels to specific hosts in the hosts section:

hosts:
  # Critical server - multiple notification channels
  prod-web-01:
    threshold_config: high_sensitivity
    watch: true
    notification_channels: [signal_ops, pushover_urgent, email_ops]
    dyndns: false
  
  # Database server - ops team only
  prod-db-01:
    threshold_config: database
    watch: true
    notification_channels: [signal_ops, email_ops]
    dyndns: false
  
  # Development server - email only
  dev-server-01:
    threshold_config: low_sensitivity
    watch: false
    notification_channels: [email_ops]
    dyndns: false
  
  # Uses default_notification_channels if not specified
  test-server-01:
    threshold_config: default
    watch: false
    dyndns: false

Watched Hosts

Only hosts with watch: true in the hosts section will trigger notifications:

hosts:
  webserver01:
    watch: true
    notification_channels: [email_ops]
  
  database01:
    watch: true
    notification_channels: [signal_ops, email_ops]
  
  mailserver:
    watch: true
    notification_channels: [pushover_urgent]

Hosts not marked for watching will still have thresholds checked and alert states tracked, but won't send notifications.
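Putting the channel assignment and the watch flag together, channel selection for a host could look like the sketch below. The function name channels_for_host, the host staging-01, and the choice to skip channel names that have no definition are assumptions; only the configuration keys come from this document.

```python
def channels_for_host(config: dict, hostname: str) -> list:
    """Return the channel names that should receive alerts for a host."""
    host = (config.get("hosts") or {}).get(hostname) or {}
    if not host.get("watch", False):
        return []  # unwatched hosts are still checked, but nothing is sent
    names = host.get("notification_channels")
    if names is None:
        names = config.get("default_notification_channels", [])
    defined = config.get("notification_channels", {})
    # Assumption: names without a matching channel definition are skipped.
    return [name for name in names if name in defined]


config = {
    "notification_channels": {
        "email_ops": {"type": "email"},
        "signal_ops": {"type": "signal"},
    },
    "default_notification_channels": ["email_ops"],
    "hosts": {
        "prod-db-01": {"watch": True, "notification_channels": ["signal_ops", "email_ops"]},
        "staging-01": {"watch": True},  # hypothetical host without its own channels
        "dev-server-01": {"watch": False, "notification_channels": ["email_ops"]},
    },
}

for name in ("prod-db-01", "staging-01", "dev-server-01"):
    print(name, channels_for_host(config, name))
```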

Alert State Tracking

Each host maintains alert states for all monitored metrics:

host.alert_states = {
    "cpu_monitor.cpu_percent": AlertState(level=WARNING, since=1234567890),
    "memory_monitor.percent": AlertState(level=CRITICAL, since=1234567800),
    "disk_monitor./.percent": AlertState(level=OK, since=1234567700),
}

Alert states persist in memory and are saved with host data (pickle).

Alert State Information

Each AlertState tracks:

  • level: Current alert level (OK, WARNING, CRITICAL, UNKNOWN)
  • since: Timestamp when current state started
  • last_value: Most recent metric value
  • last_check: Timestamp of last threshold check
  • notification_count: Number of notifications sent for this alert
  • last_notification: Timestamp of last notification
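The tracked fields map naturally onto a data class. The sketch below is hypothetical and is not the AlertState class from hbd/threshold.py: the class name, the methods update and mark_notified, and the decision to reset notification_count on a level change are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlertStateSketch:
    """Hypothetical per-metric alert state with the fields listed above."""
    level: str = "OK"
    since: float = 0.0
    last_value: Optional[float] = None
    last_check: float = 0.0
    notification_count: int = 0
    last_notification: Optional[float] = None

    def update(self, level: str, value: float, now: float) -> bool:
        """Record one threshold check; return True when the level changed."""
        changed = level != self.level
        if changed:
            self.level = level
            self.since = now
            self.notification_count = 0  # assumption: counter restarts per alert
        self.last_value = value
        self.last_check = now
        return changed

    def mark_notified(self, now: float) -> None:
        """Record that a notification was sent for the current state."""
        self.notification_count += 1
        self.last_notification = now


state = AlertStateSketch()
if state.update("WARNING", 85.0, now=1000.0):
    state.mark_notified(now=1000.0)
state.update("WARNING", 86.0, now=1300.0)  # same level: since is unchanged
print(state)
```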

Querying Alert States

Via HTTP API (future enhancement):

GET /api/hosts/webserver01/alerts

Response:

{
  "active_alerts": [
    {
      "metric": "cpu_monitor.cpu_percent",
      "level": "WARNING",
      "since": 1234567890,
      "value": 85.0,
      "duration": 300
    }
  ],
  "summary": {
    "ok": 15,
    "warning": 1,
    "critical": 0
  }
}
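Since the endpoint is a future enhancement, the response shape above is the only fixed point. One way such a payload might be assembled is sketched here, assuming alert states are available as plain dictionaries with level, since and value keys; the function name alerts_payload and that input shape are hypothetical.

```python
import json


def alerts_payload(alert_states: dict, now: float) -> dict:
    """Build the response body from a mapping of metric path to state."""
    active = []
    summary = {"ok": 0, "warning": 0, "critical": 0}
    for metric, state in sorted(alert_states.items()):
        level = state["level"]
        if level.lower() in summary:
            summary[level.lower()] += 1
        if level in ("WARNING", "CRITICAL"):
            active.append({
                "metric": metric,
                "level": level,
                "since": state["since"],
                "value": state["value"],
                "duration": int(now - state["since"]),
            })
    return {"active_alerts": active, "summary": summary}


states = {
    "cpu_monitor.cpu_percent": {"level": "WARNING", "since": 1234567890, "value": 85.0},
    "memory_monitor.percent": {"level": "OK", "since": 1234567800, "value": 40.0},
}
print(json.dumps(alerts_payload(states, now=1234568190), indent=2))
```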

Testing

A comprehensive test suite is provided in test_threshold.py:

python test_threshold.py

Tests cover:

  • Threshold configuration and parsing
  • All comparison operators
  • Hysteresis functionality
  • Alert state tracking
  • State change detection
  • Notification triggering
  • Nested metrics (partitions)
  • Alert summaries

Best Practices

1. Start Conservative

Begin with higher thresholds to avoid alert fatigue:

cpu_monitor:
  cpu_percent:
    warning: 85.0    # Start higher
    critical: 95.0   # Very high for critical

Adjust downward based on observed behavior.

2. Consider Workload Patterns

Different systems have different normal ranges:

Web servers (bursty traffic):

cpu_percent:
  warning: 80.0
  critical: 90.0
  hysteresis: 0.15  # Higher hysteresis for burstiness

Database servers (steady load):

cpu_percent:
  warning: 70.0
  critical: 85.0
  hysteresis: 0.1   # Lower hysteresis for steady metrics

3. Use Appropriate Operators

Match the operator to the metric:

| Metric Type | Example | Operator | Reason |
|---|---|---|---|
| Resource usage | CPU%, Memory% | > | Alert when high |
| Available resources | Free memory, Free disk | < | Alert when low |
| Error counters | Network errors | > | Alert when increasing |
| Health checks | Nagios exit code | >= | Map to standard codes |

4. Align with Monitoring Intervals

Ensure threshold checks align with plugin collection intervals:

plugins:
  cpu_monitor:
    interval: 300    # Check every 5 minutes

thresholds:
  cpu_monitor:
    cpu_percent:
      warning: 80.0
      # Will be checked every 5 minutes

5. Test Before Production

  1. Start with disabled thresholds:

    enabled: false
    
  2. Observe metric ranges over a week

  3. Set thresholds based on observed data

  4. Enable gradually:

    enabled: true
    
  5. Monitor for false positives

6. Document Baseline Values

Keep a record of normal operating ranges:

# Production web server baseline (observed over 30 days):
# CPU: 20-40% normal, 60% peak
# Memory: 60-70% normal, 80% peak
# Disk /: 40-50% usage, growing 2%/month

cpu_monitor:
  cpu_percent:
    warning: 75.0   # Above peak + margin
    critical: 90.0  # Danger zone

7. Layer Alerts

Use WARNING for early notification, CRITICAL for immediate action:

disk_monitor:
  partitions:
    /:
      percent:
        warning: 75.0    # Early warning: "check in next few days"
        critical: 90.0   # Critical: "act now before outage"

Troubleshooting

No Notifications Being Sent

  1. Check if host is watched:

    hosts:
      your-host-name:
        watch: true
    
  2. Verify notification configuration:

    notification_channels:
      email_ops:
        type: email
        recipients: [admin@example.com]
        smtp_server: smtp.example.com
    
  3. Check threshold configuration:

    # Look for parsing errors in server logs
    grep "threshold" /var/log/heartbeat/hbd.log
    
  4. Verify metric names:

    • Metric names must match exactly (case-sensitive)
    • Check journal or logs for actual metric names

Too Many Alerts (Flapping)

  1. Increase hysteresis:

    hysteresis: 0.2  # Increase from 0.1 to 0.2 (20%)
    
  2. Adjust thresholds:

    warning: 85.0  # Increase from 80.0
    
  3. Increase renotification interval:

    threshold_renotify_interval: 7200  # 2 hours instead of 1
    

Alerts Not Triggering

  1. Check threshold operator:

    # For available memory (alert when LOW):
    operator: "<"   # NOT ">"
    
  2. Verify numeric values:

    • Ensure metric values are numeric
    • Check for unit mismatches (MB vs GB)
  3. Check if threshold is enabled:

    enabled: true  # NOT false
    
  4. Review hysteresis settings:

    • Very high hysteresis may prevent state changes
    • Try reducing or disabling temporarily

Alert State Not Recovering

  1. Check recovery threshold calculation:

    Threshold: 90
    Hysteresis: 0.1
    Recovery: 90 - (90 * 0.1) = 81
    
    Value must drop below 81 to recover
    
  2. Temporarily disable hysteresis:

    hysteresis: 0.0
    
  3. Monitor actual metric values:

    # Check journal for actual values
    grep "cpu_percent" /var/log/heartbeat/messages.journal | tail -20
    

Advanced Topics

Custom Notification Callbacks

The ThresholdChecker supports custom notification functions:

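import logging

from hbd.threshold import ThresholdChecker

logger = logging.getLogger(__name__)

def custom_notifier(message):
    # Send to incident management system
    # (pagerduty is a placeholder for your own client object)
    pagerduty.trigger(message)

    # Log to custom system
    logger.critical(message)

    # Update dashboard
    # (metrics is a placeholder for your own metrics registry)
    metrics.alert_count.inc()

# config is the parsed server configuration
checker = ThresholdChecker(
    config=config,
    notification_callback=custom_notifier
)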

Programmatic Access

Query alert states programmatically:

import time

# Get all active alerts for a host
active = threshold_checker.get_active_alerts(host.alert_states)

for alert in active:
    print(f"{alert.metric_path}: {alert.level.name} for {time.time() - alert.since}s")

# Get alert summary
summary = threshold_checker.get_alert_summary(host.alert_states)
print(f"WARNING: {summary['warning']}, CRITICAL: {summary['critical']}")

Integration with External Systems

Threshold violations can be integrated with:

  • PagerDuty: Incident creation and escalation
  • OpsGenie: On-call scheduling and routing
  • ServiceNow: Ticket creation
  • Grafana: Dashboard annotations
  • Elasticsearch: Alert indexing and analysis

Future Enhancements

Planned features:

  1. Composite thresholds: Alert based on multiple metrics

    composite:
      high_load_with_low_memory:
        conditions:
          - cpu_monitor.load_1min > 8.0
          - memory_monitor.available_mb < 500
    
  2. Time-based thresholds: Different thresholds by time of day

    schedule:
      business_hours:
        warning: 70.0
      off_hours:
        warning: 85.0
    
  3. Rate-of-change thresholds: Alert on rapid changes

    rate_of_change:
      metric: cpu_percent
      period: 300
      threshold: 30.0  # Alert if changes >30% in 5 minutes
    
  4. Alert grouping: Combine related alerts

    groups:
      disk_critical:
        metrics:
          - disk_monitor./.percent
          - disk_monitor./var.percent
        action: single_notification
    
  5. Maintenance windows: Suppress alerts during planned maintenance

    maintenance:
      - host: webserver01
        start: 2024-01-15T02:00:00Z
        end: 2024-01-15T04:00:00Z
    

See Also

Multi-Threshold Configuration

Support for multiple named threshold configurations with per-host mapping and composable layering.

Overview

The multi-threshold feature allows you to:

  • Define multiple named threshold configurations
  • Assign one or more configurations to each host
  • Compose configurations by layering — each named config's overrides are applied in order on top of the defaults
  • Use different sensitivity levels for different environments

Configuration Structure

Named configurations are defined under threshold_configs. Each host selects which ones to use via threshold_config in the hosts section (a string for a single config, or a list to layer multiple):

# Optional: set the default configuration name (defaults to "default")
default_threshold_config: "default"

threshold_configs:
  default:
    thresholds:
      cpu_monitor:
        cpu_percent:
          warning: 80.0
          critical: 90.0

  high_sensitivity:
    thresholds:
      cpu_monitor:
        cpu_percent:
          warning: 60.0
          critical: 75.0

  low_sensitivity:
    thresholds:
      cpu_monitor:
        cpu_percent:
          warning: 90.0
          critical: 95.0

hosts:
  prod-web-01:
    threshold_config: high_sensitivity   # single config

  dev-server-01:
    threshold_config: low_sensitivity

  # Hosts with no threshold_config use default_threshold_config

Composable Configurations (list form)

threshold_config can be a list. Configs are applied left to right: the defaults are the base, then each named config's overrides are layered on top. Later entries in the list win on any metric they define.

threshold_configs:
  default:
    thresholds:
      cpu_monitor:
        cpu_percent: {warning: 80, critical: 90}
      memory_monitor:
        memory_percent: {warning: 85, critical: 95}
      disk_monitor:
        partitions:
          /:
            percent: {warning: 80, critical: 90}

  # Tighter CPU limits for busy servers
  high_cpu_load:
    thresholds:
      cpu_monitor:
        cpu_percent: {warning: 60, critical: 75}

  # Tighter disk limits for data-heavy servers
  busy_disk:
    thresholds:
      disk_monitor:
        partitions:
          /:
            percent: {warning: 70, critical: 85}

hosts:
  # Gets default thresholds only
  web-01:
    threshold_config: default

  # Gets tighter CPU limits, default memory and disk
  build-server:
    threshold_config: high_cpu_load

  # Layers both: tighter CPU AND tighter disk, default memory
  db-01:
    threshold_config: [high_cpu_load, busy_disk]

  # Three layers: busy_disk overrides high_cpu_load if they conflict
  storage-01:
    threshold_config: [default, high_cpu_load, busy_disk]

How layering works:

Starting from the default thresholds:

| Layer | Applied config | Effect |
|---|---|---|
| Base | default | all default thresholds |
| +1 | high_cpu_load | cpu_percent overridden to 60/75 |
| +2 | busy_disk | disk percent overridden to 70/85; cpu_percent stays at 60/75 |

Each named config only overrides the metrics it explicitly defines. Metrics not mentioned in a config inherit from the layers beneath.
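The layering described above behaves like a recursive dictionary merge. The sketch below uses hypothetical helpers deep_merge and layer_configs and assumes overrides are merged key by key down to individual settings; the actual implementation may instead replace a metric's entry as a whole.

```python
import copy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a copy of base with override applied recursively on top."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def layer_configs(defaults: dict, threshold_configs: dict, names: list) -> dict:
    """Apply each named config left to right on top of the defaults."""
    result = copy.deepcopy(defaults)
    for name in names:
        result = deep_merge(result, threshold_configs[name]["thresholds"])
    return result


threshold_configs = {
    "default": {"thresholds": {
        "cpu_monitor": {"cpu_percent": {"warning": 80, "critical": 90}},
        "memory_monitor": {"memory_percent": {"warning": 85, "critical": 95}},
        "disk_monitor": {"partitions": {"/": {"percent": {"warning": 80, "critical": 90}}}},
    }},
    "high_cpu_load": {"thresholds": {
        "cpu_monitor": {"cpu_percent": {"warning": 60, "critical": 75}},
    }},
    "busy_disk": {"thresholds": {
        "disk_monitor": {"partitions": {"/": {"percent": {"warning": 70, "critical": 85}}}},
    }},
}

defaults = threshold_configs["default"]["thresholds"]
effective = layer_configs(defaults, threshold_configs, ["high_cpu_load", "busy_disk"])
print(effective["cpu_monitor"]["cpu_percent"])
print(effective["memory_monitor"]["memory_percent"])
print(effective["disk_monitor"]["partitions"]["/"]["percent"])
```

For db-01 in the example, this yields the tighter CPU and disk limits while memory keeps its default values, matching the table.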

Use Cases

1. Environment-Based Thresholds

Different thresholds for production vs. development:

threshold_configs:
  production:
    thresholds:
      cpu_monitor:
        cpu_percent:
          warning: 70.0   # Alert earlier in production
          critical: 85.0

  development:
    thresholds:
      cpu_monitor:
        cpu_percent:
          warning: 90.0   # More relaxed for dev
          critical: 98.0

hosts:
  prod-web-01:
    threshold_config: production
  prod-web-02:
    threshold_config: production
  dev-web-01:
    threshold_config: development
  dev-web-02:
    threshold_config: development

2. Server Role-Based Thresholds

Different thresholds based on server function:

threshold_configs:
  webserver:
    thresholds:
      cpu_monitor:
        cpu_percent:
          warning: 80.0
          critical: 90.0

  database:
    thresholds:
      cpu_monitor:
        cpu_percent:
          warning: 70.0
          critical: 85.0
      memory_monitor:
        memory_percent:
          warning: 90.0   # Databases can use high memory
          critical: 97.0
      disk_monitor:
        partitions:
          /var/lib/mysql:
            percent:
              warning: 75.0
              critical: 85.0

  cache:
    thresholds:
      memory_monitor:
        memory_percent:
          warning: 95.0   # Redis/Memcached can use very high memory
          critical: 99.0

hosts:
  web-01:
    threshold_config: webserver
  web-02:
    threshold_config: webserver
  db-01:
    threshold_config: database
  db-02:
    threshold_config: database
  redis-01:
    threshold_config: cache
  memcached-01:
    threshold_config: cache

3. Sensitivity Levels

Different sensitivity for critical vs. non-critical systems:

threshold_configs:
  critical:
    thresholds:
      disk_monitor:
        partitions:
          /:
            percent:
              warning: 70.0
              critical: 80.0
              hysteresis: 0.15

  standard:
    thresholds:
      disk_monitor:
        partitions:
          /:
            percent:
              warning: 85.0
              critical: 95.0
              hysteresis: 0.1

  relaxed:
    thresholds:
      disk_monitor:
        partitions:
          /:
            percent:
              warning: 90.0
              critical: 98.0
              hysteresis: 0.05

hosts:
  payment-gateway:
    threshold_config: critical
  auth-server:
    threshold_config: critical
  web-01:
    threshold_config: standard
  web-02:
    threshold_config: standard
  test-server:
    threshold_config: relaxed

4. Composable Profiles

Build host-specific thresholds by combining small, focused configs:

threshold_configs:
  # Baseline — everything at default levels
  default:
    thresholds:
      cpu_monitor:
        cpu_percent: {warning: 80, critical: 90}
      memory_monitor:
        memory_percent: {warning: 85, critical: 95}

  # Overlay: tighter CPU only
  tight_cpu:
    thresholds:
      cpu_monitor:
        cpu_percent: {warning: 60, critical: 75}

  # Overlay: tighter memory only
  tight_memory:
    thresholds:
      memory_monitor:
        memory_percent: {warning: 70, critical: 85}

  # Overlay: extra disk partition for database servers
  db_disk:
    thresholds:
      disk_monitor:
        partitions:
          /var/lib/postgresql:
            percent: {warning: 75, critical: 88}

hosts:
  # Plain web server
  web-01:
    threshold_config: default

  # Build server: tight CPU, default memory and disk
  build-01:
    threshold_config: tight_cpu

  # Database: tight CPU + tight memory + extra disk partition
  db-01:
    threshold_config: [tight_cpu, tight_memory, db_disk]

  # Replica database: tight memory + extra disk, normal CPU
  db-02:
    threshold_config: [tight_memory, db_disk]

Configuration Priority

  1. Host threshold_config (list): Layer each named config's overrides left-to-right on top of the defaults
  2. Host threshold_config (string): Use that single named config directly
  3. host_threshold_mapping (legacy): Same as above, string only
  4. default_threshold_config: Used for hosts with no mapping
  5. First alphabetically: If the default config is not found, use the first config alphabetically
  6. Legacy thresholds section: Used when threshold_configs is absent entirely
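The priority order can be read as a lookup chain. The following sketch, with the hypothetical function name resolve_config_names, returns the list of named configs to apply for a host and returns an empty list when threshold_configs is absent, leaving the legacy thresholds fallback to the caller. It is an illustration of the list above, not the project's resolver.

```python
def resolve_config_names(config: dict, hostname: str) -> list:
    """Return the named threshold configs to apply for a host, in order."""
    configs = config.get("threshold_configs")
    if not configs:
        return []  # caller falls back to the legacy flat `thresholds` section
    host_entry = (config.get("hosts") or {}).get(hostname) or {}
    selected = host_entry.get("threshold_config")
    if selected is None:
        # Legacy top-level mapping, string form only.
        selected = (config.get("host_threshold_mapping") or {}).get(hostname)
    if selected is None:
        selected = config.get("default_threshold_config", "default")
        if selected not in configs:
            selected = sorted(configs)[0]  # first config alphabetically
    return [selected] if isinstance(selected, str) else list(selected)


config = {
    "threshold_configs": {"busy_disk": {}, "high_cpu_load": {}, "low_sensitivity": {}},
    "hosts": {
        "db-01": {"threshold_config": ["high_cpu_load", "busy_disk"]},
        "dev-server-01": {"threshold_config": "low_sensitivity"},
    },
    "host_threshold_mapping": {"prod-web-01": "high_cpu_load"},
}

for name in ("db-01", "dev-server-01", "prod-web-01", "unlisted-host"):
    print(name, resolve_config_names(config, name))
```

In this example there is no config named "default", so the unlisted host falls through to the alphabetically first config.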

Backward Compatibility

The legacy host_threshold_mapping top-level key and the flat thresholds section are still fully supported:

# Still works — equivalent to hosts: {prod-web-01: {threshold_config: high_sensitivity}}
host_threshold_mapping:
  prod-web-01: high_sensitivity

# Still works — equivalent to threshold_configs: {default: {thresholds: ...}}
thresholds:
  cpu_monitor:
    cpu_percent: {warning: 80, critical: 90}