Andreas Wrede 0543266c92 Major refactoring of the codebase, including restructuring of files and directories, renaming of modules and classes, and improvements to the overall organization and readability of the code. This refactoring aims to enhance maintainability, scalability, and clarity of the codebase while preserving existing functionality. The changes include:
- Restructuring of the project directory into client and server components
- Renaming of modules and classes to better reflect their purpose and functionality
- Moving common utilities and configurations to a shared location
- Updating import statements to reflect the new structure
- Adding new documentation files for better clarity on various aspects of the project
- Removing deprecated or unused code to streamline the codebase
- Ensuring that all existing functionality is preserved and that the codebase remains functional after the refactoring.
2026-03-29 11:13:40 -04:00

HTTP API and Web UI Documentation

Overview

The Heartbeat Daemon provides an HTTP API and a web-based UI for monitoring plugin data and alert states. The API follows RESTful conventions and returns JSON responses.

Base URL

All API endpoints are relative to the server base URL:

http://your-server:50004

The default port is 50004 (configurable via hbd_port in the configuration file).


API Endpoints

Host Management

GET /api/0/hosts

Get list of all monitored hosts with their state information.

Response:

[
  {
    "name": "webserver01",
    "dyn": false,
    "ver": 6,
    "connections": [...]
  }
]
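
As a rough illustration of consuming this endpoint from Python, the sketch below uses only the standard library and relies solely on the `name`, `dyn` and `ver` fields shown in the response above. The helper names `fetch_json` and `summarize_hosts` are hypothetical and not part of Heartbeat, and the base URL assumes a local server on the default port.

```python
import json
from urllib.request import urlopen

BASE_URL = "http://localhost:50004"  # assumed: local server on the default port


def fetch_json(path, base_url=BASE_URL, timeout=5):
    """GET a path under the API base URL and decode the JSON body."""
    with urlopen(f"{base_url}{path}", timeout=timeout) as resp:
        return json.load(resp)


def summarize_hosts(hosts):
    """Return one 'name (ver N, dyn=...)' line per host, sorted by name."""
    lines = []
    for host in sorted(hosts, key=lambda h: h["name"]):
        lines.append(f"{host['name']} (ver {host.get('ver')}, dyn={host.get('dyn')})")
    return lines


# Usage against a running server:
#   for line in summarize_hosts(fetch_json("/api/0/hosts")):
#       print(line)
```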

GET /api/0/messages

Get recent heartbeat messages (last 30).

Response:

[
  {
    "time": 1711234567.123,
    "host": "webserver01",
    "msg": "heartbeat received"
  }
]
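
The `time` values in the sample look like Unix epoch seconds. Under that assumption, a message entry could be rendered in human-readable form roughly as follows (`format_message` is an illustrative helper, not part of the API):

```python
from datetime import datetime, timezone


def format_message(entry):
    """Render one message entry as 'ISO-8601 UTC time  host  text'."""
    # Assumption: 'time' is seconds since the Unix epoch
    when = datetime.fromtimestamp(entry["time"], tz=timezone.utc)
    return f"{when.isoformat(timespec='seconds')}  {entry['host']}  {entry['msg']}"


sample = {"time": 1711234567.123, "host": "webserver01", "msg": "heartbeat received"}
print(format_message(sample))
# 2024-03-23T22:56:07+00:00  webserver01  heartbeat received
```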

Plugin Data Endpoints

GET /api/0/hosts/{hostname}/plugins

Get all plugin data for a specific host.

Parameters:

  • hostname (path): Name of the host

Response:

{
  "hostname": "webserver01",
  "plugins": {
    "cpu_monitor": {
      "timestamp": 1711234567.123,
      "data": {
        "cpu_percent": 45.2,
        "load_1min": 2.5,
        "load_5min": 2.1,
        "load_15min": 1.8
      },
      "sample_count": 100
    },
    "memory_monitor": {
      "timestamp": 1711234568.456,
      "data": {
        "percent": 65.4,
        "available_mb": 4096,
        "total_mb": 16384
      },
      "sample_count": 100
    }
  }
}

Example:

curl http://localhost:50004/api/0/hosts/webserver01/plugins

GET /api/0/hosts/{hostname}/plugins/{plugin_name}

Get detailed historical data for a specific plugin.

Parameters:

  • hostname (path): Name of the host
  • plugin_name (path): Name of the plugin
  • limit (query, optional): Number of recent samples to return (default: 10)

Response:

{
  "hostname": "webserver01",
  "plugin": "cpu_monitor",
  "samples": [
    {
      "timestamp": 1711234567.123,
      "data": {
        "cpu_percent": 45.2,
        "load_1min": 2.5
      }
    },
    {
      "timestamp": 1711234267.123,
      "data": {
        "cpu_percent": 42.1,
        "load_1min": 2.3
      }
    }
  ],
  "sample_count": 2
}

Examples:

# Get last 1 sample (most recent)
# Quote URLs containing "?" so the shell does not treat it as a glob character
curl "http://localhost:50004/api/0/hosts/webserver01/plugins/cpu_monitor?limit=1"

# Get last 50 samples
curl "http://localhost:50004/api/0/hosts/webserver01/plugins/memory_monitor?limit=50"

# Get disk monitor data
curl http://localhost:50004/api/0/hosts/database01/plugins/disk_monitor
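
The historical samples lend themselves to simple client-side analysis. The following sketch builds the request URL (percent-encoding the path segments) and extracts a time series for one metric from a response shaped like the example above. The names `plugin_url` and `metric_series` are hypothetical helpers, and the ordering of samples in a real response is not assumed, so the series is sorted explicitly.

```python
from statistics import mean
from urllib.parse import quote, urlencode


def plugin_url(hostname, plugin_name, limit=None, base_url="http://localhost:50004"):
    """Build the plugin history URL, percent-encoding both path segments."""
    url = (f"{base_url}/api/0/hosts/{quote(hostname, safe='')}"
           f"/plugins/{quote(plugin_name, safe='')}")
    if limit is not None:
        url += "?" + urlencode({"limit": limit})
    return url


def metric_series(payload, metric):
    """Extract (timestamp, value) pairs for one metric, oldest first."""
    pairs = [(s["timestamp"], s["data"][metric])
             for s in payload["samples"] if metric in s["data"]]
    return sorted(pairs)


# Response body copied from the example above
payload = {
    "hostname": "webserver01",
    "plugin": "cpu_monitor",
    "samples": [
        {"timestamp": 1711234567.123, "data": {"cpu_percent": 45.2, "load_1min": 2.5}},
        {"timestamp": 1711234267.123, "data": {"cpu_percent": 42.1, "load_1min": 2.3}},
    ],
    "sample_count": 2,
}
series = metric_series(payload, "cpu_percent")
average = mean(value for _, value in series)
print(f"{len(series)} samples, average cpu_percent {average:.2f}")
```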

Alert Endpoints

GET /api/0/hosts/{hostname}/alerts

Get alert states for a specific host.

Parameters:

  • hostname (path): Name of the host

Response:

{
  "hostname": "webserver01",
  "alerts": [
    {
      "metric_path": "cpu_monitor.cpu_percent",
      "level": "WARNING",
      "since": 1711234000.0,
      "last_value": 85.5,
      "last_check": 1711234567.123,
      "notification_count": 2
    },
    {
      "metric_path": "disk_monitor./.percent",
      "level": "OK",
      "since": 1711230000.0,
      "last_value": 65.0,
      "last_check": 1711234567.123,
      "notification_count": 0
    }
  ],
  "summary": {
    "ok": 15,
    "warning": 1,
    "critical": 0,
    "unknown": 0
  }
}

Example:

curl http://localhost:50004/api/0/hosts/webserver01/alerts
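
Because the per-host response includes alerts at level OK, a client usually wants to filter and rank them. A minimal sketch, assuming `since` is a Unix timestamp and that `UNKNOWN` is a possible level (inferred from the `unknown` key in the summary, not confirmed by the examples):

```python
import time

# Lower number = more severe; UNKNOWN is an assumed level name
SEVERITY_ORDER = {"CRITICAL": 0, "WARNING": 1, "UNKNOWN": 2, "OK": 3}


def active_alerts(payload, now=None):
    """Return non-OK alerts, most severe first, each with its age in seconds."""
    now = time.time() if now is None else now
    rows = []
    for alert in payload["alerts"]:
        if alert["level"] == "OK":
            continue
        rows.append({
            "metric_path": alert["metric_path"],
            "level": alert["level"],
            "age_seconds": now - alert["since"],
        })
    # Unrecognized levels sort after all known ones
    rows.sort(key=lambda r: SEVERITY_ORDER.get(r["level"], len(SEVERITY_ORDER)))
    return rows
```

Passing `now` explicitly keeps the function deterministic in tests; in normal use it defaults to the current time.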

GET /api/0/alerts

Get all active alerts across all monitored hosts.

Response:

{
  "alerts": [
    {
      "hostname": "webserver01",
      "metric_path": "cpu_monitor.cpu_percent",
      "level": "CRITICAL",
      "since": 1711234000.0,
      "last_value": 95.5,
      "last_check": 1711234567.123,
      "notification_count": 3
    },
    {
      "hostname": "database01",
      "metric_path": "memory_monitor.percent",
      "level": "WARNING",
      "since": 1711233000.0,
      "last_value": 88.2,
      "last_check": 1711234567.123,
      "notification_count": 1
    }
  ],
  "summary": {
    "critical": 1,
    "warning": 1,
    "unknown": 0,
    "total": 2
  },
  "host_count": 5
}

Example:

curl http://localhost:50004/api/0/alerts | jq .

Web UI Pages

Live Dashboard

URL: /live

Real-time dashboard showing:

  • Host connection states
  • IPv4/IPv6 connectivity
  • Latency metrics
  • Recent messages

Features:

  • WebSocket-powered live updates
  • Sortable columns
  • Color-coded status indicators

Plugin Metrics

URL: /plugins

Interactive visualization of plugin metrics:

  • Select host and plugin from dropdown
  • View current metric values
  • Automatic refresh every 30 seconds
  • Support for nested metrics (e.g., per-partition disk stats)

Features:

  • Card-based metric display
  • Unit formatting (%, MB, GB)
  • Nested object visualization
  • Auto-refresh
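
Nested metrics can be flattened into dotted paths of the same shape as the `metric_path` values in the alert responses (for example `disk_monitor./.percent`). The sketch below is one way to do this on the client side; the structure of the `disk` payload is hypothetical, since the real per-partition layout is not documented here.

```python
def flatten_metrics(data, prefix=""):
    """Flatten nested plugin data into {dotted.path: value} pairs."""
    flat = {}
    for key, value in data.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested objects, extending the dotted path
            flat.update(flatten_metrics(value, path))
        else:
            flat[path] = value
    return flat


# Hypothetical disk_monitor data keyed by mount point
disk = {"/": {"percent": 65.0, "free_gb": 120}, "/var": {"percent": 40.5}}
for path, value in flatten_metrics(disk, "disk_monitor").items():
    print(f"{path} = {value}")
```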

Available data:

  • CPU usage, load average, frequency
  • Memory usage, available memory, swap
  • Disk usage per partition, I/O statistics
  • Network interface statistics, connection counts
  • Custom plugin data

Alerts Dashboard

URL: /alerts

Comprehensive alert monitoring:

  • Summary cards (Critical, Warning, Total Hosts)
  • Filter by severity (All, Critical, Warning)
  • Alert details with duration
  • Auto-refresh every 15 seconds

Features:

  • Color-coded alert levels
  • Duration tracking
  • Filterable list
  • Real-time updates
  • Summary statistics

Integration Examples

Monitoring Script

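#!/bin/bash
# Check for critical alerts and send a notification

# -f makes curl fail on HTTP errors; stop early if the API is unreachable
RESPONSE=$(curl -sf http://localhost:50004/api/0/alerts) || {
    echo "ERROR: could not reach the Heartbeat API" >&2
    exit 1
}
# Fall back to 0 if the summary field is missing
CRITICAL_COUNT=$(echo "$RESPONSE" | jq '.summary.critical // 0')

if [ "$CRITICAL_COUNT" -gt 0 ]; then
    echo "CRITICAL: $CRITICAL_COUNT critical alerts detected!"
    echo "$RESPONSE" | jq '.alerts[] | select(.level=="CRITICAL")'
    # Send notification, for example:
    # echo "$RESPONSE" | jq '.alerts[] | select(.level=="CRITICAL")' | mail -s "Critical Alerts" admin@example.com
fi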

Python Client

import requests

BASE_URL = 'http://localhost:50004'

# Get all plugin data for a host
response = requests.get(f'{BASE_URL}/api/0/hosts/webserver01/plugins', timeout=5)
response.raise_for_status()  # Fail early on 4xx/5xx responses
data = response.json()

print(f"Host: {data['hostname']}")
print(f"Plugins: {', '.join(data['plugins'].keys())}")

for plugin, info in data['plugins'].items():
    print(f"\n{plugin}:")
    for metric, value in info['data'].items():
        print(f"  {metric}: {value}")

# Check for alerts
response = requests.get(f'{BASE_URL}/api/0/alerts', timeout=5)
response.raise_for_status()
alerts = response.json()

if alerts['summary']['critical'] > 0:
    print(f"\n⚠️  {alerts['summary']['critical']} CRITICAL ALERTS!")
    for alert in alerts['alerts']:
        if alert['level'] == 'CRITICAL':
            print(f"  - {alert['hostname']}: {alert['metric_path']} = {alert['last_value']}")

Grafana Integration

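The API endpoints can be queried from Grafana through a JSON datasource plugin:

  1. Install a JSON datasource plugin. Note that the SimpleJSON plugin expects the backend to implement its own /search and /query endpoints, so a plugin that can query arbitrary JSON URLs (such as JSON API or Infinity) is likely a better fit for the endpoints documented here.
  2. Configure datasource URL: http://your-server:50004
  3. Create queries:
    • Metrics: /api/0/hosts/webserver01/plugins/cpu_monitor?limit=100
    • Alerts: /api/0/alerts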

Prometheus Integration

Native export in Prometheus format is a future enhancement. Until then, a small external exporter can poll the API and re-expose selected metrics:

# Example Prometheus exporter
from prometheus_client import Gauge, generate_latest
import requests

BASE_URL = 'http://localhost:50004'

cpu_usage = Gauge('heartbeat_cpu_percent', 'CPU usage percentage', ['hostname'])
memory_usage = Gauge('heartbeat_memory_percent', 'Memory usage percentage', ['hostname'])

def collect_metrics():
    """Poll every host's plugin data and update the gauges."""
    hosts = requests.get(f'{BASE_URL}/api/0/hosts', timeout=5).json()
    for host in hosts:
        hostname = host['name']
        plugins = requests.get(f'{BASE_URL}/api/0/hosts/{hostname}/plugins', timeout=5).json()

        if 'cpu_monitor' in plugins['plugins']:
            cpu_data = plugins['plugins']['cpu_monitor']['data']
            cpu_usage.labels(hostname=hostname).set(cpu_data.get('cpu_percent', 0))

        if 'memory_monitor' in plugins['plugins']:
            mem_data = plugins['plugins']['memory_monitor']['data']
            memory_usage.labels(hostname=hostname).set(mem_data.get('percent', 0))

if __name__ == '__main__':
    collect_metrics()
    # Print the collected metrics in the Prometheus text exposition format
    print(generate_latest().decode())

Response Formats

Success Response

All successful API calls return HTTP 200 with JSON body:

{
  "field": "value",
  ...
}

Error Response

API errors return appropriate HTTP status codes with JSON:

{
  "error": "Host 'unknown-host' not found"
}

Common Status Codes:

  • 200 OK - Success
  • 400 Bad Request - Invalid parameters
  • 404 Not Found - Resource not found
  • 500 Internal Server Error - Server error
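
A client can turn these error responses into exceptions that carry both the status code and the server's message. The sketch below uses only the standard library; `ApiError`, `error_message` and `get_json` are illustrative names, and the fallback for non-JSON bodies is a defensive assumption (for example, an error page produced by a reverse proxy) rather than documented behavior.

```python
import json
from urllib.error import HTTPError
from urllib.request import urlopen


class ApiError(Exception):
    """Raised when the API answers with a non-2xx status code."""

    def __init__(self, status, message):
        super().__init__(f"HTTP {status}: {message}")
        self.status = status
        self.message = message


def error_message(body):
    """Pull the 'error' field out of an error body, tolerating non-JSON bodies."""
    try:
        parsed = json.loads(body)
    except (ValueError, TypeError):
        return body.decode(errors="replace") if isinstance(body, bytes) else str(body)
    if isinstance(parsed, dict) and "error" in parsed:
        return parsed["error"]
    return str(parsed)


def get_json(url, timeout=5):
    """GET a URL and return the decoded JSON, raising ApiError on HTTP errors."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return json.load(resp)
    except HTTPError as exc:
        raise ApiError(exc.code, error_message(exc.read())) from exc


# Usage against a running server:
#   try:
#       get_json("http://localhost:50004/api/0/hosts/unknown-host/plugins")
#   except ApiError as err:
#       print(err.status, err.message)
```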

WebSocket API

For real-time updates, connect to the WebSocket endpoint:

URL: ws://your-server:50005/hbd (for secure connections, use wss:// with the configured wss_port, e.g. 50006)

Messages:

{
  "type": "host",
  "data": {
    "name": "webserver01",
    "state": "UP"
  }
}

{
  "type": "plugin",
  "data": {
    "host": "webserver01",
    "plugin": "cpu_monitor",
    "data": {...},
    "timestamp": 1711234567.123
  }
}
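
Both message shapes carry a `type` field, so a client can route incoming frames with a small dispatch table. The sketch below covers only the decoding and routing step and leaves the transport to whichever WebSocket client library is in use; the handler names are hypothetical, and only the `host` and `plugin` types shown above are assumed to exist.

```python
import json


def dispatch(raw, handlers):
    """Decode one WebSocket text frame and route it by its 'type' field."""
    message = json.loads(raw)
    handler = handlers.get(message.get("type"))
    if handler is None:
        return None  # Ignore message types this client does not know about
    return handler(message["data"])


def on_host(data):
    return f"{data['name']} is {data['state']}"


def on_plugin(data):
    return f"{data['host']}/{data['plugin']} updated at {data['timestamp']}"


HANDLERS = {"host": on_host, "plugin": on_plugin}

frame = '{"type": "host", "data": {"name": "webserver01", "state": "UP"}}'
print(dispatch(frame, HANDLERS))
# webserver01 is UP
```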

Configuration

Enable HTTP Server

# In your hbd configuration file
hbd_host: ""           # Listen on all interfaces
hbd_port: 50004        # HTTP port
ws_port: 50005         # WebSocket port (optional)
# wss_port: 50006      # Secure WebSocket (requires SSL)

SSL/TLS Configuration

For secure WebSocket connections:

wss_port: 50006
cert_path: /etc/heartbeat/certs/
wss_pem: server.pem
wss_key: server.key

Rate Limiting

The API currently does not implement rate limiting. For production use, consider:

  • Placing the server behind a reverse proxy (nginx, Apache)
  • Using an API gateway for rate limiting
  • Implementing caching for frequently accessed endpoints
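
On the client side, the caching suggestion can be as simple as a small time-to-live cache wrapped around whatever function performs the request. This is a sketch only: `TTLCache` is a hypothetical helper, and the 15-second default is an arbitrary choice, not a value defined by Heartbeat.

```python
import time


class TTLCache:
    """Cache fetch results per key for a fixed number of seconds."""

    def __init__(self, fetch, ttl_seconds=15.0, clock=time.monotonic):
        self._fetch = fetch        # callable taking a key (e.g. an API path)
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so tests can control time
        self._entries = {}         # key -> (expires_at, value)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]        # still fresh: skip the request
        value = self._fetch(key)
        self._entries[key] = (now + self._ttl, value)
        return value


# Usage: wrap any function that maps an API path to a decoded response, e.g.
#   cache = TTLCache(lambda path: requests.get(BASE_URL + path, timeout=5).json())
#   alerts = cache.get("/api/0/alerts")
```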

CORS Support

By default, CORS is not enabled. To enable for web applications:

# In http.py, add CORS middleware
import aiohttp_cors
from aiohttp import web

app = web.Application()
# Register all routes on app.router before configuring CORS
cors = aiohttp_cors.setup(app)

# Configure CORS for all routes
# "*" allows any origin; restrict this to known origins in production
for route in list(app.router.routes()):
    cors.add(route, {
        "*": aiohttp_cors.ResourceOptions(
            allow_credentials=True,
            expose_headers="*",
            allow_headers="*",
        )
    })

Performance Considerations

Caching

  • Plugin data is cached in memory (last 100 samples per plugin)
  • No database queries required
  • Responses are fast (<10ms typical)

Scalability

  • Each host stores its own data independently
  • Memory usage: ~1KB per host + ~1KB per plugin sample
  • For 100 hosts with 5 plugins each, at 100 cached samples per plugin: ~50MB memory
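
The memory figure follows from the per-sample estimate and the 100-sample cache stated under Caching. The arithmetic can be sketched as below for other deployment sizes; the function name and its parameters are illustrative, and the 1KB figures are the rough estimates given above, not measurements.

```python
def estimate_memory_kb(hosts, plugins_per_host, samples_per_plugin=100,
                       kb_per_host=1, kb_per_sample=1):
    """Rough memory estimate in KB from the per-host and per-sample figures."""
    host_overhead = hosts * kb_per_host
    sample_storage = hosts * plugins_per_host * samples_per_plugin * kb_per_sample
    return host_overhead + sample_storage


total_kb = estimate_memory_kb(hosts=100, plugins_per_host=5)
print(f"{total_kb} KB, roughly {total_kb / 1000:.0f} MB")
# 50100 KB, roughly 50 MB
```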

Best Practices

  1. Use limit parameter to control response size
  2. Cache responses on client side when appropriate
  3. Use WebSocket for real-time updates instead of polling
  4. Consider pagination for large deployments (future enhancement)

Troubleshooting

API Returns 404

  • Verify hostname in URL matches actual host name
  • Check host is sending heartbeats: curl http://localhost:50004/api/0/hosts

No Plugin Data

  • Verify client is configured with plugins
  • Check client logs for plugin errors
  • Ensure plugins are sending data (check journal logs)

Empty Alerts

  • Verify thresholds are configured
  • Check host is in watchhosts list
  • Ensure plugins are collecting metrics
  • Review server logs for threshold checker errors

See Also