HTTP API and Web UI Documentation
Overview
The Heartbeat Daemon provides a comprehensive HTTP API and web-based UI for monitoring plugin data and alert states. The API follows RESTful conventions and returns JSON responses.
Base URL
All API endpoints are relative to the server base URL:
http://your-server:50004
The default port is 50004 (configurable via hbd_port in the configuration file).
API Endpoints
Host Management
GET /api/0/hosts
Get list of all monitored hosts with their state information.
Response:
[
{
"name": "webserver01",
"dyn": false,
"ver": 6,
"connections": [...]
}
]
GET /api/0/messages
Get recent heartbeat messages (last 30).
Response:
[
{
"time": 1711234567.123,
"host": "webserver01",
"msg": "heartbeat received"
}
]
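As a minimal sketch of consuming this endpoint, the helper below formats one message entry for display. It assumes the time field is a Unix timestamp in seconds (the sample values are consistent with that, but it is not stated above). The names BASE_URL, fetch_json and format_message are illustrative and not part of the daemon.

```python
import json
from datetime import datetime, timezone
from urllib.request import urlopen

BASE_URL = "http://localhost:50004"  # assumption: daemon reachable on the default port


def fetch_json(path, base_url=BASE_URL, timeout=5.0):
    """GET a path relative to the base URL and decode the JSON body."""
    with urlopen(base_url + path, timeout=timeout) as resp:
        return json.load(resp)


def format_message(entry):
    """Render one /api/0/messages entry as 'UTC-time host: msg'.

    Assumes entry["time"] is a Unix timestamp in seconds.
    """
    when = datetime.fromtimestamp(entry["time"], tz=timezone.utc)
    return f"{when:%Y-%m-%d %H:%M:%S} {entry['host']}: {entry['msg']}"
```

Typical use would be `for entry in fetch_json("/api/0/messages"): print(format_message(entry))`.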
Plugin Data Endpoints
GET /api/0/hosts/{hostname}/plugins
Get all plugin data for a specific host.
Parameters:
- hostname (path): Name of the host
Response:
{
"hostname": "webserver01",
"plugins": {
"cpu_monitor": {
"timestamp": 1711234567.123,
"data": {
"cpu_percent": 45.2,
"load_1min": 2.5,
"load_5min": 2.1,
"load_15min": 1.8
},
"sample_count": 100
},
"memory_monitor": {
"timestamp": 1711234568.456,
"data": {
"percent": 65.4,
"available_mb": 4096,
"total_mb": 16384
},
"sample_count": 100
}
}
}
Example:
curl http://localhost:50004/api/0/hosts/webserver01/plugins
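The response nests metrics under each plugin's data object. The sketch below flattens a decoded response into dotted paths in the style of the metric_path values used by the alert endpoints (for example cpu_monitor.cpu_percent). The nested disk layout in the usage line is an assumption made for illustration, and flatten_metrics is a hypothetical helper name.

```python
def flatten_metrics(plugins_response):
    """Map dotted metric paths to values, descending into nested dicts."""
    flat = {}

    def walk(prefix, value):
        if isinstance(value, dict):
            for key, child in value.items():
                walk(f"{prefix}.{key}", child)
        else:
            flat[prefix] = value

    for plugin, info in plugins_response["plugins"].items():
        walk(plugin, info["data"])
    return flat
```

For a plugin whose data is `{"/": {"percent": 65.0}}` under disk_monitor, this yields the key `disk_monitor./.percent`.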
GET /api/0/hosts/{hostname}/plugins/{plugin_name}
Get detailed historical data for a specific plugin.
Parameters:
- hostname (path): Name of the host
- plugin_name (path): Name of the plugin
- limit (query, optional): Number of recent samples to return (default: 10)
Response:
{
"hostname": "webserver01",
"plugin": "cpu_monitor",
"samples": [
{
"timestamp": 1711234567.123,
"data": {
"cpu_percent": 45.2,
"load_1min": 2.5
}
},
{
"timestamp": 1711234267.123,
"data": {
"cpu_percent": 42.1,
"load_1min": 2.3
}
}
],
"sample_count": 2
}
Examples:
# Get last 1 sample (most recent)
curl 'http://localhost:50004/api/0/hosts/webserver01/plugins/cpu_monitor?limit=1'
# Get last 50 samples (quote the URL so the shell does not interpret "?")
curl 'http://localhost:50004/api/0/hosts/webserver01/plugins/memory_monitor?limit=50'
# Get disk monitor data
curl http://localhost:50004/api/0/hosts/database01/plugins/disk_monitor
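The same requests can be built programmatically. The sketch below constructs the history URL with the limit query parameter and averages one metric over the returned samples. Both function names are hypothetical, and the code assumes only the response shape shown above.

```python
from statistics import mean
from urllib.parse import quote, urlencode


def plugin_history_url(base_url, hostname, plugin_name, limit=10):
    """Build the history URL; path segments are percent-encoded."""
    path = (
        f"/api/0/hosts/{quote(hostname, safe='')}"
        f"/plugins/{quote(plugin_name, safe='')}"
    )
    return f"{base_url}{path}?{urlencode({'limit': limit})}"


def metric_average(history, metric):
    """Average one metric over the samples that report it, or None if none do."""
    values = [s["data"][metric] for s in history["samples"] if metric in s["data"]]
    return mean(values) if values else None
```

Returning None for a metric that no sample reports keeps a missing metric distinguishable from a genuine average of zero.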
Alert Endpoints
GET /api/0/hosts/{hostname}/alerts
Get alert states for a specific host.
Parameters:
- hostname (path): Name of the host
Response:
{
"hostname": "webserver01",
"alerts": [
{
"metric_path": "cpu_monitor.cpu_percent",
"level": "WARNING",
"since": 1711234000.0,
"last_value": 85.5,
"last_check": 1711234567.123,
"notification_count": 2
},
{
"metric_path": "disk_monitor./.percent",
"level": "OK",
"since": 1711230000.0,
"last_value": 65.0,
"last_check": 1711234567.123,
"notification_count": 0
}
],
"summary": {
"ok": 15,
"warning": 1,
"critical": 0,
"unknown": 0
}
}
Example:
curl http://localhost:50004/api/0/hosts/webserver01/alerts
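Because this endpoint also lists metrics whose level is OK, a client usually filters the result. The sketch below keeps the non-OK entries and orders them worst first. The UNKNOWN level is inferred from the "unknown" key in the summary and is an assumption, as are the helper names.

```python
# Lower rank sorts first; UNKNOWN is assumed from the summary's "unknown" key.
SEVERITY = {"CRITICAL": 0, "WARNING": 1, "UNKNOWN": 2, "OK": 3}


def active_alerts(host_alerts):
    """Return the non-OK alerts of a host, most severe and oldest first."""
    active = [a for a in host_alerts["alerts"] if a["level"] != "OK"]
    return sorted(
        active,
        key=lambda a: (SEVERITY.get(a["level"], len(SEVERITY)), a["since"]),
    )


def describe_alert(alert, now):
    """One-line summary including how long the alert has been in this state."""
    duration = int(now - alert["since"])
    return f"{alert['level']} {alert['metric_path']}={alert['last_value']} for {duration}s"
```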
GET /api/0/alerts
Get all active alerts across all monitored hosts.
Response:
{
"alerts": [
{
"hostname": "webserver01",
"metric_path": "cpu_monitor.cpu_percent",
"level": "CRITICAL",
"since": 1711234000.0,
"last_value": 95.5,
"last_check": 1711234567.123,
"notification_count": 3
},
{
"hostname": "database01",
"metric_path": "memory_monitor.percent",
"level": "WARNING",
"since": 1711233000.0,
"last_value": 88.2,
"last_check": 1711234567.123,
"notification_count": 1
}
],
"summary": {
"critical": 1,
"warning": 1,
"unknown": 0,
"total": 2
},
"host_count": 5
}
Example:
curl http://localhost:50004/api/0/alerts | jq .
Web UI Pages
Live Dashboard
URL: /live
Real-time dashboard showing:
- Host connection states
- IPv4/IPv6 connectivity
- Latency metrics
- Recent messages
Features:
- WebSocket-powered live updates
- Sortable columns
- Color-coded status indicators
Plugin Metrics
URL: /plugins
Interactive visualization of plugin metrics:
- Select host and plugin from dropdown
- View current metric values
- Automatic refresh every 30 seconds
- Support for nested metrics (e.g., per-partition disk stats)
Features:
- Card-based metric display
- Unit formatting (%, MB, GB)
- Nested object visualization
- Auto-refresh
Examples of available data:
- CPU usage, load average, frequency
- Memory usage, available memory, swap
- Disk usage per partition, I/O statistics
- Network interface statistics, connection counts
- Custom plugin data
Alerts Dashboard
URL: /alerts
Comprehensive alert monitoring:
- Summary cards (Critical, Warning, Total Hosts)
- Filter by severity (All, Critical, Warning)
- Alert details with duration
- Auto-refresh every 15 seconds
Features:
- Color-coded alert levels
- Duration tracking
- Filterable list
- Real-time updates
- Summary statistics
Integration Examples
Monitoring Script
#!/bin/bash
# Check for critical alerts and send notification
RESPONSE=$(curl -s http://localhost:50004/api/0/alerts)
# "// 0" covers a missing field; ":-0" below covers an empty response
CRITICAL_COUNT=$(echo "$RESPONSE" | jq '.summary.critical // 0')
if [ "${CRITICAL_COUNT:-0}" -gt 0 ]; then
    echo "CRITICAL: $CRITICAL_COUNT critical alerts detected!"
    echo "$RESPONSE" | jq '.alerts[] | select(.level=="CRITICAL")'
    # Send notification
    # echo "$RESPONSE" | jq '.alerts[] | select(.level=="CRITICAL")' | mail -s "Critical Alerts" admin@example.com
fi
Python Client
import requests

# Get all plugin data for a host
response = requests.get('http://localhost:50004/api/0/hosts/webserver01/plugins', timeout=10)
response.raise_for_status()  # fail early on 404/500 instead of on a missing key
data = response.json()
print(f"Host: {data['hostname']}")
print(f"Plugins: {', '.join(data['plugins'].keys())}")
for plugin, info in data['plugins'].items():
    print(f"\n{plugin}:")
    for metric, value in info['data'].items():
        print(f"  {metric}: {value}")

# Check for alerts
response = requests.get('http://localhost:50004/api/0/alerts', timeout=10)
response.raise_for_status()
alerts = response.json()
if alerts['summary']['critical'] > 0:
    print(f"\n⚠️ {alerts['summary']['critical']} CRITICAL ALERTS!")
    for alert in alerts['alerts']:
        if alert['level'] == 'CRITICAL':
            print(f"  - {alert['hostname']}: {alert['metric_path']} = {alert['last_value']}")
Grafana Integration
The API endpoints can be used with Grafana's JSON datasource plugin:
- Install the SimpleJSON datasource plugin
- Configure datasource URL: http://your-server:50004
- Create queries:
  - Metrics: /api/0/hosts/webserver01/plugins/cpu_monitor?limit=100
  - Alerts: /api/0/alerts
Prometheus Integration
Native export in Prometheus format is a future enhancement. Until then, a small exporter can poll the API:
# Example prometheus exporter
from prometheus_client import Gauge, generate_latest
import requests

BASE_URL = 'http://localhost:50004'

cpu_usage = Gauge('heartbeat_cpu_percent', 'CPU usage percentage', ['hostname'])
memory_usage = Gauge('heartbeat_memory_percent', 'Memory usage percentage', ['hostname'])

def collect_metrics():
    """Poll every host once and update the gauges."""
    hosts = requests.get(f'{BASE_URL}/api/0/hosts', timeout=10).json()
    for host in hosts:
        hostname = host['name']
        plugins = requests.get(f'{BASE_URL}/api/0/hosts/{hostname}/plugins', timeout=10).json()
        if 'cpu_monitor' in plugins['plugins']:
            cpu_data = plugins['plugins']['cpu_monitor']['data']
            cpu_usage.labels(hostname=hostname).set(cpu_data.get('cpu_percent', 0))
        if 'memory_monitor' in plugins['plugins']:
            mem_data = plugins['plugins']['memory_monitor']['data']
            memory_usage.labels(hostname=hostname).set(mem_data.get('percent', 0))

# After collect_metrics(), generate_latest() returns the gauges in Prometheus text format.
Response Formats
Success Response
All successful API calls return HTTP 200 with JSON body:
{
"field": "value",
...
}
Error Response
API errors return appropriate HTTP status codes with JSON:
{
"error": "Host 'unknown-host' not found"
}
Common Status Codes:
- 200 OK - Success
- 400 Bad Request - Invalid parameters
- 404 Not Found - Resource not found
- 500 Internal Server Error - Server error
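A client can turn these error responses into exceptions. The sketch below uses only the Python standard library and assumes error bodies carry the "error" field shown above, falling back to a generic message when the body is not JSON. ApiError, error_message and get_json are illustrative names.

```python
import json
from urllib.error import HTTPError
from urllib.request import urlopen


class ApiError(Exception):
    """Raised when the API answers with a non-2xx status."""

    def __init__(self, status, message):
        super().__init__(f"HTTP {status}: {message}")
        self.status = status
        self.message = message


def error_message(body, fallback="unknown error"):
    """Extract the "error" field from an error body, tolerating non-JSON bodies."""
    try:
        payload = json.loads(body)
    except (ValueError, TypeError):
        return fallback
    if isinstance(payload, dict):
        return str(payload.get("error", fallback))
    return fallback


def get_json(url, timeout=5.0):
    """GET a URL and decode JSON, raising ApiError on HTTP error statuses."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return json.load(resp)
    except HTTPError as exc:
        raise ApiError(exc.code, error_message(exc.read())) from exc
```

Callers can then branch on `err.status`, for example treating 404 as "host not registered yet" and retrying later.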
WebSocket API
For real-time updates, connect to the WebSocket endpoint:
URL: ws://your-server:50005/hbd (or wss:// on the configured wss_port for secure connections)
Messages:
{
"type": "host",
"data": {
"name": "webserver01",
"state": "UP"
}
}
{
"type": "plugin",
"data": {
"host": "webserver01",
"plugin": "cpu_monitor",
"data": {...},
"timestamp": 1711234567.123
}
}
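A consumer of these messages mostly needs to dispatch on the type field. The sketch below keeps the latest host state and plugin sample in memory. It is transport-agnostic: reading frames from the socket (for example with a third-party WebSocket client library) is assumed to happen elsewhere, and LiveState is a hypothetical class name. Only the two message types shown above are handled.

```python
import json


class LiveState:
    """Latest known host states and plugin samples, fed from WebSocket frames."""

    def __init__(self):
        self.hosts = {}    # host name -> state string
        self.plugins = {}  # (host, plugin) -> (timestamp, data)

    def apply(self, raw):
        """Apply one JSON text frame; return False for unrecognised message types."""
        msg = json.loads(raw)
        kind, data = msg.get("type"), msg.get("data", {})
        if kind == "host":
            self.hosts[data["name"]] = data.get("state")
        elif kind == "plugin":
            key = (data["host"], data["plugin"])
            self.plugins[key] = (data["timestamp"], data["data"])
        else:
            return False
        return True
```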
Configuration
Enable HTTP Server
# In your hbd configuration file
hbd_host: "" # Listen on all interfaces
hbd_port: 50004 # HTTP port
ws_port: 50005 # WebSocket port (optional)
# wss_port: 50006 # Secure WebSocket (requires SSL)
SSL/TLS Configuration
For secure WebSocket connections:
wss_port: 50006
cert_path: /etc/heartbeat/certs/
wss_pem: server.pem
wss_key: server.key
Rate Limiting
The API currently does not implement rate limiting. For production use, consider:
- Placing behind a reverse proxy (nginx, Apache)
- Using API gateway for rate limiting
- Implementing caching for frequently accessed endpoints
CORS Support
By default, CORS is not enabled. To enable it for web applications:
# In http.py, add CORS middleware
from aiohttp import web
import aiohttp_cors

app = web.Application()
# Register the API routes on app.router before this point
cors = aiohttp_cors.setup(app)
# Configure CORS for all routes
for route in list(app.router.routes()):
    cors.add(route, {
        "*": aiohttp_cors.ResourceOptions(
            allow_credentials=True,
            expose_headers="*",
            allow_headers="*",
        )
    })
Performance Considerations
Caching
- Plugin data is cached in memory (last 100 samples per plugin)
- No database queries required
- Responses are fast (<10ms typical)
Scalability
- Each host stores its own data independently
- Memory usage: ~1KB per host + ~1KB per plugin sample
- For 100 hosts with 5 plugins each, at the cap of 100 samples per plugin: ~50MB memory
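The figure follows from the per-sample cost and the cap of 100 cached samples per plugin. The arithmetic can be sketched as below; the function name and parameters are illustrative, and the 1 KB costs are the rough estimates given above, not measurements.

```python
def estimate_memory_kb(hosts, plugins_per_host, samples_per_plugin=100,
                       host_kb=1, sample_kb=1):
    """Rough memory estimate in KB: per-host overhead plus cached samples."""
    per_host_overhead = hosts * host_kb
    cached_samples = hosts * plugins_per_host * samples_per_plugin * sample_kb
    return per_host_overhead + cached_samples
```

For 100 hosts with 5 plugins this gives 100 + 100 × 5 × 100 = 50,100 KB, or roughly 50 MB.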
Best Practices
- Use the limit parameter to control response size
- Cache responses on client side when appropriate
- Use WebSocket for real-time updates instead of polling
- Consider pagination for large deployments (future enhancement)
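Client-side caching can be as simple as a small time-to-live cache in front of the fetch function. The sketch below is one way to do it. TTLCache is a hypothetical helper, and the 15-second default mirrors the alerts dashboard refresh interval as an arbitrary starting point.

```python
import time


class TTLCache:
    """Cache fetch(key) results for ttl seconds."""

    def __init__(self, fetch, ttl=15.0, clock=time.monotonic):
        self._fetch = fetch
        self._ttl = ttl
        self._clock = clock      # injectable so expiry can be tested without sleeping
        self._entries = {}       # key -> (stored_at, value)

    def get(self, key):
        """Return the cached value if still fresh, otherwise fetch and store it."""
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]
        value = self._fetch(key)
        self._entries[key] = (now, value)
        return value
```

Wrapping an HTTP getter, for example `TTLCache(lambda path: requests.get(base + path, timeout=10).json())`, means repeated dashboard refreshes within the TTL do not hit the daemon.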
Troubleshooting
API Returns 404
- Verify hostname in URL matches actual host name
- Check host is sending heartbeats:
curl http://localhost:50004/api/0/hosts
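The hostname check can be automated against the decoded /api/0/hosts response. The sketch below reports whether a name is known and, if not, suggests the closest registered name. diagnose_host is a hypothetical helper, and the suggestion relies on the fuzzy matching in the standard library's difflib.

```python
import difflib


def diagnose_host(hosts, wanted):
    """Explain whether a host name appears in the /api/0/hosts response."""
    names = [h["name"] for h in hosts]
    if wanted in names:
        return f"'{wanted}' is known to the server"
    close = difflib.get_close_matches(wanted, names, n=1)
    hint = f"; did you mean '{close[0]}'?" if close else ""
    return f"'{wanted}' is not in /api/0/hosts{hint}"
```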
No Plugin Data
- Verify client is configured with plugins
- Check client logs for plugin errors
- Ensure plugins are sending data (check journal logs)
Empty Alerts
- Verify thresholds are configured
- Check host is in the watchhosts list
- Ensure plugins are collecting metrics
- Review server logs for threshold checker errors
See Also
- Plugin Development Guide
- Threshold Alerting Documentation
- Message Journal Documentation
- Configuration examples: hbd/config_example.yaml