# HTTP API and Web UI Documentation
## Overview
The Heartbeat Daemon provides an HTTP API and a web-based UI for monitoring plugin data and alert states. The API follows RESTful conventions; every endpoint documented here is a `GET` request that returns JSON.
## Base URL
All API endpoints are relative to the server base URL:
```
http://your-server:50004
```
Default port is `50004` (configurable via `hbd_port` in configuration).
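
Client code can derive endpoint URLs from the base URL instead of assembling them by hand. The following is a minimal sketch using only the Python standard library; the helper name `api_url` and the `localhost` base are assumptions for illustration, not part of Heartbeat.

```python
from urllib.parse import quote, urlencode

# Assumption: the daemon runs on this machine with the default port.
BASE_URL = "http://localhost:50004"


def api_url(*parts, base=BASE_URL, **query):
    """Build an /api/0 URL from path segments and optional query parameters."""
    # Quote each segment so unusual host or plugin names cannot break the path.
    path = "/".join(quote(str(part), safe="") for part in parts)
    url = f"{base.rstrip('/')}/api/0/{path}"
    if query:
        url += "?" + urlencode(query)
    return url


print(api_url("hosts"))
print(api_url("hosts", "webserver01", "plugins", "cpu_monitor", limit=50))
```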
---
## API Endpoints
### Host Management
#### GET /api/0/hosts
Get list of all monitored hosts with their state information.
**Response:**
```json
[
  {
    "name": "webserver01",
    "dyn": false,
    "connections": [...]
  }
]
```
#### GET /api/0/messages
Get recent heartbeat messages (last 30).
**Response:**
```json
[
  {
    "time": 1711234567.123,
    "host": "webserver01",
    "msg": "heartbeat received"
  }
]
```
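
The `time` field looks like a Unix timestamp in seconds; that reading is an assumption, since the format is not stated here. Under that assumption, a message entry could be rendered as a log-style line like this (the function name `format_message` is hypothetical):

```python
from datetime import datetime, timezone


def format_message(entry):
    """Render one /api/0/messages entry as a single log-style line."""
    # Assumption: "time" is seconds since the Unix epoch.
    when = datetime.fromtimestamp(entry["time"], tz=timezone.utc)
    return f"{when:%Y-%m-%d %H:%M:%S}Z {entry['host']}: {entry['msg']}"


sample = {"time": 1711234567.123, "host": "webserver01", "msg": "heartbeat received"}
print(format_message(sample))
```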
---
### Plugin Data Endpoints
#### GET /api/0/hosts/{hostname}/plugins
Get all plugin data for a specific host.
**Parameters:**
- `hostname` (path): Name of the host
**Response:**
```json
{
  "hostname": "webserver01",
  "plugins": {
    "cpu_monitor": {
      "timestamp": 1711234567.123,
      "data": {
        "cpu_percent": 45.2,
        "load_1min": 2.5,
        "load_5min": 2.1,
        "load_15min": 1.8
      },
      "sample_count": 100
    },
    "memory_monitor": {
      "timestamp": 1711234568.456,
      "data": {
        "percent": 65.4,
        "available_mb": 4096,
        "total_mb": 16384
      },
      "sample_count": 100
    }
  }
}
```
**Example:**
```bash
curl http://localhost:50004/api/0/hosts/webserver01/plugins
```
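
Plugin data can be nested (for example per-partition disk statistics), so clients often want it flattened into dotted paths. The sketch below does that; the function names are hypothetical, the nested `disk_monitor` sample is invented for illustration, and the assumption that dotted paths line up with alert `metric_path` values such as `disk_monitor./.percent` is not confirmed by this document.

```python
def flatten_metrics(data, prefix=""):
    """Flatten nested dicts into {"a.b.c": value} pairs."""
    flat = {}
    for key, value in data.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten_metrics(value, path))
        else:
            flat[path] = value
    return flat


def host_metrics(response):
    """Flatten a /api/0/hosts/{hostname}/plugins response by plugin name."""
    flat = {}
    for name, info in response["plugins"].items():
        flat.update(flatten_metrics(info["data"], prefix=name))
    return flat


# Illustrative response; the disk_monitor layout is an assumption.
sample = {
    "hostname": "webserver01",
    "plugins": {
        "cpu_monitor": {"timestamp": 1711234567.123, "data": {"cpu_percent": 45.2}},
        "disk_monitor": {"timestamp": 1711234567.123, "data": {"/": {"percent": 65.0}}},
    },
}
for path, value in host_metrics(sample).items():
    print(f"{path} = {value}")
```

Keeping the flattening separate from the HTTP call makes it easy to reuse on WebSocket payloads or cached responses.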
#### GET /api/0/hosts/{hostname}/plugins/{plugin_name}
Get detailed historical data for a specific plugin.
**Parameters:**
- `hostname` (path): Name of the host
- `plugin_name` (path): Name of the plugin
- `limit` (query, optional): Number of recent samples to return (default: 10)
**Response:**
```json
{
  "hostname": "webserver01",
  "plugin": "cpu_monitor",
  "samples": [
    {
      "timestamp": 1711234567.123,
      "data": {
        "cpu_percent": 45.2,
        "load_1min": 2.5
      }
    },
    {
      "timestamp": 1711234267.123,
      "data": {
        "cpu_percent": 42.1,
        "load_1min": 2.3
      }
    }
  ],
  "sample_count": 2
}
```
**Examples:**
```bash
# Get the most recent sample (quote the URL so the shell does not interpret "?")
curl 'http://localhost:50004/api/0/hosts/webserver01/plugins/cpu_monitor?limit=1'
# Get the last 50 samples
curl 'http://localhost:50004/api/0/hosts/webserver01/plugins/memory_monitor?limit=50'
# Get disk monitor data (default limit)
curl http://localhost:50004/api/0/hosts/database01/plugins/disk_monitor
```
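
Once a history response is in hand, a single metric can be summarized across its samples. This is a minimal sketch over the response shape shown above; `summarize` is a hypothetical helper, and samples that lack the requested metric are simply skipped.

```python
from statistics import mean


def summarize(history, metric):
    """Summarize one metric across the samples of a plugin history response."""
    samples = [s for s in history["samples"] if metric in s["data"]]
    values = [s["data"][metric] for s in samples]
    times = [s["timestamp"] for s in samples]
    return {
        "count": len(values),
        "mean": mean(values) if values else None,
        # Seconds between the oldest and newest sample that had the metric.
        "span_seconds": max(times) - min(times) if times else 0.0,
    }


history = {
    "hostname": "webserver01",
    "plugin": "cpu_monitor",
    "samples": [
        {"timestamp": 1711234567.123, "data": {"cpu_percent": 45.2, "load_1min": 2.5}},
        {"timestamp": 1711234267.123, "data": {"cpu_percent": 42.1, "load_1min": 2.3}},
    ],
    "sample_count": 2,
}
print(summarize(history, "cpu_percent"))
```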
---
### Alert Endpoints
#### GET /api/0/hosts/{hostname}/alerts
Get alert states for a specific host.
**Parameters:**
- `hostname` (path): Name of the host
**Response:**
```json
{
  "hostname": "webserver01",
  "alerts": [
    {
      "metric_path": "cpu_monitor.cpu_percent",
      "level": "WARNING",
      "since": 1711234000.0,
      "last_value": 85.5,
      "last_check": 1711234567.123,
      "notification_count": 2
    },
    {
      "metric_path": "disk_monitor./.percent",
      "level": "OK",
      "since": 1711230000.0,
      "last_value": 65.0,
      "last_check": 1711234567.123,
      "notification_count": 0
    }
  ],
  "summary": {
    "ok": 15,
    "warning": 1,
    "critical": 0,
    "unknown": 0
  }
}
```
**Example:**
```bash
curl http://localhost:50004/api/0/hosts/webserver01/alerts
```
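
Because the per-host response includes entries at level `OK`, clients usually filter those out and compute how long each remaining alert has been in its current state. The sketch below assumes `since` is a Unix timestamp marking when the current level began; `active_alerts` is a hypothetical name.

```python
import time


def active_alerts(response, now=None):
    """Return non-OK alerts with an added "duration" field in seconds."""
    now = time.time() if now is None else now
    result = []
    for alert in response["alerts"]:
        if alert["level"] == "OK":
            continue
        # Assumption: "since" marks when the alert entered its current level.
        result.append({**alert, "duration": now - alert["since"]})
    return result


response = {
    "hostname": "webserver01",
    "alerts": [
        {"metric_path": "cpu_monitor.cpu_percent", "level": "WARNING", "since": 1711234000.0},
        {"metric_path": "disk_monitor./.percent", "level": "OK", "since": 1711230000.0},
    ],
}
for alert in active_alerts(response, now=1711234600.0):
    print(alert["metric_path"], alert["level"], f"{alert['duration']:.0f}s")
```

Passing `now` explicitly keeps the function deterministic, which is convenient in tests.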
#### GET /api/0/alerts
Get all active alerts across all monitored hosts.
**Response:**
```json
{
  "alerts": [
    {
      "hostname": "webserver01",
      "metric_path": "cpu_monitor.cpu_percent",
      "level": "CRITICAL",
      "since": 1711234000.0,
      "last_value": 95.5,
      "last_check": 1711234567.123,
      "notification_count": 3
    },
    {
      "hostname": "database01",
      "metric_path": "memory_monitor.percent",
      "level": "WARNING",
      "since": 1711233000.0,
      "last_value": 88.2,
      "last_check": 1711234567.123,
      "notification_count": 1
    }
  ],
  "summary": {
    "critical": 1,
    "warning": 1,
    "unknown": 0,
    "total": 2
  },
  "host_count": 5
}
```
**Example:**
```bash
curl http://localhost:50004/api/0/alerts | jq .
```
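
A common use of the global alert list is to reduce it to the worst level per host. The sketch below does this on the response shape shown above. The severity ordering and the `UNKNOWN` level string are assumptions inferred from the `summary` keys, and `worst_level_per_host` is a hypothetical name.

```python
# Assumed ordering, lowest to highest severity.
SEVERITY = {"OK": 0, "UNKNOWN": 1, "WARNING": 2, "CRITICAL": 3}


def worst_level_per_host(response):
    """Map each hostname to the most severe alert level it currently has."""
    worst = {}
    for alert in response["alerts"]:
        host, level = alert["hostname"], alert["level"]
        current = worst.get(host)
        if current is None or SEVERITY.get(level, 0) > SEVERITY.get(current, 0):
            worst[host] = level
    return worst


response = {
    "alerts": [
        {"hostname": "webserver01", "metric_path": "cpu_monitor.cpu_percent", "level": "CRITICAL"},
        {"hostname": "database01", "metric_path": "memory_monitor.percent", "level": "WARNING"},
        {"hostname": "webserver01", "metric_path": "memory_monitor.percent", "level": "WARNING"},
    ],
}
print(worst_level_per_host(response))
```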
---
## Web UI Pages
### Live Dashboard
**URL:** `/live`
Real-time dashboard showing:
- Host connection states
- IPv4/IPv6 connectivity
- Latency metrics
- Recent messages
**Features:**
- WebSocket-powered live updates
- Sortable columns
- Color-coded status indicators
### Plugin Metrics
**URL:** `/plugins`
Interactive visualization of plugin metrics:
- Select host and plugin from dropdown
- View current metric values
- Automatic refresh every 30 seconds
- Support for nested metrics (e.g., per-partition disk stats)
**Features:**
- Card-based metric display
- Unit formatting (%, MB, GB)
- Nested object visualization
- Auto-refresh
**Available data:**
- CPU usage, load average, frequency
- Memory usage, available memory, swap
- Disk usage per partition, I/O statistics
- Network interface statistics, connection counts
- Custom plugin data
### Alerts Dashboard
**URL:** `/alerts`
Comprehensive alert monitoring:
- Summary cards (Critical, Warning, Total Hosts)
- Filter by severity (All, Critical, Warning)
- Alert details with duration
- Auto-refresh every 15 seconds
**Features:**
- Color-coded alert levels
- Duration tracking
- Filterable list
- Real-time updates
- Summary statistics
---
## Integration Examples
### Monitoring Script
```bash
#!/bin/bash
# Check for critical alerts and send a notification

# -f makes curl fail on HTTP errors so an error body is not parsed as alert data
RESPONSE=$(curl -sf http://localhost:50004/api/0/alerts) || {
    echo "ERROR: could not reach the Heartbeat API" >&2
    exit 1
}
CRITICAL_COUNT=$(echo "$RESPONSE" | jq '.summary.critical // 0')

if [ "$CRITICAL_COUNT" -gt 0 ]; then
    echo "CRITICAL: $CRITICAL_COUNT critical alerts detected!"
    echo "$RESPONSE" | jq '.alerts[] | select(.level=="CRITICAL")'
    # Send notification
    # echo "$RESPONSE" | jq '.alerts[]' | mail -s "Critical Alerts" admin@example.com
fi
```
### Python Client
```python
import requests

BASE_URL = 'http://localhost:50004'

# Get all plugin data for a host
response = requests.get(f'{BASE_URL}/api/0/hosts/webserver01/plugins', timeout=5)
response.raise_for_status()  # surface 404/500 instead of failing on a missing key
data = response.json()

print(f"Host: {data['hostname']}")
print(f"Plugins: {', '.join(data['plugins'].keys())}")

for plugin, info in data['plugins'].items():
    print(f"\n{plugin}:")
    for metric, value in info['data'].items():
        print(f"  {metric}: {value}")

# Check for alerts
response = requests.get(f'{BASE_URL}/api/0/alerts', timeout=5)
response.raise_for_status()
alerts = response.json()

if alerts['summary']['critical'] > 0:
    print(f"\n⚠️ {alerts['summary']['critical']} CRITICAL ALERTS!")
    for alert in alerts['alerts']:
        if alert['level'] == 'CRITICAL':
            print(f"  - {alert['hostname']}: {alert['metric_path']} = {alert['last_value']}")
```
### Grafana Integration
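The API endpoints can be queried from Grafana through a JSON datasource plugin:
1. Install a datasource plugin that can query arbitrary JSON endpoints. Note that the SimpleJSON datasource plugin expects the backend to implement its own `/search` and `/query` routes, which are not among the endpoints documented here.
2. Configure datasource URL: `http://your-server:50004`
3. Create queries:
   - Metrics: `/api/0/hosts/webserver01/plugins/cpu_monitor?limit=100`
   - Alerts: `/api/0/alerts`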
### Prometheus Integration
Native export in Prometheus format is a future enhancement. Until then, a small exporter can poll the API and republish selected metrics:
```python
# Example prometheus exporter
from prometheus_client import Gauge, generate_latest
import requests

BASE_URL = 'http://localhost:50004'

cpu_usage = Gauge('heartbeat_cpu_percent', 'CPU usage percentage', ['hostname'])
memory_usage = Gauge('heartbeat_memory_percent', 'Memory usage percentage', ['hostname'])


def collect_metrics():
    """Poll the Heartbeat API once and update the gauges for every host."""
    hosts = requests.get(f'{BASE_URL}/api/0/hosts', timeout=5).json()
    for host in hosts:
        hostname = host['name']
        plugins = requests.get(f'{BASE_URL}/api/0/hosts/{hostname}/plugins', timeout=5).json()
        if 'cpu_monitor' in plugins['plugins']:
            cpu_data = plugins['plugins']['cpu_monitor']['data']
            cpu_usage.labels(hostname=hostname).set(cpu_data.get('cpu_percent', 0))
        if 'memory_monitor' in plugins['plugins']:
            mem_data = plugins['plugins']['memory_monitor']['data']
            memory_usage.labels(hostname=hostname).set(mem_data.get('percent', 0))


if __name__ == '__main__':
    collect_metrics()
    # Print the current values in Prometheus text exposition format
    print(generate_latest().decode())
```
---
## Response Formats
### Success Response
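All successful API calls return HTTP 200 with a JSON body. Most endpoints return an object; list endpoints such as `/api/0/hosts` and `/api/0/messages` return an array: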
```json
{
  "field": "value",
  ...
}
```
### Error Response
API errors return appropriate HTTP status codes with JSON:
```json
{
"error": "Host 'unknown-host' not found"
}
```
**Common Status Codes:**
- `200 OK` - Success
- `400 Bad Request` - Invalid parameters
- `404 Not Found` - Resource not found
- `500 Internal Server Error` - Server error
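
A client can turn the status code and error body into a single exception type. The sketch below is transport-agnostic: it takes the status code and raw body from whatever HTTP library is in use. `HeartbeatAPIError` and `parse_response` are hypothetical names, not part of Heartbeat.

```python
import json


class HeartbeatAPIError(Exception):
    """Raised when the API answers with a non-200 status or an unusable body."""

    def __init__(self, status, message):
        super().__init__(f"HTTP {status}: {message}")
        self.status = status
        self.message = message


def parse_response(status, body):
    """Return the decoded JSON for a 200 response, raise otherwise."""
    try:
        payload = json.loads(body)
    except ValueError:
        raise HeartbeatAPIError(status, "body is not valid JSON") from None
    if status == 200:
        return payload
    # Error responses carry the reason in an "error" field.
    message = payload.get("error") if isinstance(payload, dict) else None
    raise HeartbeatAPIError(status, message or "unknown error")


try:
    parse_response(404, '{"error": "Host \'unknown-host\' not found"}')
except HeartbeatAPIError as exc:
    print(exc)
```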
---
## WebSocket API
For real-time updates, connect to the WebSocket endpoint:
**URL:** `ws://your-server:50005/hbd` (or `wss://` on the configured `wss_port` for secure connections)
**Messages:**
```json
{
  "type": "host",
  "data": {
    "name": "webserver01",
    "state": "UP"
  }
}
```
```json
{
  "type": "plugin",
  "data": {
    "host": "webserver01",
    "plugin": "cpu_monitor",
    "data": {...},
    "timestamp": 1711234567.123
  }
}
```
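
The two message types above can be handled with a small dispatcher. This sketch deliberately leaves out the transport: any WebSocket client library can pass each received text frame to `handle_message`. The function name and the layout of the `state` dictionary are assumptions for illustration.

```python
import json


def handle_message(raw, state):
    """Apply one WebSocket text frame to an in-memory state dict."""
    message = json.loads(raw)
    kind = message.get("type")
    data = message.get("data", {})
    if kind == "host":
        state.setdefault("hosts", {})[data["name"]] = data.get("state")
    elif kind == "plugin":
        key = (data["host"], data["plugin"])
        state.setdefault("plugins", {})[key] = data.get("data")
    else:
        # Unknown types are recorded rather than raising, so new message
        # types do not break an older client.
        state.setdefault("ignored", []).append(kind)
    return state


state = {}
handle_message('{"type": "host", "data": {"name": "webserver01", "state": "UP"}}', state)
handle_message(
    '{"type": "plugin", "data": {"host": "webserver01", "plugin": "cpu_monitor",'
    ' "data": {"cpu_percent": 45.2}, "timestamp": 1711234567.123}}',
    state,
)
print(state)
```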
---
## Configuration
### Enable HTTP Server
```yaml
# In your hbd configuration file
hbd_host: "" # Listen on all interfaces
hbd_port: 50004 # HTTP port
ws_port: 50005 # WebSocket port (optional)
# wss_port: 50006 # Secure WebSocket (requires SSL)
```
### SSL/TLS Configuration
For secure WebSocket connections:
```yaml
wss_port: 50006
cert_path: /etc/heartbeat/certs/
wss_pem: server.pem
wss_key: server.key
```
---
## Rate Limiting
The API currently does not implement rate limiting. For production use, consider:
- Placing behind a reverse proxy (nginx, Apache)
- Using API gateway for rate limiting
- Implementing caching for frequently accessed endpoints
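
On the client side, the caching suggestion can be as simple as a time-to-live cache in front of the HTTP call. The sketch below is one possible shape, not something Heartbeat ships; `TTLCache` is a hypothetical name and the `fetch` callable stands in for the real request.

```python
import time


class TTLCache:
    """Cache values per key and refetch them after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so tests can control time
        self._entries = {}

    def get(self, key, fetch):
        """Return the cached value for key, calling fetch() if it is stale."""
        now = self.clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]
        value = fetch()
        self._entries[key] = (now, value)
        return value


# Usage with a stand-in fetch function instead of a real HTTP request.
cache = TTLCache(ttl_seconds=15)
print(cache.get("/api/0/alerts", lambda: {"summary": {"critical": 0}}))
```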
---
## CORS Support
By default, CORS is not enabled. To enable for web applications:
```python
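# In http.py, add CORS middleware
import aiohttp_cors
from aiohttp import web

app = web.Application()
# Register the API routes on app.router before configuring CORS,
# otherwise the loop below has nothing to iterate over.
cors = aiohttp_cors.setup(app)

# Configure CORS for all routes ("*" allows any origin; restrict it in production)
for route in list(app.router.routes()):
    cors.add(route, {
        "*": aiohttp_cors.ResourceOptions(
            allow_credentials=True,
            expose_headers="*",
            allow_headers="*",
        )
    })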
```
---
## Performance Considerations
### Caching
- Plugin data is cached in memory (last 100 samples per plugin)
- No database queries required
- Responses are fast (<10ms typical)
### Scalability
- Each host stores its own data independently
- Memory usage: ~1KB per host + ~1KB per plugin sample
- For 100 hosts with 5 plugins each and 100 retained samples per plugin: ~50MB memory
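
The estimate follows from the figures above combined with the 100-sample retention mentioned under Caching. A short worked calculation, using those approximate per-item sizes as constants (the function name is hypothetical):

```python
SAMPLES_PER_PLUGIN = 100  # retention stated in the Caching section
KB_PER_SAMPLE = 1         # approximate size quoted above
KB_PER_HOST = 1           # approximate size quoted above


def estimate_memory_kb(hosts, plugins_per_host):
    """Rough memory estimate in KB for the in-memory plugin history."""
    sample_kb = hosts * plugins_per_host * SAMPLES_PER_PLUGIN * KB_PER_SAMPLE
    return hosts * KB_PER_HOST + sample_kb


total_kb = estimate_memory_kb(hosts=100, plugins_per_host=5)
print(f"{total_kb} KB, about {total_kb / 1000:.0f} MB")
```

The per-host overhead is negligible next to the sample history, so the total scales almost linearly with hosts × plugins.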
### Best Practices
1. Use `limit` parameter to control response size
2. Cache responses on client side when appropriate
3. Use WebSocket for real-time updates instead of polling
4. Consider pagination for large deployments (future enhancement)
---
## Troubleshooting
### API Returns 404
- Verify hostname in URL matches actual host name
- Check host is sending heartbeats: `curl http://localhost:50004/api/0/hosts`
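
When a 404 comes from a misspelled hostname, comparing the requested name against the output of `/api/0/hosts` usually finds the problem quickly. A minimal sketch of that check, operating on the already decoded host list; `check_hostname` is a hypothetical helper:

```python
from difflib import get_close_matches


def check_hostname(hosts, wanted):
    """Report whether wanted appears in a decoded /api/0/hosts response."""
    names = [host["name"] for host in hosts]
    if wanted in names:
        return f"'{wanted}' is known to the daemon"
    # Suggest similarly spelled names to catch typos.
    close = get_close_matches(wanted, names, n=3)
    hint = f"; did you mean {', '.join(close)}?" if close else ""
    return f"'{wanted}' is not in /api/0/hosts{hint}"


hosts = [{"name": "webserver01"}, {"name": "database01"}]
print(check_hostname(hosts, "webserver1"))
```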
### No Plugin Data
- Verify client is configured with plugins
- Check client logs for plugin errors
- Ensure plugins are sending data (check journal logs)
### Empty Alerts
- Verify thresholds are configured
- Check host is in `watchhosts` list
- Ensure plugins are collecting metrics
- Review server logs for threshold checker errors
---
## See Also
- [Plugin Development Guide](PLUGIN_DEVELOPMENT.md)
- [Threshold Alerting Documentation](THRESHOLD_ALERTING.md)
- [Message Journal Documentation](MESSAGE_JOURNAL.md)
- Configuration examples: `hbd/config_example.yaml`