heartbeat/docs/HTTP_API.md
Andreas Wrede 0543266c92 Major refactoring of the codebase, including restructuring of files and directories, renaming of modules and classes, and improvements to the overall organization and readability of the code. This refactoring aims to enhance maintainability, scalability, and clarity of the codebase while preserving existing functionality. The changes include:
- Restructuring of the project directory into client and server components
- Renaming of modules and classes to better reflect their purpose and functionality
- Moving common utilities and configurations to a shared location
- Updating import statements to reflect the new structure
- Adding new documentation files for better clarity on various aspects of the project
- Removing deprecated or unused code to streamline the codebase
- Ensuring that all existing functionality is preserved and that the codebase remains functional after the refactoring.
2026-03-29 11:13:40 -04:00

# HTTP API and Web UI Documentation
## Overview
The Heartbeat Daemon provides a comprehensive HTTP API and web-based UI for monitoring plugin data and alert states. The API follows RESTful conventions and returns JSON responses.
## Base URL
All API endpoints are relative to the server base URL:
```
http://your-server:50004
```
Default port is `50004` (configurable via `hbd_port` in configuration).
---
## API Endpoints
### Host Management
#### GET /api/0/hosts
Get list of all monitored hosts with their state information.
**Response:**
```json
[
  {
    "name": "webserver01",
    "dyn": false,
    "ver": 6,
    "connections": [...]
  }
]
```
#### GET /api/0/messages
Get recent heartbeat messages (last 30).
**Response:**
```json
[
  {
    "time": 1711234567.123,
    "host": "webserver01",
    "msg": "heartbeat received"
  }
]
```
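To make the message list easier to read in a terminal, each entry can be rendered as a single line. The sketch below assumes that `time` is a Unix timestamp in seconds (inferred from the sample value, not stated by the API); `format_message` is a hypothetical helper name.

```python
from datetime import datetime, timezone


def format_message(msg: dict) -> str:
    """Render one /api/0/messages entry as '<UTC time> <host>: <text>'."""
    # Assumption: "time" is seconds since the Unix epoch.
    when = datetime.fromtimestamp(msg["time"], tz=timezone.utc)
    return f"{when:%Y-%m-%d %H:%M:%S}Z {msg['host']}: {msg['msg']}"


# Entry copied from the sample response above.
sample = {"time": 1711234567.123, "host": "webserver01", "msg": "heartbeat received"}
print(format_message(sample))
```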
---
### Plugin Data Endpoints
#### GET /api/0/hosts/{hostname}/plugins
Get all plugin data for a specific host.
**Parameters:**
- `hostname` (path): Name of the host
**Response:**
```json
{
  "hostname": "webserver01",
  "plugins": {
    "cpu_monitor": {
      "timestamp": 1711234567.123,
      "data": {
        "cpu_percent": 45.2,
        "load_1min": 2.5,
        "load_5min": 2.1,
        "load_15min": 1.8
      },
      "sample_count": 100
    },
    "memory_monitor": {
      "timestamp": 1711234568.456,
      "data": {
        "percent": 65.4,
        "available_mb": 4096,
        "total_mb": 16384
      },
      "sample_count": 100
    }
  }
}
```
**Example:**
```bash
curl http://localhost:50004/api/0/hosts/webserver01/plugins
```
#### GET /api/0/hosts/{hostname}/plugins/{plugin_name}
Get detailed historical data for a specific plugin.
**Parameters:**
- `hostname` (path): Name of the host
- `plugin_name` (path): Name of the plugin
- `limit` (query, optional): Number of recent samples to return (default: 10)
**Response:**
```json
{
  "hostname": "webserver01",
  "plugin": "cpu_monitor",
  "samples": [
    {
      "timestamp": 1711234567.123,
      "data": {
        "cpu_percent": 45.2,
        "load_1min": 2.5
      }
    },
    {
      "timestamp": 1711234267.123,
      "data": {
        "cpu_percent": 42.1,
        "load_1min": 2.3
      }
    }
  ],
  "sample_count": 2
}
```
**Examples:**
```bash
# Get the most recent sample (quote the URL so the shell does not interpret "?")
curl 'http://localhost:50004/api/0/hosts/webserver01/plugins/cpu_monitor?limit=1'
# Get last 50 samples
curl 'http://localhost:50004/api/0/hosts/webserver01/plugins/memory_monitor?limit=50'
# Get disk monitor data
curl http://localhost:50004/api/0/hosts/database01/plugins/disk_monitor
```
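Once a history response has been fetched, a single metric can be pulled out as a time series. The following is a minimal sketch that works on the response shape shown above; `metric_series` is a hypothetical helper, and sorting oldest-first is a presentation choice rather than something the API guarantees.

```python
def metric_series(payload: dict, metric: str) -> list[tuple[float, float]]:
    """Return (timestamp, value) pairs for one metric, oldest first."""
    points = [
        (sample["timestamp"], sample["data"][metric])
        for sample in payload.get("samples", [])
        if metric in sample.get("data", {})  # skip samples lacking the metric
    ]
    return sorted(points)


# Payload copied from the sample response above.
payload = {
    "hostname": "webserver01",
    "plugin": "cpu_monitor",
    "samples": [
        {"timestamp": 1711234567.123, "data": {"cpu_percent": 45.2, "load_1min": 2.5}},
        {"timestamp": 1711234267.123, "data": {"cpu_percent": 42.1, "load_1min": 2.3}},
    ],
    "sample_count": 2,
}

series = metric_series(payload, "cpu_percent")
average = sum(value for _, value in series) / len(series)
print(f"{len(series)} samples, average cpu_percent = {average:.2f}")
```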
---
### Alert Endpoints
#### GET /api/0/hosts/{hostname}/alerts
Get alert states for a specific host.
**Parameters:**
- `hostname` (path): Name of the host
**Response:**
```json
{
  "hostname": "webserver01",
  "alerts": [
    {
      "metric_path": "cpu_monitor.cpu_percent",
      "level": "WARNING",
      "since": 1711234000.0,
      "last_value": 85.5,
      "last_check": 1711234567.123,
      "notification_count": 2
    },
    {
      "metric_path": "disk_monitor./.percent",
      "level": "OK",
      "since": 1711230000.0,
      "last_value": 65.0,
      "last_check": 1711234567.123,
      "notification_count": 0
    }
  ],
  "summary": {
    "ok": 15,
    "warning": 1,
    "critical": 0,
    "unknown": 0
  }
}
```
**Example:**
```bash
curl http://localhost:50004/api/0/hosts/webserver01/alerts
```
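Because the per-host response includes entries at level `OK`, a client usually wants to filter them out and work out how long each remaining alert has been active. A minimal sketch, assuming `since` is a Unix timestamp in seconds; `active_alerts` and the added `duration_s` field are hypothetical names, not part of the API.

```python
def active_alerts(payload: dict, now: float) -> list[dict]:
    """Return non-OK alerts with an added 'duration_s' field, longest first."""
    result = []
    for alert in payload.get("alerts", []):
        if alert["level"] == "OK":
            continue
        # Assumption: "since" marks when the alert entered its current level.
        result.append({**alert, "duration_s": now - alert["since"]})
    return sorted(result, key=lambda a: a["duration_s"], reverse=True)


# Trimmed copy of the sample response above.
payload = {
    "hostname": "webserver01",
    "alerts": [
        {"metric_path": "cpu_monitor.cpu_percent", "level": "WARNING", "since": 1711234000.0},
        {"metric_path": "disk_monitor./.percent", "level": "OK", "since": 1711230000.0},
    ],
}

for alert in active_alerts(payload, now=1711234567.123):
    print(f"{alert['level']}: {alert['metric_path']} for {alert['duration_s']:.0f}s")
```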
#### GET /api/0/alerts
Get all active alerts across all monitored hosts.
**Response:**
```json
{
  "alerts": [
    {
      "hostname": "webserver01",
      "metric_path": "cpu_monitor.cpu_percent",
      "level": "CRITICAL",
      "since": 1711234000.0,
      "last_value": 95.5,
      "last_check": 1711234567.123,
      "notification_count": 3
    },
    {
      "hostname": "database01",
      "metric_path": "memory_monitor.percent",
      "level": "WARNING",
      "since": 1711233000.0,
      "last_value": 88.2,
      "last_check": 1711234567.123,
      "notification_count": 1
    }
  ],
  "summary": {
    "critical": 1,
    "warning": 1,
    "unknown": 0,
    "total": 2
  },
  "host_count": 5
}
```
**Example:**
```bash
curl http://localhost:50004/api/0/alerts | jq .
```
---
## Web UI Pages
### Live Dashboard
**URL:** `/live`
Real-time dashboard showing:
- Host connection states
- IPv4/IPv6 connectivity
- Latency metrics
- Recent messages
**Features:**
- WebSocket-powered live updates
- Sortable columns
- Color-coded status indicators
### Plugin Metrics
**URL:** `/plugins`
Interactive visualization of plugin metrics:
- Select host and plugin from dropdown
- View current metric values
- Automatic refresh every 30 seconds
- Support for nested metrics (e.g., per-partition disk stats)
**Features:**
- Card-based metric display
- Unit formatting (%, MB, GB)
- Nested object visualization
- Auto-refresh
**Available data:**
- CPU usage, load average, frequency
- Memory usage, available memory, swap
- Disk usage per partition, I/O statistics
- Network interface statistics, connection counts
- Custom plugin data
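Nested metrics such as per-partition disk statistics can be flattened into dotted paths before display or comparison. The sketch below assumes that plugin data nests plain dictionaries and that the dotted form matches alert paths such as `disk_monitor./.percent`; the sample `disk_data` values and the `flatten_metrics` helper are illustrative, not taken from a real plugin.

```python
def flatten_metrics(data: dict, prefix: str = "") -> dict:
    """Flatten nested metric dicts into {'a.b.c': value} form."""
    flat = {}
    for key, value in data.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested groups such as one dict per partition.
            flat.update(flatten_metrics(value, path))
        else:
            flat[path] = value
    return flat


# Hypothetical per-partition payload for a disk plugin.
disk_data = {"/": {"percent": 65.0}, "/home": {"percent": 40.5}}

for path, value in flatten_metrics(disk_data, "disk_monitor").items():
    print(f"{path} = {value}")
```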
### Alerts Dashboard
**URL:** `/alerts`
Comprehensive alert monitoring:
- Summary cards (Critical, Warning, Total Hosts)
- Filter by severity (All, Critical, Warning)
- Alert details with duration
- Auto-refresh every 15 seconds
**Features:**
- Color-coded alert levels
- Duration tracking
- Filterable list
- Real-time updates
- Summary statistics
---
## Integration Examples
### Monitoring Script
```bash
#!/bin/bash
# Check for critical alerts and send a notification
# -f makes curl fail on HTTP errors, so an unreachable server stops the script
RESPONSE=$(curl -sf http://localhost:50004/api/0/alerts) || exit 1
CRITICAL_COUNT=$(echo "$RESPONSE" | jq '.summary.critical')
if [ "$CRITICAL_COUNT" -gt 0 ]; then
    echo "CRITICAL: $CRITICAL_COUNT critical alerts detected!"
    echo "$RESPONSE" | jq '.alerts[] | select(.level=="CRITICAL")'
    # Send notification
    # echo "$RESPONSE" | jq '.alerts[] | select(.level=="CRITICAL")' \
    #     | mail -s "Critical Alerts" admin@example.com
fi
```
### Python Client
```python
import requests

BASE_URL = 'http://localhost:50004'

# Get all plugin data for a host
response = requests.get(f'{BASE_URL}/api/0/hosts/webserver01/plugins', timeout=5)
response.raise_for_status()  # Raise on 4xx/5xx instead of parsing an error body
data = response.json()
print(f"Host: {data['hostname']}")
print(f"Plugins: {', '.join(data['plugins'].keys())}")
for plugin, info in data['plugins'].items():
    print(f"\n{plugin}:")
    for metric, value in info['data'].items():
        print(f"  {metric}: {value}")

# Check for alerts
response = requests.get(f'{BASE_URL}/api/0/alerts', timeout=5)
response.raise_for_status()
alerts = response.json()
if alerts['summary']['critical'] > 0:
    print(f"\n⚠️ {alerts['summary']['critical']} CRITICAL ALERTS!")
    for alert in alerts['alerts']:
        if alert['level'] == 'CRITICAL':
            print(f"  - {alert['hostname']}: {alert['metric_path']} = {alert['last_value']}")
```
### Grafana Integration
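The API endpoints can be queried from Grafana through a JSON datasource plugin:
1. Install a datasource plugin that can query arbitrary JSON endpoints (for example JSON API or Infinity). The SimpleJSON plugin expects the backend to implement its own `/search` and `/query` endpoints, which are not among the endpoints listed in this document.
2. Configure datasource URL: `http://your-server:50004`
3. Create queries:
   - Metrics: `/api/0/hosts/webserver01/plugins/cpu_monitor?limit=100`
   - Alerts: `/api/0/alerts`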
### Prometheus Integration
Native export in Prometheus format is a future enhancement. Until then, a small exporter can poll the API and republish values as gauges:
```python
# Example Prometheus exporter
from prometheus_client import Gauge, generate_latest
import requests

BASE_URL = 'http://localhost:50004'

cpu_usage = Gauge('heartbeat_cpu_percent', 'CPU usage percentage', ['hostname'])
memory_usage = Gauge('heartbeat_memory_percent', 'Memory usage percentage', ['hostname'])


def collect_metrics():
    """Poll every host once and copy CPU and memory values into the gauges."""
    hosts = requests.get(f'{BASE_URL}/api/0/hosts', timeout=5).json()
    for host in hosts:
        hostname = host['name']
        plugins = requests.get(f'{BASE_URL}/api/0/hosts/{hostname}/plugins', timeout=5).json()
        if 'cpu_monitor' in plugins['plugins']:
            cpu_data = plugins['plugins']['cpu_monitor']['data']
            cpu_usage.labels(hostname=hostname).set(cpu_data.get('cpu_percent', 0))
        if 'memory_monitor' in plugins['plugins']:
            mem_data = plugins['plugins']['memory_monitor']['data']
            memory_usage.labels(hostname=hostname).set(mem_data.get('percent', 0))


if __name__ == '__main__':
    collect_metrics()
    # Print the collected gauges in Prometheus text exposition format
    print(generate_latest().decode())
```
---
## Response Formats
### Success Response
All successful API calls return HTTP 200 with JSON body:
```json
{
  "field": "value",
  ...
}
```
### Error Response
API errors return appropriate HTTP status codes with JSON:
```json
{
  "error": "Host 'unknown-host' not found"
}
```
**Common Status Codes:**
- `200 OK` - Success
- `400 Bad Request` - Invalid parameters
- `404 Not Found` - Resource not found
- `500 Internal Server Error` - Server error
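A client can turn the error format above into exceptions so that callers do not have to inspect status codes by hand. This is a minimal sketch: `HeartbeatAPIError` and `unwrap` are hypothetical names, and the fallback message for bodies without an `error` field is an assumption.

```python
class HeartbeatAPIError(Exception):
    """Raised when the API answers with a non-200 status."""

    def __init__(self, status: int, message: str):
        super().__init__(f"HTTP {status}: {message}")
        self.status = status
        self.message = message


def unwrap(status: int, body):
    """Return the decoded body on HTTP 200, otherwise raise HeartbeatAPIError."""
    if status == 200:
        return body
    # Assumption: error bodies are JSON objects with an "error" field.
    message = body.get("error", "unknown error") if isinstance(body, dict) else "unknown error"
    raise HeartbeatAPIError(status, message)


try:
    unwrap(404, {"error": "Host 'unknown-host' not found"})
except HeartbeatAPIError as exc:
    print(f"request failed: {exc}")
```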
---
## WebSocket API
For real-time updates, connect to the WebSocket endpoint:
**URL:** `ws://your-server:50005/hbd` (or `wss://` on the configured `wss_port` for secure connections)
**Messages:**
```json
{
  "type": "host",
  "data": {
    "name": "webserver01",
    "state": "UP"
  }
}
```
```json
{
  "type": "plugin",
  "data": {
    "host": "webserver01",
    "plugin": "cpu_monitor",
    "data": {...},
    "timestamp": 1711234567.123
  }
}
```
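Both message kinds share a `type` field, so a client can route incoming frames through a small dispatch table. The sketch below handles only the parsing and routing step and leaves the WebSocket connection itself to whichever client library is in use; `dispatch` and the handler functions are hypothetical names.

```python
import json


def dispatch(raw: str, handlers: dict) -> bool:
    """Decode one WebSocket frame and call the handler registered for its type.

    Returns False when no handler matches, so unknown types can be logged.
    """
    message = json.loads(raw)
    handler = handlers.get(message.get("type"))
    if handler is None:
        return False
    handler(message.get("data", {}))
    return True


host_states = {}      # host name -> last reported state
latest_samples = {}   # (host, plugin) -> timestamp of newest sample


def on_host(data: dict) -> None:
    host_states[data["name"]] = data["state"]


def on_plugin(data: dict) -> None:
    latest_samples[(data["host"], data["plugin"])] = data["timestamp"]


handlers = {"host": on_host, "plugin": on_plugin}

dispatch('{"type": "host", "data": {"name": "webserver01", "state": "UP"}}', handlers)
print(host_states)
```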
---
## Configuration
### Enable HTTP Server
```yaml
# In your hbd configuration file
hbd_host: "" # Listen on all interfaces
hbd_port: 50004 # HTTP port
ws_port: 50005 # WebSocket port (optional)
# wss_port: 50006 # Secure WebSocket (requires SSL)
```
### SSL/TLS Configuration
For secure WebSocket connections:
```yaml
wss_port: 50006
cert_path: /etc/heartbeat/certs/
wss_pem: server.pem
wss_key: server.key
```
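Client code can derive its endpoint URLs from the same configuration keys instead of hard-coding ports. A minimal sketch, assuming the configuration has already been loaded into a dictionary and that the secure WebSocket uses the same `/hbd` path as the plain one (this document does not state the secure path); `endpoint_urls` is a hypothetical helper.

```python
def endpoint_urls(config: dict, server: str) -> dict:
    """Build base URLs for the HTTP API and WebSocket endpoints."""
    urls = {"http": f"http://{server}:{config.get('hbd_port', 50004)}"}
    if "ws_port" in config:
        urls["ws"] = f"ws://{server}:{config['ws_port']}/hbd"
    if "wss_port" in config:
        # Assumption: the secure endpoint shares the /hbd path.
        urls["wss"] = f"wss://{server}:{config['wss_port']}/hbd"
    return urls


# Values mirror the configuration examples above.
config = {"hbd_port": 50004, "ws_port": 50005, "wss_port": 50006}
for name, url in endpoint_urls(config, "your-server").items():
    print(f"{name}: {url}")
```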
---
## Rate Limiting
The API currently does not implement rate limiting. For production use, consider:
- Placing behind a reverse proxy (nginx, Apache)
- Using API gateway for rate limiting
- Implementing caching for frequently accessed endpoints
---
## CORS Support
By default, CORS is not enabled. To enable it for browser-based applications served from another origin, one option is the `aiohttp_cors` package:
```python
# In http.py, add CORS support via the aiohttp_cors package
import aiohttp_cors
from aiohttp import web

app = web.Application()
# ... register all routes on app.router before configuring CORS ...
cors = aiohttp_cors.setup(app)

# Configure CORS for all routes ("*" allows any origin; restrict it in production)
for route in list(app.router.routes()):
    cors.add(route, {
        "*": aiohttp_cors.ResourceOptions(
            allow_credentials=True,
            expose_headers="*",
            allow_headers="*",
        )
    })
```
---
## Performance Considerations
### Caching
- Plugin data is cached in memory (last 100 samples per plugin)
- No database queries required
- Responses are fast (<10ms typical)
### Scalability
- Each host stores its own data independently
- Memory usage: ~1 KB per host + ~1 KB per plugin sample
- For 100 hosts with 5 plugins each, at 100 retained samples per plugin: 100 × 5 × 100 × ~1 KB ≈ 50 MB of memory
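The estimate above can be reproduced for other deployment sizes with a short calculation. This sketch uses the approximate per-host and per-sample figures from the list and the 100-sample retention from the Caching section; `estimate_memory_kb` is a hypothetical helper, and real usage depends on how large each plugin's payload is.

```python
def estimate_memory_kb(
    hosts: int,
    plugins_per_host: int,
    samples_per_plugin: int = 100,  # retention stated in the Caching section
    kb_per_host: float = 1.0,       # rough figure from the list above
    kb_per_sample: float = 1.0,     # rough figure from the list above
) -> float:
    """Estimate server memory for cached plugin data, in KB."""
    host_overhead = hosts * kb_per_host
    sample_storage = hosts * plugins_per_host * samples_per_plugin * kb_per_sample
    return host_overhead + sample_storage


total_kb = estimate_memory_kb(hosts=100, plugins_per_host=5)
print(f"~{total_kb / 1000:.0f} MB")
```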
### Best Practices
1. Use `limit` parameter to control response size
2. Cache responses on client side when appropriate
3. Use WebSocket for real-time updates instead of polling
4. Consider pagination for large deployments (future enhancement)
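Client-side caching (item 2) can be as simple as remembering each response for a few seconds. The following is a minimal sketch of a time-to-live cache; `TTLCache` is a hypothetical class, the fetch function is supplied by the caller, and the clock is injectable so the behaviour can be checked without waiting.

```python
import time
from typing import Any, Callable


class TTLCache:
    """Keep each fetched value for ttl_s seconds before fetching again."""

    def __init__(self, ttl_s: float, clock: Callable[[], float] = time.monotonic):
        self.ttl_s = ttl_s
        self.clock = clock
        self._entries: dict[str, tuple[float, Any]] = {}

    def get(self, key: str, fetch: Callable[[], Any]) -> Any:
        now = self.clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self.ttl_s:
            return hit[1]  # still fresh, skip the request
        value = fetch()
        self._entries[key] = (now, value)
        return value


# Demonstration with a fake clock and a counter standing in for an HTTP call.
fake_now = [0.0]
calls = []


def fake_fetch():
    calls.append(fake_now[0])
    return {"summary": {"critical": 0}}


cache = TTLCache(ttl_s=15, clock=lambda: fake_now[0])
cache.get("/api/0/alerts", fake_fetch)   # first call fetches
fake_now[0] = 10.0
cache.get("/api/0/alerts", fake_fetch)   # within 15 s, served from cache
fake_now[0] = 20.0
cache.get("/api/0/alerts", fake_fetch)   # expired, fetches again
print(f"fetched {len(calls)} times")
```

A TTL close to the dashboard refresh intervals (15 or 30 seconds) keeps the data reasonably current while avoiding repeated identical requests.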
---
## Troubleshooting
### API Returns 404
- Verify hostname in URL matches actual host name
- Check host is sending heartbeats: `curl http://localhost:50004/api/0/hosts`
### No Plugin Data
- Verify client is configured with plugins
- Check client logs for plugin errors
- Ensure plugins are sending data (check journal logs)
### Empty Alerts
- Verify thresholds are configured
- Check host is in `watchhosts` list
- Ensure plugins are collecting metrics
- Review server logs for threshold checker errors
---
## See Also
- [Plugin Development Guide](PLUGIN_DEVELOPMENT.md)
- [Threshold Alerting Documentation](THRESHOLD_ALERTING.md)
- [Message Journal Documentation](MESSAGE_JOURNAL.md)
- Configuration examples: `hbd/config_example.yaml`