# HTTP API and Web UI Documentation

## Overview

The Heartbeat Daemon provides a comprehensive HTTP API and web-based UI for monitoring plugin data and alert states. The API follows RESTful conventions and returns JSON responses.

## Base URL

All API endpoints are relative to the server base URL:

```
http://your-server:50004
```

Default port is `50004` (configurable via `hbd_port` in configuration).

---

## API Endpoints

### Host Management

#### GET /api/0/hosts
Get list of all monitored hosts with their state information.

**Response:**
```json
[
  {
    "name": "webserver01",
    "dyn": false,
    "ver": 6,
    "connections": [...]
  }
]
```

#### GET /api/0/messages
Get recent heartbeat messages (last 30).

**Response:**
```json
[
  {
    "time": 1711234567.123,
    "host": "webserver01",
    "msg": "heartbeat received"
  }
]
```

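As a minimal sketch of consuming the two endpoints above from Python, the helper below fetches a JSON document and formats one message entry for display. The names `fetch_json` and `format_message` are hypothetical, and the code assumes that `time` is a Unix timestamp in seconds, which the sample value suggests but the API description does not state. Passing each entry returned by `fetch_json("/api/0/messages")` to `format_message` would print the recent messages.

```python
import json
from datetime import datetime, timezone
from urllib.request import urlopen

BASE_URL = "http://localhost:50004"  # assumption: default port from this document


def fetch_json(path, base_url=BASE_URL, timeout=5):
    """GET a path under the base URL and decode the JSON body."""
    with urlopen(f"{base_url}{path}", timeout=timeout) as resp:
        return json.load(resp)


def format_message(message):
    """Render one /api/0/messages entry as '<UTC time> <host>: <msg>'.

    Assumes `time` is Unix epoch seconds (not confirmed by the API docs).
    """
    when = datetime.fromtimestamp(message["time"], tz=timezone.utc)
    stamp = when.isoformat(timespec="seconds")
    return f"{stamp} {message['host']}: {message['msg']}"
```
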
---

### Plugin Data Endpoints

#### GET /api/0/hosts/{hostname}/plugins
Get all plugin data for a specific host.

**Parameters:**
- `hostname` (path): Name of the host

**Response:**
```json
{
  "hostname": "webserver01",
  "plugins": {
    "cpu_monitor": {
      "timestamp": 1711234567.123,
      "data": {
        "cpu_percent": 45.2,
        "load_1min": 2.5,
        "load_5min": 2.1,
        "load_15min": 1.8
      },
      "sample_count": 100
    },
    "memory_monitor": {
      "timestamp": 1711234568.456,
      "data": {
        "percent": 65.4,
        "available_mb": 4096,
        "total_mb": 16384
      },
      "sample_count": 100
    }
  }
}
```

**Example:**
```bash
curl http://localhost:50004/api/0/hosts/webserver01/plugins
```

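Plugin `data` objects may be nested (for example per-partition disk statistics), so a client often wants a flat view. The sketch below flattens one host's plugin payload into dotted paths such as `cpu_monitor.cpu_percent`. The function name `flatten_metrics` is hypothetical, and the dotted form is an assumption modelled on the `metric_path` values shown by the alert endpoints, not a documented guarantee.

```python
def flatten_metrics(plugins_payload):
    """Map dotted metric paths to values for one host's plugin payload."""
    flat = {}

    def walk(prefix, value):
        # Recurse into nested objects; record anything else as a leaf value.
        if isinstance(value, dict):
            for key, child in value.items():
                walk(f"{prefix}.{key}", child)
        else:
            flat[prefix] = value

    for plugin, info in plugins_payload.get("plugins", {}).items():
        walk(plugin, info.get("data", {}))
    return flat
```

With the response above, `flatten_metrics(payload)["memory_monitor.percent"]` would be `65.4`.
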
#### GET /api/0/hosts/{hostname}/plugins/{plugin_name}
Get detailed historical data for a specific plugin.

**Parameters:**
- `hostname` (path): Name of the host
- `plugin_name` (path): Name of the plugin
- `limit` (query, optional): Number of recent samples to return (default: 10)

**Response:**
```json
{
  "hostname": "webserver01",
  "plugin": "cpu_monitor",
  "samples": [
    {
      "timestamp": 1711234567.123,
      "data": {
        "cpu_percent": 45.2,
        "load_1min": 2.5
      }
    },
    {
      "timestamp": 1711234267.123,
      "data": {
        "cpu_percent": 42.1,
        "load_1min": 2.3
      }
    }
  ],
  "sample_count": 2
}
```

**Examples:**
```bash
# Get last 1 sample (most recent)
# URLs are quoted so the shell does not interpret the "?"
curl "http://localhost:50004/api/0/hosts/webserver01/plugins/cpu_monitor?limit=1"

# Get last 50 samples
curl "http://localhost:50004/api/0/hosts/webserver01/plugins/memory_monitor?limit=50"

# Get disk monitor data
curl "http://localhost:50004/api/0/hosts/database01/plugins/disk_monitor"
```

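The history endpoint lends itself to building a time series for one metric. The following sketch builds the request URL with an optional `limit` and extracts `(timestamp, value)` pairs from a response shaped like the one above. Both function names are hypothetical; sorting oldest-first is a choice made here for plotting, not something the API promises.

```python
from urllib.parse import quote, urlencode


def plugin_url(base_url, hostname, plugin_name, limit=None):
    """Build the history URL; omit `limit` to use the server default."""
    url = (
        f"{base_url}/api/0/hosts/{quote(hostname, safe='')}"
        f"/plugins/{quote(plugin_name, safe='')}"
    )
    if limit is not None:
        url += "?" + urlencode({"limit": limit})
    return url


def metric_series(payload, metric):
    """Return (timestamp, value) pairs for one metric, oldest first."""
    pairs = [
        (sample["timestamp"], sample["data"][metric])
        for sample in payload.get("samples", [])
        if metric in sample.get("data", {})
    ]
    return sorted(pairs)
```
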
---

### Alert Endpoints

#### GET /api/0/hosts/{hostname}/alerts
Get alert states for a specific host.

**Parameters:**
- `hostname` (path): Name of the host

**Response:**
```json
{
  "hostname": "webserver01",
  "alerts": [
    {
      "metric_path": "cpu_monitor.cpu_percent",
      "level": "WARNING",
      "since": 1711234000.0,
      "last_value": 85.5,
      "last_check": 1711234567.123,
      "notification_count": 2
    },
    {
      "metric_path": "disk_monitor./.percent",
      "level": "OK",
      "since": 1711230000.0,
      "last_value": 65.0,
      "last_check": 1711234567.123,
      "notification_count": 0
    }
  ],
  "summary": {
    "ok": 15,
    "warning": 1,
    "critical": 0,
    "unknown": 0
  }
}
```

**Example:**
```bash
curl http://localhost:50004/api/0/hosts/webserver01/alerts
```

#### GET /api/0/alerts
Get all active alerts across all monitored hosts.

**Response:**
```json
{
  "alerts": [
    {
      "hostname": "webserver01",
      "metric_path": "cpu_monitor.cpu_percent",
      "level": "CRITICAL",
      "since": 1711234000.0,
      "last_value": 95.5,
      "last_check": 1711234567.123,
      "notification_count": 3
    },
    {
      "hostname": "database01",
      "metric_path": "memory_monitor.percent",
      "level": "WARNING",
      "since": 1711233000.0,
      "last_value": 88.2,
      "last_check": 1711234567.123,
      "notification_count": 1
    }
  ],
  "summary": {
    "critical": 1,
    "warning": 1,
    "unknown": 0,
    "total": 2
  },
  "host_count": 5
}
```

**Example:**
```bash
curl http://localhost:50004/api/0/alerts | jq .
```

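A client will typically want to order alerts by severity and show how long each has been active. The sketch below does both for a payload shaped like the responses above. The names `active_alerts` and `describe` are hypothetical. It assumes `since` is a Unix timestamp marking when the alert entered its current level, and that `UNKNOWN` is a possible `level` value (inferred from the `unknown` summary key); neither is confirmed by the API description.

```python
import time

# Lower number sorts first. "UNKNOWN" is an assumed level name.
SEVERITY_ORDER = {"CRITICAL": 0, "WARNING": 1, "UNKNOWN": 2, "OK": 3}


def active_alerts(payload, levels=("CRITICAL", "WARNING")):
    """Return alerts at the given levels, most severe first, then oldest first."""
    selected = [a for a in payload.get("alerts", []) if a.get("level") in levels]
    return sorted(
        selected,
        key=lambda a: (SEVERITY_ORDER.get(a["level"], 99), a["since"]),
    )


def describe(alert, now=None):
    """Format one alert with the whole minutes spent at its current level."""
    now = time.time() if now is None else now
    minutes = int((now - alert["since"]) // 60)
    host = alert.get("hostname", "?")  # per-host responses omit the hostname
    return (
        f"{alert['level']} {host} "
        f"{alert['metric_path']}={alert['last_value']} for {minutes} min"
    )
```
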
---

## Web UI Pages

### Live Dashboard
**URL:** `/live`

Real-time dashboard showing:
- Host connection states
- IPv4/IPv6 connectivity
- Latency metrics
- Recent messages

**Features:**
- WebSocket-powered live updates
- Sortable columns
- Color-coded status indicators

### Plugin Metrics
**URL:** `/plugins`

Interactive visualization of plugin metrics:
- Select host and plugin from dropdown
- View current metric values
- Automatic refresh every 30 seconds
- Support for nested metrics (e.g., per-partition disk stats)

**Features:**
- Card-based metric display
- Unit formatting (%, MB, GB)
- Nested object visualization
- Auto-refresh

**Screenshots of available data:**
- CPU usage, load average, frequency
- Memory usage, available memory, swap
- Disk usage per partition, I/O statistics
- Network interface statistics, connection counts
- Custom plugin data

### Alerts Dashboard
**URL:** `/alerts`

Comprehensive alert monitoring:
- Summary cards (Critical, Warning, Total Hosts)
- Filter by severity (All, Critical, Warning)
- Alert details with duration
- Auto-refresh every 15 seconds

**Features:**
- Color-coded alert levels
- Duration tracking
- Filterable list
- Real-time updates
- Summary statistics

---

## Integration Examples

### Monitoring Script

```bash
#!/bin/bash
# Check for critical alerts and send notification

RESPONSE=$(curl -s http://localhost:50004/api/0/alerts)
# Fall back to 0 if the request failed or the summary is missing
CRITICAL_COUNT=$(echo "$RESPONSE" | jq -r '.summary.critical // 0' 2>/dev/null)

if [ "${CRITICAL_COUNT:-0}" -gt 0 ]; then
    echo "CRITICAL: $CRITICAL_COUNT critical alerts detected!"
    echo "$RESPONSE" | jq '.alerts[] | select(.level=="CRITICAL")'
    # Send notification
    # mail -s "Critical Alerts" admin@example.com < alert_details.txt
fi
```

### Python Client

```python
import requests

# Get all plugin data for a host
response = requests.get('http://localhost:50004/api/0/hosts/webserver01/plugins')
response.raise_for_status()  # Fail early on 4xx/5xx instead of parsing an error body
data = response.json()

print(f"Host: {data['hostname']}")
print(f"Plugins: {', '.join(data['plugins'].keys())}")

for plugin, info in data['plugins'].items():
    print(f"\n{plugin}:")
    for metric, value in info['data'].items():
        print(f"  {metric}: {value}")

# Check for alerts
response = requests.get('http://localhost:50004/api/0/alerts')
response.raise_for_status()
alerts = response.json()

if alerts['summary']['critical'] > 0:
    print(f"\n⚠️ {alerts['summary']['critical']} CRITICAL ALERTS!")
    for alert in alerts['alerts']:
        if alert['level'] == 'CRITICAL':
            print(f"  - {alert['hostname']}: {alert['metric_path']} = {alert['last_value']}")
```

### Grafana Integration

The API endpoints can be used with Grafana's JSON datasource plugin:

1. Install the SimpleJSON datasource plugin
2. Configure datasource URL: `http://your-server:50004`
3. Create queries:
   - Metrics: `/api/0/hosts/webserver01/plugins/cpu_monitor?limit=100`
   - Alerts: `/api/0/alerts`

### Prometheus Integration

Export metrics in Prometheus format (future enhancement):

```python
# Example prometheus exporter
from prometheus_client import Gauge, generate_latest
import requests

cpu_usage = Gauge('heartbeat_cpu_percent', 'CPU usage percentage', ['hostname'])
memory_usage = Gauge('heartbeat_memory_percent', 'Memory usage percentage', ['hostname'])

def collect_metrics():
    hosts = requests.get('http://localhost:50004/api/0/hosts').json()
    for host in hosts:
        hostname = host['name']
        plugins = requests.get(f'http://localhost:50004/api/0/hosts/{hostname}/plugins').json()

        if 'cpu_monitor' in plugins['plugins']:
            cpu_data = plugins['plugins']['cpu_monitor']['data']
            cpu_usage.labels(hostname=hostname).set(cpu_data.get('cpu_percent', 0))

        if 'memory_monitor' in plugins['plugins']:
            mem_data = plugins['plugins']['memory_monitor']['data']
            memory_usage.labels(hostname=hostname).set(mem_data.get('percent', 0))
```

---

## Response Formats

### Success Response
All successful API calls return HTTP 200 with JSON body:
```json
{
  "field": "value",
  ...
}
```

### Error Response
API errors return appropriate HTTP status codes with JSON:
```json
{
  "error": "Host 'unknown-host' not found"
}
```

**Common Status Codes:**
- `200 OK` - Success
- `400 Bad Request` - Invalid parameters
- `404 Not Found` - Resource not found
- `500 Internal Server Error` - Server error

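
Clients can turn the status codes and the `error` field above into a single exception type. The sketch below is one way to do that; `ApiError` and `parse_response` are hypothetical names, and the fallback message covers the case where an error body is not JSON, which the document does not rule out. With `urllib`, the inputs would come from `urllib.error.HTTPError` (`err.code` and `err.read()`).

```python
import json


class ApiError(Exception):
    """Raised for non-200 responses; carries the status code and message."""

    def __init__(self, status, message):
        super().__init__(f"HTTP {status}: {message}")
        self.status = status
        self.message = message


def parse_response(status, body):
    """Decode a response body, raising ApiError for non-200 statuses."""
    try:
        payload = json.loads(body) if body else None
    except json.JSONDecodeError:
        payload = None  # tolerate non-JSON bodies, e.g. from a proxy
    if status == 200:
        return payload
    message = payload.get("error") if isinstance(payload, dict) else None
    raise ApiError(status, message or "no error message in body")
```
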
---

## WebSocket API

For real-time updates, connect to the WebSocket endpoint:

**URL:** `ws://your-server:50005/hbd` (or `wss://` on the configured `wss_port` for secure connections)

**Messages:**
```json
{
  "type": "host",
  "data": {
    "name": "webserver01",
    "state": "UP"
  }
}
```

```json
{
  "type": "plugin",
  "data": {
    "host": "webserver01",
    "plugin": "cpu_monitor",
    "data": {...},
    "timestamp": 1711234567.123
  }
}
```

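The two message shapes above can be handled with a small dispatcher keyed on `type`. The sketch below keeps the latest host state and plugin sample in a plain dictionary; `handle_message` and the layout of `state` are hypothetical. It only covers the `host` and `plugin` types shown here and ignores anything else. With a WebSocket client library (for example the third-party `websockets` package), each received text frame would be passed to `handle_message`.

```python
import json


def handle_message(raw, state):
    """Apply one WebSocket message to `state` and return the message type."""
    msg = json.loads(raw)
    kind, data = msg.get("type"), msg.get("data", {})
    if kind == "host":
        state.setdefault("hosts", {})[data["name"]] = data.get("state")
    elif kind == "plugin":
        key = (data["host"], data["plugin"])
        state.setdefault("plugins", {})[key] = (data.get("timestamp"), data.get("data"))
    # Unrecognised types are ignored so that new message kinds do not break the client.
    return kind
```
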
---

## Configuration

### Enable HTTP Server

```yaml
# In your hbd configuration file
hbd_host: ""          # Listen on all interfaces
hbd_port: 50004       # HTTP port
ws_port: 50005        # WebSocket port (optional)
# wss_port: 50006     # Secure WebSocket (requires SSL)
```

### SSL/TLS Configuration

For secure WebSocket connections:

```yaml
wss_port: 50006
cert_path: /etc/heartbeat/certs/
wss_pem: server.pem
wss_key: server.key
```

---

## Rate Limiting

The API currently does not implement rate limiting. For production use, consider:

- Placing it behind a reverse proxy (nginx, Apache)
- Using an API gateway for rate limiting
- Implementing caching for frequently accessed endpoints

---

## CORS Support

By default, CORS is not enabled. To enable it for web applications:

```python
# In http.py, add CORS middleware
import aiohttp_cors
from aiohttp import web

app = web.Application()
# Register the application's routes on app.router before this point
cors = aiohttp_cors.setup(app)

# Configure CORS for all routes
for route in list(app.router.routes()):
    cors.add(route, {
        "*": aiohttp_cors.ResourceOptions(
            allow_credentials=True,
            expose_headers="*",
            allow_headers="*",
        )
    })
```

---

## Performance Considerations

### Caching
- Plugin data is cached in memory (last 100 samples per plugin)
- No database queries required
- Responses are fast (<10ms typical)

### Scalability
- Each host stores its own data independently
- Memory usage: ~1KB per host + ~1KB per plugin sample
- For 100 hosts with 5 plugins each, at 100 cached samples per plugin: ~50MB memory

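
The ~50MB figure can be reproduced from the numbers given in this section. The sketch below is only a back-of-the-envelope check: it assumes every plugin holds the full 100 cached samples mentioned under Caching and takes the ~1KB figures at face value. The function name is hypothetical.

```python
def estimate_memory_kb(hosts, plugins_per_host, samples_per_plugin=100,
                       kb_per_host=1, kb_per_sample=1):
    """Rough memory estimate in KB from the per-host and per-sample figures."""
    samples = hosts * plugins_per_host * samples_per_plugin
    return hosts * kb_per_host + samples * kb_per_sample


total_kb = estimate_memory_kb(hosts=100, plugins_per_host=5)
print(f"{total_kb} KB ~= {total_kb / 1000:.1f} MB")  # 50100 KB ~= 50.1 MB
```

Sample storage dominates: the per-host overhead contributes only 100 KB of the total.
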
### Best Practices
1. Use `limit` parameter to control response size
2. Cache responses on client side when appropriate
3. Use WebSocket for real-time updates instead of polling
4. Consider pagination for large deployments (future enhancement)

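
For the client-side caching suggested above, a small time-based cache is often enough. The sketch below is one possible shape, not part of the Heartbeat Daemon: `TTLCache` is a hypothetical helper, the 15-second default is arbitrary, and `loader` stands for whatever function performs the actual HTTP request for a given path.

```python
import time


class TTLCache:
    """Cache values for `ttl` seconds, calling `loader(key)` on a miss or expiry."""

    def __init__(self, loader, ttl=15.0, clock=time.monotonic):
        self._loader = loader
        self._ttl = ttl
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries = {}   # key -> (expires_at, value)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: skip the request
        value = self._loader(key)
        self._entries[key] = (now + self._ttl, value)
        return value
```
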
---

## Troubleshooting

### API Returns 404
- Verify that the hostname in the URL matches the actual host name
- Check that the host is sending heartbeats: `curl http://localhost:50004/api/0/hosts`

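
The hostname check above can be scripted against the `/api/0/hosts` response. This sketch reports whether a name is known and, if not, suggests close matches to catch typos. `check_hostname` is a hypothetical helper, and it assumes host names must match exactly (including case), which the document implies but does not state.

```python
from difflib import get_close_matches


def check_hostname(hosts, name):
    """Return (found, suggestions) for `name` against a /api/0/hosts payload."""
    known = [host["name"] for host in hosts]
    if name in known:
        return True, []
    # Suggest up to three similar names to help spot typos in the URL.
    return False, get_close_matches(name, known, n=3, cutoff=0.6)
```
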
### No Plugin Data
- Verify client is configured with plugins
- Check client logs for plugin errors
- Ensure plugins are sending data (check journal logs)

### Empty Alerts
- Verify thresholds are configured
- Check host is in `watchhosts` list
- Ensure plugins are collecting metrics
- Review server logs for threshold checker errors

---

## See Also

- [Plugin Development Guide](PLUGIN_DEVELOPMENT.md)
- [Threshold Alerting Documentation](THRESHOLD_ALERTING.md)
- [Message Journal Documentation](MESSAGE_JOURNAL.md)
- Configuration examples: `hbd/config_example.yaml`