0543266c92
- Restructuring of the project directory into client and server components - Renaming of modules and classes to better reflect their purpose and functionality - Moving common utilities and configurations to a shared location - Updating import statements to reflect the new structure - Adding new documentation files for better clarity on various aspects of the project - Removing deprecated or unused code to streamline the codebase - Ensuring that all existing functionality is preserved and that the codebase remains functional after the refactoring.
414 lines
11 KiB
Markdown
414 lines
11 KiB
Markdown
# Message Journal
|
|
|
|
The message journal provides persistent logging of all received heartbeat messages with automatic size-based log rotation.
|
|
|
|
## Overview
|
|
|
|
The journal logs every message received by the heartbeat daemon (hbd) in JSON format, making it easy to:
|
|
- Audit message history
|
|
- Debug connection issues
|
|
- Analyze traffic patterns
|
|
- Replay messages for testing
|
|
- Create historical reports
|
|
|
|
## Features
|
|
|
|
- **JSON Format**: Each message is logged as a single JSON line for easy parsing
|
|
- **Size-Based Rotation**: Automatically rotates logs when size threshold is reached
|
|
- **Automatic Cleanup**: Keeps only a configurable number of backup files
|
|
- **Thread-Safe**: Safe for concurrent access from multiple async tasks
|
|
- **Configurable**: All settings controllable via configuration file
|
|
- **Performance**: Non-blocking async operation with minimal overhead
|
|
|
|
## Configuration
|
|
|
|
Add these settings to your hbd configuration file (e.g., `.hb.yaml`):
|
|
|
|
```yaml
|
|
# Message journal configuration
|
|
journal_enabled: true # Enable/disable journaling
|
|
journal_dir: /var/log/heartbeat # Directory for journal files
|
|
journal_file: messages.journal # Base filename
|
|
journal_max_size: 104857600 # Max size in bytes (100MB default)
|
|
journal_max_backups: 10 # Number of backup files to keep
|
|
```
|
|
|
|
### Configuration Options
|
|
|
|
| Option | Default | Description |
|
|
|--------|---------|-------------|
|
|
| `journal_enabled` | `true` | Enable or disable message journaling |
|
|
| `journal_dir` | `/var/log/heartbeat` | Directory where journal files are stored |
|
|
| `journal_file` | `messages.journal` | Base filename for the journal |
|
|
| `journal_max_size` | `104857600` (100MB) | Maximum file size before rotation |
|
|
| `journal_max_backups` | `10` | Number of rotated backup files to keep |
|
|
|
|
## File Format
|
|
|
|
Messages are logged in JSONL (JSON Lines) format - one JSON object per line:
|
|
|
|
```json
|
|
{"timestamp":1711234567.123,"datetime":"2026-03-28T12:34:56","source_ip":"192.168.1.100","source_port":50003,"message":{"ID":"HTB","name":"webserver1","interval":30,"ver":1}}
|
|
{"timestamp":1711234597.456,"datetime":"2026-03-28T12:35:37","source_ip":"192.168.1.101","source_port":50003,"message":{"ID":"PLG","plugin":"cpu_monitor","cpu_percent":45.2,"load_1min":1.5}}
|
|
```
|
|
|
|
### Entry Structure
|
|
|
|
Each journal entry contains:
|
|
|
|
| Field | Type | Description |
|
|
|-------|------|-------------|
|
|
| `timestamp` | float | Unix timestamp (seconds since epoch) |
|
|
| `datetime` | string | ISO 8601 formatted datetime |
|
|
| `source_ip` | string | Source IP address |
|
|
| `source_port` | integer | Source UDP port |
|
|
| `message` | object | Complete parsed message dictionary |
|
|
|
|
## Log Rotation
|
|
|
|
### How Rotation Works
|
|
|
|
1. Journal writes messages to the current file
|
|
2. When file size exceeds `journal_max_size`, rotation is triggered
|
|
3. Current file is renamed with timestamp: `messages.journal.YYYYMMDD-HHMMSS`
|
|
4. New empty file is created as the current journal
|
|
5. Old backup files exceeding `journal_max_backups` are deleted
|
|
|
|
### Example File Structure
|
|
|
|
```
|
|
/var/log/heartbeat/
|
|
├── messages.journal # Current active journal
|
|
├── messages.journal.20260328-120000 # Rotated backup
|
|
├── messages.journal.20260328-140000 # Rotated backup
|
|
└── messages.journal.20260328-160000 # Rotated backup (oldest)
|
|
```
|
|
|
|
### Rotation Behavior
|
|
|
|
- Rotation is triggered when the next message would exceed the size limit
|
|
- Rotation is automatic and requires no manual intervention
|
|
- Old backups are deleted in FIFO order (oldest first)
|
|
- Rotation is thread-safe and won't lose messages
|
|
|
|
## Usage Examples
|
|
|
|
### Reading Journal Files
|
|
|
|
#### Using Python
|
|
|
|
```python
|
|
import json
|
|
|
|
# Read all entries from current journal
|
|
with open('/var/log/heartbeat/messages.journal', 'r') as f:
|
|
for line in f:
|
|
entry = json.loads(line)
|
|
print(f"{entry['datetime']} - {entry['source_ip']} - {entry['message']['ID']}")
|
|
```
|
|
|
|
#### Using jq (command line)
|
|
|
|
```bash
|
|
# View all messages
|
|
cat /var/log/heartbeat/messages.journal | jq .
|
|
|
|
# Filter by message type
|
|
cat /var/log/heartbeat/messages.journal | jq 'select(.message.ID == "HTB")'
|
|
|
|
# Filter by hostname
|
|
cat /var/log/heartbeat/messages.journal | jq 'select(.message.name == "webserver1")'
|
|
|
|
# Count messages by type
|
|
cat /var/log/heartbeat/messages.journal | jq -r '.message.ID' | sort | uniq -c
|
|
|
|
# Extract timestamps and source IPs
|
|
cat /var/log/heartbeat/messages.journal | jq -r '[.datetime, .source_ip, .message.ID] | @tsv'
|
|
```
|
|
|
|
#### Using shell tools
|
|
|
|
```bash
|
|
# Count total messages
|
|
wc -l /var/log/heartbeat/messages.journal
|
|
|
|
# View recent messages
|
|
tail -n 100 /var/log/heartbeat/messages.journal | jq .
|
|
|
|
# Search for specific host
|
|
grep -F '"name":"webserver1"' /var/log/heartbeat/messages.journal
|
|
|
|
# Check journal file size
|
|
du -h /var/log/heartbeat/messages.journal
|
|
```
|
|
|
|
### Analyzing Historical Data
|
|
|
|
```bash
|
|
# Combine all journal files (current + backups)
|
|
cat /var/log/heartbeat/messages.journal* | jq . > all_messages.json
|
|
|
|
# Count messages per host
|
|
cat /var/log/heartbeat/messages.journal* | jq -r '.message.name // "unknown"' | sort | uniq -c
|
|
|
|
# Find all plugin messages
|
|
cat /var/log/heartbeat/messages.journal* | jq 'select(.message.ID == "PLG")'
|
|
|
|
# Extract CPU metrics from plugin messages
|
|
cat /var/log/heartbeat/messages.journal* | \
|
|
jq 'select(.message.plugin == "cpu_monitor") | {time: .datetime, host: .message.name, cpu: .message.cpu_percent}'
|
|
```
|
|
|
|
## Integration with Log Management
|
|
|
|
### Logrotate
|
|
|
|
While the journal has built-in rotation, you can also use logrotate for additional management:
|
|
|
|
```
|
|
/var/log/heartbeat/messages.journal.* {
|
|
daily
|
|
rotate 30
|
|
compress
|
|
delaycompress
|
|
missingok
|
|
notifempty
|
|
}
|
|
```
|
|
|
|
### Elasticsearch/OpenSearch
|
|
|
|
Import journal data into Elasticsearch for advanced analysis:
|
|
|
|
```python
|
|
from elasticsearch import Elasticsearch
|
|
import json
|
|
|
|
es = Elasticsearch(['localhost:9200'])
|
|
|
|
with open('/var/log/heartbeat/messages.journal', 'r') as f:
|
|
for line in f:
|
|
entry = json.loads(line)
|
|
es.index(index='heartbeat-messages', body=entry)
|
|
```
|
|
|
|
### Splunk
|
|
|
|
Create a Splunk input for the journal:
|
|
|
|
```ini
|
|
[monitor:///var/log/heartbeat/messages.journal*]
|
|
sourcetype = heartbeat_json
|
|
index = heartbeat
|
|
```
|
|
|
|
## Performance Considerations
|
|
|
|
### Overhead
|
|
|
|
- Journal writing is async and non-blocking
|
|
- Typical overhead: < 1ms per message
|
|
- Minimal impact on heartbeat processing
|
|
|
|
### Disk Usage
|
|
|
|
Calculate expected disk usage:
|
|
|
|
```
|
|
Messages per day = (86400 seconds / interval) * number_of_hosts
|
|
Average message size ≈ 200-500 bytes
|
|
Daily disk usage = Messages per day * Average message size
|
|
|
|
Example:
|
|
- 100 hosts
|
|
- 30 second interval
|
|
- 2880 messages/day per host
|
|
- 288,000 messages/day total
|
|
- ~60-140 MB/day
|
|
```
|
|
|
|
### Recommendations
|
|
|
|
- **Small deployments** (< 50 hosts): Default settings work well
|
|
- **Medium deployments** (50-500 hosts): Increase `journal_max_size` to 500MB, `journal_max_backups` to 20
|
|
- **Large deployments** (> 500 hosts): Consider 1GB+ journal files, 30+ backups, or external log aggregation
|
|
|
|
## Monitoring
|
|
|
|
### Check Journal Status
|
|
|
|
The journal exposes statistics that can be queried:
|
|
|
|
```python
|
|
from hbd.journal import get_journal
|
|
|
|
journal = get_journal()
|
|
stats = journal.get_stats()
|
|
print(f"Current size: {stats['current_size']:,} bytes")
|
|
print(f"Rotation threshold: {stats['rotation_threshold']}")
|
|
```
|
|
|
|
### Log Messages
|
|
|
|
Journal operations are logged at appropriate levels:
|
|
|
|
- `INFO`: Initialization, rotation events, cleanup
|
|
- `DEBUG`: Individual message logging
|
|
- `WARNING`: Non-critical issues
|
|
- `ERROR`: Critical failures
|
|
|
|
Check hbd logs for journal-related messages:
|
|
|
|
```bash
|
|
grep journal /var/log/heartbeat.log
|
|
```
|
|
|
|
## Troubleshooting
|
|
|
|
### Journal Files Not Created
|
|
|
|
**Problem**: No journal files appear in the configured directory.
|
|
|
|
**Solutions**:
|
|
- Check `journal_enabled: true` in configuration
|
|
- Verify directory exists and hbd has write permissions
|
|
- Check hbd logs for initialization errors
|
|
- Verify disk space is available
|
|
|
|
### Rotation Not Working
|
|
|
|
**Problem**: Journal file grows beyond `journal_max_size`.
|
|
|
|
**Solutions**:
|
|
- Check that `journal_max_size` is properly configured
|
|
- Verify hbd has permission to rename/create files
|
|
- Check for filesystem issues
|
|
- Review hbd logs for rotation errors
|
|
|
|
### Missing Messages
|
|
|
|
**Problem**: Some messages don't appear in journal.
|
|
|
|
**Solutions**:
|
|
- Verify `journal_enabled: true`
|
|
- Check for write errors in hbd logs
|
|
- Verify sufficient disk space
|
|
- Check if filesystem is read-only
|
|
|
|
### Performance Issues
|
|
|
|
**Problem**: Journal causing slow message processing.
|
|
|
|
**Solutions**:
|
|
- Use faster storage (SSD) for journal directory
|
|
- Increase `journal_max_size` to reduce rotation frequency
|
|
- Disable journal if not needed: `journal_enabled: false`
|
|
- Consider async syslog forwarding instead
|
|
|
|
## Security Considerations
|
|
|
|
### File Permissions
|
|
|
|
Ensure proper permissions on journal files:
|
|
|
|
```bash
|
|
# Journal directory
|
|
chmod 750 /var/log/heartbeat
|
|
chown hbd:hbd /var/log/heartbeat
|
|
|
|
# Journal files
|
|
chmod 640 /var/log/heartbeat/messages.journal*
|
|
```
|
|
|
|
### Sensitive Data
|
|
|
|
Journal files may contain:
|
|
- Hostnames and IP addresses
|
|
- System metrics
|
|
- Custom message content
|
|
|
|
**Recommendations**:
|
|
- Restrict read access to authorized users only
|
|
- Consider encryption for archived journals
|
|
- Implement log retention policies
|
|
- Sanitize data if sharing for debugging
|
|
|
|
## API Reference
|
|
|
|
### MessageJournal Class
|
|
|
|
```python
|
|
class MessageJournal:
|
|
def __init__(self, config: Dict[str, Any])
|
|
async def initialize(self) -> bool
|
|
async def log_message(self, msg: Dict, addr: tuple, timestamp: float)
|
|
async def close(self)
|
|
def get_stats(self) -> Dict[str, Any]
|
|
```
|
|
|
|
### Module Functions
|
|
|
|
```python
|
|
def get_journal(config: Dict = None) -> MessageJournal
|
|
async def log_message(msg: Dict, addr: tuple, timestamp: float = None)
|
|
```
|
|
|
|
## Example: Custom Message Processing
|
|
|
|
Process journal messages in real-time:
|
|
|
|
```python
|
|
import asyncio
|
|
import json
|
|
from pathlib import Path
|
|
|
|
async def tail_journal(journal_path):
|
|
"""Follow journal file and process new messages."""
|
|
path = Path(journal_path)
|
|
|
|
with open(path, 'r') as f:
|
|
# Jump to end
|
|
f.seek(0, 2)
|
|
|
|
while True:
|
|
line = f.readline()
|
|
if line:
|
|
entry = json.loads(line)
|
|
await process_message(entry)
|
|
else:
|
|
await asyncio.sleep(0.1)
|
|
|
|
async def process_message(entry):
|
|
"""Process a journal entry."""
|
|
msg = entry['message']
|
|
|
|
# Alert on boot messages
|
|
if msg.get('boot'):
|
|
print(f"ALERT: {msg['name']} rebooted at {entry['datetime']}")
|
|
|
|
# Track CPU usage
|
|
if msg.get('ID') == 'PLG' and msg.get('plugin') == 'cpu_monitor':
|
|
cpu = msg.get('cpu_percent', 0)
|
|
if cpu > 90:
|
|
print(f"WARNING: {entry['source_ip']} CPU usage: {cpu}%")
|
|
```
|
|
|
|
## Future Enhancements
|
|
|
|
Potential improvements for future versions:
|
|
|
|
- Compression of rotated logs (gzip)
|
|
- Time-based rotation in addition to size-based
|
|
- Filtering to exclude certain message types
|
|
- Structured logging output formats (CEF, GELF)
|
|
- Remote syslog forwarding
|
|
- Message deduplication
|
|
- Journal file encryption
|
|
- Signed journal entries
|
|
|
|
## See Also
|
|
|
|
- [Configuration Guide](../hbd/config.py) - Full configuration options
|
|
- [UDP Protocol](../hbd/udp.py) - Message handling
|
|
- [Server Architecture](../hbd/server.py) - Server initialization
|