Compare commits

...

18 Commits

Author SHA1 Message Date
Andreas Wrede 9af4006097 version 5.1.4
Release / release (push) Successful in 6s
2026-04-30 08:12:15 -04:00
Andreas Wrede ddf7067d13 feat: redesign Plugin Metrics page as Host Overview
Replace pill-tab plugin view with an accordion layout that shows key
metrics (CPU%, MEM%, top disk%, net delta, nagios status) at a glance
in each host card header. Plugin sections expand as structured tables.

- Rename page to "Host Overview" (URL /plugins unchanged)
- Three-wave parallel data loading: glance plugins on host expand,
  on-demand fetch for filesystem_info and extras
- Per-plugin table renderers with inline percent bars and threshold
  colour coding
- Add escHtml() for XSS-safe rendering of all field values
- Remove stale planning docs (REFACTORING.md, hbd/Plan.md)

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-04-30 08:12:07 -04:00
andreas 505353a8a8 Update CLAUDE. md 2026-04-29 21:20:28 -04:00
andreas 0402d33c71 Add CLAUDE. md 2026-04-29 21:18:21 -04:00
andreas 7d8ca5d8db version 5.1.3
Release / release (push) Successful in 4s
2026-04-25 16:52:56 +02:00
andreas 56037a036d fix: remove unused pytest import in test_nagios_runner 2026-04-25 16:39:56 +02:00
andreas 65ceb31d8d fix: use os.path.exists check for /dev/log instead of dead-code OSError catch 2026-04-25 16:36:00 +02:00
andreas 1c9b6c1ca9 fix: reconfigure logging to syslog after daemonize() instead of no-op basicConfig
After daemonize() redirects stderr to /dev/null, the existing StreamHandler
writes to /dev/null. logging.basicConfig() is a no-op when handlers are
already configured, so log messages are silently lost.

Replace the daemon block to:
1. Call daemonize() first
2. Explicitly remove existing handlers (pointing to /dev/null)
3. Add SysLogHandler pointing to /dev/log with fallback to UDP localhost:514
4. Log startup message to the new syslog handler

Removes redundant syslog.openlog() call which is no longer needed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 16:29:54 +02:00
andreas d7e6b478e1 fix: use shlex.split() in nagios_runner path validation to handle quoted paths 2026-04-25 16:28:32 +02:00
andreas 535dbda47d feat: validate absolute command paths at nagios_runner init 2026-04-25 16:24:33 +02:00
andreas c9567dddae fix: remove stale shell config key from NagiosRunnerPlugin docstring 2026-04-25 16:23:03 +02:00
andreas b5963badd6 feat: async subprocess in nagios_runner with stderr capture and signal handling
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 16:18:09 +02:00
andreas a76a39b4a0 fix: remove redundant no-commands log lines; fix skip_reason docstring style
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 16:15:44 +02:00
andreas 94e1597978 feat: set skip_reason on nagios_runner when no commands configured
When NagiosRunnerPlugin has no commands configured, set skip_reason before
returning False from initialize(). This allows PluginLoader to log INFO
(not WARNING) when the plugin is skipped.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 16:13:03 +02:00
andreas c9c2ed772f fix: document skip_reason in Plugin docstring; remove unused import in test 2026-04-25 16:10:35 +02:00
andreas aeb78dcb8e feat: add skip_reason to Plugin; improve PluginLoader init messaging 2026-04-25 16:08:07 +02:00
andreas 77b337e4dd Add implementation plan for plugin error checking and daemon logging fixes
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 16:04:13 +02:00
andreas 293461f3f6 Add design spec for plugin error checking and daemon logging fixes
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 15:49:09 +02:00
15 changed files with 1913 additions and 1209 deletions
+4
View File
@@ -0,0 +1,4 @@
1. Don't assume. Don't hide confusion. Surface tradeoffs.
2. Minimum code that solves the problem. Nothing speculative.
3. Touch only what you must. Clean up only your own mess.
4. Define success criteria. Loop until verified.
-234
View File
@@ -1,234 +0,0 @@
# HBD/HBC Separation Refactoring
## Overview
The heartbeat monitoring system has been refactored into a modular package structure with separate client and server components. This allows users to install only what they need and provides clear separation of concerns.
## New Package Structure
```
hbd/
├── __init__.py # Main package (minimal)
├── client/ # HBC - System monitoring client
│ ├── __init__.py
│ ├── main.py # Entry point (was hbc.py)
│ ├── config.py # Client-specific configuration
│ ├── plugin.py # Plugin framework
│ ├── threshold.py # Threshold checking
│ └── plugins/ # Monitoring plugins
│ ├── cpu_monitor.py
│ ├── disk_monitor.py
│ ├── memory_monitor.py
│ ├── network_monitor.py
│ ├── filesystem_info.py
│ ├── os_info.py
│ └── nagios_runner.py
├── server/ # HBD - Heartbeat daemon/server
│ ├── __init__.py
│ ├── main.py # Server runtime (was server.py)
│ ├── cli.py # Command-line interface
│ ├── config.py # Server-specific configuration
│ ├── http.py # HTTP/REST API
│ ├── ws.py # WebSocket server
│ ├── udp.py # UDP heartbeat listener
│ ├── dns.py # DNS update functionality
│ ├── notify.py # Notification handlers
│ ├── monitor.py # Host monitoring
│ ├── hbdclass.py # Host class definitions
│ ├── journal.py # Message journaling
│ ├── templates/ # Jinja2 web templates
│ └── static/ # Web UI assets
└── common/ # Shared utilities
├── __init__.py
├── proto.py # Protocol encoding/decoding
└── utils.py # Common utilities
## Configuration Files
### Client Configuration (hbd/client/config.py)
Client-specific defaults:
- `hb_port`: Port where hbd servers listen (default: 50003)
- `interval`: Heartbeat interval in seconds (default: 10)
- `plugins`: Per-plugin configuration
- `thresholds`: Threshold configuration for monitoring
### Server Configuration (hbd/server/config.py)
Server-specific defaults:
- `hb_port`: Port to listen for heartbeats (default: 50003)
- `hbd_port`: HTTP API port (default: 50004)
- `ws_port`: WebSocket port (default: 50005)
- `logfile`: Log file path
- `pushsrv`, `pushover_token`, etc.: Notification settings
- `watchhosts`, `dyndnshosts`: Host monitoring
- `smtpserver`, etc.: Email settings
- `journal_*`: Message journaling settings
## Installation Options
### Install Core Only (minimal, PyYAML only)
```bash
pip install hbd
```
### Install Client Only (for monitoring)
```bash
pip install hbd[client]
# Installs: PyYAML, psutil
```
### Install Server Only (for daemon)
```bash
pip install hbd[server]
# Installs: PyYAML, websockets, mattermostdriver, aiohttp, Jinja2
```
### Install Everything
```bash
pip install hbd[all]
# Installs all dependencies for both client and server
```
### Development Installation
```bash
pip install -e ".[dev]"
# Includes all dependencies plus testing/linting tools
```
## Command-Line Interfaces
### HBC (Client)
```bash
hbc [options] host1 [host2 ...]
# Entry point: hbd.client.main:main
# Location: hbd/client/main.py
```
### HBD (Server)
```bash
hbd [options]
# Entry point: hbd.server.cli:main
# Location: hbd/server/cli.py → hbd/server/main.py
```
## Import Changes
### Client Code
```python
# Old imports
from .config import load_config
from .proto import dicttos, stodict
from .plugin import PluginRegistry
# New imports
from .config import load_config # Still in client/
from ..common.proto import dicttos # Moved to common/
from .plugin import PluginRegistry # Still in client/
```
### Server Code
```python
# Old imports
from .config import load_config
from .proto import stodict
from .threshold import AlertLevel
# New imports
from .config import load_config # Server-specific config
from ..common.proto import stodict # Moved to common/
from ..client.threshold import AlertLevel # Client module
```
### Plugin Code
```python
# Old import
from hbd.plugin import MonitorPlugin
# New import
from hbd.client.plugin import MonitorPlugin
```
## Benefits
1. **Modular Installation**: Install only what you need
- Client-only systems don't need web server dependencies
- Server-only systems don't need psutil
2. **Clearer Architecture**: Explicit separation of concerns
- Client: System monitoring and data collection
- Server: Heartbeat reception, web UI, notifications
- Common: Shared protocol and utilities
3. **Independent Evolution**: Client and server can evolve separately
- Different release cycles possible
- Clear API boundaries via common/
4. **Smaller Footprint**: Reduced dependency installation
- Client: ~1 dependency (psutil)
- Server: ~4 dependencies (websockets, aiohttp, Jinja2, mattermostdriver)
## Migration Guide
### For Existing Installations
1. **Reinstall the package**:
```bash
pip install -e ".[all]" # For development
# or
pip install hbd[all] # For production
```
2. **Configuration files remain unchanged**:
- Both client and server read from `~/.hb.yaml`
- All existing config keys are supported in both configs
- Server has additional keys (journal, websocket, email, etc.)
- Client has minimal keys (interval, plugins, thresholds)
3. **Commands remain the same**:
- `hbc` command works identically
- `hbd` command works identically
### For New Deployments
1. **Client-only system** (monitoring host):
```bash
pip install hbd[client]
hbc server1.example.com server2.example.com
```
2. **Server-only system** (monitoring daemon):
```bash
pip install hbd[server]
hbd -c /etc/hbd.yaml -f
```
3. **Combined system** (dev/test):
```bash
pip install hbd[all]
```
## Testing
All imports and entry points have been tested and validated:
- ✅ Package imports work correctly
- ✅ `hbc` command entry point functional
- ✅ `hbd` command entry point functional
- ✅ Optional dependencies properly configured
- ✅ All internal imports updated
## Files Archived
The following files were renamed to avoid conflicts:
- `hbd/config.py` → `hbd/config.py.old` (split into client/server configs)
- `hbd/hbc_old.py` → `hbd/hbc_old.py.bak` (backup file)
## Next Steps
1. Test client functionality with a monitoring host
2. Test server functionality with web UI and notifications
3. Update documentation (README.md) with new structure
4. Consider publishing to PyPI with new structure
5. Update any deployment scripts/Dockerfiles to use optional dependencies
@@ -0,0 +1,602 @@
# Plugin Error Checking Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
**Goal:** Improve plugin error checking in hbc, especially for nagios_runner, and fix logger messages silently discarded in daemon mode.
**Architecture:** Three focused changes across three files: (1) `hbd/client/plugin.py` gains a `skip_reason` attribute on Plugin and updated PluginLoader messaging; (2) `hbd/client/plugins/nagios_runner.py` gains async subprocess execution, stderr capture, signal-killed process handling, and init-time command path validation; (3) `hbd/client/main.py` gains proper post-fork logging reconfiguration to syslog.
**Tech Stack:** Python 3.11+, asyncio, `logging.handlers.SysLogHandler`, pytest
---
## File Map
| Action | Path | What changes |
|---|---|---|
| Modify | `hbd/client/plugin.py` | `Plugin.__init__` gains `skip_reason`; `PluginLoader` checks it |
| Modify | `hbd/client/plugins/nagios_runner.py` | async subprocess, stderr, signal codes, init validation, `skip_reason` |
| Modify | `hbd/client/main.py` | `_reconfigure_logging_for_daemon()` helper; remove redundant syslog calls |
| Create | `tests/test_plugin.py` | PluginLoader messaging tests |
| Create | `tests/test_nagios_runner.py` | NagiosRunnerPlugin behaviour tests |
Run tests throughout with:
```bash
python -m pytest tests/test_plugin.py tests/test_nagios_runner.py -v
```
---
## Task 1: Plugin.skip_reason + PluginLoader messaging
**Files:**
- Modify: `hbd/client/plugin.py:40-48` (Plugin.__init__)
- Modify: `hbd/client/plugin.py:369-381` (PluginLoader.load_from_directory)
- Create: `tests/test_plugin.py`
- [ ] **Step 1: Write failing tests**
Create `tests/test_plugin.py`:
```python
import asyncio
import logging
import textwrap
from hbd.client.plugin import Plugin, PluginLoader, PluginRegistry
def test_plugin_skip_reason_defaults_none(tmp_path):
plugin_code = textwrap.dedent("""
from hbd.client.plugin import MonitorPlugin
class MinimalPlugin(MonitorPlugin):
name = "minimal"
version = "1.0.0"
interval = 60
async def initialize(self):
return True
async def _collect_metrics(self):
return {}
""")
(tmp_path / "minimal.py").write_text(plugin_code)
registry = PluginRegistry()
loader = PluginLoader(registry)
asyncio.run(loader.load_from_directory(tmp_path))
plugin = registry.get("minimal")
assert plugin is not None
assert plugin.skip_reason is None
def test_loader_logs_info_when_skip_reason_set(tmp_path, caplog):
plugin_code = textwrap.dedent("""
from hbd.client.plugin import MonitorPlugin
class SkippablePlugin(MonitorPlugin):
name = "skippable"
version = "1.0.0"
interval = 60
async def initialize(self):
self.skip_reason = "not configured in yaml"
return False
async def _collect_metrics(self):
return {}
""")
(tmp_path / "skippable.py").write_text(plugin_code)
registry = PluginRegistry()
loader = PluginLoader(registry)
with caplog.at_level(logging.INFO, logger="plugin.loader"):
count = asyncio.run(loader.load_from_directory(tmp_path))
assert count == 0
assert any("skipped: not configured in yaml" in r.message for r in caplog.records)
assert not any("failed initialization" in r.message for r in caplog.records)
def test_loader_logs_warning_when_no_skip_reason(tmp_path, caplog):
plugin_code = textwrap.dedent("""
from hbd.client.plugin import MonitorPlugin
class FailPlugin(MonitorPlugin):
name = "fail"
version = "1.0.0"
interval = 60
async def initialize(self):
return False
async def _collect_metrics(self):
return {}
""")
(tmp_path / "fail_plugin.py").write_text(plugin_code)
registry = PluginRegistry()
loader = PluginLoader(registry)
with caplog.at_level(logging.WARNING, logger="plugin.loader"):
count = asyncio.run(loader.load_from_directory(tmp_path))
assert count == 0
assert any("failed initialization" in r.message for r in caplog.records)
```
- [ ] **Step 2: Run tests to verify they fail**
```bash
python -m pytest tests/test_plugin.py -v
```
Expected: `test_plugin_skip_reason_defaults_none` FAILS (attribute missing), others may error.
- [ ] **Step 3: Add `skip_reason` to `Plugin.__init__`**
In `hbd/client/plugin.py`, in `Plugin.__init__` (around line 46), add one line:
```python
def __init__(self, config: Optional[Dict[str, Any]] = None):
self.config = config or {}
self.logger = logging.getLogger(f"plugin.{self.name}")
self._initialized = False
self.skip_reason: Optional[str] = None
```
- [ ] **Step 4: Update PluginLoader messaging**
In `hbd/client/plugin.py`, replace the `if not initialized:` block (around line 372):
```python
if not initialized:
if plugin.skip_reason:
self.logger.info(
f"Plugin {plugin.name} skipped: {plugin.skip_reason}"
)
else:
self.logger.warning(
f"Plugin {plugin.name} failed initialization, skipping"
)
continue
```
- [ ] **Step 5: Run tests to verify they pass**
```bash
python -m pytest tests/test_plugin.py -v
```
Expected: all 3 tests PASS.
- [ ] **Step 6: Commit**
```bash
git add hbd/client/plugin.py tests/test_plugin.py
git commit -m "feat: add skip_reason to Plugin; improve PluginLoader init messaging"
```
---
## Task 2: NagiosRunnerPlugin — skip_reason when no commands
**Files:**
- Modify: `hbd/client/plugins/nagios_runner.py:88-105` (initialize)
- Modify: `tests/test_nagios_runner.py` (create)
- [ ] **Step 1: Write failing test**
Create `tests/test_nagios_runner.py`:
```python
import asyncio
import logging
import os
import stat
import pytest
from hbd.client.plugins.nagios_runner import (
NagiosRunnerPlugin,
NAGIOS_OK,
NAGIOS_WARNING,
NAGIOS_CRITICAL,
NAGIOS_UNKNOWN,
)
def test_no_commands_sets_skip_reason():
plugin = NagiosRunnerPlugin(config={"commands": []})
result = asyncio.run(plugin.initialize())
assert result is False
assert plugin.skip_reason is not None
assert "nagios_runner.commands" in plugin.skip_reason
```
- [ ] **Step 2: Run test to verify it fails**
```bash
python -m pytest tests/test_nagios_runner.py::test_no_commands_sets_skip_reason -v
```
Expected: FAIL — `plugin.skip_reason` is `None`.
- [ ] **Step 3: Set skip_reason in NagiosRunnerPlugin.initialize()**
In `hbd/client/plugins/nagios_runner.py`, replace the early-return block in `initialize()` (around line 96):
```python
if not self.commands:
self.skip_reason = "no commands configured (add nagios_runner.commands to config)"
self.logger.info("No Nagios commands configured")
return False
```
- [ ] **Step 4: Run test to verify it passes**
```bash
python -m pytest tests/test_nagios_runner.py::test_no_commands_sets_skip_reason -v
```
Expected: PASS.
- [ ] **Step 5: Commit**
```bash
git add hbd/client/plugins/nagios_runner.py tests/test_nagios_runner.py
git commit -m "feat: set skip_reason on nagios_runner when no commands configured"
```
---
## Task 3: NagiosRunnerPlugin — async subprocess, stderr capture, negative return codes
**Files:**
- Modify: `hbd/client/plugins/nagios_runner.py` (imports + `_run_nagios_plugin`)
- Modify: `tests/test_nagios_runner.py`
- [ ] **Step 1: Write failing tests**
Append to `tests/test_nagios_runner.py`:
```python
def test_stderr_used_when_stdout_empty(tmp_path):
script = tmp_path / "check_err.sh"
script.write_text("#!/bin/sh\necho 'error from stderr' >&2\nexit 2\n")
script.chmod(script.stat().st_mode | stat.S_IEXEC)
config = {"commands": [{"name": "t", "command": str(script)}], "timeout": 5}
plugin = NagiosRunnerPlugin(config=config)
asyncio.run(plugin.initialize())
data = asyncio.run(plugin._collect_metrics())
assert "error from stderr" in data["t_output"]
assert data["t_status_code"] == NAGIOS_CRITICAL
def test_stderr_appended_when_both_present(tmp_path):
script = tmp_path / "check_both.sh"
script.write_text("#!/bin/sh\necho 'OK - all good'\necho 'extra detail' >&2\nexit 0\n")
script.chmod(script.stat().st_mode | stat.S_IEXEC)
config = {"commands": [{"name": "t", "command": str(script)}], "timeout": 5}
plugin = NagiosRunnerPlugin(config=config)
asyncio.run(plugin.initialize())
data = asyncio.run(plugin._collect_metrics())
assert "OK - all good" in data["t_output"]
assert "extra detail" in data["t_output"]
assert data["t_status_code"] == NAGIOS_OK
def test_negative_returncode_maps_to_unknown():
# kill -9 $$ kills the shell itself; asyncio sees returncode -9
config = {"commands": [{"name": "t", "command": "kill -9 $$"}], "timeout": 5}
plugin = NagiosRunnerPlugin(config=config)
asyncio.run(plugin.initialize())
data = asyncio.run(plugin._collect_metrics())
assert data["t_status_code"] == NAGIOS_UNKNOWN
assert "signal" in data["t_output"].lower()
```
- [ ] **Step 2: Run tests to verify they fail**
```bash
python -m pytest tests/test_nagios_runner.py::test_stderr_used_when_stdout_empty \
tests/test_nagios_runner.py::test_stderr_appended_when_both_present \
tests/test_nagios_runner.py::test_negative_returncode_maps_to_unknown -v
```
Expected: all FAIL — current implementation ignores stderr and doesn't handle negative codes.
- [ ] **Step 3: Update imports in nagios_runner.py**
Replace the import block at the top of `hbd/client/plugins/nagios_runner.py`:
```python
import asyncio
import os
import re
from typing import Any, Dict, List, Optional, Tuple
from hbd.client.plugin import MonitorPlugin
```
(Remove `import subprocess`; add `import asyncio` and `import os`.)
- [ ] **Step 4: Upgrade collection log level from DEBUG to INFO**
In `hbd/client/plugins/nagios_runner.py`, in `_collect_metrics()`, change the debug log (around line 144) so results are visible at INFO level:
```python
self.logger.info(
f"Executed {name}: {STATUS_NAMES.get(status_code, 'UNKNOWN')} - {output[:50]}"
)
```
- [ ] **Step 5: Replace `_run_nagios_plugin` with async implementation**
Replace the entire `_run_nagios_plugin` method in `hbd/client/plugins/nagios_runner.py`:
```python
async def _run_nagios_plugin(
self,
command: str
) -> Tuple[int, str, Dict[str, Any]]:
"""Execute a Nagios plugin and parse its output."""
try:
proc = await asyncio.create_subprocess_shell(
command,
stdout=asyncio.subprocess.PIPE,
stderr=asyncio.subprocess.PIPE,
)
try:
stdout_bytes, stderr_bytes = await asyncio.wait_for(
proc.communicate(), timeout=self.timeout
)
except asyncio.TimeoutError:
proc.kill()
await proc.communicate()
self.logger.error(f"Command timed out: {command}")
return NAGIOS_UNKNOWN, f"Command timed out after {self.timeout}s", {}
status_code = proc.returncode
if status_code < 0:
return NAGIOS_UNKNOWN, f"Process killed by signal {-status_code}", {}
if status_code > 3:
status_code = NAGIOS_UNKNOWN
stdout = stdout_bytes.decode(errors="replace").strip()
stderr = stderr_bytes.decode(errors="replace").strip()
# Parse perfdata from stdout before mixing in stderr
perfdata = self._parse_perfdata(stdout)
# Build status message
status_part = stdout.split('|')[0].strip() if '|' in stdout else stdout
if not stdout and stderr:
output_msg = stderr
elif stdout and stderr:
output_msg = f"{status_part} [stderr: {stderr}]"
else:
output_msg = status_part
return status_code, output_msg, perfdata
except Exception as e:
self.logger.error(f"Error executing command: {e}")
return NAGIOS_UNKNOWN, f"Execution error: {str(e)}", {}
```
Also remove the now-unused `self.shell` line from `__init__` (the `shell` config key is no longer used since `create_subprocess_shell` always uses a shell):
In `NagiosRunnerPlugin.__init__`, remove:
```python
self.shell: bool = config.get("shell", True) if config else True
```
- [ ] **Step 6: Run tests to verify they pass**
```bash
python -m pytest tests/test_nagios_runner.py -v
```
Expected: all tests PASS including the 3 new ones.
- [ ] **Step 7: Commit**
```bash
git add hbd/client/plugins/nagios_runner.py tests/test_nagios_runner.py
git commit -m "feat: async subprocess in nagios_runner with stderr capture and signal handling"
```
---
## Task 4: NagiosRunnerPlugin — command path validation at init
**Files:**
- Modify: `hbd/client/plugins/nagios_runner.py` (initialize)
- Modify: `tests/test_nagios_runner.py`
- [ ] **Step 1: Write failing tests**
Append to `tests/test_nagios_runner.py`:
```python
def test_absolute_path_not_found_warns(caplog):
fake_cmd = "/nonexistent_hbc_test_path/check_something"
config = {"commands": [{"name": "t", "command": fake_cmd}]}
plugin = NagiosRunnerPlugin(config=config)
with caplog.at_level(logging.WARNING, logger="plugin.nagios_runner"):
asyncio.run(plugin.initialize())
assert any("not found" in r.message for r in caplog.records)
def test_absolute_path_not_executable_warns(caplog, tmp_path):
non_exec = tmp_path / "check_test"
non_exec.write_text("#!/bin/sh\necho OK\n")
non_exec.chmod(0o644) # readable but not executable
config = {"commands": [{"name": "t", "command": str(non_exec)}]}
plugin = NagiosRunnerPlugin(config=config)
with caplog.at_level(logging.WARNING, logger="plugin.nagios_runner"):
asyncio.run(plugin.initialize())
assert any("not executable" in r.message for r in caplog.records)
def test_relative_path_not_checked(caplog):
# Relative paths (resolved via PATH) must not generate warnings
config = {"commands": [{"name": "t", "command": "echo OK"}]}
plugin = NagiosRunnerPlugin(config=config)
with caplog.at_level(logging.WARNING, logger="plugin.nagios_runner"):
asyncio.run(plugin.initialize())
assert not any(
"not found" in r.message or "not executable" in r.message
for r in caplog.records
)
```
- [ ] **Step 2: Run tests to verify they fail**
```bash
python -m pytest tests/test_nagios_runner.py::test_absolute_path_not_found_warns \
tests/test_nagios_runner.py::test_absolute_path_not_executable_warns \
tests/test_nagios_runner.py::test_relative_path_not_checked -v
```
Expected: `test_absolute_path_not_found_warns` and `test_absolute_path_not_executable_warns` FAIL (no warnings logged); `test_relative_path_not_checked` may pass.
- [ ] **Step 3: Add command path validation to `initialize()`**
In `hbd/client/plugins/nagios_runner.py`, extend `initialize()` by adding validation after the existing "log each command" loop (after line 103, before `return True`):
```python
# Validate absolute command paths early
for cmd_config in self.commands:
name = cmd_config.get("name", "unnamed")
command = cmd_config.get("command", "")
if not command:
continue
exe = command.split()[0]
if os.path.isabs(exe):
if not os.path.isfile(exe):
self.logger.warning(
f"Command '{name}': executable not found: {exe}"
)
elif not os.access(exe, os.X_OK):
self.logger.warning(
f"Command '{name}': executable not executable: {exe}"
)
```
- [ ] **Step 4: Run full test suite to verify all pass**
```bash
python -m pytest tests/test_plugin.py tests/test_nagios_runner.py -v
```
Expected: all tests PASS.
- [ ] **Step 5: Commit**
```bash
git add hbd/client/plugins/nagios_runner.py tests/test_nagios_runner.py
git commit -m "feat: validate absolute command paths at nagios_runner init"
```
---
## Task 5: Daemon mode logging — route to syslog after fork
**Files:**
- Modify: `hbd/client/main.py` (new helper + updated daemon block)
No automated test for daemonization itself (fork behaviour is hard to unit-test). Manual verification steps are provided below.
- [ ] **Step 1: Add `_reconfigure_logging_for_daemon` helper**
In `hbd/client/main.py`, add this function just before `def build_parser()` (around line 589):
```python
def _reconfigure_logging_for_daemon(log_level: int) -> None:
"""Replace StreamHandlers (now writing to /dev/null) with a SysLogHandler."""
from logging.handlers import SysLogHandler
root = logging.getLogger()
for handler in root.handlers[:]:
root.removeHandler(handler)
handler.close()
try:
syslog_handler = SysLogHandler(
address="/dev/log",
facility=SysLogHandler.LOG_DAEMON,
)
except OSError:
syslog_handler = SysLogHandler(
address=("localhost", 514),
facility=SysLogHandler.LOG_DAEMON,
)
# Attach the fallback first so the warning reaches syslog
syslog_handler.setFormatter(
logging.Formatter("hbc[%(process)d]: %(name)s %(levelname)s: %(message)s")
)
root.addHandler(syslog_handler)
root.setLevel(log_level)
logging.warning("/dev/log not found, using syslog UDP localhost:514")
return
syslog_handler.setFormatter(
logging.Formatter("hbc[%(process)d]: %(name)s %(levelname)s: %(message)s")
)
root.addHandler(syslog_handler)
root.setLevel(log_level)
```
- [ ] **Step 2: Update the daemon block in `main()`**
In `hbd/client/main.py`, replace the entire `if args.daemon:` block (lines 664675):
```python
if args.daemon:
print("Daemonizing...")
daemonize()
_reconfigure_logging_for_daemon(log_level)
logging.info(f"hbc starting, sending heartbeat to {', '.join(args.hosts)}")
```
This removes the `import syslog`, `syslog.openlog()`, and `syslog.syslog()` calls (now handled by the logging system) and removes the no-op second `logging.basicConfig()` call.
- [ ] **Step 3: Run existing test suite to confirm no regressions**
```bash
python -m pytest tests/test_plugin.py tests/test_nagios_runner.py -v
```
Expected: all tests still PASS.
- [ ] **Step 4: Manual smoke test — verify syslog output in daemon mode**
```bash
# In one terminal, tail syslog
sudo journalctl -f -t hbc
# In another terminal, start hbc in daemon mode (replace HOST with a real or dummy host)
python -m hbd.client.main -d -v localhost
# Expected in journalctl output:
# hbc[<pid>]: hbc.main INFO: Starting hbc for <hostname> -> ['localhost']
# hbc[<pid>]: hbc.main INFO: hbc starting, sending heartbeat to localhost
# hbc[<pid>]: plugin.loader INFO: ...
# Stop the daemon
pkill -f "hbd.client.main"
```
- [ ] **Step 5: Commit**
```bash
git add hbd/client/main.py
git commit -m "fix: reconfigure logging to syslog after daemonize() instead of no-op basicConfig"
```
@@ -0,0 +1,92 @@
# Plugin Error Checking & Daemon Logging — Design Spec
**Date:** 2026-04-25
**Scope:** hbc client — daemon mode logging, nagios_runner plugin robustness, PluginLoader messaging
**Files affected:** `hbd/client/main.py`, `hbd/client/plugins/nagios_runner.py`, `hbd/client/plugin.py`
---
## 1. Daemon Mode Logging
### Problem
In `main()`, `logging.basicConfig()` is called before `daemonize()` (establishing a StreamHandler to stderr), then called again after `daemonize()`. The second call is a no-op — Python ignores `basicConfig()` when handlers are already configured. After daemonization, stderr is redirected to `/dev/null`, so all subsequent log output is silently discarded.
The existing `syslog.openlog()` / `syslog.syslog()` calls (lines 666668) write a single startup message but do not integrate with the `logging` system, so plugin and connection log messages never reach syslog.
### Fix
After `daemonize()`, explicitly reconfigure the root logger:
1. Remove all existing handlers (they now write to `/dev/null`).
2. Add `logging.handlers.SysLogHandler(address='/dev/log', facility=LOG_DAEMON)`.
3. Set formatter: `hbc[%(process)d]: %(name)s %(levelname)s: %(message)s`
4. Preserve the `log_level` already determined from `-v`/`-x` CLI flags.
Remove the redundant `syslog.openlog()` / `syslog.syslog()` calls — the logging system handles routing.
**Fallback:** If `/dev/log` does not exist (containers, some BSDs), fall back to `SysLogHandler(address=('localhost', 514))`. Log one warning (to stderr, before handlers are replaced) so the operator knows.
---
## 2. Nagios Runner Improvements
### 2a — Async Subprocess
`_run_nagios_plugin()` is declared `async def` but calls `subprocess.run()` synchronously, blocking the event loop for the full command duration.
**Fix:** Replace with `asyncio.create_subprocess_shell()` + `await proc.communicate()`. Enforce timeout with `asyncio.wait_for(..., timeout=self.timeout)` and catch `asyncio.TimeoutError`.
### 2b — Stderr Capture
Subprocess stderr is currently discarded (`capture_output=True` only captures stdout in the sync call; stderr content is lost).
**Fix:** Pass `stderr=asyncio.subprocess.PIPE` to `create_subprocess_shell`. After `communicate()`, if stdout is empty but stderr has content, use stderr as the output message. If both have content, append stderr to the output for visibility.
### 2c — Negative Return Codes
A negative `returncode` means the process was killed by a signal (SIGKILL, OOM, etc.). The current code treats these as-is, which may produce unexpected status values.
**Fix:** If `returncode < 0`, map to `NAGIOS_UNKNOWN` with message `"Process killed by signal {-returncode}"`.
### 2d — Command Path Validation at Init
`initialize()` currently only checks that the commands list is non-empty.
**Fix:** For each command entry during `initialize()`:
- Warn and skip the entry if `name` or `command` is missing.
- Extract the executable (first whitespace-delimited token of the command string).
- If the executable is an absolute path, check `os.path.isfile()` and `os.access(..., os.X_OK)`. Log a `WARNING` if either check fails.
- Commands with relative paths or shell builtins are not checked (they may be on PATH) — just noted.
- Validation warns only; all original entries in `self.commands` are retained and still attempted at collection time (where the existing missing-name/command guard already skips them). The plugin initializes successfully as long as the commands list is non-empty.
---
## 3. PluginLoader Messaging
### Problem
When `initialize()` returns `False`, the loader always logs:
> `WARNING: Plugin X failed initialization, skipping`
This is alarming when the real reason is simply "no commands configured". There is no API to distinguish "not configured" from "genuinely broken".
### Fix
Add an optional `skip_reason` attribute to `Plugin.__init__()` (defaults to `None`).
In `PluginLoader.load_from_directory()`, after `initialize()` returns `False`:
- If `plugin.skip_reason` is set → `logger.info(f"Plugin {plugin.name} skipped: {plugin.skip_reason}")`
- If `plugin.skip_reason` is `None``logger.warning(f"Plugin {plugin.name} failed initialization, skipping")` (existing behaviour)
In `NagiosRunnerPlugin.initialize()`, when no commands are configured:
```python
self.skip_reason = "no commands configured (add nagios_runner.commands to config)"
return False
```
Genuine failures (exceptions) continue to go through the existing `except` block in the loader, logging at `ERROR` with traceback — unchanged.
---
## Decisions
| Topic | Decision |
|---|---|
| Daemon log destination | syslog only (LOG_DAEMON facility) |
| Syslog fallback | localhost:514 UDP if `/dev/log` absent |
| Nagios result log level | INFO for all statuses (OK/WARNING/CRITICAL/UNKNOWN) |
| Invalid command handling at init | Warn and continue; still attempt at collection time |
| PluginLoader API change | `skip_reason` attribute on Plugin base class, checked by loader |
-21
View File
@@ -1,21 +0,0 @@
Plan the following changes, ask questions to clarify before implementing
Re-factor the notification system:
- use available libraries for pushover, matrix, email and sms notifications.
- notifications have a title/subject: alert_type (recover/warning/critical), a body (info from threshold check) and a link to the host plugin metrix page
- define a list of notification channels for each user
- notifications are dispatched to users that are listed as managers for the host
1 - correct
2 - for now channels are defined globaly
3 - matrix-nio)sounds good, homeserver URL, access token, room ID per channel?
4 - use the REST api provided by https://voip.ms/api/v1/rest.php
5 - The page does not exist yet, point at the host tab in the /plugins
6 - per-channel minimum severity is a good idea, go fo it
7 - yes
1 - use base_url, there might not have been any incoming requests yet
2 - use same asyncio loop for matrix-nio
3 - for now, just silently do nothing
+1 -1
View File
@@ -14,4 +14,4 @@ Install options:
""" """
__all__ = ["__version__"] __all__ = ["__version__"]
__version__ = "5.1.2" __version__ = "5.1.4"
+33 -9
View File
@@ -15,6 +15,7 @@ import socket
import sys import sys
import time import time
from hashlib import md5 from hashlib import md5
from logging.handlers import SysLogHandler
from pathlib import Path from pathlib import Path
from typing import Dict, List, Optional from typing import Dict, List, Optional
@@ -586,6 +587,36 @@ def daemonize(
os.dup2(se.fileno(), sys.stderr.fileno()) os.dup2(se.fileno(), sys.stderr.fileno())
def _reconfigure_logging_for_daemon(log_level: int) -> None:
"""Replace StreamHandlers (now writing to /dev/null) with a SysLogHandler."""
root = logging.getLogger()
for handler in root.handlers[:]:
root.removeHandler(handler)
handler.close()
use_udp_fallback = not os.path.exists("/dev/log")
if use_udp_fallback:
syslog_handler = SysLogHandler(
address=("localhost", 514),
facility=SysLogHandler.LOG_DAEMON,
)
else:
syslog_handler = SysLogHandler(
address="/dev/log",
facility=SysLogHandler.LOG_DAEMON,
)
syslog_handler.setFormatter(
logging.Formatter("hbc[%(process)d]: %(name)s %(levelname)s: %(message)s")
)
root.addHandler(syslog_handler)
root.setLevel(log_level)
if use_udp_fallback:
logging.warning("/dev/log not found, using syslog UDP localhost:514")
def build_parser(): def build_parser():
"""Build argument parser.""" """Build argument parser."""
parser = argparse.ArgumentParser( parser = argparse.ArgumentParser(
@@ -663,16 +694,9 @@ def main(argv=None):
# Daemonize if requested # Daemonize if requested
if args.daemon: if args.daemon:
print("Daemonizing...") print("Daemonizing...")
import syslog
syslog.openlog("hbc", syslog.LOG_PID, syslog.LOG_DAEMON)
syslog.syslog(syslog.LOG_INFO, f"Starting heartbeat to {', '.join(args.hosts)}")
daemonize() daemonize()
_reconfigure_logging_for_daemon(log_level)
# Reconfigure logging for syslog logging.info(f"hbc starting, sending heartbeat to {', '.join(args.hosts)}")
logging.basicConfig(
level=log_level,
format="hbc[%(process)d]: %(name)s %(levelname)s: %(message)s"
)
# Run async main # Run async main
try: try:
+12 -5
View File
@@ -22,13 +22,14 @@ from typing import Any, Dict, List, Optional, Type
class Plugin(ABC): class Plugin(ABC):
"""Base class for all plugins. """Base class for all plugins.
Attributes: Attributes:
name: Unique plugin identifier (e.g., "os_info", "cpu_monitor") name: Unique plugin identifier (e.g., "os_info", "cpu_monitor")
version: Plugin version string version: Plugin version string
description: Human-readable description description: Human-readable description
interval: Collection interval in seconds (0 for InfoPlugin = collect once) interval: Collection interval in seconds (0 for InfoPlugin = collect once)
enabled: Whether plugin is active (can be disabled via config) enabled: Whether plugin is active (can be disabled via config)
skip_reason: Set by plugin before returning False from initialize(); causes loader to log INFO instead of WARNING.
""" """
name: str = "" name: str = ""
@@ -39,13 +40,14 @@ class Plugin(ABC):
def __init__(self, config: Optional[Dict[str, Any]] = None): def __init__(self, config: Optional[Dict[str, Any]] = None):
"""Initialize plugin with optional configuration. """Initialize plugin with optional configuration.
Args: Args:
config: Plugin-specific configuration from YAML (e.g., thresholds, paths) config: Plugin-specific configuration from YAML (e.g., thresholds, paths)
""" """
self.config = config or {} self.config = config or {}
self.logger = logging.getLogger(f"plugin.{self.name}") self.logger = logging.getLogger(f"plugin.{self.name}")
self._initialized = False self._initialized = False
self.skip_reason: Optional[str] = None
@abstractmethod @abstractmethod
async def initialize(self) -> bool: async def initialize(self) -> bool:
@@ -369,9 +371,14 @@ class PluginLoader:
try: try:
initialized = await plugin.initialize() initialized = await plugin.initialize()
if not initialized: if not initialized:
self.logger.warning( if plugin.skip_reason:
f"Plugin {plugin.name} failed initialization, skipping" self.logger.info(
) f"Plugin {plugin.name} skipped: {plugin.skip_reason}"
)
else:
self.logger.warning(
f"Plugin {plugin.name} failed initialization, skipping"
)
continue continue
except Exception as e: except Exception as e:
self.logger.error( self.logger.error(
+69 -49
View File
@@ -21,8 +21,10 @@ nagios_runner:
``` ```
""" """
import asyncio
import os
import re import re
import subprocess import shlex
from typing import Any, Dict, List, Optional, Tuple from typing import Any, Dict, List, Optional, Tuple
from hbd.client.plugin import MonitorPlugin from hbd.client.plugin import MonitorPlugin
@@ -52,8 +54,7 @@ class NagiosRunnerPlugin(MonitorPlugin):
interval: Collection interval in seconds (default: 300) interval: Collection interval in seconds (default: 300)
commands: List of command definitions with 'name' and 'command' keys commands: List of command definitions with 'name' and 'command' keys
timeout: Command execution timeout in seconds (default: 30) timeout: Command execution timeout in seconds (default: 30)
shell: Whether to execute commands via shell (default: True)
Example: Example:
nagios_runner: nagios_runner:
interval: 300 # Check every 5 minutes interval: 300 # Check every 5 minutes
@@ -76,32 +77,48 @@ class NagiosRunnerPlugin(MonitorPlugin):
# Extract configuration # Extract configuration
self.commands: List[Dict[str, str]] = config.get("commands", []) if config else [] self.commands: List[Dict[str, str]] = config.get("commands", []) if config else []
self.timeout: int = config.get("timeout", 30) if config else 30 self.timeout: int = config.get("timeout", 30) if config else 30
self.shell: bool = config.get("shell", True) if config else True
self.interval = config.get("interval", 300) if config else 300 self.interval = config.get("interval", 300) if config else 300
# Validate commands
if not self.commands:
self.logger.info(
"No Nagios commands configured. Add 'nagios_runner.commands' to config."
)
async def initialize(self) -> bool: async def initialize(self) -> bool:
"""Initialize the Nagios runner plugin. """Initialize the Nagios runner plugin.
Returns: Returns:
True if at least one command is configured, False otherwise True if at least one command is configured, False otherwise
""" """
self.logger.info(f"Initializing {self.name} plugin") self.logger.info(f"Initializing {self.name} plugin")
if not self.commands: if not self.commands:
self.logger.info("No Nagios commands configured") self.skip_reason = "no commands configured (add nagios_runner.commands to config)"
return False return False
self.logger.info(f"Configured to run {len(self.commands)} Nagios plugin(s)") self.logger.info(f"Configured to run {len(self.commands)} Nagios plugin(s)")
for cmd_config in self.commands: for cmd_config in self.commands:
name = cmd_config.get("name", "unnamed") name = cmd_config.get("name", "unnamed")
self.logger.info(f" - {name}: {cmd_config.get('command', 'N/A')}") self.logger.info(f" - {name}: {cmd_config.get('command', 'N/A')}")
# Validate absolute command paths early
for cmd_config in self.commands:
name = cmd_config.get("name", "unnamed")
command = cmd_config.get("command", "")
if not command:
continue
try:
tokens = shlex.split(command)
except ValueError:
continue # malformed command string; skip validation
if not tokens:
continue
exe = tokens[0]
if os.path.isabs(exe):
if not os.path.isfile(exe):
self.logger.warning(
f"Command '{name}': executable not found: {exe}"
)
elif not os.access(exe, os.X_OK):
self.logger.warning(
f"Command '{name}': executable not executable: {exe}"
)
return True return True
async def _collect_metrics(self) -> Dict[str, Any]: async def _collect_metrics(self) -> Dict[str, Any]:
@@ -141,7 +158,7 @@ class NagiosRunnerPlugin(MonitorPlugin):
for metric_name, metric_value in perfdata.items(): for metric_name, metric_value in perfdata.items():
results[f"{name}_{metric_name}"] = metric_value results[f"{name}_{metric_name}"] = metric_value
self.logger.debug( self.logger.info(
f"Executed {name}: {STATUS_NAMES.get(status_code, 'UNKNOWN')} - {output[:50]}" f"Executed {name}: {STATUS_NAMES.get(status_code, 'UNKNOWN')} - {output[:50]}"
) )
@@ -163,46 +180,49 @@ class NagiosRunnerPlugin(MonitorPlugin):
self, self,
command: str command: str
) -> Tuple[int, str, Dict[str, Any]]: ) -> Tuple[int, str, Dict[str, Any]]:
"""Execute a Nagios plugin and parse its output. """Execute a Nagios plugin and parse its output."""
Args:
command: Command string to execute
Returns:
Tuple of (status_code, output_message, performance_data_dict)
"""
try: try:
# Run command proc = await asyncio.create_subprocess_shell(
result = subprocess.run(
command, command,
shell=self.shell, stdout=asyncio.subprocess.PIPE,
capture_output=True, stderr=asyncio.subprocess.PIPE,
timeout=self.timeout,
text=True
) )
try:
status_code = result.returncode stdout_bytes, stderr_bytes = await asyncio.wait_for(
output = result.stdout.strip() proc.communicate(), timeout=self.timeout
)
# Nagios plugins can return codes > 3, treat as UNKNOWN except asyncio.TimeoutError:
proc.kill()
await proc.communicate()
self.logger.error(f"Command timed out: {command}")
return NAGIOS_UNKNOWN, f"Command timed out after {self.timeout}s", {}
status_code = proc.returncode
if status_code < 0:
return NAGIOS_UNKNOWN, f"Process killed by signal {-status_code}", {}
if status_code > 3: if status_code > 3:
status_code = NAGIOS_UNKNOWN status_code = NAGIOS_UNKNOWN
# Parse performance data stdout = stdout_bytes.decode(errors="replace").strip()
perfdata = self._parse_perfdata(output) stderr = stderr_bytes.decode(errors="replace").strip()
# Extract just the status message (before the pipe if present) # Parse perfdata from stdout before mixing in stderr
if '|' in output: perfdata = self._parse_perfdata(stdout)
output_msg = output.split('|')[0].strip()
# Build status message
status_part = stdout.split('|')[0].strip() if '|' in stdout else stdout
if not stdout and stderr:
output_msg = stderr
elif stdout and stderr:
output_msg = f"{status_part} [stderr: {stderr}]"
else: else:
output_msg = output output_msg = status_part
return status_code, output_msg, perfdata return status_code, output_msg, perfdata
except subprocess.TimeoutExpired:
self.logger.error(f"Command timed out: {command}")
return NAGIOS_UNKNOWN, f"Command timed out after {self.timeout}s", {}
except Exception as e: except Exception as e:
self.logger.error(f"Error executing command: {e}") self.logger.error(f"Error executing command: {e}")
return NAGIOS_UNKNOWN, f"Execution error: {str(e)}", {} return NAGIOS_UNKNOWN, f"Execution error: {str(e)}", {}
+2 -2
View File
@@ -520,8 +520,8 @@ async def start(
tmpl = env.get_template("plugins.html") tmpl = env.get_template("plugins.html")
body = tmpl.render( body = tmpl.render(
title="Plugin Metrics - Heartbeat", title="Host Overview - Heartbeat",
header="Plugin Metrics", header="Host Overview",
hosts=hosts_with_plugins, hosts=hosts_with_plugins,
current_user=current_user.to_dict() if current_user else None, current_user=current_user.to_dict() if current_user else None,
active_page="plugins", active_page="plugins",
+1 -1
View File
@@ -4,7 +4,7 @@
</button> </button>
<div class="nav-links" id="nav-links"> <div class="nav-links" id="nav-links">
<a href="/live"{% if active_page == "live" %} class="active"{% endif %}>Live Dashboard</a> <a href="/live"{% if active_page == "live" %} class="active"{% endif %}>Live Dashboard</a>
<a href="/plugins"{% if active_page == "plugins" %} class="active"{% endif %}>Plugin Metrics</a> <a href="/plugins"{% if active_page == "plugins" %} class="active"{% endif %}>Host Overview</a>
<a href="/alerts"{% if active_page == "alerts" %} class="active"{% endif %}>Alerts</a> <a href="/alerts"{% if active_page == "alerts" %} class="active"{% endif %}>Alerts</a>
{% if current_user and current_user.admin %} {% if current_user and current_user.admin %}
<a href="/settings"{% if active_page == "settings" %} class="active"{% endif %}>Settings</a> <a href="/settings"{% if active_page == "settings" %} class="active"{% endif %}>Settings</a>
File diff suppressed because it is too large Load Diff
+1 -1
View File
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
[project] [project]
name = "hbd" name = "hbd"
version = "5.1.2" version = "5.1.4"
description = "Heartbeat monitoring system — client (hbc) and server (hbd)" description = "Heartbeat monitoring system — client (hbc) and server (hbd)"
readme = "README.md" readme = "README.md"
requires-python = ">=3.11" requires-python = ">=3.11"
+99
View File
@@ -0,0 +1,99 @@
import asyncio
import logging
import os
import stat
from hbd.client.plugins.nagios_runner import (
NagiosRunnerPlugin,
NAGIOS_OK,
NAGIOS_WARNING,
NAGIOS_CRITICAL,
NAGIOS_UNKNOWN,
)
def test_no_commands_sets_skip_reason():
plugin = NagiosRunnerPlugin(config={"commands": []})
result = asyncio.run(plugin.initialize())
assert result is False
assert plugin.skip_reason is not None
assert "nagios_runner.commands" in plugin.skip_reason
def test_stderr_used_when_stdout_empty(tmp_path):
script = tmp_path / "check_err.sh"
script.write_text("#!/bin/sh\necho 'error from stderr' >&2\nexit 2\n")
script.chmod(script.stat().st_mode | stat.S_IEXEC)
config = {"commands": [{"name": "t", "command": str(script)}], "timeout": 5}
plugin = NagiosRunnerPlugin(config=config)
asyncio.run(plugin.initialize())
data = asyncio.run(plugin._collect_metrics())
assert "error from stderr" in data["t_output"]
assert data["t_status_code"] == NAGIOS_CRITICAL
def test_stderr_appended_when_both_present(tmp_path):
script = tmp_path / "check_both.sh"
script.write_text("#!/bin/sh\necho 'OK - all good'\necho 'extra detail' >&2\nexit 0\n")
script.chmod(script.stat().st_mode | stat.S_IEXEC)
config = {"commands": [{"name": "t", "command": str(script)}], "timeout": 5}
plugin = NagiosRunnerPlugin(config=config)
asyncio.run(plugin.initialize())
data = asyncio.run(plugin._collect_metrics())
assert "OK - all good" in data["t_output"]
assert "extra detail" in data["t_output"]
assert data["t_status_code"] == NAGIOS_OK
def test_negative_returncode_maps_to_unknown():
# kill -9 $$ kills the shell itself; asyncio sees returncode -9
config = {"commands": [{"name": "t", "command": "kill -9 $$"}], "timeout": 5}
plugin = NagiosRunnerPlugin(config=config)
asyncio.run(plugin.initialize())
data = asyncio.run(plugin._collect_metrics())
assert data["t_status_code"] == NAGIOS_UNKNOWN
assert "signal" in data["t_output"].lower()
def test_absolute_path_not_found_warns(caplog):
fake_cmd = "/nonexistent_hbc_test_path/check_something"
config = {"commands": [{"name": "t", "command": fake_cmd}]}
plugin = NagiosRunnerPlugin(config=config)
with caplog.at_level(logging.WARNING, logger="plugin.nagios_runner"):
asyncio.run(plugin.initialize())
assert any("not found" in r.message for r in caplog.records)
def test_absolute_path_not_executable_warns(caplog, tmp_path):
non_exec = tmp_path / "check_test"
non_exec.write_text("#!/bin/sh\necho OK\n")
non_exec.chmod(0o644) # readable but not executable
config = {"commands": [{"name": "t", "command": str(non_exec)}]}
plugin = NagiosRunnerPlugin(config=config)
with caplog.at_level(logging.WARNING, logger="plugin.nagios_runner"):
asyncio.run(plugin.initialize())
assert any("not executable" in r.message for r in caplog.records)
def test_relative_path_not_checked(caplog):
# Relative paths (resolved via PATH) must not generate warnings
config = {"commands": [{"name": "t", "command": "echo OK"}]}
plugin = NagiosRunnerPlugin(config=config)
with caplog.at_level(logging.WARNING, logger="plugin.nagios_runner"):
asyncio.run(plugin.initialize())
assert not any(
"not found" in r.message or "not executable" in r.message
for r in caplog.records
)
+83
View File
@@ -0,0 +1,83 @@
import asyncio
import logging
import textwrap
from hbd.client.plugin import PluginLoader, PluginRegistry
def test_plugin_skip_reason_defaults_none(tmp_path):
plugin_code = textwrap.dedent("""
from hbd.client.plugin import MonitorPlugin
class MinimalPlugin(MonitorPlugin):
name = "minimal"
version = "1.0.0"
interval = 60
async def initialize(self):
return True
async def _collect_metrics(self):
return {}
""")
(tmp_path / "minimal.py").write_text(plugin_code)
registry = PluginRegistry()
loader = PluginLoader(registry)
asyncio.run(loader.load_from_directory(tmp_path))
plugin = registry.get("minimal")
assert plugin is not None
assert plugin.skip_reason is None
def test_loader_logs_info_when_skip_reason_set(tmp_path, caplog):
plugin_code = textwrap.dedent("""
from hbd.client.plugin import MonitorPlugin
class SkippablePlugin(MonitorPlugin):
name = "skippable"
version = "1.0.0"
interval = 60
async def initialize(self):
self.skip_reason = "not configured in yaml"
return False
async def _collect_metrics(self):
return {}
""")
(tmp_path / "skippable.py").write_text(plugin_code)
registry = PluginRegistry()
loader = PluginLoader(registry)
with caplog.at_level(logging.INFO, logger="plugin.loader"):
count = asyncio.run(loader.load_from_directory(tmp_path))
assert count == 0
assert any("skipped: not configured in yaml" in r.message for r in caplog.records)
assert not any("failed initialization" in r.message for r in caplog.records)
def test_loader_logs_warning_when_no_skip_reason(tmp_path, caplog):
plugin_code = textwrap.dedent("""
from hbd.client.plugin import MonitorPlugin
class FailPlugin(MonitorPlugin):
name = "fail"
version = "1.0.0"
interval = 60
async def initialize(self):
return False
async def _collect_metrics(self):
return {}
""")
(tmp_path / "fail_plugin.py").write_text(plugin_code)
registry = PluginRegistry()
loader = PluginLoader(registry)
with caplog.at_level(logging.WARNING, logger="plugin.loader"):
count = asyncio.run(loader.load_from_directory(tmp_path))
assert count == 0
assert any("failed initialization" in r.message for r in caplog.records)