version 5.1.3

fix: remove unused pytest import in test_nagios_runner
fix: use os.path.exists check for /dev/log instead of dead-code OSError catch
2026-04-25 16:52:56 +02:00 · 2026-04-25 16:39:56 +02:00 · 2026-04-25 16:36:00 +02:00 · 2026-04-25 16:29:54 +02:00 · 2026-04-25 16:28:32 +02:00 · 2026-04-25 16:24:33 +02:00
9 changed files with 992 additions and 65 deletions
@@ -0,0 +1,602 @@
 # Plugin Error Checking Implementation Plan
 > **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
 **Goal:** Improve plugin error checking in hbc, especially for nagios_runner, and fix logger messages silently discarded in daemon mode.
 **Architecture:** Three focused changes across three files: (1) `hbd/client/plugin.py` gains a `skip_reason` attribute on Plugin and updated PluginLoader messaging; (2) `hbd/client/plugins/nagios_runner.py` gains async subprocess execution, stderr capture, signal-killed process handling, and init-time command path validation; (3) `hbd/client/main.py` gains proper post-fork logging reconfiguration to syslog.
 **Tech Stack:** Python 3.11+, asyncio, `logging.handlers.SysLogHandler`, pytest
 ---
 ## File Map
 | Action | Path | What changes |
 |---|---|---|
 | Modify | `hbd/client/plugin.py` | `Plugin.__init__` gains `skip_reason`; `PluginLoader` checks it |
 | Modify | `hbd/client/plugins/nagios_runner.py` | async subprocess, stderr, signal codes, init validation, `skip_reason` |
 | Modify | `hbd/client/main.py` | `_reconfigure_logging_for_daemon()` helper; remove redundant syslog calls |
 | Create | `tests/test_plugin.py` | PluginLoader messaging tests |
 | Create | `tests/test_nagios_runner.py` | NagiosRunnerPlugin behaviour tests |
 Run tests throughout with:
 ```bash
 python -m pytest tests/test_plugin.py tests/test_nagios_runner.py -v
 ```
 ---
 ## Task 1: Plugin.skip_reason + PluginLoader messaging
 **Files:**
 - Modify: `hbd/client/plugin.py:40-48` (Plugin.__init__)
 - Modify: `hbd/client/plugin.py:369-381` (PluginLoader.load_from_directory)
 - Create: `tests/test_plugin.py`
 - [ ] **Step 1: Write failing tests**
 Create `tests/test_plugin.py`:
 ```python
 import asyncio
 import logging
 import textwrap
 from hbd.client.plugin import Plugin, PluginLoader, PluginRegistry
 def test_plugin_skip_reason_defaults_none(tmp_path):
    plugin_code = textwrap.dedent("""
        from hbd.client.plugin import MonitorPlugin
        class MinimalPlugin(MonitorPlugin):
            name = "minimal"
            version = "1.0.0"
            interval = 60
            async def initialize(self):
                return True
            async def _collect_metrics(self):
                return {}
    """)
    (tmp_path / "minimal.py").write_text(plugin_code)
    registry = PluginRegistry()
    loader = PluginLoader(registry)
    asyncio.run(loader.load_from_directory(tmp_path))
    plugin = registry.get("minimal")
    assert plugin is not None
    assert plugin.skip_reason is None
 def test_loader_logs_info_when_skip_reason_set(tmp_path, caplog):
    plugin_code = textwrap.dedent("""
        from hbd.client.plugin import MonitorPlugin
        class SkippablePlugin(MonitorPlugin):
            name = "skippable"
            version = "1.0.0"
            interval = 60
            async def initialize(self):
                self.skip_reason = "not configured in yaml"
                return False
            async def _collect_metrics(self):
                return {}
    """)
    (tmp_path / "skippable.py").write_text(plugin_code)
    registry = PluginRegistry()
    loader = PluginLoader(registry)
    with caplog.at_level(logging.INFO, logger="plugin.loader"):
        count = asyncio.run(loader.load_from_directory(tmp_path))
    assert count == 0
    assert any("skipped: not configured in yaml" in r.message for r in caplog.records)
    assert not any("failed initialization" in r.message for r in caplog.records)
 def test_loader_logs_warning_when_no_skip_reason(tmp_path, caplog):
    plugin_code = textwrap.dedent("""
        from hbd.client.plugin import MonitorPlugin
        class FailPlugin(MonitorPlugin):
            name = "fail"
            version = "1.0.0"
            interval = 60
            async def initialize(self):
                return False
            async def _collect_metrics(self):
                return {}
    """)
    (tmp_path / "fail_plugin.py").write_text(plugin_code)
    registry = PluginRegistry()
    loader = PluginLoader(registry)
    with caplog.at_level(logging.WARNING, logger="plugin.loader"):
        count = asyncio.run(loader.load_from_directory(tmp_path))
    assert count == 0
    assert any("failed initialization" in r.message for r in caplog.records)
 ```
 - [ ] **Step 2: Run tests to verify they fail**
 ```bash
 python -m pytest tests/test_plugin.py -v
 ```
 Expected: `test_plugin_skip_reason_defaults_none` FAILS (attribute missing), others may error.
 - [ ] **Step 3: Add `skip_reason` to `Plugin.__init__`**
 In `hbd/client/plugin.py`, in `Plugin.__init__` (around line 46), add one line:
 ```python
 def __init__(self, config: Optional[Dict[str, Any]] = None):
    self.config = config or {}
    self.logger = logging.getLogger(f"plugin.{self.name}")
    self._initialized = False
    self.skip_reason: Optional[str] = None
 ```
 - [ ] **Step 4: Update PluginLoader messaging**
 In `hbd/client/plugin.py`, replace the `if not initialized:` block (around line 372):
 ```python
                    if not initialized:
                        if plugin.skip_reason:
                            self.logger.info(
                                f"Plugin {plugin.name} skipped: {plugin.skip_reason}"
                            )
                        else:
                            self.logger.warning(
                                f"Plugin {plugin.name} failed initialization, skipping"
                            )
                        continue
 ```
 - [ ] **Step 5: Run tests to verify they pass**
 ```bash
 python -m pytest tests/test_plugin.py -v
 ```
 Expected: all 3 tests PASS.
 - [ ] **Step 6: Commit**
 ```bash
 git add hbd/client/plugin.py tests/test_plugin.py
 git commit -m "feat: add skip_reason to Plugin; improve PluginLoader init messaging"
 ```
 ---
 ## Task 2: NagiosRunnerPlugin — skip_reason when no commands
 **Files:**
 - Modify: `hbd/client/plugins/nagios_runner.py:88-105` (initialize)
 - Modify: `tests/test_nagios_runner.py` (create)
 - [ ] **Step 1: Write failing test**
 Create `tests/test_nagios_runner.py`:
 ```python
 import asyncio
 import logging
 import os
 import stat
 import pytest
 from hbd.client.plugins.nagios_runner import (
    NagiosRunnerPlugin,
    NAGIOS_OK,
    NAGIOS_WARNING,
    NAGIOS_CRITICAL,
    NAGIOS_UNKNOWN,
 )
 def test_no_commands_sets_skip_reason():
    plugin = NagiosRunnerPlugin(config={"commands": []})
    result = asyncio.run(plugin.initialize())
    assert result is False
    assert plugin.skip_reason is not None
    assert "nagios_runner.commands" in plugin.skip_reason
 ```
 - [ ] **Step 2: Run test to verify it fails**
 ```bash
 python -m pytest tests/test_nagios_runner.py::test_no_commands_sets_skip_reason -v
 ```
 Expected: FAIL — `plugin.skip_reason` is `None`.
 - [ ] **Step 3: Set skip_reason in NagiosRunnerPlugin.initialize()**
 In `hbd/client/plugins/nagios_runner.py`, replace the early-return block in `initialize()` (around line 96):
 ```python
        if not self.commands:
            self.skip_reason = "no commands configured (add nagios_runner.commands to config)"
            self.logger.info("No Nagios commands configured")
            return False
 ```
 - [ ] **Step 4: Run test to verify it passes**
 ```bash
 python -m pytest tests/test_nagios_runner.py::test_no_commands_sets_skip_reason -v
 ```
 Expected: PASS.
 - [ ] **Step 5: Commit**
 ```bash
 git add hbd/client/plugins/nagios_runner.py tests/test_nagios_runner.py
 git commit -m "feat: set skip_reason on nagios_runner when no commands configured"
 ```
 ---
 ## Task 3: NagiosRunnerPlugin — async subprocess, stderr capture, negative return codes
 **Files:**
 - Modify: `hbd/client/plugins/nagios_runner.py` (imports + `_run_nagios_plugin`)
 - Modify: `tests/test_nagios_runner.py`
 - [ ] **Step 1: Write failing tests**
 Append to `tests/test_nagios_runner.py`:
 ```python
 def test_stderr_used_when_stdout_empty(tmp_path):
    script = tmp_path / "check_err.sh"
    script.write_text("#!/bin/sh\necho 'error from stderr' >&2\nexit 2\n")
    script.chmod(script.stat().st_mode | stat.S_IEXEC)
    config = {"commands": [{"name": "t", "command": str(script)}], "timeout": 5}
    plugin = NagiosRunnerPlugin(config=config)
    asyncio.run(plugin.initialize())
    data = asyncio.run(plugin._collect_metrics())
    assert "error from stderr" in data["t_output"]
    assert data["t_status_code"] == NAGIOS_CRITICAL
 def test_stderr_appended_when_both_present(tmp_path):
    script = tmp_path / "check_both.sh"
    script.write_text("#!/bin/sh\necho 'OK - all good'\necho 'extra detail' >&2\nexit 0\n")
    script.chmod(script.stat().st_mode | stat.S_IEXEC)
    config = {"commands": [{"name": "t", "command": str(script)}], "timeout": 5}
    plugin = NagiosRunnerPlugin(config=config)
    asyncio.run(plugin.initialize())
    data = asyncio.run(plugin._collect_metrics())
    assert "OK - all good" in data["t_output"]
    assert "extra detail" in data["t_output"]
    assert data["t_status_code"] == NAGIOS_OK
 def test_negative_returncode_maps_to_unknown():
    # kill -9 $$ kills the shell itself; asyncio sees returncode -9
    config = {"commands": [{"name": "t", "command": "kill -9 $$"}], "timeout": 5}
    plugin = NagiosRunnerPlugin(config=config)
    asyncio.run(plugin.initialize())
    data = asyncio.run(plugin._collect_metrics())
    assert data["t_status_code"] == NAGIOS_UNKNOWN
    assert "signal" in data["t_output"].lower()
 ```
 - [ ] **Step 2: Run tests to verify they fail**
 ```bash
 python -m pytest tests/test_nagios_runner.py::test_stderr_used_when_stdout_empty \
    tests/test_nagios_runner.py::test_stderr_appended_when_both_present \
    tests/test_nagios_runner.py::test_negative_returncode_maps_to_unknown -v
 ```
 Expected: all FAIL — current implementation ignores stderr and doesn't handle negative codes.
 - [ ] **Step 3: Update imports in nagios_runner.py**
 Replace the import block at the top of `hbd/client/plugins/nagios_runner.py`:
 ```python
 import asyncio
 import os
 import re
 from typing import Any, Dict, List, Optional, Tuple
 from hbd.client.plugin import MonitorPlugin
 ```
 (Remove `import subprocess`; add `import asyncio` and `import os`.)
 - [ ] **Step 4: Upgrade collection log level from DEBUG to INFO**
 In `hbd/client/plugins/nagios_runner.py`, in `_collect_metrics()`, change the debug log (around line 144) so results are visible at INFO level:
 ```python
                self.logger.info(
                    f"Executed {name}: {STATUS_NAMES.get(status_code, 'UNKNOWN')} - {output[:50]}"
                )
 ```
 - [ ] **Step 5: Replace `_run_nagios_plugin` with async implementation**
 Replace the entire `_run_nagios_plugin` method in `hbd/client/plugins/nagios_runner.py`:
 ```python
    async def _run_nagios_plugin(
        self,
        command: str
    ) -> Tuple[int, str, Dict[str, Any]]:
        """Execute a Nagios plugin and parse its output."""
        try:
            proc = await asyncio.create_subprocess_shell(
                command,
                stdout=asyncio.subprocess.PIPE,
                stderr=asyncio.subprocess.PIPE,
            )
            try:
                stdout_bytes, stderr_bytes = await asyncio.wait_for(
                    proc.communicate(), timeout=self.timeout
                )
            except asyncio.TimeoutError:
                proc.kill()
                await proc.communicate()
                self.logger.error(f"Command timed out: {command}")
                return NAGIOS_UNKNOWN, f"Command timed out after {self.timeout}s", {}
            status_code = proc.returncode
            if status_code < 0:
                return NAGIOS_UNKNOWN, f"Process killed by signal {-status_code}", {}
            if status_code > 3:
                status_code = NAGIOS_UNKNOWN
            stdout = stdout_bytes.decode(errors="replace").strip()
            stderr = stderr_bytes.decode(errors="replace").strip()
            # Parse perfdata from stdout before mixing in stderr
            perfdata = self._parse_perfdata(stdout)
            # Build status message
            status_part = stdout.split('|')[0].strip() if '|' in stdout else stdout
            if not stdout and stderr:
                output_msg = stderr
            elif stdout and stderr:
                output_msg = f"{status_part} [stderr: {stderr}]"
            else:
                output_msg = status_part
            return status_code, output_msg, perfdata
        except Exception as e:
            self.logger.error(f"Error executing command: {e}")
            return NAGIOS_UNKNOWN, f"Execution error: {str(e)}", {}
 ```
 Also remove the now-unused `self.shell` line from `__init__` (the `shell` config key is no longer used since `create_subprocess_shell` always uses a shell):
 In `NagiosRunnerPlugin.__init__`, remove:
 ```python
        self.shell: bool = config.get("shell", True) if config else True
 ```
 - [ ] **Step 6: Run tests to verify they pass**
 ```bash
 python -m pytest tests/test_nagios_runner.py -v
 ```
 Expected: all tests PASS including the 3 new ones.
 - [ ] **Step 7: Commit**
 ```bash
 git add hbd/client/plugins/nagios_runner.py tests/test_nagios_runner.py
 git commit -m "feat: async subprocess in nagios_runner with stderr capture and signal handling"
 ```
 ---
 ## Task 4: NagiosRunnerPlugin — command path validation at init
 **Files:**
 - Modify: `hbd/client/plugins/nagios_runner.py` (initialize)
 - Modify: `tests/test_nagios_runner.py`
 - [ ] **Step 1: Write failing tests**
 Append to `tests/test_nagios_runner.py`:
 ```python
 def test_absolute_path_not_found_warns(caplog):
    fake_cmd = "/nonexistent_hbc_test_path/check_something"
    config = {"commands": [{"name": "t", "command": fake_cmd}]}
    plugin = NagiosRunnerPlugin(config=config)
    with caplog.at_level(logging.WARNING, logger="plugin.nagios_runner"):
        asyncio.run(plugin.initialize())
    assert any("not found" in r.message for r in caplog.records)
 def test_absolute_path_not_executable_warns(caplog, tmp_path):
    non_exec = tmp_path / "check_test"
    non_exec.write_text("#!/bin/sh\necho OK\n")
    non_exec.chmod(0o644)  # readable but not executable
    config = {"commands": [{"name": "t", "command": str(non_exec)}]}
    plugin = NagiosRunnerPlugin(config=config)
    with caplog.at_level(logging.WARNING, logger="plugin.nagios_runner"):
        asyncio.run(plugin.initialize())
    assert any("not executable" in r.message for r in caplog.records)
 def test_relative_path_not_checked(caplog):
    # Relative paths (resolved via PATH) must not generate warnings
    config = {"commands": [{"name": "t", "command": "echo OK"}]}
    plugin = NagiosRunnerPlugin(config=config)
    with caplog.at_level(logging.WARNING, logger="plugin.nagios_runner"):
        asyncio.run(plugin.initialize())
    assert not any(
        "not found" in r.message or "not executable" in r.message
        for r in caplog.records
    )
 ```
 - [ ] **Step 2: Run tests to verify they fail**
 ```bash
 python -m pytest tests/test_nagios_runner.py::test_absolute_path_not_found_warns \
    tests/test_nagios_runner.py::test_absolute_path_not_executable_warns \
    tests/test_nagios_runner.py::test_relative_path_not_checked -v
 ```
 Expected: `test_absolute_path_not_found_warns` and `test_absolute_path_not_executable_warns` FAIL (no warnings logged); `test_relative_path_not_checked` may pass.
 - [ ] **Step 3: Add command path validation to `initialize()`**
 In `hbd/client/plugins/nagios_runner.py`, extend `initialize()` by adding validation after the existing "log each command" loop (after line 103, before `return True`):
 ```python
        # Validate absolute command paths early
        for cmd_config in self.commands:
            name = cmd_config.get("name", "unnamed")
            command = cmd_config.get("command", "")
            if not command:
                continue
            exe = command.split()[0]
            if os.path.isabs(exe):
                if not os.path.isfile(exe):
                    self.logger.warning(
                        f"Command '{name}': executable not found: {exe}"
                    )
                elif not os.access(exe, os.X_OK):
                    self.logger.warning(
                        f"Command '{name}': executable not executable: {exe}"
                    )
 ```
 - [ ] **Step 4: Run full test suite to verify all pass**
 ```bash
 python -m pytest tests/test_plugin.py tests/test_nagios_runner.py -v
 ```
 Expected: all tests PASS.
 - [ ] **Step 5: Commit**
 ```bash
 git add hbd/client/plugins/nagios_runner.py tests/test_nagios_runner.py
 git commit -m "feat: validate absolute command paths at nagios_runner init"
 ```
 ---
 ## Task 5: Daemon mode logging — route to syslog after fork
 **Files:**
 - Modify: `hbd/client/main.py` (new helper + updated daemon block)
 No automated test for daemonization itself (fork behaviour is hard to unit-test). Manual verification steps are provided below.
 - [ ] **Step 1: Add `_reconfigure_logging_for_daemon` helper**
 In `hbd/client/main.py`, add this function just before `def build_parser()` (around line 589):
 ```python
 def _reconfigure_logging_for_daemon(log_level: int) -> None:
    """Replace StreamHandlers (now writing to /dev/null) with a SysLogHandler."""
    from logging.handlers import SysLogHandler
    root = logging.getLogger()
    for handler in root.handlers[:]:
        root.removeHandler(handler)
        handler.close()
    try:
        syslog_handler = SysLogHandler(
            address="/dev/log",
            facility=SysLogHandler.LOG_DAEMON,
        )
    except OSError:
        syslog_handler = SysLogHandler(
            address=("localhost", 514),
            facility=SysLogHandler.LOG_DAEMON,
        )
        # Attach the fallback first so the warning reaches syslog
        syslog_handler.setFormatter(
            logging.Formatter("hbc[%(process)d]: %(name)s %(levelname)s: %(message)s")
        )
        root.addHandler(syslog_handler)
        root.setLevel(log_level)
        logging.warning("/dev/log not found, using syslog UDP localhost:514")
        return
    syslog_handler.setFormatter(
        logging.Formatter("hbc[%(process)d]: %(name)s %(levelname)s: %(message)s")
    )
    root.addHandler(syslog_handler)
    root.setLevel(log_level)
 ```
 - [ ] **Step 2: Update the daemon block in `main()`**
 In `hbd/client/main.py`, replace the entire `if args.daemon:` block (lines 664–675):
 ```python
    if args.daemon:
        print("Daemonizing...")
        daemonize()
        _reconfigure_logging_for_daemon(log_level)
        logging.info(f"hbc starting, sending heartbeat to {', '.join(args.hosts)}")
 ```
 This removes the `import syslog`, `syslog.openlog()`, and `syslog.syslog()` calls (now handled by the logging system) and removes the no-op second `logging.basicConfig()` call.
 - [ ] **Step 3: Run existing test suite to confirm no regressions**
 ```bash
 python -m pytest tests/test_plugin.py tests/test_nagios_runner.py -v
 ```
 Expected: all tests still PASS.
 - [ ] **Step 4: Manual smoke test — verify syslog output in daemon mode**
 ```bash
 # In one terminal, tail syslog
 sudo journalctl -f -t hbc
 # In another terminal, start hbc in daemon mode (replace HOST with a real or dummy host)
 python -m hbd.client.main -d -v localhost
 # Expected in journalctl output:
 #   hbc[<pid>]: hbc.main INFO: Starting hbc for <hostname> -> ['localhost']
 #   hbc[<pid>]: hbc.main INFO: hbc starting, sending heartbeat to localhost
 #   hbc[<pid>]: plugin.loader INFO: ...
 # Stop the daemon
 pkill -f "hbd.client.main"
 ```
 - [ ] **Step 5: Commit**
 ```bash
 git add hbd/client/main.py
 git commit -m "fix: reconfigure logging to syslog after daemonize() instead of no-op basicConfig"
 ```
@@ -0,0 +1,92 @@
 # Plugin Error Checking & Daemon Logging — Design Spec
 **Date:** 2026-04-25  
 **Scope:** hbc client — daemon mode logging, nagios_runner plugin robustness, PluginLoader messaging  
 **Files affected:** `hbd/client/main.py`, `hbd/client/plugins/nagios_runner.py`, `hbd/client/plugin.py`
 ---
 ## 1. Daemon Mode Logging
 ### Problem
 In `main()`, `logging.basicConfig()` is called before `daemonize()` (establishing a StreamHandler to stderr), then called again after `daemonize()`. The second call is a no-op — Python ignores `basicConfig()` when handlers are already configured. After daemonization, stderr is redirected to `/dev/null`, so all subsequent log output is silently discarded.
 The existing `syslog.openlog()` / `syslog.syslog()` calls (lines 666–668) write a single startup message but do not integrate with the `logging` system, so plugin and connection log messages never reach syslog.
 ### Fix
 After `daemonize()`, explicitly reconfigure the root logger:
 1. Remove all existing handlers (they now write to `/dev/null`).
 2. Add `logging.handlers.SysLogHandler(address='/dev/log', facility=LOG_DAEMON)`.
 3. Set formatter: `hbc[%(process)d]: %(name)s %(levelname)s: %(message)s`
 4. Preserve the `log_level` already determined from `-v`/`-x` CLI flags.
 Remove the redundant `syslog.openlog()` / `syslog.syslog()` calls — the logging system handles routing.
 **Fallback:** If `/dev/log` does not exist (containers, some BSDs), fall back to `SysLogHandler(address=('localhost', 514))`. Log one warning (to stderr, before handlers are replaced) so the operator knows.
 ---
 ## 2. Nagios Runner Improvements
 ### 2a — Async Subprocess
 `_run_nagios_plugin()` is declared `async def` but calls `subprocess.run()` synchronously, blocking the event loop for the full command duration.
 **Fix:** Replace with `asyncio.create_subprocess_shell()` + `await proc.communicate()`. Enforce timeout with `asyncio.wait_for(..., timeout=self.timeout)` and catch `asyncio.TimeoutError`.
 ### 2b — Stderr Capture
 Subprocess stderr is currently discarded (`capture_output=True` only captures stdout in the sync call; stderr content is lost).
 **Fix:** Pass `stderr=asyncio.subprocess.PIPE` to `create_subprocess_shell`. After `communicate()`, if stdout is empty but stderr has content, use stderr as the output message. If both have content, append stderr to the output for visibility.
 ### 2c — Negative Return Codes
 A negative `returncode` means the process was killed by a signal (SIGKILL, OOM, etc.). The current code treats these as-is, which may produce unexpected status values.
 **Fix:** If `returncode < 0`, map to `NAGIOS_UNKNOWN` with message `"Process killed by signal {-returncode}"`.
 ### 2d — Command Path Validation at Init
 `initialize()` currently only checks that the commands list is non-empty.
 **Fix:** For each command entry during `initialize()`:
 - Warn and skip the entry if `name` or `command` is missing.
 - Extract the executable (first whitespace-delimited token of the command string).
 - If the executable is an absolute path, check `os.path.isfile()` and `os.access(..., os.X_OK)`. Log a `WARNING` if either check fails.
 - Commands with relative paths or shell builtins are not checked (they may be on PATH) — just noted.
 - Validation warns only; all original entries in `self.commands` are retained and still attempted at collection time (where the existing missing-name/command guard already skips them). The plugin initializes successfully as long as the commands list is non-empty.
 ---
 ## 3. PluginLoader Messaging
 ### Problem
 When `initialize()` returns `False`, the loader always logs:
 > `WARNING: Plugin X failed initialization, skipping`
 This is alarming when the real reason is simply "no commands configured". There is no API to distinguish "not configured" from "genuinely broken".
 ### Fix
 Add an optional `skip_reason` attribute to `Plugin.__init__()` (defaults to `None`).
 In `PluginLoader.load_from_directory()`, after `initialize()` returns `False`:
 - If `plugin.skip_reason` is set → `logger.info(f"Plugin {plugin.name} skipped: {plugin.skip_reason}")`
 - If `plugin.skip_reason` is `None` → `logger.warning(f"Plugin {plugin.name} failed initialization, skipping")` (existing behaviour)
 In `NagiosRunnerPlugin.initialize()`, when no commands are configured:
 ```python
 self.skip_reason = "no commands configured (add nagios_runner.commands to config)"
 return False
 ```
 Genuine failures (exceptions) continue to go through the existing `except` block in the loader, logging at `ERROR` with traceback — unchanged.
 ---
 ## Decisions
 | Topic | Decision |
 |---|---|
 | Daemon log destination | syslog only (LOG_DAEMON facility) |
 | Syslog fallback | localhost:514 UDP if `/dev/log` absent |
 | Nagios result log level | INFO for all statuses (OK/WARNING/CRITICAL/UNKNOWN) |
 | Invalid command handling at init | Warn and continue; still attempt at collection time |
 | PluginLoader API change | `skip_reason` attribute on Plugin base class, checked by loader |
@@ -14,4 +14,4 @@ Install options:
 """
 __all__ = ["__version__"]
-__version__ = "5.1.2"
+__version__ = "5.1.3"
@@ -15,6 +15,7 @@ import socket
 import sys
 import time
 from hashlib import md5
 from logging.handlers import SysLogHandler
 from pathlib import Path
 from typing import Dict, List, Optional
@@ -586,6 +587,36 @@ def daemonize(
    os.dup2(se.fileno(), sys.stderr.fileno())
 def _reconfigure_logging_for_daemon(log_level: int) -> None:
    """Replace StreamHandlers (now writing to /dev/null) with a SysLogHandler."""
    root = logging.getLogger()
    for handler in root.handlers[:]:
        root.removeHandler(handler)
        handler.close()
    use_udp_fallback = not os.path.exists("/dev/log")
    if use_udp_fallback:
        syslog_handler = SysLogHandler(
            address=("localhost", 514),
            facility=SysLogHandler.LOG_DAEMON,
        )
    else:
        syslog_handler = SysLogHandler(
            address="/dev/log",
            facility=SysLogHandler.LOG_DAEMON,
        )
    syslog_handler.setFormatter(
        logging.Formatter("hbc[%(process)d]: %(name)s %(levelname)s: %(message)s")
    )
    root.addHandler(syslog_handler)
    root.setLevel(log_level)
    if use_udp_fallback:
        logging.warning("/dev/log not found, using syslog UDP localhost:514")
 def build_parser():
    """Build argument parser."""
    parser = argparse.ArgumentParser(
@@ -663,16 +694,9 @@ def main(argv=None):
    # Daemonize if requested
    if args.daemon:
        print("Daemonizing...")
        import syslog
        syslog.openlog("hbc", syslog.LOG_PID, syslog.LOG_DAEMON)
        syslog.syslog(syslog.LOG_INFO, f"Starting heartbeat to {', '.join(args.hosts)}")
        daemonize()
-        
+        _reconfigure_logging_for_daemon(log_level)
-        # Reconfigure logging for syslog
+        logging.info(f"hbc starting, sending heartbeat to {', '.join(args.hosts)}")
        logging.basicConfig(
            level=log_level,
            format="hbc[%(process)d]: %(name)s %(levelname)s: %(message)s"
        )
    # Run async main
    try:
@@ -22,13 +22,14 @@ from typing import Any, Dict, List, Optional, Type
 class Plugin(ABC):
    """Base class for all plugins.
-    
+
    Attributes:
        name: Unique plugin identifier (e.g., "os_info", "cpu_monitor")
        version: Plugin version string
        description: Human-readable description
        interval: Collection interval in seconds (0 for InfoPlugin = collect once)
        enabled: Whether plugin is active (can be disabled via config)
        skip_reason: Set by plugin before returning False from initialize(); causes loader to log INFO instead of WARNING.
    """
    name: str = ""
@@ -39,13 +40,14 @@ class Plugin(ABC):
    def __init__(self, config: Optional[Dict[str, Any]] = None):
        """Initialize plugin with optional configuration.
-        
+
        Args:
            config: Plugin-specific configuration from YAML (e.g., thresholds, paths)
        """
        self.config = config or {}
        self.logger = logging.getLogger(f"plugin.{self.name}")
        self._initialized = False
        self.skip_reason: Optional[str] = None
    @abstractmethod
    async def initialize(self) -> bool:
@@ -369,9 +371,14 @@ class PluginLoader:
                    try:
                        initialized = await plugin.initialize()
                        if not initialized:
-                            self.logger.warning(
+                            if plugin.skip_reason:
-                                f"Plugin {plugin.name} failed initialization, skipping"
+                                self.logger.info(
-                            )
+                                    f"Plugin {plugin.name} skipped: {plugin.skip_reason}"
                                )
                            else:
                                self.logger.warning(
                                    f"Plugin {plugin.name} failed initialization, skipping"
                                )
                            continue
                    except Exception as e:
                        self.logger.error(
@@ -21,8 +21,10 @@ nagios_runner:
 ```
 """
 import asyncio
 import os
 import re
-import subprocess
+import shlex
 from typing import Any, Dict, List, Optional, Tuple
 from hbd.client.plugin import MonitorPlugin
@@ -52,8 +54,7 @@ class NagiosRunnerPlugin(MonitorPlugin):
        interval: Collection interval in seconds (default: 300)
        commands: List of command definitions with 'name' and 'command' keys
        timeout: Command execution timeout in seconds (default: 30)
-        shell: Whether to execute commands via shell (default: True)
+
    Example:
        nagios_runner:
          interval: 300  # Check every 5 minutes
@@ -76,32 +77,48 @@ class NagiosRunnerPlugin(MonitorPlugin):
        # Extract configuration
        self.commands: List[Dict[str, str]] = config.get("commands", []) if config else []
        self.timeout: int = config.get("timeout", 30) if config else 30
        self.shell: bool = config.get("shell", True) if config else True
        self.interval = config.get("interval", 300) if config else 300
        # Validate commands
        if not self.commands:
            self.logger.info(
                "No Nagios commands configured. Add 'nagios_runner.commands' to config."
            )
    async def initialize(self) -> bool:
        """Initialize the Nagios runner plugin.
-        
+
        Returns:
            True if at least one command is configured, False otherwise
        """
        self.logger.info(f"Initializing {self.name} plugin")
-        
+
        if not self.commands:
-            self.logger.info("No Nagios commands configured")
+            self.skip_reason = "no commands configured (add nagios_runner.commands to config)"
            return False
-        
+
        self.logger.info(f"Configured to run {len(self.commands)} Nagios plugin(s)")
        for cmd_config in self.commands:
            name = cmd_config.get("name", "unnamed")
            self.logger.info(f"  - {name}: {cmd_config.get('command', 'N/A')}")
-        
+
        # Validate absolute command paths early
        for cmd_config in self.commands:
            name = cmd_config.get("name", "unnamed")
            command = cmd_config.get("command", "")
            if not command:
                continue
            try:
                tokens = shlex.split(command)
            except ValueError:
                continue  # malformed command string; skip validation
            if not tokens:
                continue
            exe = tokens[0]
            if os.path.isabs(exe):
                if not os.path.isfile(exe):
                    self.logger.warning(
                        f"Command '{name}': executable not found: {exe}"
                    )
                elif not os.access(exe, os.X_OK):
                    self.logger.warning(
                        f"Command '{name}': executable not executable: {exe}"
                    )
        return True
    async def _collect_metrics(self) -> Dict[str, Any]:
@@ -141,7 +158,7 @@ class NagiosRunnerPlugin(MonitorPlugin):
                    for metric_name, metric_value in perfdata.items():
                        results[f"{name}_{metric_name}"] = metric_value
-                self.logger.debug(
+                self.logger.info(
                    f"Executed {name}: {STATUS_NAMES.get(status_code, 'UNKNOWN')} - {output[:50]}"
                )
@@ -163,46 +180,49 @@ class NagiosRunnerPlugin(MonitorPlugin):
        self,
        command: str
    ) -> Tuple[int, str, Dict[str, Any]]:
-        """Execute a Nagios plugin and parse its output.
+        """Execute a Nagios plugin and parse its output."""
        Args:
            command: Command string to execute
        Returns:
            Tuple of (status_code, output_message, performance_data_dict)
        """
        try:
-            # Run command
+            proc = await asyncio.create_subprocess_shell(
            result = subprocess.run(
                command,
-                shell=self.shell,
+                stdout=asyncio.subprocess.PIPE,
-                capture_output=True,
+                stderr=asyncio.subprocess.PIPE,
                timeout=self.timeout,
                text=True
            )
-            
+            try:
-            status_code = result.returncode
+                stdout_bytes, stderr_bytes = await asyncio.wait_for(
-            output = result.stdout.strip()
+                    proc.communicate(), timeout=self.timeout
-            
+                )
-            # Nagios plugins can return codes > 3, treat as UNKNOWN
+            except asyncio.TimeoutError:
                proc.kill()
                await proc.communicate()
                self.logger.error(f"Command timed out: {command}")
                return NAGIOS_UNKNOWN, f"Command timed out after {self.timeout}s", {}
            status_code = proc.returncode
            if status_code < 0:
                return NAGIOS_UNKNOWN, f"Process killed by signal {-status_code}", {}
            if status_code > 3:
                status_code = NAGIOS_UNKNOWN
-            
+
-            # Parse performance data
+            stdout = stdout_bytes.decode(errors="replace").strip()
-            perfdata = self._parse_perfdata(output)
+            stderr = stderr_bytes.decode(errors="replace").strip()
-            
+
-            # Extract just the status message (before the pipe if present)
+            # Parse perfdata from stdout before mixing in stderr
-            if '|' in output:
+            perfdata = self._parse_perfdata(stdout)
-                output_msg = output.split('|')[0].strip()
+
            # Build status message
            status_part = stdout.split('|')[0].strip() if '|' in stdout else stdout
            if not stdout and stderr:
                output_msg = stderr
            elif stdout and stderr:
                output_msg = f"{status_part} [stderr: {stderr}]"
            else:
-                output_msg = output
+                output_msg = status_part
-            
+
            return status_code, output_msg, perfdata
-            
+
        except subprocess.TimeoutExpired:
            self.logger.error(f"Command timed out: {command}")
            return NAGIOS_UNKNOWN, f"Command timed out after {self.timeout}s", {}
        except Exception as e:
            self.logger.error(f"Error executing command: {e}")
            return NAGIOS_UNKNOWN, f"Execution error: {str(e)}", {}
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 [project]
 name = "hbd"
-version = "5.1.2"
+version = "5.1.3"
 description = "Heartbeat monitoring system — client (hbc) and server (hbd)"
 readme = "README.md"
 requires-python = ">=3.11"
@@ -0,0 +1,99 @@
 import asyncio
 import logging
 import os
 import stat
 from hbd.client.plugins.nagios_runner import (
    NagiosRunnerPlugin,
    NAGIOS_OK,
    NAGIOS_WARNING,
    NAGIOS_CRITICAL,
    NAGIOS_UNKNOWN,
 )
 def test_no_commands_sets_skip_reason():
    plugin = NagiosRunnerPlugin(config={"commands": []})
    result = asyncio.run(plugin.initialize())
    assert result is False
    assert plugin.skip_reason is not None
    assert "nagios_runner.commands" in plugin.skip_reason
 def test_stderr_used_when_stdout_empty(tmp_path):
    script = tmp_path / "check_err.sh"
    script.write_text("#!/bin/sh\necho 'error from stderr' >&2\nexit 2\n")
    script.chmod(script.stat().st_mode | stat.S_IEXEC)
    config = {"commands": [{"name": "t", "command": str(script)}], "timeout": 5}
    plugin = NagiosRunnerPlugin(config=config)
    asyncio.run(plugin.initialize())
    data = asyncio.run(plugin._collect_metrics())
    assert "error from stderr" in data["t_output"]
    assert data["t_status_code"] == NAGIOS_CRITICAL
 def test_stderr_appended_when_both_present(tmp_path):
    script = tmp_path / "check_both.sh"
    script.write_text("#!/bin/sh\necho 'OK - all good'\necho 'extra detail' >&2\nexit 0\n")
    script.chmod(script.stat().st_mode | stat.S_IEXEC)
    config = {"commands": [{"name": "t", "command": str(script)}], "timeout": 5}
    plugin = NagiosRunnerPlugin(config=config)
    asyncio.run(plugin.initialize())
    data = asyncio.run(plugin._collect_metrics())
    assert "OK - all good" in data["t_output"]
    assert "extra detail" in data["t_output"]
    assert data["t_status_code"] == NAGIOS_OK
 def test_negative_returncode_maps_to_unknown():
    # kill -9 $$ kills the shell itself; asyncio sees returncode -9
    config = {"commands": [{"name": "t", "command": "kill -9 $$"}], "timeout": 5}
    plugin = NagiosRunnerPlugin(config=config)
    asyncio.run(plugin.initialize())
    data = asyncio.run(plugin._collect_metrics())
    assert data["t_status_code"] == NAGIOS_UNKNOWN
    assert "signal" in data["t_output"].lower()
 def test_absolute_path_not_found_warns(caplog):
    fake_cmd = "/nonexistent_hbc_test_path/check_something"
    config = {"commands": [{"name": "t", "command": fake_cmd}]}
    plugin = NagiosRunnerPlugin(config=config)
    with caplog.at_level(logging.WARNING, logger="plugin.nagios_runner"):
        asyncio.run(plugin.initialize())
    assert any("not found" in r.message for r in caplog.records)
 def test_absolute_path_not_executable_warns(caplog, tmp_path):
    non_exec = tmp_path / "check_test"
    non_exec.write_text("#!/bin/sh\necho OK\n")
    non_exec.chmod(0o644)  # readable but not executable
    config = {"commands": [{"name": "t", "command": str(non_exec)}]}
    plugin = NagiosRunnerPlugin(config=config)
    with caplog.at_level(logging.WARNING, logger="plugin.nagios_runner"):
        asyncio.run(plugin.initialize())
    assert any("not executable" in r.message for r in caplog.records)
 def test_relative_path_not_checked(caplog):
    # Relative paths (resolved via PATH) must not generate warnings
    config = {"commands": [{"name": "t", "command": "echo OK"}]}
    plugin = NagiosRunnerPlugin(config=config)
    with caplog.at_level(logging.WARNING, logger="plugin.nagios_runner"):
        asyncio.run(plugin.initialize())
    assert not any(
        "not found" in r.message or "not executable" in r.message
        for r in caplog.records
    )
@@ -0,0 +1,83 @@
 import asyncio
 import logging
 import textwrap
 from hbd.client.plugin import PluginLoader, PluginRegistry
 def test_plugin_skip_reason_defaults_none(tmp_path):
    plugin_code = textwrap.dedent("""
        from hbd.client.plugin import MonitorPlugin
        class MinimalPlugin(MonitorPlugin):
            name = "minimal"
            version = "1.0.0"
            interval = 60
            async def initialize(self):
                return True
            async def _collect_metrics(self):
                return {}
    """)
    (tmp_path / "minimal.py").write_text(plugin_code)
    registry = PluginRegistry()
    loader = PluginLoader(registry)
    asyncio.run(loader.load_from_directory(tmp_path))
    plugin = registry.get("minimal")
    assert plugin is not None
    assert plugin.skip_reason is None
 def test_loader_logs_info_when_skip_reason_set(tmp_path, caplog):
    plugin_code = textwrap.dedent("""
        from hbd.client.plugin import MonitorPlugin
        class SkippablePlugin(MonitorPlugin):
            name = "skippable"
            version = "1.0.0"
            interval = 60
            async def initialize(self):
                self.skip_reason = "not configured in yaml"
                return False
            async def _collect_metrics(self):
                return {}
    """)
    (tmp_path / "skippable.py").write_text(plugin_code)
    registry = PluginRegistry()
    loader = PluginLoader(registry)
    with caplog.at_level(logging.INFO, logger="plugin.loader"):
        count = asyncio.run(loader.load_from_directory(tmp_path))
    assert count == 0
    assert any("skipped: not configured in yaml" in r.message for r in caplog.records)
    assert not any("failed initialization" in r.message for r in caplog.records)
 def test_loader_logs_warning_when_no_skip_reason(tmp_path, caplog):
    plugin_code = textwrap.dedent("""
        from hbd.client.plugin import MonitorPlugin
        class FailPlugin(MonitorPlugin):
            name = "fail"
            version = "1.0.0"
            interval = 60
            async def initialize(self):
                return False
            async def _collect_metrics(self):
                return {}
    """)
    (tmp_path / "fail_plugin.py").write_text(plugin_code)
    registry = PluginRegistry()
    loader = PluginLoader(registry)
    with caplog.at_level(logging.WARNING, logger="plugin.loader"):
        count = asyncio.run(loader.load_from_directory(tmp_path))
    assert count == 0
    assert any("failed initialization" in r.message for r in caplog.records)
Author	SHA1	Message	Date
andreas	7d8ca5d8db	version 5.1.3 Release / release (push) Successful in 4s Details	2026-04-25 16:52:56 +02:00
andreas	56037a036d	fix: remove unused pytest import in test_nagios_runner	2026-04-25 16:39:56 +02:00
andreas	65ceb31d8d	fix: use os.path.exists check for /dev/log instead of dead-code OSError catch	2026-04-25 16:36:00 +02:00
andreas	1c9b6c1ca9	fix: reconfigure logging to syslog after daemonize() instead of no-op basicConfig After daemonize() redirects stderr to /dev/null, the existing StreamHandler writes to /dev/null. logging.basicConfig() is a no-op when handlers are already configured, so log messages are silently lost. Replace the daemon block to: 1. Call daemonize() first 2. Explicitly remove existing handlers (pointing to /dev/null) 3. Add SysLogHandler pointing to /dev/log with fallback to UDP localhost:514 4. Log startup message to the new syslog handler Removes redundant syslog.openlog() call which is no longer needed. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-25 16:29:54 +02:00
andreas	d7e6b478e1	fix: use shlex.split() in nagios_runner path validation to handle quoted paths	2026-04-25 16:28:32 +02:00
andreas	535dbda47d	feat: validate absolute command paths at nagios_runner init	2026-04-25 16:24:33 +02:00
andreas	c9567dddae	fix: remove stale shell config key from NagiosRunnerPlugin docstring	2026-04-25 16:23:03 +02:00
andreas	b5963badd6	feat: async subprocess in nagios_runner with stderr capture and signal handling Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-25 16:18:09 +02:00
andreas	a76a39b4a0	fix: remove redundant no-commands log lines; fix skip_reason docstring style Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-25 16:15:44 +02:00
andreas	94e1597978	feat: set skip_reason on nagios_runner when no commands configured When NagiosRunnerPlugin has no commands configured, set skip_reason before returning False from initialize(). This allows PluginLoader to log INFO (not WARNING) when the plugin is skipped. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-25 16:13:03 +02:00
andreas	c9c2ed772f	fix: document skip_reason in Plugin docstring; remove unused import in test	2026-04-25 16:10:35 +02:00
andreas	aeb78dcb8e	feat: add skip_reason to Plugin; improve PluginLoader init messaging	2026-04-25 16:08:07 +02:00
andreas	77b337e4dd	Add implementation plan for plugin error checking and daemon logging fixes Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-25 16:04:13 +02:00
andreas	293461f3f6	Add design spec for plugin error checking and daemon logging fixes Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-25 15:49:09 +02:00