# Plugin Error Checking & Daemon Logging — Design Spec **Date:** 2026-04-25 **Scope:** hbc client — daemon mode logging, nagios_runner plugin robustness, PluginLoader messaging **Files affected:** `hbd/client/main.py`, `hbd/client/plugins/nagios_runner.py`, `hbd/client/plugin.py` --- ## 1. Daemon Mode Logging ### Problem In `main()`, `logging.basicConfig()` is called before `daemonize()` (establishing a StreamHandler to stderr), then called again after `daemonize()`. The second call is a no-op — Python ignores `basicConfig()` when handlers are already configured. After daemonization, stderr is redirected to `/dev/null`, so all subsequent log output is silently discarded. The existing `syslog.openlog()` / `syslog.syslog()` calls (lines 666–668) write a single startup message but do not integrate with the `logging` system, so plugin and connection log messages never reach syslog. ### Fix After `daemonize()`, explicitly reconfigure the root logger: 1. Remove all existing handlers (they now write to `/dev/null`). 2. Add `logging.handlers.SysLogHandler(address='/dev/log', facility=LOG_DAEMON)`. 3. Set formatter: `hbc[%(process)d]: %(name)s %(levelname)s: %(message)s` 4. Preserve the `log_level` already determined from `-v`/`-x` CLI flags. Remove the redundant `syslog.openlog()` / `syslog.syslog()` calls — the logging system handles routing. **Fallback:** If `/dev/log` does not exist (containers, some BSDs), fall back to `SysLogHandler(address=('localhost', 514))`. Log one warning (to stderr, before handlers are replaced) so the operator knows. --- ## 2. Nagios Runner Improvements ### 2a — Async Subprocess `_run_nagios_plugin()` is declared `async def` but calls `subprocess.run()` synchronously, blocking the event loop for the full command duration. **Fix:** Replace with `asyncio.create_subprocess_shell()` + `await proc.communicate()`. Enforce timeout with `asyncio.wait_for(..., timeout=self.timeout)` and catch `asyncio.TimeoutError`. ### 2b — Stderr Capture Subprocess stderr is currently discarded (`capture_output=True` only captures stdout in the sync call; stderr content is lost). **Fix:** Pass `stderr=asyncio.subprocess.PIPE` to `create_subprocess_shell`. After `communicate()`, if stdout is empty but stderr has content, use stderr as the output message. If both have content, append stderr to the output for visibility. ### 2c — Negative Return Codes A negative `returncode` means the process was killed by a signal (SIGKILL, OOM, etc.). The current code treats these as-is, which may produce unexpected status values. **Fix:** If `returncode < 0`, map to `NAGIOS_UNKNOWN` with message `"Process killed by signal {-returncode}"`. ### 2d — Command Path Validation at Init `initialize()` currently only checks that the commands list is non-empty. **Fix:** For each command entry during `initialize()`: - Warn and skip the entry if `name` or `command` is missing. - Extract the executable (first whitespace-delimited token of the command string). - If the executable is an absolute path, check `os.path.isfile()` and `os.access(..., os.X_OK)`. Log a `WARNING` if either check fails. - Commands with relative paths or shell builtins are not checked (they may be on PATH) — just noted. - Validation warns only; all original entries in `self.commands` are retained and still attempted at collection time (where the existing missing-name/command guard already skips them). The plugin initializes successfully as long as the commands list is non-empty. --- ## 3. PluginLoader Messaging ### Problem When `initialize()` returns `False`, the loader always logs: > `WARNING: Plugin X failed initialization, skipping` This is alarming when the real reason is simply "no commands configured". There is no API to distinguish "not configured" from "genuinely broken". ### Fix Add an optional `skip_reason` attribute to `Plugin.__init__()` (defaults to `None`). In `PluginLoader.load_from_directory()`, after `initialize()` returns `False`: - If `plugin.skip_reason` is set → `logger.info(f"Plugin {plugin.name} skipped: {plugin.skip_reason}")` - If `plugin.skip_reason` is `None` → `logger.warning(f"Plugin {plugin.name} failed initialization, skipping")` (existing behaviour) In `NagiosRunnerPlugin.initialize()`, when no commands are configured: ```python self.skip_reason = "no commands configured (add nagios_runner.commands to config)" return False ``` Genuine failures (exceptions) continue to go through the existing `except` block in the loader, logging at `ERROR` with traceback — unchanged. --- ## Decisions | Topic | Decision | |---|---| | Daemon log destination | syslog only (LOG_DAEMON facility) | | Syslog fallback | localhost:514 UDP if `/dev/log` absent | | Nagios result log level | INFO for all statuses (OK/WARNING/CRITICAL/UNKNOWN) | | Invalid command handling at init | Warn and continue; still attempt at collection time | | PluginLoader API change | `skip_reason` attribute on Plugin base class, checked by loader |