@jacobcole / picortex — planning / docs/specs/008-observability.md
---
visibility: public
---

# Spec 008 — Observability

**Status:** Draft
**Related:** [PRD FR-22..FR-25](../prd/001-picortex-v1.md#observability), Jacob's global [dev-patterns](https://github.com/tmad4000/jacob-computer-config-private)

## Goal

Everything observable without standing up a paid SaaS. Logs useful for the user ("what happened in chat X last Tuesday?") and for the developer ("why did the discriminator skip this message?").

## Structured logs

- **Logger:** `pino` with `pino-pretty` in dev, JSON in prod.
- **Level:** default `info`; `debug` via `LOG_LEVEL=debug`.
- **Location:** stdout (journald / docker-less systemd captures); no separate log files.
- **Fields every log carries:**
  - `time` (ISO)
  - `level` (`info`/`warn`/`error`)
  - `request_id` (see below)
  - `chat_id` (if in chat context)
  - `event_type` (e.g. `linq.inbound`, `tmux.turn.start`, `discriminator.decision`)
  - `msg` (free text)

## Request IDs

- Fastify middleware generates `X-Request-ID` (uuid v7) for every inbound HTTP request.
- Response headers echo it.
- Logs in that request's async context include it.
- Child-process spawns inherit it via env (`PICORTEX_REQUEST_ID`).
- Linq inbound events tag the request ID into the `events` SQLite row.

## `/api/frontend-log`

Per Jacob's global rules. Client-side:

```ts
window.addEventListener('error', ev =>
  fetch('/api/frontend-log', {
    method: 'POST',
    body: JSON.stringify({
      level: 'error',
      message: ev.message,
      error: ev.error?.toString(),
      stack: ev.error?.stack,
      context: { url: location.href, ua: navigator.userAgent, build: __VERSION__ }
    })
  }))
```

Server-side endpoint:

- Accepts up to `FRONTEND_LOG_MAX_BYTES` (default 64 KB)
- Rate-limited to 30/min per IP
- Logs under `event_type: "frontend"` with the browser-supplied fields plus the request ID tying it to the current user session

## Metrics

No Prometheus in v1. Instead, lightweight counters in SQLite `metrics` table that `/health` exposes:

```
chats_total
chats_active_7d
turns_total
turns_last_24h
discriminator_skipped_24h
errors_last_24h
```

`/health` returns:

```json
{
  "status": "ok",
  "version": "0.0.1",
  "commit": "abcd123",
  "uptime_seconds": 3412,
  "db_ok": true,
  "tmux_ok": true,
  "metrics": { ... }
}
```

## Network egress allowlist

Claude Code chat users should only reach:

- `api.anthropic.com`
- `registry.npmjs.org` (for tooling, if used by Claude)
- `pypi.org` (if Python is used)
- `github.com`, `raw.githubusercontent.com`
- Anything the user explicitly allowlists in `/etc/picortex/egress-allowlist.txt`

Enforced via iptables `owner` match on the chat-user's UID. Rejected connections log an event — Jacob gets an alert if a new host is attempted (learning mode).

## Sentry (optional, post-v0.1)

If Jacob wants error aggregation: `@sentry/node` + `@sentry/browser`. Keep it off by default.

## Testing

- **Unit:** request-ID middleware; log shape sanity.
- **Integration:** frontend-log roundtrip.
- **Manual:** tail logs during E2E; verify every turn has a request ID.

## Open questions

- OQ1: Where are logs archived long-term? (Not in v1 — stdout + journald is fine.)
- OQ2: Do we want Axiom or Loki integration? (Not for v1. Cortex uses Axiom.)
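
Returning to the Request IDs and Structured logs sections above: the following is a minimal sketch of how a request ID might flow through an async context into log records and child-process environments. It assumes Node's built-in `AsyncLocalStorage`. The helper names (`runWithRequestId`, `buildLogRecord`, `childEnv`) are hypothetical and not part of the spec, and `randomUUID()` produces a v4 UUID, used here only as a stand-in for the uuid v7 generator the spec calls for. A real implementation would presumably hook this into the Fastify middleware and the `pino` logger rather than building records by hand.

```ts
import { AsyncLocalStorage } from 'node:async_hooks';
import { randomUUID } from 'node:crypto';

// Per-request state carried through the async call chain.
interface RequestContext {
  requestId: string;
  chatId: string | undefined;
}

// Shape of the fields listed under "Structured logs".
interface LogRecord {
  time: string;
  level: 'info' | 'warn' | 'error';
  request_id?: string;
  chat_id?: string;
  event_type: string;
  msg: string;
}

const requestContext = new AsyncLocalStorage<RequestContext>();

// Run `fn` inside a fresh request context.
// ASSUMPTION: randomUUID() is v4; swap in a uuid v7 generator to match the spec.
function runWithRequestId<T>(fn: () => T, chatId?: string): T {
  return requestContext.run({ requestId: randomUUID(), chatId }, fn);
}

// Build one log record, pulling request_id and chat_id from the current context.
function buildLogRecord(
  level: LogRecord['level'],
  eventType: string,
  msg: string
): LogRecord {
  const ctx = requestContext.getStore();
  return {
    time: new Date().toISOString(),
    level,
    ...(ctx ? { request_id: ctx.requestId } : {}),
    ...(ctx?.chatId ? { chat_id: ctx.chatId } : {}),
    event_type: eventType,
    msg,
  };
}

// Environment for a child process so that it inherits the current request ID.
function childEnv(
  base: Record<string, string | undefined> = {}
): Record<string, string | undefined> {
  const ctx = requestContext.getStore();
  return ctx ? { ...base, PICORTEX_REQUEST_ID: ctx.requestId } : { ...base };
}
```

Inside `runWithRequestId(() => ..., 'chat-42')`, every call to `buildLogRecord` carries the same `request_id`, and `childEnv(process.env)` can be passed as the `env` option when spawning a child process. Outside any request context, both helpers simply omit the ID.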