Rick Hallett

“How do you trust, evaluate, and operate AI systems at every layer?”

Seven projects. Built solo across ~2 months. Some are running. Some are resting. Some are ideas with teeth that haven't been built yet. This is the stack.

[ Input / Output Trust ]

WASP

Prompt injection prevention. Archived — input defense layer.

Status Archived · Focus Prevention, not detection

Python

Stain

Multi-agent AI slop detection via linguistic fingerprinting.

░░░░░░░░░░░░░░░░░░░░░░░░░█

today · 65 commits

Model Cerebras Qwen3 235B · Detectors 6 · Benchmark samples 138 · Accuracy (θ=0.50) 74%

Python · YAML
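
The accuracy figure above is quoted at a decision threshold of θ=0.50. As a minimal sketch of what that number means, assume each benchmark sample gets a slop score in [0, 1] and carries a binary label. Both are assumptions, since the scoring internals aren't described here, and the function name and toy data below are hypothetical.

```python
def accuracy_at_threshold(scores, labels, theta=0.50):
    """Fraction of samples whose thresholded score matches its label.

    scores: floats in [0, 1] (hypothetical slop scores)
    labels: bools, True meaning the sample is slop
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must be the same length")
    if not scores:
        raise ValueError("need at least one sample")
    # A sample is predicted "slop" when its score reaches the threshold.
    correct = sum((s >= theta) == y for s, y in zip(scores, labels))
    return correct / len(scores)


# Toy data for illustration only, not Stain's benchmark.
scores = [0.91, 0.12, 0.55, 0.48, 0.73, 0.30, 0.62, 0.05]
labels = [True, False, False, True, True, False, True, False]
print(accuracy_at_threshold(scores, labels))  # 0.75
```

At 138 samples, 74% works out to about 102 correct classifications.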

[ Knowledge Trust ]

Arcana

Document RAG + LLM-as-judge claim verification.

░░░░░░░░░░░░░░░░░░░░░░░░░█

today · 42 commits

Workers 5 (extract, embed, analyst, checker, gateway) · Vector store ChromaDB · Orchestration LangGraph + NATS

Bash · Python · YAML

[ Agent Trust ]

The Pit

Agent research platform + structured roast battles with traces.

░░░░░░░░░░░░░░░░▒█▒▓▒▓▒▒▒▒

today · 1236 commits

Tests ~1,289 · Coverage ~96%

Bash · Go · JavaScript · Python · Rust · SQL · TypeScript · YAML

Sortie

Convergence-based code review gating with immutable ledger.

░░░░░░░░░░░░░░░░░░░░░░░░█▒

today · 24 commits

Runs 6 · Pass rate 100% · Branches reviewed 4

Python

[ Orchestration ]

Halo

Event-sourced advisory fleet — personal AI operating system.

░░░░░░░░░░░░░░░▒▒▒▒▒▒▓█▒▒▒

today · 647 commits

CLI modules 28 · Event bus NATS JetStream · Runtime K8s (k3s)

Bash · Go · JavaScript · Python · SQL · TypeScript · YAML

[ Intelligence ]

Jeany

Cost-attributed AI content economy intelligence.

░░░░░░░░░░░░░░░░░░░░░░░░░█

4d ago · 128 commits

Services 6 · Observability Prometheus + OTel · Unique Cost-attributed pipeline

Bash · Go · Python · TypeScript · YAML

[ The Gaps ]

Unified Observability

Each project tracks its own costs and traces. Nothing connects them yet. This dashboard is the first step — it observes the stack but doesn't unify the telemetry. Per-project API keys give partial visibility. The elephant in the room is Claude Code token spend via subscription, which is trackable but not yet tracked.

Model Drift Detection

When a model version changes, does output quality shift? Isolating real drift from normal churn is genuinely hard. The Pit's traces and Stain's benchmarks produce the raw data this would need. The detection layer doesn't exist yet.
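
Since the detection layer doesn't exist yet, the following is only a sketch of one way to separate real drift from normal churn: a permutation test on quality scores collected before and after a model version change. The score lists, the function name, and the choice of a mean-difference statistic are all assumptions made for illustration, not anything The Pit or Stain does today.

```python
import random
from statistics import mean


def permutation_p_value(old, new, n_perm=5000, seed=0):
    """Two-sided permutation test on the difference of sample means.

    Returns the estimated probability of seeing a gap at least as large
    as the observed one if both samples came from the same distribution.
    """
    rng = random.Random(seed)
    observed = abs(mean(new) - mean(old))
    pooled = list(old) + list(new)
    n_old = len(old)
    hits = 0
    for _ in range(n_perm):
        # Reassign scores to "old" and "new" at random.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[n_old:]) - mean(pooled[:n_old]))
        if diff >= observed:
            hits += 1
    # Add-one smoothing keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical per-run quality scores around a version change.
old_scores = [0.80, 0.82, 0.79, 0.81, 0.83, 0.80, 0.78, 0.82]
new_scores = [0.70, 0.72, 0.69, 0.71, 0.74, 0.70, 0.68, 0.73]
p = permutation_p_value(old_scores, new_scores)
print("likely drift" if p < 0.05 else "within normal churn")
```

A small p-value suggests the shift is larger than run-to-run noise would explain. The 0.05 cutoff is a convention, not a recommendation, and a real detector would also need to control for prompt mix and sample size.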

Human Accountability

This dashboard is a partial answer. The act of maintaining it creates a feedback loop — make the numbers honest, keep the lights accurate, notice when something goes dark. Before this, the only external record was a blog. One-way. Monoculture. This is two-way by construction, but it's still v0.

[ Build Cost ]

Total tokens 1.6B

Total cost (est.) $1,012.52

2026-03 ████████████████████ $540.61

2026-04 █████████████████░░░ $471.91

By model (these figures sum to $471.91, the 2026-04 total):

Opus 4.6 █████████████████░ $443.75 (94%)

Sonnet 4.6 █░░░░░░░░░░░░░░░░░ $22.26 (5%)

Haiku 4.5 ░░░░░░░░░░░░░░░░░░ $5.90 (1%)

Cache hit rate: 97.3%

Per-project token attribution not yet available. OTel adapters planned.
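
The figures in this section can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers shown above; the variable names are illustrative. It confirms that the two months add up to the stated total, and that the per-model costs add up to the 2026-04 figure rather than to the overall total.

```python
from decimal import Decimal

# Monthly and per-model costs as listed in this section.
monthly = {"2026-03": Decimal("540.61"), "2026-04": Decimal("471.91")}
by_model = {
    "Opus 4.6": Decimal("443.75"),
    "Sonnet 4.6": Decimal("22.26"),
    "Haiku 4.5": Decimal("5.90"),
}

total = sum(monthly.values(), Decimal("0"))
model_total = sum(by_model.values(), Decimal("0"))
# Each model's share of the per-model total, rounded to a whole percent.
shares = {name: round(cost / model_total * 100) for name, cost in by_model.items()}

print(total)        # 1012.52
print(model_total)  # 471.91
print(shares)       # {'Opus 4.6': 94, 'Sonnet 4.6': 5, 'Haiku 4.5': 1}
```

Decimal is used instead of float so the cent-level sums compare exactly.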

[ Aggregate ]

Commits (6mo) 2,142

Tests 1,289

Languages Python, YAML, Bash, Go, JavaScript, Rust, SQL, TypeScript

Hot: 6 · Warm: 0 · Cool: 0 · Dormant: 0 · Conceptual: 1