Project
AI Operations Monitor
The dashboard that tells you when your AI is slow, expensive, or quietly failing.
The problem
Most teams ship AI into production blind. They learn about latency spikes, runaway cost, and silent failures from customers rather than from a dashboard, which means they learn late.
What I built
A lightweight telemetry layer that captures chat events, workflow runs, and model calls into Postgres and surfaces them in provisioned Grafana dashboards, tracking latency, cost, schema validity, escalations, and failure trends. The whole stack comes up with one command.
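As an illustration of the capture step, here is a minimal sketch of how a single model call might be timed, costed, and turned into a row for Postgres. It is not the project's actual code: the `ModelCallEvent` fields, the `model_calls` table, the `record_model_call` helper, and the prices in `PRICE_PER_1K` are all hypothetical names and values chosen for the example.

```python
import time
from dataclasses import dataclass, asdict
from typing import Callable, Optional, Tuple

# Hypothetical per-1K-token prices in USD; real prices depend on the provider.
PRICE_PER_1K = {"example-model": {"input": 0.001, "output": 0.002}}

# Hypothetical table and columns, written with named placeholders in the
# pyformat style that Postgres drivers such as psycopg accept.
INSERT_SQL = (
    "INSERT INTO model_calls "
    "(model, latency_ms, input_tokens, output_tokens, cost_usd, schema_valid, error) "
    "VALUES (%(model)s, %(latency_ms)s, %(input_tokens)s, %(output_tokens)s, "
    "%(cost_usd)s, %(schema_valid)s, %(error)s)"
)


@dataclass
class ModelCallEvent:
    model: str
    latency_ms: float
    input_tokens: int
    output_tokens: int
    cost_usd: float
    schema_valid: bool
    error: Optional[str] = None


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Convert token counts to USD using the price table above."""
    price = PRICE_PER_1K[model]
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1000


def record_model_call(
    model: str,
    call: Callable[[], Tuple[str, int, int]],
    validate: Callable[[str], bool],
) -> ModelCallEvent:
    """Run `call`, which returns (output, input_tokens, output_tokens),
    and describe what happened as an event, even if the call raises."""
    start = time.perf_counter()
    input_tokens = output_tokens = 0
    schema_valid, error = False, None
    try:
        output, input_tokens, output_tokens = call()
        schema_valid = bool(validate(output))
    except Exception as exc:  # failures are recorded, not swallowed silently
        error = f"{type(exc).__name__}: {exc}"
    latency_ms = (time.perf_counter() - start) * 1000
    return ModelCallEvent(
        model=model,
        latency_ms=latency_ms,
        input_tokens=input_tokens,
        output_tokens=output_tokens,
        cost_usd=estimate_cost(model, input_tokens, output_tokens),
        schema_valid=schema_valid,
        error=error,
    )


# Usage with a stand-in for a real model client.
event = record_model_call(
    "example-model",
    call=lambda: ('{"ok": true}', 1000, 500),
    validate=lambda text: text.startswith("{"),
)
row = asdict(event)  # keys match the placeholders in INSERT_SQL
```

With a driver such as psycopg, `row` could then be passed as the parameters to `cursor.execute(INSERT_SQL, row)`. Recording failed calls as rows with an `error` value, rather than dropping them, is what would let a dashboard chart failure trends alongside latency and cost.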
The result
Problems surface on a dashboard before they reach the support queue, and cost and reliability stop being guesswork.
Stack
Python, FastAPI, Postgres, Grafana, Docker Compose