Dev, May 7, 2026 (~1 min source read)

Building a Full-Stack Agentic AI Data Platform on ClickHouse: A Complete Architecture Guide

A production-grade, end-to-end agentic AI platform: chat UI, self-hosted LLM, MCP server, LLM observability, medallion data architecture, security guardrails, HA, and cost analysis. An actual production AI platform requires far more than a model and an interface: it needs data plumbing, semantic grounding, query safety, observability, role-based access, high availability, and a real cost model.

The useful part

This is the same stack ClickHouse uses internally (DWAINE: 250+ employees, ~70% of internal analytics use cases covered, 50-70% workload reduction on the data-warehouse team).

How it works

  • Data layer: PostgreSQL → CDC (Debezium + Kafka) → ClickHouse with medallion architecture (Raw → Staging → Marts)
  • LLM layer: Self-hosted Qwen 2.5 72B (Apache 2.0, all data stays on-prem) + business glossary for domain grounding
  • Tool layer: ClickHouse's official MCP server with bearer-token auth, SSRF protection, schema discovery + safe SQL execution
  • UX layer: LibreChat (open-source, SSO, role-based), the same chat UI ClickHouse acquired and uses for DWAINE
  • Observability layer: Langfuse (open-source, runs on ClickHouse), with every query, response, latency, and cost tracked
  • Operations layer: HA cluster, query timeouts, memory caps, RBAC, full audit trail
  • JOIN strategy: 3-tier approach...
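
The query timeouts, memory caps, and safe SQL execution listed above can be illustrated with a small pre-flight check. This is only a minimal sketch, not the author's implementation: the function name `check_agent_sql`, the limit values, and the table name in the usage line are all hypothetical. The setting names (`readonly`, `max_execution_time`, `max_memory_usage`, `max_result_rows`) are real ClickHouse settings, but how they are applied (for example on a settings profile for the agent's database user) is a deployment choice the source does not describe.

```python
import re

# Hypothetical limits; the source does not give concrete values.
# These are real ClickHouse setting names, typically placed on a
# settings profile for the database user the agent connects as.
GUARDRAIL_SETTINGS = {
    "readonly": 1,                     # server rejects writes and DDL
    "max_execution_time": 30,          # seconds before a query is cancelled
    "max_memory_usage": 4 * 1024**3,   # bytes per query (4 GiB)
    "max_result_rows": 10_000,         # cap on rows returned to the agent
}

# Statement types treated as read-only by this sketch.
_ALLOWED_START = re.compile(
    r"^\s*(select|with|show|describe|explain)\b", re.IGNORECASE
)


def check_agent_sql(sql: str) -> str:
    """Return the statement without its trailing semicolon, or raise ValueError.

    Deliberately conservative: any remaining semicolon is rejected, even
    inside a string literal, so stacked statements can never pass.
    """
    statement = sql.strip().rstrip(";").strip()
    if not statement:
        raise ValueError("empty statement")
    if ";" in statement:
        raise ValueError("multiple statements are not allowed")
    if not _ALLOWED_START.match(statement):
        raise ValueError("only read-only statements are allowed")
    return statement


if __name__ == "__main__":
    # 'marts.trades' is a made-up table name for illustration.
    print(check_agent_sql("SELECT count() FROM marts.trades;"))
```

A client-side check like this is a first filter only. The server-side `readonly` setting and RBAC grants remain the actual enforcement point, since a regular expression cannot fully parse SQL.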

What to take from it

This is the architecture I shipped for a crypto exchange, covering all of those layers.

Details worth keeping

Every component is open-source and battle-tested.
