The build
What you'd have to build
A high-level view of the work it takes to get to a customer brain that is accurate and reliable enough to depend on. The point isn't to scare you off building, but to share what we've uncovered so you can decide what's worth doing yourself.
| The problem | How we handle it | Why it matters |
|---|---|---|
| Phase 1: Getting the data in clean | | |
| Making every source legible: Calls, emails, tickets, Slack, and docs all arrive bloated with HTML, quoted replies, signatures, and large attachments. Fed raw, they inflate token cost and blow context windows. | We clean and compress each source on ingest: strip noise, extract what matters from attachments, and shrink transcripts without losing meaning. | It's the difference between a query that costs cents and one that costs dollars, on every call. |
| Taming email: HTML bleeding into text, bounce tags wrapping real addresses, the same attachment reprocessed on every reply, dynamic content defeating dedup, automated mail flooding signal. | Parsing, dedup, sender normalization, attachment de-duplication, and noise routing built up over time. | Email is often the richest signal and the messiest source. Wrong here is wrong everywhere downstream. |
| Knowing who is who: The same person or account shows up under different names, emails, and IDs across CRM, calls, and email. A meeting host matches no one; the same call from two recorders becomes two meetings. | An identity layer that resolves people and accounts across sources, dedups near-duplicates, and flags when names don't line up. | Get it wrong and the model attaches data to the wrong account without anyone noticing. |
| Phase 2: Making it usable by a model | | |
| Making answers reproducible: Point a model at raw data and it samples differently each run, giving a good answer one day and a wrong one the next from the same prompt. | We pre-join and pre-process data into stable, reusable results, so the same question returns the same thing. | People stop trusting a tool the moment they can't reproduce their own results. |
| Giving raw data structure: Raw data is an undifferentiated pile. To answer "what's urgent this week" or "show me product feedback," something has to label each item as it lands. | We categorize and score every source for type and urgency on ingest, so the model can filter instead of re-reading everything. | Without it, every question re-scans everything: slow, expensive, inconsistent. |
| Letting the model reach for data: You can't just dump everything. The model needs scoped ways to ask for exactly what it needs (by account, person, topic, time), or it will pull a random sample and just make up an answer. | Scoped queries by account, person, topic, or time, tuned so the model asks correctly instead of pulling everything. | Done poorly, the model either can't find what it needs or drowns trying. |
| Keeping answers honest: The model can log your own team's chatter as customer "wins," wrap answers in formatting junk, call a working feature broken over one word, or drop valid data on a parsing hiccup. | Ongoing tuning on extraction, categorization, and output parsing, plus guards against hallucination on empty input. | An answer that looks right but is quietly wrong is worse than no answer. |
| Phase 3: Running it for real | | |
| Surviving production: A model call times out and the job queue jams. A large backfill starves everything else. Oversized inputs just fail. Scheduled jobs break on timezones and DST, and orphan themselves when a record is deleted. | A state-machine pipeline with throttled loads, retry-and-shrink on oversized inputs, failure recovery, and a scheduler that handles timezones, DST, overlaps, and cleanup. | A pipeline that jams on the first timeout, or a recurring job that silently stops firing, isn't something a team will rely on. |
| Answering to security: A model plugged straight into your tools sees everything the connected user can see, including what was never locked down. The same inbox is signal for one role and noise for another. | One place to control what the model can read and do, with role-aware filtering. | It's usually the first question security asks, and raw connectors don't have a clean answer. |
| Explaining what happened: When the model acts, or an answer looks off, you need to know what it read, what it changed, and how to roll it back. | An audit trail across the system that persists after records are deleted. | Without it you can't pass a security review or explain what the model actually did. |
| Keeping the connections alive: Vendors change APIs without notice. OAuth tokens expire. A recorder sends malformed data. A customer's meetings just stop showing up. | Maintained connectors for tools like Zoom, Slack, HubSpot, and Salesforce, plus the upkeep to keep them working. | This work is never finished: someone owns every vendor's changes, permanently. |
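
To make one row concrete, here is a minimal sketch of the "retry-and-shrink on oversized inputs" idea from the Surviving production row. It is an illustration under stated assumptions, not our pipeline: `InputTooLarge`, `call_with_shrink`, and `fake_model` are hypothetical names, the character limit stands in for a real token limit, and keeping the tail of the text assumes newer content matters more.

```python
from typing import Callable


class InputTooLarge(Exception):
    """Hypothetical error a model client raises when the input exceeds its limit."""


def call_with_shrink(
    text: str,
    call_model: Callable[[str], str],
    max_attempts: int = 4,
    shrink_factor: float = 0.5,
) -> str:
    """Call the model, trimming the input after each oversize failure."""
    current = text
    for _ in range(max_attempts):
        try:
            return call_model(current)
        except InputTooLarge:
            # Assumption: the tail of the text is the most recent and most
            # useful part, so each retry keeps only the last portion of it.
            keep = max(1, int(len(current) * shrink_factor))
            current = current[-keep:]
    raise InputTooLarge(f"still too large after {max_attempts} attempts")


def fake_model(prompt: str, limit: int = 100) -> str:
    """Stand-in for a real model client; rejects prompts over `limit` characters."""
    if len(prompt) > limit:
        raise InputTooLarge(f"{len(prompt)} characters exceeds {limit}")
    return f"summarized {len(prompt)} chars"


if __name__ == "__main__":
    # 350 characters fails, 175 fails, 87 fits on the third attempt.
    print(call_with_shrink("x" * 350, fake_model))
```

Even this toy version leaves the hard parts open: what to cut so meaning survives, how to count tokens rather than characters, and what to do with the job when every attempt fails.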
What it adds up to
Any single row is a sprint. Together they are not: the pieces overlap, the failures are silent, and the integration work never ends. It becomes a standing job someone on your team owns permanently.
We're live in days, and the maintenance stays on our side.