What We Learned Building Our Own Support Bot Widget (Chat With Your Content)
We built our own support bot widget: a chat bubble you embed on a website so visitors can ask questions and get answers from your content (docs, knowledge base, service pages, policies). If you’ve ever shipped one of these, you know the pitch is easy and the reality is messy. The problems you run into are rarely “model problems”; they’re engineering problems:
- retrieval breaks in predictable ways
- latency kills trust before the first answer lands
- access control is easy to get wrong
- prompt injection becomes “content poisoning”
- answers must be auditable or you end up with support risk
What we were building (requirements and constraints)
We set a few non-negotiable requirements upfront:
- Fast first token: users should see a response quickly, even if the assistant needs to “search” in the background.
- Grounded answers: answers must be supported by content we actually ship.
- Read-only by default: the assistant must not mutate content.
- Multi-locale ready: our site is localized; the assistant should not mix languages casually.
- Safe escalation: when it can’t answer, it should route the user to contact/support without hallucinating.
The assistant should behave like a careful engineer reading our website, not like a creative writer improvising.
A working mental model: “support bot = search + navigation + synthesis”
A good support assistant does three things in sequence:
- Search: identify candidate pages/sections.
- Navigate: open pages, scan headings, follow links, refine the search.
- Synthesize: produce a short, correct answer with pointers to where it came from.
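The loop above can be sketched in a few lines. This is a hypothetical TypeScript sketch, not production code: the search step is naive lexical scoring, and the navigate step is omitted for brevity.

```typescript
// Hypothetical sketch of the search -> synthesize flow.
// Page shape and function names are illustrative assumptions.
type Page = { path: string; text: string };

function searchPages(query: string, pages: Page[]): Page[] {
  // Search: score pages by how many query terms they contain.
  const terms = query.toLowerCase().split(/\s+/);
  return pages
    .map((p) => ({
      p,
      score: terms.filter((t) => p.text.toLowerCase().includes(t)).length,
    }))
    .filter((x) => x.score > 0)
    .sort((a, b) => b.score - a.score)
    .map((x) => x.p);
}

function synthesize(query: string, evidence: Page[]): string {
  // Synthesize: short answer plus pointers to where it came from.
  if (evidence.length === 0) return "Could not verify an answer.";
  const cites = evidence.map((p) => p.path).join(", ");
  return `Answer based on: ${cites}`;
}
```

In the real system the middle step (navigate: open pages, scan headings, follow links) iterates before synthesis; the sketch only shows the two ends of the loop.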
Lesson 1: Chunk-only RAG fails when the answer spans pages
Top-K chunk retrieval breaks in consistent ways:
- the answer is split across two pages
- the “exact syntax” is in a code block that didn’t rank
- the right answer is on a page with weak lexical overlap
- the user’s question is underspecified, so the embedding match is “close enough”
The result is a familiar failure mode:
- correct tone
- wrong details
- no way to verify
Lesson 2: Give the assistant deterministic primitives (docs-as-files)
Humans don’t answer docs questions by reading five random paragraphs. We:
- find the page
- open it
- scan headings
- search for exact tokens
- follow internal links
- repeat until we can prove the answer
So we modeled our content the same way:
- each page is a “file”
- directories represent sections (or URL path segments)
- the assistant can ls, find, cat, and grep
This makes the system:
- auditable (you can log tool calls)
- deterministic (same query yields the same reads)
- scalable (it can explore without you hardcoding flows)
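A minimal read-only interface for these primitives might look like the following; the signatures and the in-memory implementation are illustrative assumptions, not the shipped API.

```typescript
// Read-only tool surface mirroring the shell metaphor (ls/find/cat/grep).
// Signatures are assumptions for illustration.
interface ContentFs {
  ls(dir: string): string[];            // list paths under a directory
  find(namePattern: string): string[];  // match path segments
  cat(path: string): string;            // full page text
  grep(pattern: string, root: string): { path: string; line: string }[];
}

// Hypothetical in-memory backing: a map of path -> page text.
function makeFs(pages: Record<string, string>): ContentFs {
  const paths = Object.keys(pages);
  return {
    ls: (dir) => paths.filter((p) => p.startsWith(dir.replace(/\/$/, "") + "/")),
    find: (name) => paths.filter((p) => p.includes(name)),
    cat: (path) => pages[path] ?? "",
    grep: (pattern, root) =>
      paths
        .filter((p) => p.startsWith(root === "/" ? "" : root))
        .flatMap((p) =>
          pages[p]
            .split("\n")
            .filter((line) => line.toLowerCase().includes(pattern.toLowerCase()))
            .map((line) => ({ path: p, line }))
        ),
  };
}
```

Because every read goes through this one interface, logging it gives you the audit trail for free.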
Architecture (the simplest version that works)
At a high level, we ended up with these layers:

Browser widget
  -> /api/support-chat (stream)
  -> Retrieval layer (lexical + vector)
  -> Tool layer ("filesystem" over content)
  -> Answer synthesis (grounded prompt + citations)
  -> Observability (logs + traces + eval hooks)
The key point: we treat “reading our content” as a tool, not as a side effect of retrieval.
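The widget talks to /api/support-chat over a stream. One way to frame the stream, assuming server-sent events and hypothetical event names ("status", "token", "cite") that are not part of any spec:

```typescript
// Hypothetical SSE framing for the chat stream; event names are assumptions.
type ChatEvent =
  | { type: "status"; value: "searching" | "reading" | "answering" }
  | { type: "token"; value: string }
  | { type: "cite"; value: string };

// Serialize one event into the SSE wire format.
function toSse(e: ChatEvent): string {
  return `event: ${e.type}\ndata: ${JSON.stringify(e.value)}\n\n`;
}
```

Distinct event types let the widget render status and citations separately from answer tokens, which matters for the UX states described later.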
Data model: pages, chunks, and a path tree
We kept the content representation intentionally boring:
- page: canonical URL/slug + title + locale + visibility + lastmod + headings
- chunk: { page, chunk_index, text, embedding, tokens, hash }
- path_tree: { "services/seo": { isPublic: true, groups: [] }, ... }
This shape matters because:
- the assistant needs a map of “what exists”
- access control becomes structural (prune paths before sessions begin)
- ls/find becomes fast and cacheable
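Written out as TypeScript types (field names follow the records above; the exact shapes are assumptions):

```typescript
// Types mirroring the page/chunk/path_tree records; shapes are assumptions.
type PageRecord = {
  slug: string;        // canonical URL/slug
  title: string;
  locale: string;
  visibility: "public" | "client-only" | "internal";
  lastmod: string;
  headings: string[];
};

type ChunkRecord = {
  page: string;        // slug of the parent page
  chunk_index: number;
  text: string;
  embedding: number[];
  tokens: number;
  hash: string;
};

type PathTree = Record<string, { isPublic: boolean; groups: string[] }>;
```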
Ingestion pipeline: turn a website into a reliable knowledge surface
This was the biggest time sink. “Index your docs” sounds easy until you see what real content looks like:
- marketing copy + components
- MDX and headings that don’t map cleanly to HTML
- navigation pages that repeat content blocks
- locale variants that partially diverge
Our ingestion pipeline:
- Discover pages: sitemap + known route list.
- Fetch rendered HTML: what users and crawlers actually see.
- Extract main content: strip nav, footers, cookie banners, repeated UI.
- Normalize: collapse whitespace, remove tracking query strings from links.
- Segment:
- page-level record for navigation and citations
- chunk-level records for retrieval
- Annotate:
- locale
- visibility (public, client-only, internal)
- headings outline
- outbound internal links
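The Normalize step can be sketched as follows; the tracking-parameter list and the function names are assumptions:

```typescript
// Normalize step sketch: collapse whitespace and strip common tracking
// params from internal links. The param list is an assumption.
function normalizeText(text: string): string {
  return text.replace(/\s+/g, " ").trim();
}

function stripTracking(url: string): string {
  // Base origin is only needed to parse relative links; it is discarded.
  const u = new URL(url, "https://example.com");
  for (const p of ["utm_source", "utm_medium", "utm_campaign", "ref"]) {
    u.searchParams.delete(p);
  }
  return u.pathname + (u.search || "");
}
```

Normalizing before hashing is what makes the chunk hash useful for change detection: cosmetic whitespace edits no longer invalidate chunks.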
Retrieval: combine lexical and vector before you “ask the model”
We don’t trust a single retrieval strategy. We do:
- lexical search for exact tokens (great for identifiers, acronyms, error codes)
- vector search for semantic match (great for vague questions)
Then we filter and rank the merged results:
- reject pages that don’t match the user’s locale (unless explicitly asked)
- prefer canonical pages over tag pages / duplicate listings
- prefer pages with strong heading overlap with query terms
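One common way to merge the lexical and vector result lists is reciprocal rank fusion (RRF). This sketch uses the conventional k = 60 constant; it is a standard technique, not necessarily the exact ranking we shipped:

```typescript
// Reciprocal rank fusion: combine multiple ranked lists of page ids.
// A page scores 1/(k + rank + 1) per list it appears in; k = 60 is the
// commonly used constant from the RRF literature.
function rrf(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, rank) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}
```

RRF is attractive here because it needs no score calibration between the lexical and vector backends, only their rank orders.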
Tool design: what the assistant can do (and what it cannot)
We implemented a small tool surface area and made it strict.

Allowed:
- list directories: ls /services
- search paths: find -name "billing"
- read a full page: cat /services/seo
- search within content: grep -ri "canonical" /
Not allowed:
- writing or editing content
- fetching arbitrary URLs
- arbitrary network calls
The latency lesson: real filesystems are too slow for interactive chat
If you spin up a sandbox/container per session to provide a real filesystem:
- cold start becomes visible
- chat feels broken
- you’re tempted to add complexity like warm pools
So we serve the “filesystem” from the index instead:
- ls and find resolve from the cached path tree
- cat reassembles the page from stored chunks (sorted by chunk_index)
- the results are cached per-session (and partially globally, when safe)
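The cat reassembly is just a filter, sort, and join over stored chunks. A minimal sketch, assuming chunk records shaped like the data model above:

```typescript
// Reassemble a full page from stored chunks, sorted by chunk_index.
// Chunk shape follows the data model; the join separator is an assumption.
type Chunk = { page: string; chunk_index: number; text: string };

function catPage(slug: string, chunks: Chunk[]): string {
  return chunks
    .filter((c) => c.page === slug)
    .sort((a, b) => a.chunk_index - b.chunk_index)
    .map((c) => c.text)
    .join("\n\n");
}
```

No container, no disk: the "file read" is a pure function over data you already indexed, which is why cold start disappears.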
Cache what repeats: directory tree, pages, and “grep targets”
Caching “answers” is weak because questions vary. What repeats during real conversations is:
- listing the same sections
- opening the same 5-10 important pages
- grepping for the same tokens
So we cache:
- the path tree
- reconstructed full pages
- recent grep candidates
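A small bounded cache is enough for this access pattern. A sketch using insertion-ordered Maps for cheap LRU eviction; the class name and default size are assumptions:

```typescript
// Bounded per-session cache with least-recently-used eviction.
// JS Map iteration order is insertion order, which makes LRU cheap.
class SessionCache<V> {
  private store = new Map<string, V>();
  constructor(private maxEntries = 100) {}

  get(key: string): V | undefined {
    const v = this.store.get(key);
    if (v !== undefined) {
      this.store.delete(key); // re-insert to mark as recently used
      this.store.set(key, v);
    }
    return v;
  }

  set(key: string, value: V): void {
    this.store.delete(key);
    this.store.set(key, value);
    if (this.store.size > this.maxEntries) {
      // evict the least recently used entry (first insertion-order key)
      this.store.delete(this.store.keys().next().value as string);
    }
  }
}
```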
RBAC: access control must be structural, not prompt-based
If some docs are not public (drafts, internal notes, client-only docs), you cannot rely on prompting. We enforced RBAC before the assistant runs a single tool call:
- build a user-scoped path tree
- prune everything the user can’t access
- apply the same filter to every query and page read
If the assistant can’t ls a file, it can’t cat it, and it can’t cite it.
That’s the only mental model that holds under pressure.
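Pruning the path tree can be a plain filter that runs before the first tool call. A sketch, assuming a group-membership check (the isPublic/groups fields mirror the path_tree record):

```typescript
// Structural RBAC sketch: build a user-scoped path tree up front.
// The group-membership rule is an assumption for illustration.
type TreeNode = { isPublic: boolean; groups: string[] };

function pruneTree(
  tree: Record<string, TreeNode>,
  userGroups: string[]
): Record<string, TreeNode> {
  const out: Record<string, TreeNode> = {};
  for (const [path, node] of Object.entries(tree)) {
    const allowed =
      node.isPublic || node.groups.some((g) => userGroups.includes(g));
    // Everything not allowed simply does not exist for this session.
    if (allowed) out[path] = node;
  }
  return out;
}
```

Because every tool (ls, find, cat, grep) resolves against the pruned tree, there is no prompt to jailbreak: forbidden paths are structurally absent.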
Guardrails: how we stopped confident wrong answers
We added a few rules that dramatically reduced bad answers.

Rule 1: No evidence, no answer
If the assistant can’t find supporting content with tools, it should:
- say it couldn’t verify
- show what it checked (pages or sections)
- offer a next step (contact/support)
Rule 2: Prefer quoting over paraphrasing for critical details
When the answer is sensitive to exact wording (requirements, limitations, legal copy):
- quote the relevant sentence(s) from the page it read
- keep the synthesis minimal
Rule 3: Be strict about locale and canonical pages
If the user is on a locale route, the assistant should:
- prefer that locale’s content
- avoid mixing languages in a single response
- fall back to default locale only if the locale page does not exist
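The locale fallback is a two-step lookup. A sketch assuming locale-prefixed routes (a hypothetical path convention, not necessarily ours):

```typescript
// Prefer the user's locale page; fall back to the default locale only
// when the localized page does not exist. Path scheme is an assumption.
function pickLocalePage(
  slug: string,
  userLocale: string,
  defaultLocale: string,
  exists: (path: string) => boolean
): string {
  const localized = `/${userLocale}${slug}`;
  if (exists(localized)) return localized;
  return `/${defaultLocale}${slug}`;
}
```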
The “grep problem”: the killer feature needs a two-phase plan
Naive recursive grep is slow if it reads everything over the network. We used a two-phase approach:
- coarse filter using the index (which pages might contain the token)
- fine filter in memory over cached page text to extract exact matches + context
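The two phases can be sketched like this; the token index shape (token to candidate slugs) is an assumption:

```typescript
// Two-phase grep: coarse index filter, then exact in-memory line scan.
// Index shape (token -> candidate slugs) is an assumption.
function twoPhaseGrep(
  token: string,
  index: Record<string, string[]>,
  pageText: (slug: string) => string // cached full page text
): { slug: string; line: string }[] {
  const needle = token.toLowerCase();
  const candidates = index[needle] ?? [];       // phase 1: coarse filter
  const hits: { slug: string; line: string }[] = [];
  for (const slug of candidates) {              // phase 2: exact matches
    for (const line of pageText(slug).split("\n")) {
      if (line.toLowerCase().includes(needle)) {
        hits.push({ slug, line });
      }
    }
  }
  return hits;
}
```

The coarse phase bounds how much text the fine phase touches, which is what keeps grep interactive.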
Widget UX: the UI makes or breaks trust
We shipped multiple UI iterations before the widget felt reliable. What mattered most:
- streaming tokens with a stable layout (avoid jumpy reflow)
- clear states:
- “Searching…”
- “Reading page…”
- “Answering…”
- short, in-line citations:
- “From: /services/seo”
- “From: /legal/privacy”
- fallbacks that don’t feel like failure:
- “I checked X and Y but couldn’t confirm; here’s how to reach us.”
Observability: log the reads, not just the tokens
If you want to improve quality, you need to know what happened. We logged:
- tool calls (paths listed/read/searched)
- which pages were used as evidence
- the final answer length and latency buckets
- “could not verify” rates
- escalation rates (contact clicks)
That data lets you answer:
- which pages are missing important information?
- which questions never find evidence?
- which pages cause confusion and need restructuring?
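The log records can be boring, typed objects. A sketch with assumed field names, plus a helper for the “could not verify” rate:

```typescript
// Log shapes for tool calls and answers; field names are assumptions.
type ToolCallLog = {
  sessionId: string;
  tool: "ls" | "find" | "cat" | "grep";
  path: string;
  ms: number; // latency of the call
};

type AnswerLog = {
  sessionId: string;
  verified: boolean;   // did the answer have supporting evidence?
  evidence: string[];  // pages cited
};

// Fraction of answers where the assistant could not verify an answer.
function couldNotVerifyRate(answers: AnswerLog[]): number {
  if (answers.length === 0) return 0;
  return answers.filter((a) => !a.verified).length / answers.length;
}
```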
A practical build checklist (what we’d do again)
If you’re building a support widget that chats with your content, we’d start with:
- index the rendered site and store page-level records
- combine lexical + vector retrieval
- add an exploration tool layer (files + grep)
- prune content for RBAC before sessions start
- keep tools read-only by default
- add a “no evidence, no answer” rule
- make “searching/reading/answering” visible in the UI
- log tool calls so you can debug failures