RAG — Retrieval-Augmented Generation

Give an LLM access to your own documents at query time. The most effective pattern to get accurate, grounded answers on your data.

**Problem**: LLMs don't know your internal data (product docs, tickets, contracts, policies). Their training data is generic and roughly 6-18 months old. How do you make them answer questions about YOUR data?

**Solution**: RAG (Retrieval-Augmented Generation). At query time, you find the most relevant chunks of YOUR docs and inject them into the LLM's prompt. The LLM answers using those chunks as its source. It's like handing a student the textbook, open at the right page, before the exam.

Concrete flow: **(1)** you chunk your docs (by paragraph, page, or section) and store each chunk as a 'semantic fingerprint' (called an embedding) in a specialized database (a vector database). **(2)** when a user asks a question, you compute the question's embedding, fetch the 5-10 most similar chunks, inject them into the prompt, and ask the LLM to answer based on those.
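The two steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production setup: `toy_embed` is a hypothetical stand-in (a bag-of-words count) for a real embedding model, a plain in-memory list stands in for the vector database, and the function names and sample handbook text are made up for the example.

```python
import math
import re
from collections import Counter


def chunk_by_paragraph(text: str) -> list[str]:
    """Step 1a: split a document into chunks on blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: word counts.

    A real system would call an embedding model here and get back
    a dense vector of floats instead.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(document: str) -> list[tuple[str, Counter]]:
    """Step 1b: store each chunk next to its embedding."""
    return [(chunk, toy_embed(chunk)) for chunk in chunk_by_paragraph(document)]


def retrieve(index: list[tuple[str, Counter]], question: str, k: int = 5) -> list[str]:
    """Step 2a: embed the question and return the k most similar chunks."""
    q = toy_embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question: str, chunks: list[str]) -> str:
    """Step 2b: inject the retrieved chunks into the prompt."""
    context = "\n\n".join(chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Made-up sample document, three paragraphs.
HANDBOOK = """Employees get 25 days of paid vacation per year.

The VPN requires two-factor authentication.

Expense reports are due by the 5th of each month."""

index = build_index(HANDBOOK)
question = "How many vacation days do employees get?"
top_chunks = retrieve(index, question, k=1)
prompt = build_prompt(question, top_chunks)
print(prompt)
```

The resulting `prompt` string is what you would send to the LLM. Swapping `toy_embed` for a real embedding model and the list for a vector database changes the plumbing, not the shape of the flow.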

**Why it works**: the LLM no longer needs to 'know' the answer; it just needs to read and synthesize. Hallucinations drop dramatically because the answer is grounded in real text. Bonus: you can cite sources ('this info comes from section 3.2 of the 2024 employee handbook').
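One way to get the source citations mentioned above is to number each retrieved chunk in the prompt and ask the model to cite by number. The sketch below is an assumption about how this could be wired, not a standard API: the function names, the prompt wording, the chunk text, and the hard-coded model answer are all hypothetical.

```python
import re


def build_cited_prompt(question: str, chunks: list[tuple[str, str]]) -> str:
    """Number each (source, text) chunk so the model can cite it as [n]."""
    numbered = "\n".join(
        f"[{i}] ({source}) {text}"
        for i, (source, text) in enumerate(chunks, start=1)
    )
    return (
        "Answer using only the numbered sources below. "
        "Cite the sources you used as [n]. "
        "If the sources do not contain the answer, say so.\n\n"
        f"Sources:\n{numbered}\n\n"
        f"Question: {question}"
    )


def cited_sources(answer: str, chunks: list[tuple[str, str]]) -> list[str]:
    """Map [n] markers in the model's answer back to source labels."""
    labels: list[str] = []
    for match in re.findall(r"\[(\d+)\]", answer):
        n = int(match)
        # Ignore out-of-range markers and avoid listing a source twice.
        if 1 <= n <= len(chunks) and chunks[n - 1][0] not in labels:
            labels.append(chunks[n - 1][0])
    return labels


# Made-up retrieved chunks as (source label, text) pairs.
chunks = [
    ("2024 employee handbook, section 3.2",
     "Employees get 25 days of paid vacation per year."),
    ("2024 employee handbook, section 5.1",
     "Expense reports are due by the 5th of each month."),
]
prompt = build_cited_prompt("How many vacation days do I get?", chunks)

# Hypothetical model output, hard-coded so the example runs offline.
answer = "You get 25 days of paid vacation per year [1]."
print(cited_sources(answer, chunks))
```

Because the markers are resolved against the chunks you actually sent, a citation the model invents (say `[7]` when only two sources were provided) is simply dropped rather than shown to the user.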

When RAG is a perfect fit: internal Q&A bots (HR, IT, support), document search + summarization, customer support (answer from product docs), legal Q&A on contracts. When it's not the right answer: tasks requiring absolutely precise reasoning (RAG can miss the critical chunk), or tasks well-known from generic training (asking Claude what Python is doesn't need RAG).

Tools: vector DBs (Pinecone, Weaviate, Chroma, Qdrant, pgvector in Postgres), embedding models (OpenAI text-embedding-3, Cohere embed, Voyage, Google text-embedding-005), frameworks (LangChain, LlamaIndex). For 80% of use cases, pgvector in your existing Postgres is enough.
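To make the pgvector option concrete, here is a hedged sketch of the SQL involved, kept in Python strings so nothing is executed against a database. The table name `chunks`, the column names, and the 1536-dimension size are assumptions for illustration; the dimension must match whatever embedding model you use. `<=>` is pgvector's cosine-distance operator.

```python
# Schema: one row per chunk. Table name, column names, and the
# vector dimension (1536) are assumptions for this sketch.
SETUP_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS chunks (
    id        bigserial PRIMARY KEY,
    source    text NOT NULL,
    content   text NOT NULL,
    embedding vector(1536) NOT NULL
);
"""

# Top-k search: order rows by cosine distance to the query embedding.
# The %s placeholders follow the style used by Python drivers such as psycopg.
SEARCH_SQL = """
SELECT source, content
FROM chunks
ORDER BY embedding <=> %s::vector
LIMIT %s;
"""


def to_vector_literal(embedding: list[float]) -> str:
    """Format a Python list in pgvector's text form, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


def search_params(query_embedding: list[float], k: int = 5) -> tuple[str, int]:
    """Build the parameter tuple to pass alongside SEARCH_SQL."""
    return (to_vector_literal(query_embedding), k)


print(search_params([0.1, 0.2, 0.3], k=5))
```

With a database driver, you would run `SETUP_SQL` once, insert each chunk with its embedding, and then execute `SEARCH_SQL` with the tuple from `search_params` for every user question.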

Diagram

Query path: User question → Embed query → Vector DB (top-k semantic search) → Reranker (optional, boosts quality) → LLM (answer grounded on retrieved docs)

Indexing path: Your docs (indexed as chunks + embeddings) → Vector DB
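The optional reranker step in the diagram is usually a two-stage pattern: fetch more candidates than you need from the vector search, rescore each (question, chunk) pair with a slower but more precise model, and keep only the best few. The sketch below uses a hypothetical `toy_relevance` scorer (the fraction of question words found in the chunk) in place of a real reranking model, and the candidate chunks are made-up sample data.

```python
import re
from typing import Callable


def toy_relevance(question: str, chunk: str) -> float:
    """Hypothetical stand-in for a reranking model.

    Scores a (question, chunk) pair as the fraction of distinct
    question words that also appear in the chunk.
    """
    q_words = set(re.findall(r"[a-z0-9]+", question.lower()))
    c_words = set(re.findall(r"[a-z0-9]+", chunk.lower()))
    return len(q_words & c_words) / len(q_words) if q_words else 0.0


def rerank(
    question: str,
    candidates: list[str],
    score: Callable[[str, str], float] = toy_relevance,
    keep: int = 2,
) -> list[str]:
    """Rescore the vector-search candidates and keep the best `keep`."""
    ranked = sorted(candidates, key=lambda c: score(question, c), reverse=True)
    return ranked[:keep]


# Pretend these four chunks came back from the top-k vector search.
candidates = [
    "The VPN client is available for Windows and macOS.",
    "To reset your VPN password, open the IT portal and choose Reset.",
    "Passwords must be at least 12 characters long.",
    "The cafeteria is open from 8am to 3pm.",
]
question = "How do I reset my VPN password?"
best = rerank(question, candidates, keep=2)
for chunk in best:
    print(chunk)
```

Only the chunks in `best` go into the prompt, so the LLM reads fewer, more relevant passages. The extra scoring pass costs latency, which is why the step is optional.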

Grounded on https://www.anthropic.com/news/contextual-retrieval
