Chunking 101: how RAG splits your content

Chunking is the unglamorous step that makes or breaks retrieval quality. Here's what it is and how to do it well.

Before your content can be searched, it has to be split into pieces — *chunks* — small enough to embed and retrieve precisely. How you split matters more than people expect.

Why not just embed whole pages?

Embed a 3,000-word page as one vector and you get one blurry average of everything on it. A question about shipping retrieves the whole page, most of which is noise. Smaller chunks mean sharper matches.

Why not split every sentence?

Go too small and you lose context. A sentence like "It ships in 3 days" is useless if the chunk doesn't know *what* ships.

The sweet spot

Aim for ~500 tokens per chunk, with a small overlap between neighbouring chunks (say 50 tokens) so ideas aren't cut mid-thought.
Respect headings. Keep a section together and prefix each chunk with its heading path (for example "Help > Shipping") so the embedding captures context.
Split FAQs per question. Each Q&A is its own high-precision chunk.
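
To make the first two rules concrete, here is a minimal sketch, not a reference implementation. It assumes whitespace-separated words as a stand-in for real tokens (a production pipeline would count tokens with the embedding model's own tokenizer), and the names `chunk_tokens` and `chunk_section` are made up for illustration.

```python
from typing import List


def chunk_tokens(tokens: List[str], size: int = 500, overlap: int = 50) -> List[List[str]]:
    """Split tokens into windows of `size`; each window repeats the
    last `overlap` tokens of the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    stride = size - overlap  # how far the window advances each step
    chunks: List[List[str]] = []
    start = 0
    while start < len(tokens):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reached the end of the text
        start += stride
    return chunks


def chunk_section(heading_path: List[str], body: str,
                  size: int = 500, overlap: int = 50) -> List[str]:
    """Chunk one section and prefix every chunk with its heading path."""
    prefix = " > ".join(heading_path)
    words = body.split()  # assumption: words approximate tokens
    return [f"{prefix}\n{' '.join(c)}" for c in chunk_tokens(words, size, overlap)]


if __name__ == "__main__":
    for chunk in chunk_section(["Help", "Shipping"], "It ships in 3 days"):
        print(chunk)
```

With the heading path attached, the "It ships in 3 days" chunk carries the word "Shipping" with it, so it no longer depends on its neighbours to make sense.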

Curious how a page breaks down? The Token & Chunk Estimator shows tokens, chunk count, and cost for any text you paste.
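
If you'd rather do the arithmetic by hand, the chunk count for a fixed window with overlap is easy to work out. The sketch below is an assumption-laden illustration, not how the Estimator itself works: `estimate_chunks`, `embedded_tokens`, and the `price_per_million` parameter are hypothetical names, and the price passed in is a placeholder, not a real rate.

```python
import math


def estimate_chunks(total_tokens: int, size: int = 500, overlap: int = 50) -> int:
    """Number of chunks a sliding window of `size` with `overlap` produces."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    if total_tokens <= 0:
        return 0
    if total_tokens <= size:
        return 1
    stride = size - overlap
    # One full first chunk, then one more per stride needed to cover the rest.
    return 1 + math.ceil((total_tokens - size) / stride)


def embedded_tokens(total_tokens: int, size: int = 500, overlap: int = 50) -> int:
    """Tokens actually sent to the embedder: overlap regions are counted twice."""
    n = estimate_chunks(total_tokens, size, overlap)
    return total_tokens + max(n - 1, 0) * overlap


def estimate_cost(total_tokens: int, price_per_million: float,
                  size: int = 500, overlap: int = 50) -> float:
    """Embedding cost for a hypothetical per-million-token price."""
    return embedded_tokens(total_tokens, size, overlap) / 1_000_000 * price_per_million


if __name__ == "__main__":
    print(estimate_chunks(4000))   # chunk count for a 4,000-token page
    print(embedded_tokens(4000))   # tokens embedded, including overlap
```

The takeaway: overlap is cheap but not free. Each extra chunk re-embeds the overlap tokens, so the embedded total is always a little higher than the page's raw token count.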
