
Tutorial

Building a simple RAG app with open-source models

RAG (Retrieval-Augmented Generation) lets an LLM answer questions over your own documents: you retrieve the most relevant passages and pass them to the model as context. Here’s a minimal path to a working app.

Stack

Use sentence-transformers or OpenAI embeddings for chunks. Store vectors in Chroma or FAISS. Use a small local LLM (e.g. Llama, Mistral) or an API for the final answer.
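
To make the stack concrete, here is a minimal setup sketch for the sentence-transformers + Chroma route; the model name, storage path, and collection name are just examples, not requirements.

```python
# Minimal stack setup: local embedding model + persistent vector store.
# Assumes: pip install sentence-transformers chromadb
from sentence_transformers import SentenceTransformer
import chromadb

embedder = SentenceTransformer("all-MiniLM-L6-v2")   # small, CPU-friendly embedding model
client = chromadb.PersistentClient(path="./rag_db")  # local, persistent vector store
collection = client.get_or_create_collection("docs")
```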

Steps

1. Chunk documents (500–1000 tokens, overlap 50–100).
2. Embed chunks and store in a vector DB.
3. On query, embed the query and retrieve the top-k chunks.
4. Pass the chunks + query to the LLM and return the answer.
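
A rough end-to-end sketch of these four steps, continuing the setup above. The chunker approximates tokens with words, and `query_llm` is a placeholder for whatever local model or API you call for the final answer.

```python
def chunk_text(text, size=800, overlap=80):
    """Step 1: split text into overlapping chunks (tokens approximated by words)."""
    words = text.split()
    step = size - overlap
    return [" ".join(words[start:start + size]) for start in range(0, len(words), step)]

def index_document(doc_id, text):
    """Step 2: embed each chunk and store it in the vector DB with metadata."""
    chunks = chunk_text(text)
    embeddings = embedder.encode(chunks).tolist()
    collection.add(
        ids=[f"{doc_id}-{i}" for i in range(len(chunks))],
        documents=chunks,
        embeddings=embeddings,
        metadatas=[{"source": doc_id, "chunk": i} for i in range(len(chunks))],
    )

def answer(query, top_k=4):
    """Steps 3–4: embed the query, retrieve top-k chunks, ask the LLM."""
    query_emb = embedder.encode([query]).tolist()
    results = collection.query(query_embeddings=query_emb, n_results=top_k)
    context = "\n\n".join(results["documents"][0])
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    return query_llm(prompt)  # placeholder: call your local LLM or an API here
```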

Tips

  • Keep chunks meaningful (e.g. split by section).

  • Add metadata (source, page) for citations.

  • Tune top-k and chunk size on a small test set.
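
For the tuning tip, one simple measure is retrieval hit rate: for a handful of hand-written (question, expected source) pairs, check how often the right source lands in the top-k results. This sketch continues the assumptions above; the test set is something you would write for your own documents.

```python
def retrieval_hit_rate(test_set, top_k):
    """test_set: list of (question, expected_source) tuples you write by hand."""
    hits = 0
    for question, expected_source in test_set:
        query_emb = embedder.encode([question]).tolist()
        results = collection.query(query_embeddings=query_emb, n_results=top_k)
        sources = {m["source"] for m in results["metadatas"][0]}
        if expected_source in sources:
            hits += 1
    return hits / len(test_set)

# Compare a few settings on your own small test set, e.g.:
# for k in (2, 4, 8):
#     print(k, retrieval_hit_rate(my_test_set, k))
```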

You can ship a first version in a weekend; iterate on chunking and model choice for quality.

Related tools

AI tools mentioned in this post. Try them out.

  • Perplexity

    An AI search engine with citations. Research and summarization in one place.

  • Claude

    Anthropic's AI, strong at long documents, code, and reasoning. Large context window.
