RAG (Retrieval-Augmented Generation)

Introduction

There are three main modes for AI systems:

  1. Retrieval: The system retrieves relevant information from a database or knowledge base.
  2. Generation: The system generates new content based on the input it receives.
  3. Action: The system takes actions based on the input it receives. Examples include MCP tools and agents. With an agent framework like Camel, the agent can interact with the environment and take actions based on the input it receives.

RAG is well established at the time of writing, and it mainly covers the first two modes.

How RAG works:

  1. Question: The user asks a question or provides input to the system.
  2. Retrieval: The system retrieves relevant information from a database or knowledge base, using chunking and embedding to find the most relevant information. The knowledge is often stored in a vector database, like Pinecone, Weaviate, etc. The embedding is often done by an embedding model, like OpenAI’s text-embedding-ada-002.
  3. Generation: The system generates a response based on the retrieved information and the input it received.
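
To make "find the most relevant information" concrete, here is a minimal sketch of how embeddings are commonly compared: cosine similarity between the question vector and each chunk vector. The 3-dimensional vectors and the chunk names are made up for illustration; a real embedding model returns much longer vectors, and a vector database would do this comparison for you.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings (assumption: real ones come from an embedding model).
question_vec = [1.0, 0.0, 1.0]
chunk_vecs = {
    "chunk_a": [1.0, 0.0, 1.0],  # same direction as the question
    "chunk_b": [0.0, 1.0, 0.0],  # orthogonal to the question
    "chunk_c": [1.0, 1.0, 0.0],  # partially aligned with the question
}

# Rank chunk names from most to least similar to the question.
ranked = sorted(
    chunk_vecs,
    key=lambda name: cosine_similarity(question_vec, chunk_vecs[name]),
    reverse=True,
)
```

Here `ranked` puts `chunk_a` first and `chunk_b` last. The top entries are what would be handed to the generation step.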

Components of RAG

  1. Embedding: The process of converting text into a numerical representation that can be used by the system.
  2. Vector Database: A database that stores the embeddings and allows for efficient retrieval of relevant information.
  3. Retriever: The component that retrieves relevant information from the vector database based on the input it receives.
  4. Generator: The component that generates a response based on the retrieved information and the input it received.
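
One way to picture how the four components fit together is as a set of small interfaces. The sketch below is only an illustration: the class and method names (`Embedder`, `VectorStore`, `Retriever`, `Generator`, `RagPipeline`, `embed`, `search`, `generate`, `answer`) are my own assumptions, not the API of any particular library.

```python
from dataclasses import dataclass
from typing import Protocol

Vector = list[float]


class Embedder(Protocol):
    """Converts text into a numerical representation."""
    def embed(self, text: str) -> Vector: ...


class VectorStore(Protocol):
    """Stores embeddings and returns the texts closest to a query vector."""
    def add(self, text: str, vector: Vector) -> None: ...
    def search(self, vector: Vector, k: int) -> list[str]: ...


class Generator(Protocol):
    """Produces a response from the question and the retrieved context."""
    def generate(self, question: str, context: list[str]) -> str: ...


@dataclass
class Retriever:
    """Embeds the question and asks the store for the k nearest texts."""
    embedder: Embedder
    store: VectorStore

    def retrieve(self, question: str, k: int = 3) -> list[str]:
        return self.store.search(self.embedder.embed(question), k)


@dataclass
class RagPipeline:
    """Wires retrieval and generation together."""
    retriever: Retriever
    generator: Generator

    def answer(self, question: str, k: int = 3) -> str:
        context = self.retriever.retrieve(question, k)
        return self.generator.generate(question, context)
```

Any object with matching methods can be plugged in, so a toy embedder can later be swapped for a real embedding model without touching `RagPipeline`.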

The pipeline of Indexing

  1. Load: Load data into the system. The data can come in different formats, like TXT, PDF, CSV, etc.
  2. Chunk: The data is chunked into smaller pieces to make it easier to process and retrieve relevant information.
  3. Embed: The chunked data is embedded into a numerical representation that can be used by the system.
  4. Store: The embedded data is stored in a vector database for efficient retrieval.
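
The four indexing steps can be sketched in a few lines. Everything here is a stand-in chosen to keep the example runnable with the standard library only: `load` takes in-memory strings instead of parsing TXT/PDF/CSV files, `embed` is a toy hashed bag-of-words rather than a learned embedding model, and `InMemoryVectorStore` is a plain list rather than a real vector database. The chunk size and overlap values are arbitrary.

```python
import hashlib
import math
from dataclasses import dataclass, field


def load(documents: dict[str, str]) -> list[tuple[str, str]]:
    """Load step (stand-in): documents are already plain text keyed by source name."""
    return [(name, text.strip()) for name, text in documents.items() if text.strip()]


def chunk(text: str, size: int = 8, overlap: int = 2) -> list[str]:
    """Chunk step: windows of `size` words, each sharing `overlap` words with the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def embed(text: str, dims: int = 16) -> list[float]:
    """Embed step (toy): hash each word into one of `dims` buckets, then L2-normalize."""
    vec = [0.0] * dims
    for word in text.lower().split():
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        vec[digest[0] % dims] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


@dataclass
class InMemoryVectorStore:
    """Store step (stand-in): a list of (source, chunk text, vector) records."""
    records: list[tuple[str, str, list[float]]] = field(default_factory=list)

    def add(self, source: str, text: str, vector: list[float]) -> None:
        self.records.append((source, text, vector))


def build_index(documents: dict[str, str]) -> InMemoryVectorStore:
    """Run load -> chunk -> embed -> store over all documents."""
    store = InMemoryVectorStore()
    for source, text in load(documents):
        for piece in chunk(text):
            store.add(source, piece, embed(piece))
    return store


docs = {"notes.txt": "RAG retrieves relevant chunks from a knowledge base and passes them to a generator"}
index = build_index(docs)
```

With the default settings the 14-word sample text yields two overlapping chunks. The overlap is there so that a sentence cut at a chunk boundary still appears whole in at least one chunk.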

The pipeline of Retrieval and generation

  1. Query: The user asks a question or provides input to the system. This step can include clarifying the question, splitting it into smaller pieces, and rephrasing it to make retrieval easier.
  2. Retrieve: Select and rank the relevant information from the vector database based on the input. The retriever can be a simple keyword search or a more complex semantic search.
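
As a rough sketch of the two steps above, the code below splits a compound question into sub-questions and then ranks chunks with a simple keyword search. The splitting rule (cut on question marks), the stopword list, and the scoring function are all simplifying assumptions of mine; a semantic retriever would compare embeddings instead of counting shared words.

```python
import re

# Small illustrative stopword list (assumption, not a standard one).
STOPWORDS = {"a", "an", "the", "is", "are", "of", "to", "what", "how", "does", "do", "and", "in"}


def tokenize(text: str) -> set[str]:
    """Lowercase the text and keep the distinct non-stopword terms."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def split_question(question: str) -> list[str]:
    """Query step (naive): treat each '?'-terminated part as its own sub-question."""
    return [part.strip() for part in question.split("?") if part.strip()]


def keyword_score(query: str, chunk: str) -> float:
    """Fraction of the query's terms that also appear in the chunk."""
    query_terms = tokenize(query)
    if not query_terms:
        return 0.0
    return len(query_terms & tokenize(chunk)) / len(query_terms)


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[tuple[float, str]]:
    """Retrieve step: score each chunk by its best sub-question match, return the top k."""
    sub_questions = split_question(question)
    scored = []
    for chunk in chunks:
        best = max((keyword_score(q, chunk) for q in sub_questions), default=0.0)
        if best > 0.0:  # drop chunks that share no terms with any sub-question
            scored.append((best, chunk))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


chunks = [
    "Chunking splits documents into smaller pieces.",
    "Embeddings are stored in a vector database.",
    "The generator writes the final answer.",
]
results = retrieve("What is chunking? Where are embeddings stored?", chunks)
```

Here `results` holds the first two chunks, each matched by a different sub-question, while the unrelated third chunk is filtered out.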

The pipeline of Generation

The retrieved information is provided to the generator together with the input question. The generator then produces the response. It can be a simple template-based system or a more complex LLM-based system.
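
A small sketch of both generator styles follows. The prompt wording, the function names, and especially the `complete` parameter are assumptions for illustration: `complete` stands for whatever function sends a prompt to your model and returns its text, and no specific model API is implied.

```python
from typing import Callable

# Hypothetical prompt layout; the wording is an assumption, not a required format.
PROMPT_TEMPLATE = """Answer the question using only the context below.
If the context is not enough, say you do not know.

Context:
{context}

Question: {question}
Answer:"""


def build_prompt(question: str, retrieved: list[str]) -> str:
    """Combine the retrieved chunks and the question into one prompt string."""
    context = "\n".join(f"[{i}] {text}" for i, text in enumerate(retrieved, start=1))
    return PROMPT_TEMPLATE.format(context=context, question=question)


def template_generator(question: str, retrieved: list[str]) -> str:
    """Template-based generator: no model involved, just fills a fixed sentence."""
    if not retrieved:
        return "I could not find anything relevant."
    return f"Based on the retrieved notes: {retrieved[0]}"


def llm_generator(question: str, retrieved: list[str], complete: Callable[[str], str]) -> str:
    """LLM-based generator: `complete` is a caller-supplied (hypothetical) model call."""
    return complete(build_prompt(question, retrieved))


retrieved = ["Embeddings are stored in a vector database."]
prompt = build_prompt("Where are embeddings stored?", retrieved)
answer = template_generator("Where are embeddings stored?", retrieved)
```

Passing the model call in as a parameter keeps the example independent of any provider, and makes it easy to test the prompt construction with a fake `complete` function.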