    Understanding RAG architecture and its fundamentals

    By Team | March 30, 2025


    Large language model (LLM) publishers and suppliers are all focusing on the advent of artificial intelligence (AI) agents and agentic AI. These terms remain confusing, all the more so because the players do not yet agree on how to develop and deploy them.

    This is much less true for retrieval augmented generation (RAG) architectures where, since 2023, there has been widespread consensus in the IT industry.

    Retrieval augmented generation enables the results of a generative AI model to be anchored in verifiable information. While it does not prevent hallucinations, the method aims to obtain relevant answers, based on a company’s internal data or on information from a verified knowledge base.

    It could be summed up as the intersection of generative AI and an enterprise search engine.

    What is RAG architecture?

    Initial representations of RAG architectures shed little light on the essential workings of these systems.

    Broadly speaking, the process of a RAG system is simple to understand. It starts with the user sending a prompt – a question or request. This natural language prompt and the associated query are compared with the content of the knowledge base. The results closest to the request are ranked in order of relevance, and the most relevant content is then sent, together with the original query, to an LLM to produce the response returned to the user.
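
    To make this flow concrete, here is a minimal sketch in Python. Everything in it is illustrative: the bag-of-words embed function stands in for a real embedding model, a plain list stands in for the vector database, and the assembled prompt would be sent to an LLM API rather than printed.

        from collections import Counter
        from math import sqrt

        # Toy knowledge base; in production these chunks come from ingested documents.
        KNOWLEDGE_BASE = [
            "RAG grounds LLM answers in a verified knowledge base.",
            "Chunking splits long documents into short extracts before vectorisation.",
            "A reranker reorders retrieved chunks by relevance to the question.",
        ]

        def embed(text):
            """Stand-in for a real embedding model: a bag-of-words vector."""
            return Counter(text.lower().split())

        def cosine(a, b):
            dot = sum(a[t] * b[t] for t in a)
            norms = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
            return dot / norms if norms else 0.0

        def retrieve(query, top_k=2):
            """Rank all chunks against the query and keep the top-k."""
            q = embed(query)
            ranked = sorted(KNOWLEDGE_BASE, key=lambda c: cosine(q, embed(c)), reverse=True)
            return ranked[:top_k]

        def answer(query):
            context = "\n".join(retrieve(query))
            # A real system would send this prompt to an LLM API instead of printing it.
            return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

        print(answer("What does chunking do?"))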

    The companies that have tried to deploy RAG have learned the specifics of such an approach, starting with support for the various components that make up the RAG mechanism. These components are associated with the steps required to transform the data, from ingesting it into a source system to generating a response using an LLM.

    Data preparation, a necessity even with RAG

    The first step is to gather the documents you want to search. While it is tempting to ingest all the documents available, this is the wrong strategy – especially as you also have to decide whether to update the system in batches or continuously.

    “Failures come from the quality of the input. Some customers say to me: ‘I’ve got two million documents, you’ve got three weeks, give me a RAG’. Obviously, it doesn’t work,” says Bruno Maillot, director of the AI for Business practice at Sopra Steria Next. “This notion of refinement is often forgotten, even though it was well understood in the context of machine learning. Generative AI doesn’t make Chocapic”.

    An LLM is not de facto a data preparation tool. It is advisable to remove duplicates and intermediate versions of documents and to apply strategies for selecting up-to-date items. This pre-selection avoids overloading the system with potentially useless information and avoids performance problems.

    Once the documents have been selected, the raw data – HTML pages, PDF documents, images, doc files, etc – needs to be converted into a usable format, such as text and associated metadata, expressed in a JSON file, for example. This metadata can document not only the structure of the data, but also its authors, origin, date of creation, and so on. This formatted data is then transformed into tokens and vectors.
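
    As an illustration, the output of this stage can be as simple as JSON records pairing extracted text with metadata. The field names below are hypothetical; a real pipeline would define its own schema and plug in proper HTML/PDF extraction tooling.

        import json
        from datetime import date
        from pathlib import Path

        def to_record(path, text):
            """Pair extracted text with metadata; these field names are hypothetical."""
            return {
                "text": text,
                "metadata": {
                    "source": str(path),
                    "format": path.suffix.lstrip("."),
                    "ingested_on": date.today().isoformat(),
                },
            }

        # In a real pipeline, the text would come from an HTML/PDF extraction step.
        records = [to_record(Path("handbook.pdf"), "Extracted text of the document...")]
        Path("corpus.json").write_text(json.dumps(records, indent=2))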

    Publishers quickly realised that with large volumes of documents and long texts, it was inefficient to vectorise the whole document.

    Chunking and its strategies

    Hence the importance of implementing a “chunking” strategy, which involves breaking a document down into short extracts. It is a crucial step, according to Mistral AI, which says: “It makes it easier to identify and retrieve the most relevant information during the search process.”

    There are two considerations here – the size of the fragments and the way in which they are obtained.

    The size of a chunk is often expressed as a number of characters or tokens. A larger number of chunks improves the accuracy of the results, but the multiplication of vectors increases the amount of resources and time required to process them.

    There are several ways of dividing a text into chunks.

    • The first is to slice according to fragments of fixed size – characters, words or tokens. “This method is simple, which makes it a popular choice for the initial phases of data processing where you need to browse the data quickly,” says Zilliz, a vector database vendor.
    • A second approach consists of a semantic breakdown – that is, based on a “natural” breakdown: by sentence, by section – defined by an HTML header for example – subject or paragraph. Although more complex to implement, this method is more precise. It often depends on a recursive approach, since it involves using logical separators, such as a space, comma, full stop, heading, and so on.
    • The third approach is a combination of the previous two. Hybrid chunking combines an initial fixed breakdown with a semantic method when a very precise response is required.

    In addition to these techniques, it is possible to chain the fragments together, taking into account that some of the content of the chunks may overlap.

    “Overlap ensures that there is always some margin between segments, which increases the chances of capturing important information even if it is split according to the initial chunking strategy,” according to documentation from LLM platform Cohere. “The disadvantage of this method is that it generates redundancy.”

    The most popular solution seems to be to keep fixed fragments of 100 to 200 words with an overlap of 20% to 25% of the content between chunks.

    This splitting is often done using Python libraries, such as spaCy or NLTK, or with the “text splitters” tools in the LangChain framework.
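
    As a sketch of the fixed-size approach with overlap, assuming the langchain-text-splitters package, the following splits on “natural” separators recursively, with sizes expressed in characters rather than words:

        # pip install langchain-text-splitters
        from langchain_text_splitters import RecursiveCharacterTextSplitter

        # Around 1,000 characters is on the order of 150-200 words; the 200-character
        # overlap gives the 20-25% margin between consecutive chunks mentioned above.
        splitter = RecursiveCharacterTextSplitter(
            chunk_size=1000,
            chunk_overlap=200,
            separators=["\n\n", "\n", ". ", " "],  # tried in order, recursively
        )

        text = "Lorem ipsum dolor sit amet. " * 200  # stand-in for a real document
        chunks = splitter.split_text(text)
        print(f"{len(chunks)} chunks, first chunk is {len(chunks[0])} characters long")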

    The right approach generally depends on the precision required by users. For example, a semantic breakdown seems more appropriate when the aim is to find specific information, such as the article of a legal text.

    The size of the chunks must match the capacities of the embedding model. This is precisely why chunking is necessary in the first place. This “allows you to stay below the input token limit of the embedding model”, explains Microsoft in its documentation. “For example, the maximum length of input text for the Azure OpenAI text-embedding-ada-002 model is 8,191 tokens. Given that one token corresponds on average to around four characters with current OpenAI models, this maximum limit is equivalent to around 6,000 words”.
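
    A simple guard is to count tokens before embedding. This sketch assumes OpenAI’s tiktoken tokenizer and the ada-002 limit quoted above:

        # pip install tiktoken
        import tiktoken

        MAX_INPUT_TOKENS = 8191  # the ada-002 limit quoted above

        enc = tiktoken.encoding_for_model("text-embedding-ada-002")

        def fits(chunk):
            """True if the chunk stays within the embedding model's input limit."""
            return len(enc.encode(chunk)) <= MAX_INPUT_TOKENS

        oversized = [c for c in ["a short chunk", "another short chunk"] if not fits(c)]
        print(f"{len(oversized)} chunks exceed the limit and need re-splitting")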

    Vectorisation and embedding models

    An embedding model is responsible for converting chunks or documents into vectors. These vectors are stored in a database.

    Here again, there are several types of embedding model, mainly dense and sparse. Dense models generally produce vectors of fixed size, expressed in a set number of dimensions, while sparse models generate vectors whose size depends on the length of the input text. A third approach combines the two to vectorise short extracts or comments (Splade, ColBERT, IBM sparse-embedding-30M).

    The choice of the number of dimensions will determine the accuracy and speed of the results. A vector with many dimensions captures more context and nuance, but may require more resources to create and retrieve. A vector with fewer dimensions will be less rich, but faster to search.

    The choice of embedding model also depends on the database in which the vectors will be stored, the large language model with which it will be associated and the task to be performed. Benchmarks such as the MTEB ranking are invaluable. It is sometimes possible to use an embedding model that does not come from the same LLM collection, but it is necessary to use the same embedding model to vectorise the document base and user questions.
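
    To illustrate the same-model rule, here is a sketch using the sentence-transformers library; the model name is an arbitrary example rather than a recommendation:

        # pip install sentence-transformers
        from sentence_transformers import SentenceTransformer

        # The model name is an arbitrary example: a dense model with 384 dimensions.
        model = SentenceTransformer("all-MiniLM-L6-v2")

        chunks = ["RAG grounds answers in internal data.", "Chunking splits documents."]
        chunk_vectors = model.encode(chunks)  # these vectors go into the vector database

        # The question must be vectorised with the same model as the document base.
        query_vector = model.encode("How are documents split?")
        print(chunk_vectors.shape, query_vector.shape)  # (2, 384) (384,)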

    Note that it is sometimes useful to fine-tune the embedding model when it does not contain sufficient knowledge of the language related to a specific domain, for example, oncology or systems engineering.

    The vector database and its retriever algorithm

    Vector databases do more than simply store vectors – they generally incorporate a semantic search algorithm based on the nearest-neighbour technique to index and retrieve information that corresponds to the question. Most publishers have implemented the Hierarchical Navigable Small Worlds (HNSW) algorithm. Microsoft is also influential with DiskANN, an open source algorithm designed to obtain an ideal performance-cost ratio with large volumes of vectors, at the expense of accuracy. Google has chosen to develop its own algorithm, ScaNN, also designed for large volumes of data. The search process involves traversing the vector graph in search of the approximate nearest neighbour, and is based on a cosine or Euclidean distance calculation.

    Cosine distance is more effective at identifying semantic similarity, while the Euclidean method is simpler and less demanding in terms of computing resources.

    Since most databases are based on an approximate search for nearest neighbours, the system will return several vectors potentially corresponding to the answer. It is possible to limit the number of results (top-k cutoff). This is even necessary, since we want the user’s query and the information used to create the answer to fit within the LLM context window. However, if the database contains a large number of vectors, precision may suffer or the result we are looking for may be beyond the limit imposed.
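
    HNSW, DiskANN and ScaNN approximate what a brute-force search computes exactly. The toy example below shows the exact version – cosine similarity as a dot product of normalised vectors, followed by a top-k cutoff:

        import numpy as np

        rng = np.random.default_rng(0)
        index = rng.normal(size=(10_000, 384))  # stored chunk vectors (toy data)
        query = rng.normal(size=384)

        # Cosine similarity is the dot product of L2-normalised vectors.
        index_n = index / np.linalg.norm(index, axis=1, keepdims=True)
        query_n = query / np.linalg.norm(query)
        scores = index_n @ query_n

        top_k = 5  # the top-k cutoff keeps the context small enough for the LLM window
        best = np.argsort(scores)[::-1][:top_k]
        print(best, scores[best])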

    Hybrid search and reranking

    Combining a traditional search model such as BM25 with an HNSW-type retriever can be useful for obtaining a good cost-performance ratio, but it will also be limited to a restricted number of results – all the more so as not all vector databases support combining HNSW with BM25 (a combination known as hybrid search).
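
    When a database does not offer hybrid search natively, one common workaround (among others) is to run both searches and merge the rankings, for instance with reciprocal rank fusion. The sketch below assumes the rank_bm25 package and pre-computed vector-search scores:

        # pip install rank_bm25
        from rank_bm25 import BM25Okapi

        docs = ["rag grounds llm answers", "chunking splits documents", "rerankers reorder results"]
        bm25 = BM25Okapi([d.split() for d in docs])
        lexical = bm25.get_scores("how are documents split".split())

        semantic = [0.12, 0.91, 0.33]  # assumed vector-search scores for the same docs

        def ranks(scores):
            order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
            return {doc_id: rank for rank, doc_id in enumerate(order)}

        # Reciprocal rank fusion: each ranking contributes 1 / (k + rank).
        lex_rank, sem_rank = ranks(lexical), ranks(semantic)
        fused = sorted(range(len(docs)),
                       key=lambda i: 1 / (60 + lex_rank[i]) + 1 / (60 + sem_rank[i]),
                       reverse=True)
        print([docs[i] for i in fused])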

    A reranking model can help to find more content deemed useful for the response. This involves increasing the limit of results returned by the “retriever” model; then, as its name suggests, the reranker reorders the chunks according to their relevance to the question. Examples of rerankers include Cohere Rerank, BGE, Janus AI and Elastic Rerank. On the other hand, such a system can increase the latency of the results returned to the user, and it may be necessary to re-train the reranker if the vocabulary used in the document base is highly specific. Some nevertheless consider it worthwhile, since relevance scores are valuable data for monitoring the performance of a RAG system.
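
    A reranker is typically a cross-encoder that scores each query-chunk pair jointly. As a sketch using an open cross-encoder from the sentence-transformers library (the commercial rerankers named above expose broadly similar query-and-documents APIs):

        # pip install sentence-transformers
        from sentence_transformers import CrossEncoder

        # An open cross-encoder used as the reranker; the model name is illustrative.
        reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

        query = "How are long documents prepared for a RAG system?"
        candidates = [  # e.g. the enlarged result list returned by the retriever
            "Chunking breaks a document into short extracts before vectorisation.",
            "The UK signed a trade deal this week.",
            "Overlap keeps a margin of shared content between consecutive chunks.",
        ]

        scores = reranker.predict([(query, c) for c in candidates])
        for score, chunk in sorted(zip(scores, candidates), reverse=True):
            print(f"{score:.2f}  {chunk}")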

    Reranker or not, the retrieved results must then be sent to the LLM. Here again, not all LLMs are created equal – the size of their context window, their response speed and their ability to respond factually (even without having access to documents) are all criteria that need to be evaluated. In this respect, Google DeepMind, OpenAI, Mistral AI, Meta and Anthropic have trained their LLMs to support this use case.
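
    The last step is assembling a prompt so that the question and the retrieved chunks fit the chosen model’s context window. The sketch below uses a crude character budget; a real implementation would count tokens with the target LLM’s tokenizer:

        PROMPT_TEMPLATE = """Answer the question using only the context below.
        If the context is insufficient, say so instead of guessing.

        Context:
        {context}

        Question: {question}"""

        def build_prompt(question, chunks, max_chars=12_000):
            """Stuff top-ranked chunks into the prompt until the budget is spent.

            The character budget is a crude stand-in for the LLM context window;
            a real system would count tokens with the target model's tokenizer.
            """
            kept, used = [], 0
            for chunk in chunks:
                if used + len(chunk) > max_chars:
                    break
                kept.append(chunk)
                used += len(chunk)
            return PROMPT_TEMPLATE.format(context="\n---\n".join(kept), question=question)

        print(build_prompt("What is chunking?", ["Chunking splits documents into short extracts."]))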

    Assessing and observing

    In addition to the reranker, an LLM can be used as a judge to evaluate the results and identify potential problems with the LLM that is supposed to generate the response. Some APIs rely instead on rules to block harmful content or requests for access to confidential documents by certain users. User feedback frameworks can also be used to refine the RAG architecture; in this case, users are invited to rate the results in order to identify the strengths and weaknesses of the RAG system. Finally, observability of each of the building blocks is necessary to avoid problems of cost, security and performance.
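
    An LLM-as-judge setup can be as simple as a scoring prompt sent to a second model. In this sketch, call_llm is a placeholder for whichever API plays the judge, and the rubric and JSON format are purely illustrative:

        JUDGE_PROMPT = """You are evaluating a RAG system's answer.

        Question: {question}
        Retrieved context: {context}
        Answer: {answer}

        Rate each from 1 to 5: (a) faithfulness to the context, (b) relevance to the question.
        Reply as JSON: {{"faithfulness": n, "relevance": n, "comment": "..."}}"""

        def call_llm(prompt):
            """Placeholder for the judge model; swap in a real LLM API call."""
            return '{"faithfulness": 5, "relevance": 4, "comment": "stub"}'

        def judge(question, context, answer):
            return call_llm(JUDGE_PROMPT.format(question=question, context=context, answer=answer))

        print(judge("What is chunking?", "Chunking splits documents.", "It splits them into extracts."))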
