The video discusses two prominent AI buzzwords: agentic AI and retrieval augmented generation (RAG), addressing common misconceptions about their use cases and effectiveness. Agentic AI refers to multi-agent workflows where AI agents perceive their environment, make decisions, and execute actions with minimal human intervention. These agents operate in a loop of perceiving, reasoning, acting, and observing, often communicating and collaborating at the application level. A popular use case for agentic AI today is coding agents, which function like a mini developer team with roles such as architect, implementer, and reviewer, helping to plan, write, and review code autonomously but still requiring some human oversight.
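One way to picture the perceive-reason-act-observe loop is a short, self-contained Python sketch. The tool names and the rule-based "reasoner" below are hypothetical stand-ins for real LLM calls and tool integrations, not anything shown in the video:

```python
# A minimal, self-contained sketch of an agent's perceive-reason-act-observe
# loop, mirroring the architect/implementer/reviewer roles of a coding agent.
# reason() is a toy stand-in for an LLM deciding the next action.

TOOLS = {
    "plan": lambda task: f"plan for: {task}",
    "write_code": lambda task: f"code for: {task}",
    "review": lambda task: f"review of: {task}",
}

def reason(goal: str, history: list[str]) -> dict:
    """Stand-in for an LLM choosing the next action from past observations."""
    steps = ["plan", "write_code", "review"]
    if len(history) >= len(steps):
        return {"action": "finish", "result": history[-1]}
    return {"action": steps[len(history)], "input": goal}

def agent_loop(goal: str, max_steps: int = 10) -> str:
    history: list[str] = []                     # observations gathered so far
    for _ in range(max_steps):
        decision = reason(goal, history)        # reason over perceived state
        if decision["action"] == "finish":
            return decision["result"]
        observation = TOOLS[decision["action"]](decision["input"])  # act
        history.append(observation)             # observe, then loop again
    return "stopped: step budget exhausted"

print(agent_loop("add input validation to the signup form"))
```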
Beyond coding, agentic AI is also valuable in enterprise scenarios like handling support tickets or HR requests. Specialized agents can autonomously filter queries and route them to the appropriate tool or service using protocols like the Model Context Protocol (MCP), which standardizes how large language models (LLMs) interact with external tools. This lets agents respond to events in their environment rather than relying solely on user-initiated chat prompts. However, a significant challenge for agentic AI is the risk of hallucination or misinformed decisions when an agent lacks reliable access to up-to-date external information.
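As a toy illustration of this routing pattern, the sketch below triages tickets with a keyword rule standing in for an LLM classifier; in a real deployment the handlers would be tools exposed over MCP. The handler names and keyword rules are hypothetical:

```python
# A self-contained sketch of agent-side routing: classify an incoming
# ticket and dispatch it to the matching handler. In practice the
# classifier would be an LLM and the handlers would be MCP tool calls.

HANDLERS = {
    "support": lambda t: f"support agent handling: {t}",
    "hr": lambda t: f"HR agent handling: {t}",
}

def classify(ticket: str) -> str:
    """Toy stand-in for an LLM-based classifier."""
    hr_terms = ("vacation", "payroll", "benefits", "leave")
    return "hr" if any(term in ticket.lower() for term in hr_terms) else "support"

def route(ticket: str) -> str:
    return HANDLERS[classify(ticket)](ticket)

print(route("How many vacation days do I have left?"))
print(route("The dashboard shows a 500 error after login."))
```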
This is where retrieval augmented generation (RAG) comes into play. RAG is a two-phase system. In the offline phase, documents are ingested, chunked, and converted into vector embeddings that are stored in a vector database; in the online phase, a user query is embedded the same way and used to retrieve the most relevant chunks, which are then fed into an LLM to generate a grounded response. While powerful, RAG can be tricky to scale: retrieving more data can introduce noise, redundancy, and higher costs, and can even degrade answer quality. Careful data curation and context engineering are therefore essential to optimize the quality and relevance of the information handed to the LLM.
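The two phases can be sketched in a few lines of Python. The hashed bag-of-words "embedding" below is a toy stand-in for a real embedding model, the in-memory list stands in for a vector database, and the assembled prompt would normally be sent to an LLM:

```python
# A self-contained sketch of RAG's two phases: offline indexing and
# online retrieval. embed() is a toy substitute for a real embedding model.

import math
from collections import Counter

DIM = 256

def embed(text: str) -> list[float]:
    """Toy embedding: hash tokens into a fixed-size, normalized vector."""
    vec = [0.0] * DIM
    for token, count in Counter(text.lower().split()).items():
        vec[hash(token) % DIM] += count
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))  # vectors are pre-normalized

# Offline phase: chunk documents and store their embeddings.
chunks = [
    "Refunds are processed within 5 business days.",
    "Premium plans include priority support.",
    "Passwords must be rotated every 90 days.",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

# Online phase: embed the query, retrieve top-k chunks, build the prompt.
query = "How long do refunds take?"
q_vec = embed(query)
top = sorted(index, key=lambda item: cosine(q_vec, item[1]), reverse=True)[:2]
prompt = "Context:\n" + "\n".join(c for c, _ in top) + f"\n\nQuestion: {query}"
print(prompt)
```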
Data ingestion for RAG involves converting various document types into machine-readable formats enriched with metadata, ensuring that not only text but also tables, graphs, and images are properly processed. Context engineering further refines retrieval by combining semantic search with keyword matching, re-ranking results for relevance, and merging related chunks to create a coherent and prioritized context for the LLM. This approach improves accuracy, reduces inference time, and lowers AI operational costs, making RAG applications more efficient and effective.
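One widely used way to combine semantic search with keyword matching is reciprocal rank fusion, sketched below. The chunk names and the two input rankings are hypothetical; a real system would produce them with a vector index and a keyword engine such as BM25:

```python
# A self-contained sketch of one context-engineering step: fusing a
# semantic (vector) ranking with a keyword ranking via reciprocal rank
# fusion, yielding a single prioritized list of chunks for the LLM.

def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Score each doc by the sum of 1/(k + rank) across all rankings."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

semantic = ["chunk_refunds", "chunk_billing", "chunk_support"]   # vector search order
keyword = ["chunk_billing", "chunk_refunds", "chunk_security"]   # keyword search order

fused = rrf([semantic, keyword])
print(fused[:3])  # prioritized context to pass to the LLM
```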
Finally, the video highlights that local, open-source models can power both RAG and agentic AI applications, offering benefits like data sovereignty and cost savings. Tools such as vLLM and llama.cpp let developers keep API compatibility with proprietary model endpoints while optimizing runtime performance through techniques like KV caching.
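As a concrete illustration of that compatibility, the sketch below calls a locally served model through vLLM's OpenAI-compatible endpoint; the server command, port, and model name are illustrative assumptions, not details from the video:

```python
# A sketch of calling a locally served open-source model through vLLM's
# OpenAI-compatible endpoint. The server is assumed to have been started
# with something like: vllm serve mistralai/Mistral-7B-Instruct-v0.3

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # local vLLM server, not a hosted API
    api_key="not-needed",                 # local servers typically ignore the key
)

response = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.3",
    messages=[{"role": "user", "content": "Summarize RAG in one sentence."}],
)
print(response.choices[0].message.content)
```

In conclusion, while agentic AI combined with RAG can be a powerful solution, their success depends on thoughtful implementation and context, reinforcing the consultant's classic answer: "it depends."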
