Building Production RAG Over Complex Documents
Databricks
@databricks
About
Databricks is the Data and AI company. More than 20,000 organizations worldwide — including adidas, AT&T, Bayer, Block, Mastercard, Rivian, Unilever, and over 60% of the Fortune 500 — rely on Databricks to build and scale data and AI apps, analytics and agents. Headquartered in San Francisco with 30+ offices around the globe, Databricks offers a unified Data Intelligence Platform that includes Agent Bricks, Genie, Lakebase, Lakeflow, Lakehouse, and Unity Catalog.
Video Description
Large Language Models (LLMs) are changing how users search for, interact with, and generate content. Several stacks and toolkits for Retrieval-Augmented Generation (RAG) have recently emerged, enabling users to build applications such as chatbots that use LLMs on their private data. However, while setting up naive RAG is straightforward, building production RAG is very challenging, especially as users scale to larger and more complex data sources. A classic example is a large number of PDFs with embedded tables. RAG is only as good as your data, and developers must carefully consider how to parse, ingest, and retrieve their data to successfully build RAG over complex documents.

This session walks through that entire process: you will get an overview of building a RAG pipeline that can handle messy, complicated PDF documents. This includes a parsing strategy for complex documents with embedded objects and an indexing strategy that goes beyond simple chunking techniques. We will then explore various advanced retrieval algorithms for answering questions about tabular and unstructured data, and discuss their use cases and tradeoffs.

Talk By: Jerry Liu, Co-founder and CEO, LlamaIndex

Here's more to explore:
LLM Compact Guide: https://dbricks.co/43WuQyb
Big Book of MLOps: https://dbricks.co/3r0Pqiz

Connect with us:
Website: https://databricks.com
Twitter: https://twitter.com/databricks
LinkedIn: https://www.linkedin.com/company/data…
Instagram: https://www.instagram.com/databricksinc
Facebook: https://www.facebook.com/databricksinc