Extract Tables + Texts from .htm pages for RAG Using LLAMA-INDEX & UNSTRUCTURED


In this video, I will show you how to chat with .htm pages that contain both text and tables. We will be using LlamaIndex, OpenAI, and Unstructured.

Happy Learning 😎


#llamaindex #llm #rag #semistructuredrag #datasciencebasics


Comments

🎯 Key Takeaways for quick navigation:

00:00 📚 Introduction to the video topic
- The video covers extracting information from .htm files using LlamaIndex and Unstructured.
- The speaker mentions that the previous video was about extracting information from PDF files.
- The speaker uses a Tesla 10-K filing as the example for this video.
01:09 📖 Explanation of the process
- The speaker explains how the information extraction process works.
- The document is partitioned into different elements, and tables are identified and converted into data frames.
- The speaker also explains how the information is indexed and retrieved.
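The partitioning step described above can be sketched with the Python standard library alone. This is only an illustration of the idea, not the Unstructured API used in the video: the `ElementPartitioner` class, the `partition` and `table_to_records` helpers, and the sample HTML are all hypothetical, plain dicts stand in for data frames, and nested tables are not handled.

```python
from html.parser import HTMLParser

# Tags treated as paragraph boundaries in this sketch (an assumption).
BLOCK_TAGS = ("p", "div", "h1", "h2", "h3")


class ElementPartitioner(HTMLParser):
    """Split an HTML page into ("text", str) and ("table", rows) elements."""

    def __init__(self):
        super().__init__()
        self.elements = []   # ordered output elements
        self._rows = None    # rows of the table being parsed, None outside a table
        self._cell = None    # text pieces of the current cell, None outside a cell
        self._text = []      # text pieces collected outside tables

    def _flush_text(self):
        # Collapse whitespace and emit the pending text as one element.
        text = " ".join(" ".join(self._text).split())
        if text:
            self.elements.append(("text", text))
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._flush_text()
            self._rows = []
        elif tag == "tr" and self._rows is not None:
            self._rows.append([])
        elif tag in ("td", "th") and self._rows:
            self._cell = []
        elif tag in BLOCK_TAGS and self._rows is None:
            self._flush_text()

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._rows[-1].append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "table" and self._rows is not None:
            self.elements.append(("table", self._rows))
            self._rows = None
        elif tag in BLOCK_TAGS and self._rows is None:
            self._flush_text()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)
        elif self._rows is None:
            self._text.append(data)


def partition(html):
    """Return the page as an ordered list of text and table elements."""
    parser = ElementPartitioner()
    parser.feed(html)
    parser.close()
    parser._flush_text()
    return parser.elements


def table_to_records(rows):
    """Treat the first row as the header and return one dict per data row."""
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


# Made-up sample page, for illustration only.
sample = """
<p>Some narrative text.</p>
<table>
  <tr><th>Year</th><th>Value</th></tr>
  <tr><td>2020</td><td>100</td></tr>
</table>
"""
for kind, content in partition(sample):
    print(kind, content)
```

Keeping tables as separate elements is what lets a later step summarize and index them differently from the running text.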
04:14 💻 Code implementation
- The speaker starts the code implementation part of the video.
- The speaker explains how to install LlamaIndex and how to extract data.
- The speaker also explains how to read .htm files and how to extract tables from the data.
07:14 🗝️ Setting up the API key
- The speaker explains how to get the API key and how to use it in the code.
- The speaker also explains how to create a pickle file and how to extract information from the documents.
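The API key and pickle steps can be sketched as follows. This is a minimal sketch, not the notebook's code: `get_api_key` and `load_or_build` are hypothetical helper names, and the only detail taken from outside the summary is that `OPENAI_API_KEY` is the environment variable the OpenAI client reads.

```python
import os
import pickle


def get_api_key(var_name="OPENAI_API_KEY"):
    """Read the key from the environment instead of hard-coding it."""
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first.")
    return key


def load_or_build(cache_path, build_fn):
    """Return the pickled result if the file exists, otherwise build and cache it."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    result = build_fn()  # the slow step, e.g. extracting nodes from the document
    with open(cache_path, "wb") as f:
        pickle.dump(result, f)
    return result


# Hypothetical usage; the file name and extract_nodes() are assumptions:
# raw_nodes = load_or_build("raw_nodes_2021.pkl", lambda: extract_nodes(docs))
```

Caching this way means the extraction, which may call the LLM, runs once; later runs load the pickle. If the first run fails partway, no file is written, so the next run simply tries again.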
09:27 🔄 Setting up the recursive retriever
- The speaker explains how to set up the recursive retriever.
- The speaker explains how to create a vector index and how to create a vector retriever.
- The speaker also explains how to create a query engine.
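The idea behind recursive retrieval can be shown with a toy example: table summaries are searched alongside ordinary text, and a hit on a summary is followed to the full table it points to. Everything here is an assumption made for illustration: the `Node` class, the word-overlap `score` (a crude stand-in for embedding similarity), and the sample data. It is not the LlamaIndex implementation.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    node_id: str
    text: str                     # the text that is searched
    ref_id: Optional[str] = None  # set on summary nodes that point to a table


def tokens(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def score(query, text):
    """Jaccard word overlap between query and text."""
    q, t = tokens(query), tokens(text)
    return len(q & t) / len(q | t) if q and t else 0.0


def retrieve(query, nodes, top_k=1):
    """Plain retrieval: return the top_k best-scoring nodes."""
    return sorted(nodes, key=lambda n: score(query, n.text), reverse=True)[:top_k]


def recursive_retrieve(query, nodes, tables, top_k=1):
    """Retrieve nodes, then swap each summary node for the table it references."""
    results = []
    for node in retrieve(query, nodes, top_k):
        if node.ref_id is not None:
            results.append(("table", tables[node.ref_id]))
        else:
            results.append(("text", node.text))
    return results


# Made-up data, for illustration only.
nodes = [
    Node("n1", "The company designs and sells electric vehicles."),
    Node("n2", "Table summarizing revenue by year", ref_id="t1"),
]
tables = {"t1": [["Year", "Revenue"], ["2020", "100"], ["2021", "150"]]}

print(recursive_retrieve("What was revenue in each year?", nodes, tables))
print(recursive_retrieve("What does the company sell?", nodes, tables))
```

Without the second hop, the first query would return only the one-line summary; following `ref_id` gives the answering step the actual table rows.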
11:14 📊 Asking questions to the system
- The speaker asks the system questions to demonstrate how it works.
- Some questions require information from tables, and others do not.
- The speaker also compares the system's answers with and without the recursive retriever.
14:27 🎬 Conclusion of the video
- The speaker concludes the video by summarizing what was covered.
- The speaker mentions that he is exploring different RAG implementations and encourages viewers to do the same.
- The speaker also mentions that he is enjoying creating videos about Unstructured and believes it is revolutionizing RAG implementations.

Made with HARPA AI

twoplustwo

Thanks for the video. It would be nice to see a video on how to integrate our structured data into RAG in a real-world application. I think the majority of companies have their data in structured sources. We always need to access these sources in conjunction with our LLM applications, for example feeding sales data or anything else into the LLM application process for visualization and decision making.

Pure_Science_and_Technology

Can you explain how we can persist these indexes into a vector database like Milvus?

harshsavasil

Is anyone else getting "Embeddings have been explicitly disabled. Using MockEmbedding." after pickle.dump? As a result, node_mappings_2021 is empty and I can't properly retrieve table data.

AndresBribiesca

Can I switch the retriever to a multi-vector retriever and add a raw PDF?

tom_

Hi! This is helpful! Could you provide code showing how to modify the default LLM and embed model?

RaymondCruzin

Great video! It is just what I need, but do you know if it still works? It doesn't create the pickle file at all (raw_nodes_2021).

The code does not work. It produces an error when indexing the nodes in the line

salwamostafa

Super video as always. I have images inside PDF/DOCX files and tables inside CSV and XLSX files. Will Unstructured with LLMs help with this? Can you share the link for that website? I will explore...

VenkatesanVenkat-fdhg

I have a question about my career growth. I am working as a senior data scientist (5 years of experience) and started exploring Databricks in depth, inspired by your playlist. Next, I plan to learn more about data engineering tools. Can a data scientist focus on this for self-improvement?

VenkatesanVenkat-fdhg

Kindly update the notebook. It is not working.

kelvinadungosi

Shouldn't it be possible to use another LLM apart from OpenAI?

timtensor
