Extract Tables + Texts from .htm pages for RAG Using LLAMA-INDEX & UNSTRUCTURED

In this video, I will show you how to chat with .htm pages that contain text as well as tables. We will be using LlamaIndex, OpenAI, and Unstructured.
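
As a rough illustration of what "text as well as tables" means for an .htm page, here is a minimal sketch that splits HTML into text elements and table elements using only Python's standard library. It is not the Unstructured or LlamaIndex implementation; the class name `ElementSplitter`, the helper `split_elements`, and the sample page with its numbers are all made up for this example, and nested tables are not handled.

```python
from html.parser import HTMLParser


class ElementSplitter(HTMLParser):
    """Split an HTML page into ("text", str) and ("table", rows) elements."""

    def __init__(self):
        super().__init__()
        self.elements = []   # ordered output: ("text", str) or ("table", rows)
        self._rows = None    # rows of the table being read, None outside a table
        self._cell = None    # text fragments of the cell being read, else None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._rows = []
        elif tag == "tr" and self._rows is not None:
            self._rows.append([])
        elif tag in ("td", "th") and self._rows:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._rows[-1].append(" ".join(self._cell))
            self._cell = None
        elif tag == "table" and self._rows is not None:
            self.elements.append(("table", self._rows))
            self._rows = None

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._cell is not None:
            self._cell.append(text)          # text inside a table cell
        elif self._rows is None:
            self.elements.append(("text", text))  # text outside any table


def split_elements(html):
    """Hypothetical helper: parse the page and return its ordered elements."""
    parser = ElementSplitter()
    parser.feed(html)
    parser.close()
    return parser.elements


# Made-up sample page; the figures are placeholders, not real filing data.
sample = """
<p>Revenue grew in 2021.</p>
<table>
  <tr><th>Year</th><th>Revenue</th></tr>
  <tr><td>2020</td><td>100</td></tr>
  <tr><td>2021</td><td>150</td></tr>
</table>
<p>See notes for details.</p>
"""
elements = split_elements(sample)
```

Each table comes out as a list of rows, which could then be loaded into a data frame or summarized separately from the surrounding text.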

Happy Learning 😎

#llamaindex #llm #rag #semistructuredrag #datasciencebasics
Comments
Author

🎯 Key Takeaways for quick navigation:

00:00 📚 Introduction to the video topic
- The video covers extracting information from .htm files using LlamaIndex and Unstructured.
- The speaker mentions that the previous video was about extracting information from PDF files.
- The speaker uses a Tesla 10-K filing as the example for this video.
01:09 📖 Explanation of the process
- Explanation of how the information extraction process works.
- The speaker explains how the document is partitioned into different elements and how tables are identified and converted into data frames.
- The speaker also explains how the information is indexed and retrieved.
04:14 💻 Code implementation
- The speaker starts the code implementation part of the video.
- The speaker explains how to install LlamaIndex and how to extract data.
- The speaker also explains how to read .htm files and how to extract tables from the data.
07:14 🗝️ Setting up the API key
- The speaker explains how to get the API key and how to use it in the code.
- The speaker also explains how to create a pickle file and how to extract information from the documents.
09:27 🔄 Setting up the recursive retriever
- The speaker explains how to set up the recursive retriever.
- The speaker explains how to create a vector index and how to create a vector retriever.
- The speaker also explains how to create a query engine.
11:14 📊 Asking questions to the system
- The speaker starts asking questions to the system to demonstrate how it works.
- The speaker asks questions that require information from tables and questions that do not require information from tables.
- The speaker also compares the answers given by the system when using the recursive retriever and when not using it.
14:27 🎬 Conclusion of the video
- The speaker concludes the video by summarizing what was covered.
- The speaker mentions that he is exploring different RAG implementations and encourages viewers to do the same.
- The speaker also mentions that he enjoys creating videos about Unstructured and believes it is revolutionizing RAG implementations.
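
The recursive retrieval idea summarized above can be sketched in a few lines of plain Python. This is only a toy under stated assumptions, not LlamaIndex's recursive retriever: word overlap stands in for embedding similarity, and the names `nodes`, `tables`, `score`, and `retrieve`, along with the sample data, are hypothetical.

```python
import re


def tokens(text):
    """Lowercase word set; a crude stand-in for an embedding."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def score(query, text):
    """Number of words shared by the query and a node's text."""
    return len(tokens(query) & tokens(text))


# Top-level nodes: plain text chunks plus short table summaries.
# A summary node carries a "ref" to the full table it describes.
nodes = [
    {"text": "The company describes its business and risk factors.", "ref": None},
    {"text": "Table of revenue by year", "ref": "table-1"},
]

# Full tables live outside the top-level index and are reached only via "ref".
# The figures are placeholders, not real filing data.
tables = {
    "table-1": [["Year", "Revenue"], ["2020", "100"], ["2021", "150"]],
}


def retrieve(query):
    """Return the best-matching node, following a table reference if present."""
    best = max(nodes, key=lambda node: score(query, node["text"]))
    if best["ref"] is not None:
        # Recursive step: swap the summary for the table it points to.
        return {"kind": "table", "content": tables[best["ref"]]}
    return {"kind": "text", "content": best["text"]}


table_hit = retrieve("What was the revenue in each year?")
text_hit = retrieve("What are the risk factors for the business?")
```

The point of the design is that only a short summary of each table is matched against the question, while the full table is handed to the language model once its summary is selected.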

Made with HARPA AI

twoplustwo
Author

Thanks for the video. It would be nice to see a video on how to integrate structured data into RAG in a real-world application. I think most companies keep their data in structured sources, and we always need to access these sources in conjunction with our LLM applications, for example feeding sales data into the LLM application for visualization and decision making.

Pure_Science_and_Technology
Author

Can you explain how we can persist these indexes into a vector database like Milvus?

harshsavasil
Author

Is anyone else getting "Embeddings have been explicitly disabled. Using MockEmbedding." after pickle.dump? As a result, node_mappings_2021 is empty and I can't properly retrieve table data.

AndresBribiesca
Author

Can I switch the retriever to a multi-vector retriever and add a raw PDF?

tom_
Author

Hi! This is helpful! Could you provide code showing how to change the default LLM and embedding model?

RaymondCruzin
Author

Great video! It is just what I need, but do you know if it still works? It doesn't create the pickle file at all (raw_nodes_2021).

Author

The code does not work. It produces an error when indexing the nodes in the line

salwamostafa
Author

Super video as always. I have images inside PDF/DOCX files and tables inside CSV and XLSX files. Would Unstructured with LLMs help with this? Can you share the link for that website? I will explore...

VenkatesanVenkat-fdhg
Author

I have a question about my career growth. I am working as a senior data scientist (5 years of experience) and started exploring Databricks after being inspired by your playlist. I plan to learn more about data engineering tools next. Is this a good area for a data scientist to focus on for self-improvement?

VenkatesanVenkat-fdhg
Author

Kindly update the notebook. It is not working.

kelvinadungosi
Author

Shouldn't it be possible to use an LLM other than OpenAI?

timtensor