Extract Table Info From SCANNED PDF & Summarise It Using Llama3.1 via Ollama | LangChain

In this video, I explain how to extract table data from a scanned PDF and use it to summarise the table content with the Llama3 model via Ollama. As a bonus, I also demonstrate how to convert the data into a pandas DataFrame for further exploration if needed. Enjoy 😎
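
The workflow described above could look roughly like the following. This is a minimal sketch and not the code from the video: the function names (`extract_table_html`, `build_summary_prompt`, `summarise_table`, `table_to_dataframe`) and the prompt wording are hypothetical, and the model tag `llama3.1` is taken from the video title. It assumes the `unstructured[pdf]`, `langchain-ollama` and `pandas` packages are installed (plus an HTML parser such as `lxml` for `pandas.read_html`), and that a local Ollama server is running with the model already pulled.

```python
from io import StringIO


def extract_table_html(pdf_path):
    """Return the HTML of every table that unstructured detects in a PDF."""
    # Imported lazily: the hi_res pipeline loads OCR and layout-detection
    # dependencies, which is slow and only needed when this function runs.
    from unstructured.partition.pdf import partition_pdf

    elements = partition_pdf(
        filename=pdf_path,
        strategy="hi_res",           # layout model + OCR, suited to scanned pages
        infer_table_structure=True,  # fills metadata.text_as_html for tables
    )
    return [
        el.metadata.text_as_html
        for el in elements
        if el.category == "Table" and el.metadata.text_as_html
    ]


def build_summary_prompt(table_html):
    """Wrap one table's HTML in a summarisation instruction (illustrative wording)."""
    return (
        "You are given a table extracted from a scanned PDF, encoded as HTML.\n"
        "Summarise its content in a few sentences and name the key figures.\n\n"
        f"Table:\n{table_html}"
    )


def summarise_table(table_html, model="llama3.1"):
    """Ask a local Ollama model, through LangChain, to summarise one table."""
    from langchain_ollama import ChatOllama

    llm = ChatOllama(model=model, temperature=0)
    # invoke() returns a message object; its text is in .content
    return llm.invoke(build_summary_prompt(table_html)).content


def table_to_dataframe(table_html):
    """Parse the extracted HTML table into a pandas DataFrame."""
    import pandas as pd

    # read_html returns a list of DataFrames, one per <table> it finds
    return pd.read_html(StringIO(table_html))[0]
```

With those pieces, calling `extract_table_html("scan.pdf")` on a hypothetical file would give a list of HTML strings, each of which can be passed to `summarise_table` for a summary or to `table_to_dataframe` for further exploration. Passing the table as HTML rather than plain text is a design choice in this sketch: it keeps the row and column structure visible to the model.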

80% of enterprise data exists in difficult-to-use formats like HTML, PDF, CSV, PNG, PPTX, and more. Unstructured effortlessly extracts and transforms complex data for use with every major vector database and LLM framework.

Link ⛓️‍💥

Code 👨🏻‍💻
------------------------------------------------------------------------------------------

------------------------------------------------------------------------------------------
🤝 Connect with me:

#unstructureddata #llama3 #langchain #ollama #unstructuredio #llm #datasciencebasics
Comments
Author

Thank you for this video. It would be good if you tried the same with images as well. Images are not extracted properly from scanned copies. Can you recommend any other packages that extract images better?

Jeganbaskaran
Author

Thanks. I think unstructured is not open source. Can you suggest a PDF-to-data library that is completely free, such as tabula-py or pdfplumber? Have you tested these or anything else that performs better?

stanTrX
Author

Sir, can you make a video on LangGraph and Agents...

arpittalmale