How to Scrape and Extract Data with Langchain GPT Function Calling

Description
the extractooor

#gpt4 #gpt3 #ai #python #webscraping
Comments

Love the videos! Enjoying the straight-to-the-point and fun commentary. Very honest and very helpful!

fortestingpurposesonly

Just wanted to drop a comment to say thank you for creating and sharing this insightful video on how to use LangChain and ChatGPT for web scraping and data extraction. The step-by-step demonstration using Python, Beautiful Soup, and Playwright was clear and extremely easy to follow.
Keep up the excellent work and I'm looking forward to your future content. Thanks again!

CalmCascade.

Great video. I learn so much just from reading other people's code.

jsfnnyc

I accidentally landed here and now I'm subscribed... you're lit 🔥

SigmaScorpion

I am getting a NotImplementedError when running the async Playwright function. Unable to figure out why.

uiucdsc
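
A minimal sketch of one likely fix, assuming the NotImplementedError above comes from running async Playwright on Windows under a selector event loop (the default inside Jupyter), which cannot spawn the browser subprocess; fetch_html and the example URL are placeholders, not code from the video:

```python
# Not from the video: force the proactor event loop policy on Windows before any
# loop is created, then run as a plain script instead of inside the notebook.
import sys
import asyncio
from playwright.async_api import async_playwright

if sys.platform == "win32":
    # The proactor loop supports subprocesses; the selector loop does not.
    asyncio.set_event_loop_policy(asyncio.WindowsProactorEventLoopPolicy())

async def fetch_html(url: str) -> str:
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()
        await page.goto(url)
        html = await page.content()
        await browser.close()
        return html

if __name__ == "__main__":
    print(len(asyncio.run(fetch_html("https://example.com"))))
```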

Hey, you don't have to re-declare the function in each cell!
Also, I would like to see if generating the schemas can also be done using the OpenAI API.

qnskjyp
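
On the second question, one reading is passing an extraction schema straight to the OpenAI function-calling API instead of going through LangChain. A sketch under that assumption (openai>=1.0 Python SDK; the extract_items function and its fields are illustrative, not the video's schema):

```python
# Not from the video: call the chat completions API directly with a hand-written
# function schema and force the model to return structured arguments.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

extraction_fn = {
    "name": "extract_items",
    "description": "Extract structured records from raw page text.",
    "parameters": {
        "type": "object",
        "properties": {
            "items": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "name": {"type": "string"},
                        "summary": {"type": "string"},
                    },
                    "required": ["name"],
                },
            }
        },
        "required": ["items"],
    },
}

def extract(text: str) -> dict:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": f"Extract the items from:\n{text}"}],
        functions=[extraction_fn],
        function_call={"name": "extract_items"},  # force the structured output
    )
    return json.loads(response.choices[0].message.function_call.arguments)

print(extract("Acme Widget - a rugged widget for outdoor use."))
```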

I tried using function calling for invoice data extraction, but when the schema content and descriptions got big I noticed a weird regression where GPT returns a weird {text: nonsense} instead of the valid schema. For reference, I was using GPT-3.5 1106.

MohamedJemai-pwgn

How can we use a vector store as input for the LangChain extraction chain?

aadhilimam
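
A hedged sketch of one way to do this, assuming an older LangChain release that still exposes create_extraction_chain, plus faiss installed; the schema and sample texts are illustrative only:

```python
# Not from the video: retrieve relevant chunks from a vector store, then run the
# extraction chain over each chunk's text.
from langchain.chains import create_extraction_chain
from langchain.chat_models import ChatOpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS

schema = {
    "properties": {
        "company": {"type": "string"},
        "total": {"type": "string"},
    },
    "required": ["company"],
}

llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)
chain = create_extraction_chain(schema, llm)

# These texts would normally come from your scraped pages or loaded documents.
texts = ["Invoice from Acme Corp, total due $1,200.", "Receipt: Globex, $98.50"]
store = FAISS.from_texts(texts, OpenAIEmbeddings())

# Pull only the chunks relevant to the query, then extract from their content.
docs = store.similarity_search("invoice totals", k=2)
results = [chain.run(doc.page_content) for doc in docs]
print(results)
```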

Rate limit exceeded error from LangChain after several tries. What do you recommend, Tyler?

mertzorlu
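
A sketch of one common mitigation for rate-limit errors: retry with exponential backoff and keep requests small and sequential. tenacity is an extra dependency here, and the schema is illustrative, not the one from the video:

```python
# Not from the video: wrap the extraction call so any rate-limit error triggers
# a wait-and-retry with exponential backoff; max_retries on ChatOpenAI is another knob.
from tenacity import retry, stop_after_attempt, wait_exponential
from langchain.chat_models import ChatOpenAI
from langchain.chains import create_extraction_chain

llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0, max_retries=6)
schema = {"properties": {"title": {"type": "string"}}}  # illustrative schema
chain = create_extraction_chain(schema, llm)

@retry(wait=wait_exponential(multiplier=1, min=2, max=60), stop=stop_after_attempt(5))
def extract_with_backoff(text: str):
    # Any exception raised inside (including RateLimitError) triggers a retry.
    return chain.run(text)

print(extract_with_backoff("Example page text about an article titled 'Hello'."))
```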

Also, one needs to fiddle with Selenium or Playwright instead of BS4 to navigate to/from pages.

qnskjyp

You could also add an option to save the results to a CSV file.

shivamkumar-qpjm
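
A sketch of that option using only the standard library; the field names, sample records, and file name are illustrative:

```python
# Not from the video: dump the extraction chain's list of dicts to a CSV file.
import csv

extracted = [
    {"name": "Widget A", "price": "$10"},
    {"name": "Widget B", "price": "$12"},
]

with open("extracted.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["name", "price"])
    writer.writeheader()
    writer.writerows(extracted)
```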

output = await run_player("url") fails for me; don't know why. I even installed asyncio and tried. If I remove the await it does not give correct output. Everything else is fine, so why is this happening?

hccuwwi
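
This usually comes down to where the code runs: top-level await only works where an event loop is already running (Jupyter/IPython), while in a plain script it is a syntax error, and calling the coroutine without await just returns a coroutine object rather than the result. A sketch with run_player stubbed out as a placeholder for the video's async scraping function:

```python
# Not from the video: run_player below is a stand-in for the real async function,
# which would launch Playwright and return page content.
import asyncio

async def run_player(url: str) -> str:
    await asyncio.sleep(0)          # placeholder for real async work
    return f"scraped {url}"

async def main():
    output = await run_player("https://example.com")
    print(output)

if __name__ == "__main__":
    # In a script there is no running event loop, so a bare "await" fails;
    # asyncio.run creates the loop and drives the coroutine to completion.
    asyncio.run(main())
```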

Bro. 😂 I came for copper but I found G O L D.

MarxOrx

Love your videos. Have you thought about making the code available through Google Colab?

nicolasmartinez

Is this a better method than gpt-engineer?

ronm

Any possibility to use an open source LLM to achieve similar results?

whackojaco

So just to be sure I understand this correctly...
- It will only scrape one page at a time, it won't do a full directory (say, a shared folder of Google docs)
- You have to know in advance what information you want; it looks for that specifically and generates output based on your query
- You cannot have it scan a number of pages/documents and *then* ask various questions about the content
- The info that it scrapes is not persistent from one query to the next, much less from one session to the next
- The scraped data is private to you, it does not get fed back into the model

Is that right?

Backstory: I'm an author. I'm looking for a way to feed all my manuscripts and copious notes, timelines, plot outlines, etc. into an LLM and then be able to ask it questions about the content. Sort of a virtual-assistant dynamic story bible that helps me keep all my details straight without having to take time to dig for the info myself. (Like "What color are Karen's eyes?" or "In which book did Joe meet Captain Huffer for the first time?")
I'm thinking GPT4All is my best bet for now, but boy all the demos I've seen of it are horrendously slow. I haven't yet found an online-hosted model that will
1) take that much data (we're talking multiple 100k-word novels, plus notes) and
2) keep it private to me, not feed it back into the model. (If you know of one, please tell! :) )

carriebartkowiak
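
For the backstory question above, a sketch of the usual retrieval pattern: index the manuscripts once in a persistent local vector store, then ask questions against it, so the data survives between sessions. This assumes an older LangChain release plus chromadb installed; book1.txt, the chunk sizes, and the model are placeholders. Note that with OpenAI models the retrieved excerpts are still sent to the API, so fully private use would require a locally hosted model instead:

```python
# Not from the video: build a persistent "story bible" index and query it.
from langchain.chat_models import ChatOpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import TextLoader
from langchain.chains import RetrievalQA

# Load and chunk one manuscript (repeat or glob for more files).
docs = TextLoader("book1.txt", encoding="utf-8").load()
chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=150).split_documents(docs)

# persist_directory keeps the index on disk, so it survives between sessions.
store = Chroma.from_documents(chunks, OpenAIEmbeddings(), persist_directory="story_bible_db")

qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model="gpt-3.5-turbo", temperature=0),
    retriever=store.as_retriever(search_kwargs={"k": 4}),
)
print(qa.run("What color are Karen's eyes?"))
```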