Industrial-scale Web Scraping with AI & Proxy Networks

Learn advanced web scraping techniques with Puppeteer and BrightData's scraping browser. We collect ecommerce data from sites like Amazon, then analyze that data with ChatGPT.

#javascript #datascience #chatgpt

Comments

I like how he didn't use "cheap" during the entire video, because my god, the pricing on the advertised product is absolute madness.

rvft

As a freelance dev I get contacted all the time for scraping. It's definitely one of the most requested jobs, along with WordPress (which I also don't work with).

albiceleste

Your videos are somehow exactly relevant to the code I am writing every week - interesting for sure!

Maneki-Nico

This reminds me of when I solved 100 captchas manually so that I could download some data files from a website for an AI. I got a server message temporarily banning me from the website, saying that I must be a bot. I learned my lesson and stuck to solving only 99 captchas each day from then on, until I had enough data files.

alexcasillas

As a web scraping tool developer, one thing to note about the ChatGPT code for extracting product names etc. is that it's not going to work in all cases. What I mean is that we can see some random class names like '._cDEzb', and these classes can vary from page to page. So your code for one listing page might not work for another. The way I do this is with more advanced query selectors that don't rely on unreliable classes. I can go into more detail if required.

EliteGamerpk
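
One way to act on this, as a minimal sketch: key the extraction on attributes and tag names instead of generated class names. The selectors below (`data-component-type`, `data-asin`, `h2`) are assumptions about the listing markup, not something confirmed in the video, so check them against the live page first. `extractProducts` is a hypothetical helper, and `page` is expected to be a Puppeteer `Page`.

```javascript
// Assumed selectors: none of them depend on generated class names.
const SELECTORS = {
  card: '[data-component-type="s-search-result"]', // assumption about the markup
  title: 'h2',
  link: 'a[href]',
};

// Hypothetical helper: returns one plain object per product card.
async function extractProducts(page, selectors = SELECTORS) {
  // page.$$eval runs the callback in the browser context, so the
  // selectors object has to be passed in as an extra argument.
  return page.$$eval(
    selectors.card,
    (cards, sel) =>
      cards.map((card) => ({
        id: card.getAttribute('data-asin'), // assumed attribute
        title: card.querySelector(sel.title)?.textContent.trim() ?? null,
        url: card.querySelector(sel.link)?.href ?? null,
      })),
    selectors
  );
}
```

Missing fields come back as `null` rather than throwing, which makes it easier to spot a page variant where one of the assumed selectors no longer matches.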

To be frank, out of all YouTubers, Fireship has the most interesting and to-the-point videos and gives the most value for the time spent. I'm just wondering how he keeps track of all these varied topics and manages to make the most of them.

yashkhd

Toward the end of the video, Jeff suggests that you can grab all the links and then make requests to those links. It gave me flashbacks to another video on the main channel where a company did this and ended up with a 70k+ GCP bill after one night of web scraping, because their compute instance kept recursing forever and could scale up to 1000 instances lmao

YuriG
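
A minimal sketch of how a link-following crawl can be kept bounded, assuming a breadth-first traversal with a visited set, a depth limit, and a hard page cap. The `crawl` function, its `visit` callback, and the `maxPages` and `maxDepth` options are all hypothetical names for illustration; in practice `visit` would load the URL (for example with Puppeteer), scrape it, and resolve to the links found on that page.

```javascript
// Breadth-first crawl that cannot recurse forever:
// - `visited` stops cycles (page A links to B, B links back to A)
// - `maxDepth` limits how far from the start URL links are followed
// - `maxPages` is a hard cap on the number of pages visited
async function crawl(startUrl, visit, { maxPages = 50, maxDepth = 2 } = {}) {
  const visited = new Set();
  const queue = [{ url: startUrl, depth: 0 }];

  while (queue.length > 0 && visited.size < maxPages) {
    const { url, depth } = queue.shift();
    if (visited.has(url)) continue;
    visited.add(url);

    // `visit` scrapes the page and resolves to an array of link URLs.
    const links = await visit(url);
    if (depth >= maxDepth) continue; // scrape this page, but go no deeper

    for (const link of links) {
      if (!visited.has(link)) queue.push({ url: link, depth: depth + 1 });
    }
  }
  return [...visited];
}
```

Because the page cap is checked before every visit, the number of requests is at most `maxPages` no matter how the links loop back on each other.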

Thanks Jeff. I was planning on building a project that uses web scraping and this video absolutely dropped at the perfect time. Appreciate it. I love your videos and hope for more such content in the future :)

prabhavkhera

If I'm not mistaken, already returns the element handle, so you don't need to use page.$(selector) after that.
Anyway, great video, as always.
Thank you! ❤

xanderbarkhatov
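
The method name is missing from the comment above. Assuming it refers to Puppeteer's `page.waitForSelector`, the point holds: that call resolves to an `ElementHandle` for the matched element, so a second `page.$(selector)` lookup is not needed. A minimal sketch, where `getText` is a hypothetical helper and `page` is a Puppeteer `Page`:

```javascript
// Hypothetical helper: wait for an element and read its text in one pass.
async function getText(page, selector, timeout = 10000) {
  // Resolves to the ElementHandle once the element appears
  // (it can only be null when waiting with `hidden: true`).
  const handle = await page.waitForSelector(selector, { timeout });
  // Reuse the handle directly instead of querying the page again.
  return handle.evaluate((el) => el.textContent.trim());
}
```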

Zeus Proxy's specific emphasis on session management is a key factor that resonates with my goal of executing data retrieval tasks with a focus on mimicking genuine user behaviors.

Loubensdoriscar

I love how there are legit businesses for bypassing captchas and messing with data :)

meansnada

Web scraping is still my favourite type of project. It's so fun and "meaningful" to me, and with the help of AI I can see it becoming much, much easier.

wtfdoiputhere

An extraordinary piece of video material that has proven highly useful for our new team members. Your generosity is immensely appreciated!

Autoscraping

I remember that web scraping was a nightmare to deal with, especially doing the proxy rotation ourselves. This tool is not cheap, though, so at least here in Brazil (and in other emerging countries), companies will keep doing it the old way. At the company I worked for a few years ago, which mined that kind of data, the captcha solving was actually done by real people, but I guess it can be automated with GPT-4 tools now.

DanielLavedoniodeLima_DLL

Man, you are reading my thoughts! This video came at just the right time, when I wanted to scrape some websites!!!!

Rufeg

You've foiled my plan 5 years in the making. At least now I have a free $10 credit for Brightdata to catch up. Thanks Fireship!

Jeanseb

Wow, you’re on the cutting edge of technology 🤯

shawnvirdree

Thank you for teaching me Puppeteer and Bright Data. This beats all the other content on the internet.

bpdxirw

One thing, Jeff: these websites change CSS class names on every refresh. So it's better to write code with selectors that don't change, like an id or an aria-label.

BharadwajGiridhar
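
As a rough sketch of this suggestion, a scraper can try a short list of id and aria-label selectors in priority order and take the first one that matches. `firstMatch` is a hypothetical helper, and the selectors in the usage note are illustrative assumptions, not values taken from the video. `page` is a Puppeteer `Page`.

```javascript
// Hypothetical helper: try stable selectors in order, return the first hit.
async function firstMatch(page, selectors) {
  for (const selector of selectors) {
    // page.$ resolves to null when nothing matches the selector.
    const handle = await page.$(selector);
    if (handle) return { selector, handle };
  }
  return null; // no candidate matched: the page layout probably changed
}
```

A call might look like `await firstMatch(page, ['#productTitle', '[aria-label="Product title"]'])`, where both selectors are placeholders to be replaced after inspecting the target page.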

I love you, man. I was trying for so long and you are the only one who gave the solution. Thank you so much.

CODE_YOUR_TYPE