Industrial-scale Web Scraping with AI & Proxy Networks

Learn advanced web scraping techniques with Puppeteer and BrightData's scraping browser. We collect ecommerce data from sites like Amazon, then analyze that data with ChatGPT.

#javascript #datascience #chatgpt

Beyond Fireship

Related videos

Comments

I like how he didn't use the word "cheap" once in the entire video, because my god, the pricing on the advertised product is absolute madness.

rvft

As a freelance dev I get contacted all the time for scraping; it's definitely one of the most requested jobs, along with WordPress (which I also don't work with).

albiceleste

Your videos are somehow exactly relevant to the code I am writing every week - interesting for sure!

Maneki-Nico

This reminds me of when I solved 100 captchas manually so that I could download some data files from a website for an AI. I got a severe message temporarily banning me from the website, saying that I must be a bot. I learned my lesson and from then on stuck to solving only 99 captchas each day until I had enough data files.

alexcasillas

As a web scraping tool developer, one thing to note about the ChatGPT code for extracting product names etc. is that it's not going to work in all cases. We can see there are some random class names like '._cDEzb', and these classes can vary from page to page, so code written for one listing page might not work for another. The way I handle this is with more advanced query selectors that don't rely on unreliable classes. I can go into more detail if required.

EliteGamerpk
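The approach in the comment above (avoiding generated class names such as `_cDEzb`) can be sketched as a small helper that builds a selector from an element's attributes. This is a minimal sketch, not the commenter's actual code: the helper names `looksGenerated`, `quote` and `buildSelector`, the list of preferred attributes, and the rough "generated class" heuristic are all assumptions for illustration. It prefers `id`, then `data-testid`, `aria-label` or `name`, and only falls back to classes that do not look machine-generated.

```javascript
// Rough, hypothetical heuristic for class names that are likely generated
// by a build tool and may change between pages or deploys.
function looksGenerated(className) {
  return (
    className.startsWith("_") || // e.g. "_cDEzb"
    /^(css|sc)-/.test(className) || // common CSS-in-JS prefixes
    (className.length <= 10 && /\d/.test(className) && /[a-z]/i.test(className)) // e.g. "x1y2z3"
  );
}

// Wrap an attribute value in double quotes, escaping \ and " for CSS.
function quote(value) {
  return '"' + value.replace(/\\/g, "\\\\").replace(/"/g, '\\"') + '"';
}

// Build a selector for one element from its tag and attributes,
// preferring stable hooks over class names.
function buildSelector(tag, attrs) {
  if (attrs.id) return `${tag}[id=${quote(attrs.id)}]`;
  for (const key of ["data-testid", "aria-label", "name"]) {
    if (attrs[key]) return `${tag}[${key}=${quote(attrs[key])}]`;
  }
  // Fall back to the classes that survive the heuristic, or the bare tag.
  const stable = (attrs.class || "")
    .split(/\s+/)
    .filter((c) => c && !looksGenerated(c));
  return stable.length > 0 ? tag + stable.map((c) => "." + c).join("") : tag;
}

console.log(buildSelector("span", { class: "a-size-medium _cDEzb a-text-normal" }));
// → span.a-size-medium.a-text-normal
```

In Puppeteer the resulting string could be passed to `page.$$(selector)`. The heuristic will misfire on some sites (for example, it treats any short class containing a digit as generated), so it is a starting point rather than a rule.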

To be frank, out of all YouTubers, Fireship has the most interesting and to-the-point videos and gives the most value for the time spent. I'm kind of just wondering how he keeps track of all the varied topics and is able to make the most of them.

yashkhd

Toward the end of the video, Jeff suggests that you can grab all the links and then make requests to those links. It gave me flashbacks of another video on the main channel where a company did this and ended up with a 70k+ GCP bill after one night of web scraping, because their compute instance was recursing forever and could scale up to 1000 instances lmao

YuriG
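The runaway-crawl scenario described above is usually prevented with a visited set plus hard limits on depth and page count. The following is a minimal sketch under assumptions: `crawl` and the `getLinks` callback are hypothetical names, and the limits are placeholder defaults. It does a breadth-first crawl that stays on the starting origin.

```javascript
// Breadth-first crawl with hard limits so following links cannot recurse forever.
// getLinks(url) is a hypothetical async callback returning the hrefs found on a
// page; in a real scraper it might load the page with Puppeteer and read <a href>s.
async function crawl(startUrl, getLinks, { maxPages = 100, maxDepth = 2 } = {}) {
  const start = new URL(startUrl);
  start.hash = "";
  const origin = start.origin;
  const visited = new Set([start.href]); // never enqueue the same URL twice
  const queue = [{ url: start.href, depth: 0 }];
  const crawled = [];

  while (queue.length > 0 && crawled.length < maxPages) {
    const { url, depth } = queue.shift();
    crawled.push(url);
    if (depth >= maxDepth) continue; // count this page but do not follow its links

    for (const href of await getLinks(url)) {
      let next;
      try {
        next = new URL(href, url); // resolve relative links like "/a"
      } catch {
        continue; // skip malformed hrefs
      }
      next.hash = ""; // "/b#reviews" is the same page as "/b"
      // Stay on the starting site and skip anything already queued.
      if (next.origin !== origin || visited.has(next.href)) continue;
      visited.add(next.href);
      queue.push({ url: next.href, depth: depth + 1 });
    }
  }
  return crawled;
}
```

The visited set stops cycles (page A links to B, B links back to A), and the two caps bound the total work even if a site generates an endless stream of unique URLs, such as infinite pagination.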

Thanks Jeff. I was planning on building a project that uses web scraping and this video absolutely dropped at the perfect time. Appreciate it. I love your videos and hope for more such content in the future :)

prabhavkhera

If I'm not mistaken, already returns the element handle, so you don't need to use page.$(selector) after that.
Anyway, great video, as always.
Thank you! ❤

xanderbarkhatov

Zeus Proxy's specific emphasis on session management is a key factor that resonates with my goal of executing data retrieval tasks with a focus on mimicking genuine user behaviors.

Loubensdoriscar

I love how there are legit businesses for bypassing captchas and messing with data :)

meansnada

Web scraping is still my favourite type of project. It's so fun and "meaningful" to me, and with the help of AI I can see it becoming much, much easier.

wtfdoiputhere

An extraordinary piece of video material that has proven highly useful for our new team members. Your generosity is immensely appreciated!

Autoscraping

I remember that web scraping was a nightmare to deal with, especially doing the proxy rotation ourselves. This tool is not cheap, though, so at least here in Brazil (and in other emerging countries like it), companies will still be doing it the old way. When I worked at a company that mined this kind of data a few years ago, the captcha solving was actually done by real people, but I guess this can be automated with GPT-4 tools now.

DanielLavedoniodeLima_DLL
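The do-it-yourself proxy rotation mentioned above can be sketched as a round-robin pool that skips proxies which keep failing. This is a hypothetical `ProxyPool` class, not any vendor's API; the proxy URLs and the failure threshold are placeholders. With Puppeteer, one common way to apply a chosen proxy is the Chromium `--proxy-server` launch argument, for example `puppeteer.launch({ args: ['--proxy-server=http://host:port'] })`, so each browser instance uses a single proxy.

```javascript
// Hypothetical round-robin proxy pool: hands out proxies in turn and
// benches any proxy that fails too often in a row (e.g. blocked or captcha'd).
class ProxyPool {
  constructor(proxies, maxFailures = 3) {
    if (proxies.length === 0) throw new Error("need at least one proxy");
    this.proxies = proxies.map((url) => ({ url, failures: 0 }));
    this.maxFailures = maxFailures;
    this.next = 0; // index where the next search starts
  }

  // Return the next healthy proxy URL, or null if every proxy is benched.
  acquire() {
    for (let i = 0; i < this.proxies.length; i++) {
      const index = (this.next + i) % this.proxies.length;
      const entry = this.proxies[index];
      if (entry.failures < this.maxFailures) {
        this.next = (index + 1) % this.proxies.length;
        return entry.url;
      }
    }
    return null;
  }

  reportFailure(url) {
    const entry = this.proxies.find((e) => e.url === url);
    if (entry) entry.failures += 1;
  }

  reportSuccess(url) {
    const entry = this.proxies.find((e) => e.url === url);
    if (entry) entry.failures = 0; // a success resets the failure streak
  }
}
```

Resetting the failure count on success keeps one transient error from permanently benching a proxy, and `acquire()` returning `null` is the signal to stop or wait rather than keep hitting the site.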

Man, you are reading my thoughts! This video came at the right time, just when I wanted to scrape some websites!!!!

Rufeg

You've foiled my plan 5 years in the making. At least now I have a free $10 credit for Brightdata to catch up. Thanks Fireship!

Jeanseb

Wow, you’re on the cutting edge of technology 🤯

shawnvirdree

Thank you for teaching me Puppeteer and Bright Data; this beats all the other content on the internet.

bpdxirw

One thing, Jeff: these websites change CSS class names on every refresh, so it's better to write selectors that don't change, like id or aria-label.

BharadwajGiridhar
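One way to check the claim above on a particular site is to capture the class attribute of the same element on a few page loads and keep only the classes that never changed. The helpers `stableClasses` and `selectorFromStable` are hypothetical; in Puppeteer the snapshots could be collected with `page.$eval(selector, (el) => el.getAttribute('class'))` after each reload, which itself needs a selector that already works, such as an id or aria-label.

```javascript
// Given the class attribute of the same element captured on several page
// loads, keep only the classes present every time. Anything that changed
// between refreshes is unsafe to put in a selector.
function stableClasses(...snapshots) {
  if (snapshots.length === 0) return [];
  const sets = snapshots.map((s) => new Set(s.split(/\s+/).filter(Boolean)));
  // Preserve the order of the first snapshot for readable selectors.
  return [...sets[0]].filter((c) => sets.every((set) => set.has(c)));
}

// Turn the surviving classes into a selector, or return null when none
// survived so the caller can fall back to an id or aria-label instead.
function selectorFromStable(tag, classes) {
  return classes.length > 0 ? tag + classes.map((c) => "." + c).join("") : null;
}
```

If `stableClasses` comes back empty for an element, that is a strong hint to switch to attribute-based selectors as the comment suggests.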

I love you, man. I was trying for so long and you are the only one who gave the solution. Thank you so much!

CODE_YOUR_TYPE

Industrial-scale Web Scraping with AI & Proxy Networks

Scrape ANY Website with AI For Free | Best AI Tools

Industrial-scale Web Scraping with AI & Proxy Networks #airevolution #aitechnology #technology

Web Scraping 101: A Million Dollar Project Idea

How To Use ChatGPT To Fully Automate Web Scraping

The easiest website SCRAPER of the year, Browse.ai is here to stomp APIFY

Scraping ALL the web data using AI!

The Biggest Mistake Beginners Make When Web Scraping

The Shocking Truth About Browse Ai: Breaking the Web Scraping Industry

Am I going to jail for web scraping?

Web Scraping For AI: New Trend Emerges #shorts #proxymarketresearch23

Web Scraping Using ChatGPT #openai #chatgpt #webscraping | extract data from website

Automate web scraping with Python and AI (LangChain tutorial)

How to scrape any website with AI 😱

Lessons learnt from Web Scraping 9 billion pages

How to Make Money using AI Web Scraper

Unleashing the Power of MrScraper AI - Next-Generation Web Scraping Tool

Is This The Best Way to Scrape at Scale?

Extracting or scraping web pages with AI

Scraping the web with the help of AI - NodeJS/Puppeteer Tutorial

How to make an AI Web Scraper with ChatGPT and Axiom.ai

Scrapy Course – Python Web Scraping for Beginners

Web Scraping with ChatGPT Code Interpreter is Mind-Blowing!

The Ultimate Scraper Tutorial | Extract Data Without Code