Scraping Dynamic JavaScript Websites - Beautiful Soup Python


Gathering data from static websites is usually simple, but scraping data from dynamic websites can be challenging. Python is a popular choice for this task due to its many helpful libraries and extensive documentation.

This tutorial will guide you through the process of scraping dynamic websites. You'll learn how to use a browser to check if a website is dynamically rendered with JavaScript and locate AJAX calls that load additional data.
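Once the Network tab reveals the AJAX call that loads the data, the same endpoint can often be requested directly, without a browser. The sketch below is only an illustration: the payload shape (`quotes`, `text`, `author.name`) and any URL passed to `fetch_json` are assumptions, and the `X-Requested-With` header is something certain sites expect rather than a requirement.

```python
import json


def fetch_json(url, timeout=10):
    """Request an AJAX endpoint directly and decode its JSON body."""
    import requests  # third-party; imported here so the parsing below works without it

    response = requests.get(
        url,
        headers={"X-Requested-With": "XMLHttpRequest"},  # optional; some sites check it
        timeout=timeout,
    )
    response.raise_for_status()
    return response.json()


def extract_quotes(payload):
    """Flatten the assumed payload shape into (text, author) tuples."""
    return [
        (item["text"], item["author"]["name"])
        for item in payload.get("quotes", [])
    ]


# Offline demonstration with a sample payload in the assumed shape.
sample = json.loads(
    '{"quotes": [{"text": "Hello", "author": {"name": "Ann"}}], "has_next": false}'
)
print(extract_quotes(sample))
```

With a real endpoint copied from the Network tab, the call would look like `extract_quotes(fetch_json(endpoint_url))`, adjusting the field names to whatever the response actually contains.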

We recommend using a Chromium-based browser to spot dynamically rendered content. To scrape the data, use Selenium to load the page in a real browser so that its JavaScript runs (or Python’s Requests library to call the underlying AJAX endpoint directly), and BeautifulSoup to parse the resulting HTML. Once your web scraping script works, run the browser in headless mode to speed up the process.
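The Selenium plus BeautifulSoup combination can be sketched roughly as follows. This is a minimal sketch, not the code from the video: it assumes Selenium 4 with a recent Chrome installed, and the function names, the CSS selectors, and the sample markup are all hypothetical.

```python
from bs4 import BeautifulSoup


def render_page(url, wait_css, timeout=10):
    """Load a page in headless Chrome and return the HTML after JavaScript has run."""
    # Selenium is imported lazily so parse_items() can be used without a browser.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")  # run Chrome without a visible window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Wait until the JavaScript-rendered element is present in the DOM.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, wait_css))
        )
        return driver.page_source
    finally:
        driver.quit()


def parse_items(html, item_css):
    """Parse rendered HTML and return the text of every element matching item_css."""
    soup = BeautifulSoup(html, "html.parser")
    return [element.get_text(strip=True) for element in soup.select(item_css)]


# Offline demonstration on markup shaped like a rendered page (made-up sample).
rendered = (
    '<div class="quote"><span class="text">A</span></div>'
    '<div class="quote"><span class="text">B</span></div>'
)
print(parse_items(rendered, "div.quote span.text"))
```

Against a live site the two steps chain together as `parse_items(render_page(url, "div.quote"), "div.quote span.text")`, with selectors taken from the browser's developer tools.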

📚 *OTHER RESOURCES*
Best Python Libraries for Web Scraping:

🔧 *OUR SCRAPING SOLUTIONS*
Residential Proxies:
Shared Datacenter Proxies:
Dedicated Datacenter Proxies
SOCKS5 Proxies:

⏳ *TIMESTAMPS*
0:00 Introduction
0:45 How to See if the Website is Dynamic
1:35 Can BeautifulSoup Render JavaScript?
2:16 How to Scrape Data From a Dynamic Website
3:35 Finding Elements by Using Selenium
5:16 Finding Elements by Using BeautifulSoup
6:33 Python Scraping With a Headless Browser
7:05 Locating AJAX Calls
9:40 Data Embedding in Other Pages
11:11 Conclusion

🎥 *RELATED VIDEOS*
Learn how to extract data to Excel:
Find out how to scrape multiple URLs:
For more topics on all things web scraping:

© 2022 Oxylabs. All rights reserved.

#Oxylabs #WebScraping #BeautifulSoup
Comments

This is a gold-standard guide. It literally covers most of the cases.

abhijitboda

This is spot on; most other videos don't even come close to covering all of this.

DigitalAlligator

Short, detailed, and very informative. That's how a good tutorial is made.
Thanks

mehdiahmed

Wow. Great Video! I was looking for a video that highlights realistic and efficient web scraping and this is it. Thanks.

roystonfurtado

😭😭😭 I don't know how to thank you. I've been searching for help with this AJAX stuff, and this is the one that made my day.

angellaz

Thanks for this video. I never thought of using F12 and the Network tab to find the source of a website's data. Greetings.

masterbe

Thank you for the informative tutorial! I will probably try web scraping over the next month, so I'll comment here again if I run into any problems!

toshirv

Hello!
And why do all parsers analyze the same site? Interesting different approaches...
Thanks for the interesting example!

vladimirantonov

Is this possible with a website that requires user input, for example entering a quantity or selecting a shipping service?

dakooki

Thanks a lot for all the content you constantly share. I'd like to ask something: would this tutorial's example work if I deployed it on the web as an API to consume later? Thank you so much.

ismaelperezmesa

Hi! I wasn't able to install ChromeDriver. Do you have any suggestions?

gianni_ari

Hi, thanks for this video! For me, on dynamic sites, using Selenium to get the page source doesn't work: the response still contains only the JavaScript tags. Is the request and response path browser request -> server response -> JavaScript response -> API response -> browser? Thanks.

carlostoledoFLA

How do I get the data when the <script> tag's src is not None and points to a file instead?

shreyasoni
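
One way to approach the question above, as a minimal sketch: when a `<script>` tag has a `src`, the data (if any) lives in that external file, so its URL has to be resolved and fetched separately; when the script is inline, a JSON value assigned to a variable can be decoded in place. The variable name `data`, the sample page, and the base URL are assumptions for illustration only.

```python
import json
import re
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def embedded_json(script_text, var_name):
    """Decode the JSON value assigned to `var <var_name> = ...` or return None."""
    match = re.search(r"\bvar\s+" + re.escape(var_name) + r"\s*=\s*", script_text)
    if match is None:
        return None
    # raw_decode stops at the end of the JSON value, ignoring the trailing code.
    value, _end = json.JSONDecoder().raw_decode(script_text, match.end())
    return value


def scan_scripts(html, base_url, var_name):
    """Return (inline data or None, absolute URLs of scripts loaded via src)."""
    soup = BeautifulSoup(html, "html.parser")
    data, external = None, []
    for script in soup.find_all("script"):
        src = script.get("src")
        if src:
            # The content is in a separate file; it must be downloaded to inspect it.
            external.append(urljoin(base_url, src))
        elif data is None:
            data = embedded_json(script.string or "", var_name)
    return data, external


# Made-up page with one external script and one inline script.
page = """
<html><body>
<script src="/static/app.js"></script>
<script>
    var data = [{"text": "Hi", "tags": ["a", "b"]}];
    render(data);
</script>
</body></html>
"""
data, external = scan_scripts(page, "https://example.com/js/", "data")
print(data)
print(external)
```

If the external file turns out to contain only code that calls an API, the Network tab is usually the quicker route to the actual data.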

Love the explanation, but also loved the music. Can you share the track id? <3

capunzel

Hello, please help me: how do I get the text "Wilson Tour Premier All Court 4B" out of this element?
soup = BeautifulSoup(html, 'lxml')
title = soup.find('h1', class_='product--title')

<h1 class="product--title" itemprop="name">
<span>balls</span> Wilson Tour Premier All Court 4B
</h1>

python
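
A possible answer to the question above, as a minimal sketch: collect only the strings that are direct children of the `<h1>`, which skips the text inside the nested `<span>`. The helper name `own_text` is hypothetical, the `<span>` markup is assumed to be a plain tag, and the built-in `html.parser` is used instead of `lxml` to avoid an extra dependency.

```python
from bs4 import BeautifulSoup

html = """
<h1 class="product--title" itemprop="name">
<span>balls</span> Wilson Tour Premier All Court 4B
</h1>
"""


def own_text(tag):
    """Return only the text sitting directly inside tag, skipping child elements."""
    pieces = tag.find_all(string=True, recursive=False)
    return " ".join(piece.strip() for piece in pieces if piece.strip())


soup = BeautifulSoup(html, "html.parser")
title = soup.find("h1", class_="product--title")
print(own_text(title))  # Wilson Tour Premier All Court 4B
```

Calling `title.get_text()` instead would also include the word from the `<span>`, which is why the direct-children filter is used here.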

Thanks for the video. Could you please explain where the value 'h3 >a' passed to select at the end of the video comes from?

xybythh

Why aren't you using the requests-html library? It seems to achieve the same thing in a simpler way.

mantasda

Can you make a new version of the video that uses Selenium 4 for dynamic web scraping?

ureshkayastha

Excellent video. Quick question: when I press Ctrl+U on the website, my page source looks different. I don't have <script> anywhere, and I have </div> separating each section. Does this matter, or was "script" just used to locate the data needed?

drewgatch

This method won't work if the script has a src attribute, especially with reCAPTCHA, right?

legit_nyel