The HTML Element I check FIRST when Web Scraping


Doing some string parsing to grab the structured data from a script tag.
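
A minimal sketch of that kind of string parsing, using only the Python standard library. The HTML snippet, the variable name `productData`, and the helper `extract_script_json` are all made up for illustration and are not taken from the video. It also assumes the value in the script tag is strict JSON (double-quoted keys, no comments, no trailing commas); a looser JavaScript object literal would need extra cleanup first.

```python
import json
import re

# Hypothetical page source: a script tag that assigns a JSON literal to a variable.
HTML = """
<html><body>
<script>
    var productData = {"name": "Example Widget", "price": 19.99, "tags": ["a", "b"]};
</script>
</body></html>
"""


def extract_script_json(html: str, var_name: str):
    """Return the JSON value assigned to `var_name` in the page source."""
    # Locate the assignment, e.g. "productData = ", and stop right before the value.
    match = re.search(rf"\b{re.escape(var_name)}\s*=\s*", html)
    if match is None:
        raise ValueError(f"{var_name!r} not found in page source")
    # raw_decode parses a single JSON value starting at the given index and
    # ignores whatever follows it (the trailing ";" and the rest of the page).
    value, _end = json.JSONDecoder().raw_decode(html, match.end())
    return value


data = extract_script_json(HTML, "productData")
print(data["name"], data["price"])  # prints: Example Widget 19.99
```

Letting `raw_decode` find the end of the value avoids counting braces by hand, which tends to break when the data itself contains `}` or `;` inside a string.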

If you are new, welcome! I am John, a self-taught Python developer working in the web and data space. I specialize in data extraction and JSON web APIs, both server and client. If you like programming and web content as much as I do, you can subscribe for weekly content.

:: Links ::

:: Disclaimer ::
Some/all of the links above are affiliate links. By clicking on these links I receive a small commission should you choose to purchase any services or items.
Comments

Hello John! Regarding this particular case from the video, I think it is worth noting that if you use a JS environment like Puppeteer for scraping, you can skip all these transformations by using the eval function to get a valid JS object with all the required data. Of course, such a method is risky from a security standpoint, but I think that when scraping store data it is an edge case.

ktbewoi

Thanks John. I just noticed you switched to Neovim. What did you find were the best learning resources and tricks to get started?

dhillaz

Hey John, recently subscribed. Wanted to ask if you have sites you recommend for learning an array of coding, e.g. Mimo?

H.Simmerson

Hi John, I am a regular viewer of your channel and appreciate what you do for others. I am having trouble scraping a PHP (Magento 2) based web page for product price, name, etc. I am using requests_html to scrape dynamically loaded content, but the item returns None. There is no JSON that I can see in XHR/Network, but there is something JSON-like (document) in the Accessibility tab of the inspect tools. It looks like the data is Sec-fetched to this (document), and JavaScript in the main HTML runs a jQuery script to get the data from this (document). Any idea how to get this document data and successfully scrape this website? Thanks in advance.

bathuudamdin

If you can strip the comments the remainder seems to be valid YAML.

bloody_albatross

var d = [...
eval(d);
JSON.stringify(dataObject);

=D

alexanderscott