How to use Python to parse JSON sitemaps | Flatten nested dictionaries to get codes for WEB SCRAPING

If you are web scraping with Scrapy, you may want to scrape many categories without simply following every link with a crawler.
If you can find a sitemap in JSON format, you can flatten its nested lists and dictionaries into a new list, then use that list as your URLs or to form the query string parameters for the URLs you want to scrape.
Sound like hard work? Not really: 8 lines of code inside a function and off you go. Just print the type (for example print(type(item))) regularly to check what you are iterating through...
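As a rough idea of what that can look like, here is a minimal sketch. It is not the exact function from the video: the sitemap data, the 'children' key, the example.com endpoint and the 'category' parameter are all made up for illustration, and it assumes the entries you want are dictionaries carrying both a 'code' and a 'name' key.

```python
from urllib.parse import urlencode


def flatten_categories(node, found=None):
    """Walk nested lists/dicts and collect every dict holding 'code' and 'name'."""
    if found is None:
        found = []
    if isinstance(node, dict):
        # Keep this entry if it looks like a category (assumed key names).
        if "code" in node and "name" in node:
            found.append({"code": node["code"], "name": node["name"]})
        # Then look inside every value for further nested entries.
        for value in node.values():
            flatten_categories(value, found)
    elif isinstance(node, list):
        for item in node:
            flatten_categories(item, found)
    # Strings, numbers and None are ignored.
    return found


# Hypothetical sitemap, as it would look after json.loads(); real keys will differ.
sitemap = {
    "categories": [
        {"code": "A1", "name": "Books", "children": [
            {"code": "A1-1", "name": "Fiction", "children": []},
        ]},
        {"code": "B2", "name": "Music", "children": []},
    ]
}

categories = flatten_categories(sitemap)
codes = [c["code"] for c in categories]

# Hypothetical endpoint and parameter name, for illustration only.
urls = ["https://example.com/search?" + urlencode({"category": code}) for code in codes]
```

The resulting urls list is the kind of thing you could hand to a Scrapy spider as its start URLs. If the function returns nothing on a real sitemap, drop a print(type(node)) at the top of it to see what it is actually walking through.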
Timings:
0:00 Intro - About sitemaps
4:05 - Start the code
19:00 - Using slice to get 'code' and 'name'
Any questions, add a comment, I'll be pleased to reply!
Dr Pi.
#webscraping #json #sitemap