Python Scrapy Tutorial - 12 - Item containers (Storing scraped data)

In this video we are going to learn how to put the extracted data into containers called items.

Now why exactly do we need to put the data in containers? We have already extracted it, so can't we just put it straight into some kind of database? The answer is yes, you can. But storing the data directly in a database can cause problems when you are working on big or multiple projects.

Scrapy spiders can return the extracted data as Python dictionaries, which is what we have been doing in our quotes project. But the problem with Python dictionaries is that they lack structure: it is easy to make a typo in a field name or return inconsistent data, especially in a larger project with many spiders.

So it's always a good idea to move the scraped data into a temporary container first and then store it in the database. These temporary containers are called items.
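As a rough sketch of what this looks like in code (the class and field names here are assumptions based on the quotes project, not necessarily the exact ones used in the video), an item is declared in the project's items.py:

import scrapy

class QuoteItem(scrapy.Item):
    # Each Field() declares one allowed field name. Assigning to an
    # undeclared field raises a KeyError, which is exactly the typo
    # protection a plain dict cannot give.
    title = scrapy.Field()
    author = scrapy.Field()
    tag = scrapy.Field()

The spider then fills and yields this item instead of a bare dictionary, so every spider in the project returns data with the same checked structure.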

Next video - Storing in JSON, XML and CSV

#python
Comments

DUDE! This is awesome!! I can't stop watching the series!
As a full stack senior dev, I want to totally thank you for making this good for all levels!

idontnowachannelname

That's an amazing course, mate. Thanks!

kevom

One of the best tutorial series on YouTube, thanks bro

uzzwalmiah

Man! I can't stop watching and learning. You are a master at sharing knowledge. This is contagious! Thank you so much. I'm a DB learner and only wanted to learn how to parse some stuff, and I found this tutorial. This is great!

catz

I want to say thanks again for making such wonderful content in the field of web scraping. I will share this playlist with data science enthusiasts as much as I can.

amiralikhatib

You are the best.
I'm from Mexico.
Regards

rolandohernandez

BOSS, I love the way you explain it; even a dumb programmer can understand.

usmansharifgujjar

I really like your series. It's well explained, even for me, who has already done scraping using BS4. Really great stuff.

isaacyimgaingkuissu

Great content man...
Helped me a lot.
Thanks a lot...

siddharthchaturvedi

Great tutorial, my man!! Love your teaching methods.
Can you make a video on how to scrape all links and sub-links from websites recursively?

RobertTiger
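For the recursive-links question above: this isn't covered in the video, but Scrapy's own CrawlSpider and LinkExtractor classes are built for exactly that. A minimal sketch, with a hypothetical spider name and start URL:

from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor

class LinkSpider(CrawlSpider):
    name = "links"
    start_urls = ["https://example.com"]
    # follow=True tells Scrapy to keep following links found on each
    # visited page, which produces the recursive crawl over sub-links.
    rules = (Rule(LinkExtractor(), callback="parse_item", follow=True),)

    def parse_item(self, response):
        yield {"url": response.url}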

Bro, I am facing a problem. The program runs perfectly and shows the output in the terminal without any error, but the data won't save to CSV/JSON/XML format. When I run scrapy crawl name -o file.csv -t csv, file.csv is created in my folder, but there is no data stored inside it. Can you please help me?

tusharmazhartalukdar
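A common cause of the empty-file problem above (a guess, since the spider code isn't shown) is printing the scraped data instead of yielding it; the exporter behind -o only writes what parse yields. A minimal sketch, assuming the quotes markup from this series:

def parse(self, response):
    for quote in response.css("div.quote"):
        # print() only shows data in the terminal; the feed exporter
        # behind -o file.csv writes out whatever is yielded here.
        yield {"text": quote.css("span.text::text").get()}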

Shit man, I love you! I'm trying to get my hands on Scrapy. I feel like this is better than both Selenium and BS4 combined.

hiddenarray

Getting a relative import error on:
from ..items import JobItems
Please help me.

skhapijulhossen
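On the relative import error above: a likely cause (a guess, without seeing the project layout) is running the spider file directly with python, since from ..items import JobItems only resolves when the spider runs as part of the project package, e.g. via scrapy crawl from the project root. An absolute import with the project package name (myproject is a hypothetical placeholder here) also avoids the issue:

# instead of: from ..items import JobItems
from myproject.items import JobItems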

@buildwithpython Shouldn't the yield block be outside the for loop now, since the items variable contains all the elements?

stanley_george
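On the yield question above: without the exact video code at hand, the usual pattern is that the item object holds one quote at a time, so it is re-filled and yielded once per loop iteration rather than once after the loop (QuoteItem refers to the sketch in the description above):

def parse(self, response):
    for quote in response.css("div.quote"):
        items = QuoteItem()  # a fresh item for each quote
        items["title"] = quote.css("span.text::text").get()
        yield items          # one yield per item, inside the loop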

The prior videos did not use the crawl command to run the spider. Why is that? Is there a specific reason you used the crawl command instead of runspider?

AmitKumar-exfn
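On the crawl-vs-runspider question above: as standard Scrapy CLI behavior (not something explained in the video), the two commands differ in how they locate the spider:

scrapy crawl quotes                # looks up a spider by name inside a project, using the project's settings and items
scrapy runspider quotes_spider.py  # runs a single self-contained spider file, no project required

Once a project with items.py and settings is involved, crawl is the natural choice.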

Excellent video. Very useful. Suggestion: scrapy crawl quotes -s LOG_ENABLED=False

madjayhawk

Is there a way to use Scrapy without the command line?

mmanuel
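On running Scrapy without the command line: yes, Scrapy's documented CrawlerProcess API runs spiders from a plain Python script. A minimal sketch, assuming a project containing a spider named quotes:

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

process = CrawlerProcess(get_project_settings())
process.crawl("quotes")  # spider name, or a Spider subclass
process.start()          # blocks here until the crawl finishes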

I guess the Fields are class attributes, so why are we able to access them using dictionary notation? E.g., why items['title'] = instead of items.title = ?
Thanks, and awesome series!!!

DanielWeikert
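On the items['title'] question above: as the Scrapy docs describe it, the Field() class attributes are only declarations that the Item machinery collects; the actual values are stored in an internal dict, so access is dict-style, and attribute assignment is deliberately rejected so that typos fail loudly (QuoteItem as sketched in the description above):

items = QuoteItem()
items["title"] = "text"  # OK: dict-style access to a declared field
items["tite"] = "text"   # KeyError: undeclared field, the typo is caught
items.title = "text"     # AttributeError: Scrapy items forbid attribute setting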

Hi, I'm getting this error when running the code:
NotImplementedError: QuoteSpider.parse callback is not defined
Any advice on how to resolve this? Thanks.

TarekJamil
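On the NotImplementedError above: that message usually means Scrapy could not find a parse method on the spider class, often due to a typo in the method name or indentation that leaves it outside the class body (a guess, without seeing the code). The callback must sit inside the class, roughly:

import scrapy

class QuoteSpider(scrapy.Spider):
    name = "quotes"
    start_urls = ["http://quotes.toscrape.com"]

    def parse(self, response):  # defined inside the class, at this indent level
        yield {"title": response.css("title::text").get()}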

Hi mate, thank you for the great vids!
Why are the item's attributes accessed using "item['key']" instead of "item.attribute"?

danmo