Python Scrapy Tutorial - 16 - Storing data in Sqlite3 Database

preview_player
Показать описание
In this video we will be integrating scrapy with sqlite3 and finally storing the data inside a database using pipelines

Next video - Storing data in a MySQL database

#python
Рекомендации по теме
Комментарии
Автор

One of the best scrapy explanation. I'm waiting for the next video. :)

yudiaditya
Автор

I had never used SQL lite before. Your tuts has made me like it.

mutalemulenga
Автор

This is one of the best, if not the best, video tutorial about Scrapy. This is the 26 video and I'm going to do all of them.

Just a few tips:


- If instead of using extract() we use extract_first(), instead of getting a list of one item we get a string. So later, at the pipeline we don't need to use "item['title'][0]" but "item['title']"
- Using SQLite we don't need to drop the table and create it again (unless we want to). We can use the SQL command """"CREATE TABLE IF NOT EXISTS quotes_db..."


Still, thanks for the tutorials. You are helping us a lot.

Skaxarrat
Автор

We Can use
> self.curr.execute("""
insert into quotes_tb values(?, ?, ?) """, (item['title'][0], item['author'][0], ', '.join(item['tags'])))
instead of
> self.curr.execute("""
insert into quotes_tb values(?, ?, ?) """, (item['title'][0], item['author'][0], item['tags'][0]))
to store all the tags in a single column.

utkarshsinha
Автор

@buildwithpython, one of the best tutorials for beginners in web scraping. KUDOS to you BUDDY :)
just wanted to correct 1 thing in here is that, its not the functionality of SQLLITE that it just store 1 value of TAG. in your code you have explicitly specified it to store the 1st value when you used item[tag][0]... Instead replace it with ", ".join(item['tags']) it will join all the tags with comma and create a single string to store in DB.
Let me know if i am missing out. i tried and it has worked for me.

sajidpathan
Автор

Thank you so much. This video is really easy to understand. And for people who have no knowledge about SQL but want to learn scrapy, this is very useful.

jiayinzhu
Автор

One of the best Scrapy Tutorials on YouTube. Mate you saved my a lot of time, i've integrated scraper with django thanx to you Tutorials, thanx.
Keep the good work.

EpoksiShop
Автор

"AttributeError: 'QuotetutorialPipeline' object has no attribute 'create_table'"

Does anyone know?

rafaelagustinhofmann
Автор

Your Videos have helped a lot of people Brother. Best wishes :D

vrbabu
Автор

definitely we can use foreign key to store multiple tags in single row. But we can also use for loop to just store all the tags in single go. Here's how can we do this -- >>
def store_data(self, item):
titles = item['titles']
authors = item['authors']
tags = item['tags']

for i in range(len(titles)):
self.cursor.execute("""INSERT INTO quote_tb values(?, ?, ?)""", (
titles[i],
authors[i],
tags[i]
))
self.conn.commit()

Just make sure, you are scraping all the tags in the quotes_spider.py file correctly, here's how we can scrape all the tags -->
tags = all_quotes.css('div.tags

surajpaikekar
Автор

instead of using - item["tags"][0]

use - ", ".join(item["tags"])
it joins all items from the list as a single string and inserts it into the db

I'm also going to go revise the foreign key concept

obeliskphaeton
Автор

could you enable the subtitle for this video please?
and where do you recommend to close the connexion?
many thanks in advance for all your teachings

whayAl
Автор

first thanks for the video. As you said the proper way would be to store tags in a different table with a foreign key, but for the sake of simplicity maybe it would be better to convert the list item['tag'] to string and store the string.

georgesmith
Автор

Hey Man, your content is awesome. I built my first website with the help of your Django tutorials. Now I'm working on building an OCR API using OpenCV and Tessaract. Can you make a tutorial for these two libraries? It would be a lot of help to a lot of people.

paritosh
Автор

Hello, may I ask why is there (object) next to class QuotetutorialPipeline? I was able to produce results without it so far from video 1 to 15, but this is the first time I see it.

sope
Автор

If you want to store multiple tags, use ', '.join(item['tag']) .

prawnydagrate
Автор

I'm bit confused about process_item function, how it will run although it was not added to __init__??

Buhassan
Автор

how do I assign an id, because the database is created but it doesn't have an id in the table

agnaldogabriel
Автор

The overall content is great, much appreciated

just my little input, concat the list items into a string and close the db connection in destructor

#- ''.join(mylist) - concat a list into a single string, use ', ' as separator
def store_db(self, item):
self.curr.execute(""" insert into quotes_table values (?, ?, ?)""", (
', '.join(item['title']),
', '.join(item['author']),
', '.join(item['tags'])
)
)
self.conn.commit()

#- destructor
def __del__(self):
self.conn.close()

Moon-Birdy
Автор

Where we would we close the connection to the database?

martinmags