Building Database - Creating a Chatbot with Deep Learning, Python, and TensorFlow p.5

Welcome to part 5 of the chatbot with Python and TensorFlow tutorial series. Leading up to this tutorial, we've been working with our data and preparing the logic for how we want to insert it; now we're ready to start inserting.

Comments

For anyone facing a problem with comment_id = row['name']: just check your JSON file and look for a value similar to parent_id; the key of that value is the variable you should use for comment_id.
Also replace '?' with '{}' in the update query, and make sure your table name is consistent throughout the code.
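A minimal sketch of what that corrected query might look like (assuming the parent_reply table and the .format-built query from the video, with transaction_bldr being the tutorial's batching helper; note the quotes around string fields, since str.format does not add them):

def sql_insert_replace_comment(commentid, parentid, parent, comment, subreddit, time, score):
    try:
        # str.format fills '{}' placeholders; the '?' style is for
        # parameterized execute() calls and is never substituted by format().
        sql = """UPDATE parent_reply
                 SET parent_id = "{}", comment_id = "{}", parent = "{}",
                     comment = "{}", subreddit = "{}", unix = {}, score = {}
                 WHERE parent_id = "{}";""".format(
            parentid, commentid, parent, comment, subreddit, int(time), score, parentid)
        transaction_bldr(sql)
    except Exception as e:
        print("s-UPDATE insertion", str(e))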
@sentdex Hope you pin this

murtazahaji

I'll definitely credit your videos in my final year project. Your Python tutorials have been such a blessing. Thank you very much. Your videos are the best!

elephant

It was awesome watching you debug in a code walkthrough video. It makes you seem so much more human.

scwestby

For everyone trying to parse data from before 2007-09, replace comment_id = row['name'] with:
try:
    comment_id = row['name']
except KeyError:
    # older dumps have no 'name' field; rebuild the fullname from the id
    comment_id = 't1_' + row['id']

markstraatman

Stop making these videos so short dammit xD every time I watch, time flies like crazy.... shiieet...

HighInquisitorBonobotheGreat

Fixes for RC_2018-06 (tested):
1. Fix for paired rows staying at 0, and for the parent column being NULL every time:

parent_id = row['parent_id'].split('_')[1]
comment_id = row['id']

I hope I could help you (if so, please like this so more people can see it).
Explanation:
With comment_id = row['name'] or comment_id = row['link_id'], the IDs would not match up: the key 'name' is not defined in the RC_2018-06 file, and 'link_id' does not match the parent_id needed to pair the right question and answer.
split is a function used on strings to break the string into a list on a given character, '_' in this case; grabbing index 1 takes everything after the prefix (e.g. 't1_' or 't3_').
Now parent_id is a valid plain ID string, for example 'ilawjhr23' rather than 't3_ilawjhr23'.
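A quick illustration in the Python shell, using the example ID above:

>>> 't3_ilawjhr23'.split('_')
['t3', 'ilawjhr23']
>>> 't3_ilawjhr23'.split('_')[1]
'ilawjhr23'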
Thanks for reading the description ;-)!

chralt

I am really liking this so far; it's a lot more in-depth than the chatbots I have made in the past.

jakerember

Hi Sentdex, I'm new to chatbots and trying to make one with your help. However, up to this point, when I run the code a db is created in the folder (2018-10, as this is the latest dataset), but the db is not growing; it's stuck at 16 KB even 15 minutes after the last run.
Am I doing something wrong, or should I wait? Please help!
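One way to check whether any rows are landing at all (a sketch, assuming the tutorial's parent_reply table and the dataset name above):

import sqlite3

connection = sqlite3.connect('2018-10.db')  # filename assumed from the dataset
c = connection.cursor()
# Inserts are buffered and committed in batches, so the file can sit at a
# few KB until the first batch commits; the row count shows the real state.
print(c.execute('SELECT COUNT(*) FROM parent_reply').fetchone()[0])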

those.who.love.flying

So I just finished processing the comments for 2017-11, and came to the following conclusions:

1. Sqlite is convenient, but slow as frozen excrement on a spinning disk drive. To get around this, I set up a ramdisk (I'm cheap and didn't opt for an SSD, but I do have some RAM) and gave it ~8GB of RAM, which greatly increased loading/row-pairing speed (about 100-150x faster than the spinning disk).

2. The slowest parts of this script are deleting null parents and especially vacuuming. If you have the space for it (and an SSD), set your cleanup value to something like 40,000,000 rows (I was limited by the size of my ramdisk, but others might not have this limitation). I set my "cleanup" value to 10,000,000.

3. By default, sqlite has journaling enabled, which is extremely useful when inserting data, in case something goes wrong. However, when it comes to the delete and vacuum stages, having journaling turned on is a double whammy: it creates a copy of the ".db" as a journal, which overflowed my ramdisk and caused the script to fail.

To get around this, I modified the cleanup section of code to the below:


if row_counter > start_row:
    if row_counter % cleanup == 0:
        print("Cleaning up!")
        print("Turning journal off temporarily...")
        c.execute("PRAGMA journal_mode = OFF")
        print("Deleting null parent rows...")
        sql = "DELETE FROM parent_reply WHERE parent IS NULL"
        c.execute(sql)
        print("Committing changes...")
        connection.commit()
        print("Vacuuming...")
        c.execute("VACUUM")
        print("Committing...")
        connection.commit()
        print("Turning journal back on (\"MEMORY\" mode)...")
        c.execute("PRAGMA journal_mode = MEMORY")
        connection.commit()
        print("Back to loading and processing new rows!\n")



My end result was a DB with a file size of 4.09GB (from the original source file size of 54.5GB). I went through multiple iterations to find the approach that worked for me, and I hope this might help point others in the right direction. Without these modifications, and given my drive limitations, processing this data would otherwise have taken me over a week (or longer).
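One sanity check if you adapt this (a sketch, run on the same cursor right after switching modes; a bare PRAGMA journal_mode reports the mode currently in force):

mode = c.execute('PRAGMA journal_mode').fetchone()[0]
print(mode)  # 'off' while disabled, 'memory' once restored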

MaxTechEngineering

Hello sentdex, would it be possible to do a tutorial series on Reinforcement Learning? I would really be interested in how to use reinforcement learning on my own Python games, for example how to create a Pong AI (without the help of OpenAI), and especially how to program experience replay for an AI.

danielhohandu

I didn't understand anything in this video, but I am very intrigued!

greenteasunferncello

Harrison! Too many nested ifs! There's an operator called `and` in Python. Use it, my son!
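For example, a sketch of the idea (the names here are hypothetical stand-ins for the tutorial's per-row checks):

# Hypothetical stand-ins:
def acceptable(body):
    return 1 < len(body.split()) < 1000 and body not in ('[deleted]', '[removed]')

body, score, parent_data = 'some reply text', 5, 'some parent text'

# Instead of nesting:
if acceptable(body):
    if score >= 2:
        if parent_data:
            print('insert row')

# ...the same checks chain with `and`:
if acceptable(body) and score >= 2 and parent_data:
    print('insert row')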

EranM

Great video sentdex, thanks! I'm still wondering why you throw the data into a database, though. Couldn't we just throw it into a dataframe using pandas?
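(For what it's worth, the dumps are bigger than RAM, but pandas can stream them; a minimal sketch, assuming the newline-delimited JSON layout of the RC files and a few of their usual columns:)

import pandas as pd

# Each line of an RC dump is one JSON comment, so lines=True applies;
# chunksize yields DataFrames instead of loading the whole file at once.
for chunk in pd.read_json('RC_2015-01', lines=True, chunksize=100000):
    print(chunk[['parent_id', 'body', 'score']].head())  # columns assumed
    break  # just a peek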

baptisteArno

Are you only training on one comment-response pair (the one with the highest score)? Would having multiple responses to a single comment not help the model be more expressive in some way? It seems like limiting it to the top pair might be tying its hands a little. Of course it might not matter, I'm not really sure.

andrewm

I'm adding this comment here, since this is the video where you finally add comment_id = row['name'], though it's arguably more appropriate for an earlier video.

I used BigQuery, through which I exported the collections I wanted to my Google Cloud bucket, and then saved them locally; however, I could have queried SELECT * FROM to yield similar results. Instead of the wildcard, the desired columns can be listed, separated by commas. Query results, as opposed to entire collection tables, may be downloaded as JSON instead of exported.

What I noticed is that "name" is null in later collections, which causes a problem when attempting to assign the "name" element to a variable. Since "name" is a composite (or perhaps a portmanteau) of parent_id and id, I used this flexible workaround, having observed that "name" is the first 3 characters of the parent_id combined with the id:

if row.get('name') is None:
    # 'name' is missing/null in later collections; rebuild it from the
    # 3-character prefix of parent_id plus the comment's own id
    comment_id = parent_id[:3] + row['id']
else:
    comment_id = row['name']

I'm a Python beginner, but this seems to work. Because the collections I'm using are primarily the later ones, I check for the missing "name" first; however, if you're primarily using the earlier collections that have non-null names, then you should probably flip the if-else.

josephryle

I cannot get the database to run. When I hit F5 I get this error: Unterminated string starting at: line 1 column 374 (char 373)

I have even tried copying and pasting your data with my correct file path for the DB, and it still gives me that error. Not sure what I am doing incorrectly.
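That message comes from json.loads hitting a truncated line; a sketch of a guard to find and skip the offending rows (filename assumed):

import json

with open('RC_2015-01', buffering=1000) as f:
    for line in f:
        try:
            row = json.loads(line)
            # ...process row as in the tutorial...
        except json.JSONDecodeError as e:
            # A truncated or corrupt line (often a partial download) raises
            # 'Unterminated string starting at: ...'; log it and move on.
            print('skipping bad line:', e)
            continue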

thefantasynuttwork

Hello, great video, but my question is about sql_transaction: do the functions find_parent and find_existing_score and so on still work even though the data is sitting in the sql_transaction list?
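(For reference, the buffering in question looks roughly like this sketch; names follow the tutorial, details are approximate. Until the batch commits, buffered rows are invisible to find_parent and find_existing_score, since those SELECT against the database itself:)

import sqlite3

connection = sqlite3.connect('2015-05.db')  # db name pattern from the series
c = connection.cursor()
sql_transaction = []

def transaction_bldr(sql):
    # Statements pile up in a plain Python list and are only executed
    # against sqlite once the batch is large enough.
    global sql_transaction
    sql_transaction.append(sql)
    if len(sql_transaction) > 1000:
        c.execute('BEGIN TRANSACTION')
        for s in sql_transaction:
            try:
                c.execute(s)
            except sqlite3.Error:
                pass  # the tutorial swallows malformed statements
        connection.commit()
        sql_transaction = []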

nantals

Really excited for this new series! One question I have, though, is why you do so much of the data cleanup in code. I feel like a DBMS would be better suited to big batch jobs of deleting all the comments without children / with low scores and so on, and a batch write into the database would not take that long, even if you write a lot of stuff you'd later delete. Your way, you run a query every time you add a row to your transaction builder (to look up the parent), which will probably slow down your program by a big factor (especially since you're using SQLite and it has to hit the filesystem).
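(Whichever way you go, that per-row parent lookup gets far cheaper with an index; a one-line sketch, assuming the tutorial's parent_reply table, its comment_id column, and the usual c/connection cursor and connection:)

# Lets the WHERE comment_id = ... lookup in find_parent use an index
# instead of scanning the whole table for every row inserted:
c.execute('CREATE INDEX IF NOT EXISTS idx_comment_id ON parent_reply (comment_id)')
connection.commit()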

MrGurkentomate

Hello sentdex, I am doing my graduation project on code generation using NLP: the input is natural language and the output is code or pseudo-code. Do you know where I can find a dataset? I have been searching for a while.

ahmedbahaaeldin

Hey guys, I am learning neural networks and Python. In this video at 8:52 he is copying the code from somewhere else. Can you please let me know where he is taking the code from? I tried pythonprogramming.net but couldn't find it. It would be of great help. Thanks.

prasankrishnan