Building Database - Creating a Chatbot with Deep Learning, Python, and TensorFlow p.5

Welcome to part 5 of the chatbot with Python and TensorFlow tutorial series. Leading up to this tutorial, we've been working with our data and preparing the logic for how we want to insert it; now we're ready to start inserting.

Comments

For anyone facing a problem with comment_id = row['name']: just check your JSON file and look for a value similar to parent_id; the key of that value is the variable you should use for comment_id.
Also replace '?' with '{}' in the update query, and make sure your table name is consistent throughout the code.
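A minimal sketch of what that corrected query might look like (assuming the parent_reply table and the .format-built query from the video, with transaction_bldr being the tutorial's batching helper; note the quotes around string fields, since str.format does not add them):

def sql_insert_replace_comment(commentid, parentid, parent, comment, subreddit, time, score):
    try:
        # str.format fills '{}' placeholders; the '?' style is for
        # parameterized execute() calls and is never substituted by format().
        sql = """UPDATE parent_reply
                 SET parent_id = "{}", comment_id = "{}", parent = "{}",
                     comment = "{}", subreddit = "{}", unix = {}, score = {}
                 WHERE parent_id = "{}";""".format(
            parentid, commentid, parent, comment, subreddit, int(time), score, parentid)
        transaction_bldr(sql)
    except Exception as e:
        print("s-UPDATE insertion", str(e))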
@sentdex Hope you pin this

murtazahaji

I'll definitely credit your videos in my final year project. Your Python tutorials have been such a blessing. Thank you very much. Your videos are the best!

elephant

It was awesome watching you debug in a code walkthrough video. It makes you seem so much more human.

scwestby

For everyone trying to parse data from before 2007-09, replace comment_id = row['name'] with:
try:
    comment_id = row['name']
except KeyError:
    # older dumps have no 'name' field; rebuild the fullname from the id
    comment_id = 't1_' + row['id']

markstraatman

Stop making these videos so short dammit xD every time I watch, time flies like crazy.... shiieet...

HighInquisitorBonobotheGreat

Fixes for RC_2018-06 (tested):
1. Fix for paired rows staying at 0, and for the parent column being NULL every time:

parent_id = row['parent_id'].split('_')[1]
comment_id = row['id']

I hope I could help you (if so, please like this so more people can see it).
Explanation:
With comment_id = row['name'] or comment_id = row['link_id'], the IDs would not match up: the key 'name' is not defined in the RC_2018-06 file, and 'link_id' does not match the parent_id needed to pair the right question and answer.
split is a function used on strings to break the string into a list on a given character, '_' in this case; grabbing index 1 takes everything after the prefix (e.g. 't1_' or 't3_').
Now parent_id is a valid plain ID string, for example 'ilawjhr23' rather than 't3_ilawjhr23'.
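A quick illustration in the Python shell, using the example ID above:

>>> 't3_ilawjhr23'.split('_')
['t3', 'ilawjhr23']
>>> 't3_ilawjhr23'.split('_')[1]
'ilawjhr23'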
Thanks for reading the description ;-)!

chralt

I am really liking this so far; it's a lot more in-depth than the chatbots I have made in the past.

jakerember

Hi Sentdex, I'm new to chatbots and trying to make one with your help. However, up to this point, when I run the code a db is created in the folder (2018-10, as this is the latest dataset), but the db is not growing; it's stuck at 16 KB even 15 minutes after the last run.
Am I doing something wrong, or should I wait? Please help!
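One way to check whether any rows are landing at all (a sketch, assuming the tutorial's parent_reply table and the dataset name above):

import sqlite3

connection = sqlite3.connect('2018-10.db')  # filename assumed from the dataset
c = connection.cursor()
# Inserts are buffered and committed in batches, so the file can sit at a
# few KB until the first batch commits; the row count shows the real state.
print(c.execute('SELECT COUNT(*) FROM parent_reply').fetchone()[0])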

those.who.love.flying

So I just finished processing the comments for 2017-11, and came to the following conclusions:

1. Sqlite is convenient, but slow as frozen excrement on a spinning disk drive. To get around this, I set up a ramdisk (I'm cheap and didn't opt for an SSD, but I do have some RAM) and gave it ~8GB of RAM, which greatly increased loading/row-pairing speed (about 100-150x faster than the spinning disk).

2. The slowest parts of this script are deleting null parents and especially vacuuming. If you have the space for it (and an SSD), set your cleanup value to something like 40,000,000 rows (I was limited by the size of my ramdisk, but others might not have this limitation). I set my "cleanup" value to 10,000,000.

3. By default, sqlite has journaling enabled, which is extremely useful when inserting data, in case something goes wrong. However, when it comes to the delete and vacuum stages, having journaling turned on is a double whammy: it creates a copy of the ".db" as a journal, which overflowed my ramdisk and caused the script to fail.

To get around this, I modified the cleanup section of code to the below:


if row_counter > start_row:
    if row_counter % cleanup == 0:
        print("Cleaning up!")
        print("Turning journal off temporarily...")
        c.execute("PRAGMA journal_mode = OFF")
        print("Deleting null parent rows...")
        sql = "DELETE FROM parent_reply WHERE parent IS NULL"
        c.execute(sql)
        print("Committing changes...")
        connection.commit()
        print("Vacuuming...")
        c.execute("VACUUM")
        print("Committing...")
        connection.commit()
        print("Turning journal back on (\"MEMORY\" mode)...")
        c.execute("PRAGMA journal_mode = MEMORY")
        connection.commit()
        print("Back to loading and processing new rows!\n")



My end result was a DB with a file size of 4.09GB (from the original source file size of 54.5GB). I went through multiple iterations to find the approach that worked for me, and I hope this might help point others in the right direction. Without these modifications, and given my drive limitations, processing this data would otherwise have taken me over a week (or longer).
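One sanity check if you adapt this (a sketch, run on the same cursor right after switching modes; a bare PRAGMA journal_mode reports the mode currently in force):

mode = c.execute('PRAGMA journal_mode').fetchone()[0]
print(mode)  # 'off' while disabled, 'memory' once restored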

MaxTechEngineering

Hello sentdex, would it be possible to do a tutorial series on Reinforcement Learning? I would really be interested in how to use reinforcement learning on my own Python games, for example how to create a Pong AI (without the help of OpenAI), and especially how to program experience replay for an AI.

danielhohandu

I didn't understand anything in this video, but I am very intrigued!

greenteasunferncello

Harrison! Too many nested ifs! There's an operator called `and` in Python. Use it, my son!
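For example, a sketch of the idea (the names here are hypothetical stand-ins for the tutorial's per-row checks):

# Hypothetical stand-ins:
def acceptable(body):
    return 1 < len(body.split()) < 1000 and body not in ('[deleted]', '[removed]')

body, score, parent_data = 'some reply text', 5, 'some parent text'

# Instead of nesting:
if acceptable(body):
    if score >= 2:
        if parent_data:
            print('insert row')

# ...the same checks chain with `and`:
if acceptable(body) and score >= 2 and parent_data:
    print('insert row')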

EranM

Great video sentdex, thanks! I'm still wondering why you throw the data into a database, though. Couldn't we just throw it into a dataframe using pandas?
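(For what it's worth, the dumps are bigger than RAM, but pandas can stream them; a minimal sketch, assuming the newline-delimited JSON layout of the RC files and a few of their usual columns:)

import pandas as pd

# Each line of an RC dump is one JSON comment, so lines=True applies;
# chunksize yields DataFrames instead of loading the whole file at once.
for chunk in pd.read_json('RC_2015-01', lines=True, chunksize=100000):
    print(chunk[['parent_id', 'body', 'score']].head())  # columns assumed
    break  # just a peek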

baptisteArno

Are you only training on one comment-response pair (the one with the highest score)? Would having multiple responses to a single comment not help the model be more expressive in some way? It seems like limiting it to the top pair might be tying its hands a little. Of course it might not matter, I'm not really sure.

andrewm

I'm adding this comment here, since this is the video where you finally add comment_id = row['name'], though it's arguably more appropriate for an earlier video.

I used BigQuery, through which I exported the collections I wanted to my Google Cloud bucket, and then saved them locally; however, I could have queried SELECT * FROM to yield similar results. Instead of the wildcard, the desired columns can be listed, separated by commas. Query results, as opposed to entire collection tables, may be downloaded as JSON instead of exported.

What I noticed is that "name" is null in later collections, which causes a problem when attempting to assign the "name" element to a variable. Since "name" is a composite (or perhaps a portmanteau) of parent_id and id, I used this flexible workaround, having observed that "name" is the first 3 characters of the parent_id combined with the id:

if row.get('name') is None:
    # 'name' is missing/null in later collections; rebuild it from the
    # 3-character prefix of parent_id plus the comment's own id
    comment_id = parent_id[:3] + row['id']
else:
    comment_id = row['name']

I'm a Python beginner, but this seems to work. Because the collections I'm using are primarily the later ones, I check for the missing "name" first; however, if you're primarily using the earlier collections that have non-null names, then you should probably flip the if-else.

josephryle

I cannot get the database to run. When I hit F5 I get this error: Unterminated string starting at: line 1 column 374 (char 373)

I have even tried copying and pasting your data with my correct file path for the DB, and it still gives me that error. Not sure what I am doing incorrectly.
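That message comes from json.loads hitting a truncated line; a sketch of a guard to find and skip the offending rows (filename assumed):

import json

with open('RC_2015-01', buffering=1000) as f:
    for line in f:
        try:
            row = json.loads(line)
            # ...process row as in the tutorial...
        except json.JSONDecodeError as e:
            # A truncated or corrupt line (often a partial download) raises
            # 'Unterminated string starting at: ...'; log it and move on.
            print('skipping bad line:', e)
            continue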

thefantasynuttwork

Hello, great video, but my question is about sql_transaction: do the functions find_parent and find_existing_score and so on still work even though the data is sitting in the sql_transaction list?
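(For reference, the buffering in question looks roughly like this sketch; names follow the tutorial, details are approximate. Until the batch commits, buffered rows are invisible to find_parent and find_existing_score, since those SELECT against the database itself:)

import sqlite3

connection = sqlite3.connect('2015-05.db')  # db name pattern from the series
c = connection.cursor()
sql_transaction = []

def transaction_bldr(sql):
    # Statements pile up in a plain Python list and are only executed
    # against sqlite once the batch is large enough.
    global sql_transaction
    sql_transaction.append(sql)
    if len(sql_transaction) > 1000:
        c.execute('BEGIN TRANSACTION')
        for s in sql_transaction:
            try:
                c.execute(s)
            except sqlite3.Error:
                pass  # the tutorial swallows malformed statements
        connection.commit()
        sql_transaction = []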

nantals

Really excited for this new series! One question I have, though, is why you do so much of the data cleanup in code. I feel like a DBMS would be better suited to big batch jobs of deleting all the comments without children / with low scores and so on, and a batch write into the database would not take that long, even if you write a lot of stuff you'd later delete. Your way, you run a query every time you add a row to your transaction builder (to look up the parent), which will probably slow down your program by a big factor (especially since you're using SQLite and it has to hit the filesystem).
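(Whichever way you go, that per-row parent lookup gets far cheaper with an index; a one-line sketch, assuming the tutorial's parent_reply table, its comment_id column, and the usual c/connection cursor and connection:)

# Lets the WHERE comment_id = ... lookup in find_parent use an index
# instead of scanning the whole table for every row inserted:
c.execute('CREATE INDEX IF NOT EXISTS idx_comment_id ON parent_reply (comment_id)')
connection.commit()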

MrGurkentomate

Hello sentdex, I am doing my graduation project on code generation using NLP: the input is natural language and the output is code or pseudo-code. Do you know where I can find a dataset? I have been searching for a while.

ahmedbahaaeldin

Hey guys, I am learning neural networks and Python. In this video at 8:52 he is copying the code from somewhere else. Can you please let me know where he is taking the code from? I tried pythonprogramming.net but couldn't find it. It would be of great help. Thanks.

prasankrishnan