filmov
tv
How to import a 7M+ rows data set into PostgreSQL? (SQL for beginners)
![preview_player](https://i.ytimg.com/vi/DGtJAJUIxXw/maxresdefault.jpg)
Показать описание
This is already unzipped, so you won't need the dtrx step, either! (Check the steps below OR the article for the steps.)
-----------------------------------------------
Or start with SQL for Data Analysis - episode 1:
How to install SQL Workbench for postgreSQL:
-----------------------------------------------
STEPS -- UPDATED in 2022 (not the same as in the video):
1. Open Terminal and login (ssh) to your dataserver!
2. Download the “flight delays” data!
3. Format your data!
4. Now we have to give permission to our postgreSQL user to create tables and load data into them. This will need multiple steps. Here’s a gif first (note: my username is “dataguy” - your’s might be something
4.1. First sudo to the user called “postgres”:
sudo -u postgres -i
4.2. Then start postgreSQL:
psql
4.3. The prompt will change to this one postgres=#! Type:
alter user [your_user_name] superuser;
4.4. This turns your original user into a super user! Go back to your user!
Exit from postgreSQL:
\q
Then exit from the user called “postgres”:
exit
4.5. Finally access your original user’s postgreSQL database from the command line:
psql -d postgres
5. Now all you need to do is create the table by simply copy-pasting this lines into your terminal:
CREATE TABLE flight_delays (
year INTEGER,
month INTEGER,
dayofmonth INTEGER,
dayofweek INTEGER,
deptime INTEGER,
arrtime INTEGER,
flightnum INTEGER,
tailnum VARCHAR,
airtime INTEGER,
arrdelay INTEGER,
depdelay INTEGER,
origin VARCHAR,
dest VARCHAR,
distance INTEGER);
6. And finally copy the data from the csv file you have just downloaded!
7. Go back to SQL Workbench and make a simple SELECT statement… but make sure, that you use the LIMIT clause too. Now you have over 7.000.000 rows of data - even though postgreSQL can handle it easily, your computer might be frozen, if you try print all that data on your screen. So try something like this first:
SELECT * FROM test_test LIMIT 10;
-----------------------------------------------
-----------------------------------------------
Or start with SQL for Data Analysis - episode 1:
How to install SQL Workbench for postgreSQL:
-----------------------------------------------
STEPS -- UPDATED in 2022 (not the same as in the video):
1. Open Terminal and login (ssh) to your dataserver!
2. Download the “flight delays” data!
3. Format your data!
4. Now we have to give permission to our postgreSQL user to create tables and load data into them. This will need multiple steps. Here’s a gif first (note: my username is “dataguy” - your’s might be something
4.1. First sudo to the user called “postgres”:
sudo -u postgres -i
4.2. Then start postgreSQL:
psql
4.3. The prompt will change to this one postgres=#! Type:
alter user [your_user_name] superuser;
4.4. This turns your original user into a super user! Go back to your user!
Exit from postgreSQL:
\q
Then exit from the user called “postgres”:
exit
4.5. Finally access your original user’s postgreSQL database from the command line:
psql -d postgres
5. Now all you need to do is create the table by simply copy-pasting this lines into your terminal:
CREATE TABLE flight_delays (
year INTEGER,
month INTEGER,
dayofmonth INTEGER,
dayofweek INTEGER,
deptime INTEGER,
arrtime INTEGER,
flightnum INTEGER,
tailnum VARCHAR,
airtime INTEGER,
arrdelay INTEGER,
depdelay INTEGER,
origin VARCHAR,
dest VARCHAR,
distance INTEGER);
6. And finally copy the data from the csv file you have just downloaded!
7. Go back to SQL Workbench and make a simple SELECT statement… but make sure, that you use the LIMIT clause too. Now you have over 7.000.000 rows of data - even though postgreSQL can handle it easily, your computer might be frozen, if you try print all that data on your screen. So try something like this first:
SELECT * FROM test_test LIMIT 10;
-----------------------------------------------
Комментарии