Reading data from Excel
VIDEO 3
Welcome to this video.
So far, we have seen the process of data analysis and how to work with Jupyter Notebook. We have looked at small examples of calculations with 'x' and printing values. We also saw how to upload files to Jupyter.
In this section, we will go through how you can read data from external files.
First, we will deal with Excel files. If you are able to read Excel files in pandas, you will know how to read CSV and HTML too. The file that we will work with in this section contains students' grades. It includes the student ID and the scores for 4 questions in the exam.
How can we read such a file in Python using pandas? Here is how we do it.
First, we need to connect pandas to our program. If you still remember how we used to import turtle, then you will find it similar. We used to connect to turtle using the command import.
Here we import pandas as pd. Instead of pd you can use any valid name, but using pd is the common practice. Most of the code you will find on the web uses pd, and a short name makes pandas easier to refer to.
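As a small sketch of this step, the import line looks like the one below. Printing the version is only an illustrative way to confirm that the import worked; it is not part of the lesson's file.

```python
# Connect pandas to the program under the conventional short name pd.
import pandas as pd

# From here on, every pandas function is reached through pd,
# for example pd.read_excel or pd.read_csv.
print(pd.__version__)
```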
We need to store the data in a variable that we will refer to whenever we need to do any pandas operation. We will call the variable data, but you can give it any name you like. Again, you need to remember the conventions and rules for naming variables. You may need to go back to the section that discusses the rules for naming variables.
Now, we need to use pd to execute the function that reads Excel files. The object pd represents pandas for us, just as t represented turtle in the past. We used to tell it to go forward, backward, right, left, etc. It was very nice and listened to us very well.
The pandas function that reads Excel files is read_excel(). You need to write it exactly like that. You have been working with Python for some time, so you know that upper and lower case matter.
Also, please do not forget the underscore and the brackets. Inside the brackets comes the name of the file that you are trying to access, followed by a comma, and then you select which sheet you are planning to read. An Excel file may contain multiple sheets. It may include one sheet for the list of products, one for the list of customers, etc.
For example, if you want to read only Sheet 1, then you need a comma followed by the keyword sheet_name, an equal sign, and the name of the sheet in single or double quotes. This line reads the data from the Excel sheet and stores it in the variable data. If you do not specify the name of the sheet, it will always read the first sheet, whatever its name is.
If you want a specific sheet, then you need to spell its name correctly. This simple line will read the sheet, store it in the variable data and keep it ready for you to call at any time. In other programming languages, such a task is often more complicated. Now we have the data ready for reviewing, cleaning, analysis, etc.
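Here is a minimal sketch of that line, wrapped in a small helper so that the sheet choice is explicit. The helper name load_sheet, the file name grades.xlsx and the sheet name Sheet1 are placeholders for illustration, not names from the lesson's file. In pandas the keyword is spelled sheet_name, with an underscore. Reading .xlsx files also relies on an optional package, usually openpyxl, being installed.

```python
import pandas as pd


def load_sheet(path, sheet_name=0):
    """Read one sheet of an Excel workbook into a DataFrame.

    sheet_name may be a sheet's name (a string) or its position (an int).
    The value 0 means the first sheet, which is also what pandas reads
    when sheet_name is left out.
    """
    return pd.read_excel(path, sheet_name=sheet_name)


# Hypothetical usage; the file and sheet names are placeholders:
# data = load_sheet("grades.xlsx", sheet_name="Sheet1")
# data = load_sheet("grades.xlsx")  # first sheet, whatever its name
```

Without the helper, the same read is the single line data = pd.read_excel("grades.xlsx", sheet_name="Sheet1").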
Let us assume we want to read a comma-separated (CSV) file. The line is very similar. The function is read_csv(filename). If you want to read tables from an HTML page, the function is read_html(url), which gives you back a list of tables rather than a single one. Use the secure https address of the page where one is available.
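To see the CSV case in action without needing a file on disk, the sketch below reads from an in-memory string. The column names and values are made up for illustration. The read_html call is shown only as a comment: the URL is a placeholder, and that function depends on an optional HTML parser package such as lxml being installed.

```python
import io

import pandas as pd

# A small in-memory stand-in for a CSV file on disk (made-up values).
csv_text = """StudentID,Q1,Q2,Q3,Q4
101,8,7,9,6
102,5,9,7,8
"""

# read_csv accepts a file path or, as here, a file-like object.
data = pd.read_csv(io.StringIO(csv_text))
print(data)

# read_html returns a list of DataFrames, one per table it finds, so a
# call would look like the line below (placeholder URL, not executed here):
# tables = pd.read_html("https://example.com/page-with-tables.html")
```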
Here are the two lines that we saw before. The first line imports pandas, and the second line reads the Excel file and stores it in the variable data. After reading the data, the common practice is to display the first five rows using the function head(). It allows us to look at the head of the data, which is the first five rows. Another practice is to display the last five rows using the function tail(). This process will give you an idea of the structure of your data.
Is it structured properly? Is there extra information at the top of the data that you do not need? What are the main columns of your data?
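As a sketch of this first look at the data, the example below builds a small made-up table in place of the Excel file, so it runs on its own. The column names StudentID and Q1 to Q4 are assumptions about how such a grades file might be laid out.

```python
import pandas as pd

# Made-up grades standing in for the data read from the Excel file.
data = pd.DataFrame({
    "StudentID": [101, 102, 103, 104, 105, 106, 107],
    "Q1": [8, 5, 9, 6, 7, 10, 4],
    "Q2": [7, 9, 6, 8, 5, 9, 6],
    "Q3": [9, 7, 8, 5, 6, 8, 7],
    "Q4": [6, 8, 7, 9, 8, 7, 5],
})

first_rows = data.head()   # first 5 rows by default
last_rows = data.tail()    # last 5 rows by default

print(first_rows)
print(last_rows)
print(list(data.columns))  # the column names
```

Both functions also take a number, so data.head(3) would show only the first three rows.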
We will see later how we can change the index from the default to another column. For example, you may want to use the student ID as the index. You now have the first five rows of your data. This gives us an idea of the structure of our data and the names of the columns.
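As a brief preview of changing the index, one possible way is the set_index function. This is only a sketch on made-up data, and the column name StudentID is an assumption about the file's layout.

```python
import pandas as pd

# Made-up grades; by default the rows are labelled 0, 1, 2.
data = pd.DataFrame({
    "StudentID": [101, 102, 103],
    "Q1": [8, 5, 9],
    "Q2": [7, 9, 6],
})

# set_index returns a new DataFrame; the original keeps its default index.
by_student = data.set_index("StudentID")

print(by_student)
print(by_student.loc[102])  # look up a row by student ID
```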
In summary, a pandas DataFrame is a two-dimensional table that consists of columns and rows. That is basically how to read an Excel file in Python using pandas.
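To make the "two-dimensional" idea concrete, here is a tiny sketch on made-up data: the shape attribute reports the number of rows and columns, and a single cell is addressed by a row label and a column name.

```python
import pandas as pd

data = pd.DataFrame({
    "StudentID": [101, 102, 103],
    "Q1": [8, 5, 9],
    "Q2": [7, 9, 6],
})

rows, columns = data.shape   # (number of rows, number of columns)
print(rows, columns)
print(data.loc[1, "Q2"])     # one cell: row label 1, column "Q2"
```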