Python Tutorial: DataFrames and their methods

In just 3 lines of code, you're already looking at your data, represented in Python as a pandas DataFrame.
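The transcript mentions pd.read_excel and print(fruit.head()), so the three lines on screen presumably import pandas, load the file, and print the first rows. The spreadsheet itself isn't available here, so this minimal sketch swaps in pd.read_csv over a small, hypothetical in-memory CSV (via io.StringIO). The fruit names, colors, and prices are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical stand-in for the video's spreadsheet (made-up values;
# the real file name and contents are not given in the transcript).
csv_text = """name,color,price
banana,yellow,0.50
kiwi,green,0.75
apple,red,1.50
mango,orange,2.00
cherry,red,4.00
lemon,yellow,0.60
grape,purple,3.00
pear,green,1.25
"""

# The video loads an Excel file with pd.read_excel; pd.read_csv on an
# in-memory string is used here so the sketch runs without any file.
fruit = pd.read_csv(io.StringIO(csv_text))

# Display the first rows of the DataFrame in the console
print(fruit.head())
```

With a real spreadsheet, the loading line would call pd.read_excel with the file's path instead; reading .xlsx files typically also requires an engine such as openpyxl to be installed.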

This should look familiar if you've worked with spreadsheets. At the top of the DataFrame are the column names.

Note that each column holds a single data type. The price column here is numeric, so we can perform mathematical operations on it later.

The color column is populated with text entries. Each row in this DataFrame is a specific observation of a fruit's name, color, and price in US dollars.
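To inspect the column names and per-column data types yourself, here is a small sketch using a hypothetical four-row fruit DataFrame built directly with pd.DataFrame. It also shows the kind of arithmetic a numeric column such as price supports.

```python
import pandas as pd

# Hypothetical fruit data (not the dataset from the video)
fruit = pd.DataFrame({
    "name": ["banana", "kiwi", "apple", "mango"],
    "color": ["yellow", "green", "red", "orange"],
    "price": [0.50, 0.75, 1.50, 2.00],
})

# Column names appear at the top of the printed DataFrame
print(fruit.columns.tolist())  # ['name', 'color', 'price']

# Each column has a single data type
print(fruit.dtypes)

# Because price is numeric, arithmetic applies to the whole column at once
print(fruit["price"] * 2)
```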

Finally, on the left is the DataFrame's index. The index is a powerful component of the pandas DataFrame, but it is beyond the scope of this course. Moving forward, I'll point out explicitly when we perform an action to avoid working with the index.

Just as we used the dot to access functions in the pandas package, such as pd.read_excel, we use the dot to access methods associated with DataFrames. DataFrame methods are like functions, but we call them on a DataFrame object using the dot. Let's take a closer look at some common DataFrame methods.

The .head() method lets us look at the first few rows of our DataFrame. It's very useful when you have hundreds or thousands of rows of data but only want to see the first few. By default, this method displays the first 5 rows of our DataFrame. In the last line of code on the left, we called this method by writing fruit.head(), including the parentheses. We placed fruit.head() inside the print function so the result displays in the console.
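A minimal sketch of .head() with its default of 5 rows, using a hypothetical eight-row fruit DataFrame with made-up values.

```python
import pandas as pd

# Hypothetical 8-row fruit DataFrame (made-up values)
fruit = pd.DataFrame({
    "name": ["banana", "kiwi", "apple", "mango", "cherry", "lemon", "grape", "pear"],
    "color": ["yellow", "green", "red", "orange", "red", "yellow", "purple", "green"],
    "price": [0.50, 0.75, 1.50, 2.00, 4.00, 0.60, 3.00, 1.25],
})

# With no argument, .head() returns the first 5 rows
first_rows = fruit.head()
print(first_rows)
```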

We can pass an optional argument to the .head() method to display a different number of rows. Here, we pass 2 to display just the first two rows of data.
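Passing a number to .head() changes how many rows come back. This sketch uses a small, hypothetical fruit DataFrame.

```python
import pandas as pd

# Hypothetical fruit data (made-up values)
fruit = pd.DataFrame({
    "name": ["banana", "kiwi", "apple", "mango"],
    "color": ["yellow", "green", "red", "orange"],
    "price": [0.50, 0.75, 1.50, 2.00],
})

# Pass the number of rows to display
top_two = fruit.head(2)
print(top_two)
```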

The .info() method reports the number of entries, or rows, in our DataFrame, the total number of columns, the name of each column, and the data type of each column. Here on the right, we can see that our DataFrame has 8 rows and 3 columns: 2 columns with the object (text) data type and 1 column with the float64 (numerical) data type. int64 is another common numerical data type. In short, int64 represents whole numbers, and float64 represents numbers with decimal places.
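This sketch calls .info() on hypothetical fruit data and contrasts int64 with float64 using a separate, illustrative Series of whole-number counts. Recent pandas releases may report text columns with a dedicated string dtype rather than object, so the text-column label can differ from the video.

```python
import pandas as pd

# Hypothetical 8-row fruit DataFrame (made-up values)
fruit = pd.DataFrame({
    "name": ["banana", "kiwi", "apple", "mango", "cherry", "lemon", "grape", "pear"],
    "color": ["yellow", "green", "red", "orange", "red", "yellow", "purple", "green"],
    "price": [0.50, 0.75, 1.50, 2.00, 4.00, 0.60, 3.00, 1.25],
})

# Prints the row count, column names, non-null counts, and data types
fruit.info()

# Illustrative whole numbers are stored as int64; decimals are float64
counts = pd.Series([3, 5, 2])
print(counts.dtype)          # int64
print(fruit["price"].dtype)  # float64
```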

The .describe() method provides summary statistics, such as the count, mean, standard deviation, minimum, quartiles, and maximum, for each numerical column in our DataFrame. Here, we see that the mean, or average, price of fruit in our data is around 2.28, and the maximum price is 5.27.
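Here is .describe() on hypothetical fruit data. Only the numeric price column is summarized, and individual statistics can be pulled out with .loc. The numbers differ from the video's because the data is made up.

```python
import pandas as pd

# Hypothetical fruit data (made-up prices)
fruit = pd.DataFrame({
    "name": ["banana", "kiwi", "apple", "mango", "cherry", "lemon", "grape", "pear"],
    "color": ["yellow", "green", "red", "orange", "red", "yellow", "purple", "green"],
    "price": [0.50, 0.75, 1.50, 2.00, 4.00, 0.60, 3.00, 1.25],
})

# Summary statistics for the numeric columns only (price here)
summary = fruit.describe()
print(summary)

# Pull out single statistics by row label and column name
print(summary.loc["mean", "price"])  # about 1.7 for this data
print(summary.loc["max", "price"])   # 4.0
```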

Finally, the .sort_values() method lets us rearrange the rows of our DataFrame based on a column. Here, we've used .sort_values() to alphabetize our DataFrame by the name column. In the code, you'll also notice the .reset_index() method. After sorting, each row keeps its original index label, so the labels appear out of order; resetting the index renumbers the rows from 0 in their new order. In the exercises, this will be done for you.
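A sketch of sorting by name and then resetting the index, on hypothetical data. Passing drop=True is an assumption: without it, .reset_index() keeps the old labels as a new column named index, and the transcript doesn't show the video's exact call.

```python
import pandas as pd

# Hypothetical fruit data (made-up values)
fruit = pd.DataFrame({
    "name": ["banana", "kiwi", "apple", "mango", "cherry", "lemon", "grape", "pear"],
    "color": ["yellow", "green", "red", "orange", "red", "yellow", "purple", "green"],
    "price": [0.50, 0.75, 1.50, 2.00, 4.00, 0.60, 3.00, 1.25],
})

# Sort rows alphabetically by the name column
fruit = fruit.sort_values(by="name")
print(fruit)  # index labels now follow their original rows, out of order

# Renumber the index 0..n-1; drop=True discards the old labels
fruit = fruit.reset_index(drop=True)
print(fruit)
```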

We can also sort values in descending order by passing ascending=False to the .sort_values() method. This code chunk outputs a DataFrame with the most expensive fruits in our data set listed first. Also, note how we keep reassigning fruit: first, fruit holds the data we loaded from our file, then the data sorted by price, and so on.
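Sorting by price with ascending=False, reassigning fruit at each step as described above. The data is hypothetical, and chaining .reset_index(drop=True) onto the sort is a choice made for this sketch.

```python
import pandas as pd

# First assignment: hypothetical fruit data (made-up values)
fruit = pd.DataFrame({
    "name": ["banana", "kiwi", "apple", "mango", "cherry", "lemon", "grape", "pear"],
    "color": ["yellow", "green", "red", "orange", "red", "yellow", "purple", "green"],
    "price": [0.50, 0.75, 1.50, 2.00, 4.00, 0.60, 3.00, 1.25],
})

# Reassign: sort by price, most expensive first, then renumber the index
fruit = fruit.sort_values(by="price", ascending=False).reset_index(drop=True)

# Show the three most expensive fruits
print(fruit.head(3))
```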

Now it's your turn to put some methods to work.