Python Tutorial: Filtering rows and creating columns


----

Often, you may want to filter your data for a specific observation or set of observations.

In spreadsheets, you are most likely familiar with the filtering functionality, where a drop-down menu allows you to reduce your data by ticking a box. How do we recreate this functionality in Python?

Recall our fruit DataFrame, with name, color, and price columns.

If we wanted to access just the name column, we would put brackets next to fruit, and place 'name' in quotes within those brackets. The result is a pandas Series object, which you can think of as just the contents of your column.
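As a rough sketch of that column access: the transcript does not list the actual rows of the fruit DataFrame, so the values below are made up for illustration, keeping only the name, color, and price columns and Apple in the first row.

```python
import pandas as pd

# Hypothetical sample rows; only the column names come from the tutorial.
fruit = pd.DataFrame({
    "name": ["Apple", "Banana", "Cherry", "Mango"],
    "color": ["Red", "Yellow", "Red", "Orange"],
    "price": [0.75, 1.50, 0.25, 2.00],
})

# Brackets plus the quoted column name give back a pandas Series.
names = fruit["name"]
print(type(names))
print(names)
```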

We can then use comparison operators, like "equal to" or "not equal to", to get logical True/False values for each entry in that column.

For example, on the left is our name column, and on the right are logical, or Boolean, True/False values that correspond to where the name column is equal to "Apple". Here, only the first entry is True, since the name is equal to Apple only in the first row.

Always remember that Python is case-sensitive, so the capital A in Apple is very important.
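A small sketch of the comparison, using the same made-up fruit rows as an assumption. It also shows what happens if the capital A is dropped.

```python
import pandas as pd

# Hypothetical sample rows; only the column names come from the tutorial.
fruit = pd.DataFrame({
    "name": ["Apple", "Banana", "Cherry", "Mango"],
    "color": ["Red", "Yellow", "Red", "Orange"],
    "price": [0.75, 1.50, 0.25, 2.00],
})

# One True/False value per entry in the name column.
is_apple = fruit["name"] == "Apple"
print(is_apple.tolist())  # [True, False, False, False]

# Lowercase "apple" matches nothing, because the comparison is case-sensitive.
is_lowercase_apple = fruit["name"] == "apple"
print(is_lowercase_apple.tolist())  # [False, False, False, False]
```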

To filter, we first reference our DataFrame, fruit, then, inside a set of brackets, we place our comparison. The result is a DataFrame that only contains rows where the comparison is True, in this case, where name is equal to Apple.
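Put together, the filter might look like the sketch below. The fruit rows are again invented for illustration, and the variable name apples is just a label chosen here.

```python
import pandas as pd

# Hypothetical sample rows; only the column names come from the tutorial.
fruit = pd.DataFrame({
    "name": ["Apple", "Banana", "Cherry", "Mango"],
    "color": ["Red", "Yellow", "Red", "Orange"],
    "price": [0.75, 1.50, 0.25, 2.00],
})

# DataFrame, then brackets, then the comparison inside the brackets.
apples = fruit[fruit["name"] == "Apple"]
print(apples)
```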

Or here, we change our comparison to check where the price column is greater than one dollar. The result is a DataFrame where all entries have a price greater than one dollar.
Notice that when we filter, the rows keep their original index labels, so the index may no longer be sequential or start at 0.
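To see the index behavior concretely, here is a sketch with made-up prices chosen so that the first row is filtered out.

```python
import pandas as pd

# Hypothetical sample rows; only the column names come from the tutorial.
fruit = pd.DataFrame({
    "name": ["Apple", "Banana", "Cherry", "Mango"],
    "color": ["Red", "Yellow", "Red", "Orange"],
    "price": [0.75, 1.50, 0.25, 2.00],
})

# Keep only the rows priced above one dollar.
expensive = fruit[fruit["price"] > 1]
print(expensive)

# The surviving rows keep their original index labels.
print(expensive.index.tolist())  # [1, 3]
```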

In the exercises, you might see code like this, where the reset_index method is tacked on to the end of your filter. Note how the index is now sequential and starts at 0.
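A sketch of that pattern, with invented fruit rows. The drop=True argument is an assumption on our part: it discards the old index instead of keeping it as an extra column, and the exercises may or may not pass it.

```python
import pandas as pd

# Hypothetical sample rows; only the column names come from the tutorial.
fruit = pd.DataFrame({
    "name": ["Apple", "Banana", "Cherry", "Mango"],
    "color": ["Red", "Yellow", "Red", "Orange"],
    "price": [0.75, 1.50, 0.25, 2.00],
})

# Filter, then renumber the rows from 0.
# drop=True (our assumption) throws away the old index labels.
expensive = fruit[fruit["price"] > 1].reset_index(drop=True)
print(expensive)
print(expensive.index.tolist())  # [0, 1]
```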

Think of this basic pattern as "show me my DataFrame where this column is equal to that value".

Here is a look at what filtering in Python looks like, and what its equivalent is in a spreadsheet. Both achieve the same result: the Apple row of our fruit data. Note that it is possible to filter on more than one condition at once, but that is beyond the scope of this course.

Shifting gears, what if we wanted to create a new column? What if we bought two of each fruit? How could we make a cost column?

In a spreadsheet, the process would look something like this. Take each price cell and multiply by 2. Then drag the formula all the way down to the bottom of the data.

Fortunately, we have the same mathematical operators at our disposal in Python. So if we buy 2 of each fruit, we still multiply by 2, using the asterisk.

To add the cost column to our DataFrame, we simply define a cost column in fruit and designate its value as 2 times the price column.
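As a sketch, with the same invented prices as an assumption, the assignment could look like this.

```python
import pandas as pd

# Hypothetical sample rows; only the column names come from the tutorial.
fruit = pd.DataFrame({
    "name": ["Apple", "Banana", "Cherry", "Mango"],
    "color": ["Red", "Yellow", "Red", "Orange"],
    "price": [0.75, 1.50, 0.25, 2.00],
})

# Assigning to a new column name creates the column.
# The multiplication is applied to every row at once, with no formula to drag down.
fruit["cost"] = 2 * fruit["price"]
print(fruit)
```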

What if our DataFrame had a quantity column that contained the quantity of each fruit purchased?

In a spreadsheet, we would multiply our price column by the quantity column.

With our DataFrame, it's actually not too different. Here is the result. In our code, on the left of the equals sign, we've defined our new column, cost, and on the right of the equals sign, we've multiplied the price column by the quantity column.
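A sketch of the column-by-column version. Both the prices and the quantity values below are made up, since the transcript does not show the actual data.

```python
import pandas as pd

# Hypothetical sample rows, including an invented quantity column.
fruit = pd.DataFrame({
    "name": ["Apple", "Banana", "Cherry", "Mango"],
    "color": ["Red", "Yellow", "Red", "Orange"],
    "price": [0.75, 1.50, 0.25, 2.00],
    "quantity": [4, 2, 10, 1],
})

# Left of the equals sign: the new column.
# Right of the equals sign: price times quantity, matched up row by row.
fruit["cost"] = fruit["price"] * fruit["quantity"]
print(fruit)
```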

Now it's your turn to manipulate the movie theater sales data. Good luck!