Python Tutorial: Exploratory analysis of KPIs

---

Great work on the exercises! Let us pick up where we left off.

We want to determine which conversion rate metric is the most appropriate. Note that most companies have many KPIs, each serving a different purpose, and that here we are working through only one of these cases.

To calculate our potential KPIs and measure performance across different groups we will use the `groupby()` and `agg()` pandas methods. This lesson will focus on these methods and the next lesson will more fully explore applying them in practice.

We can call the `groupby()` method on a DataFrame to specify the groups to aggregate over.

Here we will use it on our combined demographics and purchase dataset.

The primary argument is `by`, to which we provide a list of the DataFrame fields that we want to group on. Here, the potentially relevant fields are "country", "device", "gender", and "age". Let us group by "country" and "device".

The next relevant argument is `axis`, which specifies whether the data is split along rows or along columns. The default value, `0`, splits along rows, so that rows are grouped by the values in the fields passed to `by`. This is what we will do here and for the remainder of the course.

The other argument of interest is `as_index`. By default, this argument is `True`, which means that the grouped-by fields become the index of the result. We want to set this to `False` so that they stay as ordinary columns instead.

This returns a `DataFrameGroupBy` object.
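The call described above might look like the following minimal sketch. The table `purchase_data` and all of its values are invented for illustration; only the column names come from the lesson.

```python
import pandas as pd

# Hypothetical stand-in for the combined demographics and purchase dataset.
# The values are invented; only the column names follow the lesson.
purchase_data = pd.DataFrame({
    "country": ["USA", "USA", "USA", "USA", "BRA", "BRA", "BRA", "BRA"],
    "device": ["iOS", "iOS", "iOS", "android", "iOS", "iOS", "android", "android"],
    "gender": ["M", "F", "F", "M", "F", "M", "F", "M"],
    "age": [25, 31, 40, 19, 22, 35, 28, 45],
    "price": [499, 299, 199, 299, 199, 399, 299, 99],
})

# Group on country and device; as_index=False keeps them as regular columns
# in later aggregation results instead of turning them into the index.
grouped_purchase_data = purchase_data.groupby(by=["country", "device"], as_index=False)

group_type = type(grouped_purchase_data).__name__
n_groups = grouped_purchase_data.ngroups
print(group_type)  # DataFrameGroupBy
print(n_groups)    # 4 country-device combinations in this invented table
```

Nothing has been computed yet at this point. The object only records how the rows are split into groups.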

The next step is to aggregate over these groups.

The easiest way to do this is to call an aggregation method on the `DataFrameGroupBy` object. Let's call `mean()` on the `price` column of our DataFrame.

The output is the mean amount paid per subscription across all purchasing users.

In this case, rather than being calculated over the entire set of data, it is calculated over each device-country combination.
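As a minimal sketch of this step, again on an invented table (the values are assumptions, not the course data), the grouped mean can be compared with the mean over the whole dataset:

```python
import pandas as pd

# Hypothetical data; only the column names follow the lesson.
purchase_data = pd.DataFrame({
    "country": ["USA", "USA", "USA", "USA", "BRA", "BRA", "BRA", "BRA"],
    "device": ["iOS", "iOS", "iOS", "android", "iOS", "iOS", "android", "android"],
    "price": [499, 299, 199, 299, 199, 399, 299, 99],
})

grouped = purchase_data.groupby(by=["country", "device"], as_index=False)

# Mean price within each country-device combination.
mean_price = grouped["price"].mean()
print(mean_price)

# For comparison: one mean over the entire set of data.
overall_mean = purchase_data["price"].mean()
print(overall_mean)  # 286.5
```

Because `as_index=False` was set, "country" and "device" appear as ordinary columns next to the aggregated "price" column.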

Any built-in aggregation similar to `mean()` can be called on a `DataFrameGroupBy` object. However, more flexible options exist through the `.agg()` method.

The easiest way to use this method is to pass it the name of a function, like "mean". As we can see, this has the same result as when we called `mean()` directly.

It can be further expanded by passing in a list of function names, like "mean" and "median", to calculate both.
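The two forms of `.agg()` described so far could be sketched as follows. The data is invented, and the exact layout of the list result (whether the group keys end up as columns or in the index) can differ between pandas versions, so the sketch only relies on the "mean" and "median" columns.

```python
import pandas as pd

# Hypothetical data; only the column names follow the lesson.
purchase_data = pd.DataFrame({
    "country": ["USA", "USA", "USA", "USA", "BRA", "BRA", "BRA", "BRA"],
    "device": ["iOS", "iOS", "iOS", "android", "iOS", "iOS", "android", "android"],
    "price": [499, 299, 199, 299, 199, 399, 299, 99],
})

grouped = purchase_data.groupby(by=["country", "device"], as_index=False)

# A single function name gives the same numbers as calling .mean() directly.
direct = grouped["price"].mean()
via_agg = grouped["price"].agg("mean")
print(via_agg)

# A list of function names produces one result column per function.
both = grouped["price"].agg(["mean", "median"])
print(both)
```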

The true flexibility comes from a third type of argument.

We can pass in a dictionary where the keys are column names within our dataset, like "purchase" or "age", and the values are lists of functions to be applied over those columns, still broken out by groups.

Let us find the mean, minimum, and maximum values of both purchase and age, as an example.
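A minimal sketch of the dictionary form follows. It assumes that "purchase" refers to the `price` column used earlier in the lesson, and the data is invented for illustration.

```python
import pandas as pd

# Hypothetical data; "price" is assumed to be the purchase amount column.
purchase_data = pd.DataFrame({
    "country": ["USA", "USA", "USA", "USA", "BRA", "BRA", "BRA", "BRA"],
    "device": ["iOS", "iOS", "iOS", "android", "iOS", "iOS", "android", "android"],
    "age": [25, 31, 40, 19, 22, 35, 28, 45],
    "price": [499, 299, 199, 299, 199, 399, 299, 99],
})

grouped = purchase_data.groupby(by=["country", "device"], as_index=False)

# Keys are column names; values are the functions to apply to that column.
summary = grouped.agg({
    "price": ["mean", "min", "max"],
    "age": ["mean", "min", "max"],
})
print(summary)

# The result has two-level column labels, so a single statistic is
# selected with a (column, function) pair.
max_price = summary[("price", "max")]
min_age = summary[("age", "min")]
```

Each column can also be given a different list of functions, which is what makes this form more flexible than passing a single list.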

Another great flexibility of the `agg()` method is that we can also pass in our own functions to aggregate with, not only built-in ones.

Here is a function that finds the truncated mean value; that is, it removes the top and bottom ten percent of values before calculating the average. We can aggregate our age over the country and device groupings with this function.

The only distinction is that when passing in this function, we do not put its name in quotation marks as we did for the built-in functions.
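One way such a function could be written is sketched below. The name `truncated_mean`, the choice to keep values between the 10th and 90th percentiles inclusive, and the data are all assumptions made for illustration; the course's own definition may differ.

```python
import pandas as pd

def truncated_mean(data):
    """Mean of the values lying between the 10th and 90th percentiles (inclusive)."""
    top_val = data.quantile(0.9)
    bot_val = data.quantile(0.1)
    trunc_data = data[(data >= bot_val) & (data <= top_val)]
    return trunc_data.mean()

# Hypothetical ages with one extreme value at each end of every group.
usa_ages = [20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 100]
bra_ages = [10, 21, 22, 23, 24, 25, 26, 27, 28, 29, 90]
purchase_data = pd.DataFrame({
    "country": ["USA"] * 11 + ["BRA"] * 11,
    "device": ["iOS"] * 11 + ["android"] * 11,
    "age": usa_ages + bra_ages,
})

grouped = purchase_data.groupby(by=["country", "device"], as_index=False)

# The function object itself is passed, not a string with its name.
trunc_age = grouped.agg({"age": truncated_mean})
print(trunc_age)
```

In this invented data the extreme ages are dropped before averaging, so each group's truncated mean lands on the middle of its remaining values.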

It is important to cover the techniques before proceeding to applying them.

In the next video we will look at how to use these methods to examine KPIs across cohorts and discuss why this is valuable.
Let's practice these tools before moving on to that!