Python Tutorial: Time cohorts

---

Now we will learn about the most popular cohort analysis type - time cohorts. We will segment customers into acquisition cohorts based on the month they made their first purchase. We will then assign the cohort index to each purchase of the customer. It will represent the number of months since the first transaction.

Time-based cohorts group customers by the time they completed their first activity.

In this lesson, we will group customers into cohorts based on the month of their first purchase. Then we will mark each transaction based on its relative time period since the first purchase. In this example, we will calculate the number of months since the acquisition. In the next step, we will calculate metrics like retention or average spend value, and build this heatmap.

For example, this number means that 24% of the cohort that signed up in August 2011 was active 4 months later.

The first column here is the month of the first purchase, so the retention rate is 100%. This holds by definition, as customers had to be active in this month to be assigned to this cohort.

A little bit about the data. We will use a 20% random sample from an online retail dataset with half a million transactions.

This is a realistic dataset with customer transactions which is commonly used in segmentation.

Let's look at the first 5 rows of it.

The data contains 7 columns with customer transactions. The main ones we will use are the date, the price, and the CustomerID.

Now that we have loaded the data, let's build a simple cohort table for time-based cohorts.

First, we create a function that truncates a given date object to the first day of the month.

Then we apply it to the InvoiceDate and create an InvoiceMonth column.
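
Here is a minimal sketch of these two steps, not necessarily the exact code from the lesson. The function name `get_month`, the DataFrame name `online`, and the three sample rows are assumptions made for illustration. The sketch also assumes the dates arrive as strings, as they usually do when read from a CSV file, so it parses them with `pd.to_datetime` first; calling `.year` on a plain string fails with an AttributeError.

```python
import datetime as dt
import pandas as pd

# Hypothetical three-row sample standing in for the online retail data.
online = pd.DataFrame({
    "CustomerID": [1, 1, 2],
    "InvoiceDate": [
        "2011-08-03 10:15:00",
        "2011-10-21 09:00:00",
        "2011-09-14 16:40:00",
    ],
})

# Parse the strings into datetimes so that .year and .month are available.
online["InvoiceDate"] = pd.to_datetime(online["InvoiceDate"])


def get_month(x):
    """Truncate a date to the first day of its month."""
    return dt.datetime(x.year, x.month, 1)


# Apply the truncation to every invoice date.
online["InvoiceMonth"] = online["InvoiceDate"].apply(get_month)
print(online)
```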

Next, we create a groupby() object with CustomerID and use the InvoiceMonth column for further manipulation.

Finally, we use transform() together with a min() function to assign the smallest InvoiceMonth value to each customer. With just that, we have assigned the acquisition month cohort to each customer.
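
The grouping and transform steps could look roughly like this. It is a sketch under assumptions: the DataFrame name `online`, the variable name `grouping`, and the sample rows are made up for illustration, and InvoiceMonth is built directly so the example runs on its own.

```python
import pandas as pd

# Hypothetical sample with InvoiceMonth already truncated to month starts.
online = pd.DataFrame({
    "CustomerID": [1, 1, 2, 2],
    "InvoiceMonth": pd.to_datetime(
        ["2011-10-01", "2011-08-01", "2011-09-01", "2011-12-01"]
    ),
})

# Group the InvoiceMonth column by customer.
grouping = online.groupby("CustomerID")["InvoiceMonth"]

# transform() returns one value per original row, so each purchase
# receives the earliest InvoiceMonth of its customer.
online["CohortMonth"] = grouping.transform("min")
print(online)
```

Unlike a plain aggregation, transform() keeps the original row count, which is why the result can be assigned straight back as a new column.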

Let's look at the data. We have added two columns - InvoiceMonth and CohortMonth. Now, let's calculate the time offset!

Before we can calculate the time offset, we will first create a helper function that extracts the integer values of the year, month, and day from a datetime column.
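
One possible shape for such a helper is sketched below. The name `get_date_int`, its signature, and the sample data are assumptions for illustration; it relies on the pandas `.dt` accessor, so the column must already have a datetime dtype.

```python
import pandas as pd


def get_date_int(df, column):
    """Return the year, month and day of a datetime column as integer Series."""
    year = df[column].dt.year
    month = df[column].dt.month
    day = df[column].dt.day
    return year, month, day


# Hypothetical two-row sample.
sample = pd.DataFrame(
    {"InvoiceMonth": pd.to_datetime(["2011-08-01", "2012-01-01"])}
)
year, month, day = get_date_int(sample, "InvoiceMonth")
print(year.tolist(), month.tolist(), day.tolist())
```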

Now, we will calculate the number of months between any transaction and the first transaction for each customer. We will use the InvoiceMonth and CohortMonth values to do this.

We will start by creating two objects with year and month integer values from each of the InvoiceMonth and CohortMonth variables. Then we will calculate the differences in years and months between them.

Finally, we will convert the total difference to months by multiplying the year difference by 12 and adding the month difference to it.

You can see there's a "+1" at the end. We add it so that the first month is marked as 1 instead of 0 for easier interpretation.
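
The offset calculation can be sketched as follows. The variable names and the three sample rows are assumptions for illustration; both columns are assumed to hold datetimes truncated to the first day of the month.

```python
import pandas as pd

# Hypothetical sample: one customer acquired in August 2011.
online = pd.DataFrame({
    "InvoiceMonth": pd.to_datetime(["2011-08-01", "2011-12-01", "2012-01-01"]),
    "CohortMonth": pd.to_datetime(["2011-08-01", "2011-08-01", "2011-08-01"]),
})

# Integer year and month of each transaction and of the first purchase.
invoice_year = online["InvoiceMonth"].dt.year
invoice_month = online["InvoiceMonth"].dt.month
cohort_year = online["CohortMonth"].dt.year
cohort_month = online["CohortMonth"].dt.month

# Differences in years and months; the month difference can be negative
# when the transaction falls in a later calendar year.
years_diff = invoice_year - cohort_year
months_diff = invoice_month - cohort_month

# Total offset in months, shifted by 1 so the acquisition month is 1.
online["CohortIndex"] = years_diff * 12 + months_diff + 1
print(online["CohortIndex"].tolist())  # [1, 5, 6]
```

For the January 2012 row, the year difference is 1 and the month difference is 1 - 8 = -7, so the index is 12 - 7 + 1 = 6.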

You can see that the new column is added. Now, let's pull some metrics!

Now we will calculate the number of monthly active customers in each cohort.

First, we will create a groupby object with CohortMonth and CohortIndex.

Then, we will count the number of unique customers in each group by applying the pandas nunique() function to CustomerID.

Then, we reset the index and create a pandas pivot with CohortMonth in the rows, CohortIndex in the columns, and CustomerID counts as values.
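
Put together, the three steps might look like the sketch below. The names `online`, `grouping`, `cohort_data`, and `cohort_counts` and the sample rows are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample: customers 1 and 2 belong to the August 2011 cohort,
# customer 3 to the September 2011 cohort.
online = pd.DataFrame({
    "CustomerID": [1, 1, 1, 2, 3],
    "CohortMonth": pd.to_datetime(
        ["2011-08-01", "2011-08-01", "2011-08-01", "2011-08-01", "2011-09-01"]
    ),
    "CohortIndex": [1, 1, 2, 1, 1],
})

# Group by acquisition month and months since acquisition.
grouping = online.groupby(["CohortMonth", "CohortIndex"])

# Count distinct customers per group; repeat purchases count once.
cohort_data = grouping["CustomerID"].nunique().reset_index()

# Cohorts in the rows, month offsets in the columns.
cohort_counts = cohort_data.pivot(
    index="CohortMonth", columns="CohortIndex", values="CustomerID"
)
print(cohort_counts)
```

Cells with no activity, such as the second month of the September cohort here, come out as NaN in the pivot.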

Let's take a look at our table.

This is the result! We have created a table that will serve as the basis for the rest of this chapter.

In the next lesson, we will learn how to calculate retention rate - it's very simple and is just a few lines of code away! Now - it's your turn to build some cohorts!

Comments

I'm getting an AttributeError when applying the function. It says "'str' object has no attribute 'year'", although my dataset looks just the same as in this video.

oshinaagrawal