R Tutorial: Baseline Conversion Rates

---

In the previous lesson, we learned some of the principles of A/B testing and took a look at our preliminary dataset. Let's spend some more time looking at the pre-experiment, or baseline, values for an experiment.

Before starting any A/B testing experiment, you'll want to know your baseline value: the current value before any experimental changes happen. Why is this? Well, let's come back to our hypothesis.

We said we expect a different photo to result in more conversions, but what does "more" really mean in this context?

Is it compared to the conversion rate over the last year? Today? Next week?

Or what about relative to when the experiment is actually run? If you're not planning to run your experiment for a couple of months, other factors could change your conversion rates between now and when the experiment is run.

To have a clearly defined hypothesis and experiment, you need to know what your baseline for comparison is; otherwise, you can't really know whether your experiment had an effect.

To start, we'll compute the current, pre-experiment conversion rate for our experiment over the entire time period covered by our data.

As mentioned earlier, for most of our analyses we'll use a suite of packages referred to as the tidyverse. Here, these packages will help us manipulate and plot our baseline data.

We'll also read in our click_data just as we did in the previous exercises.

From here, we can find the mean of our clicked_adopt_today column to see what proportion of the time people clicked, also known as our conversion rate. We can do that with the dplyr function summarize(), using the pipe to connect our data to the function. Inside it, we use the mean() function to compute the conversion rate: averaging the 1s and 0s in the clicked_adopt_today column gives the fraction of visits that ended in a click.
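As a minimal sketch of this step, the R code below builds a tiny stand-in for click_data. The six rows are hypothetical, not the course data; only the column names visit_date and clicked_adopt_today come from the tutorial. Replace the data.frame() call with your own read of click_data to get the real value.

```r
# Minimal sketch: overall (baseline) conversion rate with dplyr.
# ASSUMPTION: `click_data` is a hypothetical six-row stand-in for the
# course dataset; only the column names come from the tutorial.
library(dplyr)

click_data <- data.frame(
  visit_date = as.Date(c("2017-01-03", "2017-01-17", "2017-06-05",
                         "2017-06-20", "2017-12-01", "2017-12-24")),
  clicked_adopt_today = c(0, 0, 1, 0, 1, 1)  # 1 = clicked "ADOPT TODAY", 0 = did not
)

# Averaging a 0/1 column gives the share of 1s, i.e. the conversion rate
baseline <- click_data %>%
  summarize(conversion_rate = mean(clicked_adopt_today))

baseline
```

With these toy rows, three of the six visits are clicks, so conversion_rate comes out at 0.5; the tutorial's real data gives a different value.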

If we look at the value of this new summarized column, we see that it is 0.277, a conversion rate of 27.7%. In other words, roughly 28 out of every 100 visitors to the website clicked "ADOPT TODAY" with the current homepage picture.

We've successfully computed our pre-experiment conversion rate. However, we computed the conversion rate over the entire year. Maybe in certain months, people are more likely to adopt than in other months. We'll compute conversion rates for each month to see if there is an effect of seasonality.

Instead of summarizing over the entire dataset, we'll add a group_by() from dplyr on the month, so we can find the conversion rate for each month of the year. Currently, the visit_date column gives full dates, down to the day of the month. To round these off to just the month, we'll use the month() function from the lubridate package.
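To see what month() does on its own, here is a small sketch on three hypothetical dates. lubridate's month() returns the month number, and with label = TRUE it returns an ordered factor of abbreviated month names instead.

```r
# Sketch: extracting the month from a Date with lubridate::month().
# The three dates are hypothetical examples.
library(lubridate)

visit_date <- as.Date(c("2017-01-17", "2017-06-05", "2017-12-24"))

month(visit_date)                # numeric month numbers: 1 6 12
month(visit_date, label = TRUE)  # ordered factor of month names (text follows the locale)
```

One caveat: month() drops the year, so if your data spanned more than one year, January 2017 and January 2018 would fall into the same group. lubridate's floor_date(visit_date, unit = "month") is one way to keep them apart.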

Now, instead of a single number, our output is a conversion rate for each month of the year, as we can see in the updated data frame.
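Putting group_by() and month() together gives a sketch of the monthly summary. As before, click_data is a hypothetical six-row stand-in with the tutorial's column names. Naming the grouping column month is a choice made here so that the result has a tidy column name to plot against.

```r
# Sketch: conversion rate per month with dplyr + lubridate.
# ASSUMPTION: hypothetical stand-in for click_data (same columns as the tutorial).
library(dplyr)
library(lubridate)

click_data <- data.frame(
  visit_date = as.Date(c("2017-01-03", "2017-01-17", "2017-06-05",
                         "2017-06-20", "2017-12-01", "2017-12-24")),
  clicked_adopt_today = c(0, 0, 1, 0, 1, 1)
)

click_data_sum <- click_data %>%
  group_by(month = month(visit_date)) %>%  # collapse each date to its month number
  summarize(conversion_rate = mean(clicked_adopt_today))

click_data_sum
```

With the toy rows, January, June and December come out at 0, 0.5 and 1 respectively: one row per month instead of a single overall number.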

To really understand how conversion rates change throughout the year, it's useful to plot our data. We can do that with the package ggplot2 from the tidyverse.

First, we'll save our summarized data in a data frame called click_data_sum. Then we'll use that new data frame in a ggplot call, setting the x-axis to our month column and the y-axis to our computed conversion rate column.

To display the data, we'll plot the points and connect them with a line, using geom_point() and geom_line().
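A sketch of the plotting step with ggplot2 follows. It rebuilds the same hypothetical click_data_sum so that the block runs on its own. The scale_x_continuous(breaks = 1:12) line is an addition here, not part of the tutorial's call, so that the x-axis ticks fall on whole month numbers.

```r
# Sketch: plot monthly conversion rates as points joined by a line.
# ASSUMPTION: hypothetical stand-in data; column names follow the tutorial.
library(dplyr)
library(lubridate)
library(ggplot2)

click_data <- data.frame(
  visit_date = as.Date(c("2017-01-03", "2017-01-17", "2017-06-05",
                         "2017-06-20", "2017-12-01", "2017-12-24")),
  clicked_adopt_today = c(0, 0, 1, 0, 1, 1)
)

click_data_sum <- click_data %>%
  group_by(month = month(visit_date)) %>%
  summarize(conversion_rate = mean(clicked_adopt_today))

# Month on the x-axis, conversion rate on the y-axis
p <- ggplot(click_data_sum, aes(x = month, y = conversion_rate)) +
  geom_point() +                      # one dot per month
  geom_line() +                       # connect the dots in month order
  scale_x_continuous(breaks = 1:12)   # added here: ticks at month numbers

p
```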

From this plot, it's clear that conversion rates are not steady across the year. Instead, rates are much higher in the summer months and at the end of the year than during the rest of the year.
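To put a number on "higher" rather than reading it off the plot, one option (an addition here, not a step from the tutorial) is to compare each month with the overall baseline rate. This again uses the hypothetical stand-in data, so the months it flags are illustrative only.

```r
# Sketch: flag months whose conversion rate beats the overall baseline.
# ASSUMPTION: hypothetical stand-in data; `overall_rate` and
# `above_baseline` are names introduced here for illustration.
library(dplyr)
library(lubridate)

click_data <- data.frame(
  visit_date = as.Date(c("2017-01-03", "2017-01-17", "2017-06-05",
                         "2017-06-20", "2017-12-01", "2017-12-24")),
  clicked_adopt_today = c(0, 0, 1, 0, 1, 1)
)

overall_rate <- mean(click_data$clicked_adopt_today)  # baseline over all data

click_data_sum <- click_data %>%
  group_by(month = month(visit_date)) %>%
  summarize(conversion_rate = mean(clicked_adopt_today)) %>%
  mutate(above_baseline = conversion_rate > overall_rate) %>%
  arrange(desc(conversion_rate))  # highest-converting months first

click_data_sum
```

With the toy rows, the overall rate is 0.5, and only December (rate 1) sits strictly above it; June ties the baseline and January falls below it.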

In the following exercises, we'll get some practice looking at baseline values.