Python Tutorial: Exploring your data using visualizations


---
One of the most important parts of the EDA workflow is data visualization. It helps you better understand your data and allows you to effectively communicate insights to technical and non-technical stakeholders alike.

In Python, the seaborn library allows you to easily create informative and attractive plots. It builds on top of matplotlib, which you may have seen in other courses. Here, we'll use seaborn.

Let's say you want to visualize the distribution of your customers' account lengths. Many machine learning algorithms make assumptions about how the data is distributed, so it's important to understand how the variables in your own dataset are distributed before you apply those algorithms. A histogram is an effective way to visualize the distribution of a variable, and you can create one using seaborn's distplot function, which is short for distribution plot. (Note that distplot has been deprecated since seaborn 0.11 in favor of histplot and displot.)

First, import seaborn. Then, pass the Account Length feature of the telco DataFrame to the distplot function. Remember to call plt.show() to display the plot.

You can see here that it resembles a bell curve, also known as the normal distribution. It turns out that many things we measure in the real world are well approximated by the normal distribution, and many models assume that your data is normally distributed.

Let's now visualize the differences in account length between churners and non-churners. An effective way to do this is with a box plot, which you can create using seaborn's boxplot function by specifying the x, y, and data parameters as shown here. As you can see, there doesn't appear to be any noticeable difference in account length between the two groups.
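A minimal sketch of that box plot, again on synthetic stand-in data. The column names Churn and Account_Length and the "yes"/"no" labels are assumptions, not the course's actual values.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the telco DataFrame (synthetic values, not the course data).
rng = np.random.default_rng(0)
telco = pd.DataFrame({
    "Account_Length": rng.normal(100, 40, 500).clip(min=1).round(),
    "Churn": rng.choice(["no", "yes"], size=500, p=[0.85, 0.15]),
})

# One box per churn group: x is the grouping column, y is the measured value.
ax = sns.boxplot(x="Churn", y="Account_Length", data=telco)

plt.show()
```

Because the synthetic account lengths are drawn independently of the churn labels, the two boxes should look alike here too, but that is a property of the fake data, not a finding.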

The line in the middle of each box represents the median. The colored boxes represent the middle 50% of the account lengths for each group: each box spans the 25th to the 75th percentile and gives a sense of the spread of the distribution. The floating points represent outliers, which you can hide from the plot using the "sym" parameter, as shown here.
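To connect the picture to numbers, the sketch below computes the same quartiles with pandas and then redraws the box plot without the outlier points. The data is synthetic and the column names are assumptions. The transcript uses "sym", which older seaborn versions forward to matplotlib; the sketch uses showfliers=False instead, on the assumption that it is accepted more widely across seaborn versions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the telco DataFrame (synthetic values, not the course data).
rng = np.random.default_rng(0)
telco = pd.DataFrame({
    "Account_Length": rng.normal(100, 40, 500).clip(min=1).round(),
    "Churn": rng.choice(["no", "yes"], size=500, p=[0.85, 0.15]),
})

# 25th percentile, median, and 75th percentile per churn group:
# the bottom edge, middle line, and top edge of each box.
summary = (
    telco.groupby("Churn")["Account_Length"]
    .quantile([0.25, 0.5, 0.75])
    .unstack()
)
print(summary)

# Same box plot, with the outlier points hidden.
ax = sns.boxplot(x="Churn", y="Account_Length", data=telco, showfliers=False)

plt.show()
```

Hiding the points only changes what is drawn; the outlying rows are still in the DataFrame.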

Seaborn allows you to easily add a third variable to your plot. For example, we might be interested in visualizing whether the "International Plan" feature has an impact on account length or churn. You can add this information to the plot by specifying the "hue" parameter. From the plot, it looks like, as far as account length goes, it does not matter whether or not a customer has an international plan.
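Adding the third variable could be sketched as follows. As before, the DataFrame is synthetic and the column name Intl_Plan is an assumed spelling of the "International Plan" feature.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the telco DataFrame (synthetic values, not the course data).
rng = np.random.default_rng(0)
telco = pd.DataFrame({
    "Account_Length": rng.normal(100, 40, 500).clip(min=1).round(),
    "Churn": rng.choice(["no", "yes"], size=500, p=[0.85, 0.15]),
    "Intl_Plan": rng.choice(["no", "yes"], size=500, p=[0.9, 0.1]),
})

# hue splits each churn group into one box per Intl_Plan value
# and adds a legend for the colors.
ax = sns.boxplot(x="Churn", y="Account_Length", hue="Intl_Plan", data=telco)

plt.show()
```

Each churn group now shows two side-by-side boxes, one per plan value, so you can compare all four combinations at a glance.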

In the exercises, you will visualize the distributions of other features and investigate their influence on churn. Happy plotting!
