Python Tutorial: Descriptive Statistics

---
So now our dataset is ready for developing a predictive algorithm. But before that, let's first get some quick descriptive insights.
The variable that tells us whether an employee has left the company or not is the column **churn**. Basically, if the value of this column is 1 then the employee has churned, and if it is 0 then we have not observed turnover in this case. To calculate the turnover rate, we count how many times this variable takes the value 1 and how many times it takes the value 0, and then divide each count by the total number of employees. If we multiply the results by 100, the outcome is the percentage of employees who left and who stayed. This task is again accomplished in 3 steps:
- First, we get the number of all the employees, which is basically the length of our data,
- Then, we count the 1s and 0s in the column churn,
- Finally, we divide the counted values by the number of employees and multiply by 100 to get percentages.
As you can see, around 76% of our employees stayed, while 24% have churned. Thus, we conclude that the turnover rate is 24%.
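The three steps above can be sketched roughly as follows. This is a minimal illustration on a small made-up DataFrame, not the dataset used in the video: the variable name `data` and the toy values are assumptions, and only the column name `churn` comes from the description.

```python
import pandas as pd

# Hypothetical toy data standing in for the real HR dataset (not the video's data).
data = pd.DataFrame({"churn": [0, 0, 0, 1, 0, 1, 0, 0]})

# Step 1: total number of employees is the number of rows.
n_employees = len(data)

# Step 2: count how many employees have churn == 0 and churn == 1.
counts = data["churn"].value_counts()

# Step 3: divide the counts by the total and multiply by 100 to get percentages.
percentages = counts / n_employees * 100

print(percentages)
```

On the real data, the value for `churn == 1` in `percentages` would be the turnover rate.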
Next, we want to learn which variables have a positive or negative linear relationship with our target. To see that, we first compute the correlation matrix using the `corr()` method provided by **pandas** and then visualize the matrix using the `heatmap()` function from seaborn, a statistical visualization library. As you can see, the target variable **churn** has the strongest negative correlation with satisfaction level. This shows that an increase in satisfaction level is associated with a decrease in the probability of turnover.
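A rough sketch of this correlation step is shown below. The toy values and the column names `satisfaction` and `number_of_projects` are hypothetical placeholders, and the helper `plot_correlation_heatmap` is introduced only for illustration; `DataFrame.corr()` and `seaborn.heatmap()` are the calls named in the description.

```python
import pandas as pd

# Hypothetical toy data; column names other than "churn" are assumptions.
data = pd.DataFrame({
    "satisfaction": [0.9, 0.8, 0.2, 0.7, 0.1, 0.85, 0.3, 0.6],
    "number_of_projects": [3, 4, 6, 3, 7, 4, 2, 5],
    "churn": [0, 0, 1, 0, 1, 0, 1, 0],
})

# Pairwise Pearson correlations between all numeric columns.
corr_matrix = data.corr()

# Correlation of every variable with the target, most negative first.
print(corr_matrix["churn"].sort_values())


def plot_correlation_heatmap(corr):
    """Draw the correlation matrix as an annotated heatmap (illustrative helper)."""
    # Imported here so the correlation part runs even without plotting libraries.
    import matplotlib.pyplot as plt
    import seaborn as sns

    # Fix the color scale to [-1, 1], the full range of a correlation coefficient.
    sns.heatmap(corr, annot=True, vmin=-1, vmax=1, cmap="coolwarm")
    plt.show()


if __name__ == "__main__":
    plot_correlation_heatmap(corr_matrix)
```

In this toy data the churned employees were given low satisfaction values on purpose, so the `churn` and `satisfaction` cell comes out negative, mirroring the pattern described above.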
Now it's your turn to practice.
#DataCamp #PythonTutorial #Human #Resources #Analytics #Predicting #Employee #Churn #Python #Descriptive #Statistics