R Tutorial: Exploring the data

---
Data exploration is the first step in understanding your data.

Here is a quick view of the dataset you imported.

emp_id is the unique identifier for each employee.

status shows the employment status of an employee. Employees who currently work in the organization are marked as active, while employees who have left are marked as inactive.

The same information is reflected in the turnover variable, where 0 stands for active and 1 for inactive.

For inactive employees, the last working date in the organization is stored in the last_working_date column.
For active employees, you will be using cutoff_date, which is the end date of the study period.

Let's go ahead and calculate the turnover rate to derive insights from the data.

Turnover rate is the percentage of employees who left the organization in a given period of time.

To calculate the turnover rate you need two numbers: the number of employees who left the organization during that period, i.e., the count of all 1's, and the total number of employees in the organization during that period, i.e., the count of all 1's and 0's combined.

In other words, turnover rate is the mean of the turnover variable in your dataset.
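To see why the two definitions agree, here is a minimal sketch in base R. The turnover vector below is a hypothetical toy example, not the tutorial's dataset.

```r
# Hypothetical toy values (not the tutorial's data): 1 = inactive (left), 0 = active
turnover <- c(0, 0, 1, 0, 1, 0, 0, 0, 0, 0)

left  <- sum(turnover == 1)   # number of employees who left
total <- length(turnover)     # all employees, i.e. count of 1's and 0's

rate_by_count <- left / total     # definition: leavers divided by total
rate_by_mean  <- mean(turnover)   # shortcut: mean of the 0/1 variable

# Both expressions give the same value
print(c(rate_by_count, rate_by_mean))
```

Because every leaver contributes 1 and every active employee contributes 0, the sum of the column is the number of leavers, so the mean is exactly leavers divided by total.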

First, let's look at the number of active and inactive employees using the count() function from dplyr.

count() gives you the number of rows for each unique value in a specified column.
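A call along these lines would produce that summary. This is only a sketch: the data frame name org_toy and its rows are made up for illustration, while the column names follow the ones described above.

```r
library(dplyr)

# Hypothetical stand-in for the imported dataset; values are made up
org_toy <- data.frame(
  emp_id   = c("E1", "E2", "E3", "E4", "E5"),
  status   = c("Active", "Active", "Inactive", "Active", "Inactive"),
  turnover = c(0, 0, 1, 0, 1)
)

# One row per unique status value, with the row count in column n
status_counts <- org_toy %>%
  count(status)

print(status_counts)
```

The result has one row per value of status and a column named n holding the number of matching rows.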

You can calculate the turnover rate using the summarize() function. As mentioned before, you can take the mean of the turnover column to accomplish this.
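As a rough sketch of that step, assuming a small made-up data frame (org_toy is a hypothetical name, and the output column name turnover_rate is chosen here for illustration):

```r
library(dplyr)

# Hypothetical stand-in for the imported dataset; values are made up
org_toy <- data.frame(
  emp_id   = c("E1", "E2", "E3", "E4", "E5"),
  turnover = c(0, 0, 1, 0, 1)
)

# Collapse the whole table to a single row holding the mean of turnover
overall <- org_toy %>%
  summarize(turnover_rate = mean(turnover))

print(overall)
```

summarize() returns a one-row data frame, so the rate is available as overall$turnover_rate.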

Here you can see that approximately 18% of employees are inactive, which means about 82% of employees in the dataset are active.

Turnover adversely affects the efficiency, productivity, profitability and morale of the organization. To retain talent, it is important to find out where the organization is losing the most people.

Employee turnover rate can vary across job levels. To calculate the level-wise turnover rate, you can use the group_by() function before summarizing.
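The grouped calculation might look like the sketch below. The data frame org_toy, the name df_level and all values are hypothetical; the column names level and turnover_level follow the ones used in this tutorial.

```r
library(dplyr)

# Hypothetical stand-in for the imported dataset; values are made up
org_toy <- data.frame(
  level    = c("Analyst", "Analyst", "Analyst", "Analyst", "Manager", "Manager"),
  turnover = c(1, 0, 1, 0, 0, 0)
)

# Split rows by level, then take the mean of turnover within each level
df_level <- org_toy %>%
  group_by(level) %>%
  summarize(turnover_level = mean(turnover))

print(df_level)
```

Grouping first means summarize() returns one row per level instead of a single overall rate.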

Visualizing your data generally helps when you are comparing several values.

You can plot a bar graph of the level-wise turnover rate using the geom_col() layer, placing level on the x-axis and turnover_level on the y-axis.
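A minimal sketch of such a plot with ggplot2 follows. The data frame df_level and its numbers are made up purely to make the example runnable and are not the tutorial's results.

```r
library(ggplot2)

# Hypothetical level-wise rates; values are made up for illustration
df_level <- data.frame(
  level          = c("Analyst", "Specialist", "Manager"),
  turnover_level = c(0.25, 0.20, 0.05)
)

# geom_col() draws bar heights directly from the y values in the data
p <- ggplot(df_level, aes(x = level, y = turnover_level)) +
  geom_col()

print(p)
```

geom_col() fits here because the rates are already computed; geom_bar() would instead count rows by default.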

As you can see here, the turnover rate is highest at the Analyst and Specialist levels.

Now it's your turn to explore this data!
