M4 PROJECT: What are the most influencing factors for employee attrition and whom are those leaveing.¶

#Importing the libriaries needed 
# Ignore the warnings
import warnings
warnings.filterwarnings('always')
warnings.filterwarnings('ignore')

# data visualisation and manipulation
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import style
import seaborn as sns
#import missingno as msno

#configure
# sets matplotlib to inline and displays graphs below the corressponding cell.
%matplotlib inline  
style.use('fivethirtyeight')
sns.set(style='whitegrid',color_codes=True)

#import the necessary modelling algos.
#from sklearn.linear_model import LogisticRegression
#from sklearn.svm import LinearSVC
#from sklearn.svm import SVC
#from sklearn.neighbors import KNeighborsClassifier
#from sklearn.ensemble import RandomForestClassifier
#from sklearn.tree import DecisionTreeClassifier
#from sklearn.ensemble import GradientBoostingClassifier
#from sklearn.naive_bayes import GaussianNB

#model selection
#from sklearn.model_selection import train_test_split
#from sklearn.model_selection import KFold
#from sklearn.metrics import accuracy_score,precision_score,recall_score,confusion_matrix,roc_curve,roc_auc_score
#from sklearn.model_selection import GridSearchCV

#from imblearn.over_sampling import SMOTE

#preprocess.
#from sklearn.preprocessing import MinMaxScaler,StandardScaler,LabelEncoder,OneHotEncoder

# Common sklearn Model Helpers
#from sklearn import feature_selection
#from sklearn import model_selection
#from sklearn import metrics
# from sklearn.datasets import make_classification

# sklearn modules for performance metrics
#from sklearn.metrics import confusion_matrix, classification_report, precision_recall_curve
#from sklearn.metrics import auc, roc_auc_score, roc_curve, recall_score, log_loss
#from sklearn.metrics import f1_score, accuracy_score, roc_auc_score, make_scorer
#from sklearn.metrics import average_precision_score
# ann and dl libraraies
#from keras import backend as K
#from keras.models import Sequential
#from keras.layers import Dense
#from keras.optimizers import Adam,SGD,Adagrad,Adadelta,RMSprop
#from keras.utils import to_categorical

#import tensorflow as tf
import random as rn

1. Dataset Explaination¶

This dataset was a fictional dataset created by IBM to indentify important factors that may be influencing attrition for an employee. The dataset contains 1470 rows and 35 coulmns. Our project is focusing on to find the most importance metrics that influence attrition. Firstly we need to do some general statistics to get insight into the dataset, Second using machine learning to predict attrition. Maybe it will give findings that people do not usally think about regarding employee attrition.

Haveing a understanding of what make employees leave is important to know, if a person is leaving replacement cost could be high. Being aware of it will be easier to take action to improve to the employee attrition.

Some of the questions we want to cover during this project

What is the likelihood of an active employee leaving the company?
What are the key indicators of an employee leaving the company?
What policies or strategies can be adopted based on the results to improve employee retention?

Given that we have data on former employees, this is a standard supervised classification problem where the label is a binary variable, 0 (active employee), 1 (former employee). In this study, our target variable Y is the probability of an employee leaving the company.

Some important columns in the dataset with information about personal and employment details, explained in more:

Some important columns:¶

Attrition: Whether employees are still with the company or whether they’ve gone to work somewhere else.
Age: 18 to 60 years old
Gender: Female or Male
Department: Research & Development, Sales, Human Resources.
BusinessTravel: Travel_Rarely, Travel_Frequently, Non-Travel.
DistanceFromHome: Distance between the company and their home in miles.
MonthlyIncome: Employees' numeric monthly income.
MaritalStatus: Married, Single, Divorced.
Education: 1 'Below College' 2 'College' 3 'Bachelor' 4 'Master' 5 'Doctor'.
EducationField: Life Sciences， Medical， Marketing，Technical Degree，Other.
EnvironmentSatisfaction: 1 'Low' 2 'Medium' 3 'High' 4 'Very High'.
RelationshipSatisfaction: 1 'Low' 2 'Medium' 3 'High' 4 'Very High'.
JobInvolvement: 1 'Low' 2 'Medium' 3 'High' 4 'Very High'.
JobRole: Sales Executive，Research Science, Laboratory Tec, Manufacturing, Healthcare Rep, etc
JobSatisfaction: 1 'Low' 2 'Medium' 3 'High' 4 'Very High'.
OverTime: Whether they work overtime or not.
NumCompaniesWorked: Number of companies they worked for before joinging IBM.
PerformanceRating: 1 'Low' 2 'Good' 3 'Excellent' 4 'Outstanding'.
YearsAtCompany: Years they worked for IBM.
WorkLifeBalance: 1 'Bad' 2 'Good' 3 'Better' 4 'Best'.
YearsSinceLastPromotion: Years passed since their last promotion.

2. Data Preparation¶

df_employee = pd.read_csv('WA_Fn-UseC_-HR-Employee-Attrition.csv')

After loading the dataset into a dataframe, using the command under we can get a good understadning how dataset is put together. Some of functions used is listed under.

df_employee.head() 
df_employee.columns
df_employee.decribe()
df_employee.shape()
df_employee.info()

From the those commands we can see that the dataset contains no missing values. It is several numerical and categorical variables. From a HR prepective,these type of data about empoyees is unlikely to feature huge amount of missing data

#Taking a look at the data set
df_employee.head()

df_employee.columns

Index(['Age', 'Attrition', 'BusinessTravel', 'DailyRate', 'Department',
       'DistanceFromHome', 'Education', 'EducationField', 'EmployeeCount',
       'EmployeeNumber', 'EnvironmentSatisfaction', 'Gender', 'HourlyRate',
       'JobInvolvement', 'JobLevel', 'JobRole', 'JobSatisfaction',
       'MaritalStatus', 'MonthlyIncome', 'MonthlyRate', 'NumCompaniesWorked',
       'Over18', 'OverTime', 'PercentSalaryHike', 'PerformanceRating',
       'RelationshipSatisfaction', 'StandardHours', 'StockOptionLevel',
       'TotalWorkingYears', 'TrainingTimesLastYear', 'WorkLifeBalance',
       'YearsAtCompany', 'YearsInCurrentRole', 'YearsSinceLastPromotion',
       'YearsWithCurrManager'],
      dtype='object')

df_employee.shape

(1470, 35)

df_employee.describe()

df_employee.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1470 entries, 0 to 1469
Data columns (total 35 columns):
 #   Column                    Non-Null Count  Dtype 
---  ------                    --------------  ----- 
 0   Age                       1470 non-null   int64 
 1   Attrition                 1470 non-null   object
 2   BusinessTravel            1470 non-null   object
 3   DailyRate                 1470 non-null   int64 
 4   Department                1470 non-null   object
 5   DistanceFromHome          1470 non-null   int64 
 6   Education                 1470 non-null   int64 
 7   EducationField            1470 non-null   object
 8   EmployeeCount             1470 non-null   int64 
 9   EmployeeNumber            1470 non-null   int64 
 10  EnvironmentSatisfaction   1470 non-null   int64 
 11  Gender                    1470 non-null   object
 12  HourlyRate                1470 non-null   int64 
 13  JobInvolvement            1470 non-null   int64 
 14  JobLevel                  1470 non-null   int64 
 15  JobRole                   1470 non-null   object
 16  JobSatisfaction           1470 non-null   int64 
 17  MaritalStatus             1470 non-null   object
 18  MonthlyIncome             1470 non-null   int64 
 19  MonthlyRate               1470 non-null   int64 
 20  NumCompaniesWorked        1470 non-null   int64 
 21  Over18                    1470 non-null   object
 22  OverTime                  1470 non-null   object
 23  PercentSalaryHike         1470 non-null   int64 
 24  PerformanceRating         1470 non-null   int64 
 25  RelationshipSatisfaction  1470 non-null   int64 
 26  StandardHours             1470 non-null   int64 
 27  StockOptionLevel          1470 non-null   int64 
 28  TotalWorkingYears         1470 non-null   int64 
 29  TrainingTimesLastYear     1470 non-null   int64 
 30  WorkLifeBalance           1470 non-null   int64 
 31  YearsAtCompany            1470 non-null   int64 
 32  YearsInCurrentRole        1470 non-null   int64 
 33  YearsSinceLastPromotion   1470 non-null   int64 
 34  YearsWithCurrManager      1470 non-null   int64 
dtypes: int64(26), object(9)
memory usage: 402.1+ KB

df_employee.isnull().sum()

Age                         0
Attrition                   0
BusinessTravel              0
DailyRate                   0
Department                  0
DistanceFromHome            0
Education                   0
EducationField              0
EmployeeCount               0
EmployeeNumber              0
EnvironmentSatisfaction     0
Gender                      0
HourlyRate                  0
JobInvolvement              0
JobLevel                    0
JobRole                     0
JobSatisfaction             0
MaritalStatus               0
MonthlyIncome               0
MonthlyRate                 0
NumCompaniesWorked          0
Over18                      0
OverTime                    0
PercentSalaryHike           0
PerformanceRating           0
RelationshipSatisfaction    0
StandardHours               0
StockOptionLevel            0
TotalWorkingYears           0
TrainingTimesLastYear       0
WorkLifeBalance             0
YearsAtCompany              0
YearsInCurrentRole          0
YearsSinceLastPromotion     0
YearsWithCurrManager        0
dtype: int64

Now is time to check if all variables will give some useful insights or some of them could be deleted. To check this it is possible to loop thought and check if unique value is 1, and them drop the columns.

#this fuction is not test out yet. but will be 
notneeded = []
for col in df_employee.columns:
    if len(df_employee[col].unique()) == 1:
        notneeded.append(col)
        df_employee.drop(col,inplace=True,axis=1)

print(notneeded)

['EmployeeCount', 'Over18', 'StandardHours']

df_employee.drop(['EmployeeNumber'], axis = 1, inplace = True)

print(df_employee.shape)
df_employee.head()

(1470, 31)

After running this loop, the columns dropped where EmployeeCount, Over18 and StandardHours Also the EmployeeNumber is just a number increaseing so that column is dropped. The Next step would be to look at how the different variables are correlated.

f, ax = plt.subplots(figsize=(20, 20))
corr = df_employee.corr()
sns.heatmap(corr, mask=np.zeros_like(corr, dtype=np.bool), cmap=sns.diverging_palette(220, 10, as_cmap=True),
            square=True, ax=ax, annot = True)

<AxesSubplot:>

Conclutions from Data preparing:¶

Several numerical and categorical columns with information about employee's personal and employment details
There are no missing values in this dataset, hence HR do not tend to have huge amout of missing data
Dropped the not useful cloumns
Important correlations:
- Age - TotalworkingYears
- Age - JobLevel
- JobLevel - MonthlyIncome
- JobLevel - TotalWorkingYears
- YearsAtCompany -YearsInCurrentRole
- MonthlyIncome - TotalWorkingYears

3. EDA (Exploratory Data Analysis)¶

3.1 General feature statistics¶

Starting the EDA of with some histograms of the for numerical features.

df_hrs = df_employee.copy()
df_hr_cat_name = df_employee.copy()
df_Anumber = df_employee.copy()

df_employee.hist(figsize=(20,20))
plt.show()

What we can see from the histograms is that many of them is right skwed
Age distribution is more towards the younger genration between 25 and 45 years old
A lot of people are in the compnay less than 10 years

Attrtion Rate¶

plt.figure(figsize=(20,5))
plt.subplot(1,2,1)
df_employee['Attrition'].value_counts().plot.pie(autopct='%1.1f%%')
plt.subplot(1,2,2)
sns.countplot(df_employee['Attrition'])
df_employee['Attrition'].value_counts().to_frame()

From the attrition rate graphs we can see that the majority is still there. Important mention, piechart and bar chart interpert the colors differently.

Travelling¶

plt.figure(figsize=(20,5))
plt.subplot(1,2,1)
df_employee['BusinessTravel'].value_counts().plot.pie(autopct='%1.1f%%')
plt.subplot(1,2,2)
sns.countplot(df_employee['BusinessTravel'])
#df_employee['BusinessTravel'].value_counts().to_frame()
print(df_employee.groupby('BusinessTravel')['Attrition'].value_counts())
#df_employee.groupby('BusinessTravel')['Attrition'].value_counts
#plt.subplot(1,3,3)
#df_employee.groupby('Attrition')['BusinessTravel'].value_counts(df_employee.Attrition.all()).plot.pie(autopct='%1.1f%%',figsize=(11,6))
#print(df_employee.groupby(['BusinessTravel','Gender'])['Attrition'].value_counts(100. * df_employee.Attrition.value_counts() / len(df_employee.Attrition)))

BusinessTravel     Attrition
Non-Travel         No           138
                   Yes           12
Travel_Frequently  No           208
                   Yes           69
Travel_Rarely      No           887
                   Yes          156
Name: Attrition, dtype: int64

The business travels have a clear amount that travel rarely, the person travel frequently or do not travel is small

OverTime¶

plt.figure(figsize=(20,5))
plt.subplot(1,2,1)
df_employee['OverTime'].value_counts().plot.pie(autopct='%1.1f%%')
plt.subplot(1,2,2)
sns.countplot(df_employee['OverTime'])
#df_employee['OverTime'].value_counts().to_frame()
df_employee.groupby('OverTime')['Attrition'].value_counts()

OverTime  Attrition
No        No           944
          Yes          110
Yes       No           289
          Yes          127
Name: Attrition, dtype: int64

1/3 of the people tend to have overtime

Department¶

plt.figure(figsize=(20,5))
plt.subplot(1,2,1)
df_employee['Department'].value_counts().plot.pie(autopct='%1.1f%%')
plt.subplot(1,2,2)
sns.countplot(df_employee['Department'])
#print(df_employee.groupby('Department')['Attrition'].value_counts())
df_employee.groupby('Department')['Attrition'].value_counts()
#pd.pivot_table(df_employee, values = 'Department', index='Attrition').reset_index()
#sns.countplot(df_employee.groupby('Department',)['Attrition'])
#df_employee['Department'].value_counts().to_frame()

Department              Attrition
Human Resources         No            51
                        Yes           12
Research & Development  No           828
                        Yes          133
Sales                   No           354
                        Yes           92
Name: Attrition, dtype: int64

The majority of the people are part of the research & Development department with 65%, also sales department are big with 30%

Education Level¶

#need to change the value to the column to get a better understanding of what the graph says 
df_employee.Education.replace({1: 'High School', 2:'College', 3:'Bachelor', 4:'Master', 5:'Doctorate'},inplace=True)
df_hrs.Education.replace({1: 'High School', 2:'College', 3:'Bachelor', 4:'Master', 5:'Doctorate'},inplace=True)
#df_employee.Education.replace({'High School':1, 'Undergrad':2,'Graduate':3, 'Post Graduate':4, 'Doctorate':5},inplace=True)

plt.figure(figsize=(20,5))
plt.subplot(1,2,1)
df_employee['Education'].value_counts().plot.pie(autopct='%1.1f%%')
plt.subplot(1,2,2)
sns.countplot(df_employee['Education'])
plt.xticks(rotation=45)
#df_employee['Education'].value_counts().to_frame()
df_employee.groupby('Education')['Attrition'].value_counts()

Education    Attrition
Bachelor     No           473
             Yes           99
College      No           238
             Yes           44
Doctorate    No            43
             Yes            5
High School  No           139
             Yes           31
Master       No           340
             Yes           58
Name: Attrition, dtype: int64

in the education level is a clear for persons having bacheolor andmaster degree

Education Field¶

plt.figure(figsize=(20,5))
plt.subplot(1,2,1)
df_employee['EducationField'].value_counts().plot.pie(autopct='%1.1f%%')
plt.subplot(1,2,2)
sns.countplot(df_employee['EducationField'])
plt.xticks(rotation=45)
#df_employee['EducationField'].value_counts().to_frame()
print(df_employee.groupby('EducationField')['Attrition'].value_counts().to_frame())

                            Attrition
EducationField   Attrition           
Human Resources  No                20
                 Yes                7
Life Sciences    No               517
                 Yes               89
Marketing        No               124
                 Yes               35
Medical          No               401
                 Yes               63
Other            No                71
                 Yes               11
Technical Degree No               100
                 Yes               32

There are two field that are dominant both medical and life science is

JobrRole¶

plt.figure(figsize=(20,10))
sns.catplot(y='JobRole', kind='count', aspect=4, data=df_employee)
print(df_employee['JobRole'].value_counts())
print(df_employee.groupby('JobRole')['Attrition'].value_counts())

Sales Executive              326
Research Scientist           292
Laboratory Technician        259
Manufacturing Director       145
Healthcare Representative    131
Manager                      102
Sales Representative          83
Research Director             80
Human Resources               52
Name: JobRole, dtype: int64
JobRole                    Attrition
Healthcare Representative  No           122
                           Yes            9
Human Resources            No            40
                           Yes           12
Laboratory Technician      No           197
                           Yes           62
Manager                    No            97
                           Yes            5
Manufacturing Director     No           135
                           Yes           10
Research Director          No            78
                           Yes            2
Research Scientist         No           245
                           Yes           47
Sales Executive            No           269
                           Yes           57
Sales Representative       No            50
                           Yes           33
Name: Attrition, dtype: int64

<Figure size 1440x720 with 0 Axes>

We can see the divition of different job roles, there are most sales excutive and least working in human resources.

# Changing numeric values to corresponding categorical values
df_employee['EnvironmentSatisfaction'] = df_employee['EnvironmentSatisfaction'].map({1: 'Low', 2: 'Medium', 3: 'High', 4: 'Very High'})
df_employee['JobInvolvement'] = df_employee['JobInvolvement'].map({1: 'Low', 2: 'Medium', 3: 'High', 4: 'Very High'})
df_employee['JobSatisfaction'] = df_employee['JobSatisfaction'].map({1: 'Low', 2: 'Medium', 3: 'High', 4: 'Very High'})
df_employee['RelationshipSatisfaction'] = df_employee['RelationshipSatisfaction'].map({1: 'Low', 2: 'Medium', 3: 'High', 4: 'Very High'})
df_employee['PerformanceRating'] = df_employee['PerformanceRating'].map({1: 'Low', 2: 'Good', 3: 'Excellent', 4: 'Outstanding'})
df_employee['WorkLifeBalance'] = df_employee['WorkLifeBalance'].map({1: 'Bad', 2: 'Good', 3: 'Better', 4: 'Best'})

plt.figure(figsize=(18,8))
plt.subplot(2,3,1)
sns.countplot(df_employee['EnvironmentSatisfaction'])
plt.subplot(2,3,2)
sns.countplot(df_employee['JobInvolvement'])
plt.subplot(2,3,3)
sns.countplot(df_employee['JobSatisfaction'])
plt.subplot(2,3,4)
sns.countplot(df_employee['RelationshipSatisfaction'])
plt.subplot(2,3,5)
sns.countplot(df_employee['PerformanceRating'])
plt.subplot(2,3,6)
sns.countplot(df_employee['WorkLifeBalance'])

<AxesSubplot:xlabel='WorkLifeBalance', ylabel='count'>

EnvironmentSatisfaction: 1 'Low' 2 'Medium' 3 'High' 4 'Very High'.
RelationshipSatisfaction: 1 'Low' 2 'Medium' 3 'High' 4 'Very High'.
JobInvolvement: 1 'Low' 2 'Medium' 3 'High' 4 'Very High'.
JobSatisfaction: 1 'Low' 2 'Medium' 3 'High' 4 'Very High'.
PerformanceRating: 1 'Low' 2 'Good' 3 'Excellent' 4 'Outstanding'.
WorkLifeBalance: 1 'Bad' 2 'Good' 3 'Better' 4 'Best'.

Overall is the metrics, mostly from maybe surveys show that people are happy and scoring high on the metrics.

Wonder why the perfomance rating is only 3 and 4. More people are overall more satified with their situation.

We can see the divition of different job roles, there are most sales excutive and least working in human resources.

3.2 Attrition effected by Age and MaritalStatus?¶

1. Hypothesis: Single people tend to leave more often than married people?

2. Hypothesis: Male are more active leavers?

plt.figure(figsize=(20,5))
plt.subplot(1,2,1)
sns.violinplot(x ="Gender", y ="Age", hue ="Attrition",
data = df_hrs, split = True)
plt.subplot(1,2,2)
sns.violinplot(x ="Gender", y ="Age", hue ="OverTime",
data = df_hrs, split = True)
plt.xticks(rotation=45)
#df_employee['OverTime'].value_counts()

(array([0, 1]), [Text(0, 0, 'Female'), Text(1, 0, 'Male')])

plt.subplots(figsize=(20,5))
sns.histplot(data=df_employee, x="Age", hue="Attrition", multiple="stack",bins=5,kde=True)
#df_employee['Gender'].value_counts(normalize=True)
df_employee.groupby('Gender')['Attrition'].value_counts()

Gender  Attrition
Female  No           501
        Yes           87
Male    No           732
        Yes          150
Name: Attrition, dtype: int64

plt.subplots(figsize=(20,5))
sns.histplot(data=df_employee, x="Age", hue="MaritalStatus", multiple="stack",bins=5)
print(df_employee['MaritalStatus'].value_counts(normalize=True))
print(df_employee.groupby(['MaritalStatus','Gender'])['Attrition'].value_counts())

Married     0.457823
Single      0.319728
Divorced    0.222449
Name: MaritalStatus, dtype: float64
MaritalStatus  Gender  Attrition
Divorced       Female  No           108
                       Yes            9
               Male    No           186
                       Yes           24
Married        Female  No           241
                       Yes           31
               Male    No           348
                       Yes           53
Single         Female  No           152
                       Yes           47
               Male    No           198
                       Yes           73
Name: Attrition, dtype: int64

sns.histplot(data=df_employee, x="MaritalStatus", hue="Gender", multiple="stack",bins=5)

<AxesSubplot:xlabel='MaritalStatus', ylabel='Count'>

From the part 3.2 we can see:

that there are more men (150) leaving than women (87): Hypothesis 2 confirmed
45% Married Workers, 22% Singles and 31% Divorced.
More single leave more Married: Hypothesis 1 confirmed
Divorced women are stayers only 9 people leaving.

df_Anumber['Attrition'] = df_Anumber['Attrition'].map({'Yes': 1, 'No': 0})
#df_hr_cat_name['Attrition'] = df_hr_cat_name['Attrition'].map({'Yes': 1, 'No': 0})

3.3 Education, EducationField, JobRole and JobLevel¶

In this Chapther we are taking a look at the different JobRoles and Joblevels and connect that to Education and Education Field. And see who are more likely to leave 3. Hypothesis: Workers in lower JobLevel are morelikely to leave

4. Hypothesis: Lower Education get more training

plt.subplots(figsize=(18,5))
sns.countplot(df_employee.JobRole, hue=df_employee.JobLevel)
plt.xticks(rotation=10)

(array([0, 1, 2, 3, 4, 5, 6, 7, 8]),
 [Text(0, 0, 'Sales Executive'),
  Text(1, 0, 'Research Scientist'),
  Text(2, 0, 'Laboratory Technician'),
  Text(3, 0, 'Manufacturing Director'),
  Text(4, 0, 'Healthcare Representative'),
  Text(5, 0, 'Manager'),
  Text(6, 0, 'Sales Representative'),
  Text(7, 0, 'Research Director'),
  Text(8, 0, 'Human Resources')])

plt.subplots(figsize=(20,5))
sns.histplot(data=df_employee, x="JobLevel" ,hue="JobRole", multiple="stack")
print(df_employee.groupby('JobLevel',)['JobRole'].value_counts(normalize=True))
print(df_employee.groupby(['JobLevel', 'JobRole'])['Attrition'].value_counts().sort_index())

JobLevel  JobRole                  
1         Research Scientist           0.430939
          Laboratory Technician        0.368324
          Sales Representative         0.139963
          Human Resources              0.060773
2         Sales Executive              0.436330
          Manufacturing Director       0.168539
          Healthcare Representative    0.146067
          Research Scientist           0.106742
          Laboratory Technician        0.104869
          Human Resources              0.024345
          Sales Representative         0.013109
3         Sales Executive              0.362385
          Manufacturing Director       0.206422
          Healthcare Representative    0.201835
          Research Director            0.128440
          Manager                      0.055046
          Human Resources              0.027523
          Laboratory Technician        0.013761
          Research Scientist           0.004587
4         Manager                      0.443396
          Research Director            0.245283
          Sales Executive              0.132075
          Manufacturing Director       0.094340
          Healthcare Representative    0.084906
5         Manager                      0.623188
          Research Director            0.376812
Name: JobRole, dtype: float64
JobLevel  JobRole                    Attrition
1         Human Resources            No            23
                                     Yes           10
          Laboratory Technician      No           144
                                     Yes           56
          Research Scientist         No           189
                                     Yes           45
          Sales Representative       No            44
                                     Yes           32
2         Healthcare Representative  No            75
                                     Yes            3
          Human Resources            No            13
          Laboratory Technician      No            51
                                     Yes            5
          Manufacturing Director     No            85
                                     Yes            5
          Research Scientist         No            55
                                     Yes            2
          Sales Executive            No           197
                                     Yes           36
          Sales Representative       No             6
                                     Yes            1
3         Healthcare Representative  No            39
                                     Yes            5
          Human Resources            No             4
                                     Yes            2
          Laboratory Technician      No             2
                                     Yes            1
          Manager                    No            10
                                     Yes            2
          Manufacturing Director     No            40
                                     Yes            5
          Research Director          No            28
          Research Scientist         No             1
          Sales Executive            No            62
                                     Yes           17
4         Healthcare Representative  No             8
                                     Yes            1
          Manager                    No            47
          Manufacturing Director     No            10
          Research Director          No            26
          Sales Executive            No            10
                                     Yes            4
5         Manager                    No            40
                                     Yes            3
          Research Director          No            24
                                     Yes            2
Name: Attrition, dtype: int64

From JobLevel and JobRole we can see that:

There are only Managers and Research Directors at level 5
The majority of people in Level 1 are only are low level workers
Only 2 roles Total Nr: 69, Roles: 62% Manager, 38% Resea

plt.subplots(figsize=(20,5))
sns.countplot(df_employee.JobRole, hue=df_employee.EducationField)
plt.xticks(rotation=10)

(array([0, 1, 2, 3, 4, 5, 6, 7, 8]),
 [Text(0, 0, 'Sales Executive'),
  Text(1, 0, 'Research Scientist'),
  Text(2, 0, 'Laboratory Technician'),
  Text(3, 0, 'Manufacturing Director'),
  Text(4, 0, 'Healthcare Representative'),
  Text(5, 0, 'Manager'),
  Text(6, 0, 'Sales Representative'),
  Text(7, 0, 'Research Director'),
  Text(8, 0, 'Human Resources')])

The dominant Education field are Life science and medical. for sales executives marking are important

plt.subplots(figsize=(20,5))
sns.countplot(df_employee.JobRole, hue=df_employee.Education)
plt.xticks(rotation=10)

(array([0, 1, 2, 3, 4, 5, 6, 7, 8]),
 [Text(0, 0, 'Sales Executive'),
  Text(1, 0, 'Research Scientist'),
  Text(2, 0, 'Laboratory Technician'),
  Text(3, 0, 'Manufacturing Director'),
  Text(4, 0, 'Healthcare Representative'),
  Text(5, 0, 'Manager'),
  Text(6, 0, 'Sales Representative'),
  Text(7, 0, 'Research Director'),
  Text(8, 0, 'Human Resources')])

more people have bachelors ad masters degree

plt.subplots(figsize=(20,5))
sns.countplot(df_employee.EducationField, hue=df_employee.Education)
plt.xticks(rotation=10)

(array([0, 1, 2, 3, 4, 5]),
 [Text(0, 0, 'Life Sciences'),
  Text(1, 0, 'Other'),
  Text(2, 0, 'Medical'),
  Text(3, 0, 'Marketing'),
  Text(4, 0, 'Technical Degree'),
  Text(5, 0, 'Human Resources')])

Moving over to Traing for the employees

print(df_employee['TrainingTimesLastYear'].value_counts(['JobRole']))
print(df_employee.groupby(['TrainingTimesLastYear'])['Attrition'].value_counts(normalize=True).sort_index())
print(df_employee['TrainingTimesLastYear'].value_counts(['Attrition']))
plt.subplots(figsize=(20,5))
sns.countplot(df_employee.JobRole, hue=df_employee.TrainingTimesLastYear)
#print(df_employee['TrainingTimesLastYear'].value_counts(['JobRole']))

2    0.372109
3    0.334014
4    0.083673
5    0.080952
1    0.048299
6    0.044218
0    0.036735
Name: TrainingTimesLastYear, dtype: float64
TrainingTimesLastYear  Attrition
0                      No           0.722222
                       Yes          0.277778
1                      No           0.873239
                       Yes          0.126761
2                      No           0.820841
                       Yes          0.179159
3                      No           0.859470
                       Yes          0.140530
4                      No           0.788618
                       Yes          0.211382
5                      No           0.882353
                       Yes          0.117647
6                      No           0.907692
                       Yes          0.092308
Name: Attrition, dtype: float64
2    0.372109
3    0.334014
4    0.083673
5    0.080952
1    0.048299
6    0.044218
0    0.036735
Name: TrainingTimesLastYear, dtype: float64

<AxesSubplot:xlabel='JobRole', ylabel='count'>

plt.subplots(figsize=(20,5))
sns.countplot(df_employee.JobLevel, hue=df_employee.TrainingTimesLastYear)

<AxesSubplot:xlabel='JobLevel', ylabel='count'>

#df_employee.groupby(["JobRole"]).count().sort_values(["TrainingTimesLastYear"]==0)
#df1 = df_employee.melt(var_name='JobRole', value_name='TrainingTimesLastYear')
#df1
#df1 = df.melt(var_name='columns', value_name='index')
#df.apply(lambda x: x.value_counts())
print(df_employee['TrainingTimesLastYear'].value_counts()[0])
#df2
print(df_employee.JobRole[df_employee.TrainingTimesLastYear == 0].count())
#print(df_employee.groupby('JobLevel')['TrainingTimesLastYear'].value_counts()[0])

54
54

Training the employees have avarage on 2,3 times per year, indepentent on the JobLevel and JobRole

def highlight_max(s):
    is_max = s == s.max()
    return ['background-color: lightgreen' if v else '' for v in is_max]

#df.style.apply(highlight_max)

def highlight_min(s):
    is_min = s == s.min()
    return ['background-color: lightblue' if v else '' for v in is_min]

#df.style.apply(highlight_max)

TrainAtWork = df_hrs.groupby(['JobRole','Attrition'],as_index=False)[['JobInvolvement','JobSatisfaction','TrainingTimesLastYear','YearsInCurrentRole']].mean().sort_values(by=['TrainingTimesLastYear','JobSatisfaction'])
TrainAtWork.style.apply(highlight_max).apply(highlight_min)

pd.pivot_table(TrainAtWork, values = 'TrainingTimesLastYear', index='Attrition', columns = 'JobRole').reset_index()

plt.subplots(figsize=(20,5))
sns.histplot(data=df_employee, x="JobRole", hue="Attrition", multiple="stack")

<AxesSubplot:xlabel='JobRole', ylabel='Count'>

print(df_employee.groupby('Attrition')['JobRole'].value_counts(normalize=True).sort_index())
print(df_employee.groupby('Attrition')['JobLevel'].value_counts(normalize=True).sort_index())
print(df_employee.groupby(['Attrition','JobLevel'])['Education'].value_counts().sort_index().sort_values())
print(df_employee.groupby('Attrition')['EducationField'].value_counts(normalize=True).sort_index())

Attrition  JobRole                  
No         Healthcare Representative    0.098946
           Human Resources              0.032441
           Laboratory Technician        0.159773
           Manager                      0.078670
           Manufacturing Director       0.109489
           Research Director            0.063260
           Research Scientist           0.198702
           Sales Executive              0.218167
           Sales Representative         0.040552
Yes        Healthcare Representative    0.037975
           Human Resources              0.050633
           Laboratory Technician        0.261603
           Manager                      0.021097
           Manufacturing Director       0.042194
           Research Director            0.008439
           Research Scientist           0.198312
           Sales Executive              0.240506
           Sales Representative         0.139241
Name: JobRole, dtype: float64
Attrition  JobLevel
No         1           0.324412
           2           0.390916
           3           0.150852
           4           0.081914
           5           0.051906
Yes        1           0.603376
           2           0.219409
           3           0.135021
           4           0.021097
           5           0.021097
Name: JobLevel, dtype: float64
Attrition  JobLevel  Education  
Yes        5         Master           1
           4         Master           1
           2         Doctorate        1
           5         High School      1
           3         Doctorate        2
           1         Doctorate        2
No         5         Doctorate        2
Yes        2         High School      3
           5         Bachelor         3
           4         Bachelor         4
           3         College          4
                     High School      4
No         5         High School      5
           1         Doctorate        6
           3         Doctorate        7
           4         High School      8
Yes        3         Master           9
No         4         Doctorate        9
           5         College         13
Yes        3         Bachelor        13
           2         Bachelor        14
                     College         15
No         3         High School     16
           4         College         17
Yes        2         Master          19
No         5         Master          19
           2         Doctorate       19
Yes        1         High School     23
No         5         Bachelor        25
Yes        1         College         25
No         4         Master          27
Yes        1         Master          28
No         3         College         29
           4         Bachelor        40
           2         High School     44
           3         Master          49
Yes        1         Bachelor        65
No         1         High School     66
                     College         69
           3         Bachelor        85
           1         Master          93
           2         College        110
                     Master         152
                     Bachelor       157
           1         Bachelor       166
Name: Education, dtype: int64
Attrition  EducationField  
No         Human Resources     0.016221
           Life Sciences       0.419303
           Marketing           0.100568
           Medical             0.325223
           Other               0.057583
           Technical Degree    0.081103
Yes        Human Resources     0.029536
           Life Sciences       0.375527
           Marketing           0.147679
           Medical             0.265823
           Other               0.046414
           Technical Degree    0.135021
Name: EducationField, dtype: float64

plt.subplots(figsize=(20,5))
sns.countplot(df_employee.Attrition, hue=df_employee.JobRole)
df_employee.groupby('Attrition')['JobRole'].value_counts(normalize=True).sort_index()

Attrition  JobRole                  
No         Healthcare Representative    0.098946
           Human Resources              0.032441
           Laboratory Technician        0.159773
           Manager                      0.078670
           Manufacturing Director       0.109489
           Research Director            0.063260
           Research Scientist           0.198702
           Sales Executive              0.218167
           Sales Representative         0.040552
Yes        Healthcare Representative    0.037975
           Human Resources              0.050633
           Laboratory Technician        0.261603
           Manager                      0.021097
           Manufacturing Director       0.042194
           Research Director            0.008439
           Research Scientist           0.198312
           Sales Executive              0.240506
           Sales Representative         0.139241
Name: JobRole, dtype: float64

plt.subplots(figsize=(20,5))
sns.countplot(df_employee.Attrition, hue=df_employee.Education)

<AxesSubplot:xlabel='Attrition', ylabel='count'>

#plt.subplots(figsize=(20,5))
#sns.countplot(df_employee.Attrition, hue=df_employee.EducationField)

#plt.subplots(figsize=(20,5))
#sns.countplot(df_employee.MaritalStatus, hue=df_employee.JobRole)

3.4 JobSatisfaction and Commitment¶

plt.subplots(figsize=(20,5))
sns.countplot(df_employee.Attrition, hue=df_employee.JobInvolvement)

<AxesSubplot:xlabel='Attrition', ylabel='count'>

plt.subplots(figsize=(20,5))
sns.histplot(data=df_employee, x="JobRole", hue="JobInvolvement", multiple="stack",bins=5, kde=True)

<AxesSubplot:xlabel='JobRole', ylabel='Count'>

df_employee.groupby('JobLevel')['Age'].value_counts(bins=6).sort_index()

JobLevel                            
1         (17.958, 24.833]               87
          (24.833, 31.667]              192
          (31.667, 38.5]                157
          (38.5, 45.333]                 68
          (45.333, 52.167]               24
          (52.167, 59.0]                 15
2         (21.961, 28.333]               65
          (28.333, 34.667]              172
          (34.667, 41.0]                174
          (41.0, 47.333]                 75
          (47.333, 53.667]               28
          (53.667, 60.0]                 20
3         (26.965999999999998, 32.5]     40
          (32.5, 38.0]                   72
          (38.0, 43.5]                   34
          (43.5, 49.0]                   38
          (49.0, 54.5]                   22
          (54.5, 60.0]                   12
4         (28.968999999999998, 34.0]      5
          (34.0, 39.0]                    6
          (39.0, 44.0]                   23
          (44.0, 49.0]                   25
          (49.0, 54.0]                   31
          (54.0, 59.0]                   16
5         (38.978, 42.5]                 18
          (42.5, 46.0]                   14
          (46.0, 49.5]                    8
          (49.5, 53.0]                   15
          (53.0, 56.5]                   10
          (56.5, 60.0]                    4
Name: Age, dtype: int64

workForce =df_hrs.groupby(['JobRole','Attrition'], as_index=False)[['PerformanceRating','JobSatisfaction','EnvironmentSatisfaction','WorkLifeBalance']].mean().sort_values(by=['JobRole'])
workForce.style.apply(highlight_max).apply(highlight_min)

workForceD =df_hrs.groupby(['Department','Attrition'], as_index=False)[['PerformanceRating','JobSatisfaction','EnvironmentSatisfaction','WorkLifeBalance']].mean().sort_values(by=['Department'])
workForceD.style.apply(highlight_max).apply(highlight_min)

workForceG =df_hrs.groupby(['Gender','Attrition'], as_index=False)[['PerformanceRating','JobSatisfaction','EnvironmentSatisfaction','WorkLifeBalance']].mean().sort_values(by=['Gender'])
workForceG.style.apply(highlight_max).apply(highlight_min)

workForceE =df_hrs.groupby(['Education','Attrition'], as_index=False)[['PerformanceRating','JobSatisfaction','EnvironmentSatisfaction','WorkLifeBalance']].mean().sort_values(by=['Education'])
workForceE.style.apply(highlight_max).apply(highlight_min)

#workForcelevel =df_hrs.groupby(['JobLevel','Education','Attrition'], as_index=False)[['PerformanceRating','JobSatisfaction','EnvironmentSatisfaction','WorkLifeBalance']].mean().sort_values(by=['JobLevel'])
#workForcelevel

3.5 Enpowerment at Work¶

enpowerments =df_hrs.groupby(['JobRole','Attrition'], as_index=False)[['PerformanceRating','StockOptionLevel','JobInvolvement','MonthlyIncome','YearsSinceLastPromotion','YearsAtCompany']].mean().sort_values(by=['JobRole'])
enpowerments.style.apply(highlight_max).apply(highlight_min)

enpowermentsD =df_hrs.groupby(['Department','Attrition'], as_index=False)[['PerformanceRating','JobInvolvement','MonthlyIncome','YearsSinceLastPromotion','YearsAtCompany']].mean().sort_values(by=['Department'])
enpowermentsD.style.apply(highlight_max).apply(highlight_min)

enpowermentsG =df_hrs.groupby(['Gender','Attrition'], as_index=False)[['PerformanceRating','JobInvolvement','MonthlyIncome','YearsSinceLastPromotion','YearsAtCompany']].mean().sort_values(by=['Gender'])
enpowermentsG.style.apply(highlight_max).apply(highlight_min)

plt.subplots(figsize=(20,5))
sns.countplot(df_employee.JobRole, hue=df_employee.StockOptionLevel)
print(df_employee['JobRole'].value_counts(['StockOptionLevel']))
print(df_employee.groupby('StockOptionLevel')['JobRole'].value_counts().sort_index(ascending=True))

Sales Executive              0.221769
Research Scientist           0.198639
Laboratory Technician        0.176190
Manufacturing Director       0.098639
Healthcare Representative    0.089116
Manager                      0.069388
Sales Representative         0.056463
Research Director            0.054422
Human Resources              0.035374
Name: JobRole, dtype: float64
StockOptionLevel  JobRole                  
0                 Healthcare Representative     51
                  Human Resources               24
                  Laboratory Technician        108
                  Manager                       37
                  Manufacturing Director        59
                  Research Director             33
                  Research Scientist           136
                  Sales Executive              139
                  Sales Representative          44
1                 Healthcare Representative     59
                  Human Resources               21
                  Laboratory Technician        110
                  Manager                       57
                  Manufacturing Director        60
                  Research Director             31
                  Research Scientist           104
                  Sales Executive              125
                  Sales Representative          29
2                 Healthcare Representative     13
                  Human Resources                3
                  Laboratory Technician         21
                  Manager                        5
                  Manufacturing Director        20
                  Research Director             11
                  Research Scientist            34
                  Sales Executive               44
                  Sales Representative           7
3                 Healthcare Representative      8
                  Human Resources                4
                  Laboratory Technician         20
                  Manager                        3
                  Manufacturing Director         6
                  Research Director              5
                  Research Scientist            18
                  Sales Executive               18
                  Sales Representative           3
Name: JobRole, dtype: int64

pd.pivot_table(df_employee, values = 'StockOptionLevel', index='Attrition', columns = 'JobRole').reset_index()

plt.subplots(figsize=(20,5))
sns.countplot(df_employee.Attrition, hue=df_employee.StockOptionLevel)
df_employee.groupby('Attrition',)['StockOptionLevel'].value_counts(normalize=True).sort_index()

Attrition  StockOptionLevel
No         0                   0.386861
           1                   0.437956
           2                   0.118410
           3                   0.056772
Yes        0                   0.649789
           1                   0.236287
           2                   0.050633
           3                   0.063291
Name: StockOptionLevel, dtype: float64

Have you reached Stock OptionLevel 2 then the chance for attirion is much lower

df_hrs[df_hrs.JobLevel ==3].groupby('Education', as_index=False)[['JobSatisfaction']].mean().sort_values(by=['JobSatisfaction'])

#need to change the value to the column to get a better understanding of what the graph says 
#df_employee.Education.replace({1: 'High School', 2:'Undergrad', 3:'Graduate', 4:'Post Graduate', 5:'Doctorate'},inplace=True)
sns.lineplot(x = 'JobLevel', y = 'Attrition', data=df_employee, hue='Education')

<AxesSubplot:xlabel='JobLevel', ylabel='Attrition'>

df_employee.groupby('StockOptionLevel')['Age'].value_counts(bins=6).sort_index()

StockOptionLevel                            
0                 (17.956999999999997, 25.0]     74
                  (25.0, 32.0]                  169
                  (32.0, 39.0]                  187
                  (39.0, 46.0]                  102
                  (46.0, 53.0]                   66
                  (53.0, 60.0]                   33
1                 (21.961, 28.333]               84
                  (28.333, 34.667]              145
                  (34.667, 41.0]                169
                  (41.0, 47.333]                101
                  (47.333, 53.667]               53
                  (53.667, 60.0]                 44
2                 (21.962999999999997, 28.0]     25
                  (28.0, 34.0]                   47
                  (34.0, 40.0]                   44
                  (40.0, 46.0]                   21
                  (46.0, 52.0]                   15
                  (52.0, 58.0]                    6
3                 (23.964, 29.833]               20
                  (29.833, 35.667]               23
                  (35.667, 41.5]                 20
                  (41.5, 47.333]                 11
                  (47.333, 53.167]                6
                  (53.167, 59.0]                  5
Name: Age, dtype: int64

Looking at which age tend to have different joblevels is really clear that young people start in level 1 most, of them is in this level for 10 years. If you show some encouagement you will get to an other level faster. most of the people for level 3 are in their late 30s. from 40 years old, both level 4 and 5 are more dominant.

From the two graph above we can see that the youngest people working are single and are morlelylike to leave the company. there are also more people around 30 years old leave but the their relation ship tend to be married, but people then are seeking a new direction or a new job after working a few years at one place.

# Need to Have Numeric Values for Attrition

happy_job = df_Anumber.groupby('JobRole', as_index=False)[['JobSatisfaction','EnvironmentSatisfaction','JobInvolvement']].mean().sort_values(by=['JobInvolvement'])
happy_job.style.apply(highlight_max).apply(highlight_min)

df_Anumber['OverTime'] = df_Anumber['OverTime'].map({'Yes': 1, 'No': 0})

df_employee.groupby('JobRole', as_index=False)[['Age']].mean().sort_values(by=['Age'])

df_Anumber[df_Anumber.JobLevel ==3].groupby('Education', as_index=False)[['JobSatisfaction']].mean().sort_values(by=['JobSatisfaction'])

df_employee['Attrition'] = df_employee['Attrition'].map({'Yes': 1, 'No': 0})
df_Anumber.Education.replace({'High School':1, 'Collage':2,'Bachelor':3, 'Master':4, 'Doctorate':5},inplace=True)

role_income = df_employee.groupby('JobRole', as_index=False)[['MonthlyIncome', 'Attrition']].mean().sort_values(
    by=['MonthlyIncome'])
role_income.style.apply(highlight_max).apply(highlight_min)

People eith doctor degree on level 3 have lower overall job satifacation than other education levels

df_Anumber[df_Anumber.JobLevel ==4].groupby('Education', as_index=False)[['YearsSinceLastPromotion','TrainingTimesLastYear','JobSatisfaction']].mean().sort_values(by=['YearsSinceLastPromotion'])

plt.figure(figsize=(20,10))
sns.boxplot(data=df_employee, x='JobRole', y='YearsAtCompany',hue='Attrition')

<AxesSubplot:xlabel='JobRole', ylabel='YearsAtCompany'>

plt.figure(figsize=(20,10))
sns.boxplot(data=df_employee, x='JobRole', y='MonthlyIncome',hue='Attrition')

<AxesSubplot:xlabel='JobRole', ylabel='MonthlyIncome'>

It takes time for people with doctor degree to reach level 4 among people with higher education

plt.figure(figsize=(20,10))
sns.boxplot(data=df_employee, x='JobLevel', y='MonthlyIncome',hue='Attrition')

<AxesSubplot:xlabel='JobLevel', ylabel='MonthlyIncome'>

People in level 4 that leave has much less income than those who stay

3.7 Surprisng/Unique Findings¶

These plots need Attrition as a numeric feature.

sns.catplot(x = 'NumCompaniesWorked', y = 'Attrition', data=df_employee, aspect= 3, kind = 'bar')

<seaborn.axisgrid.FacetGrid at 0x7f1918b51310>

Have you worked for 2-4 companiesyou are less likely to leave.

sns.factorplot(x = 'NumCompaniesWorked', y = 'Attrition', hue = 'Gender', data=df_employee, aspect= 3, kind = 'bar')

<seaborn.axisgrid.FacetGrid at 0x7f1918bfa610>

Splitting on gender, we can clearly see that the attrtion rate stays up for male working for many companies, but woman are lower

sns.factorplot(x = 'JobLevel', y = 'Attrition', hue = 'Education', data=df_employee, aspect= 4, ci=None)

<seaborn.axisgrid.FacetGrid at 0x7f1914b60550>

A high education at job level 3 is increasing the attrition rate.

fig, ax1 = plt.subplots(figsize=(20,6))

sns.lineplot(data = role_income, x='JobRole',y='Attrition', sort = False, ax=ax1)
ax2 = ax1.twinx()

sns.barplot(data = role_income, x='JobRole', y='MonthlyIncome', alpha=0.7, ax=ax2)

<AxesSubplot:xlabel='JobRole', ylabel='MonthlyIncome'>

Sales have hghest attrition and lower income, attrition increase for HR, due to higher income and good jobsatifaction, least attrition for those with most income

From the EDA we obtained some interesting findings.

From the EDA we can some interesting things

SAles representatives tend to be promoted pretty fast, but most of them have low income and are more likely to quite due to to their relative high job satisfation. and worklife balance, but their education siuation says they are undergradeuated so that why they leave.
Many doctors are left on the level 3, for them to reach 4 it take a really long time compared to other people with higher education
have you worked for some companies, you are less likely to leave

END Part 1¶

sns.factorplot(x = 'Education', y = 'YearsSinceLastPromotion', hue = 'Attrition', data=df_employee, aspect= 4, ci=None)

<seaborn.axisgrid.FacetGrid at 0x7f19148f8d90>

sns.factorplot(x = 'JobRole', y = 'YearsSinceLastPromotion', hue = 'Department', data=df_employee, aspect= 4, ci=None)

<seaborn.axisgrid.FacetGrid at 0x7f1914982c10>

sns.factorplot(x = 'JobRole', y = 'TrainingTimesLastYear', hue = 'Attrition', data=df_employee, aspect= 4, ci=None)

<seaborn.axisgrid.FacetGrid at 0x7f191480c350>

sns.factorplot(x = 'JobRole', y = 'YearsSinceLastPromotion', hue = 'Attrition', data=df_employee, aspect= 4, ci=None)

<seaborn.axisgrid.FacetGrid at 0x7f1914756f50>

sns.factorplot(x = 'YearsSinceLastPromotion', y = 'TrainingTimesLastYear', hue = 'Attrition', data=df_employee, aspect= 4, ci=None)

<seaborn.axisgrid.FacetGrid at 0x7f19147efa10>

sns.factorplot(x = 'YearsSinceLastPromotion', y = 'TrainingTimesLastYear', hue = 'JobInvolvement', data=df_employee, aspect= 4, ci=None)

<seaborn.axisgrid.FacetGrid at 0x7f1914663450>

sns.factorplot(x = 'TrainingTimesLastYear', y = 'YearsSinceLastPromotion', hue = 'JobRole', data=df_employee, aspect= 4, ci=None)
#sns.factorplot(x = 'TrainingTimesLastYear', y = 'YearsSinceLastPromotion', hue = 'Education', data=df_employee, aspect= 4, ci=None)

<seaborn.axisgrid.FacetGrid at 0x7f191450df10>

sns.factorplot(x = 'YearsSinceLastPromotion', y = 'TrainingTimesLastYear', hue = 'Attrition', data=df_Anumber, aspect= 4, ci=None)

<seaborn.axisgrid.FacetGrid at 0x7f191460f950>

sns.factorplot(x = 'TrainingTimesLastYear', y = 'JobSatisfaction', hue = 'JobRole', data=df_Anumber, aspect= 4, ci=None)

<seaborn.axisgrid.FacetGrid at 0x7f1914449c90>

Created in deepnote.com Created in Deepnote

	Age	DailyRate	DistanceFromHome	Education	EmployeeCount	EmployeeNumber	EnvironmentSatisfaction	HourlyRate	JobInvolvement	JobLevel	...	RelationshipSatisfaction	StandardHours	StockOptionLevel	TotalWorkingYears	TrainingTimesLastYear	WorkLifeBalance	YearsAtCompany	YearsInCurrentRole	YearsSinceLastPromotion	YearsWithCurrManager
count	1470.000000	1470.000000	1470.000000	1470.000000	1470.0	1470.000000	1470.000000	1470.000000	1470.000000	1470.000000	...	1470.000000	1470.0	1470.000000	1470.000000	1470.000000	1470.000000	1470.000000	1470.000000	1470.000000	1470.000000
mean	36.923810	802.485714	9.192517	2.912925	1.0	1024.865306	2.721769	65.891156	2.729932	2.063946	...	2.712245	80.0	0.793878	11.279592	2.799320	2.761224	7.008163	4.229252	2.187755	4.123129
std	9.135373	403.509100	8.106864	1.024165	0.0	602.024335	1.093082	20.329428	0.711561	1.106940	...	1.081209	0.0	0.852077	7.780782	1.289271	0.706476	6.126525	3.623137	3.222430	3.568136
min	18.000000	102.000000	1.000000	1.000000	1.0	1.000000	1.000000	30.000000	1.000000	1.000000	...	1.000000	80.0	0.000000	0.000000	0.000000	1.000000	0.000000	0.000000	0.000000	0.000000
25%	30.000000	465.000000	2.000000	2.000000	1.0	491.250000	2.000000	48.000000	2.000000	1.000000	...	2.000000	80.0	0.000000	6.000000	2.000000	2.000000	3.000000	2.000000	0.000000	2.000000
50%	36.000000	802.000000	7.000000	3.000000	1.0	1020.500000	3.000000	66.000000	3.000000	2.000000	...	3.000000	80.0	1.000000	10.000000	3.000000	3.000000	5.000000	3.000000	1.000000	3.000000
75%	43.000000	1157.000000	14.000000	4.000000	1.0	1555.750000	4.000000	83.750000	3.000000	3.000000	...	4.000000	80.0	1.000000	15.000000	3.000000	3.000000	9.000000	7.000000	3.000000	7.000000
max	60.000000	1499.000000	29.000000	5.000000	1.0	2068.000000	4.000000	100.000000	4.000000	5.000000	...	4.000000	80.0	3.000000	40.000000	6.000000	4.000000	40.000000	18.000000	15.000000	17.000000

	JobRole	Attrition	JobInvolvement	JobSatisfaction	TrainingTimesLastYear	YearsInCurrentRole
11	Research Director	Yes	3.000000	2.500000	1.000000	15.000000
3	Human Resources	Yes	2.500000	2.166667	2.083333	2.000000
7	Manager	Yes	2.200000	2.400000	2.200000	7.800000
1	Healthcare Representative	Yes	2.666667	2.777778	2.222222	5.000000
9	Manufacturing Director	Yes	2.600000	2.600000	2.600000	3.500000
15	Sales Executive	Yes	2.526316	2.526316	2.649123	4.192982
13	Research Scientist	Yes	2.510638	2.425532	2.659574	2.191489
5	Laboratory Technician	Yes	2.532258	2.435484	2.661290	2.129032
12	Research Scientist	No	2.853061	2.840816	2.665306	3.481633
2	Human Resources	No	2.775000	2.675000	2.700000	3.475000
8	Manufacturing Director	No	2.688889	2.688889	2.755556	5.081481
0	Healthcare Representative	No	2.737705	2.786885	2.786885	4.852459
10	Research Director	No	2.769231	2.705128	2.820513	6.064103
6	Manager	No	2.804124	2.721649	2.845361	6.381443
14	Sales Executive	No	2.754647	2.802974	2.869888	4.996283
17	Sales Representative	Yes	2.454545	2.484848	2.939394	1.242424
4	Laboratory Technician	No	2.746193	2.771574	3.040609	3.538071
16	Sales Representative	No	2.780000	2.900000	3.060000	2.520000

JobRole	Attrition	Healthcare Representative	Human Resources	Laboratory Technician	Manager	Manufacturing Director	Research Director	Research Scientist	Sales Executive	Sales Representative
0	No	2.786885	2.700000	3.040609	2.845361	2.755556	2.820513	2.665306	2.869888	3.060000
1	Yes	2.222222	2.083333	2.661290	2.200000	2.600000	1.000000	2.659574	2.649123	2.939394

	JobRole	Attrition	PerformanceRating	JobSatisfaction	EnvironmentSatisfaction	WorkLifeBalance
0	Healthcare Representative	No	3.155738	2.786885	2.819672	2.704918
1	Healthcare Representative	Yes	3.111111	2.777778	2.111111	2.666667
2	Human Resources	No	3.150000	2.675000	2.675000	2.925000
3	Human Resources	Yes	3.083333	2.166667	2.333333	2.916667
4	Laboratory Technician	No	3.147208	2.771574	2.822335	2.817259
5	Laboratory Technician	Yes	3.209677	2.435484	2.387097	2.403226
6	Manager	No	3.206186	2.721649	2.814433	2.762887
7	Manager	Yes	3.000000	2.400000	1.800000	3.000000
9	Manufacturing Director	Yes	3.000000	2.600000	2.600000	2.700000
8	Manufacturing Director	No	3.200000	2.688889	2.940741	2.770370
10	Research Director	No	3.102564	2.705128	2.487179	2.858974
11	Research Director	Yes	3.000000	2.500000	3.000000	3.000000
12	Research Scientist	No	3.151020	2.840816	2.746939	2.669388
13	Research Scientist	Yes	3.255319	2.425532	2.617021	2.723404
14	Sales Executive	No	3.130112	2.802974	2.732342	2.858736
15	Sales Executive	Yes	3.105263	2.526316	2.385965	2.543860
16	Sales Representative	No	3.160000	2.900000	2.760000	2.780000
17	Sales Representative	Yes	3.121212	2.484848	2.696970	3.060606

	Department	Attrition	PerformanceRating	JobSatisfaction	EnvironmentSatisfaction	WorkLifeBalance
0	Human Resources	No	3.156863	2.705882	2.764706	2.921569
1	Human Resources	Yes	3.083333	2.166667	2.333333	2.916667
2	Research & Development	No	3.157005	2.769324	2.787440	2.748792
3	Research & Development	Yes	3.195489	2.458647	2.473684	2.578947
4	Sales	No	3.144068	2.810734	2.734463	2.836158
5	Sales	Yes	3.108696	2.521739	2.467391	2.739130

	Age	Attrition	BusinessTravel	DailyRate	Department	DistanceFromHome	Education	EducationField	EmployeeCount	EmployeeNumber	...	RelationshipSatisfaction	StandardHours	StockOptionLevel	TotalWorkingYears	TrainingTimesLastYear	WorkLifeBalance	YearsAtCompany	YearsInCurrentRole	YearsSinceLastPromotion	YearsWithCurrManager
0	41	Yes	Travel_Rarely	1102	Sales	1	2	Life Sciences	1	1	...	1	80	0	8	0	1	6	4	0	5
1	49	No	Travel_Frequently	279	Research & Development	8	1	Life Sciences	1	2	...	4	80	1	10	3	3	10	7	1	7
2	37	Yes	Travel_Rarely	1373	Research & Development	2	2	Other	1	4	...	2	80	0	7	3	3	0	0	0	0
3	33	No	Travel_Frequently	1392	Research & Development	3	4	Life Sciences	1	5	...	3	80	0	8	3	3	8	7	3	0
4	27	No	Travel_Rarely	591	Research & Development	2	1	Medical	1	7	...	4	80	1	6	3	3	2	2	2	2

	Gender	Attrition	PerformanceRating	JobSatisfaction	EnvironmentSatisfaction	WorkLifeBalance
0	Female	No	3.157685	2.728543	2.782435	2.760479
1	Female	Yes	3.172414	2.425287	2.367816	2.781609
2	Male	No	3.150273	2.812842	2.763661	2.795082
3	Male	Yes	3.146667	2.493333	2.520000	2.586667

	Education	Attrition	PerformanceRating	JobSatisfaction	EnvironmentSatisfaction	WorkLifeBalance
0	Bachelor	No	3.135307	2.701903	2.864693	2.737844
1	Bachelor	Yes	3.181818	2.414141	2.353535	2.686869
2	College	No	3.172269	2.823529	2.785714	2.785714
3	College	Yes	3.159091	2.477273	2.340909	2.659091
4	Doctorate	No	3.209302	2.744186	2.604651	2.767442
5	Doctorate	Yes	3.000000	2.000000	3.000000	3.200000
6	High School	No	3.187050	2.820144	2.726619	2.755396
7	High School	Yes	3.129032	2.709677	2.838710	2.870968
8	Master	No	3.144118	2.841176	2.670588	2.850000
9	Master	Yes	3.137931	2.465517	2.500000	2.448276

	JobRole	Attrition	PerformanceRating	StockOptionLevel	JobInvolvement	MonthlyIncome	YearsSinceLastPromotion	YearsAtCompany
0	Healthcare Representative	No	3.155738	0.844262	2.737705	7453.557377	2.885246	8.188525
1	Healthcare Representative	Yes	3.111111	0.666667	2.666667	8548.222222	4.111111	10.888889
2	Human Resources	No	3.150000	0.725000	2.775000	4391.750000	1.400000	5.675000
3	Human Resources	Yes	3.083333	0.833333	2.500000	3715.750000	0.833333	4.166667
4	Laboratory Technician	No	3.147208	0.913706	2.746193	3337.223350	1.548223	5.604061
5	Laboratory Technician	Yes	3.209677	0.516129	2.532258	2919.258065	1.016129	3.161290
6	Manager	No	3.206186	0.752577	2.804124	17201.484536	4.835052	14.371134
7	Manager	Yes	3.000000	0.600000	2.200000	16797.400000	4.800000	15.600000
9	Manufacturing Director	Yes	3.000000	0.800000	2.600000	7365.500000	1.700000	8.700000
8	Manufacturing Director	No	3.200000	0.814815	2.688889	7289.925926	2.148148	7.518519
10	Research Director	No	3.102564	0.871795	2.769231	15947.346154	2.910256	10.538462
11	Research Director	Yes	3.000000	0.000000	3.000000	19395.500000	14.000000	26.500000
12	Research Scientist	No	3.151020	0.836735	2.853061	3328.122449	1.440816	5.265306
13	Research Scientist	Yes	3.255319	0.446809	2.510638	2780.468085	1.851064	4.319149
14	Sales Executive	No	3.130112	0.881041	2.754647	6804.617100	2.360595	7.665428
15	Sales Executive	Yes	3.105263	0.526316	2.526316	7489.000000	3.070175	6.701754
16	Sales Representative	No	3.160000	0.740000	2.780000	2798.440000	1.360000	3.460000
17	Sales Representative	Yes	3.121212	0.454545	2.454545	2364.727273	0.606061	2.090909

	Department	Attrition	PerformanceRating	JobInvolvement	MonthlyIncome	YearsSinceLastPromotion	YearsAtCompany
0	Human Resources	No	3.156863	2.803922	7345.980392	2.000000	7.960784
1	Human Resources	Yes	3.083333	2.500000	3715.750000	0.833333	4.166667
2	Research & Development	No	3.157005	2.771739	6630.326087	2.179952	7.171498
3	Research & Development	Yes	3.195489	2.556391	4108.075188	1.872180	4.954887
4	Sales	No	3.144068	2.762712	7232.240113	2.395480	7.745763
5	Sales	Yes	3.108696	2.467391	5908.456522	2.195652	5.510870

	Gender	Attrition	PerformanceRating	JobInvolvement	MonthlyIncome	YearsSinceLastPromotion	YearsAtCompany
0	Female	No	3.157685	2.746507	7019.429142	2.339321	7.459082
1	Female	Yes	3.172414	2.528736	4769.735632	2.034483	5.919540
2	Male	No	3.150273	2.786885	6704.964481	2.162568	7.307377
3	Male	Yes	3.146667	2.513333	4797.160000	1.893333	4.673333

	JobRole	JobSatisfaction	EnvironmentSatisfaction	JobInvolvement
8	Sales Representative	2.734940	2.734940	2.650602
4	Manufacturing Director	2.682759	2.917241	2.682759
2	Laboratory Technician	2.691120	2.718147	2.694981
1	Human Resources	2.557692	2.596154	2.711538
7	Sales Executive	2.754601	2.671779	2.714724
0	Healthcare Representative	2.786260	2.770992	2.732824
3	Manager	2.705882	2.764706	2.774510
5	Research Director	2.700000	2.500000	2.775000
6	Research Scientist	2.773973	2.726027	2.797945

	JobRole	Age
8	Sales Representative	30.361446
2	Laboratory Technician	34.096525
6	Research Scientist	34.236301
1	Human Resources	35.500000
7	Sales Executive	36.889571
4	Manufacturing Director	38.296552
0	Healthcare Representative	39.809160
5	Research Director	44.000000
3	Manager	46.764706

	JobRole	MonthlyIncome	Attrition
8	Sales Representative	2626.000000	0.397590
2	Laboratory Technician	3237.169884	0.239382
6	Research Scientist	3239.972603	0.160959
1	Human Resources	4235.750000	0.230769
7	Sales Executive	6924.279141	0.174847
4	Manufacturing Director	7295.137931	0.068966
0	Healthcare Representative	7528.763359	0.068702
5	Research Director	16033.550000	0.025000
3	Manager	17181.676471	0.049020

	Education	YearsSinceLastPromotion	TrainingTimesLastYear	JobSatisfaction
3	4	3.821429	2.714286	2.571429
1	2	4.823529	2.352941	3.176471
2	3	4.863636	2.431818	2.636364
4	5	5.555556	2.777778	2.888889
0	1	7.500000	2.875000	2.625000

	Attrition
No	1233
Yes	237

	Education	JobSatisfaction
4	5	2.333333
3	4	2.655172
2	3	2.663265
1	2	2.787879
0	1	2.800000