Most machine learning algorithms expect numeric inputs, so we should convert all our categorical variables into numeric variables by encoding the categories. Before doing that, make sure you have imputed all the missing values in the categorical variables. We will use LabelEncoder, which is available in the scikit-learn library, to encode and transform the categorical variables.
Consider a Loan Prediction dataset. We will encode and transform all of its categorical variables into numeric variables.
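As a rough illustration of the imputation step mentioned above, here is a minimal sketch that fills missing categorical values with the most frequent value of each column. The toy DataFrame `df` and its values are made up for this example, and mode imputation is only one possible strategy, not necessarily the right one for your data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the real dataset.
df = pd.DataFrame({
    "Gender": ["Male", "Female", np.nan, "Male"],
    "Married": ["Yes", np.nan, "No", "Yes"],
})

# Fill the missing values in each categorical column with that
# column's most frequent value (its mode).
for col in ["Gender", "Married"]:
    most_frequent = df[col].mode()[0]
    df[col] = df[col].fillna(most_frequent)

# Count the remaining missing values per column.
print(df.isnull().sum())
```

After this step no missing values should remain in the listed columns, so they can be passed safely to the encoder.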
Step 1: Import the required libraries
import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelEncoder
Step 2: Load the dataset
dataset = pd.read_csv("C:/train_loan_prediction.csv")
Step 3: Encode categorical variables using LabelEncoder
The categorical variables are Gender, Married, Dependents, Education, Self_Employed, Property_Area and Loan_Status. Let's encode and transform all of them into numeric variables in one go using the following Python code.
categorical_vars = ['Gender','Married','Dependents','Education','Self_Employed','Property_Area','Loan_Status']
label_encoder = LabelEncoder()
for i in categorical_vars:
    # Refit the encoder on each column and replace its labels with integer codes
    dataset[i] = label_encoder.fit_transform(dataset[i])
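The loop above reuses a single LabelEncoder, so once it finishes only the mapping for the last column is still available. If you need to look up or reverse the codes later, one option is to keep a separate encoder per column. The sketch below assumes a small made-up DataFrame `toy`; the `encoders` dictionary is a name chosen for this example and is not part of scikit-learn.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data using two of the column names from the dataset.
toy = pd.DataFrame({
    "Gender": ["Male", "Female", "Male"],
    "Property_Area": ["Urban", "Rural", "Semiurban"],
})

# Keep one fitted encoder per column so each mapping can be inspected later.
encoders = {}
for col in list(toy.columns):
    encoders[col] = LabelEncoder()
    toy[col] = encoders[col].fit_transform(toy[col])

# classes_ holds the original labels in sorted order; the position of a
# label in this array is the integer code assigned to it.
print(list(encoders["Gender"].classes_))

# inverse_transform maps the integer codes back to the original labels.
original_gender = encoders["Gender"].inverse_transform(toy["Gender"])
print(list(original_gender))
```

Keeping the fitted encoders around also lets you apply the same mapping to new data with `transform`, provided the new data contains only labels seen during fitting.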
Now, look at the datatypes of the variables:
dataset.dtypes
You will see that the datatype of all the categorical variables has changed from object to an integer type (for example int32 or int64, depending on your platform). The dataset is now ready for machine learning algorithms.
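Instead of reading through the dtypes output by eye, you can also check programmatically that no non-numeric columns are left. This is a minimal sketch on a made-up DataFrame `toy`; the `LoanAmount` column and its values are assumptions added only to show a column that was numeric from the start.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data: one categorical column and one numeric column.
toy = pd.DataFrame({
    "Education": ["Graduate", "Not Graduate", "Graduate"],
    "LoanAmount": [128.0, 66.0, 120.0],
})
toy["Education"] = LabelEncoder().fit_transform(toy["Education"])

# List any columns that are still not numeric after encoding.
non_numeric = toy.select_dtypes(exclude="number").columns.tolist()
print(non_numeric)
```

An empty list means every column is numeric; any name that does appear points to a column that still needs encoding.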
Related: Difference between Label Encoder and One Hot Encoder