A frequency table displays how often each category occurs in a feature. This is very useful information when analyzing categorical variables. The pandas library provides the value_counts function for this.
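As a quick illustration of what value_counts returns, here is a minimal sketch on a small made-up Series. The variable name gender and its values are assumptions for illustration only and are not taken from the loan dataset.

```python
import pandas as pd

# Hypothetical toy data, not taken from the loan dataset
gender = pd.Series(["Male", "Female", "Male", "Male", "Female"])

# value_counts() returns a Series: the index holds the categories,
# the values hold the counts, sorted by count in descending order
counts = gender.value_counts()
print(counts)
```

The result is itself a Series, so an individual count can be looked up by category, for example counts["Male"].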
Consider a Loan Prediction dataset. We will find the frequency of occurrence of each category in all the categorical variables.
Step 1: Import the required libraries
import pandas as pd
import numpy as np
Step 2: Load the dataset
dataset = pd.read_csv("C:/train_loan_prediction.csv")
Step 3: Find datatype of all variables
dataset.info()
dataset.dtypes
We find that columns Loan_ID, Gender, Married, Dependents, Education, Self_Employed are of object type (categorical variables).
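Instead of reading the dtypes by eye, the object-typed columns can also be listed programmatically. The sketch below uses select_dtypes on a small hypothetical frame; the name toy, the Amount column and all values are made up for illustration, and the text columns are created with an explicit object dtype so the example behaves the same regardless of how a given pandas version infers string columns.

```python
import pandas as pd

# Hypothetical toy frame standing in for the loan dataset
toy = pd.DataFrame({
    "Loan_ID": pd.Series(["LP001", "LP002", "LP003"], dtype=object),
    "Gender": pd.Series(["Male", "Female", "Male"], dtype=object),
    "Amount": [120.0, 66.0, 141.0],  # numeric column, should be skipped
})

# select_dtypes keeps only the columns whose dtype matches
object_columns = toy.select_dtypes(include=["object"]).columns.tolist()
print(object_columns)
```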
Step 4: Draw Frequency Table
#Find all the categorical variables
categorical_columns = [x for x in dataset.dtypes.index if dataset.dtypes[x]=='object']
#Exclude Loan_ID column
categorical_columns = [x for x in categorical_columns if x not in ['Loan_ID']]
#Print frequency of categories
for col in categorical_columns:
    print('\nFrequency of Categories for variable %s' % col)
    print(dataset[col].value_counts())
Run the above code and observe the results. It displays the frequency of every category in each categorical variable (how many times a particular category occurs in a feature).
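For readers without the CSV file at hand, the same loop can be tried on a small made-up frame. This is only a sketch: the name toy and all values are hypothetical. It also shows two optional value_counts parameters, dropna=False to count missing values and normalize=True to get proportions instead of counts.

```python
import pandas as pd

# Hypothetical toy data; None marks a missing value
toy = pd.DataFrame({
    "Married": ["Yes", "No", "Yes", None, "Yes"],
    "Education": ["Graduate", "Graduate", "Not Graduate", "Graduate", "Graduate"],
})

for col in toy.columns:
    print('\nFrequency of Categories for variable %s' % col)
    # dropna=False adds a row for missing values, which are skipped by default
    print(toy[col].value_counts(dropna=False))

# Counts including the missing entry
married_counts = toy["Married"].value_counts(dropna=False)

# Proportions of the non-missing entries instead of raw counts
married_share = toy["Married"].value_counts(normalize=True)
print(married_share)
```

By default value_counts ignores missing values, so a frequency table built without dropna=False may sum to fewer rows than the dataset contains.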