Implement a Linear Classification Model using TensorFlow Estimator

Let's see how to perform linear classification with the TensorFlow library in Python. We will use the LinearClassifier estimator from TensorFlow Estimator to predict, from California Census Data, which income class (>50K or <=50K) a person belongs to. The dataset has 32,561 observations and 15 columns (14 features plus the income label). The dataset and my Jupyter notebook containing the code below are available for download. So, let's get started.


Step 1: Import required libraries


import pandas as pd
from sklearn.model_selection import train_test_split
import tensorflow as tf
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score


Step 2: Load and explore the dataset


dataset = pd.read_csv('adult.csv')
dataset.head()
dataset.size
dataset.shape
dataset.columns
dataset.dtypes
dataset.describe()


Step 3: Drop fnlwgt column

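We are not going to use the fnlwgt column. It holds the census sampling weight ("final weight") of each record rather than a property of the person, so it does not seem to contribute relevant information to our prediction, and we drop it.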


dataset.drop('fnlwgt', axis=1, inplace=True)


Step 4: Convert label into 0 and 1


dataset['income'].unique()


Output: array(['<=50K', '>50K'], dtype=object)


This means we have only two string labels. Let's convert them into numeric labels (0 and 1).


def label_fix(label):
    if label == '<=50K':
        return 0
    else:
        return 1


dataset['income'] = dataset['income'].apply(label_fix)
dataset.head()
dataset['income'].unique()
dataset['income'].value_counts()
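One thing to keep in mind with label_fix is that any value other than '<=50K' silently becomes 1. As a possible alternative, here is a minimal sketch that maps the labels through a dictionary, so that unexpected values show up as NaN instead. The toy Series and the stray leading space in its last value are made up for illustration; they are not taken from adult.csv.

```python
import pandas as pd

# Toy stand-in for the 'income' column; the last value has a stray leading space.
income = pd.Series(['<=50K', '>50K', '<=50K', ' >50K'])

label_map = {'<=50K': 0, '>50K': 1}

# Strip surrounding whitespace first, then map each label through the dictionary.
cleaned = income.str.strip().map(label_map)

# Without stripping, ' >50K' is not a key of label_map and becomes NaN.
unmapped = income.map(label_map)

print(cleaned.tolist())            # [0, 1, 0, 1]
print(int(unmapped.isna().sum()))  # 1
```

Counting the NaN values after mapping is a cheap way to confirm that every label was recognised before training.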


Step 5: Split dataset into training and testing set


X = dataset.drop('income', axis=1)
y = dataset['income']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
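If the two income classes are unbalanced, you may want both sets to keep the same class ratio. The sketch below shows the stratify argument of train_test_split on a made-up toy DataFrame (100 rows, 20 of them labelled 1). It is an optional variation, not part of the steps above.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy data: 80 rows of class 0 and 20 rows of class 1 (made up for illustration).
toy = pd.DataFrame({'age': range(100), 'income': [0] * 80 + [1] * 20})

X = toy.drop('income', axis=1)
y = toy['income']

# stratify=y keeps the 80/20 class ratio in both the training and the testing set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

print(len(X_train), len(X_test))
print(y_train.mean(), y_test.mean())
```

Both means come out at 0.2 here, which is the share of class 1 in the toy data.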


Step 6: Create Feature Columns


The estimator cannot consume raw DataFrame columns directly. It needs a list of feature columns that tell it how to convert each independent variable into a tensor of the appropriate type.


We need to create feature columns for our numeric and categorical data. Feature columns act as the intermediaries between raw data and TensorFlow Estimators.


Convert numeric columns into feature columns.


tf.feature_column.numeric_column: Use this to convert a numeric column into a feature column.


Convert categorical columns into feature columns.


tf.feature_column.categorical_column_with_hash_bucket: Use this if you don’t know the set of possible values for a categorical column in advance and there are too many of them.


tf.feature_column.categorical_column_with_vocabulary_list: Use this if you know the set of all possible feature values of a column and there are only a few of them.
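To make the difference between the two categorical helpers concrete, here is a rough pure-Python sketch of the idea behind each one. The functions hash_bucket_id and vocabulary_id are hypothetical names written for this illustration, and MD5 is only a stand-in; TensorFlow uses its own hash function, so the bucket numbers will not match.

```python
import hashlib


def hash_bucket_id(value, hash_bucket_size):
    """Map a string to a bucket in [0, hash_bucket_size).

    MD5 is a stand-in here; TensorFlow uses a different hash internally.
    """
    digest = hashlib.md5(value.encode('utf-8')).hexdigest()
    return int(digest, 16) % hash_bucket_size


def vocabulary_id(value, vocabulary, default_value=-1):
    """Return the position of value in vocabulary, or default_value if absent."""
    try:
        return vocabulary.index(value)
    except ValueError:
        return default_value


print(hash_bucket_id('Private', 1000))               # some id between 0 and 999
print(vocabulary_id('Male', ['Female', 'Male']))     # 1
print(vocabulary_id('Unknown', ['Female', 'Male']))  # -1
```

Hashing needs no list of values up front, but two different values can land in the same bucket. A vocabulary list gives every known value its own id, at the cost of having to enumerate the values in advance.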


So, let's convert all our columns into feature columns as discussed above.


workclass = tf.feature_column.categorical_column_with_hash_bucket('workclass', hash_bucket_size=1000)
education = tf.feature_column.categorical_column_with_hash_bucket('education', hash_bucket_size=1000)
marital_status = tf.feature_column.categorical_column_with_hash_bucket('marital_status', hash_bucket_size=1000)
occupation = tf.feature_column.categorical_column_with_hash_bucket('occupation', hash_bucket_size=1000)
relationship = tf.feature_column.categorical_column_with_hash_bucket('relationship', hash_bucket_size=1000)
race = tf.feature_column.categorical_column_with_hash_bucket('race', hash_bucket_size=1000)
sex = tf.feature_column.categorical_column_with_vocabulary_list('sex', ['Female', 'Male'])
native_country = tf.feature_column.categorical_column_with_hash_bucket('native_country', hash_bucket_size=1000)

age = tf.feature_column.numeric_column('age')
education_num = tf.feature_column.numeric_column('education_num')
capital_gain = tf.feature_column.numeric_column('capital_gain')
capital_loss = tf.feature_column.numeric_column('capital_loss')
hours_per_week = tf.feature_column.numeric_column('hours_per_week')

feature_columns = [workclass, education, marital_status, occupation, relationship, race, sex,
                   native_country, age, education_num, capital_gain, capital_loss, hours_per_week]


Step 7: Create Input Function


We now create an input function that feeds the Pandas DataFrame into our classifier model. It requires you to specify the features, labels and batch size. Its shuffle argument makes the input function deliver the records in random order, which usually helps training because consecutive batches are then not biased by the ordering of the file. You can also specify the number of epochs you want to use.


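# TensorFlow 1.x API. In TensorFlow 2.x releases that still ship Estimators,
# the same function is available as tf.compat.v1.estimator.inputs.pandas_input_fn.
input_fn = tf.estimator.inputs.pandas_input_fn(x=X_train, y=y_train, batch_size=128, num_epochs=None, shuffle=True)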


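I have set the batch size to 128 and the number of epochs to None. With None, the input function keeps cycling through the data, so the number of training steps alone decides when training stops. By default, the number of epochs is 1.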
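The interplay of batch size, shuffling and epochs can be sketched in plain Python. The generator below is a simplified illustration, not how pandas_input_fn is implemented; batch_generator and its seed parameter are names invented for this example.

```python
import itertools
import random


def batch_generator(records, batch_size, num_epochs=1, shuffle=True, seed=0):
    """Yield lists of up to batch_size records, for num_epochs passes over the data.

    num_epochs=None repeats forever, so the caller decides when to stop.
    """
    rng = random.Random(seed)
    epoch = 0
    while num_epochs is None or epoch < num_epochs:
        order = list(records)
        if shuffle:
            rng.shuffle(order)  # new random order on every pass
        for start in range(0, len(order), batch_size):
            yield order[start:start + batch_size]
        epoch += 1


rows = list(range(5))

# One epoch without shuffling: the data is cut into consecutive batches.
one_epoch = list(batch_generator(rows, batch_size=2, num_epochs=1, shuffle=False))
print(one_epoch)  # [[0, 1], [2, 3], [4]]

# num_epochs=None never runs out, so we take a fixed number of batches instead.
ten_batches = list(itertools.islice(batch_generator(rows, batch_size=2, num_epochs=None), 10))
print(len(ten_batches))  # 10
```

Taking a fixed number of batches from an endless generator mirrors training for a fixed number of steps when the number of epochs is None.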


Step 8: Create a model using feature columns and input function


model = tf.estimator.LinearClassifier(feature_columns=feature_columns)
model.train(input_fn=input_fn, steps=1000)


With steps=1000, the optimizer performs 1000 updates, one per batch of 128 records drawn by the input function.
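For intuition, a binary linear classifier computes a weighted sum of the features and squashes it into a probability with the sigmoid function. The sketch below shows only that prediction step, with made-up weights for two numeric features; it leaves out training and the handling of categorical columns, and the function names are hypothetical.

```python
import math


def predict_proba(features, weights, bias):
    """Probability of class 1: sigmoid of the weighted sum of the features."""
    logit = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-logit))


def predict_class(features, weights, bias, threshold=0.5):
    """Return 1 if the predicted probability exceeds the threshold, else 0."""
    return 1 if predict_proba(features, weights, bias) > threshold else 0


# Made-up weights for two features, e.g. age and education_num.
weights = [0.05, 0.1]
person = [40.0, 13.0]

print(predict_proba(person, weights, bias=-4.0))
print(predict_class(person, weights, bias=-4.0))  # 0, because the logit is negative
print(predict_class(person, weights, bias=-2.0))  # 1, because the logit is positive
```

Training adjusts the weights and the bias so that these probabilities agree with the labels as well as possible.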


Step 9: Make predictions


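# Feed the whole test set in a single batch; no labels are passed for prediction
pred_fn = tf.estimator.inputs.pandas_input_fn(x=X_test, batch_size=len(X_test), shuffle=False)
predictions = list(model.predict(input_fn=pred_fn))
predictions[0]
# Each prediction is a dict; 'class_ids' holds the predicted label (0 or 1)
final_preds = []
for pred in predictions:
    final_preds.append(pred['class_ids'][0])
final_preds[:10]
# Compare actual and predicted labels side by side
df = pd.DataFrame({'Actual': y_test, 'Predicted': final_preds})
df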
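Each element returned by model.predict is a dictionary. The sketch below uses two hand-written stand-ins to show how the predicted label and the class probability can be pulled out. The keys are the ones I would expect for a binary classifier, and the numbers are made up and rounded, so check predictions[0] in your own run to confirm the exact structure.

```python
# Hand-written stand-ins for two entries of list(model.predict(...)).
# The numbers are made up; real entries hold NumPy arrays rather than lists.
mock_predictions = [
    {'logits': [1.2], 'logistic': [0.77], 'probabilities': [0.23, 0.77],
     'class_ids': [1], 'classes': [b'1']},
    {'logits': [-0.4], 'logistic': [0.40], 'probabilities': [0.60, 0.40],
     'class_ids': [0], 'classes': [b'0']},
]

# 'class_ids' holds the predicted label; index 1 of 'probabilities' is P(class 1).
final_preds = [int(pred['class_ids'][0]) for pred in mock_predictions]
positive_probs = [pred['probabilities'][1] for pred in mock_predictions]

print(final_preds)     # [1, 0]
print(positive_probs)  # [0.77, 0.4]
```

Keeping the probabilities around is useful if you later want to try a decision threshold other than 0.5.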


Step 10: Check accuracy


print(classification_report(y_test, final_preds))
print(confusion_matrix(y_test, final_preds))
print(accuracy_score(y_test, final_preds))


We got around 82.5% accuracy. You can play around with hyperparameters such as the number of epochs, the number of steps and the batch size to try to improve the accuracy.
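To see where these numbers come from, here is a small pure-Python sketch that builds a 2x2 confusion matrix and derives the accuracy from it. It follows the layout scikit-learn uses for confusion_matrix (rows are actual classes, columns are predicted classes). The toy labels are made up, and the helper names are hypothetical.

```python
def confusion_counts(actual, predicted):
    """Return [[TN, FP], [FN, TP]] for binary labels 0 and 1."""
    tn = fp = fn = tp = 0
    for a, p in zip(actual, predicted):
        if a == 0 and p == 0:
            tn += 1
        elif a == 0 and p == 1:
            fp += 1
        elif a == 1 and p == 0:
            fn += 1
        else:
            tp += 1
    return [[tn, fp], [fn, tp]]


def accuracy(matrix):
    """Share of correct predictions: the diagonal divided by the total."""
    (tn, fp), (fn, tp) = matrix
    return (tn + tp) / (tn + fp + fn + tp)


# Made-up labels for illustration.
actual = [0, 0, 0, 1, 1, 0, 1, 0]
predicted = [0, 0, 1, 1, 0, 0, 1, 0]

matrix = confusion_counts(actual, predicted)
print(matrix)            # [[4, 1], [1, 2]]
print(accuracy(matrix))  # 0.75
```

It is also worth comparing the accuracy with the share of the majority class from the value_counts output in Step 4, since always predicting that class already reaches that share.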
