One-Hot Encoding vs Label Encoding in Machine Learning
As we saw in the previous blog, machine learning models can’t process categorical variables directly, so categorical variables in a dataset must be converted into numerical ones. There are many ways to do this, and each approach has its own trade-offs and impact on the feature set. In the previous blog I explained one hot encoding (dummy variables), and I suggest you go through that post in detail before starting this one. Here I will focus on the two main methods: One-Hot Encoding and Label Encoding.
Table of contents
- One hot Encoding
- Label encoding
- When to use One-Hot encoding and Label encoding?
- Difference between One-Hot encoding and Label encoding
- Label encoding python
- Assignment
- Endnotes
Many of you might be confused between these two techniques, Label Encoding and One Hot Encoding. Their basic purpose is the same, i.e. conversion of categorical variables to numerical variables, but they are applied differently. So let’s understand the difference between the two with a simple example.
One hot Encoding
Encoding is the action of converting. One-hot encoding converts categorical data into numeric data by splitting the column into multiple columns. The values are replaced by 1s and 0s, depending on which column holds which value. In our example, we get four new columns, one for each country: India, Australia, Russia, and America.
Note: If you want to study one hot encoding with proper example and python code then click here.
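As a quick refresher, one-hot encoding can be sketched with pandas’ `get_dummies` (the Country column below is hypothetical, built just to mirror the four countries above):

```python
import pandas as pd

# Hypothetical data mirroring the Country example above
df = pd.DataFrame({"Country": ["India", "Australia", "Russia", "America"]})

# get_dummies splits the column into one 0/1 indicator column per country
encoded = pd.get_dummies(df, columns=["Country"])
print(encoded.columns.tolist())
```

Each row now has a 1 in the column for its own country and 0 (or False, in recent pandas versions) everywhere else.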
Label encoding
This approach is very simple and it involves converting each value in a column into a number.
Consider a dataset with many columns; to understand label encoding, we will focus on a single categorical column, State, which contains the values below. Each distinct label in the State feature is assigned a different numeric value.
So for its implementation, all we have to do is:
- Import the LabelEncoder class from the sklearn library
- Fit and transform the first column of the data
- Replace the existing text data with the new encoded data
Don’t worry, we will also learn it through code. Very simple code!!!
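The three steps above can be sketched as follows (the State values here are hypothetical, just for illustration):

```python
# Step 1: import the LabelEncoder class from the sklearn library
from sklearn.preprocessing import LabelEncoder

import pandas as pd

# Hypothetical State column for illustration
df = pd.DataFrame({"State": ["Maharashtra", "Gujarat", "Kerala", "Gujarat"]})

le = LabelEncoder()
# Steps 2 and 3: fit, transform, and replace the text with the encoded data
df["State"] = le.fit_transform(df["State"])
print(df["State"].tolist())  # [2, 0, 1, 0]: codes follow alphabetical order
```

Note that `LabelEncoder` assigns codes in alphabetical order of the labels, not in the order they appear.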
When to use One-Hot encoding and Label encoding?
The encoding technique is selected depending upon the data. For example, we encoded different state names into numeric data in the example above.
Label encoding is used when:
- The categorical feature is ordinal, i.e. the order of the categories matters.
- The number of categories is quite large, as one-hot encoding would lead to high memory consumption.
One-hot encoding is used when:
- The order does not matter in the categorical feature (nominal data).
- The categories in a feature are few.
Note: With label encoding, the model may misunderstand the data to be in some kind of order, 0 < 1 < 2. For example, in the six-class “State” column above, the model would assume a relationship between the values as follows: 0 < 1 < 2 < 3 < 4 < 5. To overcome this problem for nominal data, we can use one-hot encoding as explained above.
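A small sketch of this difference (the state names are made up for illustration):

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

states = pd.DataFrame({"State": ["Texas", "Alaska", "Ohio"]})

# Label encoding: the integers suggest Alaska < Ohio < Texas, which is meaningless
labels = LabelEncoder().fit_transform(states["State"])
print(list(labels))  # [2, 0, 1]

# One-hot encoding: each state gets its own 0/1 column, so no order is implied
onehot = pd.get_dummies(states, columns=["State"])
print(onehot.columns.tolist())
```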
Difference between One-Hot encoding and Label encoding
| Label Encoding | One-hot Encoding |
| --- | --- |
| The categorical values are labeled as numeric values by assigning each category a number. | A column with categorical values is split into multiple columns. |
| No extra columns are added; the categories are converted into numeric values in place, so fewer computations. | It adds more columns and is computationally heavier. |
| Each value carries unique information. | The new columns carry redundant information. |
| Different integers are used to represent the data. | Only 0 and 1 are used to represent the data. |
Label encoding python
I implemented this code using a dataset named adult.csv from Kaggle. It is census data, and the goal of this machine learning project is to predict whether a person makes over 50K a year given their demographic attributes.
1. Importing the Libraries
import pandas as pd
import numpy as np
2. Reading the file
df = pd.read_csv("adult.csv")
df
Output:
3. Importing the label encoder
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
4. Checking the different columns of the dataset
On checking, we find there is a space between the opening quote and each column name, as shown in the output, so we must include that space when writing code.
df.columns
Output:
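As an alternative (not part of the original walkthrough), you could strip the stray spaces from every column name once, up front, and then refer to the columns without the leading space:

```python
import pandas as pd

# Hypothetical frame reproducing the leading-space column names in adult.csv
df = pd.DataFrame({" relationship": ["Husband", "Wife"], " race": ["White", "Black"]})

# Remove leading/trailing whitespace from all column names in one go
df.columns = df.columns.str.strip()
print(df.columns.tolist())  # ['relationship', 'race']
```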
5. Fitting and transforming
df[' relationship'] = le.fit_transform(df[' relationship'])
df
After running this piece of code, if you check the ‘relationship’ column, you’ll see that the categories have been replaced by the numbers 0, 1, 2, 3, 4, and 5.
Now you can write the same code for the other categorical columns, like work class, marital status, occupation, race, sex, native_country, and income.
Note: Don’t apply label encoding to the education column, because in that category the order matters.
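To see why, here is a quick sketch (the education levels are hypothetical examples): LabelEncoder assigns codes alphabetically, which scrambles the real ordering of education levels.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical levels; the true order is HS-grad < Bachelors < Masters
levels = ["Masters", "HS-grad", "Bachelors"]

codes = LabelEncoder().fit_transform(levels)
# Bachelors gets 0 and HS-grad gets 1: alphabetical, not educational, order
print(dict(zip(levels, codes)))
```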
We will talk about it in detail quoting different examples in the next blog.
Assignment
My suggestion: simply reading the code won’t help you. Download the adult.csv file from Kaggle (it is freely available), then:
- Try to convert the other categorical features into numerical features (I converted only one feature) using label encoding
- Try implementing an algorithm of your choice and find the prediction accuracy
Endnotes
Congrats on making it to the end!! You should now have an idea of what label encoding is, why it is used, and how to use it. Categorical variables have to be handled before moving on to other steps like model training, hyperparameter tuning, cross-validation, and model evaluation.
I will be coming up with another blog on how to handle ordinal features. Stay tuned!!!
If you liked my blog consider hitting the stars.