Recoding using One-Hot Encoding

What is One-Hot Encoding?

One-hot encoding is a technique for converting categorical data into a numerical format, which many machine learning algorithms require as input. It works by creating a new binary (0 or 1) column for each unique category in the original data. For a given row, the column corresponding to its category is marked with a '1', while all the other new columns are '0'.

How it works

Identify unique categories

First, identify all unique values within a categorical column (e.g., "Male," "Female," "Trans"). 

Create new binary columns

Create a new column for each of these unique categories. For example, if your original column was "gender," you would create "gender_Male," "gender_Female," and "gender_Trans" columns. 

Assign values

For each row, place a '1' in the new column that matches the row's original category and a '0' in all the other new columns.
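The three steps above can also be carried out by hand, which may help show what the encoding does before relying on a library helper. The following is only a minimal sketch: the variable names `categories` and `manual` are illustrative, and the "gender_" prefix follows the naming described in the steps.

```python
import pandas as pd

data = pd.DataFrame({
    'gender': ['Male', 'Trans', 'Female', 'Female', 'Male', 'Male', 'Trans']
})

# Step 1: identify the unique categories (sorted for a stable column order)
categories = sorted(data['gender'].unique())

# Steps 2 and 3: create one new column per category and fill it with 1 or 0
manual = pd.DataFrame(index=data.index)
for category in categories:
    # The comparison is True where the row belongs to this category;
    # astype(int) turns True/False into 1/0
    manual['gender_' + category] = (data['gender'] == category).astype(int)

print(manual)
```

Each row of `manual` contains exactly one 1, in the column that matches that row's category.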

Code:

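import pandas as pd

# Sample data with a single categorical column
data = pd.DataFrame({
    'gender': ['Male', 'Trans', 'Female', 'Female', 'Male', 'Male', 'Trans']
})

# Create one indicator column per unique category
gender_dummies = pd.get_dummies(data['gender'])
gender_dummies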

   Female   Male  Trans
0   False   True  False
1   False  False   True
2    True  False  False
3    True  False  False
4   False   True  False
5   False   True  False
6   False  False   True

# Append the indicator columns to the original DataFrame
data = pd.concat([data, gender_dummies], axis=1)
data

   gender  Female   Male  Trans
0    Male   False   True  False
1   Trans   False  False   True
2  Female    True  False  False
3  Female    True  False  False
4    Male   False   True  False
5    Male   False   True  False
6   Trans   False  False   True
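The output above shows True/False values and column names without the "gender_" prefix, while the steps describe 1/0 values and names such as "gender_Male". One possible way to get that form is to pass the `prefix` and `dtype` parameters of `pd.get_dummies`. This is a sketch, not the only approach, and the variable name `encoded` is illustrative.

```python
import pandas as pd

data = pd.DataFrame({
    'gender': ['Male', 'Trans', 'Female', 'Female', 'Male', 'Male', 'Trans']
})

# prefix adds the original column name to each new column;
# dtype=int stores 1/0 instead of True/False
gender_dummies = pd.get_dummies(data['gender'], prefix='gender', dtype=int)

# Append the indicator columns to the original DataFrame
encoded = pd.concat([data, gender_dummies], axis=1)
print(encoded)
```

Keeping the prefix can be useful when several categorical columns are encoded in the same DataFrame, since it shows which original column each indicator came from.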
