How to Calculate Adjusted R-Squared
Ever wondered how well your regression model truly fits your data, especially when multiple variables come into play? Adjusted R-squared is a metric that goes beyond traditional R-squared to offer deeper insight. But what makes it different from R-squared? This article covers it all.
In the previous article, we discussed how to calculate the R-squared value for a machine learning model. In this article, we will discuss another evaluation metric, adjusted R-squared, and walk through some examples to see why we need it.
But before that, let's have a quick introduction to R-squared.
Table of Contents
- What is R-Squared?
- Why Do We Need Adjusted R-Squared?
- How to Calculate the Adjusted R-Squared?
- Difference Between R-Squared and Adjusted R-Squared
What is R-Squared?
R-squared, also known as the coefficient of determination, describes the proportion of the variance in a dependent variable that is explained by one or more independent variables in a linear regression model.
It is calculated by dividing the explained variation by the total variation, or equivalently as 1 − (Unexplained Variation / Total Variation).
Mathematical Formula of R-Squared
R-Squared = 1 − (SSR / SST)
where,
SSR: Sum of Squared Residuals (the sum of squared errors)
SST: Total Sum of Squares (the sum of squared deviations from the mean)
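To make the formula concrete, here is a minimal sketch (with made-up numbers, purely for illustration) that computes R-squared directly from SSR and SST:
import numpy as np

# Hypothetical observed values and model predictions (made-up numbers)
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.8, 5.3, 6.9, 9.2])

ssr = np.sum((y_true - y_pred) ** 2)         # Sum of Squared Residuals
sst = np.sum((y_true - y_true.mean()) ** 2)  # Total Sum of Squares
r_squared = 1 - ssr / sst
print(r_squared)  # ≈ 0.991, since the predictions track the data closely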
Note:
- The value of R-squared ranges between 0 and 1.
- 0 means that the model doesn’t explain any variation in the dependent variable.
- 1 means that the model explains all the variations.
Limitations of R-Squared
- The value of R-squared increases as more independent variables are added, regardless of whether they are relevant. This can lead to overfitting.
- It is not the best metric for comparing models, especially when the models have a different number of predictors.
- A high value of r-squared doesn’t necessarily mean the model is adequate.
- R-squared is highly sensitive to outliers. A few outliers can significantly decrease the R-squared value, as the sketch below illustrates.
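The sketch below (synthetic data, not the article's dataset) shows how a single extreme outlier can pull R-squared down:
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

np.random.seed(1)
x = np.random.uniform(0, 10, 50).reshape(-1, 1)
y = 2 * x.ravel() + np.random.normal(0, 1, 50)

# Fit on clean data
model = LinearRegression().fit(x, y)
print("R-squared without outlier:", r2_score(y, model.predict(x)))

# Add one extreme outlier, far from the underlying line, and refit
x_out = np.vstack([x, [[5.0]]])
y_out = np.append(y, 100.0)
model_out = LinearRegression().fit(x_out, y_out)
print("R-squared with outlier:", r2_score(y_out, model_out.predict(x_out)))
The second score is typically far lower, because the single bad point inflates the squared residuals much faster than it inflates the total variation.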
Why Do We Need Adjusted R-Squared?
As we mentioned earlier, the value of R-squared increases whenever new variables are added, whether or not they are relevant. To overcome this, the adjusted R-squared metric was introduced, providing a more accurate measure of the model's goodness of fit.
As the name suggests, adjusted R-squared adjusts for the number of predictors in the model, ensuring that only significant predictors enhance its value.
It penalizes the model for the inclusion of irrelevant predictors. This makes it a more robust metric, especially when evaluating models with many predictors.
Adjusted R-Squared Formula
Adjusted R-Squared = 1 − [(1 − R²) (n − 1) / (n − k − 1)]
where,
n: number of data points
k: number of independent variables
R²: R-squared value
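As a quick worked example with hypothetical values, suppose a model with k = 2 predictors is fit to n = 100 data points and achieves R² = 0.85:
Adjusted R-Squared = 1 − [(1 − 0.85) (100 − 1) / (100 − 2 − 1)]
= 1 − [0.15 × 99 / 97]
≈ 1 − 0.1531
≈ 0.8469
The adjusted value is slightly lower than the raw R², reflecting the small penalty for using two predictors.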
Interpretation of Adjusted R-Squared Formula
- If R-squared does not increase significantly when a new independent variable is added, the adjusted R-squared value will decrease.
- If R-squared increases significantly when a new independent variable is added, the adjusted R-squared value will also increase.
Note: It is recommended to use adjusted R-squared when the regression model contains multiple variables, as it allows us to compare models with different numbers of independent variables.
By now, you should have a clear understanding of what adjusted R-squared is, its formula, and why it is needed over R-squared to evaluate the performance of a machine learning model.
How to Calculate the Adjusted R-Squared?
Problem Statement: Create a dataset, build two linear regression models (a simple linear regression model and a multiple regression model), and then calculate the R² and adjusted R² values in both cases.
Solution
Step-1: Create a Sample dataset
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Create a synthetic dataset
np.random.seed(0)
n_samples = 100
StudyHours = np.random.uniform(1, 10, n_samples)
Extracurricular = np.random.randint(0, 5, n_samples)
FinalExamScores = 50 + 3 * StudyHours + 2 * Extracurricular + np.random.normal(0, 5, n_samples)

# Create a DataFrame from the data
data = pd.DataFrame({'StudyHours': StudyHours, 'Extracurricular': Extracurricular, 'FinalExamScores': FinalExamScores})
data.head()
Output
Step-2: Split the data into predictors (X) and target (Y)
# Split the data into predictors (X) and target (y)
X = data[['StudyHours', 'Extracurricular']]
y = data['FinalExamScores']
Step-3: Create a Linear Regression Model with one Predictor
# Create and fit a simple linear regression model with one predictor (StudyHours)
model_simple = LinearRegression()
model_simple.fit(X[['StudyHours']], y)
y_pred_simple = model_simple.predict(X[['StudyHours']])

# Calculate R-squared for the simple model
mse_simple = mean_squared_error(y, y_pred_simple)
r_squared_simple = 1 - (mse_simple / np.var(y))

# Calculate Adjusted R-squared for the simple model
n = len(y)
p_simple = 1  # Number of predictors in the simple model
adjusted_r_squared_simple = 1 - (1 - r_squared_simple) * (n - 1) / (n - p_simple - 1)

# Print R-squared and Adjusted R-squared values for the simple model
print("Simple Model:")
print(f"R-squared: {r_squared_simple:.4f}")
print(f"Adjusted R-squared: {adjusted_r_squared_simple:.4f}\n")
Output
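As an optional sanity check (not part of the original steps), scikit-learn's built-in r2_score should agree with the manual calculation above; this snippet reuses y and y_pred_simple from Step-3:
from sklearn.metrics import r2_score

# Should print the same value as r_squared_simple computed manually above
print(f"r2_score (sklearn): {r2_score(y, y_pred_simple):.4f}")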
Step-4: Create a Linear Regression Model with Two Predictors
# Create and fit a more complex linear regression model with two predictors (StudyHours and Extracurricular)
model_complex = LinearRegression()
model_complex.fit(X, y)
y_pred_complex = model_complex.predict(X)

# Calculate R-squared for the complex model
mse_complex = mean_squared_error(y, y_pred_complex)
r_squared_complex = 1 - (mse_complex / np.var(y))

# Calculate Adjusted R-squared for the complex model
p_complex = 2  # Number of predictors in the complex model
adjusted_r_squared_complex = 1 - (1 - r_squared_complex) * (n - 1) / (n - p_complex - 1)

print("Complex Model:")
print(f"R-squared: {r_squared_complex:.4f}")
print(f"Adjusted R-squared: {adjusted_r_squared_complex:.4f}")
Output
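If you would rather not compute these by hand, the statsmodels library (assumed installed; it is not used elsewhere in this article) reports both metrics directly. A sketch reusing X and y from Step-2:
import statsmodels.api as sm

# OLS with an explicit intercept term; statsmodels exposes rsquared and rsquared_adj
X_const = sm.add_constant(X)
ols_results = sm.OLS(y, X_const).fit()
print(f"R-squared (statsmodels): {ols_results.rsquared:.4f}")
print(f"Adjusted R-squared (statsmodels): {ols_results.rsquared_adj:.4f}")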
Explanation
From the output above, we can see that both the R-squared and adjusted R-squared values increase significantly with the addition of one more variable ("Extracurricular"). This implies that the added variable is genuinely correlated with the target variable.
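To see the penalty work in the opposite direction, here is a sketch (continuing with the variables from the steps above, plus a hypothetical "Noise" column) that adds a purely random predictor. R-squared still inches up, while adjusted R-squared should typically stay flat or drop:
# Add an irrelevant random predictor and refit the model
data['Noise'] = np.random.normal(0, 1, n_samples)
X_noise = data[['StudyHours', 'Extracurricular', 'Noise']]
model_noise = LinearRegression().fit(X_noise, y)
y_pred_noise = model_noise.predict(X_noise)

# R-squared and Adjusted R-squared with three predictors
r_squared_noise = 1 - (mean_squared_error(y, y_pred_noise) / np.var(y))
p_noise = 3  # Number of predictors, now including the irrelevant one
adjusted_r_squared_noise = 1 - (1 - r_squared_noise) * (n - 1) / (n - p_noise - 1)
print(f"R-squared with noise predictor: {r_squared_noise:.4f}")
print(f"Adjusted R-squared with noise predictor: {adjusted_r_squared_noise:.4f}")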
Difference Between R-Squared and Adjusted R-Squared
| Parameter | R-Squared | Adjusted R-Squared |
|---|---|---|
| Definition | Proportion of variance in the dependent variable explained by the independent variable(s). | R-squared adjusted for the number of predictors in the model. |
| Value Range | Between 0 and 1. | Can be negative, but typically between 0 and 1. |
| Response to Adding Predictors | Always increases or remains the same. | Can increase or decrease based on the usefulness of the added predictor. |
| Purpose | Measures overall goodness of fit. | Measures goodness of fit while accounting for model complexity. |
| Calculation | R-Squared = 1 − (SSR / SST) | Adjusted R-Squared = 1 − [(1 − R²) (n − 1) / (n − k − 1)] |
| Best for | Simple linear regression with one predictor. | Multiple regression models with several predictors. |
| Interpretation | Higher value indicates more variance explained by the model. | Higher value indicates a better fit, especially when comparing models with different numbers of predictors. |