R-Squared vs. Adjusted R-Squared

Vikram Singh
Assistant Manager - Content
Updated on Aug 23, 2023 15:01 IST

Adjusted r-squared, like r-squared, measures how much of the variation in the target variable a regression model explains. Unlike r-squared, it adjusts for the number of independent variables in the model and penalizes you for adding features that do not meaningfully improve the prediction of the dependent variable. In this article, we will briefly discuss the difference between r-squared and adjusted r-squared with the help of examples.


When we build a linear regression model, we use r-squared to measure how well the model fits the data. The previous article briefly discusses how to calculate r-squared in linear regression. But r-squared has its limitations.

In the case of multiple linear regression, if you keep adding variables, the value of r-squared will either remain the same or increase, irrespective of the significance of the added variable.
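To see this behaviour concretely, here is a minimal sketch using NumPy on synthetic data. The data, the random seed, and the helper name `r_squared` are assumptions made for illustration and are not part of the original article. The target depends on only one feature, and the remaining columns are pure noise, yet r-squared never goes down as they are added.

```python
import numpy as np

def r_squared(X, y):
    """R-squared of a least squares fit with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])      # prepend intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)   # ordinary least squares
    residuals = y - A @ coef
    ssr = float(residuals @ residuals)             # sum of squared residuals
    sst = float(((y - y.mean()) ** 2).sum())       # total sum of squares
    return 1.0 - ssr / sst

# Synthetic data (illustrative only): y depends on one useful feature
rng = np.random.default_rng(42)
n = 50
x_useful = rng.normal(size=(n, 1))
y = 3.0 * x_useful[:, 0] + rng.normal(size=n)
noise = rng.normal(size=(n, 5))                    # columns unrelated to y

r2_values = []
for extra in range(6):
    # Use the useful feature plus the first `extra` noise columns
    X = np.hstack([x_useful, noise[:, :extra]])
    r2_values.append(r_squared(X, y))
    print(f"{1 + extra} feature(s): R-squared = {r2_values[-1]:.4f}")
```

Each model contains all the columns of the previous one, so the least squares fit can only match or reduce the residual sum of squares, which is why the printed values do not decrease.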

So, how do we deal with this?

Here, adjusted r-squared comes into the picture. It rescales r-squared using the number of independent variables and the number of data points, so it rises only when a new variable improves the model by more than the penalty for adding it. In other words, it penalizes you for adding variables that do not improve the existing model.

So, let’s dive deep to learn more about adjusted r-squared and how it differs from r-squared.

What is adjusted r squared?

Adjusted r-squared is a modified version of r-squared that also measures the proportion of variation in the target variable (or dependent variable) explained by the model. Unlike r-squared, it accounts for the number of independent variables and penalizes you for adding features (independent variables) that contribute little to predicting the dependent variable.
It helps to determine whether adding or removing a feature improves the existing model.

Adjusted r-squared Formula

The formula of adjusted r-squared builds on r-squared, so let’s first check the formula of r-squared:

$$R^2 = 1 - \frac{SSR}{SST}$$

where:

SSR is the sum of squared residuals (i.e., the sum of squared errors)

SST is the total sum of squares (i.e., the sum of squared deviations from the mean)

Now, the mathematical formula of adjusted r-squared is:

$$R^2_{adj} = 1 - \frac{(1 - R^2)(n - 1)}{n - k - 1}$$

where

n: number of data points in the dataset

k: number of independent variables

R2: r-squared value of the model
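The formula above can be sketched as a small Python function. The function name `adjusted_r2` and the sample values (an r-squared of 0.80 on 30 data points) are assumptions chosen for illustration. The loop shows how the penalty grows with k when r-squared is held fixed.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R-squared from R-squared, n data points and k independent variables."""
    if n - k - 1 <= 0:
        # The denominator n - k - 1 must be positive for the formula to apply
        raise ValueError("need n > k + 1")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

# Hold R-squared fixed (illustrative value) and vary the number of variables
for k in (1, 3, 5, 10):
    print(f"k = {k:2d}: adjusted R-squared = {adjusted_r2(0.80, 30, k):.4f}")
```

With r-squared and n fixed, every additional variable lowers the result, so a new variable has to raise r-squared enough to compensate.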

Important Results from the Formula of adjusted r-squared

  1. The value of adjusted r-squared decreases when an added independent variable does not increase r-squared enough to offset the penalty for the extra variable.
  2. The value of adjusted r-squared increases when an added independent variable increases r-squared by more than that penalty.

Note 

  • It is advisable to use adjusted r-squared when fitting a linear regression with multiple independent variables.
  • Adding a non-significant variable to the model increases the difference between r-squared and adjusted r-squared.
  • For a least squares model with an intercept, r-squared on the fitted data can’t be less than zero, but adjusted r-squared can be negative. For example, with R2 = 0.05, n = 10 and k = 3, adjusted r-squared is 1 - (0.95 × 9 / 6) = -0.425.

Calculate r-squared and adjusted r-squared

Example 1: Calculate r-squared and adjusted r-squared manually.

Let’s take the same dataset we used when calculating r-squared in the previous article.

xi yi
11 90
10 45
2 19
8 35
4 25
20 80
1 2
9 3
5 33

For this dataset, we have

SSR = 1681.24 and SST = 6408.89

Now calculating R-squared, 

R2 = 1 – (SSR/SST) = 1 – (1681.24 / 6408.89) = 1-0.26233 = 0.73767

Hence, R2 = 0.73767

Now, suppose we have three independent variables, i.e. k = 3, and here n = 9.

So, adjusted R2 will be

$$R^2_{adj} = 1 - \frac{(1 - 0.73767)(9 - 1)}{9 - 3 - 1} = 1 - \frac{0.26233 \times 8}{5} = 1 - 0.41973 = 0.58027$$

Hence, the value of the adjusted r-squared for the given dataset is approximately 0.580, which is considerably lower than the corresponding r-squared value.
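The same arithmetic can be checked with a few lines of Python. This is only a sketch that plugs in the SSR, SST, n and k values stated above. The variable names are chosen for illustration.

```python
# Values taken from the worked example above
ssr = 1681.24   # sum of squared residuals
sst = 6408.89   # total sum of squares
n = 9           # number of data points
k = 3           # assumed number of independent variables

r2 = 1 - ssr / sst
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)

print(f"R-squared:          {r2:.5f}")
print(f"Adjusted R-squared: {adj_r2:.5f}")
```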

Now that we know how to calculate adjusted r-squared mathematically, in the next example we will take a dataset (the car model dataset) and compare the values of r-squared and adjusted r-squared for different numbers of independent variables.

Example 2: Compare the values of r-squared and adjusted r-squared for different numbers of independent variables.

1. Import the Dataset

 
# Import libraries
import pandas as pd
import numpy as np

# Import the dataset (mtcars.csv must be in the working directory)
mt = pd.read_csv('mtcars.csv')
mt.head()

2. Calculate values for one independent variable

 
# Import statsmodels.api to calculate r-squared and adjusted r-squared
import statsmodels.api as sm

# Take only one feature: mpg
x_opt = mt.iloc[:, 1:2]
y = mt.hp
# Note: sm.OLS does not add an intercept on its own;
# wrap x_opt in sm.add_constant(...) if the model should include one
regressor_OLS = sm.OLS(endog=y, exog=x_opt).fit()
regressor_OLS.summary()

3. Calculate values for two independent variables

 
# Import statsmodels.api to calculate r-squared and adjusted r-squared
import statsmodels.api as sm

# Take two features: mpg and cyl
x_opt = mt.iloc[:, 1:3]
y = mt.hp
regressor_OLS = sm.OLS(endog=y, exog=x_opt).fit()
regressor_OLS.summary()

4. Calculate values for three independent variables

 
# Import statsmodels.api to calculate r-squared and adjusted r-squared
import statsmodels.api as sm

# Take three features: mpg, cyl and disp
x_opt = mt.iloc[:, 1:4]
y = mt.hp
regressor_OLS = sm.OLS(endog=y, exog=x_opt).fit()
regressor_OLS.summary()

Now, combining the results of all three models, we get:

[Table: r-squared and adjusted r-squared for one, two, and three independent variables]

Difference between r-squared and adjusted r-squared

| Parameter | r-squared | adjusted r-squared |
| --- | --- | --- |
| Definition | It measures the proportion of the variation in your dependent variable explained by all of your independent variables in the model. | It measures the same proportion, adjusted for the number of independent variables, so variables that do not really help in explaining the dependent variable lower it. |
| When to use | Simple linear regression. | Both simple and multiple linear regression. |
| Value | Never decreases when independent variables are added. Ranges between 0 and 1. | May increase or decrease depending on the contribution of the added independent variable. Can be negative when r-squared is very close to zero. |
| Formula | R2 = 1 - (SSR / SST) | Adjusted R2 = 1 - [(1 - R2)(n - 1) / (n - k - 1)] |

Conclusion

In this article, we have briefly discussed what adjusted r-squared is, how to calculate it, and how it differs from r-squared.

Hopefully, this article clears up your doubts about r-squared and adjusted r-squared and when to use each.

FAQs

What is Adjusted r-squared?

Adjusted r-squared is a modified version of r-squared that also measures the proportion of variation in the target variable (or dependent variable) explained by the model. Unlike r-squared, it accounts for the number of independent variables and penalizes you for adding features (independent variables) that contribute little to predicting the dependent variable.

What are some of the important results from the formula of adjusted r-squared?

The value of adjusted r-squared decreases when an added independent variable does not increase r-squared enough to offset the penalty for the extra variable, and it increases when the added variable raises r-squared by more than that penalty.

What is r-squared?

It measures the proportion of the variation in your dependent variable explained by all of your independent variables in the model.

When to use r-squared and when to use adjusted r-squared?

r-squared is suitable for simple linear regression, whereas adjusted r-squared can be used for both simple and multiple linear regression.

What is the range of values of both r-squared and adjusted r-squared?

For a least squares model with an intercept, r-squared ranges between 0 and 1 and never decreases when independent variables are added. Adjusted r-squared may increase or decrease depending on the contribution of the added independent variable, and it can be negative when r-squared is very close to zero.

About the Author
Vikram Singh
Assistant Manager - Content

Vikram has a Postgraduate degree in Applied Mathematics, with a keen interest in Data Science and Machine Learning. He has experience of 2+ years in content creation in Mathematics, Statistics, Data Science, and Mac...