Difference Between Linear and Multiple Regression
Linear regression examines the relationship between one predictor and an outcome, while multiple regression delves into how several predictors influence that outcome. Both are essential tools in predictive analytics, but knowing their differences ensures effective and accurate modelling. Dive in to discover the core distinctions and when to use each approach.
Linear Regression vs Multiple Regression
| Parameter | Linear (Simple) Regression | Multiple Regression |
| --- | --- | --- |
| Definition | Models the relationship between one dependent and one independent variable. | Models the relationship between one dependent and two or more independent variables. |
| Equation | Y = C0 + C1X + e | Y = C0 + C1X1 + C2X2 + C3X3 + … + CnXn + e |
| Complexity | Simpler, as it deals with only one relationship. | More complex due to multiple relationships. |
| Use Cases | Suitable when there is one clear predictor. | Suitable when multiple factors affect the outcome. |
| Assumptions | Linearity, Independence, Homoscedasticity, Normality | Same as linear regression, with the added concern of multicollinearity. |
| Visualization | Typically visualized with a 2D scatter plot and a line of best fit. | Requires 3D or multi-dimensional space, often represented using partial regression plots. |
| Risk of Overfitting | Lower, as it deals with only one predictor. | Higher, especially if too many predictors are used without adequate data. |
| Multicollinearity Concern | Not applicable, as there’s only one predictor. | A primary concern; correlated predictors can affect the model’s accuracy and interpretation. |
| Applications | Basic research, simple predictions, understanding a singular relationship. | Complex research, multifactorial predictions, studying interrelated systems. |
What is Linear Regression?
Linear regression is a statistical method used to model the relationship between a dependent variable and one independent variable. It aims to establish a linear relationship between these variables and can be used for both prediction and understanding the nature of the relationship.
Mathematical Equation
The mathematical representation of simple linear regression is:
Y = C0 + C1X + e
where,
- Y: Dependent Variable (target variable)
- X: Independent Variable (input variable)
- C0: Intercept (value of Y when X=0)
- C1: Slope of line
- e: Error term
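To make the equation concrete, here is a minimal sketch (not part of the original walkthrough) that estimates C0 and C1 with the ordinary least squares formulas using NumPy; the data values are made up purely for illustration.

import numpy as np

# Hypothetical data: one predictor X and one outcome Y (illustrative values only)
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 4.3, 6.2, 8.1, 9.9])

# Ordinary least squares estimates of the slope (C1) and intercept (C0)
C1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
C0 = Y.mean() - C1 * X.mean()

# Residuals (the error term e for each observation)
e = Y - (C0 + C1 * X)

print(f"Intercept C0 = {C0:.3f}, Slope C1 = {C1:.3f}")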
Assumptions of Linear Regression
Here are the assumptions that must be satisfied for a linear regression model to be valid; a short sketch of how to check them follows the list.
- Linearity: The relationship between the independent and dependent variables should be linear.
- Independence: Observations should be independent of each other.
- Homoscedasticity: The variance of the errors should be the same across all levels of the independent variables.
- Normality: The dependent variable is normally distributed for a fixed value of the independent variable.
- No Multicollinearity: This concern applies mainly to multiple regression, where the independent variables should not be strongly correlated with each other.
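In practice, these assumptions are usually checked on the residuals of a fitted model. The sketch below is only an illustration on synthetic data: it plots residuals against fitted values to eyeball linearity and homoscedasticity, and runs a Shapiro-Wilk test on the residuals as a rough normality check.

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Illustrative data: replace with your own predictor X and outcome Y
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, 100)
Y = 3 + 2 * X + rng.normal(0, 1, 100)

# Fit a simple linear model and compute residuals
C1, C0 = np.polyfit(X, Y, 1)
residuals = Y - (C0 + C1 * X)

# Linearity / homoscedasticity: residuals should scatter evenly around zero
plt.scatter(C0 + C1 * X, residuals)
plt.axhline(0, color='red')
plt.xlabel('Fitted values')
plt.ylabel('Residuals')
plt.title('Residuals vs Fitted Values')
plt.show()

# Normality of residuals: Shapiro-Wilk test (a large p-value gives no strong evidence against normality)
stat, p_value = stats.shapiro(residuals)
print(f"Shapiro-Wilk p-value: {p_value:.3f}")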
Limitations of Linear Regression
- Outliers: These can significantly impact the slope and intercept of the regression line.
- Non-linearity: Linear regression assumes a linear relationship, and this assumption may not hold for many real-world datasets.
- Correlation ≠ Causation: Just because two variables have a linear relationship doesn’t mean changes in one cause changes in the other.
What is Multiple Regression?
Multiple regression is an extension of simple linear regression. It models the relationship between one dependent variable and two or more independent variables. The primary purpose is to understand how the dependent variable changes as the independent variables change.
Mathematical Equation
The mathematical representation of multiple regression is:
Y = C0 + C1X1 + C2X2 + C3X3 + ….. + CnXn + e
where,
- Y: Dependent Variable (target variable)
- X1, X2, X3, …, Xn: Independent Variables (input variables)
- C0: Intercept (value of Y when all independent variables are 0)
- C1, C2, C3, …, Cn: Coefficients (slopes) of the respective independent variables
- e: Error term
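As a quick illustration (again with made-up data, not part of the original walkthrough), the coefficients C0, C1, …, Cn can be estimated in one shot by solving a least-squares problem on a design matrix whose first column is all ones for the intercept:

import numpy as np

# Hypothetical data: two predictors (X1, X2) and one outcome Y
rng = np.random.default_rng(1)
X1 = rng.uniform(0, 10, 50)
X2 = rng.uniform(0, 5, 50)
Y = 4 + 2.0 * X1 + 0.5 * X2 + rng.normal(0, 1, 50)

# Design matrix with a column of ones for the intercept C0
X = np.column_stack([np.ones_like(X1), X1, X2])

# Least-squares solution: coefficients [C0, C1, C2]
coeffs, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(f"C0 = {coeffs[0]:.2f}, C1 = {coeffs[1]:.2f}, C2 = {coeffs[2]:.2f}")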
Assumptions of Multiple Regression
- Linearity: A linear relationship exists between the dependent and independent variables.
- Independence: Observations are independent of each other.
- No multicollinearity: Independent variables aren’t too highly correlated with each other (a quick way to check this is shown after the list).
- Homoscedasticity: Constant variance of the errors.
- No Autocorrelation: The residuals (errors) are independent.
- Normality: The dependent variable is normally distributed for any fixed value of the independent variables.
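A common way to check the “no multicollinearity” assumption is the variance inflation factor (VIF). The sketch below uses statsmodels on a small synthetic predictor matrix in which X3 is deliberately built from X1, so its VIF should come out high; the column names and data are illustrative only.

import pandas as pd
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Illustrative predictor matrix; replace with your own independent variables
rng = np.random.default_rng(2)
df = pd.DataFrame({
    'X1': rng.normal(size=100),
    'X2': rng.normal(size=100),
})
df['X3'] = df['X1'] * 0.9 + rng.normal(scale=0.1, size=100)  # deliberately correlated with X1

# Add an intercept column, then compute VIF for each predictor
# (values well above roughly 5-10 are often read as problematic multicollinearity)
X = sm.add_constant(df)
vif = pd.Series(
    [variance_inflation_factor(X.values, i) for i in range(1, X.shape[1])],
    index=df.columns,
)
print(vif)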
Limitations of Multiple Regression
- Overfitting: Including too many independent variables can lead to a model that fits the training data too closely.
- Omitted Variable Bias: Leaving out a significant independent variable can bias the coefficients of other variables.
- Endogeneity occurs when an independent variable is correlated with the error term, leading to biased coefficient estimates.
By now, you should have a clear picture of what linear and multiple regression are, their mathematical equations, assumptions, and limitations, as well as how the two differ from each other. Now it’s time for an example that shows how to fit linear and multiple regression models in Python.
Example of Linear and Multiple Regression
Problem Statement: Suppose we have data for a retail company. The company wants to understand how its advertising expenses across various channels (e.g., TV, Radio) impact sales.
- Linear Regression: Predict sales using only TV advertising expenses.
- Multiple Regression: Predict sales using both TV and Radio advertising expenses.
Step-1: Generate a random dataset
import numpy as np
import pandas as pd

# Sample data generation
np.random.seed(0)
tv = 100 + 50 * np.random.rand(100)
radio = 50 + 25 * np.random.rand(100)
sales = 200 + 3 * tv + 1.5 * radio + 30 * np.random.randn(100)

data = pd.DataFrame({'TV': tv, 'Radio': radio, 'Sales': sales})

# Show the first five rows
data.head()
Output
Step-2: Split the dataset into training and test dataset
# Split the data into training and testing sets
from sklearn.model_selection import train_test_split

train, test = train_test_split(data, test_size=0.2, random_state=0)
Step-3: Evaluating Root Mean Squared Error (RMSE) for Linear Regression
# Linear Regression
# Using only TV expenses for prediction
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

X_train_tv = train[['TV']]
y_train = train['Sales']
X_test_tv = test[['TV']]
y_test = test['Sales']

linear_model = LinearRegression().fit(X_train_tv, y_train)
linear_pred = linear_model.predict(X_test_tv)

# Evaluation
linear_rmse = np.sqrt(mean_squared_error(y_test, linear_pred))
Step-4: Evaluating Root Mean Squared Error (RMSE) for Multiple Regression
# Multiple Regression
# Using both TV and Radio expenses for prediction
X_train_multi = train[['TV', 'Radio']]
X_test_multi = test[['TV', 'Radio']]

multiple_model = LinearRegression().fit(X_train_multi, y_train)
multiple_pred = multiple_model.predict(X_test_multi)

# Evaluation
multiple_rmse = np.sqrt(mean_squared_error(y_test, multiple_pred))
Step-5: Print the results
# Error Metrics
print(f"Linear Regression RMSE: {linear_rmse:.2f}")
print(f"Multiple Regression RMSE: {multiple_rmse:.2f}")
Output
Linear Regression RMSE: 27.18
Multiple Regression RMSE: 25.27
Explanation
From the above results, the RMSE for linear regression is greater than the RMSE for multiple regression, which implies that multiple regression gives a better fit to this data.
Typically, adding more relevant predictors (features) can enhance a model’s performance, but you must be cautious about overfitting. Also, if the features are correlated with each other, it can introduce multicollinearity.
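As a quick sanity check on the example above (an illustrative addition, not part of the original walkthrough), you can inspect the correlation between the two predictors:

# Correlation between the two predictors from the example above
print(data[['TV', 'Radio']].corr())

# In this synthetic dataset TV and Radio are generated independently,
# so their correlation should be close to zero and multicollinearity is not a concern here.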
Now, let’s see what the plots of linear and multiple regression look like:
Linear Regression
import matplotlib.pyplot as plt

# For Linear Regression
plt.scatter(X_test_tv, y_test, color='blue', label='True values')
plt.scatter(X_test_tv, linear_pred, color='red', label='Predicted values')
plt.xlabel('TV Expenses')
plt.ylabel('Sales')
plt.title('Linear Regression: TV vs Sales')
plt.legend()
plt.show()
Output
Multiple Regression
from mpl_toolkits.mplot3d import Axes3D

# Setting up the 3D plot
fig = plt.figure(figsize=(10, 7))
ax = fig.add_subplot(111, projection='3d')

# Scatter plot of actual data
ax.scatter(train['TV'], train['Radio'], train['Sales'], color='blue', marker='o', alpha=0.5, label='True values')

# Creating a meshgrid for the plane
x_surf = np.linspace(train['TV'].min(), train['TV'].max(), 100)
y_surf = np.linspace(train['Radio'].min(), train['Radio'].max(), 100)
x_surf, y_surf = np.meshgrid(x_surf, y_surf)

# Predicting the values from the meshed grid
vals = pd.DataFrame({'TV': x_surf.ravel(), 'Radio': y_surf.ravel()})
predicted_sales = multiple_model.predict(vals)
ax.plot_surface(x_surf, y_surf, predicted_sales.reshape(x_surf.shape), color='None', alpha=0.3)

# Labeling the axes
ax.set_xlabel('TV Expenses')
ax.set_ylabel('Radio Expenses')
ax.set_zlabel('Sales')
ax.set_title('Multiple Regression: Sales predicted by TV and Radio Expenses')
ax.legend()

plt.show()
Output