Difference between Linear and Nonlinear Regression
This article covers the difference between linear and nonlinear regression algorithms.
As a Data Scientist, you’re very likely to perform regression analysis while building predictive models based on numerical data. Regression helps you determine the relationships between data features, and a clear understanding of those relationships allows you to choose the model that delivers the best possible solution.
Regression algorithms fall under the umbrella of Supervised Learning Algorithms that use labeled data (aka training datasets) to train models to predict outcomes as accurately as possible. In regression, all such models will have the same basic form, i.e., y = f(x).
Regression algorithms can be divided into linear and non-linear types. In this article, we will look at the application of both and explore the difference between linear and nonlinear regression.
We will cover the following sections:
- Defining Regression Problems
- Overview of Linear Regression
- Linear Regression Model Demonstrated
- Overview of Non-linear Regression
- Non-linear Regression Model Demonstrated
- Endnotes
Defining Regression Problems
Regression techniques predict continuous numerical outcome variables based on the independent variable(s). For example, predicting the outside temperature in degrees Celsius based on the data recorded for previous days is a regression problem.
A few more examples of regression problems:
- Price of a liter of petrol
- Value of a stock
- The popularity of a newly released album
- Sales revenue generated by a business
Overview of Linear Regression
Regression algorithms attempt to approximate a mapping function ‘f‘ based on the existing input data such that when new data ‘x‘ is fed to the model, the numerical or continuous output ‘y‘ can be predicted as accurately as possible.
When dealing with linear regression problems, our goal is to find the best-fit line for our data such that the equation y = f(x) becomes linear:

y = mx + c

This line is called the regression line, and its slope ‘m‘ and intercept ‘c‘ are called the regression coefficients. You can learn more about linear regression models in machine learning here.
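For instance, the regression coefficients can be computed directly from the data with the standard least-squares formulas. Here is a minimal sketch using hypothetical toy values for x and y:

import numpy as np

#Hypothetical toy data roughly following y = 2x + 1
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 9.0, 10.8])

#Least-squares estimates: m = cov(x, y) / var(x), c = mean(y) - m * mean(x)
m = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
c = y.mean() - m * x.mean()
print("slope m = %.3f, intercept c = %.3f" % (m, c))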
Linear Regression Model Demonstrated
Let’s take an example scenario – You have been provided with the data on China’s GDP. We are going to predict the GDP value (y) based on the years (x). You can find the dataset used in this example here.
Let’s display our dataset:
import pandas as pd

#Read the dataset
data = pd.read_csv('china_gdp.csv')

#Display the first five rows
print(data.head())
As you can see, the GDP values look like either logistic or exponential functions. Let’s visualize the given data through a scatter plot for easier interpretation:
X = data['Year']
y = data['Value']

import matplotlib.pyplot as plt
%matplotlib inline

plt.figure(figsize=(8,5))
plt.scatter(X, y)
plt.ylabel('GDP Value')
plt.xlabel('Year')
plt.title('GDP of China')
Plot a regression line
We can see that the data is not following a linear trend, right? Let’s plot a regression line through this graph:
import numpy as np

m, b = np.polyfit(X, y, 1)
plt.plot(X, m*X + b, c='r')
plt.show()
From the above scatter plot, it is clearly visible that the linear regression line is not doing justice to our dataset. In scenarios like these, where the data shows a curvy trend, linear regression will not produce accurate results.
Let’s anyway build a linear regression model and get the prediction values:
from sklearn.model_selection import train_test_split

#Divide the dataset into independent and dependent variables
X = data.iloc[:,:-1]
y = data.iloc[:,-1]

#Split the data into 80% training and 20% testing sets
train_X, test_X, train_y, test_y = train_test_split(X, y, test_size=0.2, shuffle=True)

train_y = train_y.values.reshape(-1,1)
test_y = test_y.values.reshape(-1,1)

from sklearn.linear_model import LinearRegression
regr = LinearRegression()
regr.fit(train_X, train_y)

y_pred = regr.predict(test_X)
print('Predictions for test data:', y_pred)
Evaluate the model:
#Evaluate the linear regression model
from sklearn.metrics import r2_score, mean_absolute_error
print("r2 score:", r2_score(test_y, y_pred))
print("mean absolute error:", mean_absolute_error(test_y, y_pred))
Now, let’s try properly fitting the regression line (curve, actually) to our non-linear data, shall we?
Overview of Non-linear Regression
Non-linear Regression algorithms, as their name suggests, model a non-linear relationship between the dependent (outcome) and independent (predictor) variable(s). They are generally used for predicting growth rates over a period of time.
Essentially, any relationship that is not linear can be termed non-linear and is usually represented by a polynomial of degree ‘k‘ (the maximum power of x). For example, for a degree of 3:

y = a + bx + cx² + dx³

where a, b, c, and d are the model’s coefficients or parameters.
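To see this in action, here is a minimal sketch of fitting a degree-3 polynomial with NumPy’s polyfit; the toy arrays below are hypothetical and not taken from the GDP dataset:

import numpy as np

#Hypothetical data following a cubic trend with some noise
x = np.linspace(-3, 3, 30)
y = 1 + 2*x - 0.5*x**2 + 0.8*x**3 + np.random.normal(0, 1, x.size)

#Fit a degree-3 polynomial; coefficients are returned highest power first (d, c, b, a)
coeffs = np.polyfit(x, y, 3)
print("d, c, b, a =", coeffs)

#Evaluate the fitted polynomial on the same points
y_fit = np.polyval(coeffs, x)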
Non-linear regression modeling is more complicated than linear regression modeling because the mapping function ‘f‘ (a sigmoid function in our example below) is estimated through a series of approximations or iterations.
When f(x) is non-linear, it could involve any of the following (each is sketched in code after this list):
- Exponential functions: y = a·e^(bx)
- Logarithmic functions: y = a + b·ln(x)
- Quadratic functions: y = ax² + bx + c
- Sigmoid/Logistic functions: y = 1 / (1 + e^(−b(x−c)))
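As a quick sketch, each of these families can be written directly in NumPy; the coefficient values below are arbitrary placeholders rather than fitted values:

import numpy as np

x = np.linspace(0.1, 5, 50) #start above 0 so the logarithm is defined
a, b, c = 2.0, 0.8, 1.5 #arbitrary example coefficients

y_exp = a * np.exp(b * x) #exponential
y_log = a + b * np.log(x) #logarithmic
y_quad = a * x**2 + b * x + c #quadratic
y_sig = 1.0 / (1.0 + np.exp(-b * (x - c))) #sigmoid/logistic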
Non-linear Regression Model Demonstrated
Now let’s come back to our example – from an initial look at the above plot, we can see that the GDP growth is slow during the initial years and suddenly increases after the year 1990.
So, we can determine that the logistic function could be a good approximation as illustrated below:
X = np.arange(-5, 5.0, 0.1)
Y = 1.0 / (1.0 + np.exp(-X))

plt.figure(figsize=(8,5))
plt.plot(X, Y)
plt.ylabel('Dependent Variable')
plt.xlabel('Independent Variable')
plt.show()
Now, let’s build our non-linear regression model:
#Extract the predictor and outcome arrays from the dataset
x_data, y_data = data['Year'].values, data['Value'].values

#Build the model
def sigmoid(x, Beta_1, Beta_2):
    y = 1 / (1 + np.exp(-Beta_1*(x-Beta_2)))
    return y

#Fit a sample sigmoid line to the data
beta_1 = 0.10
beta_2 = 1990.0

#logistic function
Y_pred = sigmoid(x_data, beta_1, beta_2)

plt.figure(figsize=(8,5))

#plot initial prediction against datapoints
plt.plot(x_data, Y_pred*15000000000000.)
plt.plot(x_data, y_data, 'ro')
Let’s normalize the data and find the best parameters for our model.
The curve_fit() function uses non-linear least squares to fit our sigmoid function to the data:
#Normalize the data
xdata = x_data/max(x_data)
ydata = y_data/max(y_data)

#Find the best parameters to fit the line
from scipy.optimize import curve_fit
popt, pcov = curve_fit(sigmoid, xdata, ydata)

#Print the final parameters
print(" beta_1 = %f, beta_2 = %f" % (popt[0], popt[1]))
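As a side note, the covariance matrix pcov returned by curve_fit can also be used to estimate the uncertainty of the fitted parameters, for example:

#One-standard-deviation errors of beta_1 and beta_2,
#taken from the diagonal of the parameter covariance matrix
perr = np.sqrt(np.diag(pcov))
print("std errors: beta_1 = %f, beta_2 = %f" % (perr[0], perr[1]))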
Let’s plot our model:
#Plot the regression model
x = np.linspace(1960, 2015, 55)
x = x/max(x)

plt.figure(figsize=(8,5))
y = sigmoid(x, *popt)
plt.plot(xdata, ydata, 'ro', label='data points')
plt.plot(x, y, linewidth=3.0, label='fitted line')
plt.legend(loc='best')
plt.ylabel('GDP')
plt.xlabel('Year')
plt.show()
As we can see, our sigmoid curve now fits the data well.
Let’s build our model accordingly and predict the outcome variable:
#Split the data into training and testing sets
msk = np.random.rand(len(data)) < 0.8
X_train = xdata[msk]
X_test = xdata[~msk]
y_train = ydata[msk]
y_test = ydata[~msk]

#Build the model using the training set
popt, pcov = curve_fit(sigmoid, X_train, y_train)

#Predict using the testing set
y_pred2 = sigmoid(X_test, *popt)
print('Predictions for test data:', y_pred2)
Evaluate the model:
#Evaluate the non-linear regression model
from sklearn.metrics import r2_score
print("r2 score: %.2f" % r2_score(y_test, y_pred2))
print("mean absolute error: %.2f" % np.mean(np.absolute(y_pred2 - y_test)))
From the above output, we can conclude that the R-squared value has increased to 0.95, with a much smaller mean absolute error.
Endnotes
Regression algorithms are instrumental in solving Machine Learning problems. Knowing which of the different regression techniques to apply to your dataset will help you attain good accuracy with a minimal error rate. We hope this article on the difference between linear and nonlinear regression helped you understand the concepts better. Artificial Intelligence & Machine Learning is a rapidly growing domain that has hugely impacted big businesses worldwide. Interested in being a part of this frenzy? Explore related articles here.