Difference between Linear and Nonlinear Regression – Shiksha Online

This article covers the difference between linear and nonlinear regression algorithms.


As a Data Scientist, you’re very likely to perform regression analysis while building predictive models based on numerical data. Regression helps you determine the relationships between data features, and a clear understanding of those relationships allows you to choose the model that delivers the best possible solution.

Regression algorithms fall under the umbrella of Supervised Learning Algorithms that use labeled data (aka training datasets) to train models to predict outcomes as accurately as possible. In regression, all such models will have the same basic form, i.e., y = f(x).

Regression algorithms can be divided into linear and non-linear types. In this article, we will understand the application of both and explore the difference between linear and nonlinear regression.

We will cover the following sections:

  • Defining Regression Problems
  • Overview of Linear Regression
  • Linear Regression Model Demonstrated
  • Overview of Non-linear Regression
  • Non-linear Regression Model Demonstrated
  • Endnotes

Defining Regression Problems

Regression techniques predict continuous numerical outcome variables based on the independent variable(s). For example, temperature prediction can be a type of regression problem – predicting the outside temperature in degrees Celsius based on the data recorded for previous days.


A few more examples of regression problems:

  • Price of a liter of petrol
  • Value of a stock
  • The popularity of a newly released album
  • Sales revenue generated by a business

Overview of Linear Regression

Regression algorithms attempt to approximate a mapping function f based on the existing input data such that, when new data x is fed to the model, the numerical or continuous output y can be predicted as accurately as possible.


When dealing with linear regression problems, our goal is to find the best-fit line for our data such that the equation y = f(x) becomes linear:

y = f(x) = mx + c

[Figure: a regression line fitted through the scattered data points]

This line, as you can see above, is called the regression line, with slope m and intercept c, which are called the regression coefficients. You can learn more about linear regression models in machine learning here.
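For intuition, here’s a minimal sketch (with illustrative values, not our dataset) of how the least-squares slope m and intercept c can be computed directly with NumPy:

import numpy as np

# Illustrative data points (hypothetical values)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Closed-form least-squares estimates:
# m = cov(x, y) / var(x),  c = mean(y) - m * mean(x)
m = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
c = y.mean() - m * x.mean()
print("slope m = %.3f, intercept c = %.3f" % (m, c))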

Linear Regression Model Demonstrated

Let’s take an example scenario: you have been provided with data on China’s GDP, and we are going to predict the GDP value (y) based on the year (x). You can find the dataset used in this example here.

Let’s display our dataset:


 
import pandas as pd

# Read the dataset
data = pd.read_csv('china_gdp.csv')
# Display the first five rows
print(data.head())

As you can see, the GDP values look like they follow either a logistic or an exponential function. Let’s visualize the given data through a scatter plot for easier interpretation:


 
import matplotlib.pyplot as plt
%matplotlib inline

X = data['Year']
y = data['Value']

plt.figure(figsize=(8,5))
plt.scatter(X, y)
plt.ylabel('GDP Value')
plt.xlabel('Year')
plt.title('GDP of China')
plt.show()

Plot a regression line

We can see that the data is not following a linear trend, right? Let’s plot a regression line through this graph:


 
import numpy as np

# Fit a straight line (degree-1 polynomial) through the data
m, b = np.polyfit(X, y, 1)
plt.scatter(X, y)
plt.plot(X, m*X + b, c='r')
plt.show()

From the above scatter plot, it is clearly visible that the linear regression line is not doing justice to our dataset. For scenarios like these, where the data shows a curvy trend, linear regression will not produce accurate results.

Let’s anyway build a linear regression model and get the prediction values:


 
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Divide the dataset into independent and dependent variables
X = data.iloc[:, :-1]
y = data.iloc[:, -1]

# Split the data into 80% training and 20% testing sets
train_X, test_X, train_y, test_y = train_test_split(X, y, test_size=0.2, shuffle=True)
train_y = train_y.values.reshape(-1, 1)
test_y = test_y.values.reshape(-1, 1)

# Build the model and predict on the test set
regr = LinearRegression()
regr.fit(train_X, train_y)
y_pred = regr.predict(test_X)
print('Predictions for test data:', y_pred)
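As a quick usage check, you could also ask the fitted line for a prediction at a specific year (a sketch assuming the regr model and the 'Year' column from above; keep in mind that extrapolating this curvy data with a straight line is unreliable):

# Predict the GDP value for an illustrative year with the fitted line
future = pd.DataFrame({'Year': [2020]})
print('Predicted GDP for 2020:', regr.predict(future))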

Evaluate the model:


 
# Evaluate the linear regression model
from sklearn.metrics import r2_score, mean_absolute_error
print("r2 score:", r2_score(test_y, y_pred))
print("mean absolute error:", mean_absolute_error(test_y, y_pred))
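If you’re wondering what the r2 score measures, here is a minimal sketch (a hypothetical helper, not library code) computing it by hand as one minus the ratio of the residual sum of squares to the total sum of squares:

import numpy as np

def r2_manual(y_true, y_pred):
    # R^2 = 1 - SS_res / SS_tot
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1 - ss_res / ss_tot

print("r2 score (manual):", r2_manual(test_y, y_pred))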

Now, let’s try properly fitting the regression line (curve, actually) to our non-linear data, shall we?

Overview of Non-linear Regression

Non-linear Regression algorithms, as their name suggests, model a non-linear relationship between the dependent (outcome) and independent (predictor) variable(s). They are generally used for predicting growth rates over a period of time.

Essentially, any relationship that is not linear can be termed non-linear and is usually represented by a polynomial of degree k (the maximum power of x). For a third-degree polynomial, for example:

y = a + bx + cx² + dx³

Where a, b, c, and d are the model’s coefficients or parameters.
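As a quick sketch, such a polynomial can be fitted with NumPy’s polyfit (shown here on illustrative data, not the GDP dataset):

import numpy as np

# Illustrative data following a cubic trend plus noise (hypothetical values)
x = np.linspace(-3, 3, 50)
y = 1 + 2*x - 0.5*x**2 + 0.3*x**3 + np.random.normal(0, 0.5, x.shape)

# Fit a degree-3 polynomial; coefficients come back highest power first (d, c, b, a)
d, c, b, a = np.polyfit(x, y, 3)
print("a = %.2f, b = %.2f, c = %.2f, d = %.2f" % (a, b, c, d))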

Non-linear regression modeling is more complicated than linear regression modeling because the mapping function f (a sigmoid function in our example below) is estimated through a series of approximations, or iterations.

When f(x) is non-linear, it could involve:

  • Exponential functions: y = a·e^(bx)
  • Logarithmic functions: y = a + b·log(x)
  • Quadratic functions: y = ax² + bx + c
  • Sigmoid/Logistic functions: y = 1 / (1 + e^(-x))
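Each of these families can be expressed as a Python function and handed to a curve-fitting routine; here is a minimal sketch (the parameterizations below are illustrative, not the only possible forms):

import numpy as np

# Candidate non-linear function families
def exponential(x, a, b):
    return a * np.exp(b * x)

def logarithmic(x, a, b):
    return a + b * np.log(x)

def quadratic(x, a, b, c):
    return a * x**2 + b * x + c

def logistic(x, a, b):
    # a controls the steepness, b the midpoint of the curve
    return 1.0 / (1.0 + np.exp(-a * (x - b)))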

Non-linear Regression Model Demonstrated

Now let’s come back to our example – from an initial look at the above plot, we can see that the GDP growth is slow during the initial years and suddenly increases after the year 1990. 

So, we can determine that the logistic function could be a good approximation as illustrated below:


 
X = np.arange(-5, 5.0, 0.1)
Y = 1.0 / (1.0 + np.exp(-X))
plt.figure(figsize=(8,5))
plt.plot(X, Y)
plt.ylabel('Dependent Variable')
plt.xlabel('Independent Variable')
plt.show()

Now, let’s build our non-linear regression model:


 
# Extract NumPy arrays for the predictor and outcome
x_data, y_data = data['Year'].values, data['Value'].values

# Build the model
def sigmoid(x, Beta_1, Beta_2):
    y = 1 / (1 + np.exp(-Beta_1 * (x - Beta_2)))
    return y

# Fit a sample sigmoid (logistic) curve to the data
beta_1 = 0.10
beta_2 = 1990.0
Y_pred = sigmoid(x_data, beta_1, beta_2)

# Plot the initial prediction against the data points
plt.figure(figsize=(8,5))
plt.plot(x_data, Y_pred * 15000000000000.)
plt.plot(x_data, y_data, 'ro')
plt.show()

Let’s normalize the data and find the best parameters for our model.

The curve_fit() function uses non-linear least squares to fit our sigmoid function to the data:


 
# Normalize the data
xdata = x_data / max(x_data)
ydata = y_data / max(y_data)

# Find the best parameters to fit the curve
from scipy.optimize import curve_fit
popt, pcov = curve_fit(sigmoid, xdata, ydata)

# Print the final parameters
print("beta_1 = %f, beta_2 = %f" % (popt[0], popt[1]))
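curve_fit also returns pcov, the estimated covariance matrix of the fitted parameters; taking the square root of its diagonal gives one-standard-deviation error estimates, a quick sanity check on the fit:

# One-standard-deviation uncertainties for beta_1 and beta_2
perr = np.sqrt(np.diag(pcov))
print("parameter std errors:", perr)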

Let’s plot our model:


 
#Plot the regression model
x = np.linspace(1960, 2015, 55)
x = x/max(x)
plt.figure(figsize=(8,5))
y = sigmoid(x, *popt)
plt.plot(xdata, ydata, 'ro', label='data points')
plt.plot(x,y, linewidth=3.0, label='fitted line')
plt.legend(loc='best')
plt.ylabel('GDP')
plt.xlabel('Year')
plt.show()

As we can see, our sigmoid curve now fits the data well.

Let’s build our model accordingly and predict the outcome variable:


 
#Split the data into training and testing set
msk = np.random.rand(len(data)) < 0.8
X_train = xdata[msk]
X_test = xdata[~msk]
y_train = ydata[msk]
y_test = ydata[~msk]
#Build the model using training set
popt, pcov = curve_fit(sigmoid, X_train, y_train)
#Predict using testing set
y_pred2 = sigmoid(X_test, *popt)
print('Predictions for test data:', y_pred2)

Evaluate the model:


 
# Evaluate the non-linear regression model
from sklearn.metrics import r2_score
print("r2 score: %.2f" % r2_score(y_test, y_pred2))
print("mean absolute error: %.2f" % np.mean(np.absolute(y_pred2 - y_test)))
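Note that these predictions are on the normalized scale. Here is a short sketch of converting them back to actual GDP values, simply by undoing the max-normalization applied earlier:

# Undo the max-normalization to recover GDP in original units
gdp_pred = y_pred2 * max(y_data)
print('Predicted GDP (original scale):', gdp_pred)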

From the above output, we can clearly see that the overall R² score has increased to 0.95 and the mean absolute error has been minimized.

Endnotes

Regression algorithms are instrumental in solving Machine Learning problems. Knowing the different varieties of regression techniques to implement on your dataset will help you attain good accuracy with a minimum error rate. We hope this article on the difference between linear and nonlinear regression helped you understand the concepts better. Artificial Intelligence & Machine Learning is a rapidly growing domain that has hugely impacted big businesses worldwide. Interested in being a part of this frenzy? Explore related articles here.


