Difference between Linear and Nonlinear Regression
This article covers the difference between linear and nonlinear regression algorithms.
As a Data Scientist, you’re very likely to perform regression analysis while building predictive models based on numerical data. Regression helps you determine the relationships between data features, and a clear understanding of those relationships allows you to choose the model that delivers the best possible solution.
Regression algorithms fall under the umbrella of Supervised Learning Algorithms that use labeled data (aka training datasets) to train models to predict outcomes as accurately as possible. In regression, all such models will have the same basic form, i.e., y = f(x).
Regression algorithms can be divided into linear and non-linear types. In this article, we will look at the application of both and explore the difference between linear and nonlinear regression.
We will cover the following sections:
- Defining Regression Problems
- Overview of Linear Regression
- Linear Regression Model Demonstrated
- Overview of Non-linear Regression
- Non-linear Regression Model Demonstrated
- Endnotes
Defining Regression Problems
Regression techniques predict continuous numerical outcome variables based on the independent variable(s). For example, predicting the outside temperature in degrees Celsius based on the data recorded for previous days is a regression problem.
A few more examples of regression problems:
- Price of a liter of petrol
- Value of a stock
- The popularity of a newly released album
- Sales revenue generated by a business
Overview of Linear Regression
Regression algorithms attempt to approximate a mapping function ‘f‘ based on the existing input data such that when new data ‘x‘ is fed to the model, the numerical or continuous output ‘y‘ can be predicted as accurately as possible.
When dealing with linear regression problems, our goal is to find the best-fit line for our data such that the equation y = f(x) becomes linear:

y = mx + c

This line is called the regression line, and its slope ‘m‘ and intercept ‘c‘ are called the regression coefficients. You can learn more about linear regression models in machine learning here.
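For instance, the regression coefficients can be computed directly from the data with the standard least-squares formulas. Here is a minimal sketch using hypothetical toy values for x and y:

import numpy as np

#Hypothetical toy data roughly following y = 2x + 1
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 9.0, 10.8])

#Least-squares estimates: m = cov(x, y) / var(x), c = mean(y) - m * mean(x)
m = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
c = y.mean() - m * x.mean()
print("slope m = %.3f, intercept c = %.3f" % (m, c))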
Linear Regression Model Demonstrated
Let’s take an example scenario – You have been provided with the data on China’s GDP. We are going to predict the GDP value (y) based on the years (x). You can find the dataset used in this example here.
Let’s display our dataset:
import pandas as pd

#Read the dataset
data = pd.read_csv('china_gdp.csv')

#Display the first five rows
print(data.head())
As you can see, the GDP values look like either logistic or exponential functions. Let’s visualize the given data through a scatter plot for easier interpretation:
X = data['Year']
y = data['Value']

import matplotlib.pyplot as plt
%matplotlib inline

plt.figure(figsize=(8,5))
plt.scatter(X, y)
plt.ylabel('GDP Value')
plt.xlabel('Year')
plt.title('GDP of China')
Plot a regression line
We can see that the data is not following a linear trend, right? Let’s plot a regression line through this graph:
import numpy as np

m, b = np.polyfit(X, y, 1)
plt.plot(X, m*X + b, c='r')
plt.show()
From the above scatter plot, it is clearly visible that the linear regression line is not doing justice to our dataset. In scenarios like these, where the data shows a curvy trend, linear regression will not produce accurate results.
Let’s anyway build a linear regression model and get the prediction values:
from sklearn.model_selection import train_test_split

#Divide the dataset into independent and dependent variables
X = data.iloc[:,:-1]
y = data.iloc[:,-1]

#Split the data into 80% training and 20% testing sets
train_X, test_X, train_y, test_y = train_test_split(X, y, test_size=0.2, shuffle=True)

train_y = train_y.values.reshape(-1,1)
test_y = test_y.values.reshape(-1,1)

from sklearn.linear_model import LinearRegression
regr = LinearRegression()
regr.fit(train_X, train_y)

y_pred = regr.predict(test_X)
print('Predictions for test data:', y_pred)
Evaluate the model:
#Evaluate the linear regression model
from sklearn.metrics import r2_score, mean_absolute_error
print("r2 score:", r2_score(test_y, y_pred))
print("mean absolute error:", mean_absolute_error(test_y, y_pred))
Now, let’s try properly fitting the regression line (curve, actually) to our non-linear data, shall we?
Overview of Non-linear Regression
Non-linear Regression algorithms, as their name suggests, model a non-linear relationship between the dependent (outcome) and independent (predictor) variable(s). They are generally used for predicting growth rates over a period of time.
Essentially, any relationship that is not linear can be termed non-linear and is usually represented by a polynomial of degree ‘k‘ (the maximum power of x). For example, for a degree of 3:

y = a + bx + cx² + dx³

where a, b, c, and d are the model’s coefficients or parameters.
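To see this in action, here is a minimal sketch of fitting a degree-3 polynomial with NumPy’s polyfit; the toy arrays below are hypothetical and not taken from the GDP dataset:

import numpy as np

#Hypothetical data following a cubic trend with some noise
x = np.linspace(-3, 3, 30)
y = 1 + 2*x - 0.5*x**2 + 0.8*x**3 + np.random.normal(0, 1, x.size)

#Fit a degree-3 polynomial; coefficients are returned highest power first (d, c, b, a)
coeffs = np.polyfit(x, y, 3)
print("d, c, b, a =", coeffs)

#Evaluate the fitted polynomial on the same points
y_fit = np.polyval(coeffs, x)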
Non-linear regression modeling is more complicated than linear regression modeling because the mapping function ‘f‘ (a sigmoid function in our example below) is estimated through a series of approximations or iterations.
When f(x) is non-linear, it could involve any of the following (each is sketched in code after this list):
- Exponential functions: y = a·e^(bx)
- Logarithmic functions: y = a + b·ln(x)
- Quadratic functions: y = ax² + bx + c
- Sigmoid/Logistic functions: y = 1 / (1 + e^(−b(x−c)))
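As a quick sketch, each of these families can be written directly in NumPy; the coefficient values below are arbitrary placeholders rather than fitted values:

import numpy as np

x = np.linspace(0.1, 5, 50) #start above 0 so the logarithm is defined
a, b, c = 2.0, 0.8, 1.5 #arbitrary example coefficients

y_exp = a * np.exp(b * x) #exponential
y_log = a + b * np.log(x) #logarithmic
y_quad = a * x**2 + b * x + c #quadratic
y_sig = 1.0 / (1.0 + np.exp(-b * (x - c))) #sigmoid/logistic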
Non-linear Regression Model Demonstrated
Now let’s come back to our example – from an initial look at the above plot, we can see that the GDP growth is slow during the initial years and suddenly increases after the year 1990.
So, we can determine that the logistic function could be a good approximation as illustrated below:
X = np.arange(-5, 5.0, 0.1)
Y = 1.0 / (1.0 + np.exp(-X))

plt.figure(figsize=(8,5))
plt.plot(X, Y)
plt.ylabel('Dependent Variable')
plt.xlabel('Independent Variable')
plt.show()
Now, let’s build our non-linear regression model:
#Extract the predictor and outcome arrays from the dataset
x_data, y_data = data['Year'].values, data['Value'].values

#Build the model
def sigmoid(x, Beta_1, Beta_2):
    y = 1 / (1 + np.exp(-Beta_1*(x-Beta_2)))
    return y

#Fit a sample sigmoid line to the data
beta_1 = 0.10
beta_2 = 1990.0

#logistic function
Y_pred = sigmoid(x_data, beta_1, beta_2)

plt.figure(figsize=(8,5))

#plot initial prediction against datapoints
plt.plot(x_data, Y_pred*15000000000000.)
plt.plot(x_data, y_data, 'ro')
Let’s normalize the data and find the best parameters for our model.
The curve_fit() function uses non-linear least squares to fit our sigmoid function to the data:
#Normalize the data
xdata = x_data/max(x_data)
ydata = y_data/max(y_data)

#Find the best parameters to fit the line
from scipy.optimize import curve_fit
popt, pcov = curve_fit(sigmoid, xdata, ydata)

#Print the final parameters
print(" beta_1 = %f, beta_2 = %f" % (popt[0], popt[1]))
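As a side note, the covariance matrix pcov returned by curve_fit can also be used to estimate the uncertainty of the fitted parameters, for example:

#One-standard-deviation errors of beta_1 and beta_2,
#taken from the diagonal of the parameter covariance matrix
perr = np.sqrt(np.diag(pcov))
print("std errors: beta_1 = %f, beta_2 = %f" % (perr[0], perr[1]))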
Let’s plot our model:
#Plot the regression model
x = np.linspace(1960, 2015, 55)
x = x/max(x)

plt.figure(figsize=(8,5))
y = sigmoid(x, *popt)
plt.plot(xdata, ydata, 'ro', label='data points')
plt.plot(x, y, linewidth=3.0, label='fitted line')
plt.legend(loc='best')
plt.ylabel('GDP')
plt.xlabel('Year')
plt.show()
As we can see, our sigmoid curve now fits the data well.
Let’s build our model accordingly and predict the outcome variable:
#Split the data into training and testing sets
msk = np.random.rand(len(data)) < 0.8
X_train = xdata[msk]
X_test = xdata[~msk]
y_train = ydata[msk]
y_test = ydata[~msk]

#Build the model using the training set
popt, pcov = curve_fit(sigmoid, X_train, y_train)

#Predict using the testing set
y_pred2 = sigmoid(X_test, *popt)
print('Predictions for test data:', y_pred2)
Evaluate the model:
#Evaluate the non-linear regression model
from sklearn.metrics import r2_score
print("r2 score: %.2f" % r2_score(y_test, y_pred2))
print("mean absolute error: %.2f" % np.mean(np.absolute(y_pred2 - y_test)))
From the above output, we can conclude that the R-squared value has increased to 0.95, with a much smaller mean absolute error.
Endnotes
Regression algorithms are instrumental in solving Machine Learning problems. Knowing which of the different regression techniques to apply to your dataset will help you attain good accuracy with a minimal error rate. We hope this article on the difference between linear and nonlinear regression helped you understand the concepts better. Artificial Intelligence & Machine Learning is a rapidly growing domain that has hugely impacted big businesses worldwide. Interested in being a part of this frenzy? Explore related articles here.