Linear Regression with Python
The term "linear" refers to a linear relationship between variables. Such a relationship appears as a straight line on a scatter plot.
In this article, I will try to explain the basic concept of a regression model and its implementation in Python. The following are the essential libraries to import.
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.metrics import mean_squared_error
from sklearn.metrics import r2_score
from sklearn import metrics
from sklearn.model_selection import train_test_split
Let's create a DataFrame with the pandas library containing three variables: Price, Quantity, and Sales.
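The data values themselves are not shown in this article, so the sketch below builds a small stand-in DataFrame with the same shape (12 rows, 3 columns). All numbers are made up for illustration, and Sales is assumed to be Price times Quantity, so results obtained from it will not match the figures quoted later in the text.

```python
import pandas as pd

# Hypothetical values for illustration only (not the article's dataset).
df = pd.DataFrame({
    "Price":    [100, 105, 98, 110, 102, 95, 108, 99, 104, 101, 97, 106],
    "Quantity": [10, 12, 9, 15, 11, 8, 14, 10, 13, 11, 9, 14],
})
# Assumption: in this toy data, Sales is simply Price * Quantity.
df["Sales"] = df["Price"] * df["Quantity"]

print(df)
```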
To view the first five rows of the DataFrame, we call df.head().
To check the number of rows and columns, we use df.shape. The dataset has 12 rows and 3 columns.
To get basic summary statistics for the dataset, we call df.describe(). It reports the number of observations, mean, standard deviation, minimum, and maximum for each variable.
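The three inspection calls above can be sketched together as follows. The DataFrame is a hypothetical stand-in (made-up values, not the article's dataset), and the variable names first_rows and summary are only illustrative.

```python
import pandas as pd

# Hypothetical stand-in data (not the article's dataset).
df = pd.DataFrame({
    "Price":    [100, 105, 98, 110, 102, 95, 108, 99, 104, 101, 97, 106],
    "Quantity": [10, 12, 9, 15, 11, 8, 14, 10, 13, 11, 9, 14],
})
df["Sales"] = df["Price"] * df["Quantity"]

first_rows = df.head()       # first five rows
n_rows, n_cols = df.shape    # shape is an attribute, not a method
summary = df.describe()      # count, mean, std, min, quartiles, max per column

print(first_rows)
print(f"{n_rows} rows, {n_cols} columns")
print(summary)
```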
Now it's time to analyse the relationships between the variables. Let's plot the correlation matrix as a heatmap. The correlation between Quantity and Sales is strong, so we take Quantity as the independent variable and Sales as the dependent variable for the regression model.
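One way to draw such a heatmap is sketched below, using pandas to compute the Pearson correlation matrix and seaborn to plot it. The data is a hypothetical stand-in, and the colour map and figure size are arbitrary choices rather than the settings used for the original figure.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-in data (not the article's dataset).
df = pd.DataFrame({
    "Price":    [100, 105, 98, 110, 102, 95, 108, 99, 104, 101, 97, 106],
    "Quantity": [10, 12, 9, 15, 11, 8, 14, 10, 13, 11, 9, 14],
})
df["Sales"] = df["Price"] * df["Quantity"]

corr = df.corr()  # pairwise Pearson correlation between all columns

fig, ax = plt.subplots(figsize=(5, 4))
# annot=True writes each correlation value inside its cell.
sns.heatmap(corr, annot=True, cmap="coolwarm", vmin=-1, vmax=1, ax=ax)
ax.set_title("Correlation matrix")
```

In a notebook the figure renders automatically. In a script, call plt.show() at the end.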
The next step is to separate the attributes (independent variables) from the label (dependent variable). Although the dataset has three columns, we are interested in only two of them: Sales and Quantity. We want to predict Sales from Quantity, so our “X” variable is Quantity and our “Y” variable is Sales.
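A minimal sketch of this selection, again on hypothetical stand-in data. Note the double brackets for X: scikit-learn expects the features as a two-dimensional table, even when there is only one column.

```python
import pandas as pd

# Hypothetical stand-in data (not the article's dataset).
df = pd.DataFrame({
    "Price":    [100, 105, 98, 110, 102, 95, 108, 99, 104, 101, 97, 106],
    "Quantity": [10, 12, 9, 15, 11, 8, 14, 10, 13, 11, 9, 14],
})
df["Sales"] = df["Price"] * df["Quantity"]

X = df[["Quantity"]]  # independent variable, kept 2-D (DataFrame)
y = df["Sales"]       # dependent variable, 1-D (Series)

print(X.shape, y.shape)
```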
Splitting the dataset is an important step in building a regression model. Here, 70% of the data (0.7) is used for training and 30% (0.3) for testing. The next step is to fit the regression model on the training data.
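The split and fit might look like the sketch below. The data is a hypothetical stand-in, and random_state=0 is an assumption added only to make the split reproducible; the article does not say whether a seed was used.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data (not the article's dataset).
df = pd.DataFrame({
    "Price":    [100, 105, 98, 110, 102, 95, 108, 99, 104, 101, 97, 106],
    "Quantity": [10, 12, 9, 15, 11, 8, 14, 10, 13, 11, 9, 14],
})
df["Sales"] = df["Price"] * df["Quantity"]
X, y = df[["Quantity"]], df["Sales"]

# 70% training, 30% testing; random_state is an assumed seed.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

model = LinearRegression()
model.fit(X_train, y_train)  # learn intercept and slope from training rows only

print(len(X_train), len(X_test))
```

With 12 rows, test_size=0.3 is rounded up to 4 test rows, which leaves 8 rows for training.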
Now it is time to extract the results. The first value is the coefficient of determination (R²), which indicates how well the model predicts the outcome. The model score is 82.31%, meaning the model explains about 82% of the variation in Sales, so it predicts reasonably well.
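As a sketch, the score can be obtained with the model's score method, which returns R² for a regressor. The example assumes the score is computed on the held-out test rows (the article does not state which subset was used) and runs on hypothetical stand-in data, so it will not reproduce the 82.31% figure.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data (not the article's dataset).
df = pd.DataFrame({
    "Price":    [100, 105, 98, 110, 102, 95, 108, 99, 104, 101, 97, 106],
    "Quantity": [10, 12, 9, 15, 11, 8, 14, 10, 13, 11, 9, 14],
})
df["Sales"] = df["Price"] * df["Quantity"]
X, y = df[["Quantity"]], df["Sales"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0  # assumed seed
)
model = LinearRegression().fit(X_train, y_train)

# Two equivalent ways to get the coefficient of determination.
r2_from_score = model.score(X_test, y_test)
r2_from_metric = r2_score(y_test, model.predict(X_test))

print(f"R^2 on test data: {r2_from_score:.4f}")
```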
The next step is to extract the intercept and the coefficient of the model. The fitted model exposes them as intercept_ and coef_. The coefficient tells us that for every one-unit change in Quantity, Sales change by 99.37.
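The sketch below reads both values from a fitted model and checks that a prediction is just intercept + slope × Quantity. It uses hypothetical stand-in data, so the slope it prints differs from the 99.37 reported above. The names intercept and slope are illustrative.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data (not the article's dataset).
df = pd.DataFrame({
    "Price":    [100, 105, 98, 110, 102, 95, 108, 99, 104, 101, 97, 106],
    "Quantity": [10, 12, 9, 15, 11, 8, 14, 10, 13, 11, 9, 14],
})
df["Sales"] = df["Price"] * df["Quantity"]
X, y = df[["Quantity"]], df["Sales"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0  # assumed seed
)
model = LinearRegression().fit(X_train, y_train)

intercept = model.intercept_  # predicted Sales when Quantity is 0
slope = model.coef_[0]        # change in Sales per one-unit change in Quantity

# A prediction by hand for Quantity = 10 matches model.predict.
by_hand = intercept + slope * 10
by_model = model.predict(pd.DataFrame({"Quantity": [10]}))[0]

print(f"intercept={intercept:.2f}, slope={slope:.2f}")
```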
We have trained the regression model, and now we can predict values. The test data is used to assess how accurate the predictions are.
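A minimal sketch of the prediction step on hypothetical stand-in data. The two error measures shown (mean absolute error and root mean squared error) are one possible way to quantify accuracy, chosen here because the corresponding functions appear in the import list; the article does not report their values.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data (not the article's dataset).
df = pd.DataFrame({
    "Price":    [100, 105, 98, 110, 102, 95, 108, 99, 104, 101, 97, 106],
    "Quantity": [10, 12, 9, 15, 11, 8, 14, 10, 13, 11, 9, 14],
})
df["Sales"] = df["Price"] * df["Quantity"]
X, y = df[["Quantity"]], df["Sales"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0  # assumed seed
)
model = LinearRegression().fit(X_train, y_train)

y_pred = model.predict(X_test)  # one predicted Sales value per test row

mae = mean_absolute_error(y_test, y_pred)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))

print(f"MAE={mae:.2f}, RMSE={rmse:.2f}")
```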
Let's compare the actual and predicted values side by side in a DataFrame. The same comparison can also be shown as a figure.
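The comparison could be built as sketched below. The data is a hypothetical stand-in, the column names Actual and Predicted are illustrative, and a grouped bar chart is assumed as the figure type since the original figure is not available.

```python
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data (not the article's dataset).
df = pd.DataFrame({
    "Price":    [100, 105, 98, 110, 102, 95, 108, 99, 104, 101, 97, 106],
    "Quantity": [10, 12, 9, 15, 11, 8, 14, 10, 13, 11, 9, 14],
})
df["Sales"] = df["Price"] * df["Quantity"]
X, y = df[["Quantity"]], df["Sales"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0  # assumed seed
)
model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# One row per test observation: true Sales next to the model's prediction.
comparison = pd.DataFrame({"Actual": y_test.to_numpy(), "Predicted": y_pred})
print(comparison)

# Grouped bars make the gap between actual and predicted values visible.
fig, ax = plt.subplots()
comparison.plot(kind="bar", ax=ax)
ax.set_xlabel("Test observation")
ax.set_ylabel("Sales")
```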
In the same way, we can also plot the fitted straight line over the test data.
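A sketch of such a plot, drawing the test observations as points and the model's predictions as a line. It runs on hypothetical stand-in data, and the colours and labels are arbitrary choices.

```python
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data (not the article's dataset).
df = pd.DataFrame({
    "Price":    [100, 105, 98, 110, 102, 95, 108, 99, 104, 101, 97, 106],
    "Quantity": [10, 12, 9, 15, 11, 8, 14, 10, 13, 11, 9, 14],
})
df["Sales"] = df["Price"] * df["Quantity"]
X, y = df[["Quantity"]], df["Sales"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0  # assumed seed
)
model = LinearRegression().fit(X_train, y_train)

# Sort by Quantity so the line is drawn from left to right.
line = X_test.assign(Predicted=model.predict(X_test)).sort_values("Quantity")

fig, ax = plt.subplots()
ax.scatter(X_test["Quantity"], y_test, color="gray", label="Actual (test)")
ax.plot(line["Quantity"], line["Predicted"], color="red", linewidth=2,
        label="Fitted line")
ax.set_xlabel("Quantity")
ax.set_ylabel("Sales")
ax.legend()
```

In a notebook the figure renders automatically. In a script, call plt.show() at the end.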