Linear Regression Assumptions

Linear regression is used when the relationship between the variables in the dataset is approximately linear. Before building a linear regression model, its assumptions should be validated. If the assumptions are violated, different methodologies must be used.
Simple linear regression has one independent variable (predictor) and one dependent variable (response), while multiple linear regression uses more than one predictor to predict the response.
The simple linear regression equation is represented as
Y = β0 + β1X1 + ε
The multiple linear regression equation is represented as
Y = β0 + β1X1 + β2X2 + β3X3 + β4X4 + ε
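As a concrete illustration, the sketch below fits both forms with ordinary least squares on synthetic data; statsmodels and the simulated coefficients are assumptions of this example, not part of the original text.

```python
# A minimal sketch (statsmodels, synthetic data assumed) showing how the simple
# and multiple regression equations above map onto fitted models.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 4))                      # four hypothetical predictors X1..X4
beta = np.array([2.0, 0.5, -1.0, 0.3, 1.5])      # beta0..beta4 chosen for illustration
y = beta[0] + X @ beta[1:] + rng.normal(size=n)  # Y = b0 + b1*X1 + ... + b4*X4 + e

# Simple linear regression: Y on X1 only
simple = sm.OLS(y, sm.add_constant(X[:, 0])).fit()
print(simple.params)       # estimates of beta0 and beta1

# Multiple linear regression: Y on X1..X4
multiple = sm.OLS(y, sm.add_constant(X)).fit()
print(multiple.params)     # estimates of beta0..beta4
```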
The assumptions of linear regression analysis are:
- Linearity
- No Heteroskedasticity
- No omitted variable bias
- Normality of error
- No autocorrelation
- No multicollinearity
1. Linearity
For linear regression analysis, there must be a linear relationship between the predictor(s) and the response. Linear relationships can be visualized by plotting the data points in a scatter plot. In the case of multiple regression, a scatter plot should be plotted for each pair of variables.
Graph 1 below shows how a dataset with a linear relationship looks in a scatter plot.

But if we get a graph like Graph 2, which shows an exponential curve, then a log transformation can be used to make the relationship linear.

If there is no linear relationship between the data points and the data cannot be transformed into a linear form, then non-linear regression analysis has to be performed.
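The sketch below (assuming synthetic data and matplotlib/numpy, purely for illustration) shows a roughly linear scatter, an exponential one, and how a log transformation straightens the latter.

```python
# A minimal sketch of the diagnostic plots described above: a linear pattern,
# an exponential pattern, and the exponential pattern after a log transform.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
x = np.linspace(1, 10, 100)
y_linear = 3 * x + rng.normal(scale=2.0, size=x.size)            # linear pattern (Graph 1)
y_exp = np.exp(0.5 * x) * rng.lognormal(sigma=0.1, size=x.size)  # exponential pattern (Graph 2)

fig, axes = plt.subplots(1, 3, figsize=(12, 3.5))
axes[0].scatter(x, y_linear, s=10); axes[0].set_title("Linear relationship")
axes[1].scatter(x, y_exp, s=10);    axes[1].set_title("Exponential relationship")
axes[2].scatter(x, np.log(y_exp), s=10)   # log transform straightens the curve
axes[2].set_title("After log transform")
plt.tight_layout()
plt.show()
```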
2. No Heteroskedasticity
For a linear relationship, if the variance of the errors is not constant and, for example, increases as the predictor (X) increases, this is called heteroskedasticity. In such cases the standard errors in the output cannot be relied on, but the coefficients will still be unbiased.
The best way to detect heteroskedasticity is a scatter plot of the residuals against the X values. Heteroskedasticity can often be reduced by performing a log transformation, by investigating omitted variable bias, or by identifying outliers and removing them. A formal check is sketched below.
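As a rough sketch, the snippet below plots residuals against X for simulated heteroskedastic data and also runs a Breusch-Pagan test; the test and the simulated data are additions for illustration, not something prescribed above.

```python
# A minimal sketch: residuals-vs-X plot plus a Breusch-Pagan test for
# heteroskedasticity on synthetic data whose error spread grows with X.
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(2)
x = np.linspace(1, 10, 200)
y = 2 + 3 * x + rng.normal(scale=0.5 * x)   # error scale increases with x

exog = sm.add_constant(x)
model = sm.OLS(y, exog).fit()

plt.scatter(x, model.resid, s=10)           # funnel shape suggests heteroskedasticity
plt.xlabel("X"); plt.ylabel("Residuals")
plt.show()

lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(model.resid, exog)
print(f"Breusch-Pagan p-value: {lm_pvalue:.4f}")  # small p-value -> heteroskedasticity
```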
3. No omitted variable bias
The predictor(s) shouldn’t be correlated with the error term. This can be represented by the equation below.
σ(X, ε) = 0 : ∀ x, ε
If the predictor(s) are correlated with the error term, it is called omitted variable bias. It happens when a relevant predictor is excluded from the model and its effect is absorbed into the error term, which leads to biased and counterintuitive estimates. It can be detected by checking the correlations among the predictors.
For example, when salary is estimated based only on years of education, omitted variable bias occurs, because salary also depends on many other predictors such as the type of education, socio-economic status, and so on. Even so, such a model can still be used for predictions.
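A small simulation can make the bias visible. In the hypothetical sketch below, an unobserved "ability" variable drives both education and salary, so regressing salary on education alone inflates the education coefficient; all numbers are invented for illustration.

```python
# A minimal sketch of omitted variable bias on simulated data: leaving out a
# variable that is correlated with the included predictor biases its coefficient.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 5000
ability = rng.normal(size=n)
education = 12 + 2 * ability + rng.normal(size=n)        # education correlated with ability
salary = 20 + 3 * education + 5 * ability + rng.normal(size=n)

# Omitting ability: its effect leaks into the error term and inflates the
# education coefficient above its true value of 3.
biased = sm.OLS(salary, sm.add_constant(education)).fit()
full = sm.OLS(salary, sm.add_constant(np.column_stack([education, ability]))).fit()
print(biased.params[1], full.params[1])  # biased estimate vs. roughly 3
```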
4. Normality of error
We assume the error ε is normally distributed, i.e., its mean is zero (the sum of all errors is zero or close to zero) and the variance σ² of the error terms is constant.
ε ~ N(0, σ²)
Normality is violated when the distribution of the errors departs from a normal distribution, for example when a linear regression model is used to express insurance payout as a function of the age of the customer. Insurance will not be claimed by every individual who has opted for it, so there will be a large number of zero payouts along with a few very high payouts, and the errors will be heavily skewed.
If the sample size is very large, the central limit theorem applies and the departure from normality has little effect on the estimates. For small sample sizes, however, the standard errors in the output will be affected.
This can be easily detected by plotting a histogram of the residuals.
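A minimal sketch of that check is given below; it also adds a Shapiro-Wilk test, an extra diagnostic not mentioned above, applied to residuals from a small simulated fit.

```python
# A minimal sketch (synthetic data assumed): histogram of residuals plus a
# Shapiro-Wilk normality test on residuals from a fitted OLS model.
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(4)
x = rng.normal(size=200)
y = 1 + 2 * x + rng.normal(size=200)
model = sm.OLS(y, sm.add_constant(x)).fit()
residuals = model.resid

plt.hist(residuals, bins=30)              # roughly bell-shaped around zero if normal
plt.xlabel("Residual"); plt.ylabel("Frequency")
plt.show()

stat, p_value = stats.shapiro(residuals)  # small p-value -> evidence against normality
print(f"Shapiro-Wilk p-value: {p_value:.4f}")
```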
5. No Autocorrelation
Error term values should not have any identifiable relationship.
σ(εᵢ, εⱼ) = 0 : ∀ i ≠ j
If there is any relationship between the values of the error term, autocorrelation comes into the picture. Autocorrelation, also known as serial correlation, affects the standard errors while the coefficients remain unbiased. It is not usually observed in cross-sectional data but is visible in time-series data such as stock prices. In stock price analysis, the day-of-the-week effect says that returns tend to be high on Fridays and low on Mondays, so errors on Mondays will be biased downwards and errors on Fridays will be biased upwards.
The main causes of autocorrelation are omitted variables or an incorrect functional form.
The common way to detect autocorrelation is to plot the residuals (for example, in time or observation order) and look for patterns; if there are no patterns, there is no evidence of autocorrelation. The Durbin-Watson test can also be used, as shown in the sketch below.
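Here is a minimal sketch of both checks on simulated data with AR(1) errors; the data-generating choices are assumptions made for illustration. Durbin-Watson values near 2 suggest no autocorrelation, while values near 0 or 4 suggest positive or negative autocorrelation.

```python
# A minimal sketch: residual plot and Durbin-Watson statistic on a series with
# serially correlated (AR(1)) errors.
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(5)
n = 300
t = np.arange(n)
errors = np.zeros(n)
for i in range(1, n):                     # AR(1) errors: each error depends on the last
    errors[i] = 0.8 * errors[i - 1] + rng.normal()
y = 1 + 0.05 * t + errors

model = sm.OLS(y, sm.add_constant(t)).fit()

plt.plot(t, model.resid)                  # visible waves indicate autocorrelation
plt.xlabel("Time"); plt.ylabel("Residual")
plt.show()

print(f"Durbin-Watson statistic: {durbin_watson(model.resid):.2f}")
```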
If there is a pattern, autocorrelation exists, and it is best to avoid linear regression and use an autoregressive (AR) model, a moving average (MA) model, an autoregressive moving average (ARMA) model, or an autoregressive integrated moving average (ARIMA) model instead.
6. No multicollinearity
If multiple linear regression is represented by
Y = β0 + β1X1 + β2X2 + β3X3 + β4X4 + ε
then ideally no predictor should be explained by another predictor.
Multicollinearity occurs when the predictors themselves are correlated. To find out whether multicollinearity exists, the correlation between every pair of predictors should be checked. If a correlation coefficient is very high, it clearly indicates that one predictor is explained by another.
Multicollinearity can also be detected by using the variance inflation factor (VIF). To find the VIF, an auxiliary regression is performed for each predictor.
For the above equation, the auxiliary regression equation for X1 will be
X1 = β0* + β2*X2 + β3*X3 + β4*X4 + ε*
This helps to understand how much of X1 is explained by the other predictors. If the R-squared of this auxiliary model is R²X1, then the variance inflation factor is
VIF = 1 / (1 − R²X1)
The higher R²X1 is, the higher the VIF, and a high VIF indicates that X1 is largely explained by the other predictors.
Likewise, the VIF of all the other predictors has to be calculated to check whether each predictor is explained by the others.
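A minimal sketch of this computation, using statsmodels' variance_inflation_factor on synthetic correlated predictors (the data are assumed purely for illustration), is shown below.

```python
# A minimal sketch: VIF for each predictor, which wraps the auxiliary-regression
# idea described above. X2 is nearly a copy of X1, so both get a very high VIF.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(6)
n = 500
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)   # X2 almost duplicates X1 -> multicollinearity
x3 = rng.normal(size=n)
x4 = rng.normal(size=n)
X = pd.DataFrame({"X1": x1, "X2": x2, "X3": x3, "X4": x4})

exog = sm.add_constant(X)                 # constant included so the VIFs are meaningful
for i, name in enumerate(exog.columns):
    if name == "const":
        continue
    print(name, variance_inflation_factor(exog.values, i))
```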
To overcome this, we have to find out whether two predictors are carrying the same information and, if so, remove one of them. While removing predictors, omitted variable bias has to be taken care of. Another method is to combine the correlated predictors into a single predictor.