Heteroscedasticity correction 

The following video uses data on profits (B) and sales (V) for 20 companies to estimate, by Ordinary Least Squares (OLS), a linear model that explains profits in terms of sales, and then tests for the presence of heteroscedasticity. The discussion below is based on the results shown there.

The model estimated by OLS is B(t) = 10.2229 + 0.0112223 * V(t) for t = 1, 2, ..., 20, with a coefficient of determination of 0.642619.

Plotting the residuals against the observation number shows that different groups of observations (for example, 1 to 7, 8 to 16 and 17 to 20) have different dispersion, which suggests that the disturbances do not have constant variance. Similarly, in the plot of the residuals against the variable assumed to cause the heteroscedasticity (sales), the dispersion of the residuals increases as sales increase. Based on these graphical methods, heteroscedasticity appears to be present in the model.

Using White's test (an analytical method), we reject the null hypothesis of homoscedasticity, since the p-value obtained is 0.04256. The p-value is the smallest significance level at which the null hypothesis is rejected. That is, for significance levels greater than 0.04256 we reject the null hypothesis, and for smaller levels we do not. Since we are working at a 5% significance level and 0.05 is greater than 0.04256, we reject the null hypothesis and conclude that heteroscedasticity is present in the model.
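The decision rule just described can be sketched in a few lines of Python. This is only an illustration: the helper name `rejects_null` and the extra significance levels (1% and 10%) are assumptions added here, while the p-value is the one reported in the text.

```python
# p-value of White's test as reported in the text
p_value = 0.04256


def rejects_null(p, alpha):
    """Reject the null hypothesis when the significance level exceeds the p-value.

    Hypothetical helper, named here for illustration only.
    """
    return alpha > p


# Compare the same p-value against several significance levels.
# Only the 5% level is used in the text; 1% and 10% are added for contrast.
decisions = {alpha: rejects_null(p_value, alpha) for alpha in (0.01, 0.05, 0.10)}

for alpha, reject in decisions.items():
    verdict = "reject homoscedasticity" if reject else "do not reject homoscedasticity"
    print(f"alpha = {alpha:.2f}: {verdict}")
```

At the 1% level the null hypothesis would not be rejected, which shows that the conclusion depends on the significance level chosen before running the test.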

To implement White's test, we need an auxiliary regression that explains the squared OLS residuals in terms of the original regressors, their squares and their cross-products, excluding repetitions. In this case, the regressors in the auxiliary regression are a constant, sales and the square of sales (all other possibilities are repetitions of these). Writing e(t) for the OLS residuals, the OLS estimate of this regression is e(t)^2 = -5.17545 + 0.0833682 * V(t) - 0.000132827 * V(t)^2 for t = 1, 2, ..., 20, with a coefficient of determination of 0.315679. Finally, since the test statistic is obtained by multiplying the number of observations by the coefficient of determination of the auxiliary regression, its value is 20 * 0.315679, approximately 6.3136.
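As a rough check on the arithmetic, the statistic and its p-value can be reproduced from the reported figures alone. The sketch below assumes the standard asymptotic result that the statistic follows a chi-square distribution whose degrees of freedom equal the number of auxiliary regressors other than the constant (here 2: sales and sales squared); this is not stated in the text. For 2 degrees of freedom the upper-tail probability has the closed form exp(-x/2), so no statistics library is needed. The variable names are illustrative.

```python
import math

n_obs = 20          # number of companies, from the text
r2_aux = 0.315679   # R-squared of the auxiliary regression, from the text

# White's statistic: number of observations times the auxiliary R-squared
white_stat = n_obs * r2_aux

# Assumption: chi-square reference distribution with 2 degrees of freedom
# (sales and sales squared). For 2 df, P(X > x) = exp(-x / 2).
p_value = math.exp(-white_stat / 2.0)

print(f"White statistic = {white_stat:.4f}")
print(f"p-value         = {p_value:.5f}")
```

Under that assumption the computed p-value comes out at about 0.0426, in line with the 0.04256 reported for the test.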

Since heteroscedasticity is present in the model, the OLS estimator is no longer efficient (it is not the minimum-variance linear unbiased estimator). To correct this problem, we must transform the original data. As the video shows, the transformation consists of dividing the data by the square root of V. Re-estimating by OLS on the transformed data gives B(t) = 10.2147 + 0.063917 * V(t), with a coefficient of determination of 0.993125. In addition, repeating White's test on the transformed model gives a p-value of 0.197533, which, being greater than 0.05, means that we do not reject the null hypothesis that the disturbances are homoscedastic. Therefore, the problem has been corrected.
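The transformation can be sketched as follows. Dividing every term of the model by the square root of V is the usual correction when the variance of the disturbances is taken to be proportional to sales; that reading is an assumption here. The data below are synthetic and purely illustrative (they are not the 20 companies from the video), and the function `ols_two_regressors` is a hypothetical helper written for this example.

```python
import math
import random


def ols_two_regressors(x1, x2, y):
    """Least squares fit of y = a*x1 + b*x2 with no additional constant,
    solved through the 2x2 normal equations."""
    s11 = sum(u * u for u in x1)
    s12 = sum(u * v for u, v in zip(x1, x2))
    s22 = sum(v * v for v in x2)
    s1y = sum(u * w for u, w in zip(x1, y))
    s2y = sum(v * w for v, w in zip(x2, y))
    det = s11 * s22 - s12 * s12
    a = (s1y * s22 - s2y * s12) / det
    b = (s2y * s11 - s1y * s12) / det
    return a, b


# Synthetic data (assumption): disturbance variance proportional to sales.
random.seed(0)
sales = [50.0 * (i + 1) for i in range(20)]
profits = [10.0 + 0.05 * v + math.sqrt(v) * random.gauss(0.0, 1.0) for v in sales]

# Divide every term of B = b0 + b1*V + u by sqrt(V).
root_v = [math.sqrt(v) for v in sales]
profits_star = [b / r for b, r in zip(profits, root_v)]  # B / sqrt(V)
const_star = [1.0 / r for r in root_v]                   # transformed constant, 1 / sqrt(V)
sales_star = list(root_v)                                # V / sqrt(V) = sqrt(V)

# OLS on the transformed data; the coefficients still refer to the original model.
b0, b1 = ols_two_regressors(const_star, sales_star, profits_star)
print(f"intercept = {b0:.4f}, slope = {b1:.6f}")
```

Note that the transformed regression has no ordinary constant: the original intercept becomes the coefficient on 1/sqrt(V), and the slope becomes the coefficient on sqrt(V).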
