Econometric analysis

Here we address the estimation and validation of an econometric model and check whether its basic assumptions hold: normality, homoscedasticity and absence of autocorrelation of the random disturbance, and linear independence between the explanatory variables in the regression. An excellent guide to diagnosing these assumptions in the R programming environment, provided by Quick-R, is available at http://www.statmethods.net/stats/rdiagnostics.html. Note that, since the random disturbance is not observable, the analysis is performed on the residuals, which are its estimates.

To perform this analysis, the lmtest and car packages must be installed (via Install package(s)... from the Packages menu): lmtest provides the autocorrelation test, and car provides the heteroscedasticity test and the variance inflation factors. The remaining commands are available in the base distribution of R. For example, use the command:

  • lm to estimate the econometric model.
  • ks.test to study the normality of the residuals using the Kolmogorov-Smirnov test.
  • ncv.test (named ncvTest in current versions of car) to study heteroscedasticity using the Breusch-Pagan test.
  • dwtest to study autocorrelation using the Durbin-Watson test.
  • vif to study multicollinearity using the variance inflation factor of each estimated coefficient.

More information about these functions can be obtained with the help() command.
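As a rough illustration of how these commands fit together, the sketch below runs each of them on simulated data. The data frame sim and the object names are hypothetical, and testing normality against a normal distribution with the mean and standard deviation of the residuals is an assumption, not necessarily the procedure used later in this section. The calls to lmtest and car are skipped if those packages are not installed.

```r
# Hypothetical simulated data; not the data set analyzed in this section.
set.seed(1)
n  <- 50
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 - x2 + rnorm(n)
sim <- data.frame(y, x1, x2)

# Estimate the model by ordinary least squares and keep the residuals.
modelo <- lm(y ~ x1 + x2, data = sim)
res <- residuals(modelo)

# Normality: Kolmogorov-Smirnov test against a normal distribution whose
# mean and standard deviation are taken from the residuals (assumption).
ks <- ks.test(res, "pnorm", mean = mean(res), sd = sd(res))
print(ks$p.value)

# Autocorrelation: Durbin-Watson test, only if lmtest is installed.
if (requireNamespace("lmtest", quietly = TRUE)) {
  print(lmtest::dwtest(modelo))
}

# Heteroscedasticity and multicollinearity, only if car is installed.
if (requireNamespace("car", quietly = TRUE)) {
  print(car::ncvTest(modelo))  # called ncv.test in older releases of car
  print(car::vif(modelo))      # one variance inflation factor per regressor
}
```

In each test, a p-value above the chosen significance level (0.05 in this section) means the corresponding null hypothesis is not rejected.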

Since basing decisions on graphical representations is not ideal, because they are subjective and easily manipulated, we focus on the analytical methods available for checking the underlying assumptions. In any case, the link given in the introduction offers information on graphical procedures.

The following link provides a function that, given the data and the regression formula, estimates and validates the model: it tests the normality, heteroscedasticity and autocorrelation of the residuals and analyzes the linear independence of the explanatory variables.

The following video deals with the analysis of an econometric model aiming to study the relationship between household consumption and family income, debt and number of children.

First, the data are saved in the working directory (specified in R using Change dir... from the File menu) in the file datos.txt. The first row contains the variable names and the remaining rows contain the observations; values are separated by semicolons, and the decimal delimiter is the comma. You can load the data into the program using the following commands:

> datos = read.table(file="datos.txt", header=TRUE, dec=",", sep=";")
> names(datos)
> attach(datos)

The names() command displays the name of each variable present in the data file. After attach(), these variables can be referenced simply by typing their names.
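To check that the sep and dec arguments behave as described, the sketch below writes a small temporary file in the same layout and reads it back. The four rows of values are invented for illustration only, and ruta and ejemplo are hypothetical names.

```r
# Write a tiny file with the same layout as datos.txt.
# The values below are invented for illustration only.
ruta <- tempfile(fileext = ".txt")
writeLines(c(
  "Consumo;Renta;Deuda;Hijos",
  "10,5;20,1;3,2;1",
  "12,3;25,4;2,8;2",
  "9,8;18,7;4,1;0",
  "15,2;30,9;1,5;3"
), ruta)

# Semicolon-separated fields, comma as decimal mark, names in the first row.
ejemplo <- read.table(file = ruta, header = TRUE, dec = ",", sep = ";")
str(ejemplo)   # every column should be numeric, not character

# An alternative to attach(): evaluate an expression inside the data frame.
with(ejemplo, mean(Renta))
```

If dec = "," were omitted, the columns with decimal commas would be read as text rather than as numbers, which is worth checking with str() before fitting any model.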

The next step is to specify the functional form of the regression we want to analyze, in this case:

> funcion = Consumo ~ Renta + Deuda + Hijos
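A formula is an ordinary R object, so it can be inspected before any model is fitted. The short sketch below uses only base R; sin_constante is a hypothetical name for a variant of the formula without a constant term.

```r
funcion <- Consumo ~ Renta + Deuda + Hijos

class(funcion)                      # "formula"
all.vars(funcion)                   # names of the variables involved
attr(terms(funcion), "intercept")   # 1: a constant term is included

# Appending "- 1" to the right-hand side removes the constant term.
sin_constante <- update(funcion, . ~ . - 1)
attr(terms(sin_constante), "intercept")   # 0
```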

Note that, by default, the regression includes a constant (independent) term. For R to recognize the GUIME.LM function, we place a copy of it in the working directory, in a file called funcion.txt, and load it into memory with the R command:

> source("funcion.txt")

So, without any further commands

> GUIME.LM(funcion, datos) 

we get the following results, stored in the file AnálisisR.txt.

From these results we conclude that the coefficients of all the variables are significant: each has an associated p-value below 0.05, so we reject the null hypothesis that the coefficient is zero. We also reject the null hypothesis that all coefficients are simultaneously zero, since the p-value associated with the F statistic, 2.645 × 10^-9, is less than 0.05, so the fitted model is valid as a whole. Moreover, the coefficient of determination indicates that the estimated model explains 90.26% of the variability of consumption.
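The quantities used in this interpretation can be read from the summary() of any lm fit. The sketch below shows one way to extract them, using simulated data; all object names and values are hypothetical and unrelated to the results reported above.

```r
# Hypothetical simulated data.
set.seed(2)
n  <- 40
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 3 + 1.5 * x1 + 0.8 * x2 + rnorm(n)

ajuste  <- lm(y ~ x1 + x2)
resumen <- summary(ajuste)

# Individual significance: one p-value per coefficient (t tests).
p_coef <- coef(resumen)[, "Pr(>|t|)"]
significativo <- p_coef < 0.05

# Joint significance: p-value of the F statistic.
f   <- resumen$fstatistic   # named vector: value, numdf, dendf
p_f <- unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE))

# Coefficient of determination: share of the variability of y explained.
r2 <- resumen$r.squared

print(significativo)
print(p_f)
print(r2)
```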

On the other hand, the p-values associated with the Kolmogorov-Smirnov and Durbin-Watson tests, 0.9827 and 0.7679 respectively, are greater than 0.05, so we do not reject the null hypotheses of normality and of no autocorrelation in the residuals. The p-value associated with the Breusch-Pagan test, 0.002016154, is less than 0.05, so we reject the null hypothesis that the variance of the random disturbance is constant. Finally, since the variance inflation factors of the estimated coefficients are all less than 10, we conclude that there is no serious multicollinearity.
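To make the variance inflation factor concrete: for each explanatory variable it equals 1 / (1 - R²), where R² comes from regressing that variable on the remaining explanatory variables. The sketch below computes it by hand in base R on simulated regressors (hypothetical names and values) and applies the threshold of 10 used above.

```r
# Hypothetical simulated regressors; x2 is partly correlated with x1.
set.seed(3)
n  <- 60
x1 <- rnorm(n)
x2 <- 0.6 * x1 + rnorm(n)
x3 <- rnorm(n)
X  <- data.frame(x1, x2, x3)

# For each regressor, fit an auxiliary regression on the other regressors
# and convert its R-squared into a variance inflation factor.
vif_manual <- sapply(names(X), function(v) {
  otros <- setdiff(names(X), v)
  aux   <- lm(reformulate(otros, response = v), data = X)
  1 / (1 - summary(aux)$r.squared)
})

print(vif_manual)
print(all(vif_manual < 10))   # TRUE when no factor reaches the threshold
```

A factor close to 1 means the variable is nearly uncorrelated with the others; the factor grows without bound as the auxiliary R² approaches 1.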

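Therefore, since the model exhibits heteroscedasticity, the ordinary least squares estimates are not efficient, and the usual standard errors, and with them the significance tests above, are unreliable. The conclusions drawn so far remain in doubt until the problem is resolved.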
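To illustrate what a heteroscedasticity test of this kind measures, the sketch below computes Koenker's studentized variant of the Breusch-Pagan statistic by hand: the squared residuals are regressed on the explanatory variables, and n times the R² of that auxiliary regression is compared with a chi-squared distribution. This is a swapped-in illustration on simulated data; it is not necessarily the exact statistic computed by ncv.test or by GUIME.LM, and all names are hypothetical.

```r
# Hypothetical simulated data whose error spread grows with x.
set.seed(4)
n <- 100
x <- runif(n, 1, 10)
y <- 2 + 0.5 * x + rnorm(n, sd = 0.3 * x)

modelo <- lm(y ~ x)

# Auxiliary regression of the squared residuals on the regressor.
e2  <- residuals(modelo)^2
aux <- lm(e2 ~ x)

# Studentized (Koenker) statistic: n * R^2, chi-squared under the null
# hypothesis of constant variance, with one degree of freedom per regressor.
estadistico <- n * summary(aux)$r.squared
gl          <- 1
p_valor     <- pchisq(estadistico, df = gl, lower.tail = FALSE)

print(estadistico)
print(p_valor)
```

A small p-value indicates that the squared residuals vary systematically with the regressors, which is the evidence against constant variance discussed above.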
