Econometric analysis
Here we address the estimation and validation of an econometric model and study whether the basic assumptions hold: normality, homoscedasticity and absence of autocorrelation of the random disturbance, and linear independence among the explanatory variables in the regression. An excellent guide to diagnosing these assumptions in the R programming environment, provided by Quick-R, is available at http://www.statmethods.net/stats/rdiagnostics.html. Note that since the random disturbance is not observable, the analysis is performed on the residuals, since they are the ones that estimate the random disturbance.
To perform this analysis, the lmtest and car packages must be installed (via Install package(s)... from the Packages menu): lmtest provides the autocorrelation test, while car provides the heteroscedasticity test and the variance inflation factors used here. The remaining commands are available in the basic distribution of R. The commands used are:
- lm to estimate the econometric model.
- ks.test to study the normality of the residuals using the Kolmogorov-Smirnov test.
- ncv.test (named ncvTest in recent versions of car) to study heteroscedasticity using the Breusch-Pagan test.
- dwtest to study autocorrelation using the Durbin-Watson test.
- vif to study multicollinearity from the variance inflation factor of each estimated coefficient.
More information about these functions can be obtained with the help() command.
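As a minimal sketch of the first two commands, the following uses simulated data; the variable names and coefficients are assumptions made for illustration and are not the data analyzed later. Only functions from the basic distribution of R are used.

```r
# Simulated data; names and coefficients are illustrative only.
set.seed(1)
n  <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 - 0.5 * x2 + rnorm(n)

# Estimate the linear model by ordinary least squares.
fit <- lm(y ~ x1 + x2)
summary(fit)            # coefficients, standard errors, t tests, R-squared

# Kolmogorov-Smirnov test of the residuals against a normal distribution
# with mean 0 and the residual standard deviation.
res <- residuals(fit)
ks  <- ks.test(res, "pnorm", mean = 0, sd = sd(res))
ks$p.value              # a value above 0.05 means normality is not rejected
```

Because the standard deviation is estimated from the same residuals, the Kolmogorov-Smirnov p-value obtained this way tends to be conservative, so it is best read as a rough guide.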
Since decisions based on graphical representations are subjective and easily manipulated, we focus on the analytical methods available for checking the underlying assumptions. In any case, the link given in the introduction provides information on graphical procedures.
At the following link you will find a function that, given the data and the regression formula, estimates and validates the model: it tests the normality, heteroscedasticity and autocorrelation of the residuals and analyzes the linear independence of the explanatory variables.
The following video deals with the analysis of an econometric model that studies the relationship between household consumption and family income, debt and number of children.
First, the data are saved in the working directory (specified in R using Change dir... from the File menu) in the file datos.txt. The first row contains the variable names and the remaining rows contain the observations, with fields separated by semicolons in both cases. The decimal delimiter is the comma.
You can load data into the program using the following command:
> datos = read.table(file="datos.txt", header=TRUE, dec=",", sep=";")
> names(datos)
> attach(datos)
With the command attach(), the variables in the data set become available simply by typing their names. The command names() displays on screen the name of each variable present in the data file.
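To see how the file layout and the read.table arguments fit together, the sketch below writes a small example file in the same format and reads it back. The file name datos_example.txt and the values in it are made up for illustration; they are not the data used in this analysis.

```r
# Write a tiny example file in the layout described above
# (values are invented for illustration).
path <- file.path(tempdir(), "datos_example.txt")
writeLines(c("Consumo;Renta;Deuda;Hijos",
             "10,5;20,1;3,2;1",
             "12,3;25,4;2,8;2",
             "9,8;18,7;4,1;0"), path)

# Same call as in the text, pointed at the example file.
datos_ej <- read.table(file = path, header = TRUE, dec = ",", sep = ";")
names(datos_ej)
str(datos_ej)   # all four columns should be read as numbers

# Alternative to attach(): refer to columns through the data frame.
mean(datos_ej$Renta)
with(datos_ej, mean(Consumo))
```

Accessing columns with $ or with() avoids name clashes that attach() can cause when several data sets share variable names, although attach() works as described for a single data set.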
The next step is to specify the functional form of the regression we want to analyze, in this case:
> funcion = Consumo ~ Renta + Deuda + Hijos
Note that, by default, the regression includes an intercept (independent term). For R to know the function GUIME.LM, we place a copy of it in the working directory in a file called funcion.txt and load it into memory using the R command:
> source("funcion.txt")
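The mechanism can be illustrated with a small stand-in. The function my_lm_summary and the file my_function.txt below are hypothetical and are not GUIME.LM; the sketch only shows how source() makes a function stored in a text file available, and how a formula includes the intercept by default.

```r
# Hypothetical stand-in for a user-defined function file; this is NOT GUIME.LM.
fn_file <- file.path(tempdir(), "my_function.txt")
writeLines(c(
  "my_lm_summary <- function(formula, data) {",
  "  fit <- lm(formula, data = data)",
  "  summary(fit)$coefficients",
  "}"
), fn_file)

# source() evaluates the file, so the function becomes available in the session.
source(fn_file)
exists("my_lm_summary")

# A formula includes an intercept by default; adding '- 1' removes it.
d <- data.frame(y = c(1.1, 2.0, 2.9, 4.2, 5.1), x = 1:5)
rownames(my_lm_summary(y ~ x, d))       # "(Intercept)" "x"
rownames(my_lm_summary(y ~ x - 1, d))   # "x"
```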
So, without any further commands:
> GUIME.LM(funcion, datos)
we get the following results, stored in the file AnálisisR.txt. From these results we conclude that all coefficients of the variables are significant because their associated p-values are below the significance threshold.
On the other hand, the p-values associated with the Kolmogorov-Smirnov and Durbin-Watson tests, 0.9827 and 0.7679, are greater than 0.05, so we do not reject the null hypotheses of normality and of uncorrelated residuals. However, the p-value associated with the Breusch-Pagan test, 0.002016154, is less than 0.05, so we reject the null hypothesis that the variance of the random disturbance is constant. Finally, since the variance inflation factors of the estimated coefficients are all less than 10, we conclude that there is no multicollinearity.
Therefore, since the model exhibits heteroscedasticity, the least-squares estimates are not efficient and the usual standard errors are unreliable, so the above findings on significance remain in doubt pending resolution of the problem.
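For readers who want to see what lies behind the Breusch-Pagan, Durbin-Watson and variance inflation results discussed above, the following sketch computes the corresponding statistics by hand in base R on simulated data. The data and variable names are assumptions for illustration. The Breusch-Pagan statistic shown is the studentized (Koenker) variant, so its value may differ from what ncv.test reports, and no p-value is computed for the Durbin-Watson statistic.

```r
# Simulated data; names and coefficients are illustrative only.
set.seed(42)
n  <- 200
x1 <- rnorm(n)
x2 <- 0.5 * x1 + rnorm(n)               # mildly correlated with x1
y  <- 1 + 2 * x1 - x2 + rnorm(n)
fit <- lm(y ~ x1 + x2)
e   <- residuals(fit)

# Breusch-Pagan, studentized (Koenker) form: regress squared residuals on the
# regressors; n * R^2 is asymptotically chi-square with k = 2 degrees of freedom.
aux  <- lm(I(e^2) ~ x1 + x2)
bp   <- n * summary(aux)$r.squared
bp_p <- pchisq(bp, df = 2, lower.tail = FALSE)

# Durbin-Watson statistic: values near 2 indicate no first-order autocorrelation.
dw <- sum(diff(e)^2) / sum(e^2)

# Variance inflation factors: 1 / (1 - R_j^2), with R_j^2 from regressing
# each explanatory variable on the others.
vif_x1 <- 1 / (1 - summary(lm(x1 ~ x2))$r.squared)
vif_x2 <- 1 / (1 - summary(lm(x2 ~ x1))$r.squared)

c(BP = bp, BP_p = bp_p, DW = dw, VIF_x1 = vif_x1, VIF_x2 = vif_x2)
```

The decision rules are the ones applied in the text: a Breusch-Pagan p-value below 0.05 leads to rejecting constant variance, and variance inflation factors below 10 are taken as no sign of multicollinearity.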