Instrumental variable

The method of instrumental variables (IV method) is an umbrella term for a group of estimation methods in inferential statistics. It belongs to the family of generalized method of moments (GMM) estimators. GMM is a generalization of the method of moments and was proposed by Lars Peter Hansen in 1982.

The aim of the IV method is to eliminate, at least asymptotically, a correlation between the error term and the explanatory variables in a regression analysis. It does so by replacing the explanatory variables with other variables that are closely related to them but are not correlated with the error term and are not linear combinations of the other explanatory variables.

Idea

In many situations in which causal effects are to be investigated and quantified, the error term is correlated with the explanatory variable. Suppose, for example, one wants to study the effect of education (X) on a person's earned income (Y). One could estimate a model of the type:

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$

One way to estimate this model is the method of least squares. However, least squares rests on several assumptions, including that the error term and the explanatory variables are uncorrelated, and in the example above this is very unlikely. Many variables that do not appear in the model plausibly affect both education and income, and some of them are hard or impossible to measure, so they cannot be included as control variables. A person's diligence, for example, is very likely correlated with both their level of education and their income. Because diligence cannot be measured, it remains in the error term and creates exactly the correlation between the explanatory variable and the error term that must not exist for least squares to be valid. In such a case there is an omitted-variable problem, and the least squares estimator is inconsistent.

Correlation between the error term and the explanatory variables is referred to as endogeneity. Besides omitted variables, endogeneity can also arise when variables can only be measured with error, and under two-way (simultaneous) causality, where x has a causal effect on y and y has a causal effect on x. Other approaches to the endogeneity problem are regression discontinuity designs, estimation methods based on panel data, and the classical experiment.

Mathematical Background

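For the least squares estimator in the bivariate regression model with one explanatory variable, $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$, one has:

$$\hat\beta_1^{\text{OLS}} = \frac{\sum_{i=1}^{n}(x_i-\bar x)(y_i-\bar y)}{\sum_{i=1}^{n}(x_i-\bar x)^2} = \beta_1 + \frac{\sum_{i=1}^{n}(x_i-\bar x)\,\varepsilon_i}{\sum_{i=1}^{n}(x_i-\bar x)^2}$$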

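If x and ε are uncorrelated, the second term converges to zero as the number of observations grows without bound, so the estimator is consistent. If x and ε are correlated, the second term converges in probability to $\operatorname{Cov}(x,\varepsilon)/\operatorname{Var}(x) \neq 0$, and the estimator is inconsistent.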
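To see the inconsistency numerically, the following Python sketch simulates the education example under an assumed data-generating process. The coefficients, the unobserved `diligence` variable and the function names `simulate_confounded` and `ols_slope` are illustrative choices, not part of the method. Because diligence enters both x and the error term, Cov(x, ε) = 6 and Var(x) = 5 in this setup, so the OLS slope should settle near 0.5 + 6/5 = 1.7 rather than the true value 0.5.

```python
import random


def simulate_confounded(n, beta1=0.5, seed=1):
    """Hypothetical data-generating process: unobserved diligence raises
    both education x and the error term eps of the income equation."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        diligence = rng.gauss(0.0, 1.0)                   # unobserved confounder
        x = 12.0 + 2.0 * diligence + rng.gauss(0.0, 1.0)  # years of education
        eps = 3.0 * diligence + rng.gauss(0.0, 1.0)       # diligence stays in the error term
        xs.append(x)
        ys.append(10.0 + beta1 * x + eps)
    return xs, ys


def ols_slope(xs, ys):
    """Bivariate least squares slope: sum((x-xbar)(y-ybar)) / sum((x-xbar)^2)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


xs, ys = simulate_confounded(200_000)
# Cov(x, eps) = 2 * 3 = 6 and Var(x) = 2**2 + 1 = 5, so the probability
# limit of the OLS slope is 0.5 + 6/5 = 1.7 instead of the true 0.5.
print(f"OLS slope: {ols_slope(xs, ys):.3f}")
```

Increasing `n` does not move the estimate toward 0.5; it only tightens it around the biased limit.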

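An instrumental variable z is correlated with the explanatory variable x but not with the error term ε. The IV estimator is:

$$\hat\beta_1^{\text{IV}} = \frac{\sum_{i=1}^{n}(z_i-\bar z)(y_i-\bar y)}{\sum_{i=1}^{n}(z_i-\bar z)(x_i-\bar x)} = \beta_1 + \frac{\sum_{i=1}^{n}(z_i-\bar z)\,\varepsilon_i}{\sum_{i=1}^{n}(z_i-\bar z)(x_i-\bar x)}$$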

If z and ε are uncorrelated, the last term converges to zero and the estimator is consistent (provided z is correlated with x, so that the denominator does not vanish). Note: if x is not correlated with the error term, x is itself an instrumental variable. In that case, choosing z = x makes the IV estimator identical to the least squares estimator.
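A minimal sketch of the bivariate IV estimator, again under a hypothetical data-generating process: the instrument `z` is drawn independently of the unobserved confounder, so it is valid by construction. The names `simulate_with_instrument` and `iv_slope` are illustrative. Passing x itself as the instrument reproduces the least squares slope, which is biased here (Var(x) = 6 and Cov(x, ε) = 6, so its limit is 0.5 + 6/6 = 1.5).

```python
import random


def simulate_with_instrument(n, beta1=0.5, seed=2):
    """Hypothetical setup: z shifts education x but is independent of the
    unobserved confounder, so it satisfies the IV assumptions by construction."""
    rng = random.Random(seed)
    zs, xs, ys = [], [], []
    for _ in range(n):
        diligence = rng.gauss(0.0, 1.0)                  # unobserved confounder
        z = rng.gauss(0.0, 1.0)                          # instrument
        x = 12.0 + 1.0 * z + 2.0 * diligence + rng.gauss(0.0, 1.0)
        eps = 3.0 * diligence + rng.gauss(0.0, 1.0)
        zs.append(z)
        xs.append(x)
        ys.append(10.0 + beta1 * x + eps)
    return zs, xs, ys


def iv_slope(zs, xs, ys):
    """IV slope: sum((z-zbar)(y-ybar)) / sum((z-zbar)(x-xbar))."""
    n = len(zs)
    mz, mx, my = sum(zs) / n, sum(xs) / n, sum(ys) / n
    szy = sum((z - mz) * (y - my) for z, y in zip(zs, ys))
    szx = sum((z - mz) * (x - mx) for z, x in zip(zs, xs))
    return szy / szx


zs, xs, ys = simulate_with_instrument(200_000)
beta_iv = iv_slope(zs, xs, ys)   # uses the valid instrument z
beta_ols = iv_slope(xs, xs, ys)  # z = x reduces to the least squares slope
print(f"IV: {beta_iv:.3f}, OLS: {beta_ols:.3f}")
```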

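The approach above can easily be generalized to regression with several explanatory variables. Let X be a T × K matrix of explanatory variables, consisting of T observations of K variables, and Z a T × K matrix of instrumental variables. Then, for the model $y = X\beta + \varepsilon$,

$$\hat\beta^{\text{IV}} = (Z^\top X)^{-1} Z^\top y = \beta + (Z^\top X)^{-1} Z^\top \varepsilon$$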
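The matrix form can be sketched in plain Python without a linear-algebra library. The helpers `transpose_times` and `solve` are written here for illustration only; in practice one would use a numerical library. The data are again hypothetical. The column of ones appears in both X and Z, so the constant acts as its own instrument, which keeps Z the same T × K shape as X.

```python
import random


def transpose_times(A, B):
    """Return A^T B for row-major matrices A (T x K) and B (T x M)."""
    return [[sum(a[i] * b[j] for a, b in zip(A, B)) for j in range(len(B[0]))]
            for i in range(len(A[0]))]


def solve(A, b):
    """Solve the square system A v = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))  # pivot row
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [mr - f * mc for mr, mc in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def iv_estimator(X, Z, y):
    """beta_IV = (Z^T X)^{-1} Z^T y for T x K matrices X and Z."""
    ZtX = transpose_times(Z, X)
    Zty = [row[0] for row in transpose_times(Z, [[yi] for yi in y])]
    return solve(ZtX, Zty)


rng = random.Random(3)
X, Z, y = [], [], []
for _ in range(100_000):
    diligence, z = rng.gauss(0, 1), rng.gauss(0, 1)
    x = 12 + z + 2 * diligence + rng.gauss(0, 1)
    X.append([1.0, x])  # constant + endogenous regressor
    Z.append([1.0, z])  # constant is its own instrument
    y.append(10 + 0.5 * x + 3 * diligence + rng.gauss(0, 1))

beta = iv_estimator(X, Z, y)  # [intercept, slope]
print(beta)
```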

This technique is often implemented by means of two-stage least squares (2SLS). In the first stage of 2SLS, each endogenous explanatory variable is regressed on all valid instruments and all exogenous variables. Since the instruments are exogenous, the fitted values from this regression, which approximate the endogenous variable, are not correlated with the error term.

Intuitively, the method examines the relationship between y and only that part of the endogenous explanatory variables which is explained by the instruments. In the second stage, the regression of interest is estimated as usual, except that all endogenous explanatory variables are replaced by their fitted values from the first stage.

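The estimator obtained in this way is consistent. For the standard errors to be calculated correctly, however, the sum of squared residuals still has to be corrected: the residuals must be computed with the original endogenous variables rather than with their first-stage fitted values.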
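The two stages and the residual correction can be sketched as follows, with hypothetical data, an illustrative helper `fit_line`, one instrument and a constant. With a single instrument the second-stage slope coincides with the bivariate IV estimator. In this assumed setup the naive residuals built from the fitted values overstate the error variance (their large-sample value is about 17.25 instead of the true 10), which is why the sum of squared residuals has to be recomputed with the original x.

```python
import random


def fit_line(us, vs):
    """Least squares fit v = a + b*u; returns (a, b)."""
    n = len(us)
    mu, mv = sum(us) / n, sum(vs) / n
    b = (sum((u - mu) * (v - mv) for u, v in zip(us, vs))
         / sum((u - mu) ** 2 for u in us))
    return mv - b * mu, b


rng = random.Random(4)
zs, xs, ys = [], [], []
for _ in range(200_000):
    diligence, z = rng.gauss(0, 1), rng.gauss(0, 1)
    x = 12 + z + 2 * diligence + rng.gauss(0, 1)
    zs.append(z)
    xs.append(x)
    ys.append(10 + 0.5 * x + 3 * diligence + rng.gauss(0, 1))

# Stage 1: regress the endogenous x on the instrument (plus a constant).
a1, b1 = fit_line(zs, xs)
x_hat = [a1 + b1 * z for z in zs]

# Stage 2: regress y on the first-stage fitted values.
a2, b2 = fit_line(x_hat, ys)

n = len(ys)
# Naive: residuals taken directly from the second-stage regression (uses x_hat).
sigma2_naive = sum((y - a2 - b2 * xh) ** 2 for y, xh in zip(ys, x_hat)) / (n - 2)
# Corrected: the 2SLS coefficients combined with the original x.
sigma2_2sls = sum((y - a2 - b2 * x) ** 2 for y, x in zip(ys, xs)) / (n - 2)
print(f"slope {b2:.3f}, naive s2 {sigma2_naive:.2f}, corrected s2 {sigma2_2sls:.2f}")
```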

Possible Problems

A fundamental problem is finding a valid instrument, i.e. a variable that is not correlated with the dependent variable except through the endogenous explanatory variable. What makes this especially difficult is that this assumption cannot be tested statistically on the basis of the available data. In addition, IV estimators are consistent but in general not unbiased, so estimation based on valid instrumental variables requires larger samples. Another problem occurs when the instruments are only weakly correlated with the endogenous variable(s) (weak instruments). A common rule of thumb for a single endogenous variable is that the F-statistic of the first-stage regression should be greater than 10.
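The rule of thumb can be checked with a short sketch for the simplest case of one instrument and a constant, where the first-stage F-statistic equals the sum of squares explained by the instrument divided by the residual variance estimate. The function `first_stage_f`, the helper `draw` and the instrument strengths are illustrative assumptions; with several instruments, the F-test on all excluded instruments jointly would be needed instead.

```python
import random


def first_stage_f(zs, xs):
    """F-statistic of the first-stage regression x = a + pi*z + v
    with a single instrument z (one restriction, n - 2 residual d.o.f.)."""
    n = len(zs)
    mz, mx = sum(zs) / n, sum(xs) / n
    szz = sum((z - mz) ** 2 for z in zs)
    sxx = sum((x - mx) ** 2 for x in xs)
    szx = sum((z - mz) * (x - mx) for z, x in zip(zs, xs))
    explained = szx ** 2 / szz  # sum of squares explained by z
    ssr = sxx - explained       # residual sum of squares
    return explained / (ssr / (n - 2))


def draw(n, pi, seed):
    """Hypothetical first stage in which pi controls instrument strength."""
    rng = random.Random(seed)
    zs = [rng.gauss(0, 1) for _ in range(n)]
    xs = [12 + pi * z + rng.gauss(0, 2) for z in zs]
    return zs, xs


f_strong = first_stage_f(*draw(1_000, pi=1.0, seed=5))
f_weak = first_stage_f(*draw(1_000, pi=0.02, seed=6))
print(f"strong: F = {f_strong:.1f}, weak: F = {f_weak:.2f}")
```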

History

While instrumental variables are nowadays widely used especially in situations with omitted variables, they were historically first used as a solution to problems caused by simultaneity. When estimating supply and demand curves, for example, the problem is that data points are only available for equilibrium prices and quantities, i.e. quantities at which supply and demand are matched. The American economist Philip G. Wright published a book in 1928 titled The Tariff on Animal and Vegetable Oils. In one of the appendices to the book, Wright presented a method by which the demand and supply elasticities of butter and linseed oil can be estimated. This is considered the first study to use the instrumental variable approach. A second application discovered later was that instrumental variables can also correct problems caused by measurement errors. The now widespread use for correcting bias due to omitted variables was the last field of application for instrumental variables to be added.
