Panel data analysis

Panel data analysis is the statistical analysis of panel data within panel research.

Panel data analysis considers both dynamic aspects of individuals (such as the development of a characteristic over time) and their heterogeneity (differences between individuals). In addition, depending on the model chosen, one can distinguish between cohort, period, and age effects. Because the number of observations increases, the number of degrees of freedom rises and collinearity is reduced, so the estimates become more efficient. Compared with several independent cross-sectional regressions, panel data regressions yield better estimates of the effects of exogenous variables. By using individual-specific constants, the influence of constant, unmodeled variables can be captured; the estimator thus becomes more robust to incomplete model specification.


Static Linear Models

Static models do not take the temporal evolution of the dependent variable into account. Their use is appropriate if the reaction of the individuals depends only on the exogenous variables, not on earlier values of the observed quantity. Among them are

  • The pooled model,
  • The random-effects model (RE model) and
  • The fixed-effects model (FE model; rarely: within estimator, covariance model, individual dummy variable model).

Pooled model

In the pooled model, heterogeneity in both the time and the cross-sectional dimension is neglected; as in the conventional linear regression model, all coefficients are taken to be non-stochastic and identical for all observations. The estimators are more efficient than those of T separate cross-sectional regressions with N observations each, because the standard errors of the coefficients decrease with the growing number of observations, provided the coefficients do not differ significantly; heterogeneity, however, leads to biased estimators. It is also questionable whether the observations are independent when the same individuals are questioned repeatedly ("serial correlation").
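As an illustration, a minimal numpy sketch of the pooled model (hypothetical simulated data; all names and numbers are ours): the N*T observations are stacked and estimated by a single OLS regression.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical balanced panel: N individuals observed over T periods.
N, T = 50, 6
x = rng.normal(size=(N, T))
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=(N, T))  # true intercept 1, slope 2

# Pooled model: stack all N*T observations into one OLS regression,
# ignoring any heterogeneity across individuals or periods.
X = np.column_stack([np.ones(N * T), x.ravel()])
coef, *_ = np.linalg.lstsq(X, y.ravel(), rcond=None)
print(coef)  # approximately [1.0, 2.0]
```

With homogeneous individuals, as simulated here, the pooled estimates recover the common coefficients; with heterogeneous individuals they would be biased, as described above.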

Random-effects model (RE model)

In the random-effects model, also called random-intercept or error-components model, an individual-specific intercept α_i is introduced, which for each individual is the realization of a random variable that is identically distributed across all individuals:

  y_it = β0 + x_it'β + α_i + u_it = β0 + x_it'β + ε_it

Here y_it denotes the value of the dependent variable, x_it the vector of the K explanatory variables, and β the regression coefficients. The total error ε_it is composed of the individual-specific intercept α_i and the idiosyncratic (time-varying, unsystematic) error u_it.

Fixed-effects model (FE model)

In the fixed-effects model, by contrast, the intercept α_i varies systematically across individuals, while the slope coefficients β remain the same for all individuals: y_it = α_i + x_it'β + u_it. The α_i are thus parameters to be estimated and, as in the RE model, they capture the heterogeneity of individuals only through a level shift, i.e. through different α_i. The influence of the explanatory variables is assumed to be the same for all individuals. The model therefore explains why an observation deviates from the individual mean, but not the differences between the (average) values of different individuals. Time-constant variables are therefore not identified in the fixed-effects model. Typical examples of such unobserved individual-specific effects:

  • The unobservable abilities of management affect the profitability of a company.
  • Education influences a worker's salary.
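The level-shift logic of the FE model can be sketched with the within transformation on simulated data (all names and numbers are illustrative): the individual effect is deliberately correlated with the regressor, so pooled OLS is biased while the within estimator is not.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 100, 5

# Individual effect alpha_i, deliberately correlated with the regressor,
# so that pooled OLS is biased while the within estimator is not.
alpha = rng.normal(size=N)
x = alpha[:, None] + rng.normal(size=(N, T))
y = alpha[:, None] + 1.5 * x + rng.normal(scale=0.5, size=(N, T))  # true slope 1.5

# Within transformation: subtract each individual's time mean,
# which removes the time-constant alpha_i.
xd = x - x.mean(axis=1, keepdims=True)
yd = y - y.mean(axis=1, keepdims=True)
beta_fe = (xd.ravel() @ yd.ravel()) / (xd.ravel() @ xd.ravel())

# Pooled OLS slope for comparison (biased upward in this design).
xc = x.ravel() - x.ravel().mean()
yc = y.ravel() - y.ravel().mean()
beta_pooled = (xc @ yc) / (xc @ xc)
print(beta_fe, beta_pooled)
```

The within estimate lands near the true slope, while the pooled estimate absorbs part of the correlated individual effect.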

Comparison

In general, random-effects models are preferred when characteristics of a population are to be inferred from a sample of individuals. Fixed-effects models are particularly appropriate when inferences are to be drawn only for the sample under consideration; they should, however, also be used in the former case if α_i and x_it are correlated, since the random-effects model then yields inconsistent and biased estimators. An argument against FE models is the loss of degrees of freedom, since a new parameter must be estimated for each individual. If the variance of the values within an individual (within variance) is much smaller than the variance between individuals (between variance), the FE model is at a disadvantage: it ignores part of the information and assumes that the mean values of x and y say nothing about the relationship between the variables.

The two-way model

Although still a static approach, the two-way model additionally captures level differences that apply to all individuals but vary across periods, through a time-dependent constant λ_t: y_it = α_i + λ_t + x_it'β + u_it. The λ_t can be estimated analogously within an FE or an RE framework. Since the time-dependent constant must be redefined for each period, this model is not suitable for forecasting.

Another way to take changes over time into account is the use of so-called distributed lag models, which distribute the effect of a change in an independent variable on the explained variable over an infinite time horizon. Such a structure captures delayed effects arising from psychological, technological, or institutional factors. In these models, special attention must be paid to multicollinearity. In addition, problems arise in choosing the right number of lagged observations, and observed values are lost: with an increasing number of parameters, the number of available observations decreases.
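A finite-lag version of this idea can be sketched as follows (simulated data, numpy only; the lag weights 1.0, 0.6, 0.3 are arbitrary illustrations). Note how each added lag costs one usable observation at the start of the sample:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 2000
x = rng.normal(size=n)

# Effect of x distributed over three periods with weights 1.0, 0.6, 0.3.
y = np.zeros(n)
y[2:] = 1.0 * x[2:] + 0.6 * x[1:-1] + 0.3 * x[:-2] + rng.normal(scale=0.5, size=n - 2)

# Finite distributed-lag regression: y_t on x_t, x_{t-1}, x_{t-2}.
# Two observations are lost because the first two y values have no full lag history.
X = np.column_stack([x[2:], x[1:-1], x[:-2]])
coef, *_ = np.linalg.lstsq(X, y[2:], rcond=None)
print(coef)  # approximately [1.0, 0.6, 0.3]
```

Because the simulated regressor is serially uncorrelated, multicollinearity is absent here; with persistent regressors, the lag coefficients would be much harder to separate, as the text warns.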

The Hausman specification test is a procedure for deciding whether a fixed-effects model (FE model) or a random-effects model (RE model) is more appropriate.

Dynamic models

Dynamic models include a lagged endogenous variable, either implicitly via the error term (autoregressive models) or explicitly (LDV, "lagged dependent variable"), i.e. for example y_{i,t-1} if y_it is to be explained. This approach implements the intuitively plausible notion that last year's value provides a primitive forecast of the current value. The dynamic LDV model is:

  y_it = γ y_{i,t-1} + x_it'β + α_i + u_it

The coefficient γ cannot be interpreted causally (as in the static model) but describes the adjustment speed of the dynamic effect.
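In the LDV model the lagged dependent variable is correlated with the individual effect α_i, so the standard within (FE) estimator is biased for small T, as discussed in the estimation section below. A simulation sketch (hypothetical data, numpy only) makes this visible even with many individuals:

```python
import numpy as np

rng = np.random.default_rng(2)
N, T, gamma = 2000, 5, 0.5  # true autoregressive coefficient 0.5

alpha = rng.normal(size=N)
y = np.zeros((N, T + 1))
y[:, 0] = alpha  # rough starting value near the individual level
for t in range(1, T + 1):
    y[:, t] = gamma * y[:, t - 1] + alpha + rng.normal(size=N)

# Within (FE) regression of y_it on y_{i,t-1}: demean both within individuals.
ylag = y[:, :-1]
ycur = y[:, 1:]
yl = ylag - ylag.mean(axis=1, keepdims=True)
yc = ycur - ycur.mean(axis=1, keepdims=True)
gamma_fe = (yl.ravel() @ yc.ravel()) / (yl.ravel() @ yl.ravel())
print(gamma_fe)  # noticeably below the true 0.5: bias of order 1/T
```

Increasing T shrinks the bias, consistent with the O(1/T) behavior described below; increasing N alone does not help.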

Estimation methods

Estimation methods in the static models

In static models, the OLS estimator (pooled model), the LSDV estimator (least squares dummy variables, fixed-effects model), and the FGLS estimator (feasible generalized least squares, random-effects model) are used.
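A small numpy sketch (simulated data, names illustrative) of the equivalence underlying the LSDV estimator: regressing on one dummy per individual yields the same slope as OLS after the within transformation.

```python
import numpy as np

rng = np.random.default_rng(3)
N, T = 20, 4
alpha = rng.normal(size=N)
x = rng.normal(size=(N, T)) + alpha[:, None]
y = alpha[:, None] + 0.8 * x + rng.normal(scale=0.3, size=(N, T))  # true slope 0.8

# LSDV: regress y on x plus one dummy variable per individual.
D = np.kron(np.eye(N), np.ones((T, 1)))  # (N*T x N) block of individual dummies
X = np.column_stack([x.ravel(), D])
coef, *_ = np.linalg.lstsq(X, y.ravel(), rcond=None)
beta_lsdv = coef[0]

# Within estimator: OLS after demeaning per individual.
xd = x - x.mean(axis=1, keepdims=True)
yd = y - y.mean(axis=1, keepdims=True)
beta_within = (xd.ravel() @ yd.ravel()) / (xd.ravel() @ xd.ravel())
print(beta_lsdv, beta_within)  # identical up to numerical precision
```

The dummy-variable form makes the loss of degrees of freedom explicit: N extra parameters are estimated alongside the slope.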

Estimation methods in dynamic models

In dynamic models, the lagged endogenous variable y_{i,t-1} depends on α_i; after the within transformation, the demeaned lagged variable and the demeaned error terms are correlated with each other, regardless of whether α_i is treated as fixed or random. OLS estimators are therefore biased for finite time horizons T and not consistent; even for T = 30 the bias is considerable, the asymptotic bias being of order O(1/T). This Landau symbol means, simply put, that the bias decreases at most as quickly as 1/T.

An alternative is offered by certain GMM estimators (generalized method of moments), a generic term covering many linear and nonlinear estimation methods, including OLS and the instrumental variables (IV) methods discussed next. Such methods require no assumptions about the distribution of the error terms, allow for heteroscedasticity, and can be solved (numerically) even when no analytical solution is possible. IV estimators remain consistent when explanatory variables are correlated with the error term, provided no other conditions are violated. Such correlation can be caused, as here, by endogenous variables, but also by omitted explanatory variables, by self-selection (individuals take part in the survey only under circumstances they consider favorable), or by measurement error. In the IV method, the correlation between the explanatory variables and the error term is eliminated, at least asymptotically, by replacing them with variables that are closely related to the true regressors (that are "relevant") but neither correlate with the error term nor represent a linear combination of other explanatory variables (that are "valid"). If the number R of instruments equals the number K of explanatory variables, one speaks of an IV model (in this case exogenous variables serve as their own instruments); if R > K, the model is overidentified and one obtains the more efficient, but in finite samples possibly more biased, GIVE, the "generalized instrumental variable estimator".
The estimator in the case R = K is

  β_IV = (Z'X)^(-1) Z'y,

where Z denotes the matrix of available instruments. This equation can also be derived from the GIVE for R > K:

  β_GIVE = (X'Z W Z'X)^(-1) X'Z W Z'y

This estimator results from the minimization of a quadratic function of the sample moments. If the weighting matrix W is positive definite, the estimator is consistent, since the quadratic form to be minimized can only take positive values and tends toward zero as N increases. Since any scalar multiple of the inverse covariance matrix of the sample moments leads to efficient estimators, one obtains, under the assumption of the optimal weighting matrix W = (Z'Z)^(-1) (for homoscedastic errors):

  β_GIVE = (X'Z (Z'Z)^(-1) Z'X)^(-1) X'Z (Z'Z)^(-1) Z'y

The resulting GIVE is also called the 2SLS estimator (two-stage least squares), because it can be formed from two successive OLS regressions.
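A numpy sketch of this equivalence on simulated data (all names and numbers are illustrative): the closed-form GIVE with W = (Z'Z)^(-1) coincides with two successive OLS regressions.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 20000
z = rng.normal(size=(n, 2))                 # R = 2 instruments
e = rng.normal(size=n)                      # structural error
x = z @ np.array([1.0, 0.5]) + 0.8 * e + rng.normal(size=n)  # endogenous regressor
y = 2.0 * x + e                             # true coefficient 2.0

X = x[:, None]                              # K = 1 explanatory variable, R > K
# Stage 1: OLS of X on the instruments Z gives the fitted values X_hat.
P1, *_ = np.linalg.lstsq(z, X, rcond=None)
X_hat = z @ P1
# Stage 2: OLS of y on X_hat.
beta_2sls, *_ = np.linalg.lstsq(X_hat, y, rcond=None)

# Closed-form GIVE with weighting matrix W = (Z'Z)^(-1):
W = np.linalg.inv(z.T @ z)
A = X.T @ z @ W @ z.T @ X
b = X.T @ z @ W @ z.T @ y
beta_give = np.linalg.solve(A, b)
print(beta_2sls[0], beta_give[0])
```

Plain OLS of y on x would be biased upward here because x is built from e; both IV forms recover the true coefficient and agree with each other.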

Simulation studies have shown that the variances of IV estimators are often quite large for small to medium-sized samples. This is especially true in comparison with OLS estimators and is exacerbated by a low correlation between the endogenous regressor and the instrument, because the estimators then become inconsistent even at a low correlation of the instrument with the error term. The number of required observations depends on the model context. Another problem is the choice of instruments: while in the simplest case exogenous variables from previous periods, or differences of these, can be used, the further back in time they lie, the weaker they are likely to be. There are also computational limits: an IV estimator proposed by Ahn/Schmidt with additional moment conditions reaches 2,250 columns for 15 periods and 10 explanatory variables, orders of magnitude that many programs cannot handle even today.

The assumptions made about the moment conditions cannot be tested statistically. Only if more conditions than necessary are available (R > K) can a statement be made as to whether some moment conditions are superfluous, but not which ones. If the instruments are valid, additional moment conditions lead to more efficient estimators. The Arellano-Bond estimator (AB estimator) increases the number of these conditions by using lagged levels of the dependent and predetermined variables as well as changes in the exogenous variables:

  • M = (T-2)(T-1)/2 conditions for a model with one lagged variable and no exogenous variables,
  • M = (T-2)[K(T+1) + T]/2 conditions for a model with one lagged variable and K strictly exogenous variables,
  • M = (T-2)[K(T+1) + (T-1)]/2 conditions for a model with one lagged variable and K predetermined variables. These are, in contrast to strictly exogenous variables, allowed to depend on previous realizations of the error term: E(x_it u_is) ≠ 0 is possible for s < t.

In general, this results in the following estimator:

  δ = (ΔX'Z W Z'ΔX)^(-1) ΔX'Z W Z'Δy

Here Z is the matrix of the moment conditions (instruments), W the weighting matrix, and Δy and ΔX the changes of the explained and explanatory variables (Δy_it = y_it - y_{i,t-1} and Δx_it = x_it - x_{i,t-1}). However, the method assumes uncorrelated error terms. When testing the results, it must be kept in mind that the standard errors are biased downward, which can lead to unjustified conclusions about the significance of an explanatory variable.

With minor adjustments, this method can also be applied to unbalanced panel data.
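A minimal sketch of the first-difference IV idea behind such estimators (in the spirit of Anderson-Hsiao, a precursor of the AB estimator; simulated data, numpy only): differencing removes α_i, and the level y_{i,t-2} serves as instrument for the differenced lag.

```python
import numpy as np

rng = np.random.default_rng(5)
N, T, gamma = 8000, 6, 0.5  # true autoregressive coefficient 0.5

alpha = rng.normal(size=N)
y = np.zeros((N, T + 1))
y[:, 0] = alpha
for t in range(1, T + 1):
    y[:, t] = gamma * y[:, t - 1] + alpha + rng.normal(size=N)

# First differences remove alpha_i:  dy_t = gamma * dy_{t-1} + du_t.
dy = np.diff(y, axis=1)        # column j holds the difference at time j+1
dep = dy[:, 2:].ravel()        # dy_t      for t = 3..T
lag = dy[:, 1:-1].ravel()      # dy_{t-1}  for t = 3..T
inst = y[:, 1:-2].ravel()      # y_{t-2}: correlated with dy_{t-1}, not with du_t

# Just-identified IV estimate (single regressor, single instrument).
gamma_iv = (inst @ dep) / (inst @ lag)
print(gamma_iv)  # close to the true 0.5, unlike the biased within estimator
```

The full AB estimator stacks many such lagged levels per period into the matrix Z, which is exactly how the number of moment conditions grows so quickly with T.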
