Standard error

The standard error, or sampling error, is a measure of dispersion for an estimator $\hat\theta$ of an unknown population parameter $\theta$. It is defined as the standard deviation of the estimator, that is, as the square root of its variance: $\sigma(\hat\theta) = \sqrt{\operatorname{Var}(\hat\theta)}$.

For an unbiased estimator, the standard error is therefore a measure of the average deviation of the estimated parameter value from the true parameter value. The smaller the standard error, the more precisely the estimator can estimate the unknown parameter. The standard error depends, among other things, on

  • the sample size and
  • the variance in the population.

In general, the larger the sample size, the smaller the standard error; and the smaller the variance, the smaller the standard error.

The standard error plays an important role in the calculation of estimation errors, confidence intervals and test statistics.

Interpretation

The standard error provides an indication of the quality of the estimated parameter. The more individual values there are, the smaller the standard error and the more precisely the unknown parameter can be estimated. The standard error makes the measured dispersion (standard deviation) of two data sets with different sample sizes comparable by normalizing the standard deviation with respect to the sample size.

If the unknown parameter is estimated from several samples, the results vary from sample to sample. This variation does not, of course, come from variation in the unknown parameter (which is fixed), but from random influences such as measurement inaccuracies. The standard error is the standard deviation of the estimated parameter across many samples. In general, halving the standard error requires quadrupling the sample size.

In contrast, the standard deviation describes the dispersion that actually exists in a population, which is present even with the highest measurement precision and an infinite number of individual measurements (e.g. weight distribution, height distribution, monthly income). It shows whether the individual values lie close together or whether the data are widely spread.

Example

Suppose one studies the population of children who attend high schools with respect to their performance on an intelligence test. The unknown parameter is the mean test performance of children who attend a high school. If a random sample of size n (that is, with n children) is drawn from this population, the mean of all n measurement results can be calculated. If a second random sample with the same number of n children is then drawn and its mean is determined, the two means will not be exactly the same. If many further random samples of size n are drawn, the dispersion of all the empirically determined means about the population mean can be determined. This dispersion is the standard error. Since the mean of the sample means is the best estimator of the population mean, the standard error equals the dispersion of the empirical means about the population mean. It does not reflect the dispersion of intelligence among the children, but the precision of the calculated mean.
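The repeated-sampling idea in this example can be imitated with a small simulation. The following is only a sketch under stated assumptions: the population is taken to be normal with a hypothetical mean of 100 and standard deviation of 15 (values chosen for illustration, not taken from any study), and the function name `sd_of_sample_means` is made up for this example. It draws many samples of size n, computes each sample mean, and compares the dispersion of those means with the standard deviation divided by the square root of n.

```python
import random
import statistics


def sd_of_sample_means(mu, sigma, n, num_samples, rng):
    """Draw num_samples samples of size n and return the standard
    deviation of their means (an empirical standard error)."""
    means = []
    for _ in range(num_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return statistics.stdev(means)


rng = random.Random(0)      # fixed seed so the run is repeatable
mu, sigma = 100.0, 15.0     # hypothetical population values

empirical_25 = sd_of_sample_means(mu, sigma, 25, 2000, rng)
empirical_100 = sd_of_sample_means(mu, sigma, 100, 2000, rng)

# Values expected from sigma / sqrt(n)
theoretical_25 = sigma / 25 ** 0.5
theoretical_100 = sigma / 100 ** 0.5

print(f"n = 25:  empirical {empirical_25:.2f}, theoretical {theoretical_25:.2f}")
print(f"n = 100: empirical {empirical_100:.2f}, theoretical {theoretical_100:.2f}")
```

With four times as many children per sample, the dispersion of the sample means should come out roughly half as large, although the dispersion of the individual scores is unchanged.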

Notation

Various notations are used for the standard error, both to distinguish it from the standard deviation $\sigma$ of the population and to make clear that it is the dispersion of the parameter estimated from samples:

  • $\sigma(\hat\theta)$,
  • $\sigma_{\hat\theta}$ or
  • $\operatorname{SE}(\hat\theta)$.

Estimate

Since the standard deviation $\sigma$ of the population enters into the standard error, estimating the standard error requires estimating the population standard deviation with an estimator that is as nearly unbiased as possible.

Confidence intervals and tests

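The standard error plays a major role in confidence intervals and tests. If the estimator $\hat\theta$ of the parameter $\theta$ is unbiased and at least approximately normally distributed ($\hat\theta \approx N(\theta, \sigma(\hat\theta)^2)$), then

$$\frac{\hat\theta - \theta}{\sigma(\hat\theta)} \approx N(0, 1).$$

On this basis, $(1-\alpha)$ confidence intervals can be given for the unknown parameter $\theta$:

$$\hat\theta - z_{1-\alpha/2}\,\sigma(\hat\theta) \le \theta \le \hat\theta + z_{1-\alpha/2}\,\sigma(\hat\theta),$$

or tests can be formulated, for example of whether the parameter takes a hypothesized value $\theta_0$:

$$H_0\colon \theta = \theta_0 \quad \text{versus} \quad H_1\colon \theta \neq \theta_0,$$

and the test statistic is given, under $H_0$, by

$$\frac{\hat\theta - \theta_0}{\sigma(\hat\theta)} \approx N(0, 1).$$

Here $z_{1-\alpha/2}$ is the $(1-\alpha/2)$ quantile of the standard normal distribution, and $\pm z_{1-\alpha/2}$ are also the critical values of this test. As a rule, $\sigma(\hat\theta)$ must be estimated from the sample, so that

$$\frac{\hat\theta - \theta_0}{\hat\sigma(\hat\theta)} \approx t_{n-1}$$

holds, where $n$ is the number of observations. For large $n$ (a common rule of thumb is $n > 30$), the t-distribution can be approximated by the standard normal distribution.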
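As a rough illustration of how the standard error enters both constructions, the sketch below computes a normal-approximation confidence interval and a two-sided test from an estimate and its standard error. It assumes the standard error is known (so the normal rather than the t-distribution is used); the function names and the input values are hypothetical and serve only as an example.

```python
from statistics import NormalDist


def normal_confidence_interval(estimate, standard_error, alpha=0.05):
    """(1 - alpha) confidence interval: estimate +/- z * standard error."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # (1 - alpha/2) quantile of N(0, 1)
    return estimate - z * standard_error, estimate + z * standard_error


def two_sided_z_test(estimate, hypothesized_value, standard_error, alpha=0.05):
    """Return the test statistic and whether H0: parameter = hypothesized_value
    is rejected at level alpha."""
    statistic = (estimate - hypothesized_value) / standard_error
    critical_value = NormalDist().inv_cdf(1 - alpha / 2)
    return statistic, abs(statistic) > critical_value


# Hypothetical numbers: estimate 10.0 with standard error 2.0
lower, upper = normal_confidence_interval(10.0, 2.0)
statistic, reject = two_sided_z_test(10.0, 5.0, 2.0)

print(f"95% interval: [{lower:.2f}, {upper:.2f}]")
print(f"test statistic: {statistic:.2f}, reject H0: {reject}")
```

When the standard error is itself estimated from a small sample, the normal quantile would have to be replaced by the corresponding quantile of the t-distribution, which the Python standard library does not provide.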

Standard error of the arithmetic mean

The standard error of the arithmetic mean is equal to

$$\sigma(\bar X) = \frac{\sigma}{\sqrt{n}},$$

where $\sigma$ denotes the standard deviation of a single measurement and $n$ the sample size.

Derivation

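The mean of a sample of size $n$ is defined by

$$\bar x = \frac{1}{n} \sum_{i=1}^{n} x_i.$$

Consider the estimator

$$\bar X = \frac{1}{n} \sum_{i=1}^{n} X_i$$

with independent, identically distributed random variables $X_i$ with finite variance $\sigma^2$. The standard error is then defined as the square root of the variance of $\bar X$. Using the calculation rules for variances and the Bienaymé equation, one obtains

$$\sigma(\bar X)^2 = \operatorname{Var}\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n^2}\operatorname{Var}\left(\sum_{i=1}^{n} X_i\right) = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(X_i) = \frac{1}{n^2}\, n\,\sigma^2 = \frac{\sigma^2}{n},$$

from which the formula for the standard error follows. If $E(X_i) = \mu$, it follows analogously that

$$E(\bar X) = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \mu.$$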

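Calculation of $\sigma$

If a distribution is assumed for the sample variables, the standard error of the mean can be calculated from the variance of that distribution:

  • for the binomial distribution with parameters $1$ and $p$: $\sigma_p = \sqrt{\frac{p(1-p)}{n}}$,
  • for the exponential distribution with parameter $\lambda$ (expected value = standard deviation = $1/\lambda$): $\sigma_\lambda = \frac{1}{\lambda\sqrt{n}}$,
  • and for the Poisson distribution with parameter $\lambda$ (expected value = variance = $\lambda$): $\sigma_\lambda = \sqrt{\frac{\lambda}{n}}$.

Here

  • $\sigma_p$ and $\sigma_\lambda$ denote the standard error for the respective distribution, and
  • $n$ denotes the sample size.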
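A minimal sketch of these calculations follows. It assumes that the standard error in question is that of the mean of n independent observations, so that each result is the distribution's standard deviation divided by the square root of n, and that the exponential distribution is parameterized by its rate (expected value = standard deviation = 1/rate). The function names are hypothetical.

```python
import math


def se_binomial_proportion(p, n):
    """Standard error of the sample proportion of n Bernoulli(p) trials."""
    return math.sqrt(p * (1 - p) / n)


def se_exponential_mean(rate, n):
    """Standard error of the mean of n exponential observations;
    assumes the rate parameterization, so the standard deviation is 1 / rate."""
    return 1 / (rate * math.sqrt(n))


def se_poisson_mean(lam, n):
    """Standard error of the mean of n Poisson observations (variance = lam)."""
    return math.sqrt(lam / n)


# Illustrative parameter values only
print(se_binomial_proportion(0.5, 100))
print(se_exponential_mean(2.0, 25))
print(se_poisson_mean(4.0, 100))
```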

If the standard error of the mean is to be estimated, the variance $\sigma^2$ is estimated by the corrected sample variance.
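The estimate can be sketched in a few lines. The example assumes that "corrected sample variance" means the variance with divisor n − 1, which is what `statistics.variance` computes; the function name and the data are made up for illustration.

```python
import math
import statistics


def estimated_standard_error(sample):
    """Estimate the standard error of the mean as sqrt(s^2 / n),
    where s^2 is the corrected sample variance (divisor n - 1)."""
    n = len(sample)
    if n < 2:
        raise ValueError("at least two observations are needed")
    return math.sqrt(statistics.variance(sample) / n)


sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up measurements
print(f"mean = {statistics.fmean(sample):.2f}, "
      f"estimated standard error = {estimated_standard_error(sample):.3f}")
```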

Example

For the ice cream data, the arithmetic mean of per capita ice cream consumption (measured in pints), its standard error and the standard deviation were calculated for the years 1951, 1952 and 1953.

For 1951 and 1952, the estimated means, the standard deviations and the numbers of observations are about the same, so the estimated standard errors are also approximately equal. In 1953, the number of observations is smaller and the standard deviation is larger. The standard error is therefore almost twice as large as in 1951 and 1952.

The results can be displayed graphically in an error bar chart showing the 95% estimation intervals for 1951, 1952 and 1953. If the estimator is at least approximately normally distributed, the 95% estimation intervals are given by $\bar x_j \pm 1.96\sqrt{s_j^2/n_j}$, with the sample means $\bar x_j$, the sample variances $s_j^2$ and the numbers of observations $n_j$.

In such a chart one can clearly see that the mean for 1953 is estimated less precisely than the means for 1951 and 1952 (longer bars for 1953).
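The effect of fewer observations and a larger spread on the interval length can be reproduced with invented numbers. The values below are not the ice cream data; they are hypothetical figures chosen so that one group is smaller and more dispersed than the other, and the function name is an assumption of this sketch. Each interval is the mean plus or minus 1.96 times the estimated standard error.

```python
import math
import statistics


def estimation_interval_95(values):
    """Approximate 95% interval: mean +/- 1.96 * sqrt(s^2 / n)."""
    n = len(values)
    mean = statistics.fmean(values)
    half_width = 1.96 * math.sqrt(statistics.variance(values) / n)
    return mean - half_width, mean + half_width


# Hypothetical per capita consumption figures, NOT the ice cream data
groups = {
    "year A": [0.34, 0.36, 0.35, 0.37, 0.33, 0.35, 0.36, 0.34, 0.35, 0.35],
    "year B": [0.30, 0.42, 0.33, 0.39, 0.36],  # fewer and more scattered values
}

intervals = {label: estimation_interval_95(v) for label, v in groups.items()}
for label, (low, high) in intervals.items():
    print(f"{label}: [{low:.3f}, {high:.3f}], length {high - low:.3f}")
```

The smaller, more scattered group gets the longer interval, which corresponds to the longer error bar in a chart.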

Standard error of the regression coefficient in the simple regression model

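The classical model for simple linear regression, $Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$, assumes that

  • the error terms $\varepsilon_i$ are normally distributed, $\varepsilon_i \sim N(0, \sigma^2)$,
  • the error terms are independent and
  • the values $x_i$ are fixed (that is, not random variables),

where $i = 1, \dots, n$ runs over the observations made. For the estimators

$$b_1 = \frac{\sum_{i=1}^{n}(x_i - \bar x)(Y_i - \bar Y)}{\sum_{i=1}^{n}(x_i - \bar x)^2} \quad \text{and} \quad b_0 = \bar Y - b_1 \bar x$$

it then follows that

$$b_0 \sim N\left(\beta_0,\ \sigma^2\,\frac{\sum_{i=1}^{n} x_i^2}{n\sum_{i=1}^{n}(x_i - \bar x)^2}\right) \quad \text{and} \quad b_1 \sim N\left(\beta_1,\ \frac{\sigma^2}{\sum_{i=1}^{n}(x_i - \bar x)^2}\right).$$

The standard errors are therefore

$$\sigma_{b_0} = \sigma\,\sqrt{\frac{\sum_{i=1}^{n} x_i^2}{n\sum_{i=1}^{n}(x_i - \bar x)^2}}$$

and

$$\sigma_{b_1} = \frac{\sigma}{\sqrt{\sum_{i=1}^{n}(x_i - \bar x)^2}}.$$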
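The following sketch computes the least-squares estimates and their standard errors for a simple linear regression. Two points are assumptions of this example rather than statements from the text: the unknown error standard deviation is replaced by the usual residual-based estimate with divisor n − 2, and the data and the function name are invented for illustration.

```python
import math
import statistics


def simple_regression_with_standard_errors(x, y):
    """Return (b0, b1, se_b0, se_b1) for the model y = b0 + b1 * x + error."""
    n = len(x)
    x_bar = statistics.fmean(x)
    y_bar = statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    b1 = sxy / sxx            # slope estimate
    b0 = y_bar - b1 * x_bar   # intercept estimate

    # Assumption: sigma is estimated from the residuals with divisor n - 2
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sigma_hat = math.sqrt(sum(r ** 2 for r in residuals) / (n - 2))

    se_b1 = sigma_hat / math.sqrt(sxx)
    se_b0 = sigma_hat * math.sqrt(sum(xi ** 2 for xi in x) / (n * sxx))
    return b0, b1, se_b0, se_b1


# Invented data, roughly following y = 2x
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1, se_b0, se_b1 = simple_regression_with_standard_errors(x, y)
print(f"b0 = {b0:.3f} (standard error {se_b0:.3f})")
print(f"b1 = {b1:.3f} (standard error {se_b1:.3f})")
```

Dividing a coefficient by its standard error gives the ratio that is compared with a critical value when testing whether the coefficient differs from zero.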

Example: For the ice cream data, a simple linear regression of per capita ice cream consumption (measured in pints) on the average weekly temperature (in degrees Fahrenheit) as the independent variable was performed. Estimating the regression model yielded:

Although the estimated regression coefficient for the average weekly temperature is very small, its estimated standard error is even smaller. The standard error, which measures the precision with which the regression coefficient is estimated, is a good 6.5 times smaller than the coefficient itself.
