Classical test theory

Classical test theory (CTT) is the most widely used psychometric test theory. Its model focuses on the accuracy of a measurement, that is, on the size of the measurement error; it is therefore often referred to as a theory of measurement error. Classical test theory attempts to explain how one can infer the true level of the measured personality trait from a person's test score.

Axioms

In classical test theory every observed test score X is decomposed into a true score T and a measurement error E, so that X = T + E, and the expected value of the measurement error is assumed to be zero. Further axioms assume that the measurement error is uncorrelated with the true score, with the true scores of other tests and with the measurement errors of other tests. The larger the measurement error, the smaller the true-score component and the less reliably the test measures the trait.

From the first two axioms it also follows that the expected value of the observed score equals the true score, E(X) = T.

This implies that the measurement error vanishes on average, either when the test is applied to many individuals or when it is applied several times to one and the same person.
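As a minimal illustration of these axioms (a sketch with assumed, simulated values, not part of the theory itself), the following snippet decomposes observed scores into true score plus error and checks that the error averages out over many persons:

```python
import numpy as np

rng = np.random.default_rng(0)
n_persons = 100_000

T = rng.normal(loc=50, scale=10, size=n_persons)  # true scores (assumed distribution)
E = rng.normal(loc=0, scale=5, size=n_persons)    # measurement error with expected value 0
X = T + E                                         # axiom: observed score = true score + error

print(E.mean())            # close to 0: the error vanishes on average
print(X.mean(), T.mean())  # mean observed score is close to mean true score
```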

Reliability

The central concept of classical test theory is reliability, that is, the dependability and accuracy (freedom from measurement error) with which a test score captures the true value. Reliability is theoretically defined as the ratio of the variance of the true scores to the variance of the observed test scores:

Reliability = Var(T) / Var(X) = Var(T) / (Var(T) + Var(E))

where Var(T) is the variance of the error-free true scores and Var(E) is the variance of the measurement error.

This representation makes an initially paradoxical conclusion apparent: an increase in the variance of systematic errors (bias) leads to an increase in reliability, because systematic errors are not counted as measurement error but are absorbed into the error-free part of the score.
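The following sketch illustrates both the definition and the paradox with simulated, assumed variances (true-score variance 100, error variance 25, bias variance 64); it is not a prescribed procedure of classical test theory:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

T = rng.normal(50, 10, n)  # true scores, Var(T) = 100
E = rng.normal(0, 5, n)    # unsystematic measurement error, Var(E) = 25
B = rng.normal(0, 8, n)    # stable person-specific bias (systematic error), Var(B) = 64

X = T + E
print(T.var() / X.var())               # reliability without bias: about 100/125 = 0.80

# The stable bias is absorbed into the error-free part of the score,
# so the reliability ratio paradoxically increases.
X_biased = T + B + E
print((T + B).var() / X_biased.var())  # about 164/189 = 0.87
```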

Estimation method for determining the reliability

Since the true scores are unknown, reliability can only be estimated. One method is the so-called split-half reliability, in which the test is split at the item level into two halves of equal size, which are then correlated with each other. This method is now mainly of historical interest.
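A minimal sketch of a split-half estimate, assuming an (n_persons x n_items) score matrix and an odd/even item split; stepping the half-test correlation up to full test length with the Spearman-Brown formula is the usual convention, though not mentioned above:

```python
import numpy as np

def split_half_reliability(items: np.ndarray) -> float:
    """Split-half reliability of an (n_persons, n_items) score matrix."""
    half1 = items[:, 0::2].sum(axis=1)   # sum score over the odd-numbered items
    half2 = items[:, 1::2].sum(axis=1)   # sum score over the even-numbered items
    r = np.corrcoef(half1, half2)[0, 1]  # correlation between the two halves
    return 2 * r / (1 + r)               # Spearman-Brown correction to full length
```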

Far more common today is a method that can be regarded as a generalization of split-half reliability: each item is treated as a separate test part and correlated with the remaining items of the subscale. For this purpose Cronbach's alpha is often used, which also serves as a measure of internal consistency. The alpha coefficient is regarded as a lower bound of the reliability estimate. Cronbach's alpha assumes homogeneity of the items without testing this assumption.
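A minimal sketch of Cronbach's alpha, assuming the same kind of (n_persons x n_items) score matrix as above:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_persons, n_items) score matrix."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)      # variance of each single item
    total_variance = items.sum(axis=1).var(ddof=1)  # variance of the sum score
    return k / (k - 1) * (1 - item_variances.sum() / total_variance)
```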

Another important estimation method is test-retest reliability, which is the correlation of the same test administered at two different points in time. A test-retest reliability is worthless if the interval between the two measurement occasions is not reported. Applying test-retest reliability to constructs that change over time is nonsensical: the test-retest reliability of a test measuring hunger, for example, captures not the reliability of the test but only the volatility of appetite, which leads to an underestimation of reliability. Intervals that are too short are also problematic, since memory effects can lead to an overestimation of reliability.

Another method is the construction of parallel tests, i.e. tests that are assumed to measure the same true scores. The reliability can then be estimated from the correlation of two parallel tests X1 and X2; this is called parallel-test reliability. Its advantage is that it assumes neither item homogeneity (as Cronbach's alpha does) nor temporal stability (as test-retest reliability does), which is why it could be regarded as the theoretically ideal solution. In practice, however, it is extremely difficult to construct parallel test forms, which would require that the corresponding items do not differ in mean item difficulty, in item discrimination or even in their discrimination with respect to other scales. This contributes to the fact that this form of reliability estimation is rarely used. For certain performance tests, such as IQ tests, parallel test forms must exist anyway because of the risk of copying; in these cases the parallel-test reliability can be reported as a beneficial side effect.
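Both test-retest and parallel-test reliability reduce to a correlation between two score vectors; the following sketch uses made-up scores purely for illustration:

```python
import numpy as np
from scipy import stats

# scores of the same persons on two parallel forms (or on two occasions)
form_a = np.array([12, 18, 9, 22, 15, 17, 11, 20])
form_b = np.array([13, 17, 10, 21, 14, 18, 12, 19])

r, _ = stats.pearsonr(form_a, form_b)
print(r)  # parallel-test (or test-retest) reliability estimate
```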

Also worth mentioning is inter-rater reliability. It is used in particular for the measurement techniques of interview and observation to estimate reliability. For nominally scaled data Cohen's kappa is available, for metrically scaled data the intra-class correlation (ICC) is used, and for ordinal data Spearman's rho is a suitable measure.
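A minimal sketch with made-up ratings: Cohen's kappa for nominal judgments (here via scikit-learn) and Spearman's rho for ordinal ratings; the intra-class correlation is available in specialized packages (for example pingouin) and is omitted here.

```python
from scipy import stats
from sklearn.metrics import cohen_kappa_score

# nominal ratings of the same cases by two raters
rater1 = ["yes", "no", "yes", "yes", "no", "no", "yes"]
rater2 = ["yes", "no", "no", "yes", "no", "yes", "yes"]
print(cohen_kappa_score(rater1, rater2))  # Cohen's kappa for nominal data

# ordinal ratings on a five-point scale
scores1 = [1, 3, 4, 2, 5, 3]
scores2 = [2, 3, 5, 2, 4, 3]
rho, _ = stats.spearmanr(scores1, scores2)
print(rho)                                # Spearman's rho for ordinal data
```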

Objectivity

Objectivity plays a minor role in classical test theory. CTT is a theory whose axioms relate mainly to measurement error. It is thus a theory of measurement error and thereby indirectly a theory of reliability, which is after all defined as freedom from (non-systematic) measurement error. Objectivity can be understood as a sub-aspect of reliability, since objectivity concerns the extent to which the variance of the test score cannot be attributed to variance originating from the experimenter and the test conditions (e.g. experimenter effects). Objectivity thus covers measurement errors that arise through the test administrator and the test conditions (as does reliability) and can be divided into several aspects:

  • Administration objectivity - test results do not vary as a result of different test conditions on different measurement occasions
  • Scoring objectivity - test scores or results do not vary as a result of different scorers
  • Interpretation objectivity - the conclusions drawn from the test result do not vary as a result of different evaluators

For the last two aspects in particular, the relationship to reliability is obvious. In theory, both aspects could be quantified by inter-rater agreement. In practice, however, one usually just ensures conditions that are believed to bring about objectivity. Thus a standardized test with fixed interpretation aids in the manual is regarded as a guarantor of scoring and interpretation objectivity, while standardized test conditions are supposed to ensure administration objectivity. Usually, objectivity is therefore only rated as 'given' or 'not given'.

Validity

Analogously to reliability, validity can be understood within classical test theory as the proportion of the variance that is due exclusively to the measured construct and not to unsystematic random error or systematic bias.

Validity = Var(C) / (Var(C) + Var(S) + Var(E))

where Var(C) is the variance that is due solely to the construct under investigation, Var(S) is the variance of the systematic bias and Var(E) is the variance of the measurement error.

In contrast to reliability, an increase in systematic error thus leads to a reduction of validity, which is intuitively plausible.
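A small numeric sketch of this contrast, using the same kind of assumed variance decomposition as above (construct variance 100, error variance 25, bias variance 40):

```python
def reliability(var_construct, var_bias, var_error):
    # systematic bias counts as part of the stable, error-free variance
    return (var_construct + var_bias) / (var_construct + var_bias + var_error)

def validity(var_construct, var_bias, var_error):
    # only the variance due to the construct counts towards validity
    return var_construct / (var_construct + var_bias + var_error)

print(reliability(100, 0, 25), validity(100, 0, 25))    # 0.80 and 0.80
print(reliability(100, 40, 25), validity(100, 40, 25))  # about 0.85 and 0.61
```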

Estimation method for determining the validity

The validity of a test is much more difficult to estimate than its reliability. One reason is that validity, unlike reliability, is a very heterogeneous concept that can be estimated in practice by many different kinds of approaches. On the other hand, there are aspects of validity that cannot be quantified at all, or whose quantification is not common in the practice of test construction. Three main forms of (psychometric) validity are relevant for test construction:

  • Content validity: concerns, among other things, whether the items are actually suitable for capturing a particular construct. In practice it is usually judged by experts as either given or not given. At least in theory there is the possibility of capturing expert judgments on the items quantitatively, for example by means of inter-rater agreement measures.
  • Construct validity: related to content validity, but in contrast to content validity it provides intersubjectively (empirically and quantitatively) verifiable evidence that the relevant construct, and no other, is in fact measured. This is examined in different ways: internal structure / factorial validity - testable with EFA, CFA and SEM
  • Discriminant and convergent validity - examined with dissimilar/similar tests that measure a different/the same construct; determined, for example, by bivariate correlations (see the sketch after this list). Confirmatory tests of MTMM matrices, for example via CFA, are also applicable.
  • Criterion validity: in practice often the most important quality criterion. It specifies how well, for example, results of other tests or behaviors can be predicted from the test result, and corresponds to the correlation with an external criterion (e.g. the correlation between intelligence and career success). It can be differentiated according to the temporal relation between test result and criterion: Retrospective validity - how highly a current measurement correlates with measurements in the past that are caused by the same construct
  • Concurrent validity - how highly a current measurement correlates with other current measurements that are caused by the same construct
  • Predictive validity - how highly measurements correlate with measurements carried out later that are caused by the same construct
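A minimal sketch of such correlation-based validity evidence, with made-up scores purely for illustration:

```python
import numpy as np
from scipy import stats

# illustrative scores: the new test, a similar test for the same construct,
# a dissimilar test for a different construct, and an external criterion
new_test   = np.array([10, 14,  9, 18, 12, 16, 11, 15])
similar    = np.array([11, 15,  8, 17, 13, 15, 10, 16])
dissimilar = np.array([ 5,  7,  9,  4,  8,  6,  9,  5])
criterion  = np.array([ 2,  4,  1,  5,  3,  4,  2,  4])

print(stats.pearsonr(new_test, similar)[0])     # convergent validity: should be high
print(stats.pearsonr(new_test, dissimilar)[0])  # discriminant validity: should be low
print(stats.pearsonr(new_test, criterion)[0])   # criterion-related validity
```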

Advantages of classical test theory

  • The assumptions of classical test theory are simple and mathematically quite modest in contrast to probabilistic test theory
  • CTT has already been implemented in many tests and has thus proven itself in practice

Criticism

  • The assumption of a single error term may be too coarse, since different types of error need to be distinguished. The extended latent state-trait model (Steyer and others) provides a further approach here.
  • The sample dependence of reliability, item difficulty and item discrimination is not, or only insufficiently, taken into account in CTT.
  • The homogeneity of items cannot be examined within the framework of CTT.
  • According to the attenuation paradox, the criterion-related validity of a test decreases with increasing reliability of the criterion and of the test being validated.
  • Classical test theory can only measure stable personality traits. If the true value actually changed, this would contradict the second axiom, according to which the expected value of the error (or the mean of the sum of errors) is zero.
  • Data at the level of an interval scale are assumed, because means and variances are calculated.

Alternative psychometric models

Psychometric data can also be evaluated using latent trait theories (e.g. the Rasch model). These can solve some of the problems associated with CTT, but they also create new ones (see also probabilistic test theory).
