Validity (statistics)

Validity (from Latin validus, "strong", "effective") refers primarily to the argumentative weight of a (mostly scientific) statement, investigation or theory.

If science is understood as a system for generating and refining assumptions about cause-effect relationships, validity refers to the soundness and robustness of these assumptions. In contrast to the fundamental falsifiability and verifiability of a scientific statement, validity is a (graded) criterion for the robustness of a particular statement. In the context of empirical investigations, however, validity also refers to the quality of the operationalization of the individual factors described in the causal models, the constructs.

Validity is therefore, on the one hand, the soundness of the operationalization ("To what extent does the test instrument measure what it is supposed to measure?") and, on the other hand, the soundness of the statements or conclusions based on the measurements ("To what extent is it true that X affects Y?").

Validity as a criterion for measuring instruments

With good measuring instruments, the measured values are independent of the person taking the measurement; this criterion is called objectivity or inter-observer agreement. Good measuring instruments also reliably deliver the same measurements for the same objects; this criterion is called reliability or reproducibility. The third criterion, validity, is a measure of whether the data generated by the measurement represent the quantity to be measured as intended. Only then can the data be meaningfully interpreted. Validity is assessed by expert judgment. The quality criteria build on each other: without objectivity there is no reliability, and without reliability there is no validity.

Validity as a quality criterion for psychological tests

Psychological tests can be regarded as measuring instruments; objectivity, reliability and validity are therefore also the most important quality criteria of psychodiagnostic methods. In its Technical recommendations for psychological tests and diagnostic techniques (1954), the American Psychological Association proposed four types of validity: content validity, construct validity, and predictive (prognostic) and concurrent (diagnostic) criterion validity, of which "historically and practically [ ... ] criterion-related validity is the most significant aspect". "The agreement on a validity criterion is, like all agreements, not something final, but may be subject to constant change. [ ... ] It remains up to each test user to accept this criterion, to reject it, or to look for a better one."

Example

The intelligence test is often used as a standard example. Considering the three quality criteria of objectivity, reliability and validity, it is assumed for this example that the first two are well met: the IQ test is designed so that its result is (almost) independent of the person administering it (objectivity) and can be reproduced on repetition (reliability). The validity of the test procedure, however, is often doubted: critics argue that the intelligence test makes no (accurate) statement about true intelligence (that is, the construct "intelligence") and that intelligence therefore cannot be measured in this way (see also criticism of the concept of intelligence).

Content validity

Content validity is assumed if a method for measuring a particular construct or characteristic is the best possible operationalization of that construct. This is the case, for example, with interest and knowledge tests: a class test or a driving test directly represents the abilities to be measured. It is therefore also called logical or trivial validity. Whether content validity is given is decided by experts through ratings.

Construct validity

The term "construct" refers to theoretical property dimensions (latent variables). Construct validity concerns whether statements based on the operationalization are admissible for the entire underlying construct. This is usually the case when the range of meaning of the construct is represented completely, precisely and comprehensibly. The empirical indicators of construct validity are convergent validity (measures of the same construct agree closely) and discriminant, or divergent, validity (measures of different constructs agree only weakly).

Both convergent and discriminant validity must be given to provide complete evidence of construct validity. Empirically, convergent and discriminant validity are special cases of criterion validity.

In the multitrait-multimethod analysis, convergent validity and discriminant validity are compared using a single sample. Put briefly, the convergent validity is expected to be greater than the discriminant validity.
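The comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two traits, the two methods and all scores are invented for the example, and the Pearson correlation is used as the measure of agreement. Correlations between the same trait measured by different methods are collected as convergent, correlations between different traits as discriminant.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical scores of six people: two traits, each measured by two methods.
scores = {
    ("anxiety", "questionnaire"): [2, 4, 5, 7, 8, 9],
    ("anxiety", "observer"): [1, 5, 4, 6, 9, 8],
    ("sociability", "questionnaire"): [6, 3, 8, 2, 7, 4],
    ("sociability", "observer"): [7, 2, 9, 3, 6, 5],
}

convergent = []    # same trait, different method
discriminant = []  # different traits (same or different method)
keys = list(scores)
for i, first in enumerate(keys):
    for second in keys[i + 1:]:
        r = pearson(scores[first], scores[second])
        if first[0] == second[0]:
            convergent.append(r)
        else:
            discriminant.append(r)

mean_convergent = sum(convergent) / len(convergent)
# Absolute values: a strong negative correlation would also count against
# discriminant validity.
mean_discriminant = sum(abs(r) for r in discriminant) / len(discriminant)

print(f"mean convergent correlation:   {mean_convergent:.2f}")
print(f"mean discriminant correlation: {mean_discriminant:.2f}")
```

With these invented data the convergent correlations come out clearly higher than the discriminant ones, which is the pattern expected when construct validity holds. A full multitrait-multimethod analysis inspects the individual cells of the matrix rather than only the two averages.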

Factors for reduced construct validity can be:

  • Vague definition of the construct
  • Mono-operation bias: only one aspect of the construct is examined
  • Mono-method bias: only one method is used for the operationalization of the construct
  • Hypothesis guessing by the subjects (Hawthorne effect)
  • Social desirability
  • Expectations of the experimenter (Rosenthal effect)
  • Omitting relevant factor levels
  • More than one independent variable is effective (see confounding effect)
  • Interaction between measurement and treatment
  • Limited generalizability to similar variables

Criterion validity

Criterion validity refers to the relationship between the results of the measuring instrument and an empirical criterion (Schnell, Hill & Esser, 2005, p. 155). For example, a researcher examines how well a new intelligence test agrees with the subjects' school grades in order to check the validity of the test. One speaks of "internal (criterion) validity" if another test that is recognized as valid is used as the criterion. If an objective measure (for example, psychophysiological measures or economic variables) or an expert rating is used as the criterion, one speaks of external (criterion) validity. A further distinction is made according to when the criterion is available: concurrent (diagnostic) validity if it is measured at the same time as the test, and predictive (prognostic) validity if it is measured later.

Face validity

Face validity depends on whether a measuring instrument seems plausible to laypeople. Face validity says nothing about the actual validity, that is, the content, criterion and construct validity, but it determines the acceptance of a measurement method. Even measuring instruments with very low validity (such as unstructured job interviews) enjoy high face validity and are therefore often used in practice.

Validity of statements about causal relationships

Building on the operationalization of the individual constructs, researchers in most empirical studies draw conclusions about cause-effect relationships only in the statistical analysis, and then with respect to their causal hypotheses. The terms statistical, internal and external validity refer to the formation, soundness and transferability of these (inductive) inferences. The degree of validity of these conclusions can only ever be discussed and assessed, never proven; it therefore makes more sense to speak of a degree of validity than of the presence (or absence) of these forms of validity.

Statistical validity

For statements or conclusions drawn in empirical studies (usually about cause-effect relationships), a high degree of statistical validity is assumed if the reliability of the measurement instruments and the statistical power of the selected statistical methods are high, the error variance was generally kept small, the mathematical assumptions of the statistical methods were not violated, and individual significant results were not "fished out" (for example, from a correlation matrix), a practice known as fishing.
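Why fishing in a correlation matrix undermines statistical validity can be illustrated with a short calculation. The sketch below is not taken from the source: it assumes a significance level of 0.05, treats the tests as independent (a simplification, since correlations within one matrix are not independent), and shows the Bonferroni adjustment as one common, conservative correction. All function names are invented for the example.

```python
def n_pairwise_correlations(n_variables):
    """Number of distinct correlations in a matrix of n_variables variables."""
    return n_variables * (n_variables - 1) // 2


def familywise_error_rate(n_tests, alpha=0.05):
    """Chance of at least one false positive among n_tests independent tests
    when every null hypothesis is true."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_alpha(n_tests, alpha=0.05):
    """Per-test significance level that keeps the familywise rate at or
    below alpha (Bonferroni correction)."""
    return alpha / n_tests


n_tests = n_pairwise_correlations(10)  # 10 variables give 45 correlations
print(n_tests)
print(round(familywise_error_rate(n_tests), 2))
print(bonferroni_alpha(n_tests))
```

Under these assumptions, ten variables already yield 45 correlations, and the chance that at least one of them reaches significance purely by chance is roughly 90 percent. Picking out the "significant" cells afterwards therefore says little about real effects.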

Internal validity

For statements or conclusions drawn in empirical studies, a high degree of internal validity is assumed when alternative explanations for the presence or size of the observed effects can be largely excluded. Internal validity (or ceteris paribus validity) is given when the change in the dependent variable can be clearly attributed to the variation of the independent variable (no alternative explanation). To ensure this, confounding variables must be controlled or neutralized by methods such as elimination, holding constant and matching (parallelization). So that the effects cannot be attributed to characteristics of the subjects, the subjects must be randomly assigned to the experimental conditions.

The internal validity is threatened by:

  • History. Any unplanned event between two measurements can have an unwanted effect on the subjects. Example: at the first measurement, before treatment with a new antidepressant, the weather is cold and rainy; at the second measurement, which checks the effect of the drug, the weather is warm and sunny.
  • Maturation. Subjects change between two measurements simply by getting older.
  • Reactivity. Subjects may react to the measurements themselves, for example through habituation or sensitization, especially when the measurement process is unpleasant.
  • Change in the measuring instrument. During a study, the properties of the measurement instruments, including the people taking the measurements, can change. They may measure more accurately through experience, for example, or less accurately through growing boredom. The treatment can also move the dependent variable into a range of values in which the measurement instrument is less accurate. This can lead to floor or ceiling effects. Example: to measure the effect of cognitive training for children, an intelligence test is used. The training is so successful that at the second measurement all children achieve full marks.
  • Regression to the mean. This statistical artifact can be superimposed on treatment effects if, for example, subjects with particularly high (or low) initial values on the characteristic in question are excluded from the outset in order to prevent floor or ceiling effects.
  • Selection by inadequate randomization. If subjects are not assigned to the experimental conditions at random, the experimental and control groups may differ even before the treatment, so that the measurement of the treatment effect is distorted. Moreover, history, maturation and instrument effects can affect the groups in different ways.
  • Attrition (dropout). If subjects drop out during the study, this may be due to the treatment. The groups, which are smaller at the second measurement, are then the result of an unintended selection.
  • Direction of causal inference. A causal relationship between independent and dependent variable is doubtful if (in another study) an effect of the dependent on the independent variable is also found and this correlation cannot be explained by a third variable.
  • Exchange of information. If subjects interact between measurements (for example, "I think I belong to the placebo group"), this may affect the next measurement. Conformity effects can be superimposed on the treatment effects, or one group may react to its experimental conditions being much more unpleasant than those of the other group, for example with compensation or demotivation.
  • Rosenthal effects. Through gestures, facial expressions and word choice, the experimenter unconsciously reveals more about the experiment than the subject is allowed to know. Autosuggestion and suggestion can be distinguished here. In the former, the investigator, despite consciously striving for neutrality, tends to collect data that support their anticipations and hypotheses. In suggestion, these expectations are communicated to the subject, who then acts in line with the experimenter's anticipations and provides matching data (good subject effect).

In English, the mnemonic THIS MESS is used for this. The acronym refers to eight factors that represent threats to internal validity, namely Testing (see reactivity), History, Instrument change (change in the measuring instrument), Statistical regression toward the mean (regression to the mean), Maturation, Experimental mortality (attrition), Selection (selection by inadequate randomization) and Selection interaction (interaction between selection and some other factor, such as maturation only in the experimental group).
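Regression to the mean, one of the threats listed above, can be made visible with a small simulation. The sketch rests on assumptions that are not part of the source: every observed score is a stable true score plus independent measurement noise, both normally distributed with invented parameters, and no treatment takes place at all. The subjects with the highest first scores are selected and measured again.

```python
import random

random.seed(0)  # fixed seed so that the run is repeatable

n_subjects = 10_000
# Assumed model: observed score = stable true score + independent noise.
true_scores = [random.gauss(100, 10) for _ in range(n_subjects)]
first = [t + random.gauss(0, 10) for t in true_scores]
second = [t + random.gauss(0, 10) for t in true_scores]

# Select roughly the top 10 percent on the FIRST measurement only.
cutoff = sorted(first)[int(0.9 * n_subjects)]
selected = [i for i in range(n_subjects) if first[i] >= cutoff]

mean_first = sum(first[i] for i in selected) / len(selected)
mean_second = sum(second[i] for i in selected) / len(selected)

print(f"selected group, first measurement:  {mean_first:.1f}")
print(f"selected group, second measurement: {mean_second:.1f}")
```

Although nothing was done to the selected group, its mean at the second measurement lies closer to the overall mean of 100 than at the first. In a real study, such a shift could be mistaken for a treatment effect or could mask one.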

External validity

External validity (also generality, generalizability or ecological validity; see Ecological fallacy) refers to the correspondence between the actual and the intended object of investigation. The central issue is generalizability (induction). According to the classical view, statements or conclusions drawn in empirical studies have a high degree of external validity if the results can be generalized (a) to the population for which the study was designed and (b) beyond the specific setting of the study to other designs, instruments, places, times and situations, that is, if they are generally valid and generalizable. The most common threat to person-related external validity (a) lies in practical problems in recruiting the informants, that is, the people who are interviewed or the subjects needed for an experiment. Is their participation forced or voluntary? How did they learn of the opportunity to participate (through a newspaper advertisement, a poster, etc.)? What motivates them to participate (interest in the topic, need for money, etc.)? These are filters that can reduce the quality of the sample. The most common threat to situation-related external validity (b) lies in the artificiality of laboratory experiments.

External validity increases with each successful replication of the findings, because repetition with other subjects (age group, gender, culture, etc.) or with variations of the experimental conditions reduces the restrictions on the validity of the findings. Example: as long as Pavlov had only shown that dogs salivate at the sound of a bell when the bell had previously rung often enough together with the presentation of food, he had shown exactly that and nothing more. One can speak of the phenomenon of classical conditioning only when many kinds of subjects show many types of conditioned responses to many types of conditioned stimuli. The method of meta-analysis is available for the statistical analysis of replication studies.
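How a meta-analysis combines replication studies can be sketched with the simplest pooling rule. The following example assumes a fixed-effect model with inverse-variance weights, which is only one of several common approaches and is not prescribed by the source; the effect sizes and standard errors of the three studies are invented.

```python
from math import sqrt


def fixed_effect_pool(effects, std_errors):
    """Inverse-variance weighted mean of study effects and its standard error
    (fixed-effect model: all studies are assumed to estimate one common effect)."""
    weights = [1 / se ** 2 for se in std_errors]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    pooled_se = sqrt(1 / sum(weights))
    return pooled, pooled_se


# Hypothetical results of an original study and two replications.
effects = [0.30, 0.45, 0.20]
std_errors = [0.10, 0.20, 0.10]

pooled, pooled_se = fixed_effect_pool(effects, std_errors)
print(f"pooled effect: {pooled:.3f}, standard error: {pooled_se:.3f}")
```

More precise studies (smaller standard errors) receive more weight, and the pooled standard error is smaller than that of any single study. If the replications differ substantially in subjects or conditions, a random-effects model is usually the more appropriate assumption.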

From this classical view, internal and external validity are in conflict: a high degree of internal validity is best achieved through highly controlled and therefore rather artificial (laboratory) conditions. Particularly realistic research designs, however, carry the risk of uncontrollable or overlooked confounding influences. From a deductivist perspective, this is only an apparent contradiction. Since both criteria were developed out of an inductivist research logic, the generalization of empirical findings (for example, from an experiment) is in the foreground. Here the replicability of the results under different conditions and with different samples is a reasonable question. A deductivist research logic, however, pursues a different goal. Here one tries to falsify a (general) theory on the basis of a specific prediction, rather than to verify a theory through sufficient observations as in an inductivist research logic. If, under this logic, the observation contradicts the theory, the theory is considered falsified. It is irrelevant whether the results are "representative" in any way. If the prediction of a theory is confirmed in an experiment, the theory is regarded as corroborated but must be subjected to further tests. Objections that challenge the validity of the results of the experiment in question are objections to the internal validity of the experiment.

The research design has a major influence on the admissibility and soundness of causal inferences; for this reason, the validities of experimental and quasi-experimental research designs are always critically scrutinized.

Validity in the biological nomenclature

In biological nomenclature, the term "validity" refers to the formal validity of a taxon (a systematic unit of living beings). Validity is given when the first description of the taxon satisfies the corresponding formal requirements (in botany referred to as "valid publication"). In this case, the name chosen for the taxon is considered "valid". If the name assigned to the taxon is not valid because of formal defects, the name is a nomen nudum.
