Inter-rater reliability

Inter-rater reliability (or inter-rater agreement) refers, in empirical (social) research (inter alia psychology, sociology, epidemiology, etc.), to the degree of agreement (concordance) between the ratings given by different observers ("raters"). It indicates the extent to which the results are independent of the particular observer, which is why it is, strictly speaking, a measure of objectivity. Reliability is a measure of the quality of the method used to measure a specific variable. A distinction is made between inter-rater and intra-rater reliability.

Inter-rater reliability

The same measurement is carried out on a given object by two different measuring instruments (raters). The results should be the same. Example: a questionnaire is applied by two different people to one and the same object. Any deviations that occur are recorded and expressed, for example as a percentage of agreement, as the inter-rater reliability.
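A minimal Python sketch of such a percentage of agreement between two raters (the function name and example data are purely illustrative):

def percent_agreement(ratings_a, ratings_b):
    """Share of objects on which two raters give the identical judgment."""
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return matches / len(ratings_a)

# two coders apply the same questionnaire item to ten objects (illustrative data)
coder1 = ["agree", "agree", "neutral", "disagree", "agree", "neutral", "agree", "disagree", "agree", "neutral"]
coder2 = ["agree", "neutral", "neutral", "disagree", "agree", "neutral", "agree", "agree", "agree", "neutral"]
print(percent_agreement(coder1, coder2))  # 0.8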

Intra-rater reliability

The same measurement is performed twice on a given object by one and the same measuring instrument (rater). The results should be the same. Example: a subject is interviewed twice, at different times, by the same interviewer.

Kappa statistics

There are a number of statistical methods that can be used to determine inter-rater reliability. If categorical judgments are made by two (or more) different observers who each assess several objects of observation (= cases, subjects), inter-rater reliability can be estimated using Cohen's kappa (for two raters) or Fleiss' kappa (for more than two raters). The kappa statistics assess the degree of concordance by taking into account, and comparing it with, the extent of agreement that would typically be reached by "rating at random". It is assumed that the individual raters' judgments are made completely independently of one another. Kappa can take values between -1.0 and +1.0, where +1.0 indicates perfect concordance and values at or below 0 indicate agreement no better than chance. The kappa statistics are particularly suitable for variables at the nominal scale level.
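As a minimal sketch, Cohen's kappa for two raters can be computed from the observed agreement p_o and the chance agreement p_e expected from the raters' marginal category frequencies, kappa = (p_o - p_e) / (1 - p_e); the function name and example data below are illustrative:

from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters who assign nominal categories to the same objects."""
    n = len(ratings_a)

    # observed agreement p_o: share of objects on which both raters chose the same category
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # expected chance agreement p_e from the raters' marginal category frequencies
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2

    return (p_o - p_e) / (1 - p_e)

# example: two raters categorize ten objects
rater1 = ["yes", "yes", "no", "no", "yes", "no", "yes", "no", "yes", "no"]
rater2 = ["yes", "no", "no", "no", "yes", "no", "yes", "yes", "yes", "no"]
print(cohens_kappa(rater1, rater2))  # 0.6: agreement corrected for chance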

The use of kappa statistics has also been criticized, however, because, owing to their mathematical shortcomings, the values of these statistics often do not permit a meaningful interpretation; Krippendorff's alpha is recommended instead.
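For orientation, a minimal Python sketch of Krippendorff's alpha for nominal data, computed as alpha = 1 - D_o / D_e from the observed and expected disagreement of a coincidence matrix (the function name and data layout are illustrative):

from collections import Counter
from itertools import permutations

def krippendorff_alpha_nominal(ratings_by_unit):
    """Krippendorff's alpha for nominal data.

    ratings_by_unit: list of lists; each inner list holds the values that the
    raters assigned to one unit (missing ratings are simply omitted).
    """
    coincidences = Counter()              # coincidence matrix o_ck
    for values in ratings_by_unit:
        m = len(values)
        if m < 2:                         # units with fewer than 2 ratings are unpairable
            continue
        for a, b in permutations(values, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    n = sum(coincidences.values())        # total number of pairable values
    n_c = Counter()                       # marginal totals per category
    for (a, _), w in coincidences.items():
        n_c[a] += w

    # observed and expected disagreement (nominal delta: 0 if equal, 1 otherwise)
    d_o = sum(w for (a, b), w in coincidences.items() if a != b) / n
    d_e = sum(n_c[a] * n_c[b] for a in n_c for b in n_c if a != b) / (n * (n - 1))
    return 1.0 - d_o / d_e

# four units, each coded by two raters (illustrative data)
print(krippendorff_alpha_nominal([["a", "a"], ["a", "b"], ["b", "b"], ["b", "b"]]))  # approx. 0.53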

Inter-rater correlation

For higher scale levels, other methods such as Pearson's product-moment correlation coefficient or the rank correlation coefficients of Spearman and Kendall can be used to determine the inter-rater correlation between two raters, whose paired judgment values are analyzed in relation to one another. The inter-rater correlation coefficient, however, merely describes whether the two measurement series are (in some way) related; deviations between the raters play no role. For example, consistent leniency or severity tendencies are not reflected in the coefficient.

Example: Rater 1 assesses a property of four objects on a scale, and Rater 2 rates the same objects on the same scale but consistently one scale point higher. The inter-rater correlation is r = 1 and thus perfect, although the raters never give the same value.
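A minimal Python sketch of this effect, with illustrative rating values of 1, 2, 3, 4 for Rater 1 and a constant offset of one point for Rater 2:

def pearson_r(x, y):
    """Pearson product-moment correlation of two equally long measurement series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (ss_x * ss_y) ** 0.5

rater1 = [1, 2, 3, 4]   # ratings of four objects (illustrative)
rater2 = [2, 3, 4, 5]   # the second rater is consistently one point more severe
print(pearson_r(rater1, rater2))  # 1.0, although the raters never agree exactly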

An alternative for ordinal data is Kendall's coefficient of concordance W, which quantifies the degree of agreement between two or more raters.
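A minimal Python sketch of Kendall's W under the simplifying assumption that no ties occur within a rater's ranking (the function name and example data are illustrative):

def kendalls_w(ratings):
    """Kendall's coefficient of concordance W for m raters judging the same n objects.

    ratings: list of m lists, one per rater, each containing that rater's values for the n objects.
    The values are converted to ranks per rater; ties are not handled in this sketch.
    """
    m, n = len(ratings), len(ratings[0])

    def to_ranks(values):
        order = sorted(range(n), key=lambda i: values[i])
        ranks = [0] * n
        for rank, idx in enumerate(order, start=1):
            ranks[idx] = rank
        return ranks

    rank_matrix = [to_ranks(r) for r in ratings]
    rank_sums = [sum(rank_matrix[i][j] for i in range(m)) for j in range(n)]
    mean_rank_sum = m * (n + 1) / 2
    s = sum((r - mean_rank_sum) ** 2 for r in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))

# three raters judge four objects; W = 1 means identical rankings, W = 0 means no agreement
print(kendalls_w([[1, 2, 3, 4], [2, 3, 4, 5], [10, 20, 30, 40]]))  # 1.0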

Intra-class correlation

For interval-scaled data, the intra-class correlation coefficient (ICC; Shrout & Fleiss 1979, McGraw & Wong 1996; also: intra-class correlation) describes the extent to which different raters assign the same value to the same object. It assumes interval-scaled data and is usually calculated when more than two observers are present and/or two or more measurement occasions are to be included.
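As a minimal sketch, one of the variants from Shrout & Fleiss (1979), the one-way random-effects ICC(1,1), can be computed from a one-way analysis of variance; the function name and example scores are illustrative:

def icc_oneway(scores):
    """One-way random-effects intra-class correlation, ICC(1,1) in Shrout & Fleiss' notation.

    scores: list of n lists, one per rated object, each holding k raters' interval-scaled values.
    """
    n, k = len(scores), len(scores[0])
    grand_mean = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]

    # between-objects and within-objects mean squares from a one-way ANOVA
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    ms_within = sum((x - m) ** 2 for row, m in zip(scores, row_means) for x in row) / (n * (k - 1))

    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# three raters score five objects on an interval scale (illustrative data)
scores = [[9, 10, 8], [6, 5, 7], [8, 8, 9], [2, 3, 2], [7, 6, 6]]
print(icc_oneway(scores))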
