Inter-rater reliability
In empirical (social) research (inter alia psychology, sociology, epidemiology), inter-rater reliability and inter-rater agreement denote the degree of agreement (concordance) between the assessments of different observers ("raters"). It indicates the extent to which the results are independent of the particular observer, which is why, strictly speaking, it is a measure of objectivity. Reliability is a measure of the quality of the method used to measure a particular variable. A distinction is made between inter-rater and intra-rater reliability.
Inter-rater reliability
The same measurement is carried out on a given object by two different measuring instruments; the results should agree. Example: a questionnaire is applied by two different people to one and the same object. Any deviations that occur are identified, and the inter-rater reliability is expressed as a percentage of agreement.
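As a minimal sketch of this idea (not part of the original text), the following Python snippet computes the percentage of matching judgments between two raters; the rating lists are invented purely for illustration.

```python
# Percentage agreement between two raters who judged the same six objects.
# The rating lists are invented illustrative data.
ratings_rater_1 = ["yes", "no", "yes", "yes", "no", "yes"]
ratings_rater_2 = ["yes", "no", "no", "yes", "no", "yes"]

matches = sum(a == b for a, b in zip(ratings_rater_1, ratings_rater_2))
percent_agreement = 100.0 * matches / len(ratings_rater_1)
print(f"Agreement: {percent_agreement:.1f} %")  # 5 of 6 ratings match -> 83.3 %
```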
Intra-rater reliability
The same measurement is performed twice on a given object with the same measuring instrument; the results should agree. Example: a subject is interviewed twice, at different times, by the same interviewer.
Kappa statistics
There are a number of statistical methods for determining inter-rater reliability. If two (or more) different observers simultaneously rate several objects of observation (cases, subjects) categorically, inter-rater reliability can be estimated with Cohen's kappa (for two raters) or Fleiss' kappa (for more than two raters). The kappa statistics assess the degree of concordance by comparing the observed agreement with the extent of agreement that would typically be achieved by rating at random. They assume that the individual raters' judgments are made completely independently of one another. Kappa can take values between +1.0 (high concordance) and −1.0 (low concordance). Kappa statistics are particularly suitable for variables at the nominal scale level.
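To make the comparison of observed and chance agreement concrete, here is a small sketch of Cohen's kappa for two raters; the ratings are invented for illustration, and ready-made implementations (for example cohen_kappa_score in scikit-learn) can be used instead.

```python
# Hedged sketch of Cohen's kappa for two raters and nominal categories.
# The ratings below are invented illustrative data.
from collections import Counter

rater_1 = ["a", "a", "b", "b", "c", "a", "b", "c"]
rater_2 = ["a", "b", "b", "b", "c", "a", "a", "c"]
n = len(rater_1)

# Observed agreement: proportion of objects rated identically.
p_observed = sum(x == y for x, y in zip(rater_1, rater_2)) / n

# Chance agreement: sum over categories of the product of both raters'
# marginal category proportions.
freq_1, freq_2 = Counter(rater_1), Counter(rater_2)
p_chance = sum((freq_1[c] / n) * (freq_2[c] / n)
               for c in set(rater_1) | set(rater_2))

kappa = (p_observed - p_chance) / (1 - p_chance)
print(f"observed={p_observed:.3f}  chance={p_chance:.3f}  kappa={kappa:.3f}")
```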
The use of kappa statistics has also been criticized because, owing to mathematical inadequacies, their values usually do not permit any meaningful conclusion; Krippendorff's alpha is recommended instead.
Inter-rater correlation
At higher scale levels, other methods use the Pearson product-moment correlation coefficient or the rank correlation coefficients of Spearman and Kendall to determine the inter-rater correlation between two raters, whose pairwise matched judgment values are set in relation to one another. The inter-rater correlation coefficient, however, merely describes an association (of some kind) between the two measurements; deviations between the raters play no role in it. For example, consistent leniency or severity tendencies are not taken into account.
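As a brief sketch (with invented ratings, not taken from the article), these coefficients can be computed with SciPy:

```python
# Inter-rater correlation between two raters' paired judgment values.
# The ratings are invented illustrative data.
from scipy.stats import pearsonr, spearmanr, kendalltau

rater_1 = [2, 3, 3, 4, 5, 5, 6]
rater_2 = [3, 3, 4, 5, 5, 6, 7]

print("Pearson r  :", pearsonr(rater_1, rater_2)[0])
print("Spearman rho:", spearmanr(rater_1, rater_2)[0])
print("Kendall tau :", kendalltau(rater_1, rater_2)[0])
```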
Example: Rater 1 rates four objects on a scale, and Rater 2 rates the same objects on the same scale but, say, one scale point higher in every case. The inter-rater correlation is then r = 1 and thus perfect, although the raters never actually agree.
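The following lines reproduce this effect with made-up values in which Rater 2 is a constant one point higher throughout:

```python
# A constant offset between the raters still yields a perfect correlation.
from scipy.stats import pearsonr

rater_1 = [1, 2, 3, 4]
rater_2 = [2, 3, 4, 5]  # no single exact agreement with rater_1

print(pearsonr(rater_1, rater_2)[0])  # 1.0
```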
An alternative for ordinal data is Kendall's coefficient of concordance W, which quantifies the degree of agreement among two or more raters.
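A minimal sketch of Kendall's W (without a correction for tied ranks), on invented rankings, could look like this:

```python
# Kendall's coefficient of concordance W for m raters who each rank n objects.
# The rank matrix is invented illustrative data (rows = raters, columns = objects).
import numpy as np

ranks = np.array([
    [1, 2, 3, 4, 5],
    [2, 1, 3, 5, 4],
    [1, 3, 2, 4, 5],
])
m, n = ranks.shape

rank_sums = ranks.sum(axis=0)                      # total rank per object
s = ((rank_sums - rank_sums.mean()) ** 2).sum()    # spread of the rank sums
w = 12 * s / (m ** 2 * (n ** 3 - n))               # 0 = no agreement, 1 = perfect
print(f"Kendall's W = {w:.3f}")
```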
Intra-class correlation
For interval-scaled data, the intra-class correlation coefficient (ICC; Shrout & Fleiss 1979, McGraw & Wong 1996; also: intra-class correlation) describes the extent to which paired values take on the same value. It presupposes interval-scaled data and is usually computed when more than two observers are present and/or two or more observation time points are to be included.
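As one hedged example, an ICC of type ICC(2,1) in the notation of Shrout & Fleiss (1979) can be computed from the mean squares of a two-way layout; the score matrix below is invented for illustration, and statistics packages provide ready-made ICC functions.

```python
# Hedged sketch of ICC(2,1): two-way layout with rows = rated targets and
# columns = raters, one score per cell. The scores are invented data.
import numpy as np

scores = np.array([
    [7.0, 8.0, 7.0],
    [5.0, 6.0, 6.0],
    [9.0, 9.0, 8.0],
    [4.0, 5.0, 4.0],
    [6.0, 7.0, 6.0],
])
n, k = scores.shape
grand_mean = scores.mean()

# Mean squares for targets (rows), raters (columns) and residual error.
ms_rows = k * ((scores.mean(axis=1) - grand_mean) ** 2).sum() / (n - 1)
ms_cols = n * ((scores.mean(axis=0) - grand_mean) ** 2).sum() / (k - 1)
ss_total = ((scores - grand_mean) ** 2).sum()
ms_error = (ss_total - ms_rows * (n - 1) - ms_cols * (k - 1)) / ((n - 1) * (k - 1))

# ICC(2,1): single-rating, absolute-agreement, two-way random-effects model.
icc_2_1 = (ms_rows - ms_error) / (
    ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n)
print(f"ICC(2,1) = {icc_2_1:.3f}")
```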