Censoring (statistics)

Censored data (also called truncated data) are data in which not all values of a statistical variable are known.

History

Daniel Bernoulli dealt with the problem of censored data as early as 1766, when he tried to demonstrate the efficacy of inoculation against smallpox.

Types of censored data

Right-censored data

If the event has not been observed by the end of the experiment, the data are called right-censored.

Three main types of right-censored data can be distinguished:

Type I: In experiments with a fixed start and end point, every observation for which the event has not yet occurred is censored at the end of the experiment. That is, with type I all censored observations are equal to the length of the experiment.

Type II: In experiments whose end point is determined by reaching a certain number of events, the censoring is called type II.

Type III: If the start and end points of the test subjects are not fixed but lie within the period covered by the experiment, the censoring is called type III. Observations are censored if the end point is unknown or if the event has not yet occurred at the last known date.
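The difference between type I and type II can be illustrated with a minimal Python sketch. The function names, the example lifetimes and the convention that an event falling exactly on the stop time still counts as observed are assumptions made for this illustration; ties between lifetimes are not handled specially.

```python
def censor_type_1(lifetimes, end_time):
    """Type I: the experiment stops at a fixed time.

    Returns (observed_time, event_observed) pairs. Subjects whose event
    has not occurred by end_time are recorded with the experiment length.
    """
    return [(min(t, end_time), t <= end_time) for t in lifetimes]


def censor_type_2(lifetimes, n_events):
    """Type II: the experiment stops once n_events events have occurred.

    Assumption: all lifetimes are distinct; with ties, more than
    n_events subjects could be marked as observed.
    """
    if not 1 <= n_events <= len(lifetimes):
        raise ValueError("n_events must be between 1 and the sample size")
    stop_time = sorted(lifetimes)[n_events - 1]
    return [(min(t, stop_time), t <= stop_time) for t in lifetimes]


# Hypothetical true event times of five test objects.
lifetimes = [2.0, 7.5, 4.0, 11.0, 9.0]

type_1 = censor_type_1(lifetimes, end_time=8.0)
type_2 = censor_type_2(lifetimes, n_events=2)
```

With these values, type I censors the two objects that outlive the experiment at 8.0, while type II stops at the second event (time 4.0) and censors the three remaining objects at that time.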

Left-censored and interval-censored data

  • If the event has already occurred at an unknown time in the past, the data are called left-censored.
  • If the event occurs unobserved between two points a and b, the data are called interval-censored.
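One common way to handle all of these cases uniformly is to store each observation as an interval that is known to contain the event time. The following sketch assumes that time is measured from an origin of 0 and that an unbounded upper limit is written as infinity; the function name `classify` and these conventions are illustrative choices, not part of any particular library.

```python
import math


def classify(lower, upper, origin=0.0):
    """Classify an observation given as the interval [lower, upper].

    The event time is only known to lie between lower and upper.
    """
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    if lower == upper:
        return "exact"              # event time fully observed
    if upper == math.inf:
        return "right-censored"     # event not yet seen at time lower
    if lower == origin:
        return "left-censored"      # event happened some time before upper
    return "interval-censored"      # event happened between a and b


kinds = [
    classify(3.0, 3.0),
    classify(5.0, math.inf),
    classify(0.0, 4.0),
    classify(2.0, 6.0),
]
```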

Examples and Applications

A simple example from a questionnaire is the question about age. If the respondent is not asked for the exact age but only whether they are "younger than ... years" or "older than ... years", that is, below or above a certain age, the resulting data are censored.

Censored data arise, for example, when the time until a specific event occurs is to be observed (time-to-event), since the event in question may already have occurred before the start of the observation or may not yet have occurred by the end of the experiment.

Dealing with censored data

To draw conclusions from a sample that contains censored data, there are basically two options:

  • Missing values: the censored records are omitted and treated as missing values.
  • Estimation: the information about the unobserved event is estimated, usually by applying a regression to the observed values.

A special procedure for censored data is the Tobit model.
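How the Tobit model uses censored records instead of discarding them can be sketched through its log-likelihood. The sketch below assumes the standard textbook form with a single regressor, normally distributed errors and left-censoring at a known limit (0 by default); the function and parameter names are chosen for this illustration only, and a real analysis would maximize this function numerically with an optimizer.

```python
import math


def normal_logpdf(z):
    """Log density of the standard normal distribution at z."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def normal_cdf(z):
    """Distribution function of the standard normal distribution at z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def tobit_loglik(xs, ys, beta0, beta1, sigma, lower=0.0):
    """Log-likelihood of a Tobit model left-censored at `lower`.

    Latent model (assumed): y* = beta0 + beta1 * x + e, e ~ N(0, sigma^2),
    and the recorded value is y = max(y*, lower).
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    total = 0.0
    for x, y in zip(xs, ys):
        mu = beta0 + beta1 * x
        if y <= lower:
            # Censored record: only P(y* <= lower) is known.
            # Note: for extreme arguments the probability can underflow
            # to 0, in which case math.log raises ValueError.
            total += math.log(normal_cdf((lower - mu) / sigma))
        else:
            # Fully observed record: ordinary normal density.
            total += normal_logpdf((y - mu) / sigma) - math.log(sigma)
    return total


# One censored and one fully observed record (hypothetical values).
ll_censored = tobit_loglik([0.0], [0.0], beta0=0.0, beta1=1.0, sigma=1.0)
ll_observed = tobit_loglik([1.0], [1.0], beta0=0.0, beta1=1.0, sigma=1.0)
```

Censored records contribute the probability of falling at or below the limit, while observed records contribute the usual regression density, so no record has to be dropped.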
