Survival analysis

Survival analysis (also known as time-to-event analysis, analysis of failure times, or event history analysis) is a branch of statistical analysis in which the time until a certain event ("time to event") is compared between groups in order to estimate the effect of prognostic factors, medical treatment or harmful influences. The event can be death, but also any other endpoint, such as recovery, onset of a disease or occurrence of a complication. Examples of such methods are the Kaplan-Meier estimator and Cox regression. A key quantity is the hazard rate.

Names for this method

The method has been described in different ways by different authors. Because it is applied in different fields, several terms remain in use; they are equivalent and are often used interchangeably. The basic procedure is always the same.

  • In statistics, it is mostly called survival analysis.
  • In empirical social research, the method is known as duration analysis (also: event history analysis, event analysis), where it deals with changes in a social state (e.g., duration of marriage).
  • In the engineering sciences, the method is also called reliability analysis (reliability theory).
  • In English-language software packages, it is referred to as Survival Analysis, Analysis of Failure Times or Event History Analysis.

Applications

This method can be used whenever there is "mortality", that is, a gradual withdrawal of units from the population under observation. This need not be death; it can also be the failure of mechanical systems or retirement. The method can also be applied to the occurrence of positive events, i.e. new events for which there was previously no basis for measurement (birth of a first child, occurrence of the first technical problems or warranty claims).

Typical questions in a survival analysis: What proportion of a population will still be alive after a given time? At what rate will the survivors then die? Which characteristics or influences increase or decrease the probability of survival?

First, it is necessary to define the event time (lifetime). For biological systems, the lifetime ends with death. Mechanical reliability is more difficult: failures are often not clearly defined and may be partial. Often there is only a gradual failure that cannot easily be assigned a point in time. Similar problems occur with other biological events. For example, a heart attack or organ failure is difficult to pinpoint in time.

Usually, only events that can occur at most once per subject are investigated. An extension to recurrent events is possible.

General

Survival function

The central quantity is the survival function (also: survivor function), denoted by $S$. For technical systems, the term reliability function is used instead and the function is denoted $R(t)$:

$$ S(t) = P(T > t) $$

Here $t$ is an arbitrary time, $T$ is the time until death or until the failure of a unit, and $P$ denotes probability. The survival function gives the probability that the time until death is greater than $t$.

In general, it is assumed that $S(0) = 1$. If immediate death or failure is possible, this starting value can be smaller than 1. The survival function must be monotonically non-increasing: $S(u) \le S(t)$ if $u > t$. Once this function is known, the distribution function $F$ and the density function $f$ are uniquely determined.

Usually it is assumed that the survival function falls to zero with increasing age: $S(t) \to 0$ as $t \to \infty$. If this limit is greater than zero, then eternal life is possible.
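The properties above can be checked on a small example. The following is a minimal sketch, not taken from the source: the exponential model, the rate of 0.5 and the five lifetimes are illustrative assumptions, and the function names are hypothetical.

```python
import math

def survival_exponential(t, rate=0.5):
    """S(t) = P(T > t) for an exponentially distributed lifetime (assumed model)."""
    return math.exp(-rate * t)

def empirical_survival(lifetimes, t):
    """Fraction of fully observed lifetimes exceeding t (no censoring assumed)."""
    return sum(1 for x in lifetimes if x > t) / len(lifetimes)

# Model-based survival function: starts at 1 and never increases.
times = [0.0, 1.0, 2.0, 5.0, 50.0]
values = [survival_exponential(t) for t in times]
starts_at_one = values[0] == 1.0
non_increasing = all(a >= b for a, b in zip(values, values[1:]))

# Empirical counterpart on hypothetical, fully observed lifetimes.
lifetimes = [2.0, 3.5, 3.5, 7.0, 10.0]
emp = [empirical_survival(lifetimes, t) for t in (0.0, 3.5, 8.0)]
print(starts_at_one, non_increasing, emp)
```

The empirical version only works when every lifetime is fully observed; it does not handle censored records.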

Event time distribution function and event density function

Related quantities can be derived from the survival function. The event time distribution function, called the probability of failure in engineering and abbreviated $F$, is the complement of the survival function:

$$ F(t) = P(T \le t) = 1 - S(t) $$

The first derivative of $F$, the event density function or failure density function, is denoted by $f$:

$$ f(t) = F'(t) = \frac{d}{dt} F(t) $$

The event density function is the rate of the observed event per unit time.
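As a numerical illustration of how F and f follow from S, here is a minimal sketch. It assumes an exponential survival function with rate 0.5 (an illustrative choice, not from the source) and approximates the derivative by a central difference.

```python
import math

RATE = 0.5  # assumed rate of the illustrative exponential model

def S(t):
    """Survival function of the assumed exponential model."""
    return math.exp(-RATE * t)

def F(t):
    """Event time distribution function, the complement of S."""
    return 1.0 - S(t)

def f(t, eps=1e-6):
    """Event density as a central-difference approximation of F'(t)."""
    return (F(t + eps) - F(t - eps)) / (2.0 * eps)

t = 2.0
numeric_density = f(t)
exact_density = RATE * math.exp(-RATE * t)  # known closed form for this model
print(F(t), numeric_density, exact_density)
```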

Hazard function and cumulative hazard function

The failure rate, also called the hazard function and denoted $h(t)$, is defined as the rate at which an event occurs at time $t$, given that it has not yet occurred up to time $t$:

$$ h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t} = \frac{f(t)}{S(t)} = -\frac{S'(t)}{S(t)} $$

Force of mortality is a synonym for the hazard function that is used especially in demography.

The failure rate can never be negative, $h(t) \ge 0$, and if $S(t)$ is to fall to zero, its integral over $[0, \infty)$ must be infinite. The hazard function may increase or decrease; it need be neither monotonic nor continuous.

Alternatively, the hazard function can be replaced by the cumulative hazard function

$$ H(t) = -\log S(t). $$

It follows that

$$ \frac{d}{dt} H(t) = -\frac{S'(t)}{S(t)} = h(t), $$

i.e. the cumulative hazard function is

$$ H(t) = \int_0^t h(u) \, du. $$

It describes the "accumulation" of hazard (risk) over time.

From $H(t) = -\log S(t)$ it follows that $H(t)$ grows without bound as time increases if $S(t)$ goes to zero. It also follows that $h(t)$ must not fall too quickly, since otherwise the cumulative hazard function converges to a finite value. For example, $h(t) = e^{-t}$ is not the hazard function of any event time distribution with $S(t) \to 0$, since its integral converges.
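The relationship between hazard, cumulative hazard and survival function can be verified numerically. The sketch below is illustrative only: the Weibull model with shape 1.5 and scale 2.0, the midpoint-rule integrator and all function names are assumptions, not part of the source. It also integrates the decaying hazard exp(-t) to show that its cumulative hazard stays bounded, so the implied survival function does not fall to zero.

```python
import math

def weibull_hazard(t, shape=1.5, scale=2.0):
    """Hazard of an assumed Weibull model: (shape/scale) * (t/scale)^(shape-1)."""
    return (shape / scale) * (t / scale) ** (shape - 1.0)

def weibull_survival(t, shape=1.5, scale=2.0):
    """Survival function of the same Weibull model."""
    return math.exp(-((t / scale) ** shape))

def cumulative_hazard(hazard, t, steps=20000):
    """Midpoint-rule approximation of the integral of hazard over [0, t]."""
    width = t / steps
    return sum(hazard((i + 0.5) * width) for i in range(steps)) * width

t = 3.0
H_numeric = cumulative_hazard(weibull_hazard, t)   # integral of h
H_from_survival = -math.log(weibull_survival(t))   # -log S(t)

# A hazard that decays too fast: its integral approaches 1, not infinity.
H_decaying = cumulative_hazard(lambda u: math.exp(-u), 50.0)
limiting_survival = math.exp(-H_decaying)          # stays well above zero
print(H_numeric, H_from_survival, H_decaying, limiting_survival)
```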

Quantities derived from the survival function

The remaining lifetime at a time $t_0$ is the time remaining until death or failure, i.e. $T - t_0$. The future life expectancy is the expected value of the remaining lifetime. The event density function for the time $t + t_0$, under the condition of survival up to $t_0$, is just

$$ \frac{f(t + t_0)}{S(t_0)}. $$

The future life expectancy is therefore

$$ \frac{1}{S(t_0)} \int_0^\infty t \, f(t + t_0) \, dt = \frac{1}{S(t_0)} \int_{t_0}^\infty S(t) \, dt. $$

For $t_0 = 0$ this reduces to the life expectancy at birth.

In reliability analysis, the life expectancy is called mean time to failure and the future life expectancy is called mean residual lifetime.

The age at which the proportion of survivors reaches a given value $q$ can be determined from the equation $S(t) = q$; the solution $t$ is the desired quantile. Usually one is interested in quantities such as the median lifetime, $q = 1/2$, or other quantiles such as $q = 0.90$ or $q = 0.99$.
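Future life expectancy and quantiles can both be computed directly from a survival function. The following is a minimal sketch under stated assumptions: an exponential model with rate 0.5 (for which the future life expectancy is 1/rate at every age and the median is log(2)/rate), a midpoint-rule integral truncated at an assumed upper bound, and bisection for the quantile. None of the names come from the source.

```python
import math

def S(t, rate=0.5):
    """Survival function of an assumed exponential model."""
    return math.exp(-rate * t)

def mean_residual_life(survival, t0, upper=200.0, steps=100000):
    """Integral of S over [t0, upper] divided by S(t0).

    The upper bound is an assumption; it must be large enough that the
    remaining tail of S is negligible.
    """
    width = (upper - t0) / steps
    area = sum(survival(t0 + (i + 0.5) * width) for i in range(steps)) * width
    return area / survival(t0)

def survival_quantile(survival, q, lo=0.0, hi=1000.0, tol=1e-10):
    """Solve S(t) = q by bisection, assuming S is continuous and
    decreasing with S(hi) < q < S(lo)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if survival(mid) > q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

life_expectancy_at_birth = mean_residual_life(S, 0.0)
life_expectancy_at_3 = mean_residual_life(S, 3.0)
median_lifetime = survival_quantile(S, 0.5)
print(life_expectancy_at_birth, life_expectancy_at_3, median_lifetime)
```

For the exponential model the two life expectancies coincide, which reflects its constant hazard; other models generally give a value that depends on $t_0$.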

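Relationships

The mathematical relationships between the various quantities follow from the definitions above:

$$ F(t) = 1 - S(t) = 1 - e^{-H(t)} $$
$$ f(t) = F'(t) = -S'(t) = h(t) \, S(t) $$
$$ S(t) = 1 - F(t) = e^{-H(t)} = \exp\left(-\int_0^t h(u) \, du\right) $$
$$ h(t) = \frac{f(t)}{S(t)} = -\frac{d}{dt} \log S(t) = H'(t) $$
$$ H(t) = -\log S(t) = \int_0^t h(u) \, du $$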

Examples of survival functions

For event time models, one starts by selecting a basic survival function. It is relatively easy to replace one distribution function by another in order to study the effects. The basic theory does not change.

Prior knowledge about the concrete process plays a large role in the choice of the specific distribution. This is roughly analogous to the selection of the link function in generalized linear models. Some commonly used functions are listed below.

Here, the function in question is the error function.
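As an illustration, three survival functions that are common textbook choices can be written as follows. This is a sketch under assumptions: the selection (exponential, Weibull, log-normal), the parameter names and the example values are not taken from the source's list. The log-normal case is one where the error function appears, here through Python's `math.erf`.

```python
import math

def exponential_survival(t, rate):
    """S(t) = exp(-rate * t): constant hazard."""
    return math.exp(-rate * t)

def weibull_survival(t, shape, scale):
    """S(t) = exp(-(t/scale)^shape): hazard rises (shape > 1) or falls (shape < 1)."""
    return math.exp(-((t / scale) ** shape))

def lognormal_survival(t, mu, sigma):
    """S(t) = 1 - Phi((log t - mu) / sigma), with the normal distribution
    function Phi expressed through the error function."""
    z = (math.log(t) - mu) / sigma
    return 0.5 - 0.5 * math.erf(z / math.sqrt(2.0))

# Swapping one model for another only changes the function that is called.
t = 3.0
exp_value = exponential_survival(t, rate=0.5)
weibull_value = weibull_survival(t, shape=1.0, scale=2.0)  # shape 1 gives the exponential model
median_check = lognormal_survival(math.exp(1.0), mu=1.0, sigma=0.5)  # t = exp(mu) is the median
print(exp_value, weibull_value, median_check)
```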

Estimating the parameters

Event time models can be regarded as ordinary regression models in which the outcome variable is time. Computing the likelihood function is complicated by the fact that not all information is available at all times.

If birth and death are both known, the life course is fully determined. If, however, it is only known that the birth took place before a certain time, the record is called left-censored. Likewise, it might only be known that death occurred after a certain date; such a record is right-censored. A life course can also be censored on both the right and the left (interval-censored). If a person who does not reach a certain age is not observed at all, the record is truncated. With left-censored data, by contrast, we at least know that the individual existed.

There are a few standard cases of censored and truncated data. Right-censored data are common. If we consider a group of living subjects, we know that they are alive today, but we do not know the date of their death, which lies in the future. Left-censored data are also common: we could know for each subject that it is alive today without knowing its exact date of birth. Truncated data occur in studies with delayed entry. Pensioners, for example, might be observed only from the age of 70; the very existence of people who died earlier is not even known.
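Right-censored records are typically stored as a follow-up time together with a flag saying whether the event was actually observed. The sketch below shows how the Kaplan-Meier estimator mentioned in the introduction uses such records. It is a minimal illustration on hypothetical data, not a reference implementation; the function name and the data layout are assumptions.

```python
def kaplan_meier(durations, observed):
    """Kaplan-Meier estimate of the survival function.

    durations: follow-up time of each subject.
    observed:  True if the event was seen at that time,
               False if the subject was right-censored then.
    Returns a list of (event_time, estimated_survival) pairs.
    """
    data = sorted(zip(durations, observed))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Collect all subjects whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += 1 if data[i][1] else 0
            removed += 1
            i += 1
        if deaths:
            # Subjects censored at t still count as at risk at t.
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve

# Hypothetical data: five subjects, two of them right-censored.
durations = [2, 3, 3, 5, 8]
observed = [True, True, False, True, False]
curve = kaplan_meier(durations, observed)
print(curve)  # survival drops only at the observed event times 2, 3 and 5
```

Censored subjects contribute to the number at risk up to their censoring time and then drop out without causing a step in the curve.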

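The likelihood function for an event time model with censored data can be defined as follows. By definition, the likelihood function is the joint probability of the data given the model parameters. It is common to assume that the data are independent given the parameters. The likelihood function is then the product of the probabilities of the individual event times. We divide the data into four categories: uncensored, left-censored, right-censored and interval-censored. In the formulas they are distinguished by "unc.", "l.c.", "r.c." and "i.c.":

$$ L(\theta) = \prod_{i \in \text{unc.}} P(T = T_i \mid \theta) \prod_{i \in \text{l.c.}} P(T < T_i \mid \theta) \prod_{i \in \text{r.c.}} P(T > T_i \mid \theta) \prod_{i \in \text{i.c.}} P(T_{i,l} < T < T_{i,r} \mid \theta) $$

For an uncensored event time with age at death $T_i$, we use

$$ P(T = T_i \mid \theta) = f(T_i \mid \theta). $$

For left-censored data, we only know that death occurred before a time $T_i$:

$$ P(T < T_i \mid \theta) = F(T_i \mid \theta) = 1 - S(T_i \mid \theta). $$

For a right-censored individual, we know that death occurs after the time $T_i$, that is

$$ P(T > T_i \mid \theta) = 1 - F(T_i \mid \theta) = S(T_i \mid \theta). $$

And for interval-censored events, we know that death occurs between $T_{i,l}$ and $T_{i,r}$:

$$ P(T_{i,l} < T < T_{i,r} \mid \theta) = S(T_{i,l} \mid \theta) - S(T_{i,r} \mid \theta). $$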
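The four contributions can be combined into a log-likelihood in a few lines. The following is a sketch under assumptions: an exponential model with a single rate parameter, hypothetical records, record labels "unc", "lc", "rc" and "ic" chosen for this example, and a simple grid search in place of a proper optimizer.

```python
import math

def S(t, rate):
    """Survival function of the assumed exponential model."""
    return math.exp(-rate * t)

def f(t, rate):
    """Event density of the assumed exponential model."""
    return rate * math.exp(-rate * t)

def log_likelihood(rate, records):
    """Sum of log contributions; each record is (kind, a, b).

    kind "unc": event observed at a           -> f(a)
    kind "lc":  event occurred before a       -> 1 - S(a)
    kind "rc":  event occurs after a          -> S(a)
    kind "ic":  event occurred between a, b   -> S(a) - S(b)
    """
    total = 0.0
    for kind, a, b in records:
        if kind == "unc":
            total += math.log(f(a, rate))
        elif kind == "lc":
            total += math.log(1.0 - S(a, rate))
        elif kind == "rc":
            total += math.log(S(a, rate))
        elif kind == "ic":
            total += math.log(S(a, rate) - S(b, rate))
        else:
            raise ValueError(f"unknown record kind: {kind}")
    return total

# Hypothetical data with uncensored and right-censored records only.
records = [("unc", 2.0, None), ("unc", 4.0, None), ("rc", 5.0, None),
           ("unc", 1.0, None), ("rc", 3.0, None)]

# Crude maximization over a grid of candidate rates 0.01, 0.02, ..., 1.00.
grid = [k / 100.0 for k in range(1, 101)]
best_rate = max(grid, key=lambda r: log_likelihood(r, records))

# For this model the maximum is known in closed form: events / total time.
closed_form = 3 / (2.0 + 4.0 + 5.0 + 1.0 + 3.0)

# Left- and interval-censored records are handled by the same function.
mixed = records + [("lc", 1.5, None), ("ic", 2.0, 6.0)]
mixed_value = log_likelihood(0.2, mixed)
print(best_rate, closed_form, mixed_value)
```

With only uncensored and right-censored records, the grid maximum agrees with the closed-form estimate (number of observed events divided by total follow-up time), which gives a quick sanity check of the likelihood.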
