High availability

High availability (HA) refers to the ability of a system to keep operating with a high probability (often 99.99 % or better) despite the failure of one of its components. In contrast to fault tolerance, a failure may lead to an interruption of operation.

Availability and High Availability

A system is considered available if it is able to perform the tasks for which it is intended. Availability is the probability that a system is functional (available) within a specified period. It is measured from the ratio of unplanned (fault-related) downtime to the total time of a system, where the total time is the sum of production time and downtime:

Availability = (1 − downtime / (production time + downtime)) × 100 %

Or also:

Availability = production time / (production time + downtime) × 100 %

The exact definition of high availability may vary. The Institute of Electrical and Electronics Engineers (IEEE) gives the following definition:

"High Availability (HA for short) refers to the availability of resources in a computer system, in the wake of component failures in the system."

Another definition of high availability is:

"A system is considered highly available if an application remains available even in the event of a failure and can continue to be used without direct human intervention. As a consequence, the user perceives no interruption, or only a brief one. High availability (HA for short) therefore refers to the ability of a system to ensure unrestricted operation when one of its components fails."

High Availability and availability classes

The availability class from which a system counts as highly available is answered differently depending on the definition of availability.

An availability of 99 % is generally not considered high availability; today it is regarded as normal, at least for high-quality IT equipment. Consequently, one speaks of high availability only from 99.9 % upward. Whether three nines already suffice, or whether only four or five nines make a system highly available, is judged differently from source to source and depends on the manufacturer as well as on the specific application scenario. In general, a system can be classified as highly available if its annual downtime amounts to a few minutes (~ 99.999 % and AEC-2) or less.

Applying the formula above over a period of one year, an availability of 99.99 %, for example, corresponds to a downtime of 52.6 minutes. It has become common to state the availability class as the number of nines in the percentage: the example above, with 99.99 % availability, therefore falls into availability class 4.
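
As a minimal sketch of this calculation, the following Python snippet derives the yearly downtime from an availability percentage and reads the availability class off the number of leading nines. The function names are illustrative, and the year length of 365.25 days is an assumption chosen to match the 52.6-minute figure.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # assumption: average year of 365.25 days


def yearly_downtime_minutes(availability_percent: float) -> float:
    """Downtime per year implied by an availability given in percent."""
    return (1 - availability_percent / 100) * MINUTES_PER_YEAR


def availability_class(availability_percent: str) -> int:
    """Count the leading nines in the percentage, e.g. "99.99" -> 4.

    The percentage is passed as a string so that the digits are counted
    exactly, without floating-point rounding effects.
    """
    digits = availability_percent.replace(".", "")
    return len(digits) - len(digits.lstrip("9"))


print(round(yearly_downtime_minutes(99.99), 1))  # 52.6
print(availability_class("99.99"))               # 4
```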

The maximum downtime for the relevant classes 2 to 6 is summarized below, where a year is taken as 365.25 days on average and a month as 1/12 of a year:
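
The downtime figures for these classes follow directly from the availability percentage. A small Python sketch that derives them under the stated assumptions (a 365.25-day year, a month as 1/12 of a year) might look like the following. It assumes that class n stands for an availability with n nines (class 2 = 99 %, class 6 = 99.9999 %); the helper names are illustrative.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600    # average year of 365.25 days
SECONDS_PER_MONTH = SECONDS_PER_YEAR / 12  # month taken as 1/12 of a year


def max_downtime(nines: int) -> tuple[float, float]:
    """Return (seconds per month, seconds per year) for a class with `nines` nines."""
    unavailability = 10.0 ** -nines  # e.g. class 4 -> 0.0001, i.e. 0.01 %
    return SECONDS_PER_MONTH * unavailability, SECONDS_PER_YEAR * unavailability


def human(seconds: float) -> str:
    """Format a duration in the largest unit (h, min, s) that fits."""
    for unit, size in (("h", 3600), ("min", 60)):
        if seconds >= size:
            return f"{seconds / size:.2f} {unit}"
    return f"{seconds:.2f} s"


for nines in range(2, 7):
    availability = 100 * (1 - 10.0 ** -nines)
    per_month, per_year = max_downtime(nines)
    print(f"class {nines}: {availability:.4f} %  "
          f"{human(per_month)} per month, {human(per_year)} per year")
```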

Availability Environment Classification

The Harvard Research Group (HRG) divides high availability into six classes in its Availability Environment Classification (AEC).

Agreed Period of Availability

In companies, high availability is often defined in the context of service level agreements (SLAs) and is an essential criterion for evaluating IT services.

Many highly available systems must be available 24 hours a day, 7 days a week, that is, around the clock all year. However, some of these systems need to be highly available only during a particular time segment: the trading systems of Deutsche Börse, for instance, do not need to be highly available at night or on non-trading days. For such systems, high availability refers only to the times of day and/or the weekdays on which it is needed.
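
To illustrate how availability can be measured against an agreed period only, the following Python sketch counts an outage toward downtime only where it overlaps the agreed service window. The window (Monday to Friday, 08:00 to 20:00), the sample outages, and all names are hypothetical and not taken from any actual SLA; outages are assumed not to overlap each other.

```python
from datetime import datetime, time, timedelta

# Hypothetical agreed service window: Monday to Friday, 08:00 to 20:00.
WINDOW_START = time(8, 0)
WINDOW_END = time(20, 0)


def agreed_seconds(start: datetime, end: datetime) -> float:
    """Seconds between start and end that fall inside the agreed window."""
    total = 0.0
    day = start.date()
    while day <= end.date():
        if day.weekday() < 5:  # 0-4 are Monday to Friday
            lo = max(start, datetime.combine(day, WINDOW_START))
            hi = min(end, datetime.combine(day, WINDOW_END))
            if hi > lo:
                total += (hi - lo).total_seconds()
        day += timedelta(days=1)
    return total


def sla_availability(period, outages) -> float:
    """Availability in percent, measured only over the agreed window."""
    agreed = agreed_seconds(*period)
    down = sum(agreed_seconds(s, e) for s, e in outages)  # non-overlapping outages assumed
    return (1 - down / agreed) * 100


# One week starting on a Monday: 5 days x 12 h of agreed service time.
period = (datetime(2024, 1, 1), datetime(2024, 1, 8))
outages = [
    (datetime(2024, 1, 2, 22, 0), datetime(2024, 1, 3, 2, 0)),    # at night: not counted
    (datetime(2024, 1, 4, 10, 0), datetime(2024, 1, 4, 10, 36)),  # 36 min inside the window
]
print(f"{sla_availability(period, outages):.2f} %")
```

Measured around the clock, the same two outages would count for more than four and a half hours of downtime; measured against the agreed window, only the 36 daytime minutes matter.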

Requirements for high availability

In general, HA systems strive to eliminate so-called single point of failure (SPOF) risks. A SPOF is a single component whose failure leads to the failure of the entire system.

A manufacturer of a highly available system must equip it with the following characteristics:

  • Redundancy of critical system components
  • Fault-tolerant and robust behavior of the overall system
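
The effect of redundancy on availability can be estimated with a simple probabilistic model. The Python sketch below assumes that components fail independently of one another, which real systems only approximate; the function names and the example figures are illustrative.

```python
from math import prod


def series(*availabilities: float) -> float:
    """All components are required: each one is a single point of failure."""
    return prod(availabilities)


def redundant(availability: float, copies: int) -> float:
    """At least one of `copies` identical components must work."""
    return 1 - (1 - availability) ** copies


# Hypothetical example: a 99 % server that depends on a 99.9 % power supply.
single = series(0.99, 0.999)
# The same system with the server and the power supply each duplicated.
doubled = series(redundant(0.99, 2), redundant(0.999, 2))

print(f"without redundancy: {single:.4%}")
print(f"with redundancy:    {doubled:.4%}")
```

Under this model, a chain of required components is always less available than its weakest member, whereas duplicating a component multiplies its unavailability by itself.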

Typical examples of components used to achieve increased fault tolerance are uninterruptible power supplies (UPS), multiple power supply units, ECC memory, and RAID systems. Techniques for server mirroring and redundant clusters are also used.

The higher the required availability, the more effort the operator must invest in:

  • Rapidly accessible technical personnel
  • Availability of spare parts
  • Preventive maintenance
  • Qualified error reporting and fast communication systems

Highly specialized systems with the highest availability include:

  • The Continuum series from Stratus
  • The Integrity NonStop series from HP, which originated at Tandem and came to HP by way of Compaq's acquisition of Tandem
  • The System z series servers from IBM