Computer cluster

A computer network or computer cluster, usually simply called a cluster (from the English word for "swarm", "group" or "bunch"), refers to a number of networked computers. The term is used collectively for two different tasks: increasing computing capacity (HPC clusters) and increasing availability (HA clusters). The computers in a cluster (also called nodes or servers) are often collectively referred to as a server farm.

Cluster categories

The term cluster primarily describes the architecture of the individual components and their interaction. A fundamental distinction is made between hardware and software clusters. The simplest form of a hardware cluster is known as active/passive; further variants are known as cascading. In these cases an interruption of the services must be accepted. HP OpenVMS clusters are able to implement active/active functionality in hardware.

Software clusters or cluster-aware applications, on the other hand, are more likely to be able to provide uninterrupted operation (for example, a DNS server). Whether this succeeds depends on the client in the client/server architecture and on whether it can cope with the switchover of the service (or server), as the sketch below illustrates.
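As a minimal sketch of what "coping with a switchover" can mean on the client side, the following Python fragment retries a request against an assumed cluster-wide virtual IP so that a brief service migration stays invisible to the caller. The address, port and payload handling are purely illustrative assumptions, not part of any particular cluster product.

    import socket
    import time

    VIRTUAL_IP = "192.0.2.10"   # hypothetical cluster-wide service address
    PORT = 5432                 # hypothetical service port


    def query_with_retry(payload: bytes, retries: int = 5, delay: float = 2.0) -> bytes:
        """Retry the request so a short failover on the server side stays invisible."""
        for _ in range(retries):
            try:
                with socket.create_connection((VIRTUAL_IP, PORT), timeout=3) as sock:
                    sock.sendall(payload)
                    return sock.recv(4096)
            except OSError:
                # Connection refused/reset while the service migrates to another node.
                time.sleep(delay)
        raise ConnectionError("service did not come back after failover")
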

A distinction is made between so-called homogeneous and heterogeneous clusters. Homogeneous computer clusters run under the same operating system on the same hardware; in heterogeneous clusters, different operating systems or hardware can be used. Well-known Linux cluster software includes HP Serviceguard, Beowulf and openMosix.

Uses

High Availability Cluster

High-availability clusters (HA clusters) are used to increase availability and to improve reliability. If a fault occurs on one node of the cluster, the services running on that node are migrated to another node. Most HA clusters have two nodes. There are clusters in which services run continuously on all nodes; these are called active-active or symmetric. If not all nodes are active, the cluster is called active-passive or asymmetric. Both the hardware and the software of an HA cluster must be free of single points of failure (components whose failure would bring down the entire system). Such HA clusters are used in critical environments where only a few minutes of downtime per year are permitted. In disaster scenarios, critical computer systems must also be protected. For this purpose, the cluster nodes are often placed several kilometres apart in different data centres. In the event of a disaster, the node in the unaffected data centre can take over the entire load. This type of cluster is also called a "stretched cluster".

Load-balancing cluster

Load-balancing clusters are built for the purpose of distributing load across multiple machines. The load distribution is usually handled by a redundant, central instance. Possible applications include environments with high demands on computing performance. Here the performance requirement is met not by upgrading individual computers but by adding more computers. One reason for this approach is, not least, that inexpensive standard computers (COTS components) can be used instead of expensive specialized machines.
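As an illustration of such a central distribution instance (node names and ports are invented for this sketch, not taken from any real product), a dispatcher could hand incoming requests to the cluster nodes in round-robin order:

    import itertools

    # Hypothetical worker nodes of the load-balancing cluster.
    NODES = ["node1:8080", "node2:8080", "node3:8080"]

    # Round-robin iterator acting as the central distribution instance.
    _round_robin = itertools.cycle(NODES)


    def pick_backend() -> str:
        """Return the next node in round-robin order for the incoming request."""
        return next(_round_robin)


    if __name__ == "__main__":
        for _ in range(6):
            print(pick_backend())   # node1, node2, node3, node1, ...
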

High Performance Computing Cluster

High-performance computing clusters (HPC clusters) are used to execute computational tasks. These tasks are divided among multiple nodes: either the tasks (called jobs) are split into different packages and executed in parallel on several nodes, or whole computational tasks are distributed to the individual nodes. The distribution of jobs is usually handled by a job management system. HPC clusters are often found in scientific computing. In general, the individual elements of a cluster are connected to one another via a fast network. So-called render farms also fall into this category.

History

The first commercially available cluster product was ARCnet, developed by Datapoint in 1977. The first real success came to DEC in 1983 with the introduction of the VAXcluster product for its VAX computer system. The product supported not only parallel computing on the cluster nodes but also the sharing of file systems and devices by all participating nodes, properties that are still lacking in many free and commercial products today. VAXcluster is, as "VMScluster", still available from HP for the OpenVMS operating system on the Alpha and Itanium processors.

Technology

HA cluster

The failover function (service failover, IP takeover) is usually provided by the operating system. The takeover of services can be achieved, for example, by automatically migrating IP addresses or by using a multicast address.
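A simplified sketch of the idea behind IP takeover is shown below; real HA frameworks do considerably more (fencing, quorum, split-brain handling). The heartbeat address, the virtual service IP and the interface name are assumptions for illustration; the `ip addr add` call uses standard Linux iproute2 syntax.

    import socket
    import subprocess
    import time

    PEER = ("192.0.2.1", 9999)        # hypothetical heartbeat address of the active node
    SERVICE_IP = "192.0.2.100/24"     # virtual service address to take over
    INTERFACE = "eth0"                # assumed network interface


    def peer_alive(timeout: float = 2.0) -> bool:
        """Consider the peer alive if its heartbeat port accepts a TCP connection."""
        try:
            with socket.create_connection(PEER, timeout=timeout):
                return True
        except OSError:
            return False


    def take_over_ip() -> None:
        """Bind the virtual service IP locally so clients reach this node instead."""
        subprocess.run(["ip", "addr", "add", SERVICE_IP, "dev", INTERFACE], check=True)


    if __name__ == "__main__":
        while peer_alive():
            time.sleep(5)             # active node still answers, stay passive
        take_over_ip()                # peer missed its heartbeat: migrate the IP
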

A general distinction is made between shared-nothing and shared-all architectures.

A typical representative of an "active-active" cluster with a shared-nothing architecture is DB2 EEE (pronounced "triple E"). Here each cluster node holds its own data partition. A performance gain is achieved by partitioning the data and the resulting distributed processing. Reliability, however, is not guaranteed by this alone.
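The principle behind such data partitioning can be sketched as a hash distribution of rows across node-local partitions. The node names, the partitioning key and the hash choice below are illustrative assumptions and not the DB2 EEE implementation.

    import zlib
    from collections import defaultdict

    NODES = ["nodeA", "nodeB", "nodeC"]          # hypothetical shared-nothing cluster


    def owning_node(key: str) -> str:
        """Map each key (e.g. a customer ID) deterministically to exactly one node."""
        return NODES[zlib.crc32(key.encode()) % len(NODES)]


    def partition(rows: list[dict]) -> dict[str, list[dict]]:
        """Distribute rows so every node stores and processes only its own share."""
        parts: dict[str, list[dict]] = defaultdict(list)
        for row in rows:
            parts[owning_node(row["customer_id"])].append(row)
        return parts


    if __name__ == "__main__":
        rows = [{"customer_id": f"C{i}", "amount": i * 10} for i in range(9)]
        for node, part in partition(rows).items():
            print(node, len(part), "rows")
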

The situation is different in "shared-all" clusters. This architecture provides concurrent access to shared storage, so that all cluster nodes can access the entire dataset. In addition to scaling and performance improvements, this architecture also provides additional reliability: if a node fails, the other nodes take over its task(s). A typical representative of the shared-all architecture is Oracle Real Application Clusters (RAC).

HA cluster computers can boot as a "single system image" without local disks, directly from a Storage Area Network (SAN). Such diskless shared-root clusters make it easier to replace cluster nodes, which in such a configuration contribute only their computing power and I/O bandwidth.

Services must be specifically programmed for use in a cluster. A service is called "cluster-aware" when it processes special events (such as the failure of a cluster node) and responds to them appropriately.
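The following sketch shows what reacting to cluster events might look like in a cluster-aware service. The event names and the callback interface are purely hypothetical and do not correspond to any specific cluster framework.

    from enum import Enum, auto


    class ClusterEvent(Enum):
        # Hypothetical set of events a cluster framework might deliver.
        NODE_FAILED = auto()
        NODE_JOINED = auto()
        FAILOVER_COMPLETE = auto()


    class ClusterAwareService:
        """Sketch of a service that processes cluster events and responds appropriately."""

        def handle_event(self, event: ClusterEvent, node: str) -> None:
            if event is ClusterEvent.NODE_FAILED:
                self.reclaim_sessions(node)   # e.g. take over state of the failed node
            elif event is ClusterEvent.NODE_JOINED:
                self.rebalance(node)          # e.g. hand some load to the new node
            elif event is ClusterEvent.FAILOVER_COMPLETE:
                self.resume_normal_operation()

        def reclaim_sessions(self, node: str) -> None:
            print(f"taking over sessions previously served by {node}")

        def rebalance(self, node: str) -> None:
            print(f"shifting part of the load to {node}")

        def resume_normal_operation(self) -> None:
            print("failover finished, back to normal operation")
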

Cluster software can be implemented in the form of scripts or integrated into the operating system kernels.

HPC Cluster

In HPC clusters, the task at hand, the "job", is often broken down by a decomposition program into smaller parts, which are then distributed to the nodes.

Communication between job parts running on different nodes is usually done via the Message Passing Interface (MPI), because fast communication between processes is required. For this purpose, the nodes are coupled with a fast network such as InfiniBand.
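A minimal MPI example, written here with the Python binding mpi4py, shows the pattern: one rank decomposes the job into chunks, all ranks compute in parallel, and the partial results are gathered again. The file name and chunk sizes are assumptions; it could be launched with something like `mpirun -n 4 python job.py`.

    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()          # ID of this job part (0 .. size-1)
    size = comm.Get_size()          # number of processes across the cluster nodes

    # Rank 0 decomposes the job into one chunk per process.
    if rank == 0:
        data = list(range(1000))
        chunks = [data[i::size] for i in range(size)]
    else:
        chunks = None

    # Distribute the chunks, compute locally, and collect the partial results.
    chunk = comm.scatter(chunks, root=0)
    partial_sum = sum(x * x for x in chunk)
    results = comm.gather(partial_sum, root=0)

    if rank == 0:
        print("total:", sum(results))
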

A common method of distributing jobs on an HPC cluster is a job scheduling program, such as Load Sharing Facility (LSF) or Network Queueing System (NQS), which can distribute jobs according to different categories.
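The basic task of such a scheduler, assigning queued jobs to free nodes according to their queue or category, can be sketched as follows. The queue names, job IDs and node list are invented and do not reflect the behaviour of LSF or NQS.

    from collections import deque

    # Hypothetical queues ordered by priority, as a batch scheduler might see them.
    queues = {
        "express": deque(["job-17"]),
        "normal":  deque(["job-03", "job-08"]),
        "long":    deque(["job-01"]),
    }
    free_nodes = deque(["node1", "node2", "node3"])


    def dispatch() -> list[tuple[str, str]]:
        """Assign jobs to free nodes, draining higher-priority queues first."""
        assignments = []
        for queue in ("express", "normal", "long"):
            while queues[queue] and free_nodes:
                assignments.append((queues[queue].popleft(), free_nodes.popleft()))
        return assignments


    if __name__ == "__main__":
        for job, node in dispatch():
            print(f"{job} -> {node}")
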

More than 90% of the systems in the TOP500 list of supercomputers are Linux clusters, not least because cheap COTS hardware can be used even for demanding computing tasks.
