Google File System

The Google File System (GFS) is a proprietary, Linux-based distributed file system that Google uses for its applications. It is optimized for Google's web search. Data is stored in files that are in some cases several gigabytes in size and are rarely deleted, overwritten, or shrunk. The system is also optimized for high data throughput.

Design

The Google File System is adapted to the requirements of web search, which generates an enormous amount of data to be stored. GFS originated from an earlier Google effort named "BigFiles", developed by Larry Page and Sergey Brin during their research work at Stanford University.

The data is stored in very large files, sometimes several gigabytes in size, which are only deleted, overwritten, or shrunk in extremely rare cases; data is usually appended or read. The file system has also been designed and optimized to run on Google's computing clusters, whose nodes consist of commodity PCs. This means, however, that the high failure rate of individual network nodes, and the associated loss of data, must be treated as normal. It shows in the fact that no distinction is made between normal shutdown (halt) and abnormal termination (crash): server processes are terminated by default with a kill command. Other design decisions favor high data throughput, even if this comes at the expense of latency.

A GFS cluster consists of one master and hundreds or thousands of chunkservers. The chunkservers store the files, with each file split into 64 MB pieces ("chunks"), similar to clusters or sectors in common file systems.
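To illustrate the fixed 64 MB chunking, here is a minimal sketch in Go (the function name is hypothetical; GFS itself is not written in Go) showing how a byte offset within a file maps to a chunk index and an offset inside that chunk:

```go
package main

import "fmt"

// ChunkSize is the fixed chunk size described above (64 MB).
const ChunkSize = 64 * 1024 * 1024

// chunkLocation is a hypothetical helper: given a byte offset into a file,
// it returns the index of the chunk containing that byte and the byte's
// offset within that chunk.
func chunkLocation(fileOffset int64) (chunkIndex, offsetInChunk int64) {
	return fileOffset / ChunkSize, fileOffset % ChunkSize
}

func main() {
	// Byte 200,000,000 of a file lies in chunk 2 (the third chunk),
	// at offset 200,000,000 - 2*67,108,864 = 65,782,272 within it.
	idx, off := chunkLocation(200_000_000)
	fmt.Printf("chunk index %d, offset %d\n", idx, off)
}
```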

To prevent loss of data, each file is stored in GFS at least three times per cluster by default. If a chunkserver fails, only negligible delays occur until the affected files again have their default number of replicas. Depending on requirements, the replication count can be higher, for example for executable files. Each chunk is assigned a unique 64-bit identifier, and logical mappings of files to the individual chunks are maintained.
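The re-replication idea can be sketched as follows, assuming a hypothetical in-memory table of chunk identifiers and replica locations (the names and structure are illustrative, not GFS's actual data structures):

```go
package main

import "fmt"

// ChunkHandle stands for the unique 64-bit chunk identifier mentioned above.
type ChunkHandle uint64

// chunkInfo is a hypothetical per-chunk record: the desired replication
// count and the chunkservers currently holding a copy.
type chunkInfo struct {
	desiredReplicas int      // default 3, higher e.g. for executables
	replicas        []string // addresses of chunkservers holding this chunk
}

// chunksNeedingReplication returns every chunk whose live replica count has
// fallen below its target, so that new copies can be scheduled.
func chunksNeedingReplication(table map[ChunkHandle]chunkInfo) []ChunkHandle {
	var deficient []ChunkHandle
	for handle, info := range table {
		if len(info.replicas) < info.desiredReplicas {
			deficient = append(deficient, handle)
		}
	}
	return deficient
}

func main() {
	table := map[ChunkHandle]chunkInfo{
		0x1a2b: {desiredReplicas: 3, replicas: []string{"cs1", "cs2"}},        // lost a replica
		0x3c4d: {desiredReplicas: 3, replicas: []string{"cs1", "cs3", "cs5"}}, // healthy
	}
	fmt.Println("re-replicate:", chunksNeedingReplication(table))
}
```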

The master, however, does not store chunks, but rather their metadata, such as file names, file sizes, the locations of the chunks and of their copies, which processes are accessing which chunk, and so on. The master receives all requests for a file, answers with the associated chunkservers, and grants the corresponding locks to the processes. A client may, however, cache the address of a chunkserver for some time. If the number of available replicas falls below the standard number, it is also the master that triggers the creation of a new chunk copy. The metadata is kept up to date by the master periodically sending update requests to the chunkservers ("heartbeat messages").
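The client side of this request flow might look roughly like the sketch below; the types and the master callback are hypothetical stand-ins, not the actual GFS API:

```go
package main

import (
	"fmt"
	"time"
)

// chunkLocations is what the master might return for a (file, chunk index)
// query: the chunk identifier plus the chunkservers holding replicas.
type chunkLocations struct {
	handle  uint64
	servers []string
	expires time.Time // clients may cache this answer for a limited time
}

// client caches master answers so repeated reads of the same chunk
// do not have to go through the master again.
type client struct {
	cache map[string]chunkLocations
}

// lookup contacts the master only on a cache miss or after expiry.
func (c *client) lookup(file string, chunkIndex int64,
	askMaster func(string, int64) chunkLocations) chunkLocations {
	key := fmt.Sprintf("%s/%d", file, chunkIndex)
	if loc, ok := c.cache[key]; ok && time.Now().Before(loc.expires) {
		return loc // served from cache, master not contacted
	}
	loc := askMaster(file, chunkIndex)
	c.cache[key] = loc
	return loc
}

func main() {
	c := &client{cache: make(map[string]chunkLocations)}
	fakeMaster := func(file string, idx int64) chunkLocations {
		return chunkLocations{handle: 0x1a2b,
			servers: []string{"cs1", "cs2", "cs3"},
			expires: time.Now().Add(time.Minute)}
	}
	loc := c.lookup("/logs/web.0", 2, fakeMaster)
	fmt.Println("read chunk", loc.handle, "from one of", loc.servers)
}
```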

The design and implementation of GFS provide only one master per cluster. This appears at first to be a weak point of the system that limits its scalability and reliability, since the maximum size and uptime of a cluster depend on the capacity and uptime of the master, which catalogues the metadata and through which almost all requests pass; Google's engineers have, however, shown by measurements that this is (at least so far) not the case and that GFS scales well. The master is normally the most powerful node in the network. To ensure reliability, there are several "shadow masters" that mirror the main machine and can step in immediately if the master fails. In addition, the shadow masters are available for read-only queries, which make up the bulk of the traffic, so that scalability is increased further. Bottlenecks are rare because clients only ask for metadata, which is held entirely in memory as a B-tree and is very compact: only a few bytes per megabyte of data covered. Using only one master node dramatically reduces the software complexity, since write operations do not need to be coordinated.
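A rough sketch of the read/write split described above, with invented type and function names for illustration only:

```go
package main

import "fmt"

// metadataRequest distinguishes read-only queries (which shadow masters can
// answer from their mirrored state) from mutations (which only the single
// primary master handles, so writes never need inter-master coordination).
type metadataRequest struct {
	readOnly bool
	path     string
}

// route picks a target for a metadata request: the primary master for
// mutations, otherwise one of the shadow masters (simple round-robin).
func route(req metadataRequest, primary string, shadows []string, next *int) string {
	if !req.readOnly || len(shadows) == 0 {
		return primary
	}
	target := shadows[*next%len(shadows)]
	*next++
	return target
}

func main() {
	shadows := []string{"shadow1", "shadow2"}
	var rr int
	fmt.Println(route(metadataRequest{readOnly: true, path: "/logs"}, "master", shadows, &rr))  // shadow1
	fmt.Println(route(metadataRequest{readOnly: false, path: "/logs"}, "master", shadows, &rr)) // master
}
```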
