Voldemort (distributed data store)

Voldemort is a distributed data store designed as a persistent, fault-tolerant key-value store. LinkedIn uses it as a highly scalable storage system. It is named after Lord Voldemort, the villain of the Harry Potter novel series. Development of Voldemort is not yet complete.

Benefits

Voldemort has a number of advantages over other databases:

  • It combines an in-memory cache with the storage system, so a separate caching tier is unnecessary. The storage system itself is fast enough.
  • The storage layer can be emulated. This makes developing and testing components very easy, because they can be developed and tested against a throwaway in-memory store. No real cluster or real storage system has to be set up.
  • Reads and writes scale horizontally.
  • Simple programming interface: The interface decides on data replication and data distribution and leaves room for a variety of application-specific strategies.
  • Transparent data partitioning allows the cluster to be expanded without redistributing all of the data.
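The last point can be illustrated with consistent hashing, a common partitioning technique in which adding a node moves only the keys that fall to the new node. The following is a minimal sketch under stated assumptions, not Voldemort's implementation: the class HashRing, the helper moved_keys, the node names and the choice of 100 virtual nodes per server are all hypothetical.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a ring position (MD5 is used only to spread keys evenly)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes
        self._points = []   # sorted ring positions
        self._owner = {}    # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Each node is placed on the ring at `vnodes` pseudo-random positions.
        for i in range(self.vnodes):
            point = _hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def node_for(self, key):
        # The owner is the first ring position clockwise from the key's hash,
        # wrapping around to the start of the ring if necessary.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


def moved_keys(keys, before, after):
    """Return the keys whose owner differs between two placements."""
    return [k for k in keys if before[k] != after[k]]


keys = [f"user:{i}" for i in range(1000)]
ring = HashRing(["node-a", "node-b", "node-c"])
before = {k: ring.node_for(k) for k in keys}

ring.add_node("node-d")  # expand the cluster by one node
after = {k: ring.node_for(k) for k in keys}

moved = moved_keys(keys, before, after)
print(f"{len(moved)} of {len(keys)} keys changed owner")
```

In this sketch every key that changes owner moves to the newly added node, while all other keys stay where they were. With four equally weighted nodes, roughly a quarter of the keys would be expected to move.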

Disadvantages

Voldemort has a number of disadvantages compared to other databases:

  • Relationships between data items cannot be modeled.
  • There is no query language, so a key must be known in order to retrieve its value.
  • There are no transactions and therefore no ACID guarantees.
  • The project is still in an early phase of development, so its use in production systems should be weighed carefully.
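What key-only access means for an application can be sketched with a small in-memory stand-in. The class KeyValueStore and its put, get and delete methods are hypothetical names chosen for illustration and are not Voldemort's client API. The key naming scheme is likewise an assumption.

```python
class KeyValueStore:
    """Hypothetical in-memory stand-in that exposes only key-based operations."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        # Returns True if the key existed, False otherwise.
        return self._data.pop(key, None) is not None


store = KeyValueStore()
store.put("user:1", {"name": "Ada", "city": "London"})
store.put("user:2", {"name": "Linus", "city": "Helsinki"})

# Lookup by key is the only access path.
print(store.get("user:1"))

# There is no equivalent of "WHERE city = ...". To find users by city, the
# application maintains its own index as ordinary key-value entries.
store.put("city:London", ["user:1"])
store.put("city:Helsinki", ["user:2"])

londoners = [store.get(k) for k in store.get("city:London", [])]
print(londoners)
```

Because there are no transactions, writing a record and updating its index entry are two separate operations. An application that relies on such an index has to tolerate or repair the case where only one of them succeeds.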

Properties

The distributed database Voldemort has the following properties:

  • Data distribution: Pluggable data distribution strategies are supported, which allows, for example, distribution across geographically distant data centers.
  • Data replication: Data is automatically replicated to multiple servers.
  • Data partitioning: Data is partitioned automatically, so that each server holds only a subset of the total data.
  • Good single-node performance: 10,000 to 20,000 operations per second are possible, depending on the machine, network, disk system and replication factor.
  • Independent nodes: Each node is independent of the others, with no central coordination. There is no single point of failure.
  • Pluggable serialization: Structured keys and values are supported, including lists and tuples with named fields, as is integration with common serialization frameworks such as Avro, Java serialization, Protocol Buffers and Thrift.
  • Transparent failure handling: Server failures are handled transparently, so users do not notice them.
  • Versioning: Data items are versioned to maximize data integrity in failure scenarios without compromising the availability of the system.
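One common way to version data without a central coordinator is a vector clock, which keeps one counter per node and makes it possible to tell whether one version supersedes another or whether two versions were written concurrently and conflict. The following is a minimal sketch of that general technique, not Voldemort's code: the functions increment and compare and the node names are hypothetical.

```python
def increment(clock, node):
    """Return a copy of `clock` with the counter for `node` advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def compare(a, b):
    """Classify clock `a` relative to `b`: 'equal', 'before', 'after' or 'concurrent'."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"


v1 = increment({}, "node-a")   # first write, handled by node-a
v2 = increment(v1, "node-a")   # second write through node-a, based on v1
v3 = increment(v1, "node-b")   # write through node-b that also started from v1

print(compare(v1, v2))  # before: v2 supersedes v1
print(compare(v2, v3))  # concurrent: neither version supersedes the other
```

In the concurrent case neither write can be discarded safely, so a store using this scheme has to keep both versions until they are reconciled. This is how versioning can protect data integrity after a failure without blocking writes.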