Graph database

A graph database (or graph-oriented database) is a database that uses graphs to represent and store highly interconnected information. Such a graph consists of nodes and edges, the connections between the nodes. Both nodes and edges can carry characteristics, called properties (e.g. weight: 10kg, color: red, name: Alice). In this context, one therefore speaks of a property graph. This specialization on property graphs distinguishes graph databases on the one hand from RDF / triple / quad stores, which are used mainly in the Semantic Web, and on the other hand from the classical data models of relational databases.
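The property graph model described above can be sketched in a few lines of Python. This is only an illustrative in-memory structure, not the storage layout of any particular graph database; the class names `Node`, `Edge`, and `PropertyGraph`, the second node, and the `owns` edge with its `since` property are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    source: int
    target: int
    label: str
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    """Minimal in-memory property graph (illustrative only)."""

    def __init__(self):
        self.nodes = {}  # node_id -> Node
        self.edges = []  # list of Edge objects

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = Node(node_id, properties)

    def add_edge(self, source, target, label, **properties):
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must exist before adding an edge")
        self.edges.append(Edge(source, target, label, properties))

    def out_edges(self, node_id):
        """Return all edges that start at the given node."""
        return [e for e in self.edges if e.source == node_id]


g = PropertyGraph()
g.add_node(1, name="Alice")
g.add_node(2, weight="10kg", color="red")  # hypothetical item node
g.add_edge(1, 2, "owns", since=2020)       # hypothetical edge with a property
```

Both nodes and edges carry their own property dictionaries, which is the defining feature of the model.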

Graph databases offer a number of specialized graph algorithms to simplify complex database queries. For example, they provide algorithms to traverse the graph (i.e. to find all direct and indirect neighbors of a node), to calculate the shortest paths between two nodes, to find well-known graph structures such as cliques, or to identify hotspots, i.e. particularly densely connected regions of the graph.

Basics

A simple example of a graph is the set of relationships between people (see also sociogram). The nodes represent people; each node is assigned the person's name. The edges represent relationships; each edge is labeled with a type (knows, loves, hates).

Such a graph is called a directed multigraph. Edges with the label knows are generally symmetric, whereas this does not necessarily hold for the relations loves and hates.

Another example is a computer network. Each node corresponds to a computer, switch, or router. Each edge corresponds to a connection. Each connection has a bandwidth. In this case, one also speaks of a weighted graph.
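A weighted graph of this kind can be sketched as a mapping from links to bandwidths. The device names and bandwidth values below are invented for illustration, and the helpers `bandwidth` and `bottleneck` are hypothetical.

```python
# Hypothetical network: device names and bandwidths (in Mbit/s) are made up.
links = {
    ("pc1", "switch1"): 1000,
    ("switch1", "router1"): 10000,
    ("router1", "pc2"): 100,
}


def bandwidth(a, b):
    """Look up the weight of the undirected link between a and b."""
    if (a, b) in links:
        return links[(a, b)]
    return links[(b, a)]


def bottleneck(path):
    """Return the smallest edge weight along a path of nodes."""
    return min(bandwidth(a, b) for a, b in zip(path, path[1:]))


print(bottleneck(["pc1", "switch1", "router1", "pc2"]))
```

Here the edge weight is the only edge property; a property graph would allow further properties on the same edge.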

Distinction from other data models

Relational databases

Relational databases manage relations (tables) and tuples (rows). Each row in a table is a record. Each row consists of a series of attribute values, one for each column of the table. The relation schema specifies the number and type of the attributes of a relation. Relationships between tables are implemented by keys.

SQL provides a uniform query language for relational database management systems. SQL allows the selection of rows with specific properties.

It is possible to represent graphs in relational databases. For the social network example above, one would use a table PERSON for the people and a table RELATIONSHIP for the edges. SQL can then find all nodes (persons) or edges (relationships) with given properties. To find all indirect acquaintances or to determine a path between two persons, recursive common table expressions can be used (ANSI SQL 99, Oracle 11gR2, SQL Server 2008). In this way, both unidirectional and bidirectional graphs can be searched. If the edge table contains a weight, the optimal (shortest) path between two nodes can also be determined with an SQL query.
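The relational mapping and a recursive common table expression can be tried out with Python's built-in `sqlite3` module. This is a minimal sketch: the table names follow the text, but the column names (`id`, `name`, `source`, `target`, `type`) and the helper `reachable_via_knows` are assumptions, and SQLite stands in for the database systems named above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE relationship (
        source INTEGER NOT NULL REFERENCES person(id),
        target INTEGER NOT NULL REFERENCES person(id),
        type   TEXT NOT NULL
    );
    INSERT INTO person VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol'), (4, 'Dave');
    INSERT INTO relationship VALUES
        (1, 2, 'knows'), (2, 1, 'hates'), (2, 4, 'hates'),
        (3, 1, 'knows'), (3, 4, 'loves'), (4, 3, 'loves');
""")


def reachable_via_knows(start_name):
    """Names of all direct and indirect acquaintances of start_name."""
    rows = conn.execute(
        """
        WITH RECURSIVE reach(id) AS (
            SELECT id FROM person WHERE name = ?
            UNION  -- UNION removes duplicates, so cycles terminate
            SELECT r.target
            FROM relationship r
            JOIN reach ON r.source = reach.id
            WHERE r.type = 'knows'
        )
        SELECT p.name
        FROM person p
        JOIN reach ON p.id = reach.id
        WHERE p.name <> ?
        ORDER BY p.name
        """,
        (start_name, start_name),
    ).fetchall()
    return [name for (name,) in rows]


print(reachable_via_knows("Carol"))
```

The query follows knows edges only in their stored direction; treating them as symmetric would require a second branch that joins on `r.target` instead.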

In contrast, graph databases use more efficient traversal algorithms to select particular nodes. Starting from one or more nodes, either all or only selected outgoing edges are traversed.
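Such a traversal can be sketched as a breadth-first walk over outgoing edges with an optional filter on the edge type. The edge list is taken from the people example in this article; the function name `traverse` and its parameter `allowed_types` are assumptions, and real graph databases implement traversal on their own storage structures rather than on Python dictionaries.

```python
from collections import deque

# Outgoing edges per node as (type, target) pairs.
out_edges = {
    "Alice": [("knows", "Bob")],
    "Bob": [("hates", "Alice"), ("hates", "Dave")],
    "Carol": [("knows", "Alice"), ("loves", "Dave")],
    "Dave": [("loves", "Carol")],
}


def traverse(start, allowed_types=None):
    """Breadth-first traversal; returns nodes in the order they are reached.

    If allowed_types is given, only edges of those types are followed.
    """
    visited = [start]
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for edge_type, target in out_edges.get(node, []):
            if allowed_types is not None and edge_type not in allowed_types:
                continue
            if target not in seen:
                seen.add(target)
                visited.append(target)
                queue.append(target)
    return visited


print(traverse("Carol"))
print(traverse("Carol", allowed_types={"knows"}))
```

Each step only touches the edges of the node currently being expanded, so no global scan of an edge table is needed.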

Semantic Web, RDF (Resource Description Framework)

In the Semantic Web, graphs are not modeled on the basis of nodes, edges, and properties as in graph databases, but with the help of so-called triples or quads. Triples are the basic building blocks from which a complex graph can be composed. Quads extend triples with additional context information, which makes it easy to group triples together. The example above can be represented as follows:

  Alice - knows -> Bob
  Bob - hates -> Alice
  Bob - hates -> Dave
  Carol - knows -> Alice
  Carol - loves -> Dave
  Dave - loves -> Carol

Using these simple building blocks (subject - predicate -> object), very complex graphs can be built, and schemas for graphs can be defined as well. With the help of ontologies, that is, controlled vocabularies of known terms and their meanings, a computer can to some degree be given the ability to draw logical conclusions (inference) and to store the resulting new information in the graph (e.g. A - loves -> B && B - loves -> A => A - SindEinPaar -> B, where SindEinPaar is German for "are a couple").
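The triples and the inference rule from the example can be sketched with plain Python tuples. This is not an RDF library and does not use real ontology languages; the function name `infer_couples` is hypothetical, and the rule is hard-coded rather than derived from an ontology.

```python
# Each triple is (subject, predicate, object), as in the listing above.
triples = {
    ("Alice", "knows", "Bob"),
    ("Bob", "hates", "Alice"),
    ("Bob", "hates", "Dave"),
    ("Carol", "knows", "Alice"),
    ("Carol", "loves", "Dave"),
    ("Dave", "loves", "Carol"),
}


def infer_couples(facts):
    """Apply the rule: A loves B and B loves A  =>  A SindEinPaar B."""
    inferred = set()
    for subject, predicate, obj in facts:
        if predicate == "loves" and (obj, "loves", subject) in facts:
            inferred.add((subject, "SindEinPaar", obj))
    return inferred


new_facts = infer_couples(triples)
triples |= new_facts  # store the inferred triples back in the graph

for fact in sorted(new_facts):
    print(fact)
```

Because the rule is symmetric, it produces one new triple in each direction for Carol and Dave.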

SPARQL is a standardized query language of the Semantic Web with which RDF graphs can be created, modified, and queried.

Compared to graph databases based on the property graph model, the Semantic Web approach is much freer and more flexible, but it can quickly lose a great deal of clarity when all the possibilities of RDF schemas and inference are used. Here the property graph offers a more intuitive approach. For this reason, some RDF stores have started to offer the familiar interfaces of graph databases in addition to the classical semantic tools.

Object-oriented models

With the advent of object-oriented programming languages, object databases were increasingly offered. They allow objects from object-oriented languages to be stored directly in the database. This approach has advantages over the relational design when storing complex data objects that are hard to map onto flat relational table structures. Graphs can be mapped into object databases by keeping the outgoing edges of a node as a list of its destination nodes. With this approach, however, it is not possible to assign properties to the edge itself.
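The mapping and its limitation can be illustrated with ordinary Python objects standing in for objects held in an object database. The class names `Person` and `Relationship` are assumptions; the second class shows one conceivable workaround that is not described in the text, namely turning the edge into an object of its own.

```python
class Person:
    def __init__(self, name):
        self.name = name
        # Outgoing edges are kept as a list of destination objects.
        self.knows = []


alice = Person("Alice")
bob = Person("Bob")
alice.knows.append(bob)

# The edge is only an object reference inside a list, so there is no place
# to attach edge properties such as the year the two people met.


# Hypothetical workaround: model the edge as a separate object.
class Relationship:
    def __init__(self, target, **properties):
        self.target = target
        self.properties = properties


rel = Relationship(bob, since=2010)  # "since" is an invented property
```

With the workaround, the list of destination nodes becomes a list of edge objects, which moves the model closer to a property graph.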

Algorithms

Important algorithms for querying nodes and edges are:

  • Breadth-first search, depth-first search
  • Shortest path
  • Eigenvector
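The following sketch illustrates the algorithms listed above on a small in-memory graph. The graph, its node names, and its weights are invented; "Eigenvector" is interpreted here as eigenvector centrality computed by power iteration, which is an assumption, and Dijkstra's algorithm is used as one common shortest-path method. A breadth-first search would differ from `depth_first` only in using a queue instead of a stack.

```python
import heapq

# Hypothetical undirected weighted graph: node -> {neighbor: weight}.
graph = {
    "A": {"B": 1, "C": 4},
    "B": {"A": 1, "C": 2, "D": 5},
    "C": {"A": 4, "B": 2, "D": 1},
    "D": {"B": 5, "C": 1},
}


def depth_first(start):
    """Iterative depth-first search; returns nodes in visiting order."""
    order, seen, stack = [], set(), [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        # Push neighbors in reverse so they are popped in sorted order.
        stack.extend(sorted(graph[node], reverse=True))
    return order


def shortest_path(start, goal):
    """Dijkstra's algorithm; returns (total weight, path) or (inf, [])."""
    queue = [(0, start, [start])]
    settled = set()
    while queue:
        dist, node, path = heapq.heappop(queue)
        if node == goal:
            return dist, path
        if node in settled:
            continue
        settled.add(node)
        for neighbor, weight in graph[node].items():
            if neighbor not in settled:
                heapq.heappush(
                    queue, (dist + weight, neighbor, path + [neighbor])
                )
    return float("inf"), []


def eigenvector_centrality(iterations=100):
    """Power iteration on the adjacency structure, ignoring edge weights."""
    score = {node: 1.0 for node in graph}
    for _ in range(iterations):
        new = {n: sum(score[m] for m in graph[n]) for n in graph}
        norm = max(new.values())
        score = {n: value / norm for n, value in new.items()}
    return score


print(depth_first("A"))
print(shortest_path("A", "D"))
print(eigenvector_centrality())
```

In this graph the direct edge from A to C is more expensive than the detour via B, so the shortest path from A to D passes through both B and C.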

Query languages

Unfortunately, there is no standard for querying graph databases. This has led to a variety of different query languages and query interfaces. Important representatives are:

  • Blueprints - a Java API for property graphs that can be used with different graph databases.
  • Cypher - a query language developed by Neo4j.
  • GraphQL - an SQL-like query language.
  • Gremlin - an open-source graph programming language that can be used with different graph databases (Neo4j, OrientDB, DEX).
  • GReQL - a textual graph query language for property graphs that allows paths to be computed using regular path expressions.
  • Pipes - a dataflow framework for Java, based on process graphs, designed specifically for query processing on property graphs.
  • Rexster - an HTTP / REST interface for accessing graph databases over the Internet, which is supported by several vendors.