Data Stream Management System

A data stream management system (DSMS) is a database system for managing continuous data streams. It is similar to a database management system (DBMS), which manages conventional, stored databases. Unlike a DBMS, however, a DSMS handles data streams in addition to relations and can execute continuous queries over them. Queries can be formulated in specialized query languages such as the Continuous Query Language (CQL).

Data stream management systems are still relatively new in the database world. Some early general-purpose systems are:

  • Stanford Stream Data Manager (STREAM) at Stanford University
  • Aurora at Brandeis University, Brown University and MIT
  • TelegraphCQ at the University of California, Berkeley

In addition, there is a growing number of smaller projects with different focuses. Non-streaming data are managed almost exclusively with general-purpose database management systems. For streaming data, by contrast, the norm is still to use systems that are specially designed or adapted for the particular use case.

Differences from DBMS

In traditional database systems used for data analysis, the data set stays relatively constant while quite different queries may be issued against it. In a data stream management system, by contrast, the queries stay the same over a certain period while new data are continually added. The two principles are complementary: in information retrieval, for example, they are known as ad-hoc queries (new queries against the same documents) and routing tasks (new documents against predefined queries).
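The two access patterns can be contrasted in a few lines of Python. In the ad-hoc model a new query scans a fixed collection; in the routing (continuous-query) model queries are registered once and each newly arriving item is matched against all of them. The names `ad_hoc` and `Router` are hypothetical and illustrate only the access patterns, not any particular system.

```python
from typing import Callable, Dict, List

Predicate = Callable[[str], bool]

# Illustrative sketch; all names here are hypothetical.
def ad_hoc(documents: List[str], query: Predicate) -> List[str]:
    """Ad-hoc model: the data set stays fixed, each new query scans it once."""
    return [d for d in documents if query(d)]

class Router:
    """Routing model: queries stay fixed, each new item is matched on arrival."""

    def __init__(self) -> None:
        self.queries: Dict[str, Predicate] = {}
        self.results: Dict[str, List[str]] = {}

    def register(self, name: str, query: Predicate) -> None:
        # A standing (continuous) query is stored once and never re-issued.
        self.queries[name] = query
        self.results[name] = []

    def push(self, item: str) -> None:
        # Every arriving item is tested against all registered queries.
        for name, query in self.queries.items():
            if query(item):
                self.results[name].append(item)

docs = ["stock prices fall", "election results", "stock market rally"]
print(ad_hoc(docs, lambda d: "stock" in d))  # new query, same documents

router = Router()
router.register("stocks", lambda d: "stock" in d)
for d in docs:  # same query, documents arriving one by one
    router.push(d)
print(router.results["stocks"])
```

Both calls print the same two titles. The difference lies in what is stored: the ad-hoc function keeps the documents, while the router keeps the queries.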

The following table compares various characteristics of a database management system (DBMS) and a data stream management system (DSMS):

Processing of streams and relations

In conventional (relational) database systems, data are managed in tables (relations). A DSMS adds data streams as a further kind of basic data object. A data stream can be regarded as a continuous sequence of time-value pairs. Because streams are in principle infinite, they must be converted temporarily into relations for processing. Conversely, relations can be converted back into data streams (see figure). Pure relations can be processed with conventional methods, so transforming one stream into another takes a detour through relations. The Continuous Query Language, which builds on SQL, provides different operators for these conversions.
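CQL groups these conversions into three classes of operators. Stream-to-relation operators are sliding windows such as `[RANGE ...]` or `[ROWS ...]`. Relation-to-relation operators are essentially SQL. Relation-to-stream operators are `ISTREAM`, `DSTREAM` and `RSTREAM`. The Python sketch below assumes a stream of (timestamp, value) pairs with distinct entries and shows one path through all three classes:

* a time-based window turns the stream into a relation at each instant t;
* an SQL-like filter is applied to that relation;
* an `ISTREAM`-style step emits only the tuples newly inserted into the relation.

The function names are illustrative and are not part of any CQL implementation.

```python
from typing import List, Set, Tuple

Element = Tuple[int, float]  # (timestamp, value); assumed distinct

# Illustrative sketch; function names are hypothetical, not a CQL API.
def range_window(stream: List[Element], t: int, size: int) -> Set[Element]:
    """Stream-to-relation: all elements with t - size < timestamp <= t."""
    return {(ts, v) for ts, v in stream if t - size < ts <= t}

def select(relation: Set[Element], threshold: float) -> Set[Element]:
    """Relation-to-relation: an ordinary filter, as in an SQL WHERE clause."""
    return {(ts, v) for ts, v in relation if v > threshold}

def istream(previous: Set[Element], current: Set[Element]) -> Set[Element]:
    """Relation-to-stream: emit only tuples inserted since the last instant."""
    return current - previous

stream = [(1, 10.0), (2, 25.0), (3, 30.0), (4, 5.0), (5, 40.0)]
previous: Set[Element] = set()
output: List[Element] = []
for t in range(1, 6):
    current = select(range_window(stream, t, size=3), threshold=20.0)
    output.extend(sorted(istream(previous, current)))
    previous = current
print(output)  # [(2, 25.0), (3, 30.0), (5, 40.0)]
```

For clarity the sketch rebuilds each window from the whole list. A real DSMS would maintain the window incrementally as elements arrive and expire.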

Formulation, planning and optimization of queries

As in traditional database systems, queries are formulated in a declarative language and optimized for execution by means of a query plan. Because many queries have to be processed simultaneously, the stored queries are combined as far as possible, so that partial queries (subplans) can be shared among several queries.
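One simple way to picture this sharing is to treat each query plan as a chain of operators starting at a source stream. The chains are then merged into a tree, so that a common prefix is evaluated only once. The Python sketch below does exactly that for two queries whose first steps coincide. The operator labels are hypothetical strings. Real optimizers also share subplans that are not prefixes, and they must verify that two operators are actually equivalent.

```python
from typing import Dict, List

# Illustrative sketch; operator labels are hypothetical strings.
def merge_plans(plans: Dict[str, List[str]]) -> dict:
    """Merge operator chains into a prefix tree; shared prefixes appear once."""
    root: dict = {}
    for query, chain in plans.items():
        node = root
        for op in chain:
            node = node.setdefault(op, {})  # reuse an existing operator node
        node.setdefault("__outputs__", []).append(query)  # where results leave
    return root

def count_operators(node: dict) -> int:
    """Number of distinct operator nodes in the merged plan."""
    return sum(1 + count_operators(child)
               for op, child in node.items() if op != "__outputs__")

plans = {
    "Q1": ["scan news", "project", "join last 10 topics", "window 1 day", "window 1 hour"],
    "Q2": ["scan news", "project", "join last 10 topics", "window 1 day", "count"],
}
merged = merge_plans(plans)
print(sum(len(c) for c in plans.values()), count_operators(merged))  # 10 6
```

Evaluated separately, the two plans need 10 operators. The merged plan needs 6, because the first four are shared.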

The components of a plan are operators, queues and states. The operators correspond to those familiar from conventional databases, such as filtering, sorting, join and arithmetic operators. There are also operators for the input and output of data streams. The operators of a plan are connected by queues: each operator writes its data objects into a queue in sequence, and the next operator reads them in the same order. Intermediate results are held in states, such as the contents of a fixed window.
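The structure of such a plan can be sketched in Python, using `collections.deque` for the FIFO queues between operators. The sketch assumes a simple round-robin scheduler that lets each operator drain its input queue in turn. The operator classes are hypothetical. The window operator keeps the last n elements as its state and emits their sum whenever a new element arrives.

```python
from collections import deque
from typing import Callable, Deque

# Illustrative sketch; operator classes and the scheduler are hypothetical.
class Filter:
    """Stateless operator: copies matching items from its input to its output queue."""

    def __init__(self, pred: Callable[[int], bool], inq: Deque[int], outq: Deque[int]):
        self.pred, self.inq, self.outq = pred, inq, outq

    def run(self) -> None:
        while self.inq:
            item = self.inq.popleft()  # read in arrival (FIFO) order
            if self.pred(item):
                self.outq.append(item)

class CountWindowSum:
    """Stateful operator: keeps the last n items (its state) and emits their sum."""

    def __init__(self, n: int, inq: Deque[int], outq: Deque[int]):
        self.state: Deque[int] = deque(maxlen=n)  # oldest item drops out automatically
        self.inq, self.outq = inq, outq

    def run(self) -> None:
        while self.inq:
            self.state.append(self.inq.popleft())
            self.outq.append(sum(self.state))

source: Deque[int] = deque()  # input stream
middle: Deque[int] = deque()  # queue between the two operators
sink: Deque[int] = deque()    # output stream
plan = [Filter(lambda x: x % 2 == 0, source, middle), CountWindowSum(2, middle, sink)]

for value in [1, 2, 3, 4, 5, 6, 8]:
    source.append(value)  # new data arrives on the input stream
    for op in plan:       # round-robin scheduler: each operator drains its queue
        op.run()
print(list(sink))  # [2, 6, 10, 14]
```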

Example

A news portal wants to show on its page current news about the topics that are most discussed at the moment, together with the number of such messages per day. Messages arrive in one data stream, and the currently important topics arrive in another data stream called "Zeitgeist". Each message is assigned a topic. Concretely, the portal displays two things. The first is the titles of the messages from the last hour whose topic is among the last 10 topics. The second is the number of all matching messages in the last 24 hours. Formulated in CQL, these are two queries:

Q1: SELECT title FROM news N [RANGE 1 HOUR], Zeitgeist Z [ROWS 10] WHERE N.Thema = Z.Thema

Q2: SELECT COUNT(*) FROM news N [RANGE 1 DAY], Zeitgeist Z [ROWS 10] WHERE N.Thema = Z.Thema
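The meaning of both queries can be checked on a small invented data set. The Python sketch below makes the following assumptions:

* news items are (timestamp in minutes, title, topic) and Zeitgeist entries are (timestamp in minutes, topic);
* both streams are sorted by timestamp;
* each query is evaluated at a single instant t by materializing the windows as relations and joining them on the topic.

This is a semantic model only, not how a DSMS evaluates queries incrementally. The helper names are hypothetical.

```python
from typing import List, Tuple

News = Tuple[int, str, str]  # (timestamp in minutes, title, topic)
Zeit = Tuple[int, str]       # (timestamp in minutes, topic)
HOUR, DAY = 60, 24 * 60

# Illustrative sketch; helper names are hypothetical, streams assumed sorted by time.
def range_window(stream, t: int, size: int):
    """[RANGE size]: elements whose timestamp lies in (t - size, t]."""
    return [e for e in stream if t - size < e[0] <= t]

def rows_window(stream, t: int, n: int):
    """[ROWS n]: the n most recent elements up to time t (n >= 1)."""
    return [e for e in stream if e[0] <= t][-n:]

def q1(news: List[News], zeitgeist: List[Zeit], t: int) -> List[str]:
    # Join the one-hour news window with the last 10 topics on the topic.
    return [title for (_, title, n_topic) in range_window(news, t, HOUR)
            for (_, z_topic) in rows_window(zeitgeist, t, 10) if n_topic == z_topic]

def q2(news: List[News], zeitgeist: List[Zeit], t: int) -> int:
    # COUNT(*) over the join of the one-day news window with the last 10 topics.
    return sum(1 for (_, _, n_topic) in range_window(news, t, DAY)
               for (_, z_topic) in rows_window(zeitgeist, t, 10) if n_topic == z_topic)

news: List[News] = [
    (10, "Budget passed", "politics"),
    (500, "Cup final tonight", "sport"),
    (1400, "New tax plan", "politics"),
    (1420, "Storm warning", "weather"),
    (1430, "Coalition talks", "politics"),
]
zeitgeist: List[Zeit] = [(0, "politics"), (5, "weather")]
print(q1(news, zeitgeist, 1440))  # ['New tax plan', 'Storm warning', 'Coalition talks']
print(q2(news, zeitgeist, 1440))  # 4
```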

From these queries, the DSMS now builds a plan that is as efficient as possible. It might look as shown in the adjacent figure. First, the title and topic are projected from the news stream and placed in a queue. The topics likewise go into a queue and from there into a window of length 10. The news and this window are combined by a JOIN operator, and its output is placed in a window that holds all news of one day. Over this window, the COUNT operator determines the result of query Q2. For query Q1, the larger window feeds a smaller window with an extent of one hour.
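The sharing in this plan relies on the one-hour window being contained in the one-day window. Joining the news of the last hour with the last 10 topics therefore yields the same tuples as first joining the news of the last day and then keeping only the last hour. The Python sketch below computes the join once and derives both Q2 and Q1 from it, then checks Q1 against a direct evaluation. It assumes timestamps in minutes, invented data, and evaluation at a single instant t. A pipelined implementation would additionally have to update join results as topics enter or leave the 10-element window.

```python
from typing import List, Tuple

HOUR, DAY = 60, 24 * 60

# Illustrative sketch with invented data; evaluated at one instant t.
news: List[Tuple[int, str, str]] = [  # (timestamp in minutes, title, topic)
    (10, "Budget passed", "politics"),
    (500, "Cup final tonight", "sport"),
    (1400, "New tax plan", "politics"),
    (1420, "Storm warning", "weather"),
    (1430, "Coalition talks", "politics"),
]
zeitgeist: List[Tuple[int, str]] = [(0, "politics"), (5, "weather")]  # (timestamp, topic)
t = 1440

# Shared part: window of the last 10 topics, one-day news window, JOIN on topic.
topics = [topic for ts, topic in zeitgeist if ts <= t][-10:]
day_join = [(ts, title)
            for ts, title, topic in news if t - DAY < ts <= t
            for z in topics if topic == z]

q2_result = len(day_join)  # COUNT over the one-day window
q1_result = [title for ts, title in day_join if t - HOUR < ts]  # smaller one-hour window

# Reference: Q1 evaluated on its own, without sharing the join.
q1_direct = [title
             for ts, title, topic in news if t - HOUR < ts <= t
             for z in topics if topic == z]
assert q1_result == q1_direct
print(q2_result, q1_result)  # 4 ['New tax plan', 'Storm warning', 'Coalition talks']
```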
