Temporal data management

In information technology, temporal data management (also called historicization) denotes the retention of the temporal development of data stored in a database.

Often it is sufficient to store only the currently valid value of a data element; when a change occurs, the old value is simply overwritten. If, however, the requirement is to document all changes, temporal data management is needed. It makes it possible to reconstruct which value was valid at which point in time, or, in less frequent cases, which value will only become valid in the future.

In temporal data management, two types of time are relevant:

  • Valid time: the period during which a data element is valid in the real world. Example: an item priced at €1.95 becomes more expensive on June 1, 2006, and then costs €2.25.
  • Transaction time (also recording time): the time at which a data element was stored in the database. Example: the above price adjustment of the item was entered on May 25, 2006 and included in the dataset.

In some cases both types of time are relevant; the term "bitemporal" is used for this. It applies, for example, to the following question, which refers to the examples above: what price was quoted to a customer on May 20, 2006 for a purchase of the item that was to take place only on June 15, 2006?
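The bitemporal question above can be made concrete with a small sketch. The schema, column names and helper function below are illustrative (not from the original text); SQLite is used here only as a convenient stand-in for a relational database.

```python
import sqlite3

# Bitemporal article prices: each row carries a valid-time interval
# (valid_from, valid_to) and a transaction-time interval
# (entered_on, deleted_on); intervals are right-half-open.
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE article_price (
        art_no     INTEGER,
        price      REAL,
        valid_from TEXT,
        valid_to   TEXT,
        entered_on TEXT,
        deleted_on TEXT
    )""")
rows = [
    # Initially the price 1.95 was recorded as valid indefinitely.
    (1, 1.95, '2006-01-01', '9999-12-31', '2006-01-01', '2006-05-25'),
    # On 2006-05-25 the increase to 2.25 (from 2006-06-01) was entered:
    # the old row is logically terminated, two new rows are inserted.
    (1, 1.95, '2006-01-01', '2006-06-01', '2006-05-25', '9999-12-31'),
    (1, 2.25, '2006-06-01', '9999-12-31', '2006-05-25', '9999-12-31'),
]
con.executemany("INSERT INTO article_price VALUES (?,?,?,?,?,?)", rows)

def price_as_known(art_no, valid_at, known_at):
    """Price valid at `valid_at`, as the database knew it at `known_at`."""
    cur = con.execute("""
        SELECT price FROM article_price
         WHERE art_no = ?
           AND valid_from <= ? AND valid_to   > ?
           AND entered_on <= ? AND deleted_on > ?""",
        (art_no, valid_at, valid_at, known_at, known_at))
    return cur.fetchone()[0]

# Asked on 2006-05-20 (before the change was recorded), the quoted
# price for a purchase on 2006-06-15 is still 1.95:
print(price_as_known(1, '2006-06-15', '2006-05-20'))  # 1.95
# Asked after the change was recorded, it is 2.25:
print(price_as_known(1, '2006-06-15', '2006-05-30'))  # 2.25
```

Note how the answer to the same valid-time question changes depending on the transaction-time point of view.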

Details of temporal data management (often in conjunction with data archiving) are defined as requirements for the auditability of information systems, such as how long changes must remain traceable.

Mapping temporal data in database systems

The following variants exist for mapping temporal data:

  • Use of temporal databases: database systems that provide system-side support for temporal data management going beyond support for time-related data types. So far only prototypes exist; no commercial database system is available that covers the requirements of temporal data storage extensively.
  • Use of spatio-temporal databases: database systems designed, besides supporting time-dependent data, for the storage of spatial information. Here, however, the focus is on the spatial information. Such databases are used, for example, in traffic telematics.
  • Mapping in conventional relational databases: since conventional relational database systems support time-related data types, temporal data can in principle be stored in such databases by mapping the temporal attributes as "normal" attributes. The handling of the temporal aspects must then be implemented by the application program or by an application development framework.
  • Mapping in other databases (e.g. object databases): for other database systems, especially object-oriented databases, there is no uniformity and distribution comparable to relational systems, so no general statement about the mapping of temporal data is possible. Most of the aspects set out below can nevertheless be transferred to such databases.

The following sections first describe general properties of temporal data management that are broadly valid for all of the above mappings. This is followed by detailed explanations of the mapping of temporal data in conventional relational databases. For the other variants, the documentation of the respective database management systems may be helpful.

Definitions and terminology

The most important terms of temporal data management are explained below. These explanations essentially follow the so-called "consensus glossary" (see links).

Valid time and transaction time

As mentioned at the beginning, the valid time denotes the point or period of time at which a state of affairs applies in the modeled representation of the real world (the reference world). Both future and past periods may be relevant.

If both valid time and transaction time are relevant, this is called bitemporal data management. Note that the transaction time can be determined by the system (e.g. the database system), whereas the valid time must be specified by the user.

The consensus glossary also contains the notion of user-defined time. It denotes time values that appear as "normal" attributes in the database (such as a date of birth) and are not handled by temporal data management. The name is unfortunate, since the valid time is also specified by the user.

A snapshot is the view of temporal data at one fixed date for the valid time and possibly also the transaction time. In most cases such a snapshot refers to the current time. A database without temporal support offers only such a snapshot; such databases are therefore also called snapshot databases.

Instants, time intervals and time periods

An instant represents a point on a time axis. Different granularities are used (e.g. exact to the day, exact to the second).

Times may be anchored (fixed) or floating. An anchored time refers to a specific instant or a specific interval on the time scale. A duration, by contrast, is a floating time interval.

Anchored time intervals are also called periods (time periods). The term interval without further specification is ambiguous in this respect, especially because SQL uses the term for a duration.

The smallest possible time interval at a given granularity is called a chronon; at day granularity, for example, this would be one day.

Time stamping

Time stamping means adding a time reference to a data attribute or a data row, referred to as a tuple in the context of a relational database. Valid-time, transaction-time and bitemporal time stamping can be distinguished.

In general, it is not fixed how this time reference is mapped technically. It may be a simple time stamp expressing validity at a single specific instant. More often, however, the time reference is represented as a time interval. The most general form is the so-called temporal element, which represents a set of one or more time intervals.

This time reference is called a timestamp in the consensus glossary. It is important not to confuse it with the conventional meaning of that term, since here it is a much more abstract form of time stamp.

When filling the time stamps, explicit and implicit time stamping are distinguished. With explicit time stamping, the time stamp is not supplied by the database system but must be supplied explicitly by the application program (or by the architecture used, such as a database trigger). Implicit time stamping is only available in "real" temporal databases, where the database system can usually take over this task. Note that this can run fully encapsulated only for the transaction time; the valid time must still be specified explicitly, but temporal databases often offer special interfaces for this and do not treat the time stamp as a normal attribute.
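In a conventional relational database, the transaction time can be supplied "implicitly" from the application's point of view by a database trigger, as mentioned above. The following sketch (SQLite syntax; schema and column names are illustrative) shows a trigger that stamps each inserted row with the system time, so the application never sets the column itself:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE article (
        art_no     INTEGER,
        price      REAL,
        entered_at TEXT      -- transaction-time start, filled by trigger
    );
    -- Stamp every inserted row with the system time.
    CREATE TRIGGER stamp_tx AFTER INSERT ON article
    BEGIN
        UPDATE article
           SET entered_at = datetime('now')
         WHERE rowid = NEW.rowid;
    END;
""")
# The application supplies only the payload columns:
con.execute("INSERT INTO article (art_no, price) VALUES (1, 1.95)")
stamp = con.execute("SELECT entered_at FROM article").fetchone()[0]
print(stamp is not None)  # True: stamped by the system, not the user
```

The valid time, by contrast, cannot be derived by the system and would still have to be passed in explicitly.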

Tuple versus attribute time stamping

With the introduction of time stamping, the question arises at which level it takes place. In principle, every attribute whose changes are not synchronized with those of another attribute should be versioned on its own, that is, provided with its own time stamp. This approach is called attribute time stamping.

The technical and administrative effort for attribute time stamping is considerable, however, so often all attributes of a data row (a tuple) are versioned together even though they do not change synchronously over time. This is called tuple time stamping.

The decision between the two methods depends mainly on the change frequency of the individual attributes. Typically, attributes with a high change frequency are versioned on their own, and attributes with a low change frequency together.

Temporal normalization (coalescing)

With tuple time stamping, however, the problem necessarily arises that analyses evaluating only one of the jointly versioned attributes yield successive periods that do not differ in that attribute's value. One would prefer to present such periods combined. This combination is called temporal normalization (also coalescing).

Tuple time stamping is not the only reason why temporal normalization is needed. It is also required, for example, when bitemporal data are evaluated separately by valid time or transaction time only. Moreover, such a combination can be desirable when several time-stamped tuples are joined in evaluations.

Formally, temporal normalization is the combination of equal attribute values whenever the data types used for time stamping allow it. When time intervals are used for time stamping, successive intervals with the same attribute values are combined; when temporal elements are used, even all periods in which a particular attribute value occurs are combined.

In the following example, the price and the target stock of product data are versioned together. If either the price or the target stock is evaluated on its own without normalization, one obtains four separate intervals. With temporal normalization, two of these intervals each can be combined into one.
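Coalescing of interval-stamped rows can be sketched as follows. The data are illustrative, modeled on the example above: price and target stock were versioned together, so looking at the price alone yields adjacent intervals with the same value that should be merged.

```python
def coalesce(rows):
    """Merge consecutive half-open intervals carrying the same value.

    rows: list of (start, end, value), sorted by start.
    """
    merged = []
    for start, end, value in rows:
        if merged and merged[-1][2] == value and merged[-1][1] == start:
            # Same value and adjacent: extend the previous interval.
            merged[-1] = (merged[-1][0], end, value)
        else:
            merged.append((start, end, value))
    return merged

# Price history as seen when the jointly versioned stock changed
# mid-way but the price did not:
history = [
    ('2006-01-01', '2006-03-01', 1.95),
    ('2006-03-01', '2006-06-01', 1.95),   # stock changed, price did not
    ('2006-06-01', '9999-12-31', 2.25),
]
print(coalesce(history))
# [('2006-01-01', '2006-06-01', 1.95), ('2006-06-01', '9999-12-31', 2.25)]
```

With temporal elements instead of single intervals, the merge would not require adjacency; all periods with the same value would form one set.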

Mapping in conventional relational database systems

Mapping temporal data in conventional relational databases is possible, but there is no standardized implementation approach, since the mapping depends strongly on the specific requirements. The added complexity and the penalties of mapping temporal data compared with a "conventional" snapshot mapping are considerable, so various simplified mappings are also possible depending on the situation. Supporting temporal data has the following disadvantages:

  • The volume of data increases significantly. Under certain circumstances a cleanup function is necessary that archives or deletes older data.
  • Access to the current value becomes considerably more complex (implementation effort, performance). This weighs particularly heavily because in many cases it is by far the most common form of access.
  • Constraints derived from the ER model (without temporal consideration) can no longer be mapped simply by defining primary keys and using referential integrity.

It is therefore not advisable to support a temporal mapping for data for which the requirements do not make it absolutely necessary. This also means that when time dependence is introduced into a data model, not all relations should necessarily be changed.

In the following, various conceptual aspects of a full-fledged mapping of temporal data are presented first, followed by a discussion of various simplified mappings.

Time stamping

Time intervals are normally used for time stamping. Since relational databases have no (anchored) time interval as an explicit attribute type, up to two time attributes (e.g. of type DATE or TIMESTAMP) are included in the table definition to define the start and end of the time interval, in the bitemporal case separately for valid and transaction time. There are two basic approaches: the point representation, which stores only the start of validity (the end being implied by the start of the next row), and the interval representation, which stores start and end explicitly.

Both approaches have advantages and disadvantages:

  • In the interval representation, overlapping rows can be inserted without violating the integrity of the database; a separate check (for example by database triggers) is required.
  • The interval representation requires a special value to denote an open-ended validity. Either NULL or the minimum and maximum possible dates can be used (see below).
  • In the interval representation, a value change may require adapting (terminating) the previously valid row in addition to inserting a new row.
  • In the point representation, determining the data valid at a specific time (usually the current time) is laborious and requires a subquery (subselect).
  • In the point representation, the end of validity, or interruptions of validity, can only be mapped by setting all data attributes of a row to empty (NULL).
  • In the bitemporal case, the point representation cannot be used for both time dimensions (see example).

With the interval representation one also has the choice of using a closed or a right-half-open interval, in which the end itself is no longer part of the interval. Much speaks for the latter variant: otherwise, when checking intervals for gaplessness, a chronon would always have to be added to the end, which would not be possible independently of the database system for, say, the data type TIMESTAMP.
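The advantage of right-half-open intervals can be seen in a small sketch (dates illustrative): two intervals are gap-free neighbors exactly when one's end equals the other's start, whereas closed intervals force a granularity-dependent chronon to be added first.

```python
from datetime import date, timedelta

def adjacent_half_open(a, b):
    """a, b: (start, end) with end exclusive; adjacency is pure equality."""
    return a[1] == b[0]

def adjacent_closed(a, b, chronon_days):
    """With closed intervals, a chronon must be added to the end first."""
    end = date.fromisoformat(a[1]) + timedelta(days=chronon_days)
    return end.isoformat() == b[0]

jan_to_may = ('2006-01-01', '2006-06-01')         # half-open: June 1 excluded
jan_to_may_closed = ('2006-01-01', '2006-05-31')  # closed: last valid day
june_on = ('2006-06-01', '9999-12-31')

print(adjacent_half_open(jan_to_may, june_on))         # True
print(adjacent_closed(jan_to_may_closed, june_on, 1))  # True, but needs the chronon
```

At finer granularities (e.g. TIMESTAMP), the "add one chronon" step of the closed variant becomes database-dependent, which is exactly the problem noted above.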

The following example shows the SQL queries for both variants in the case of valid-time stamping. The current prices of articles (identified by ArtNr) are determined. In both variants the start of validity is expressed by the column GueltigAb; in the interval representation the end of validity is defined by UngueltigAb (so it is a right-half-open interval). It is further assumed that for an open-ended validity the maximum possible date is entered.

    -- Interval representation
    SELECT ArtNr, Preis
      FROM Artikel AS a
     WHERE a.GueltigAb <= CURRENT_DATE
       AND a.UngueltigAb > CURRENT_DATE

    -- Point representation
    SELECT ArtNr, Preis
      FROM Artikel AS a
     WHERE a.GueltigAb = (SELECT MAX(GueltigAb)
                            FROM Artikel
                           WHERE ArtNr = a.ArtNr
                             AND GueltigAb <= CURRENT_DATE)

Such a query is somewhat more laborious in the bitemporal case. In the following example, too, right-half-open intervals are used; the transaction-time interval is defined by ErfasstAm and GeloeschtAm. Note that in the point representation it is not sufficient to add only the start of the interval as an attribute for the transaction time; here an interval must be used, at least for the transaction time, so that a reasonable interpretation remains possible. In contrast to the example above, it is now not the value at the current date (CURRENT_DATE) that is determined but the value at given points in time, expressed by the parameters :VorgGueltigkeit and :VorgDatenstand.

    -- Interval representation
    SELECT ArtNr, Preis
      FROM Artikel AS a
     WHERE a.GueltigAb <= :VorgGueltigkeit
       AND a.UngueltigAb > :VorgGueltigkeit
       AND a.ErfasstAm <= :VorgDatenstand
       AND a.GeloeschtAm > :VorgDatenstand

    -- Point representation for the valid time
    SELECT ArtNr, Preis
      FROM Artikel AS a
     WHERE a.GueltigAb = (SELECT MAX(GueltigAb)
                            FROM Artikel
                           WHERE ArtNr = a.ArtNr
                             AND GueltigAb <= :VorgGueltigkeit
                             AND ErfasstAm <= :VorgDatenstand
                             AND GeloeschtAm > :VorgDatenstand)
       AND a.ErfasstAm <= :VorgDatenstand
       AND a.GeloeschtAm > :VorgDatenstand

In the example above it is assumed that for the currently valid row the maximum possible date is entered as the end of the transaction time. Often, however, NULL is used for this case instead, since the query for currently valid (undeleted) data then becomes simpler (GeloeschtAm IS NULL). In that case, however, queries on past data states, such as the one above, become more complicated: every occurrence of the column GeloeschtAm must be replaced by the following construct:

    CASE WHEN GeloeschtAm IS NULL THEN '9999-12-31' ELSE GeloeschtAm END

Here '9999-12-31' is the maximum possible date; this value depends on the database system used.
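The trade-off between the two conventions for an open transaction-time end can be run directly. The sketch below uses illustrative names and COALESCE, which is equivalent to the CASE construct above for this purpose:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE article (
    art_no INTEGER, price REAL,
    entered_on TEXT,
    deleted_on TEXT)""")          # deleted_on IS NULL = currently valid row
con.executemany("INSERT INTO article VALUES (?,?,?,?)", [
    (1, 1.95, '2006-01-01', '2006-05-25'),
    (1, 2.25, '2006-05-25', None),
])

# Current rows are easy to find with the NULL convention:
current_price = con.execute(
    "SELECT price FROM article WHERE deleted_on IS NULL").fetchone()[0]
print(current_price)  # 2.25

# Queries on a past data state must substitute the maximum date for NULL:
as_of = '2006-03-01'
past_price = con.execute("""
    SELECT price FROM article
     WHERE entered_on <= ?
       AND COALESCE(deleted_on, '9999-12-31') > ?""", (as_of, as_of)
).fetchone()[0]
print(past_price)  # 1.95
```

With the maximum-date convention the situation is reversed: past-state queries need no substitution, but the "currently valid" predicate must compare against the magic date.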

Determining the primary key

In temporal relations that express the evolution of object states, tuples (data rows) cannot be identified without taking the time dimension into account.

When the point representation is used for time stamping, the "normal" primary key is simply extended by the attribute defining the start of validity. In the article example, the validity start date would thus be included in the primary key alongside the article number.

When the interval representation is used, several alternatives exist: either the start or the end of the interval can be included in the key. The value chosen to represent an open-ended validity also plays a role in this decision:

  • The primary key of a relation is defined on the one hand to guarantee uniqueness; on the other hand its definition is an optimization question, since data accesses use this key as the index for finding matching rows.
  • Particularly with transaction-time stamping, the currently valid row is often required: the row whose validity end is open. This argues for including the validity end in the primary key.
  • In conventional relational database systems, NULL is not possible in a primary key column; the validity end therefore cannot be included in the key if this representation of an open-ended validity is chosen. Alternatively, the maximum possible value could be used to represent an open-ended validity.

Note also that with the interval representation none of these variants guarantees overlap-free rows; this must be checked separately.

For performance reasons, an additional attribute is sometimes introduced as a surrogate key (artificial key) to obtain the shortest possible identification of a data row in temporal relations instead of the composite key. Many database systems provide automatic allocation of unique identifiers for such surrogate keys.
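The two key variants discussed above can be sketched in DDL. Table and column names are illustrative; SQLite's AUTOINCREMENT stands in for whatever identifier-generation mechanism the database offers:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Variant 1: composite key (article number + validity start)
    CREATE TABLE article_v1 (
        art_no     INTEGER,
        valid_from TEXT,
        price      REAL,
        PRIMARY KEY (art_no, valid_from)
    );
    -- Variant 2: surrogate key, assigned automatically by the system
    CREATE TABLE article_v2 (
        id         INTEGER PRIMARY KEY AUTOINCREMENT,
        art_no     INTEGER,
        valid_from TEXT,
        price      REAL
    );
""")
con.execute("INSERT INTO article_v1 VALUES (1, '2006-01-01', 1.95)")
# A second version of the same article is allowed...
con.execute("INSERT INTO article_v1 VALUES (1, '2006-06-01', 2.25)")
# ...but a duplicate (art_no, valid_from) pair is rejected:
try:
    con.execute("INSERT INTO article_v1 VALUES (1, '2006-06-01', 2.35)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```

Note that neither variant prevents overlapping validity intervals; as stated above, that must be checked separately.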

Integrity checks

As already mentioned, constraints that are normally covered by primary key uniqueness or referential integrity must be secured by other means for temporal data. The following variants are available:

  • Definition of integrity constraints (constraints or assertions )
  • Database Triggers
  • Database procedures ( Stored Procedures)
  • Other central functions for database updates, for example as part of a framework, which include the necessary checks.

Particularly when constraints and database triggers are used, the problem can arise that integrity is temporarily violated during an update consisting of several database operations and is restored only after all operations have been performed. For such cases the database system must offer the ability to perform integrity checks only at the end of a transaction.

With the interval representation, rows must be checked for overlap freedom per object, as already mentioned. The following exemplary SQL query returns the overlapping entries for rows of the article table (identified by ArtNr); the validity interval is expressed by GueltigAb and UngueltigAb.

    SELECT *
      FROM Artikel AS x, Artikel AS y
     WHERE x.ArtNr = y.ArtNr
       AND x.UngueltigAb > y.GueltigAb
       AND y.UngueltigAb > x.GueltigAb
       AND <condition(s) excluding the same row in x and y>

The last condition is necessary so that a row is not diagnosed as overlapping with itself. How exactly it is formulated depends on the chosen primary key; some database systems support special functions or data types for identifying a table row.

If the primary key ensures that there can be only one entry per validity start date, a simplified check is also possible as follows:

    SELECT *
      FROM Artikel AS x, Artikel AS y
     WHERE x.ArtNr = y.ArtNr
       AND x.UngueltigAb > y.GueltigAb
       AND y.GueltigAb > x.GueltigAb

This approach also has the advantage that each overlapping pair of rows is returned as only one result row.
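The simplified overlap test can be run end to end. The data below are illustrative; the second row of article 1 deliberately overlaps the first:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE article (
    art_no INTEGER, valid_from TEXT, valid_to TEXT, price REAL,
    PRIMARY KEY (art_no, valid_from))""")   # one row per validity start
con.executemany("INSERT INTO article VALUES (?,?,?,?)", [
    (1, '2006-01-01', '2006-06-01', 1.95),
    (1, '2006-05-15', '9999-12-31', 2.25),   # overlaps the first row
    (2, '2006-01-01', '9999-12-31', 5.00),   # no overlap
])
# Simplified check: y starts inside x's half-open interval.
overlaps = con.execute("""
    SELECT x.art_no, x.valid_from, y.valid_from
      FROM article x, article y
     WHERE x.art_no = y.art_no
       AND x.valid_to   > y.valid_from
       AND y.valid_from > x.valid_from""").fetchall()
print(overlaps)  # [(1, '2006-01-01', '2006-05-15')] - one row per offending pair
```

Each offending pair appears exactly once, as noted above, because the condition y.valid_from > x.valid_from orders the pair and simultaneously excludes a row paired with itself.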

Checking referential integrity in the temporal sense is somewhat more laborious when both the referencing table (the dependent table) and the referenced table (the parent table) carry a time stamp. For each row of the dependent table it must be checked whether the rows of the parent table cover its entire validity interval: a referenced row must exist at the start of the interval, the referenced rows must follow one another without gaps, and this coverage must extend to the end of the interval.

Only if all three conditions are met is there no violation of integrity.

The example shows two catalog entries (dependent) referencing an article (parent): one catalog entry for the period March 1, 2006 to August 31, 2006, and another valid from September 1, 2006, which is currently the current catalog entry. For the entire time covered by the catalog periods it must be ensured that the referenced article exists; within these periods, price changes of the article may well occur (the price is evidently not represented in the catalog). The two catalog entries therefore reference different rows of the article. Note that in this example the article table defines both the price and the existence of the article, i.e. whether the article is in the assortment at the given time.
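The temporal referential-integrity check can be sketched as a coverage test: the union of the parent's validity intervals must cover each dependent interval without gaps. Dates mirror the catalog example above; the function and names are illustrative:

```python
def covers(parent_intervals, dep_start, dep_end):
    """True if the union of half-open parent intervals covers [dep_start, dep_end)."""
    reached = dep_start
    for start, end in sorted(parent_intervals):
        if start > reached:          # gap before the next parent interval
            break
        reached = max(reached, end)  # extend the covered prefix
        if reached >= dep_end:
            return True
    return reached >= dep_end

# Two article versions (price 1.95, then 2.25) with adjacent intervals:
article_versions = [('2006-01-01', '2006-06-01'),
                    ('2006-06-01', '9999-12-31')]

# Catalog "Spring/Summer 2006" (inclusive end Aug 31 modeled as
# half-open end 2006-09-01) and the current catalog from Sep 1:
print(covers(article_versions, '2006-03-01', '2006-09-01'))  # True
print(covers(article_versions, '2006-09-01', '9999-12-31'))  # True
# A dependent interval starting before the article existed fails:
print(covers(article_versions, '2005-12-01', '2006-02-01'))  # False
```

Both catalog entries are covered even though they span a price change, i.e. they reference more than one parent row, which is exactly the situation described above.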

Simplified mappings

As an alternative to transaction-time stamping, in cases where older data must be accessed only very rarely, logging the database updates can secure the reconstructability of older data constellations. Implementing such logging, however, requires the use of central functions for database updates. In addition, functions must be provided that reconstruct old database states by evaluating the log.

For valid-time stamping with cyclic data changes, it can be useful to identify the validity period directly instead of using the intervals normally used for time stamping (e.g. with an annual change cycle, the year alone can be used as an additional primary key component). This approach also has the advantage that the partitioning concepts offered by some database systems can be used to increase efficiency.

If temporality is needed exclusively to document which data states a particular evaluation function operated on, the execution of the evaluation itself can be used as the time stamp. Note, however, that this approach becomes questionable as soon as two different evaluation functions exist that involve the same data.

Another approach to simplify access to current data is to move the history to separate tables. This is interesting above all for the performance of access to the current data. The procedure is possible for both transaction time and valid time, for the latter only if no data valid in the future are considered. The effort, however, is relatively high, since updates must be applied to both tables.

Archiving and cleanup

Temporal data management inevitably leads to a constant increase in data volume, because obsolete data are deliberately not removed from the database. With the introduction of any temporal data management it is therefore necessary, in the longer term, to think about a method of database archiving in order to allow the operational database to be slimmed down.

Possible variants

In contrast to archiving non-temporal data, temporal data management does not require archiving in order to reconstruct deleted or changed states of an object, since this is already possible through transaction-time stamping.

For this reason, depending on the situation, a cleanup (vacuuming) of the temporal database may suffice as the simplest variant, i.e. a deletion of older data that are no longer required. It is necessary, however, that the cleanup process preserve the consistency of the database; this is a major aspect of "real" application-oriented database archiving as well.

For application-oriented database archiving there are in turn different methods. A principal classification criterion is the distinction between an independent and an integrating archive. An independent archive is itself a database that is internally consistent and can be accessed directly when needed. An integrating archive, by contrast, serves only to play the data back into the operational database as needed (copy-back or move-back).

Criteria for moving data to the archive

Criteria that define when a data element is removed from the operational database should be as clear and comprehensible as possible. In bitemporal databases, the transaction time is the first candidate: for example, all records whose transaction-time end is older than a specified date are removed from the database and optionally archived.

Using the transaction time has the particular advantage that it is assigned by the system and cannot be influenced by the user, so newly entered data cannot refer to a period that has already been moved to the archive.

Using the transaction time is not possible if only the valid time is managed. Moreover, even in the bitemporal case it is often necessary to use the valid time as a criterion in addition to the transaction time.

Ensuring consistency

The cleanup and archiving process must not compromise the consistency of the operational database. In the case of an independent archive, the same applies to the database used for archiving.

With regard to temporal data this means the following:

  • Intervals of the valid time or transaction time of an object must not overlap.
  • For relationships between objects, referential integrity must be preserved in the temporal sense as well (see also integrity checks).
  • The data should be temporally normalized, i.e. coalescing must be carried out where necessary.

When data are removed from the operational database, cutting of intervals may be necessary to preserve referential integrity.

The adjacent example illustrates this: all data whose validity ends before July 1, 2006 are to be moved out of the database. The article version with the price of €1.95 can therefore be removed from the database entirely. To continue to guarantee referential integrity, however, the article version with the price of €2.25 must be cut, and the same applies to the catalog entry "Spring/Summer 2006". If integrity is also to be preserved in the archive in the case of an independent archive, the cut-off counterparts of the intervals must be included in the archive. This also makes clear that coalescing is then necessary in the archive database after each subsequent archiving run, since each run archives the parts of the intervals that were cut off in the previous run.
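The interval cutting described above can be sketched as follows. The price history mirrors the example (1.95 until June 1, 2.25 afterwards), and the cutoff is July 1, 2006; names and layout are illustrative:

```python
def cut_at(rows, cutoff):
    """Split half-open (start, end, value) intervals at an archive cutoff.

    Returns (archive, keep): parts before the cutoff go to the archive,
    parts from the cutoff onwards stay in the operational database.
    """
    archive, keep = [], []
    for start, end, value in rows:
        if end <= cutoff:                       # entirely before the cutoff
            archive.append((start, end, value))
        elif start >= cutoff:                   # entirely after the cutoff
            keep.append((start, end, value))
        else:                                   # spans the cutoff: cut in two
            archive.append((start, cutoff, value))
            keep.append((cutoff, end, value))
    return archive, keep

prices = [('2006-01-01', '2006-06-01', 1.95),
          ('2006-06-01', '9999-12-31', 2.25)]
archive, keep = cut_at(prices, '2006-07-01')
print(archive)  # the 1.95 row whole, plus the cut-off part of the 2.25 row
print(keep)     # [('2006-07-01', '9999-12-31', 2.25)]
```

A later archiving run would archive the remainder of the cut 2.25 interval, which is why coalescing in the archive is needed after each run, as noted above.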

Most problematic in this context is re-storage, especially when it is only partial rather than covering a whole period, and the re-stored data are additionally kept in the archive (copy-back). Measures to ensure consistency must then be taken, since, for example, overlaps between the time intervals of an object can occur.

Typical areas of application

The following is a list of typical applications of temporal data management; it does not claim to be complete.

Data Warehouse

A data warehouse is a database created primarily for the analysis of data and fed from one or more other systems (mostly operational databases). Typically, a data import transferring the data from the operational systems into the data warehouse is carried out periodically.

In such a configuration it suggests itself to add the time dependence during the periodic import of the data, with the import date as the start of validity. This has the advantage that the operational system is not burdened with the complexity required for temporal data management, while time-dependent analyses are still possible in the data warehouse.

Since the import is usually carried out in a constant cycle (e.g. monthly), no intervals need to be used for time stamping; the validity period can instead be identified by a single attribute (see also simplified mappings). Furthermore, a dimension table for the period can be built in the sense of the star schema, which enables evaluations as part of online analytical processing (OLAP).
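The simplified stamping just described can be sketched directly: with a fixed monthly import cycle, the load period itself (e.g. '2006-05') serves as the validity stamp and as part of the key, instead of an interval. Names are illustrative:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE dw_article (
        load_month TEXT,            -- period identifier = validity stamp
        art_no     INTEGER,
        price      REAL,
        PRIMARY KEY (load_month, art_no)
    )""")
# Two monthly imports of the same article:
con.executemany("INSERT INTO dw_article VALUES (?,?,?)", [
    ('2006-05', 1, 1.95),
    ('2006-06', 1, 2.25),
])
# A "snapshot" for a given period is a simple equality predicate,
# with no interval logic or subquery:
price = con.execute(
    "SELECT price FROM dw_article WHERE load_month = '2006-06' AND art_no = 1"
).fetchone()[0]
print(price)  # 2.25
```

The load_month column could equally serve as the foreign key into a period dimension table of a star schema.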

Payroll

A typical case requiring bitemporal data is payroll. Here, among other things, an employee's assignment to a salary group must be held correctly over time (valid time). If the assigned salary group, or even the assignment period, is corrected later (transaction time), it must remain traceable on which basis a given payroll run operated.

Risk management in the banking sector

In particular under the rules defined by the Basel II regulation, credit institutions and financial service providers must be able to document clearly on the basis of which information (e.g. equity and credit ratings) which decision was taken.

This requires comprehensive transaction-time stamping, and in some cases a bitemporal mapping. The latter is required, for example, for the rating of a borrower: it must be recorded, on the one hand, at which time a rating agency conducted the assessment (evaluation date) and, on the other hand, when this new assessment became known to the credit institution and was included in the dataset.
