A data warehouse (DWH) is a database that consolidates data from different sources in a uniform format (information integration). This makes the data easier to access. The data are provided by the source systems, loaded into the data warehouse by the ETL process, and stored there over the long term, mainly for data analysis (OLAP), business decision support in enterprises, and data mining. The term originates from information management in business informatics.
The creation of a data warehouse is based on two guiding principles:
The data warehouse is the central component of a data warehouse system. Data are extracted from different sources, cleansed and unified by transformation, and then loaded into the data warehouse (ETL process). This process can run on a schedule, so that the data warehouse holds data organized not only by content but also by time, which makes analysis over time, including long-term analysis, possible.
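The extract, transform and load steps described above can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the two source systems, their field names (`cust`, `revenue_eur`, `customer_id`, `amount_cents`) and the in-memory list standing in for the warehouse are all assumptions made for the example.

```python
from datetime import date

# Two hypothetical source systems with different field names and formats.
CRM_ROWS = [{"cust": "A-1", "revenue_eur": "120.50"}]
SHOP_ROWS = [{"customer_id": "a-2", "amount_cents": 9900}]


def extract():
    """Collect raw rows from every source, tagged with their origin."""
    return [("crm", row) for row in CRM_ROWS] + [("shop", row) for row in SHOP_ROWS]


def transform(source, row):
    """Map a source-specific row onto one uniform format (customer, amount in EUR)."""
    if source == "crm":
        return {"customer": row["cust"].upper(), "amount_eur": float(row["revenue_eur"])}
    return {"customer": row["customer_id"].upper(), "amount_eur": row["amount_cents"] / 100}


def load(warehouse, rows, load_date):
    """Append the unified rows, stamping each with the date of the load run."""
    for row in rows:
        warehouse.append({**row, "load_date": load_date})


def run_etl(warehouse, load_date):
    """Run one scheduled ETL pass and return the warehouse."""
    unified = [transform(source, row) for source, row in extract()]
    load(warehouse, unified, load_date)
    return warehouse
```

Calling `run_etl(warehouse, date(2024, 1, 31))` once per scheduled run appends that run's rows with their load date, which is what later allows the data to be analyzed along the time axis.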
There is currently no standard definition of the term "data warehouse". Broadly, however, the following applies:
- A data warehouse provides a global view of heterogeneous and distributed data sets by merging the data relevant to that global view from the data sources into a common, consistent database.
- The content of a data warehouse is thus created by copying and processing data from different sources.
- A data warehouse usually serves as the basis for aggregating operational metrics and analyzing them in multi-dimensional matrices (OLAP cubes), known as online analytical processing (OLAP).
- A data warehouse often serves as a basis for data mining.
- Applications generally work with application-specific extracts from the data warehouse, the so-called data marts.
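The aggregation of metrics over a multi-dimensional matrix can be illustrated with a small roll-up. The sketch below is only an assumption-laden toy: the fact rows, the dimension names and the function `roll_up` are invented for the example and do not correspond to any particular OLAP product.

```python
from collections import defaultdict

# Hypothetical fact rows: (product, region, year, revenue).
FACTS = [
    ("bike", "north", 2023, 100),
    ("bike", "south", 2023, 150),
    ("bike", "north", 2024, 120),
    ("helmet", "north", 2024, 30),
]
DIMENSIONS = ("product", "region", "year")


def roll_up(facts, keep):
    """Sum revenue over all dimensions that are not listed in `keep`."""
    idx = [DIMENSIONS.index(name) for name in keep]
    totals = defaultdict(int)
    for row in facts:
        key = tuple(row[i] for i in idx)  # coordinates in the kept dimensions
        totals[key] += row[-1]            # last field is the revenue measure
    return dict(totals)
```

For instance, `roll_up(FACTS, ["product"])` collapses region and year and leaves one total per product, while `roll_up(FACTS, [])` yields the grand total.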
Differences in definitions can be found mainly in the general purpose of a data warehouse as well as the scope and handling of the data in the data warehouse.
The range of definitions begins with the restrictive view of Inmon: "A data warehouse is a subject-oriented, integrated, time-variant, nonvolatile collection of data in support of management's decision-making process" (Inmon 1996, p. 33). This definition can be interpreted as follows:
- Subject-oriented: The data to be included in the data warehouse are selected according to specific data objects (product, customer, company, ...) that are relevant for analyzing key performance indicators for decision-making, not according to operational processes.
- Integrated: Data that are structured differently in the various (operational) source systems are stored in the data warehouse in a standardized form.
- Time-variant: The data warehouse is meant to enable analysis of changes and developments over time; this requires long-term storage of data in the data warehouse (introduction of the "time" dimension).
- Nonvolatile: Data are stored permanently.
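The time-variant and nonvolatile properties can be sketched together as an append-only table in which every state carries a validity date. The class `HistoryTable` and its methods are hypothetical names chosen for this illustration; real warehouses typically implement the idea in the database schema rather than in application code.

```python
from datetime import date


class HistoryTable:
    """Append-only store: rows are added with a validity date and never changed."""

    def __init__(self):
        self._rows = []  # list of (valid_from, key, value)

    def record(self, valid_from, key, value):
        # Nonvolatile: a new state is appended, the old one stays untouched.
        self._rows.append((valid_from, key, value))

    def as_of(self, key, day):
        """Return the value that was valid for `key` on `day`, or None."""
        candidates = [(d, v) for d, k, v in self._rows if k == key and d <= day]
        return max(candidates, key=lambda c: c[0])[1] if candidates else None

    def history(self, key):
        """Return all recorded states of `key` in chronological order."""
        return sorted((d, v) for d, k, v in self._rows if k == key)
```

Because nothing is overwritten, a query such as `as_of("cust-1", date(2023, 12, 31))` can still reconstruct the state that held on an earlier date.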
The restriction "physical" is necessary to distinguish the data warehouse from the "logical" federated database system.
History of the term
The concept was developed at IBM in the mid-1980s under the name "information warehouse". The term "data warehouse" was first used in 1988 by Devlin. More recently, data warehouse systems have also been referred to as business warehouse systems (e.g. SAP) or as business intelligence systems (analysis-oriented view), which is meant to emphasize the business importance of such systems. The term "data warehouse" has since also become established in German-language literature.
Operation of a data warehouse (data warehousing)
The entire process of acquiring, managing and analyzing the data of a data warehouse is also referred to as data warehousing. Data warehousing includes:
- Data acquisition, data integration (staging) and further processing in the ETL process
- Data management, i.e. the long-term storage of data in the data warehouse (see also long-term archiving)
- Data evaluation and analysis
- Provision and storage of the separate data sets needed for analysis, the data marts.
In the data marts, the data are often stored as multi-dimensional matrices in a star schema or a related schema such as the snowflake schema or the galaxy schema. Mixed forms such as the starflake schema are also conceivable; they combine the advantages of the aforementioned models.
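A star schema places one fact table in the middle and joins it to surrounding dimension tables. The following sketch uses Python's built-in `sqlite3` module with an in-memory database; the table and column names (`fact_sales`, `dim_product`, `dim_date`) and the sample rows are assumptions made for illustration only.

```python
import sqlite3

SCHEMA = """
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_date    (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    date_id    INTEGER REFERENCES dim_date(date_id),
    revenue    REAL
);
"""


def build_mart():
    """Create an in-memory data mart with one fact table and two dimensions."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.executemany("INSERT INTO dim_product VALUES (?, ?)", [(1, "bike"), (2, "helmet")])
    con.executemany("INSERT INTO dim_date VALUES (?, ?, ?)", [(1, 2023, 12), (2, 2024, 1)])
    con.executemany(
        "INSERT INTO fact_sales VALUES (?, ?, ?)",
        [(1, 1, 100.0), (1, 2, 120.0), (2, 2, 30.0)],
    )
    return con


def revenue_by_product_and_year(con):
    """Join the fact table to its dimensions and aggregate the measure."""
    query = """
        SELECT p.name, d.year, SUM(f.revenue)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_id = f.product_id
        JOIN dim_date    AS d ON d.date_id = f.date_id
        GROUP BY p.name, d.year
        ORDER BY p.name, d.year
    """
    return con.execute(query).fetchall()
```

In a snowflake schema the dimension tables would themselves be normalized into further tables, for example a separate product category table referenced from `dim_product`.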
In recent years there has been a growing shift away from periodic loading towards real-time data warehousing. Some industries, such as telecommunications and retail, need data to be available immediately while the separation of operational and analytical systems is preserved. Real-time data warehousing is the prerequisite for the active data warehouse (the associated process is active data warehousing; both are abbreviated ADW). In active data warehousing, the results of the analysis are communicated to interested recipients in a time-driven and event-driven manner; in addition, active data warehousing allows direct control of operational processes such as workflows. Besides loading the data warehouse with up-to-date data, active data warehousing therefore includes feeding the results back into the operational systems immediately. Analysis results based on data from the data warehouse thus in turn affect the operational systems that supply the data warehouse; this is referred to as a closed loop.
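The closed loop can be sketched as a warehouse that evaluates a rule as soon as an event is loaded and feeds the result back into the supplying system. Everything here is hypothetical: the classes `OperationalSystem` and `ActiveWarehouse`, the stock threshold and the reorder action merely illustrate the pattern.

```python
class OperationalSystem:
    """Stand-in for an operational system that feeds and is steered by the warehouse."""

    def __init__(self):
        self.reorders = []

    def reorder(self, product):
        self.reorders.append(product)


class ActiveWarehouse:
    def __init__(self, operational, threshold):
        self.operational = operational
        self.threshold = threshold   # hypothetical stock level that triggers a reorder
        self.stock_events = []       # loaded events, kept for later analysis
        self.notifications = []      # messages for interested recipients

    def load_event(self, product, stock):
        """Load one event as it arrives and evaluate the rule immediately."""
        self.stock_events.append((product, stock))
        if stock < self.threshold:
            # Event-driven communication of the analysis result ...
            self.notifications.append(f"low stock: {product} ({stock})")
            # ... and direct feedback into the operational process.
            self.operational.reorder(product)
```

The design keeps the operational system and the warehouse as separate objects, mirroring the separation of operational and analytical systems, while the call to `reorder` closes the loop between them.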
Data Warehouse Applications
- Integration of data from differently structured and distributed data sets in order to allow a global view of the source data and thus evaluations across sources
- Identification of hidden relationships between data through data mining
- Fast and flexible availability of reports, statistics and key figures, for example in order to recognize relationships between the market and the range of services
- Detailed information about business objects and their relationships
- Transparency over time regarding business processes, costs and resource use
- Provision of information, for example for the creation of product catalogs.
In its resolution on data warehousing, data mining and data protection, the 59th Conference of the Data Protection Commissioners of the Federation and the Länder, held on 14-15 March 2000, addressed the legal risks associated with these methods. In particular, the fundamental right to informational self-determination and the protection of privacy are at risk. The reason is that these methods make it possible to store and use personal data beyond the purpose for which they were collected, which is unlawful under certain circumstances. The recommendation is to rely on techniques that use an anonymized or pseudonymized form of the personal data instead of their original form.
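One possible way to load data in pseudonymized form is to replace identifiers with a keyed hash before they reach the warehouse. The sketch below uses HMAC-SHA256 from Python's standard library; the resolution itself does not prescribe any technique, and the function names, field names and key handling shown here are assumptions for illustration.

```python
import hashlib
import hmac


def pseudonymize(customer_id, secret_key):
    """Replace an identifier with a keyed hash (HMAC-SHA256), hex-encoded."""
    digest = hmac.new(secret_key, customer_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


def prepare_for_load(rows, secret_key):
    """Drop directly identifying fields and keep only a pseudonym per person."""
    return [
        {"person": pseudonymize(row["customer_id"], secret_key), "amount": row["amount"]}
        for row in rows
    ]
```

The same identifier always maps to the same pseudonym under one key, so analyses per person remain possible. Note that this is pseudonymization, not anonymization: whoever holds the key can recompute the mapping, so the key has to be kept outside the warehouse.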