Data quality

Information quality is the measure of the fulfillment of the "totality of the requirements for an information or an information product, which relate to their capability to fulfill the given information needs". Statements about the quality of a piece of information concern, for example, how accurately it describes reality or how reliable it is, and therefore how far it can serve as a basis for planning one's own actions.

The term data quality (as a quality measure for data) is very close to "information quality". Because data form the basis of information, data quality affects the quality of the information obtained from the corresponding data: no "good" information from bad data.

Definitions

Information quality must be distinguished from the pure meaning (the semantics) and from the formal information content (statistical significance).

There are a large number of quality criteria, whose importance depends on the context and on how the information and its underlying data are used. Frequently used quality criteria are accuracy, completeness, relevance, consistency and timeliness (the last particularly in communications).

The IQ (information quality) community, following R. Wang, groups the quality of information into categories and dimensions.

On the basis of R. Wang's evaluation system, the German Association for Information and Data Quality (DGIQ) has proposed a German translation of these dimensions and recommends using it uniformly throughout the German-speaking area. A graphical overview of the 15 IQ dimensions can be found on the DGIQ website.

To optimize the quality of information in information systems, the quality of individual data sources is evaluated against different criteria by means of a cost function. Given preferences over the quality criteria, a query to the information system can then be optimized so that the answer has the highest possible information quality.
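As a rough illustration of evaluating data sources with a cost function, the following minimal sketch ranks sources by a weighted shortfall from perfect quality. The source names, criteria, scores and weights are all hypothetical, and the weighted-sum form is only one possible cost function, not one prescribed by the literature cited here.

```python
from dataclasses import dataclass


@dataclass
class Source:
    name: str
    scores: dict  # criterion -> score in [0, 1]; hypothetical assessments


def quality_cost(source, weights):
    """Weighted shortfall from perfect quality; lower cost means better quality."""
    total_weight = sum(weights.values())
    shortfall = sum(w * (1.0 - source.scores.get(criterion, 0.0))
                    for criterion, w in weights.items())
    return shortfall / total_weight


def rank_sources(sources, weights):
    """Sort sources from lowest to highest cost under the given preferences."""
    return sorted(sources, key=lambda s: quality_cost(s, weights))


# Hypothetical sources and user preferences.
sources = [
    Source("archive", {"accuracy": 0.9, "timeliness": 0.3}),
    Source("feed", {"accuracy": 0.6, "timeliness": 0.95}),
]
prefers_accuracy = {"accuracy": 3, "timeliness": 1}
prefers_timeliness = {"accuracy": 1, "timeliness": 3}

print([s.name for s in rank_sources(sources, prefers_accuracy)])    # archive first
print([s.name for s in rank_sources(sources, prefers_timeliness)])  # feed first
```

The same two sources are ranked differently depending on the preferences, which is the point of optimizing a query against user-specific quality criteria.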

As with the general concept of quality, information quality may refer to different concepts (following Garvin's traditional classification):

  • Product-related: quality is regarded as an inherent characteristic.
  • User-related: the use of the product defines its quality.
  • Process-related: compliance with the specification is guaranteed.
  • Value-related: a relationship is established, for example between price and quality.

Poor information quality can have far-reaching consequences if it is not detected early. Examples:

  • Hotel reservations are not found because names are misspelled.
  • Invoices are sent to the wrong people because address information is incomplete.
  • Translation errors turn billions (English "billion") into trillions (German "Billionen").
  • A scoring procedure produces a poor credit rating because the input data are incorrect.

Quality criteria for data quality differ from those for information quality. Criteria for data quality are:

  • Correctness: The data must match reality.
  • Consistency: A record must not contain contradictions, either within itself or with other records.
  • Reliability: The origin of the data must be traceable.
  • Completeness: A record must contain all necessary attributes.
  • Accuracy: The data must be available with the precision required in each case (example: decimal places).
  • Timeliness: All records must correspond to the current state of the reality they depict.
  • Freedom from redundancy: No duplicates may occur within the records.
  • Relevance: The information content of data sets must meet the respective information needs.
  • Uniformity: The information in a data set must be structured uniformly.
  • Unambiguity: Each record must be unambiguously interpretable.
  • Comprehensibility: The terminology and structure of the records must match the conceptions of the subject areas.
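Some of the criteria above can be checked mechanically. The following minimal sketch, which assumes records are plain dictionaries and uses hypothetical attribute names and sample data, tests completeness, freedom from redundancy and uniformity. Criteria such as correctness or relevance need external knowledge and are not covered.

```python
REQUIRED = ("name", "street", "city")  # hypothetical required attributes


def _norm(value):
    """Normalise a value for comparison: missing becomes '', text is trimmed."""
    return "" if value is None else str(value).strip()


def incomplete(records, required=REQUIRED):
    """Indices of records with a missing or empty required attribute."""
    return [i for i, r in enumerate(records)
            if any(_norm(r.get(k)) == "" for k in required)]


def duplicates(records, key_fields=REQUIRED):
    """Indices of records that repeat an earlier record, ignoring case and padding."""
    seen, found = set(), []
    for i, r in enumerate(records):
        key = tuple(_norm(r.get(k)).casefold() for k in key_fields)
        if key in seen:
            found.append(i)
        else:
            seen.add(key)
    return found


def non_uniform(records):
    """Indices of records whose attribute set differs from the first record's."""
    if not records:
        return []
    expected = set(records[0])
    return [i for i, r in enumerate(records) if set(r) != expected]


# Hypothetical sample data.
records = [
    {"name": "Anna Meyer", "street": "Main St 1", "city": "Bonn"},
    {"name": "anna meyer ", "street": "Main St 1", "city": "Bonn"},
    {"name": "Ben Koch", "street": "", "city": "Köln"},
    {"name": "Cem Ak", "city": "Ulm"},
]

print(incomplete(records))   # [2, 3]
print(duplicates(records))   # [1]
print(non_uniform(records))  # [3]
```

The duplicate check only normalises case and surrounding whitespace. Real duplicate detection, for instance for misspelled names, usually needs fuzzier matching.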

Measures to improve data quality are known, among other terms, as data cleansing.

Importance in different areas

Statistics

Eurostat defines data quality according to the following criteria:

  • Relevance of statistical concepts (relevance): users, user needs, level of detail and subject areas
  • Accuracy of the estimation results (accuracy and reliability): sampling errors (standard error, confidence intervals) and non-sampling errors (nonresponse, coverage error, measurement error)
  • Comparability of statistics (comparability): across regions, sub-groups and time
  • Accessibility and clarity of information (accessibility and clarity): publication of data, methods, reporting, completeness (not part of the Code of Practice)
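The sampling-error measures named under accuracy can be made concrete with a short calculation. This is a minimal sketch that assumes a simple random sample and a normal approximation (z = 1.96 for roughly 95% coverage). It is not Eurostat's own procedure, and for small samples a t-distribution quantile would be more appropriate.

```python
import math
import statistics


def standard_error(sample):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


def confidence_interval(sample, z=1.96):
    """Approximate confidence interval for the mean (normal approximation)."""
    mean = statistics.fmean(sample)
    half_width = z * standard_error(sample)
    return mean - half_width, mean + half_width


# Hypothetical measurements.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
low, high = confidence_interval(sample)
print(f"mean = {statistics.fmean(sample):.2f}, "
      f"SE = {standard_error(sample):.3f}, CI = ({low:.2f}, {high:.2f})")
```

Because the standard error shrinks with the square root of the sample size, a larger number of observations narrows the interval, all else being equal.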

Natural and Social Sciences

In the natural and social sciences, data quality is discussed particularly in relation to measurements and surveys. Three factors matter most: interference, the precision of the measurement, and the size of the data base, i.e. the number of measurements or surveys. The less interference there is, the more precise the measurement, and the greater the number of measurements, the more accurately the resulting data reflect reality and hence the better the data quality. Good data quality alone, however, is not sufficient to construct a good model: the interpretation of the data, particularly with regard to causality, must also be correct.

News agencies and intelligence services

The purpose of news agencies and intelligence services is to collect information of the best possible quality and to make it available. It is especially critical that, from the mass of available data, those items are selected that are relevant to the clients, and that they are brought into a consistent form without distorting their message. In particular, errors and misinformation must be ruled out, which is often done by checking against several news sources.

Economy

In business, information quality is of prime importance because, for example, decisions are made, market opportunities are assessed and negotiations are conducted on the basis of information. All of this can only be as good as the information and the data underlying it. The term data quality, or even enterprise-wide data quality, is often used as a synonym for "information quality". However, data quality refers only to the stored data content, whereas information quality includes additional elements such as the appropriate selection of suitable data sets, the formation of (partial) sums and/or their presentation.

Multifaceted concept

Colloquially, the term "information quality" is often equated with "high quality". This is only partially correct. As with other quality concepts (e.g. software, water or sound quality) or evaluative statements (how fast, bright or loud), a reliable determination of quality requires a relative view: the application context determines which quality criteria are relevant (as a general framework) and which specific requirements are placed on each criterion. The degree to which a piece of information fulfills these requirements yields, in sum, its information quality. Information quality is therefore always context- and user-dependent and can never be judged "in isolation".

Application context

The reference point is the "information" to which the quality statement is to apply.

  • What information was desired or expected?
  • Which details are specifically expected? Only what has been concretely stipulated can be checked; a vague "approximately ..." can hardly be verified.
  • For which information is the quality to be determined? What exactly was delivered?
  • For which user(s) is the information intended? What is their command of the terminology (lay people, specialists)?
  • What purpose does the information serve? Examples: "just out of interest", a purchase decision, a need for help.
  • How important is this purpose? Examples: costs incurred, intended level of investment, vital importance.
  • What importance is attached to the quality of the information? What happens depending on whether the quality is high or low? Which quality criteria apply?

Quality criteria and their relevance

Information quality is derived from checking the fulfillment of relevant criteria. Nohr uses the following "dimensions of quality" (*):

  • The task relevance and purpose orientation of the information: Is it understandable? Does it match expectations?
  • The degree of certainty that it is true
  • Its credibility, based on existing experience
  • The verifiability of the information: Which sources are known? Are they reliable?
  • The accuracy of the information: Is it complete? (This criterion is often difficult to verify.) Is it consistent?
  • The timeliness of the information

(*) Evaluation criteria for the quality of information are applied inconsistently. Nohr notes: "With regard to criteria and evaluation standards for the quality of information, there is a deficit that is perceived as serious." Referring to "Rolph / Bartram 1994", he recalls criteria used by British managers, who "rated the quality of the information underlying their decisions, on a comprehensive quality scale of eight criteria, as rather insufficient overall (1 = poor, 5 = high)": accuracy 3.64, credibility 3.31, presentation 3.18, timeliness 3.07, completeness 2.88, visible priorities 2.84, relevance 2.80, usable format 2.80.

Depending on the context in which the information is used, and which determines its quality, such questions require a more or less detailed examination. Further research may be needed, which in turn yields new information (with its own "information quality").

Importance of context

A blanket, undifferentiated evaluative statement about information quality is therefore not possible; such a statement can only ever be derived from the degree to which the (relevant) requirements are fulfilled. Deficits or gaps represent risks, and these risks are higher the more significant the application context is and the more serious the potential effects of the deficits are. Under certain circumstances, the requirements and expectations placed on information quality must therefore be prioritized or weighted. For information of high importance, special evaluation and documentation procedures (such as scoring) can be used for this purpose. Conversely, in a less important context (such as "I am just curious"), such considerations recede into the background. A piece of information whose relevance, for example, is not known could be attested "good quality" in this respect if that criterion is unimportant or irrelevant, because the (defined) requirements would be met. In simple situations, information quality is often judged only in general terms, "by gut feeling". The judgment is then based on an intuitive assessment of certain individual criteria, is only conditionally reliable and cannot be substantiated.
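The weighting described here can be sketched as a weighted mean of the degrees of fulfillment. The criteria, weights and values below are hypothetical, and the scoring procedures used in practice may differ considerably. Criteria with weight 0 are treated as irrelevant to the context, so an unknown assessment for them does not affect the result.

```python
def fulfilment_score(fulfilment, weights):
    """Weighted mean degree of fulfilment over criteria with a positive weight.

    fulfilment maps criterion -> value in [0, 1], or None if not assessed.
    weights maps criterion -> importance in the given application context.
    """
    relevant = {c: w for c, w in weights.items() if w > 0}
    if not relevant:
        raise ValueError("at least one criterion needs a positive weight")
    missing = [c for c in relevant if fulfilment.get(c) is None]
    if missing:
        raise ValueError(f"no assessment for relevant criteria: {missing}")
    return sum(w * fulfilment[c] for c, w in relevant.items()) / sum(relevant.values())


# Hypothetical assessment: relevance has not been determined.
assessment = {"accuracy": 0.9, "timeliness": 0.5, "relevance": None}

# Casual context: relevance does not matter, so the gap is ignored.
casual = {"accuracy": 1, "timeliness": 1, "relevance": 0}
print(round(fulfilment_score(assessment, casual), 2))  # 0.7

# Important context: relevance matters, so the missing assessment is an error.
critical = {"accuracy": 3, "timeliness": 1, "relevance": 2}
try:
    fulfilment_score(assessment, critical)
except ValueError as exc:
    print("cannot judge:", exc)
```

Raising an error for an unassessed but relevant criterion is a design choice made for this sketch. It mirrors the idea that gaps become risks once the context is important.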

Other dimensions

The literature on information quality contains further distinctions:

  • Statements about information quality are possible as an assessment of a concrete piece of information, but also as a target for information that is yet to be produced; in the latter case in particular, the quality-determining criteria need to be defined more precisely.
  • With a similar meaning, Nohr distinguishes between constructive information quality (= quality in the production of information) and receptive information quality (= related to the external examination of information).
  • With regard to the underlying requirements, quality can be differentiated according to user-specific or general requirements.