Metadata

Metadata, or meta-information, is data that contains information about the characteristics of other data, but not that data itself.

The data described by metadata is often a larger collection of data, such as documents, books, databases, or files. Statements about the properties of a single object (for example, a person's name) are likewise referred to as its metadata.

Examples

Typical metadata about a book includes, for example, the author's name, the edition, the year of publication, the publisher, and the ISBN. The metadata of a computer file includes the file name, the access rights, and the date of last modification.
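
As a minimal sketch of the file example, the following Python snippet reads a file's name, access rights, and last modification date from the operating system without opening the file's content. The helper name file_metadata and the dictionary keys are illustrative assumptions, not part of any standard.

```python
import os
import stat
import tempfile
from datetime import datetime, timezone


def file_metadata(path):
    """Collect a few metadata fields for a file; the content itself is never read."""
    info = os.stat(path)
    return {
        "name": os.path.basename(path),
        "access_rights": stat.filemode(info.st_mode),  # symbolic form, like `ls -l`
        "last_modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
        "size_bytes": info.st_size,
    }


# Usage: write a throwaway file, then look only at its metadata.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
    handle.write("the actual data")
    example_path = handle.name

meta = file_metadata(example_path)
os.remove(example_path)
print(meta)
```

The string written to the file is the data; everything in the returned dictionary describes the file rather than its content.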

Distinguishing data from metadata

While the term "metadata" is relatively new, the principle behind it, namely cross-referencing and formal description, reflects centuries of library practice. A general standardization of metadata formats is not being pursued. A valid distinction between metadata and ordinary data exists only for the particular case, because the designation is a matter of perspective. For the reader of a book, the content is the actual data, while the author's name or the edition number is metadata. For the publisher of a book catalog, however, these two properties belong directly to the book and are therefore regarded as the actual data.

Purpose

If one tries to distinguish between data and metadata, it is helpful to introduce the notion of "purpose". The purpose determines the result: to fulfill a specific purpose, that is, to achieve a certain result, metadata is needed. The result can consist of data; in particular, metadata may itself be part of the result in its role as data.

  • Purpose: to find, within a library, the locations (shelf marks) of all available books by a specific author
  • Metadata: "author name" and "available"
  • Result: the "shelf mark" (the shelf mark gives the location of the item)
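
The example above can be sketched in a few lines of Python. This is only an illustration under assumed names: the CatalogEntry class, its fields, and the sample entries are hypothetical. The author name and the availability flag serve as metadata for the search, and the shelf mark, itself metadata about the book, comes back as the result data.

```python
from dataclasses import dataclass


@dataclass
class CatalogEntry:
    author: str      # metadata used to fulfill the purpose
    available: bool  # metadata used to fulfill the purpose
    shelf_mark: str  # metadata that becomes the result
    title: str


def shelf_marks_for(catalog, author):
    """Return the shelf marks of all available books by the given author."""
    return [
        entry.shelf_mark
        for entry in catalog
        if entry.author == author and entry.available
    ]


catalog = [
    CatalogEntry("Jane Doe", True, "A 12", "First Book"),
    CatalogEntry("Jane Doe", False, "A 13", "Second Book"),
    CatalogEntry("John Roe", True, "B 07", "Other Book"),
]

result = shelf_marks_for(catalog, "Jane Doe")
print(result)  # ['A 12']
```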

Use

In many cases, no conscious separation between the object level and the meta level takes place. For example, one speaks of looking up a book in a catalog, not merely its metadata. When metadata is used, it is often expected that, through direct coupling with the payload data, it forms an inseparable part of a closed, self-describing system.

Metadata is often employed to describe information resources, which makes them easier to find and makes it possible to establish relationships between the materials. This usually requires cataloguing with a certain degree of standardization (for example, through library cataloguing rules).

Storage

There are various ways to store metadata.

For storing and transmitting metadata there are a number of data formats and data models; a data model such as Dublin Core can be expressed in different formats.

Interoperable metadata

In technical usage, the "operable" part of "interoperable" first of all means "designed so that it can be worked with, operated on". The prefix "inter" comes from Latin and means roughly "between". Interoperable metadata is therefore metadata, potentially from different sources, between which ("inter") a relationship exists such that it can be worked with ("operated on") together.

Standards for interoperable metadata have the task of making metadata from different sources usable together. To this end, they initially cover the following aspects:

  • Semantics
  • Data model
  • Syntax

The semantics describes the meaning, which is usually defined by standardization bodies (cf. Dublin Core). The data model specifies which structure the metadata may have. In the context of metadata, the "data" can be interpreted as statements made about an object to be described (document, resource, ...). The "model" part of the term can be interpreted as a description of how these statements are structured; in the context of metadata, "data model" therefore means roughly "grammar" or "structure of the statements". Examples of metadata data models are simple attribute/value combinations (for example, HTML meta elements) or sentences with subject, predicate, and object (e.g. triples in RDF). Finally, the syntax serves to represent the statements produced according to the data model. One example of a representation format is XML (eXtensible Markup Language).

The following relationship exists between these three aspects: the semantics is represented by constructs of the data model. The data model is in turn represented by syntactic constructs. The syntactic constructs are ultimately composed of characters from an agreed character set (such as Unicode). These three aspects can be understood as hierarchically stacked layers, since each layer builds on the one beneath it. The layers are independent of one another: the choice of a particular standard in one layer does not depend on the other layers (as with the layers of network communication models such as the ISO/OSI model). Thus a particular semantics can be represented by constructs of different data models (e.g. attribute/value combinations, triples), which in turn can be represented by different syntaxes (graphs, XML formats).
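
The layering can be illustrated with a small Python sketch. It takes one piece of semantics (the Dublin Core element "creator"), expresses it in two data models (an attribute/value pair and a triple), and serializes each in a concrete syntax. The resource URI, the value, and the function names are hypothetical; the "DC.creator" name in the meta element follows a common convention for embedding Dublin Core in HTML and is used here as an assumption.

```python
import json
from html import escape

# Semantics layer: the meaning "creator of the resource", taken from Dublin Core.
DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"

resource = "http://example.org/book/1"  # hypothetical resource URI
value = "Jane Doe"                      # hypothetical value

# Data model layer, variant A: a simple attribute/value pair.
pair = ("DC.creator", value)
# Data model layer, variant B: a triple of subject, predicate, object.
triple = (resource, DC_CREATOR, value)


# Syntax layer: each data model can be written down in more than one syntax.
def pair_as_html_meta(pair):
    name, content = pair
    return f'<meta name="{escape(name)}" content="{escape(content)}">'


def pair_as_json(pair):
    name, content = pair
    return json.dumps({name: content})


def triple_as_ntriples(triple):
    subject, predicate, obj = triple
    literal = obj.replace("\\", "\\\\").replace('"', '\\"')
    return f'<{subject}> <{predicate}> "{literal}" .'


print(pair_as_html_meta(pair))
print(pair_as_json(pair))
print(triple_as_ntriples(triple))
```

All three output lines carry the same meaning; only the data model and the syntax differ, which is the independence of layers described above.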

Orthogonal to these layers is a fourth aspect, identification, which affects all three layers. To process metadata from various sources in a meaningful way, it must be unambiguously (globally) identified which semantics, which data model, and which syntax are in use. This requires an identification mechanism such as the one provided by URIs (Uniform Resource Identifiers).

All four aspects (semantics, data model, syntax, and identification) are required to create standards for interoperable metadata. They can therefore be grouped together in a framework. A framework provides a kind of backbone or skeleton that already describes the main elements or components of a system and their relationships, but without giving precise instructions for their design. It thus acts as a kind of "reference system" that allows new components to be integrated in a meaningful way. Because a framework presents elements and their relationships, it can easily be visualized by an arrangement of graphical elements. Figure 1 shows a framework for metadata at the meta level. In contrast to concrete realizations of frameworks, that is, the instance level, a framework at the meta level describes a generalized framework, as indicated by the generic names of its components.

One example of a concrete framework for metadata is RDF (Resource Description Framework) from the W3C (World Wide Web Consortium). RDF covers all four of the above aspects with specific realizations, as shown in Figure 2.

The components in detail:

  • Semantics: Domain-specific semantics can be imported via namespaces, so that the semantics of an RDF vocabulary can be extended
  • Data model: RDF has a fixed data model that allows statements about resources in the form of triples with subject, predicate, and object
  • Syntax: Any syntax can be used to represent such statements (RDF/XML, graph, or the N-Triples notation); however, RDF/XML is the normative syntax
  • Identification: URIs are mandatory as the universal identification mechanism

Following the idea of a framework, RDF itself does not define any domain-specific semantics; it only specifies a mechanism by which additional semantics can be integrated via namespaces identified by URIs. What RDF does prescribe is a common data model in the form of triples and the universal use of URIs as the identification mechanism. URIs are used to identify the individual components of a triple (subject, predicate, object) as well as their values and data types. The concrete syntax for representing the triples, again in keeping with the idea of a framework, can be chosen freely, with RDF/XML provided as the standard. With RDF Schema, RDF also includes a schema language for defining custom metadata vocabularies.
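
The role of namespaces and URIs can be sketched in plain Python, without an RDF library. The snippet expands prefixed names to full URIs and writes one statement in N-Triples notation, with a URI identifying the subject, the predicate, and the data type of the value. The "ex" namespace, the resource name, the value, and all function names are hypothetical; the Dublin Core and XML Schema namespace URIs are the published ones.

```python
# Prefix-to-namespace mapping: this is how vocabularies from different
# sources are brought in and kept globally distinguishable.
NAMESPACES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "ex": "http://example.org/",  # hypothetical namespace for the described resources
}


def expand(qname):
    """Turn a prefixed name such as 'dc:date' into a full URI."""
    prefix, _, local = qname.partition(":")
    return NAMESPACES[prefix] + local


def typed_literal(value, datatype_qname):
    """Write a value with its data type; assumes the value needs no escaping."""
    return f'"{value}"^^<{expand(datatype_qname)}>'


def statement(subject, predicate, obj):
    """Serialize one triple in N-Triples notation."""
    return f"<{expand(subject)}> <{expand(predicate)}> {obj} ."


line = statement("ex:book1", "dc:date", typed_literal("2004", "xsd:gYear"))
print(line)
```

Adding a further vocabulary only requires a new entry in the mapping; the triple structure and the serialization stay the same.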

RDF Schema relates to RDF as XML Schema relates to XML. An RDF schema is itself a valid RDF document, just as an XML schema is itself a valid XML document. In both cases, then, these are specialized subsets of a markup language. However, whereas XML Schema describes syntactic restrictions (e.g. element names and frequency of occurrence), RDF Schema describes semantic restrictions, e.g. that an attribute "hasPublished" may be applied only to instances of the class "person" or "legal person", but not to instances of the class "animal". Expressed in the schema language, the attribute "hasPublished" has the domain "person" or "legal person".

Just as XML, following the principles of simplicity and extensibility, lastingly changed the world of data by making it possible, through a uniform syntax, a standard type system, and its text-based nature, to define data formats that can be exchanged between a wide range of systems and programs, RDF attempts to change the world of metadata through a unified data model. Through its character as a framework, RDF likewise builds on established principles such as simplicity and extensibility.
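
The "hasPublished" restriction can be sketched as a plain Python check. This is a simplified illustration that follows the reading in the paragraph above, not an RDF Schema implementation: in RDF Schema itself, a domain declaration is typically used by reasoners to infer the class of a subject rather than to reject data. The instance names and the lookup tables are hypothetical.

```python
# Declared domains: which classes an attribute may be applied to.
DOMAINS = {"hasPublished": {"person", "legal person"}}

# Hypothetical instances and the class each one belongs to.
INSTANCE_CLASS = {
    "alice": "person",
    "acme": "legal person",
    "rex": "animal",
}


def may_apply(attribute, instance):
    """Check whether the attribute's declared domain admits the instance's class."""
    allowed = DOMAINS.get(attribute)
    if allowed is None:
        return True  # no domain declared, so nothing is restricted
    return INSTANCE_CLASS.get(instance) in allowed


for name in ("alice", "acme", "rex"):
    print(name, may_apply("hasPublished", name))
```

The check inspects what a statement means (which class the subject belongs to), not how it is written, which is the difference from a syntactic restriction in XML Schema.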

Metadata in statistics

In statistical databases, metadata refers to data that does not directly form the content of a statistic, such as trade or occupational designations, directories of municipalities, and other catalogs. Statistical metadata also includes descriptions of the data fields in survey forms and, where applicable, complete descriptions of the forms. The actual statistical data, in contrast to the metadata, is referred to as microdata and macrodata.

Metadata in software development

In software development, the term metadata is used for various purposes:

  • Elements of a program's source code are called metadata when they are evaluated not by the actual translation tool, usually a compiler, but by additional tools. This metadata is mostly used for documentation or, with the help of annotations, for code generation. Examples are annotations in Java or attributes in the .NET Framework.
  • A form that deviates from classical software is the use of metadata in universal software. Here, most of the required application functions are precompiled and are invoked and parameterized by a metadata engine. The desired target application must first be described declaratively using specific metadata. This approach is pursued in particular by data warehouse and business intelligence products. Some manufacturers, such as Tenfold, Data-Warehouse GmbH, and Scopeland Technology GmbH, also apply this principle to the creation of database applications that write data. This so-called universal application approach promises dramatic cost reductions in the production of application software and a flexibility of the resulting solutions that is otherwise unattainable.
  • Metadata also refers to the record definitions in the data dictionary of a database.
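
The last point can be illustrated with SQLite from the Python standard library. In this minimal sketch, the record definition of a table is read back from the database's own catalog via PRAGMA table_info, separately from the rows stored in the table. The table name, columns, and sample row are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE book (isbn TEXT PRIMARY KEY, author TEXT NOT NULL, year INTEGER)"
)
conn.execute("INSERT INTO book VALUES ('000-0', 'Jane Doe', 2004)")

# Metadata: the record definition, as kept by the database itself.
# PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
columns = [
    {"name": name, "type": col_type, "not_null": bool(notnull), "primary_key": bool(pk)}
    for _cid, name, col_type, notnull, _default, pk in conn.execute(
        "PRAGMA table_info(book)"
    )
]

# Data: the actual records.
rows = conn.execute("SELECT * FROM book").fetchall()
conn.close()

print(columns)
print(rows)
```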

Metadata for music recordings

Typical metadata for music and other audio recordings includes, for example, title, artist, composer, release date, music publisher, or the ISRC. Beyond this primary data, which is needed to build a traditional music archive, there is considerably more complex metadata about the musical content. This includes, for example, style, main and secondary instruments, genre, tempo, key, dynamics, vocal character, and descriptions of moods and scenes. Wilbert Hirsch, composer and pioneer of music categorization, refers to this content metadata as secondary music metadata. Although far more difficult to compile, this secondary metadata forms the basis for content-based music categorization.
