Semantic gap

The semantic gap describes the semantic, i.e. meaning-related, difference between two descriptions of an object that arises when different forms of representation (languages) are chosen. In computer science, the term is used wherever a picture of real life has to be transferred into a formal, machine-processable representation.

More precisely, the term denotes the difference between the formulation of contextual knowledge in a powerful language (e.g. natural language) and its formal, automatically reproducible representation in a less powerful formal language (e.g. a programming language). Natural language can express relationships that cannot be evaluated in a formal language. For this reason, the difference in expressive power itself cannot be described formally either.

The Church-Turing thesis states that a machine can carry out exactly those formal operations that a calculating human can also carry out. However, such a formal set of rules does not fully guarantee the selection of the operations needed to perform a calculation correctly. If the underlying task is not fully computable, a formal approach yields either no result or only a partial result, or the application of the rules does not terminate. A human, by contrast, is able to formulate such tasks, for example the halting problem, and to recognize them as such.
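The three outcomes named above (a result, only a partial result, or no termination) can be illustrated with a minimal sketch. It assumes a hypothetical helper `run_with_budget` that applies a rule step by step but gives up after a fixed number of steps; the Collatz iteration is used only as a convenient example rule and does not appear in the original text. The sketch does not decide the halting problem; it only shows that a formal procedure with a step budget can answer "halted" or "unknown", never "will not halt".

```python
def run_with_budget(step, state, is_done, max_steps):
    """Apply `step` repeatedly until `is_done(state)` holds or the budget runs out.

    Returns ("halted", n) if the rule application finished after n steps,
    or ("unknown", max_steps) if the budget was exhausted first.
    """
    steps = 0
    while not is_done(state):
        if steps == max_steps:
            # Only a partial result: we cannot tell whether it would ever stop.
            return "unknown", steps
        state = step(state)
        steps += 1
    return "halted", steps


def collatz_step(n):
    """Example rule (an assumption for illustration): one Collatz step."""
    return n // 2 if n % 2 == 0 else 3 * n + 1


def reached_one(n):
    return n == 1


if __name__ == "__main__":
    print(run_with_budget(collatz_step, 6, reached_one, max_steps=100))
    print(run_with_budget(collatz_step, 6, reached_one, max_steps=5))
    # A rule that never satisfies its stop condition: counting upwards from 0.
    print(run_with_budget(lambda n: n + 1, 0, lambda n: n < 0, max_steps=1000))
```

Recognizing that the last call can never halt requires reasoning about the rule itself, which is exactly the step the bounded procedure cannot perform.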

This discrepancy in knowledge modeling results from the contextual, undecidable ambiguity of spoken language, which the Chomsky hierarchy refers to as an extended context. Practically usable programs that reproduce knowledge automatically, however, depend on unambiguity and decidability. For this reason, the semantic gap can probably never be closed completely with the means currently available. Instead, an abstraction from the basic low-level information and tools to high-level expert knowledge has to be developed for each application, based on its context. This corresponds to the programming and parameterization of an algorithm.

Formal languages in practice

In practice, applications are formalized using programming languages. The basis of today's most common architecture, the Von Neumann architecture, is Boolean algebra, in which all operations that our computers can perform are expressed. Added to this are mechanisms for storing binary data and for determining the execution sequence, which together correspond to a Turing machine. This lowest level is determined by what is currently technically feasible; the situation would be somewhat different with, for example, a quantum computer. On such a Turing machine, complex algorithms are difficult to implement, and modern applications such as operating systems or word processors can practically no longer be implemented at all. Tools that ease the work are therefore needed, in the form of programming languages. The first stage is formed by machine or assembly languages, which combine, for example, arithmetic and memory operations into commands and make them readable. In high-level languages, ever more complex sequences of low-level operations are combined into instructions that are increasingly easy to understand. But since these commands can in turn only be executed on a Von Neumann computer, the Turing machine continues to mark the limit of what is possible, no matter how complex the high-level programming language appears to be. That is, the usual tools, up to and including compilers and interpreters, cannot close the semantic gap on their own.
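The layering described above can be sketched in a few lines. The example below is only an illustration under stated assumptions: the function names `full_adder` and `add_bits` and the 8-bit width are chosen here and are not part of the original text. It builds integer addition purely from Boolean operations (AND, OR, XOR) and compares it with the single high-level instruction `x + y`, which expresses the same computation more readably but cannot compute anything the lower level could not.

```python
def full_adder(a, b, carry_in):
    """Add three single bits using only Boolean operations."""
    total = a ^ b ^ carry_in                      # sum bit
    carry_out = (a & b) | (carry_in & (a ^ b))    # carry bit
    return total, carry_out


def add_bits(x, y, width=8):
    """Ripple-carry addition of two non-negative integers, bit by bit.

    The result wraps around modulo 2**width, like a fixed-width register.
    """
    result = 0
    carry = 0
    for i in range(width):
        a = (x >> i) & 1
        b = (y >> i) & 1
        bit, carry = full_adder(a, b, carry)
        result |= bit << i
    return result


if __name__ == "__main__":
    x, y = 5, 7
    # Low-level formulation versus the high-level instruction.
    print(add_bits(x, y), x + y)
```

Both formulations give the same result within the chosen width; the high-level version only hides the sequence of low-level operations.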

Mapping of natural language

To write a program for a real-world application, the programming task consists of translating the user's knowledge of the application from the natural, domain-specific language into the language of the Turing machine. It can be derived from the studies on the Chomsky hierarchy that this step cannot be automated, so interaction with a human always remains necessary. A practical consequence is that any use of computers to solve a real problem requires the user to have a certain knowledge of what is technically feasible. A word processor, for example, hides data structures, memory access, and search and sort algorithms behind an appropriate user interface, so that the user can concentrate on creating content at a more abstract level than that of ASCII codes. This is abstracted so far from the underlying technology that the user only accesses a low-level function when saving and loading the document. For more complex applications, such as a decision support system in medicine, this abstraction is much more difficult. On the one hand, the user would in theory have to know which methods exist for assigning values to observations as the application requires. On the other hand, the developer needs to know which combinations of measurements and observations occur in order to select suitable methods for learning the decision function. It is precisely in this change of domain that the semantic gap manifests itself.

Software engineering as a solution

The general task of software engineering is to bridge the gap between application knowledge and technical feasibility. To do so, the domain knowledge (high-level) about a problem must be transferred into an algorithm and its parameterization (low-level). This requires a dialogue between users and software developers, which has to be conducted anew for each domain. The goal is always software that allows the user to interpret the results of the algorithm without technical explanations from the developer, and to express their knowledge in parameterizations without knowing the technical details of the implementation. An appropriate user interface plays a central role.
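The split between domain knowledge and parameterization can be sketched as follows. Everything in this example is hypothetical and chosen only for illustration: the class `DomainRule`, the function `evaluate`, the measurement name `temperature_c`, and the threshold value are assumptions, not part of the original text or of any real system. The user supplies rules in domain terms; the algorithm stays generic and returns results in the same terms.

```python
from dataclasses import dataclass


@dataclass
class DomainRule:
    label: str          # domain term shown to the user in the result
    measurement: str    # name of the measured quantity
    threshold: float    # value chosen by the domain expert (the parameterization)


def evaluate(rules, observation):
    """Return the labels of all rules whose threshold is exceeded.

    Measurements missing from the observation never trigger a rule.
    """
    triggered = []
    for rule in rules:
        value = observation.get(rule.measurement)
        if value is not None and value > rule.threshold:
            triggered.append(rule.label)
    return triggered


if __name__ == "__main__":
    # Illustrative values only, not medical guidance.
    rules = [
        DomainRule("elevated temperature", "temperature_c", 38.0),
        DomainRule("elevated pulse", "pulse_bpm", 100.0),
    ]
    print(evaluate(rules, {"temperature_c": 38.6, "pulse_bpm": 72.0}))
```

The user changes behaviour only through the rule list, without touching the implementation of `evaluate`; which quantities and thresholds are meaningful still has to be agreed between user and developer for each domain.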

Examples

A typical domain that requires a high degree of abstraction from low-level methods together with a high degree of automation is diagnostic support in medicine. Here, complex interrelationships are stored in data structures for expert systems, which the user should be able to train and query efficiently without being expected to know methods of artificial intelligence. The problem of the semantic gap is even more complex in automated image analysis. The aim here is to recognize image content and to assign one or more meanings to the images. The only available data are unspecific pixel data as low-level information. To recognize the depicted objects or scenes from these raw data, algorithms for pixel selection or manipulation must be suitably combined, parameterized, and linked to natural-language terms. Implementing natural descriptive categories such as color or shape requires completely different mathematical formalization concepts in each case, which the user must know in addition to the natural-language formulation.
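How much formalization even a simple category such as color needs can be shown with a minimal sketch. It assumes that pixels are given as 8-bit RGB triples and maps them to color words through the hue angle in HSV space. The hue boundaries, the saturation and brightness thresholds, and the function names `color_name` and `dominant_color` are illustrative assumptions made here, not an established standard; only `colorsys.rgb_to_hsv` and `collections.Counter` come from the Python standard library.

```python
import colorsys
from collections import Counter

# Upper hue bounds in degrees and the color word assigned below each bound.
# These boundaries are an assumption made for this sketch.
HUE_NAMES = [
    (30, "red"), (90, "yellow"), (150, "green"),
    (210, "cyan"), (270, "blue"), (330, "magenta"), (360, "red"),
]


def color_name(r, g, b, min_saturation=0.2, min_value=0.2):
    """Map one 8-bit RGB pixel to a natural-language color word."""
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    if v < min_value:
        return "black"
    if s < min_saturation:
        return "white" if v > 0.8 else "gray"
    degrees = h * 360
    for upper, name in HUE_NAMES:
        if degrees < upper:
            return name
    return "red"


def dominant_color(pixels):
    """Return the color word that occurs most often among the pixels."""
    counts = Counter(color_name(r, g, b) for r, g, b in pixels)
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    pixels = [(255, 0, 0), (250, 10, 10), (0, 0, 255)]
    print(dominant_color(pixels))
```

A category such as shape would need an entirely different formalization (for example contours or geometric moments), and none of the thresholds above follows from the pixel data alone; they encode a human decision about what a color word means.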
