Semantic search

Semantic search is a search method that places the meaning of a query (on the Internet or in a digital text archive) at the center.

A semantic search engine uses background knowledge to take the meaning of texts and search queries into account. Unlike keyword-based search engines, it does not merely look for words in the text. This allows a query to be interpreted more precisely and associated with the texts that are relevant in content, so that contextually correct search results are returned. To some extent, semantic search mimics the human brain by using knowledge and associations for searching.

Background knowledge

The background knowledge used in semantic search, in the form of thesauri, semantic networks and ontologies, maps the knowledge of a particular domain. Depending on the application, concepts and the relevant relationships between them are recorded. Mapping concepts and their relationships makes it possible to specialize a search query, i.e. to narrow down the results, and to generalize it, i.e. to broaden them. The relationships can be simple, of the form "A is a B", but more complex relationships such as "A knows B" or "A activates B" can also be mapped.

In computer science, and especially in bioinformatics, the data formats OWL, RDF and RDFS have become established for storing background knowledge in ontologies. To create ontologies as efficiently as possible, Stanford University developed the tool Protégé and the University of California, Berkeley, developed OBO-Edit. Besides these two tools there is a variety of other such software systems. A current challenge is the automatic creation of ontologies. Different approaches are used for this, ranging from manual processing to semi-automated processes. In the semi-automatic generation of an ontology, an automated process produces proposals for concepts and their links, which must then be reviewed and approved by a domain expert.
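The role of "A is a B" relationships in specializing and generalizing a query can be illustrated with a minimal sketch. The concept names and the `IS_A` table are hypothetical toy data, not taken from any real ontology, and a production system would load such relationships from an OWL or RDF file rather than a dictionary.

```python
# Hypothetical toy ontology: each concept maps to its direct parent ("is a").
IS_A = {
    "jaguar": "big cat",
    "lion": "big cat",
    "big cat": "mammal",
    "mammal": "animal",
}


def generalize(concept):
    """Return all broader concepts of `concept`, nearest first."""
    broader = []
    while concept in IS_A:
        concept = IS_A[concept]
        broader.append(concept)
    return broader


def specialize(concept):
    """Return all narrower concepts of `concept`, direct and indirect."""
    narrower = []
    frontier = [concept]
    while frontier:
        current = frontier.pop()
        # Children are all concepts whose direct parent is `current`.
        children = sorted(c for c, parent in IS_A.items() if parent == current)
        narrower.extend(children)
        frontier.extend(children)
    return narrower
```

Calling `generalize("jaguar")` walks up the hierarchy to broaden a query, while `specialize("mammal")` collects the narrower concepts that could be used to narrow it down.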

Annotation between text and background knowledge

An important aspect of semantic search is the method used for annotation. The annotator links text data from documents or databases with relevant entities of the background knowledge, i.e. the ontology. Text mining methods are used in the annotation process so that content can be read and classified in a semantically correct way. Today's highly trained algorithms achieve a combined measure of accuracy and completeness, the so-called F-measure, of over 90 percent. The F-measure is the metric in which precision and recall are weighted equally. The technical success of a semantic search engine also depends on the annotator used.
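The F-measure with equal weighting is commonly computed as the harmonic mean of precision and recall. The following sketch shows that calculation; the function name and the counts in the usage note are illustrative assumptions, not results from any particular annotator.

```python
def f_measure(true_positives, false_positives, false_negatives):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    predicted = true_positives + false_positives
    relevant = true_positives + false_negatives
    if predicted == 0 or relevant == 0:
        return 0.0
    precision = true_positives / predicted  # share of annotations that are correct
    recall = true_positives / relevant      # share of correct entities that were found
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For an annotator that produces 90 correct annotations, 10 wrong ones and misses 10, precision and recall are both 0.9, and so is the F-measure.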

Aspects of semantic search

The quality of semantic search is primarily determined by three factors. The inclusion of synonyms in the query is important for the completeness of the search results. All known synonyms of a term are stored in the background knowledge. If the user enters one of these terms in a query, all associated synonyms are included in the query as well. When searching for "programmer", for example, this makes it possible to also find documents in which the qualification is recorded under the synonym "software developer".
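Synonym expansion of this kind can be sketched in a few lines. The synonym group and the simple substring matching are assumptions made for illustration; a real engine would draw the synonyms from its thesaurus or ontology and match against an index.

```python
# Hypothetical synonym groups, standing in for the background knowledge.
SYNONYM_GROUPS = [
    {"programmer", "software developer"},
]


def expand_query(term):
    """Return the term together with all of its known synonyms."""
    term = term.lower().strip()
    for group in SYNONYM_GROUPS:
        if term in group:
            return set(group)
    return {term}


def search(term, documents):
    """Return documents containing the term or any of its synonyms."""
    variants = expand_query(term)
    return [doc for doc in documents if any(v in doc.lower() for v in variants)]
```

With this sketch, a query for "programmer" also returns a document that only mentions "software developer".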

Differentiating homonyms (e.g. Jaguar (car brand) versus jaguar (animal)) improves the quality of the search results. By means of disambiguation, the resolution of ambiguities, mismatched search results are automatically removed. Among other things, statistical techniques, text mining and NLP (natural language processing) are used to identify the context of a document and thus to decide whether its assignment to a subject area is right or wrong. If the context of the document in which the search term was found is the correct one, the document is classified as a genuine result. Conversely, documents with an incorrect context are excluded from the search results.
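As a rough illustration of context-based disambiguation, the sketch below scores each sense of "jaguar" by counting context words that appear in a document. The context word lists are invented for this example, and the word-overlap score is a deliberately simple stand-in for the statistical and NLP techniques mentioned above.

```python
import re

# Hypothetical context vocabulary for each sense of the homonym.
SENSE_CONTEXT = {
    "jaguar (car brand)": {"car", "engine", "dealer", "drive"},
    "jaguar (animal)": {"cat", "jungle", "prey", "habitat"},
}


def disambiguate(document):
    """Return the sense whose context words overlap most with the document."""
    words = set(re.findall(r"[a-z]+", document.lower()))
    scores = {sense: len(words & context) for sense, context in SENSE_CONTEXT.items()}
    best = max(scores, key=scores.get)
    # Without any matching context word, the sense stays undecided.
    return best if scores[best] > 0 else None


def filter_by_sense(documents, wanted_sense):
    """Keep only documents whose detected sense matches the wanted one."""
    return [doc for doc in documents if disambiguate(doc) == wanted_sense]
```

Documents whose context points to the other sense, or to no sense at all, are dropped from the result list.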

The third and most important aspect of semantic search is the application of existing background knowledge. If one searches for a term such as "heart disease", other relevant terms from the area, such as the coronary disease "angina pectoris", are also considered, because the concept lies "in the vicinity" of heart disease in the background knowledge. In the biomedical domain, for example, the network MeSH (Medical Subject Headings), with approximately 80,000 concepts, enables this approach. Scientific biomedical search engines show the possibilities of semantic search in this field.
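The idea of concepts lying "in the vicinity" of a search term can be sketched as a bounded walk through a concept graph. The three-node graph below is a toy assumption built from the terms in the example above and does not reproduce the actual structure of MeSH; the `max_distance` parameter is likewise hypothetical.

```python
from collections import deque

# Hypothetical concept graph: each concept lists its directly related concepts.
RELATED = {
    "heart disease": ["coronary disease"],
    "coronary disease": ["heart disease", "angina pectoris"],
    "angina pectoris": ["coronary disease"],
}


def neighbourhood(concept, max_distance=2):
    """Return concepts within `max_distance` links, mapped to their distance."""
    distance = {concept: 0}
    queue = deque([concept])
    while queue:
        current = queue.popleft()
        if distance[current] == max_distance:
            continue  # do not expand beyond the allowed radius
        for neighbour in RELATED.get(current, []):
            if neighbour not in distance:
                distance[neighbour] = distance[current] + 1
                queue.append(neighbour)
    return distance
```

A search engine could add the returned concepts to the query, possibly weighting closer concepts more strongly than distant ones.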

Presenting the search results, which are usually much more extensive than in a keyword search, in a user-friendly form is a difficult but solvable problem.
