Full-text search

Full-text search (sometimes written "full text search") is the process of finding texts in many files of the same or different types on a computer, a server and/or the Internet. The areas to be searched are indexed beforehand, using indexing tools that are either built into a program or independent of it.

Full-text search is used to extract and retrieve information quickly from documents. These can be documents the user already knows about, or unknown ones that are nevertheless present on the storage media. Full-text search therefore serves to find, explore and extract unknown, non-trivial and important information from large amounts of unstructured text and files, which also makes it an important area of text mining. It lets users search immediately, without first having to formulate the kind of concrete, structured query that systems such as document management or data mining require.

In the context of databases, full-text search means that data can be searched independently of the fields, in addition to the usual SQL query, which presupposes knowledge of the field structure.
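To illustrate the difference between a field-bound SQL query and a field-independent search, here is a minimal sketch using Python's built-in sqlite3 module. The function name search_all_text_columns, the table docs and its sample rows are hypothetical. The sketch assumes that every column declared as TEXT should be searched, and that the table name comes from a trusted source because it is inserted into the SQL string. Column names are discovered at run time, so the caller does not need to know the field structure.

```python
import sqlite3

def search_all_text_columns(conn, table, term):
    """Return rows of `table` in which any TEXT column contains `term`.

    Hypothetical helper: columns are discovered at run time, so the caller
    does not need to know the field structure. `table` must be trusted.
    """
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk)
    columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")
               if row[2].upper() == "TEXT"]
    if not columns:
        return []
    # Build '"col1" LIKE ? OR "col2" LIKE ? ...' with one parameter per column
    where = " OR ".join(f'"{col}" LIKE ?' for col in columns)
    sql = f"SELECT * FROM {table} WHERE {where} ORDER BY rowid"
    return conn.execute(sql, [f"%{term}%"] * len(columns)).fetchall()

# Usage with a small in-memory table (sample data is illustrative)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, title TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO docs (title, body) VALUES (?, ?)",
    [("Indexing", "An inverted file maps words to positions."),
     ("History", "Catalog systems relied on manual keywords.")],
)
rows = search_all_text_columns(conn, "docs", "inverted")
print(rows)
```

SQLite's LIKE is case-insensitive for ASCII letters by default, so a search for "catalog" would also match the second row's "Catalog".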

History

Full-text search emerged in the mid-1970s. Before that, systems were often used in which a person assigned key terms to the texts that were to be found later, or to separate metadata files (catalog systems). In many areas this method is no longer feasible, because such costly and time-consuming manual work does not scale to large data sets. Among others, the search engine Yahoo failed with such an approach in the mid-1990s.

As a solution to this problem, the entire original text came to be processed for later fast retrieval and stored in this processed form. In theory, any document that contains at least one word of the search query can then be found. This approach bypasses the manual indexing process described above and also yields a more complete search result. In practice, however, problems remain. Documents may contain words from the query without being relevant to the topic at hand. They are found anyway, so the user is presented with a huge result set that often contains many irrelevant hits. Conversely, documents that are relevant to the topic are missed because they use other words, such as synonyms. Today this problem is addressed with ontologies. As early as the mid-1970s, new types of search such as phrase search and wildcard search, as well as ranking methods, were introduced alongside the classic word search to mitigate the problem.
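The three refinements named above can be sketched in a few lines of Python. This is an illustrative toy, not how any particular system implements them. The document collection, the helper names and the choice of shell-style patterns (via the standard fnmatch module) for wildcards are assumptions. Ranking here is plain term frequency, the simplest possible scoring scheme.

```python
import fnmatch
import re
from collections import Counter

# Hypothetical sample collection
docs = {
    "d1": "full text search finds words in full documents",
    "d2": "a catalog lists keywords assigned by hand",
    "d3": "text mining extracts information from text",
}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())

def phrase_search(phrase, docs):
    """IDs of documents containing the words of `phrase` consecutively."""
    words = tokenize(phrase)
    n = len(words)
    hits = []
    for doc_id, text in docs.items():
        tokens = tokenize(text)
        if any(tokens[i:i + n] == words for i in range(len(tokens) - n + 1)):
            hits.append(doc_id)
    return hits

def wildcard_search(pattern, docs):
    """IDs of documents with at least one word matching a shell-style pattern."""
    pattern = pattern.lower()
    return [doc_id for doc_id, text in docs.items()
            if any(fnmatch.fnmatchcase(tok, pattern) for tok in tokenize(text))]

def rank(query, docs):
    """Matching document IDs ordered by how often the query words occur."""
    words = tokenize(query)
    scores = {doc_id: sum(Counter(tokenize(text))[w] for w in words)
              for doc_id, text in docs.items()}
    return sorted((d for d, s in scores.items() if s > 0),
                  key=lambda d: (-scores[d], d))

print(phrase_search("full text", docs))  # ['d1']
print(wildcard_search("key*", docs))     # ['d2']
print(rank("text", docs))                # ['d3', 'd1']
```

A plain word search for "text" would return d1 and d3 in arbitrary order; ranking puts d3 first because the word occurs there twice.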

Another possibility arose in relational databases with the introduction of field types that can hold larger texts, such as Memo (Microsoft Access), BLOB (MySQL) or VARCHAR in SQL databases. If the relevant documents are stored in such fields, the table indexing that the database often performs anyway can be combined with wildcard search in the corresponding SQL queries. This can lead to faster database responses.
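A wildcard search over a large text field can be sketched with SQLite and its LIKE operator, where % matches any run of characters and _ matches a single character. The table documents, its content column and the sample rows are hypothetical; SQLite stands in here for the database systems mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with one large text field per document
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, content TEXT)")
conn.executemany(
    "INSERT INTO documents (content) VALUES (?)",
    [("Full-text search uses an inverted index.",),
     ("Relational databases store texts in BLOB or memo fields.",),
     ("Indexing makes retrieval faster.",)],
)

def wildcard_query(conn, pattern):
    """Return IDs of documents whose content matches a LIKE pattern."""
    # SQLite's LIKE is case-insensitive for ASCII letters by default,
    # so "%index%" also matches "Indexing".
    return [row[0] for row in conn.execute(
        "SELECT id FROM documents WHERE content LIKE ? ORDER BY id", (pattern,))]

print(wildcard_query(conn, "%index%"))      # [1, 3]
print(wildcard_query(conn, "Relational%"))  # [2]
```

Whether an ordinary table index speeds up such a query depends on the pattern. A pattern with a fixed prefix can in principle use an index range scan, whereas a leading % usually forces the database to examine every row, which is one reason dedicated full-text indexes exist.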

Basic search types

Depending on the search system used, the following search options are available (this list is not exhaustive):

Technology

The most common approach for a full-text search system is to create a complete index of the entire database. For each word, except stop words, which are of little use in a search, an index entry records the word's exact position in the data set (inverted file).
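A positional inverted file of the kind described above can be sketched as a mapping from each word to the documents and positions where it occurs. The stop-word list, the tokenization rule and the sample documents are illustrative assumptions, not a standard. Positions are counted over all tokens, including the skipped stop words, so they reflect each word's place in the original text.

```python
import re
from collections import defaultdict

STOP_WORDS = {"a", "an", "and", "the", "of", "in", "is"}  # illustrative only

def build_inverted_index(docs):
    """Map each non-stop word to {doc_id: [positions]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        # enumerate over all tokens, so stop words still advance the position
        for pos, word in enumerate(re.findall(r"\w+", text.lower())):
            if word not in STOP_WORDS:
                index[word][doc_id].append(pos)
    return index

# Hypothetical sample documents
docs = {
    1: "The index of a full text search system",
    2: "Search the index, not the documents",
}
index = build_inverted_index(docs)
print(dict(index["index"]))   # {1: [1], 2: [2]}
print(dict(index["search"]))  # {1: [6], 2: [0]}
print("the" in index)         # False: stop words are not indexed
```

Storing positions rather than just document IDs is what makes phrase search possible on top of the index: two words form a phrase if their positions in the same document differ by one.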

A query can then be processed relatively easily, because the documents themselves no longer have to be searched. For small data sets a serial scan would still be practicable, but not for larger amounts of data.
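The contrast between answering a query from the index and scanning every document can be sketched as follows. The helper names and sample data are hypothetical, and the query semantics (all query words must occur, i.e. an AND query) is an assumption chosen for simplicity. Both functions return the same result; the difference is that the index-based version only reads the posting sets for the query words, while the serial scan re-reads every document on every query.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())

def build_index(docs):
    """Inverted file: word -> set of IDs of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index

def search_with_index(index, query):
    """AND query answered by intersecting posting sets; no document is read."""
    postings = [index.get(word, set()) for word in tokenize(query)]
    return sorted(set.intersection(*postings)) if postings else []

def serial_scan(docs, query):
    """Baseline: tokenize every document in full for every query."""
    words = set(tokenize(query))
    return sorted(d for d, text in docs.items() if words <= set(tokenize(text)))

# Hypothetical sample collection
docs = {1: "full text search", 2: "text mining", 3: "search engines index text"}
index = build_index(docs)
print(search_with_index(index, "text search"))  # [1, 3]
print(serial_scan(docs, "text search"))         # [1, 3]
```

The index is built once and reused, so its construction cost is spread over many queries; the serial scan pays the full cost of reading the collection each time.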

See also

Document management system
Data mining
Binary large object
Fault-tolerant system
