Stop words

In information retrieval, stop words are words that are ignored during full-text indexing because they occur very frequently and usually have no relevance for capturing the content of a document.

Commonly used stop words in German-language documents are the definite articles ('der', 'die', 'das'), the indefinite articles ('ein', 'eine', 'einer'), conjunctions (e.g. 'und', 'oder', 'aber'), frequently used prepositions (e.g. 'an', 'in', 'von'), and the negation 'nicht'. In English, stop words include 'a', 'of', 'the', 'I', 'it', 'you' and 'and', among others. Depending on the documents to be indexed, a stop word list can also be multilingual. The period (.), the comma (,) and the semicolon (;) are often called stop words as well, although they would more accurately be described as stop characters.
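Filtering against such a list can be sketched in a few lines of Python. This is only a minimal illustration: the set STOP_WORDS holds just the English examples named above, and the function name index_terms is a hypothetical choice, not part of any particular indexing system.

```python
import re

# Illustrative stop word list taken from the English examples above;
# real systems use much longer, language-specific lists.
STOP_WORDS = {"a", "of", "the", "i", "it", "you", "and"}

def index_terms(text):
    """Return the lowercase tokens of `text` that are not stop words."""
    # \w+ keeps only runs of letters, digits and underscores, so the
    # "stop characters" '.', ',' and ';' never become tokens at all.
    tokens = re.findall(r"\w+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

print(index_terms("I read the history of the index, and you liked it."))
# ['read', 'history', 'index', 'liked']
```

Only the four content-bearing words remain; the articles, pronouns, conjunction, preposition and punctuation are dropped before indexing.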

What all stop words have in common is that they serve mainly grammatical or syntactic functions and therefore allow no conclusions about the content of the document.

Another thing they have in common is their sheer frequency: they occur very often within each document and appear in a great many documents, so they would cause major difficulties when indexing the documents.

Identifying stop words makes search engines more efficient. If stop words were taken into account during a search, virtually every document would be a hit. Such a result would be useless for the user.
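The effect on search results can be illustrated with a toy inverted index. The sketch below is built on assumptions: the three sample documents, the short stop word list, and the functions build_index and search are invented for illustration, and a query matches a document if any of its terms occurs there.

```python
from collections import defaultdict

STOP_WORDS = {"a", "of", "the", "i", "it", "you", "and"}  # assumed example list

# Hypothetical sample documents, keyed by document id.
DOCS = {
    1: "the history of the printing press",
    2: "a guide to the care of orchids",
    3: "the rules of chess and the art of the endgame",
}

def build_index(docs, stop_words=frozenset()):
    """Map each term to the set of ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            if term not in stop_words:
                index[term].add(doc_id)
    return index

def search(index, query, stop_words=frozenset()):
    """Return ids of documents matching any query term that is not a stop word."""
    hits = set()
    for term in query.lower().split():
        if term not in stop_words:
            hits |= index.get(term, set())
    return sorted(hits)

full_index = build_index(DOCS)                  # stop words are indexed
filtered_index = build_index(DOCS, STOP_WORDS)  # stop words are skipped

print(search(full_index, "the rules of chess"))                  # [1, 2, 3]
print(search(filtered_index, "the rules of chess", STOP_WORDS))  # [3]
```

With stop words indexed, the word 'the' alone makes every document a hit. With stop words removed from both the index and the query, only the document about chess is returned, and the index holds fewer terms.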

Hans Peter Luhn, one of the pioneers of information retrieval, coined the term stop word and used the concept in the design and implementation of the KWIC indexer.
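The role of stop words in a keyword-in-context index can be sketched as follows. This is a simplified illustration using circular shifts of a title, not a reconstruction of Luhn's implementation. The function name kwic, the stop word list and the sample title are assumptions.

```python
STOP_WORDS = {"a", "of", "the", "i", "it", "you", "and"}  # assumed example list

def kwic(titles, stop_words=STOP_WORDS):
    """Return sorted (keyword, rotated title) pairs, one per non-stop word."""
    entries = []
    for title in titles:
        words = title.split()
        for i, word in enumerate(words):
            if word.lower() in stop_words:
                continue  # stop words never become index keywords
            # Rotate the title so the keyword comes first;
            # '/' marks where the original title ended.
            rotated = " ".join(words[i:] + ["/"] + words[:i])
            entries.append((word.lower(), rotated))
    return sorted(entries)

for keyword, line in kwic(["The History of the Index"]):
    print(f"{keyword:10} {line}")
```

For the five-word sample title, only 'History' and 'Index' produce entries. Without the stop word list, 'The' and 'of' would each add entries that no reader would look up.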
