Frequency list

In linguistics, a frequency class is a statistical measure of how frequently a word is used in a natural language or in a sample of a language. The frequency class is calculated using Zipf's law, which as a linguistic law is of particular importance in quantitative linguistics. Frequency classes have also become established in corpus linguistics as an empirical measure of frequency.

Calculation

The calculation is based on a representative and sufficiently large collection of available source texts from a language, called a corpus. The most common word in this corpus serves as the basis for comparison. In written German this is the word der, in English the, and in Swedish och ("and").

Zipf's law underlies the calculation. The value of the frequency class N is obtained from the base-2 logarithm of the ratio between the frequency of the word under examination and the frequency of the most common word.
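Written out, a commonly used form of this calculation is given below. The symbol names f_word (frequency of the word under examination) and f_max (frequency of the most common word) are chosen here for illustration, and the offset of 0.5, which makes the floor function round to the nearest integer, is an assumption that the surrounding text does not state explicitly.

```latex
N = \left\lfloor 0.5 - \log_2\!\left( \frac{f_{\text{word}}}{f_{\text{max}}} \right) \right\rfloor
```

For the most common word the ratio is 1, the logarithm is 0, and the result is N = 0.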

The floor function rounds the intermediate result to an integer. The calculated frequency class N is an integer that expresses, as a power of two, how many times more frequently the most common word occurs than the word under examination in the analyzed dataset (roughly 2^N times). The most common word itself belongs to frequency class 0 and is, in general, the only member of that class. Words that occur about 2^(-N) times as often as the most common word are assigned to frequency class N. Consequently, the smaller a word's frequency class, the more frequently the word occurs.
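The calculation can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name frequency_classes and the toy sentence are invented for the example, and the 0.5 offset inside the floor is an assumed rounding convention.

```python
import math
from collections import Counter


def frequency_classes(tokens):
    """Map each word form to its frequency class relative to the most common form."""
    counts = Counter(tokens)
    top = max(counts.values())  # frequency of the most common word
    # Assumed convention: floor(0.5 - log2(f / f_max)), i.e. rounding to the nearest integer.
    return {word: math.floor(0.5 - math.log2(c / top)) for word, c in counts.items()}


# Toy "corpus", invented for illustration only.
tokens = "the cat and the dog and the bird saw the fish".split()
classes = frequency_classes(tokens)
```

In this toy sample the most common word form falls into class 0, a form occurring half as often into class 1, and a form occurring a quarter as often into class 2.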

According to Zipf's law, class N is expected to contain about 2^N words (types), and the sum of their occurrences (tokens) is expected to be approximately the same in each class; this approximation is least accurate for the top and bottom classes. In particular, Zipf's law would lead one to expect that, in any corpus, roughly half of all words (types) occur only once.
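These expectations can be checked numerically on an idealised corpus in which the word of rank r occurs exactly f_max / r times. The sketch below makes that assumption, uses hypothetical parameter names (vocab_size, top_count), and again assumes the 0.5 offset inside the floor; it models no real corpus.

```python
import math
from collections import defaultdict


def zipf_class_summary(vocab_size, top_count):
    """Count types and tokens per frequency class for an idealised Zipf distribution."""
    types = defaultdict(int)
    tokens = defaultdict(float)
    for rank in range(1, vocab_size + 1):
        freq = top_count / rank  # idealised Zipf: frequency proportional to 1 / rank
        cls = math.floor(0.5 - math.log2(freq / top_count))  # assumed rounding convention
        types[cls] += 1
        tokens[cls] += freq
    return types, tokens


types, tokens = zipf_class_summary(vocab_size=100_000, top_count=1_000_000)
for cls in range(11):
    print(cls, types[cls], round(tokens[cls]))
```

Under this idealisation the number of types roughly doubles from one class to the next, so it grows in proportion to 2^N. The token totals of the middle classes are nearly equal, while the topmost classes and the truncated bottom class deviate most.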

Frequencies can be considered on two linguistic levels: for a single word form (as described above) or for an entire lexeme with its various word forms. The most common word, whose frequency serves as the reference value in calculating the frequency class, should be determined on the same linguistic level. In written German the most common word form is der, and the most frequent lexeme is the definite article (with the inflected forms der, die, das, des, dem, den).
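The effect of the chosen level can be illustrated with a small sketch. Everything specific in it is hypothetical: the counts are invented toy numbers, the LEMMA table is a hand-written stand-in for a real lemmatiser, and the 0.5 offset inside the floor is an assumed rounding convention.

```python
import math
from collections import Counter

# Hypothetical mapping from word forms to lexemes; a real lemmatiser would be needed in practice.
LEMMA = {"der": "der", "die": "der", "das": "der", "des": "der", "dem": "der", "den": "der"}


def frequency_classes(counts):
    """Frequency class of every key, relative to the most frequent key on the same level."""
    top = max(counts.values())
    return {key: math.floor(0.5 - math.log2(c / top)) for key, c in counts.items()}


# Invented toy counts of word forms, not taken from any corpus.
form_counts = Counter({"der": 800, "die": 400, "und": 400, "Haus": 50})

# Aggregate word forms into lexemes before computing classes on the lexeme level.
lexeme_counts = Counter()
for form, c in form_counts.items():
    lexeme_counts[LEMMA.get(form, form)] += c

form_classes = frequency_classes(form_counts)
lexeme_classes = frequency_classes(lexeme_counts)
```

Because the article lexeme pools the counts of its forms, the reference frequency is higher on the lexeme level, and the other words in this toy example move to a higher class than on the word-form level.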
