Soundex

Soundex is a phonetic algorithm for indexing words and phrases by their sound in the English language. Words that sound alike should be encoded to the same string.

The Soundex algorithm often yields good results for the German language as well.

Soundex was developed by Robert Russell and Margaret Odell for indexing the surnames in the United States census; it was developed and patented in the USA in 1918 (U.S. Patent 1,261,167). The Soundex code for a word consists of its first letter followed by three digits that represent the consonants following that first letter. Similar sounds have the same code (B, F, P and V, for example, are all coded with the digit "1").

Basic rules

Letter codes

The vowels A, E, I, O and U and the consonants H, W and Y are ignored, except when they are the first character. As an extension for the German language, one can define that the umlauts Ä, Ö and Ü are ignored and that the "sharp S" ß is encoded like a simple S.

If several consecutive letters in the original string have the same Soundex code, that code appears only once in the result. For example, abfx becomes A120: a is kept because it is the first letter, b and f both give the same code 1, x gives 2, and a zero is appended at the end to obtain four characters.
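The rules above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation. The digit groups are assumed to be those of standard American Soundex, since the text here names only B, F, P and V for code 1. The function name soundex and the table names GROUPS and CODES are chosen for this example. The sketch collapses only directly adjacent letters with the same code, as described above; stricter variants also treat letters separated by H or W as adjacent.

```python
# Digit groups of standard American Soundex (an assumption: the text
# above only names B, F, P and V for code 1).
GROUPS = {
    "1": "BFPV",
    "2": "CGJKQSXZ",
    "3": "DT",
    "4": "L",
    "5": "MN",
    "6": "R",
}
CODES = {letter: digit for digit, letters in GROUPS.items() for letter in letters}


def soundex(word: str) -> str:
    """Return a four-character code following the rules described above."""
    # German extension: encode the sharp S like a simple S. The replacement
    # happens before upper(), because "ß".upper() would yield "SS".
    letters = [c for c in word.replace("ß", "s").upper() if c.isalpha()]
    if not letters:
        return ""

    result = letters[0]               # the first letter is kept as is
    previous = CODES.get(letters[0])  # code of the preceding letter, or None
    for letter in letters[1:]:
        # Vowels, H, W, Y and umlauts have no entry and yield None.
        code = CODES.get(letter)
        # Emit a digit only if the directly preceding letter had another code.
        if code is not None and code != previous:
            result += code
        previous = code

    # Pad with zeros and cut to one letter plus three digits.
    return (result + "000")[:4]
```

With this sketch, soundex("abfx") returns "A120", and the similar-sounding names Robert and Rupert both map to "R163".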

In the practical application of the Soundex method, two points are critical: first, it is very much geared towards the English language; second, it provides only a very rough analysis.

Nevertheless, the algorithm described here is probably the most commonly used one for phonetic search. One factor that has certainly contributed to this is that a corresponding standard PL/SQL function was implemented very early for the Oracle database.

Different variants were later developed specifically for other languages. One of them is the so-called "Cologne method" (or "Cologne phonetics") for German, which is implemented in SAP, for example, in addition to the standard Soundex method.

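Lately, the following example has established itself as a demonstration of how rough the analysis is: according to the Soundex method, the phrase "Britney Spears" and the German phrase "bewährten Superzicke" (roughly "proven super bitch") are phonetically identical. Encoded word by word under the rules above, both yield B635 S162.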
