Syllabification

Word division (in Austria: Abteilen) is the splitting of words, usually longer ones, in alphabetic scripts in order to make better use of the available space at line breaks in handwritten and typewritten text. Word division follows fixed orthographic rules.

The term syllable division (syllabification), which refers to the same thing, is problematic with respect to German, because word division there often does not coincide with the phonological or phonetic division into syllables.

General

Words are divided at the end of a line for economic reasons (a word does not fit entirely on the line) and for aesthetic reasons (the page is filled evenly). In many languages, including German, the main basis for word division is the decomposition of compound words into their components, followed by decomposition into syllables.

Another basis for word division, used in German and some other languages, is division according to etymological principles, that is, according to the original composition of the word (i.e., the original spoken syllables) in the language itself or in the language it was borrowed from. This kind of division is based on word parts that do not always coincide with syllables as phonetic units. Linguistics defines the syllable as the smallest group of sounds in the natural flow of speech. It is a phonetic unit, not a unit of meaning. This means that the division into syllables often does not match the division into meaningful units (morphemes).

Partly because of this irresolvable conflict between morphological and phonetic principles, word division in English, for example, is so complicated that it is discussed only rarely, and then only briefly, in schools in English-speaking countries. Even on the Internet there is almost no information on it other than the usual school advice to look the word up in a dictionary. In addition, there are differences between British and American conventions and rules. Because of the very weak correspondence between sounds and letters in English, it is impossible, without a drastic spelling reform, to make word division simpler, i.e. more phonetic.

Word division in German

Separator

According to current German orthography, the hyphen ("-", a quarter-em dash) is used to divide words. In earlier times the double hyphen ("⸗") was also used instead.

Principle

Orthographic word division in German is based on properties of words and of (spoken) syllables, as well as on graphic and aesthetic properties. The reformed rules are presented in the article on the new German spelling.

Occasionally one encounters divisions that distort the meaning, mostly produced when text is processed by automatic spell-checking programs, such as Fluch-torte, which reads as "curse" plus "cake" (correct: Flucht-orte, "places of refuge"), or a wrong division of Türklinke, "door latch" (correct: Tür-klinke). The cause is a false match against morphemes contained in the pre-installed dictionary. This can be remedied only by manually correcting the hyphenation exceptions in the program or by extending the installed dictionary.

Automatic hyphenation

Today's word processors usually offer automatic hyphenation in addition to a spell checker. For this they rely on built-in dictionaries that contain syllable-division data. It makes sense for hyphenation and spell checking to share the same dictionaries. In this way the vast majority of regular and special cases can be covered. The dictionaries are necessarily language-specific; they cannot be used to process text in another language.
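The dictionary approach can be illustrated with a minimal Python sketch. The dictionary contents, the entry format (break points marked with "-") and the function names are assumptions made for illustration only; real word processors use their own, much larger data formats.

```python
# Hypothetical shared dictionary: each known word is stored with its
# permitted break points marked by "-". Two entries only, for illustration.
HYPHENATION_DICT = {
    "silbentrennung": "Sil-ben-tren-nung",
    "türklinke": "Tür-klin-ke",
}


def is_known(word):
    """Spell-check side: a word is accepted if the dictionary contains it."""
    return word.lower() in HYPHENATION_DICT


def break_points(word):
    """Hyphenation side: return the character offsets at which the word may
    be divided, or None if the word is not in the dictionary."""
    entry = HYPHENATION_DICT.get(word.lower())
    if entry is None:
        return None
    points, offset = [], 0
    # Every part except the last one ends at a permitted break point.
    for part in entry.split("-")[:-1]:
        offset += len(part)
        points.append(offset)
    return points


print(break_points("Türklinke"))  # [3, 7], i.e. Tür-klin-ke
```

A word that is missing from the dictionary yields no break points at all, which is why such a program needs an exception list or a way to extend the dictionary.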

In earlier times, when data volumes as large as those of the dictionaries described above could not be handled (for reasons of memory and speed), attempts were made to hyphenate algorithmically, i.e. purely by rule-based logic. The basic approach is that the software looks at the desired break position (the end of the line), scans the text leftward to the next vowel (umlauts and Y also count as vowels), then moves one consonant further to the left and proposes a break before it. As a refinement, consonant clusters such as "ch", "sch" or (under the new spelling) "ck" (and, under the old rules, "st"), and additionally, for example, "gn" (for foreign words of Greek origin such as "magnetisch", magnetic) are counted as a single consonant.

With these very simple rules, which need little program memory and no space at all for dictionary data, programs already reach about 75-80 % correct break points for German text; the remaining ones are usually off by only one letter. The process is always run interactively, so that the user can shift a proposed break before confirming it, or refuse to divide a word altogether. This approach, too, is language-specific because of the exceptions mentioned; given the relatively small amount of data, however, it is relatively easy to support several selectable languages in one program.
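The scanning rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the exact cluster list (with "sch" as the German trigraph) and the handling of adjacent vowels are choices made here, not the behaviour of any particular historical program.

```python
VOWELS = set("aeiouyäöü")               # umlauts and Y count as vowels
CLUSTERS = ("sch", "ch", "ck", "gn")     # treated as one consonant, longest first


def propose_break(word, limit):
    """Propose a break offset for `word`, scanning leftward from `limit`
    (the desired break position). Returns None if no break is found."""
    lower = word.lower()
    for v in range(min(limit, len(word) - 1), 0, -1):
        # Look for a vowel that is directly preceded by a consonant.
        if lower[v] not in VOWELS or lower[v - 1] in VOWELS:
            continue
        start = v - 1                    # break before that single consonant
        for cluster in CLUSTERS:
            if lower.endswith(cluster, 0, v):
                start = v - len(cluster) # break before the whole cluster
                break
        if start > 0:
            return start
    return None


def show(word, limit):
    """Render the proposal with a visible hyphen, for inspection."""
    i = propose_break(word, limit)
    return word if i is None else word[:i] + "-" + word[i:]


print(show("Kirche", 5))           # Kir-che
print(show("waschen", 5))          # wa-schen
print(show("Silbentrennung", 9))   # Silbent-rennung (off by one letter)
```

The last call shows the typical failure mode: the proposal lands one letter to the right of the correct break (Silben-trennung), which is why such programs ask the user to confirm or shift each proposal.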

One element that helps both dictionary-based and algorithmic hyphenation is the setting of soft hyphens by the user. These are characters that show the software a suitable break position: if the break is needed, the character is replaced by a normal hyphen in the output; if it is not needed, it remains invisible. In this way the user can, for example, prepare foreign words or special terms that are unknown to the dictionary for correct division.
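How a soft hyphen (soft delimiter) behaves in the output can be sketched as follows. The sketch assumes the Unicode soft hyphen U+00AD as the marker and a fixed line width counted in characters; the function name and the greedy line-filling strategy are illustrative assumptions.

```python
SHY = "\u00ad"  # Unicode SOFT HYPHEN, used here as the soft delimiter


def wrap_with_soft_hyphens(text, width):
    """Greedy line filling: a soft hyphen becomes a visible "-" only where
    a line actually breaks; all unused soft hyphens are dropped."""
    lines, current = [], ""
    for word in text.split():
        while True:
            plain = word.replace(SHY, "")
            prefix = current + " " if current else ""
            if len(prefix) + len(plain) <= width:
                current = prefix + plain          # whole word fits
                break
            room = width - len(prefix) - 1        # one column for the "-"
            parts = word.split(SHY)
            head, used = "", 0
            for i, part in enumerate(parts[:-1]):
                if len(head) + len(part) > room:
                    break
                head += part
                used = i + 1
            if used:                              # break at a soft hyphen
                lines.append(prefix + head + "-")
                current = ""
                word = SHY.join(parts[used:])
            elif current:                         # start a new line, retry
                lines.append(current)
                current = ""
            else:                                 # overlong, no usable break
                current = plain
                break
    if current:
        lines.append(current)
    return lines


sample = "Die Silben\u00adtren\u00adnung hilft"
print(wrap_with_soft_hyphens(sample, 14))  # ['Die Silben-', 'trennung hilft']
print(wrap_with_soft_hyphens(sample, 40))  # ['Die Silbentrennung hilft']
```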

Regardless of the underlying approach, the software will, as a further refinement, also follow general typesetting rules, for example not splitting off very small fragments of a word, or using a hyphen that is already present in the word as the break point if it lies within a tolerance zone (whose size may be configurable). In a further refinement step, this tolerance zone is made larger for soft hyphens than for algorithmically found break points: if a soft hyphen is given, the software should prefer to snap to it rather than to algorithmically found, and possibly different, break points.
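These refinements amount to a small selection step on top of whatever produced the candidate break points. The following Python sketch is a simplification under stated assumptions: it uses a single tolerance zone for all user-given or already present hyphens, and the parameter names and default values are invented for illustration.

```python
def choose_break(word_len, limit, algorithmic, preferred,
                 zone=4, min_fragment=2):
    """Pick a break offset for a word of length `word_len`.

    limit        -- rightmost offset that still fits on the line
    algorithmic  -- offsets proposed by the dictionary or the scanning rule
    preferred    -- offsets of soft hyphens or hyphens already in the word
    zone         -- how far left of `limit` a preferred offset may lie
    min_fragment -- minimum number of letters on either side of the break
    """
    def usable(p):
        return min_fragment <= p <= limit and word_len - p >= min_fragment

    # A preferred break inside the tolerance zone wins over any other.
    in_zone = [p for p in preferred if usable(p) and limit - p <= zone]
    if in_zone:
        return max(in_zone)
    # Otherwise fall back to the rightmost usable algorithmic proposal.
    fallback = [p for p in algorithmic if usable(p)]
    return max(fallback) if fallback else None


# "Silbentrennung" (14 letters), line has room up to offset 9:
print(choose_break(14, 9, algorithmic=[3, 7], preferred=[6]))  # 6
print(choose_break(14, 9, algorithmic=[3, 7], preferred=[]))   # 7
```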

Hyphenation in non-printed texts

In texts that are not, or at least not primarily, intended for printing, word division is usually omitted. This applies to most Internet content, such as websites or e-mails. Since the rendering of such texts, and thus the most suitable place for a line break, can vary greatly depending on the device (screen width, font size, etc.), it is usually not possible to fix the hyphenation, automatically or manually, at the time the text is created. This task would have to be taken on by the software on the end device, in the case of websites, for example, by the browser.

However, since automatic hyphenation is complex and error-prone, the vast majority of rendering programs do without it and simply break the line after the last word that fits, at most treating existing hyphens as additional break points. This generally produces left-aligned text whose right edge looks more or less ragged. Justified text can counteract this, but it has the disadvantage that very large gaps between words can arise.

The absence of word division is really problematic only for excessively long words: in the extreme case, a single word that exceeds the intended line length, or even the available width of the screen, breaks the layout. Preventing this is in turn a task that the individual rendering software fulfils more or less well. If necessary, the software can cope by forcing a line break at an arbitrary point in the word, in defiance of the orthographic rules.
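Such a forced break can be demonstrated with Python's standard textwrap module, whose break_long_words option splits a word that is longer than the line width at the width limit, regardless of orthography. The wrapper function name and the sample word are chosen here for illustration.

```python
import textwrap


def force_wrap(text, width):
    """Wrap `text` to `width` columns, splitting overlong words at an
    arbitrary point (the column limit) instead of overflowing the line."""
    return textwrap.wrap(text, width=width, break_long_words=True)


word = "Donaudampfschifffahrtsgesellschaft"

# With forced breaks no line exceeds the width, but the pieces ignore
# syllable boundaries and carry no hyphen.
for line in force_wrap(word, 10):
    print(line)

# Without forced breaks the word is kept whole and overflows the width.
print(textwrap.wrap(word, width=10, break_long_words=False))
```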

A soft hyphen, with which a website author can specify possible break points, is in fact provided in the HTML standard as "&shy;". CSS also offers ways of handling the problem. However, neither mechanism is supported by all browsers, nor ignored by all search engines, and both are therefore seldom used.

"Hyphenation" in URLs

A particular problem is "word division" within long URLs. Since the hyphen is a legal and commonly used character in URLs, it is not apparent whether a hyphen standing at the end of a line belongs to the URL or was inserted as a division hyphen.

URLs should therefore not be divided with a hyphen, but wrapped without inserting a (misleading) division character. Within running text, a URL should instead be delimited by unambiguous characters that cannot be part of a URL. RFC 3986, Appendix C, recommends "double quotes" or <angle brackets>, i.e. a less-than and a greater-than sign, for this purpose (see also URL # URLs in the text). Long URLs, and with them likely line breaks (especially in e-mails), can be avoided by using short URLs.
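On the reading side, delimiters make it possible to recover a wrapped URL unambiguously. The following Python sketch assumes that URLs in the text are enclosed in angle brackets and that any whitespace inside the brackets, including line breaks, was added only for wrapping and can be dropped, as RFC 3986, Appendix C, suggests. The function name and the example.com address are placeholders.

```python
import re


def extract_delimited_urls(text):
    """Return all URLs enclosed in <...>, with embedded whitespace removed.

    A hyphen at the end of a wrapped line is kept, because inside the
    delimiters every non-whitespace character belongs to the URL."""
    candidates = re.findall(r"<([^<>]*)>", text)
    return [re.sub(r"\s+", "", candidate) for candidate in candidates]


mail = "See <http://example.com/a-long-\npath/page> for details."
print(extract_delimited_urls(mail))  # ['http://example.com/a-long-path/page']
```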

Examples:

For HTML texts there is the invisible character &shy; ("soft hyphen"), which tells the rendering browser where a word may be divided. If necessary, the browser breaks the line there and inserts a hyphen (-). For automatic hyphenation there are now both server-side tools for websites and a JavaScript bookmarklet for use in the browser.
