Arabic script in Unicode

The characters of the Arabic and Syriac scripts are encoded in seven different Unicode blocks. In addition to the individual characters, the Unicode standard also defines a set of algorithms for displaying Arabic and Syriac text correctly.

Encoded characters

The most important Arabic characters are in the Unicode block Arabic. In addition to the letters of the Arabic alphabet, whose selection and arrangement follow ISO 8859-6, the block contains digits, some punctuation marks that differ markedly from those used with the Latin alphabet, and special characters. Although a letter takes different shapes depending on its position in the word, this block contains only one character for all variants.

The Arabic alphabet is also used for other languages, which add further characters. The Persian alphabet, for example, has four additional letters. Such letters, together with characters that are no longer in use, are found in the blocks Arabic Supplement and Arabic Extended-A.

The two blocks Arabic Presentation Forms-A and Arabic Presentation Forms-B contain presentation variants and ligatures, mainly for compatibility with other standards.
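The compatibility role of these two blocks can be illustrated with a short Python sketch using the standard `unicodedata` module. The letter BEH (U+0628), its four presentation forms U+FE8F to U+FE92, and the Lam-Alif ligature U+FEFB are examples chosen here for illustration; the helper name `presentation_form_info` is hypothetical.

```python
import unicodedata

def presentation_form_info(code_point):
    """Return (character name, compatibility decomposition) for a code point."""
    ch = chr(code_point)
    return unicodedata.name(ch), unicodedata.decomposition(ch)

# The four presentation forms of BEH in Arabic Presentation Forms-B.
# Each one decomposes to the single base letter U+0628, tagged with its form.
for cp in (0xFE8F, 0xFE90, 0xFE91, 0xFE92):
    name, decomposition = presentation_form_info(cp)
    print(f"U+{cp:04X} {name}: {decomposition}")

# Compatibility normalization (NFKC) replaces a precomposed ligature
# with the plain letters it stands for.
lam_alef_ligature = "\uFEFB"
normalized = unicodedata.normalize("NFKC", lam_alef_ligature)
print([f"U+{ord(c):04X}" for c in normalized])
```

Text intended for interchange would normally store only the base letters and leave the choice of form to the rendering system.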

Finally, the Unicode block Arabic Mathematical Alphabetic Symbols contains Arabic characters for use in mathematical equations.

The letters of the Syriac alphabet are in the Unicode block Syriac. Unlike Arabic, Syriac has no characters that are encoded multiple times in different presentation forms.

In addition to these characters, the bidirectional control characters, the zero-width joiner and the zero-width non-joiner also play a role in Arabic and Syriac digital typography.

Direction

Arabic and Syriac are written from right to left; only numbers, regardless of the digits used, are written from left to right. Some punctuation marks, such as parentheses, are displayed mirrored relative to their usual form. For correct display, the Unicode standard provides the Unicode bidirectional algorithm, as it does for other right-to-left scripts.
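The character properties that feed the bidirectional algorithm can be inspected with Python's standard `unicodedata` module. The following is only a minimal sketch that looks up properties; it does not implement the algorithm itself. The function name `bidi_summary` and the sample characters are assumptions made for illustration.

```python
import unicodedata

def bidi_summary(text):
    """List (code point, bidirectional class, mirrored flag) for each character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.bidirectional(ch), bool(unicodedata.mirrored(ch)))
        for ch in text
    ]

# ARABIC LETTER ALEF, ARABIC-INDIC DIGIT ONE, DIGIT ONE,
# both parentheses, and SYRIAC LETTER ALAPH.
sample = "\u0627\u0661" + "1()" + "\u0710"

for code_point, bidi_class, mirrored in bidi_summary(sample):
    print(code_point, bidi_class, "mirrored" if mirrored else "not mirrored")
```

Arabic and Syriac letters report the class `AL` (Arabic letter), the digits report number classes (`AN` or `EN`), and the parentheses are flagged as mirrored.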

Context-dependent letterforms

Depending on its position in the word, an Arabic letter can occur in up to four different forms: as an isolated letter (as in character tables); in initial position, where it connects to the following letter on its left; at the end of a word, where it connects to the preceding letter on its right; and in the middle of a word, where it connects to both neighbors. A font must therefore provide up to four different glyphs for a single character. The following algorithm is used to select the correct glyph depending on the context:

For this purpose, each Unicode character has a Joining_Type property. This property specifies whether, and in which direction, the character connects with neighboring characters. There are six different values:

  • R for characters such as Alif or Dal, which connect only to the right
  • L for characters that connect only to the left. No Arabic character has this value; it is, however, used in the Phags-pa script.
  • D for characters such as Ba or Ta, which connect on both sides
  • C for characters such as the kashida or the zero-width joiner, which likewise cause a connection on both sides but themselves remain unchanged
  • U for characters that do not connect with their neighbors, such as all Latin characters or the zero-width non-joiner
  • T for characters such as combining characters, which are ignored when the algorithm is applied

Based on this property, a set of rules determines the form in which a character is to be displayed:

Characters of type R that are preceded by a character of type L, D or C (skipping characters of type T) are displayed in the right-joining form. Analogously, characters of type L that are followed by a character of type R, D or C (skipping characters of type T) are displayed in the left-joining form.

For characters of type D, both rules apply: if a suitable character stands on each side, the form joined on both sides is chosen; if there is a suitable character on only one side, the corresponding joined form is selected.

If none of the rules applies, the character is displayed in its non-joining form.

This algorithm is also used for the Syriac script, with additional special rules applying to the Syriac letter Alaph.

Other writing systems to which this algorithm is applied are N'Ko, Mongolian and Phags-pa.
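The joining rules described in this section can be sketched in Python as follows. This is a simplified illustration, not a complete shaping engine: Python's standard library does not expose Joining_Type, so the table `JOINING_TYPE` is a small hand-filled subset, type T is approximated by the general categories Mn and Me, and the Syriac Alaph rules are omitted. All function and variable names are hypothetical. The sketch uses the labels "final" for the right-joining form, "initial" for the left-joining form, "medial" for the form joined on both sides and "isolated" for the non-joining form.

```python
import unicodedata

# Hand-filled Joining_Type values for a few characters (illustrative subset only).
JOINING_TYPE = {
    "\u0627": "R",  # ARABIC LETTER ALEF
    "\u062F": "R",  # ARABIC LETTER DAL
    "\u0628": "D",  # ARABIC LETTER BEH
    "\u062A": "D",  # ARABIC LETTER TEH
    "\u0643": "D",  # ARABIC LETTER KAF
    "\u0644": "D",  # ARABIC LETTER LAM
    "\u0640": "C",  # ARABIC TATWEEL (kashida)
    "\u200D": "C",  # ZERO WIDTH JOINER
    "\u200C": "U",  # ZERO WIDTH NON-JOINER
}

def joining_type(ch):
    """Look up the joining type; combining marks are approximated as T."""
    if ch in JOINING_TYPE:
        return JOINING_TYPE[ch]
    if unicodedata.category(ch) in ("Mn", "Me"):
        return "T"
    return "U"

def neighbor_type(types, index, step):
    """Type of the nearest neighbor in the given direction, skipping type T."""
    j = index + step
    while 0 <= j < len(types) and types[j] == "T":
        j += step
    return types[j] if 0 <= j < len(types) else None

def contextual_forms(text):
    """Return one form label per character (None for types C, U and T)."""
    types = [joining_type(ch) for ch in text]
    forms = []
    for i, t in enumerate(types):
        if t not in ("R", "L", "D"):
            forms.append(None)
            continue
        # Joining to the right means joining with the preceding character.
        joins_prev = t in ("R", "D") and neighbor_type(types, i, -1) in ("L", "D", "C")
        # Joining to the left means joining with the following character.
        joins_next = t in ("L", "D") and neighbor_type(types, i, +1) in ("R", "D", "C")
        if joins_prev and joins_next:
            forms.append("medial")
        elif joins_prev:
            forms.append("final")
        elif joins_next:
            forms.append("initial")
        else:
            forms.append("isolated")
    return forms

# KAF, TEH, ALEF, BEH in logical (reading) order.
print(contextual_forms("\u0643\u062A\u0627\u0628"))
```

In the sample word, the final BEH stays isolated because the preceding ALEF is of type R and therefore never connects to the letter that follows it.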

Ligatures

Another special feature of Arabic and Syriac is the use of certain ligatures whose appearance differs markedly from that of the individual letters of which they are composed.

For the correct display of ligatures, the Unicode standard contains a further property, Joining_Group. It takes values that are named after the letters of the respective group. Thus Lam and the characters derived from it all have the value Lam. If such a character is followed by a letter from the group Alef (to which Alif and the characters derived from it belong), the two characters are displayed as the Lam-Alif ligature.
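The Lam-Alif rule can be sketched as a simple scan over the text. As before, Python's standard library does not provide Joining_Group, so `JOINING_GROUP` below is a hand-filled subset chosen for illustration, and the function name `lam_alef_positions` is hypothetical. For brevity the sketch looks only at directly adjacent characters and does not skip combining marks.

```python
# Hand-filled Joining_Group values for a few characters (illustrative subset only).
JOINING_GROUP = {
    "\u0644": "LAM",   # ARABIC LETTER LAM
    "\u0627": "ALEF",  # ARABIC LETTER ALEF
    "\u0622": "ALEF",  # ARABIC LETTER ALEF WITH MADDA ABOVE
    "\u0623": "ALEF",  # ARABIC LETTER ALEF WITH HAMZA ABOVE
    "\u0625": "ALEF",  # ARABIC LETTER ALEF WITH HAMZA BELOW
}

def lam_alef_positions(text):
    """Return the indices at which a Lam-group letter is followed by an Alef-group letter."""
    return [
        i
        for i in range(len(text) - 1)
        if JOINING_GROUP.get(text[i]) == "LAM"
        and JOINING_GROUP.get(text[i + 1]) == "ALEF"
    ]

# SEEN, LAM, ALEF, MEEM: the ligature candidate starts at index 1.
print(lam_alef_positions("\u0633\u0644\u0627\u0645"))
```

A rendering system would replace each pair found this way with the appropriate ligature glyph from the font.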

Other features

Some characters require special rendering, for example U+06DD, END OF AYAH. This sign encloses all digits that directly follow it. To recognize a character as a digit, computer systems can rely on the general category of the character. The same applies to the characters at code points U+0600 to U+0603, which underline numbers in general, years, footnotes and page numbers. In Syriac there is the SYRIAC ABBREVIATION MARK (U+070F), which indicates the start of an abbreviation; the abbreviation should then be marked with a line set with individual dots. The example shows the first four letters of the Syriac alphabet, of which the last three are spanned by the abbreviation mark.
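How a system might find the digits that such a sign applies to can be sketched with the general category from Python's `unicodedata` module. This is only an illustration of the lookup, not of the rendering; the function name `enclosed_digits` and the sample string are assumptions.

```python
import unicodedata

END_OF_AYAH = "\u06DD"

def enclosed_digits(text, mark_index):
    """Return the run of decimal digits (general category Nd) directly after the mark."""
    end = mark_index + 1
    while end < len(text) and unicodedata.category(text[end]) == "Nd":
        end += 1
    return text[mark_index + 1:end]

# END OF AYAH followed by three Arabic-Indic digits, a space, and one more digit.
sample = END_OF_AYAH + "\u0661\u0662\u0663" + " " + "\u0664"

digits = enclosed_digits(sample, 0)
print([unicodedata.name(ch) for ch in digits])
```

The space ends the run, so the digit after it is not covered by the sign. The sign itself and the Syriac abbreviation mark both have the general category Cf (format character).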

Sources

  • Julie D. Allen et al.: The Unicode Standard. Version 6.2 - Core Specification. The Unicode Consortium, Mountain View, CA, 2012. ISBN 978-1-936213-07-8. Chapter 8.2: Arabic, Chapter 8.3: Syriac. (online PDF)
