Alphabet (computer science)

In computer science, an alphabet is a finite set of mutually distinguishable symbols, which are also called characters or letters.

Alphabets are often referred to with the formula sign (Sigma ), rarely used as symbols as an abbreviation of the English vocabulary. The Kleene hull of the alphabet denotes the set of all words over the alphabet that can be formed by symbols. Alphabets thus represent the character inventory for words available and thus form the basis for formal languages.

In an alphabet no words occur which are formed over this alphabet. In particular, the empty word never belongs to the alphabet, as it is included in each set of words.


An alphabet is a finite set. Often it is also required that the set is not empty. The elements of an alphabet are called letters, symbols or characters. According to this definition, the alphabet is a character set, equivalent to a character set. The word character set but is often a character encoding meant. Alphabets on the other hand is independent of a coding.

According to DIN 44300, an alphabet is, however, a totally ordered finite set of distinct symbols. It is therefore more specifically to a character set, together with a total ordering.

Demarcation for natural language

In computer science, the alphabet is a generalization of the usual alphabets natural languages. For example, the alphabet of the Latin characters is also an alphabet in the sense of computer science. In theoretical computer science, however, are common alphabets whose elements are symbols that can represent more than one letter. For example, an alphabet of three elements. They can be added together in any order, about to.

Here the arbitrariness of symbols is particularly important: Which characters are used for the elements of the alphabet, is irrelevant, as long as they are distinguishable from each other. The string can thus, for example, represent a sequence of notes, but just as well as a program controller with three different commands.

In this context should also be noted that you referred to in the computer science any sequence of characters of an alphabet as a word. Also on the Latin alphabet is therefore in the computer science, type a word.


  • With the help of the alphabet all natural numbers can be formed in the decimal system. In number theory, a distinction according to the distinction of characters of the alphabet and words over this alphabet between digits and numbers.
  • The Roman numeral system is based on the alphabet Σ = {I, V, X, L, C, D, M }. However, here are the rules for how the string needs to have in order to qualify as word of the Roman number system, complex (IV instead of IIII, larger units further to the left than smaller, ...). However, they can be represented by a formal grammar.
  • For the Morse code, two different alphabets can specify that describe the communication system of Morsens at different levels: First, there is the alphabet or from which the amount of the Morse character is formed on the basis of each letter frequencies. In addition to letters and numbers, among other things (SOS ) directly Morse code, as it is gemorst without a break between the dit and dah. The characters of a message are not easy hintereinanderweg gemorst, but between each character a short pause is inserted each ( because some characters are also beginning other characters, this is necessary; see prefix ). The Morse code itself therefore consists of the characters and the pause between characters.

These examples are intended to illustrate, that can be described by optionally hierarchical pairs of alphabets and languages ​​associated with the construction of a complex communication system.