Translation memory

A translation memory (also called a translation archive; abbreviated TM) is a database of structured translations. It is the essential component of applications for computer-assisted translation (also called computer-aided translation, abbreviated CAT).

Database structure

With regard to the structure of the database, there are two basic types:

  • First, there are databases in which the stored segments are coherent texts that belong together (stored separately for source and target language). These systems have the advantage that no isolated sentences are stored; every sentence is kept in its context. In addition, the database query can be restricted to specific subject areas, which speeds up the display of hits.
  • Second, there are databases in which the segments are sentences or paragraphs stored in isolation, that is, without the context of the source text. Here the response time depends less on the size of the units than on efficient indexing in the database.
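The point about indexing can be illustrated with a minimal sketch. It assumes a hypothetical in-memory store of isolated sentence segments and a simple inverted index from words to segment ids. The names `SegmentStore`, `add`, and `candidates` are illustrative only and are not taken from any particular TM product.

```python
from collections import defaultdict


class SegmentStore:
    """Hypothetical store of isolated source/target sentence pairs."""

    def __init__(self):
        self.segments = []              # list of (source, target) pairs
        self.index = defaultdict(set)   # word -> ids of segments containing it

    def add(self, source, target):
        """Store a segment pair and index every word of the source segment."""
        seg_id = len(self.segments)
        self.segments.append((source, target))
        for word in source.lower().split():
            self.index[word].add(seg_id)
        return seg_id

    def candidates(self, query):
        """Return ids of segments sharing at least one word with the query."""
        ids = set()
        for word in query.lower().split():
            ids |= self.index.get(word, set())
        return sorted(ids)


store = SegmentStore()
store.add("Press the red button", "Drücken Sie den roten Knopf")
store.add("Close the cover", "Schließen Sie die Abdeckung")
store.add("Press the green button", "Drücken Sie den grünen Knopf")
print(store.candidates("press button"))
```

With such an index, only the segments that share a word with the query need to be compared in detail, so the lookup cost depends on the index rather than on how large the individual units are.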

Practical work

In practice, work with a translation memory begins when a source text is opened directly from the text editor or imported into a stand-alone TM program. The program then searches the memory for formulations that reach a specified minimum match and offers them as translations. The translator can accept, reject, or adapt these suggestions. If no matching segments are found, the translator enters a new translation, which can then be saved together with the source segment. From then on, it is suggested whenever similar segments occur. If the segments are annotated with additional information, the later choice among several suggestions becomes easier. Such information includes:

  • The user who stored the translation (created or modified the segment)
  • Date of creation or modification of the segment
  • Frequency with which the formulation is used
  • The context of the formulation
  • Further classifying information

This additional information is either assigned automatically by the program or must be maintained manually by the translator.
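As a rough illustration, the additional information listed above could be stored next to each segment pair and used to order competing suggestions. The record layout `TMEntry` and the ranking rule in `rank_proposals` (same context first, then frequency, then recency) are assumptions made for this sketch, not a description of how any specific TM system ranks its hits.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class TMEntry:
    """Hypothetical TM record: a segment pair plus its additional information."""
    source: str
    target: str
    user: str            # who created or last modified the segment
    modified: date       # date of creation or last modification
    use_count: int = 0   # how often this formulation has been used
    context: str = ""    # e.g. subject area or document of origin


def rank_proposals(entries, context):
    """Order proposals: matching context first, then frequent, then recent."""
    return sorted(
        entries,
        key=lambda e: (e.context == context, e.use_count, e.modified),
        reverse=True,
    )


proposals = [
    TMEntry("Save file", "Akte aufbewahren", "carl", date(2019, 6, 1), 20, "legal"),
    TMEntry("Save file", "Datei sichern", "ben", date(2021, 3, 1), 3, "software"),
    TMEntry("Save file", "Datei speichern", "anna", date(2020, 1, 5), 12, "software"),
]
for entry in rank_proposals(proposals, context="software"):
    print(entry.target, entry.user, entry.use_count)
```

In this sketch `user` is carried along for display only; a real system might also filter or weight suggestions by who stored them.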

When determining whether a similar source text exists, the software evaluates punctuation, spaces, paragraph marks, and formatting in the same way as text.

Technical properties

Usually, TM systems have features that allow a usable translation to be found regardless of variable elements such as numbers, dates, measurements, or proper names.
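One way to make matching independent of such variable elements is to replace them with placeholders before segments are compared. The sketch below assumes two deliberately simple patterns (dates written as day.month.year and plain or decimal numbers). The function name `template` and the placeholder names are hypothetical, and real systems recognise many more formats.

```python
import re

# Assumed, deliberately simple patterns for this sketch.
DATE = re.compile(r"\b\d{1,2}\.\d{1,2}\.\d{4}\b")   # e.g. 12.03.2021
NUMBER = re.compile(r"\b\d+(?:[.,]\d+)?\b")         # e.g. 4 or 2.5


def template(segment):
    """Replace dates and numbers with placeholders so only fixed text remains."""
    segment = DATE.sub("{DATE}", segment)   # dates first, so they are not split
    return NUMBER.sub("{NUM}", segment)


a = template("Tighten the 4 screws by 12.03.2021.")
b = template("Tighten the 6 screws by 01.04.2022.")
print(a)
print(a == b)
```

Two segments that differ only in their numbers or dates then reduce to the same template, so the stored translation can be reused and the variable values carried over.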

The search for similar source segments uses search algorithms of varying complexity (fuzzy search), which usually also report a percentage similarity value.
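A minimal sketch of such a fuzzy search follows. It assumes character-level edit distance (Levenshtein distance), normalised by the length of the longer segment, as the similarity measure. This is only one of several possible measures, and the function names and the default threshold of 70 percent are illustrative assumptions.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (character level)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a, b):
    """Percentage similarity; 100 means the two segments are identical."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - edit_distance(a, b) / longest)


def fuzzy_search(memory, query, min_match=70.0):
    """Return (score, source, target) hits at or above min_match, best first."""
    hits = [(similarity(src, query), src, tgt) for src, tgt in memory.items()]
    return sorted((h for h in hits if h[0] >= min_match), reverse=True)


memory = {
    "Press the red button.": "Drücken Sie den roten Knopf.",
    "Close the cover.": "Schließen Sie die Abdeckung.",
}
for score, src, tgt in fuzzy_search(memory, "Press the green button."):
    print(round(score), src, "->", tgt)
```

Here the first stored segment differs from the query by three character edits out of 23 characters, giving a score of about 87 percent, while the second falls below the minimum match and is not offered. Because the comparison is character based, punctuation and spaces count toward the score just like letters.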

To make text from word processing and desktop publishing (DTP) programs available to TM systems, there are filters and extraction programs that extract the source text from the respective files. The result is a marked-up ("tagged") file in which the translatable text sits between special control codes (tags). These layout tags are protected or hidden so that they cannot be accidentally overwritten or modified by the system. When software is translated (localization), the program code can be protected from accidental change in the same way. After translation, the filter program uses the control codes to insert the text back into its proper location in the DTP file and to apply formatting (e.g. bold, italic, ...) to the corresponding places in the translation.
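The protect-and-restore cycle described above can be sketched as follows. The sketch assumes, purely for illustration, that tags are anything enclosed in angle brackets and that protected tags are shown to the translator as numbered placeholders such as {1}. The function names `protect_tags` and `restore_tags` are hypothetical.

```python
import re

TAG = re.compile(r"<[^>]+>")  # assumed tag syntax: anything in angle brackets


def protect_tags(tagged):
    """Replace each tag with a numbered placeholder; return text and tag list."""
    tags = []

    def stash(match):
        tags.append(match.group(0))
        return "{%d}" % len(tags)

    return TAG.sub(stash, tagged), tags


def restore_tags(text, tags):
    """Put the original tags back in place of their numbered placeholders."""
    return re.sub(r"\{(\d+)\}", lambda m: tags[int(m.group(1)) - 1], text)


protected, tags = protect_tags("Press <b>Start</b> now.")
print(protected)                      # what the translator sees
translated = "Drücken Sie jetzt {1}Start{2}."
print(restore_tags(translated, tags))
```

Because the translator only moves placeholders, the tags themselves cannot be altered, and the formatting ends up around the corresponding words of the translation even when the word order changes.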

Most TM systems have special editors that make it easier to work with these "tagged" files.

When switching between different TM systems, translation memories can be exchanged via the TMX (Translation Memory eXchange) format and projects via the XML Localization Interchange File Format (XLIFF). Both are open standards that are supported by most professional vendors. Because the content of a system depends strongly on the type of segmentation, and because the TMX definition leaves room for interpretation, the exchange is typically not lossless.
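To give an idea of what such an exchange file looks like, the sketch below reads a tiny hand-written TMX document with Python's standard `xml.etree.ElementTree` module. The sample content, the header values, and the helper name `read_tmx` are made up for illustration. Flattening each segment to plain text, as done here, discards any inline markup, which is one simple way in which an exchange can lose information.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hand-written sample; all header values are placeholders for this sketch.
SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1.0" segtype="sentence"
          o-tmf="example" adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Close the cover.</seg></tuv>
      <tuv xml:lang="de"><seg>Schließen Sie die Abdeckung.</seg></tuv>
    </tu>
  </body>
</tmx>"""


def read_tmx(xml_text):
    """Return one {language: segment text} dict per translation unit."""
    root = ET.fromstring(xml_text)
    units = []
    for tu in root.iter("tu"):
        unit = {}
        for tuv in tu.findall("tuv"):
            seg = tuv.find("seg")
            # itertext() keeps the text but drops any inline markup elements.
            unit[tuv.get(XML_LANG)] = "".join(seg.itertext())
        units.append(unit)
    return units


print(read_tmx(SAMPLE_TMX))
```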
