File format

A file format defines the syntax and semantics of data within a file. It thus constitutes a two-way mapping between information and a one-dimensional binary store.

Knowledge of the file format is essential for interpreting the information stored in a file. Modern operating systems use the file format to associate files with applications that can interpret them.

Origin and significance of file formats

File formats are usually defined by software manufacturers or by a standardization body. Formats set out by a manufacturer are also called proprietary file formats. Proprietary file formats can also develop into standard formats if they are documented and adopted by others. Standard formats make it possible for software from different manufacturers to work with the same files.

Archival organizations have been working for several years on file format registries, which enable the automated detection of formats and provide information on their use.

Specifications

A specification should describe in detail how data is encoded and arranged within a file format. For many file formats the specifications are published; other specifications are treated as trade secrets; and there are also file formats that are not documented at all outside the programs that interpret them.

Detection of file formats

The format of a file must be detected before the information it contains can be interpreted. The file format can be determined automatically in three different ways:

  • Interpretation of the file contents
  • Interpretation of the file name
  • Interpretation of metadata

Often the format is not detected at all but simply assumed; it is then the user's responsibility to open only "appropriate" files with the program.

Interpretation of the file contents

To interpret the file contents, the file or parts of it are read and analyzed for known patterns. Magic numbers are often used for this purpose. The file format is recognized when the file starts with the magic number associated with that format.
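The following Python sketch illustrates detection by magic number. The signature table is a small illustrative selection, and the function names `detect_by_magic` and `detect_file` are hypothetical; a real detector would know far more formats and would also look at patterns beyond the first bytes.

```python
from typing import Optional

# Illustrative selection of well-known magic numbers (file signatures).
MAGIC_NUMBERS = {
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"GIF87a": "GIF image",
    b"GIF89a": "GIF image",
    b"%PDF-": "PDF document",
    b"PK\x03\x04": "ZIP archive",
}


def detect_by_magic(data: bytes) -> Optional[str]:
    """Return the format whose magic number the data starts with, or None."""
    for magic, name in MAGIC_NUMBERS.items():
        if data.startswith(magic):
            return name
    return None


def detect_file(path: str) -> Optional[str]:
    """Read only as many leading bytes as the longest known magic number."""
    longest = max(len(magic) for magic in MAGIC_NUMBERS)
    with open(path, "rb") as f:
        return detect_by_magic(f.read(longest))
```

A file that matches none of the known signatures yields `None`, which corresponds to the case where the format cannot be recognized from the contents alone.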

Interpretation of the file name

One commonly used method of distinguishing file formats is the interpretation of the file name. Usually only the file name extension is used for this. This method is used, for example, by the operating systems Mac OS X, CP/M, DOS, and Windows, and also by developer tools such as "make" (in that case regardless of the operating system). The last dot in the file name is treated as a separator, and the part that follows it is used as the identifier for the file format. Because older operating systems limited file name extensions to three characters, most file formats (for example ".c" or ".exe") are still identified by a code of one to three characters.
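As a minimal sketch of this rule, the following Python code splits a file name at its last dot and looks the extension up in a table. The table `EXTENSION_FORMATS` and both function names are hypothetical and chosen only for illustration; lowercasing the extension is an assumption of this sketch, not something every system does.

```python
from typing import Optional

# Illustrative mapping from extension to format name.
EXTENSION_FORMATS = {
    "c": "C source code",
    "exe": "Windows executable",
    "jpg": "JPEG image",
}


def extension_of(filename: str) -> str:
    """Return the part after the last dot (lowercased), or "" if there is no dot."""
    _, dot, ext = filename.rpartition(".")
    return ext.lower() if dot else ""


def format_from_name(filename: str) -> Optional[str]:
    """Look up the format by extension; None if the extension is unknown."""
    return EXTENSION_FORMATS.get(extension_of(filename))
```

Only the text after the last dot counts, so `format_from_name("archive.jpg.exe")` is decided by `exe`, not by `jpg`.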

Because untrained users can cause problems by changing the file name extension (the file is then assigned to no application or to the wrong one), Microsoft, for example, decided to hide the file extension by default in more recent versions of Windows. This has led to new problems, such as viruses being given a "double extension": an executable file "kournikova.jpg.exe" is displayed as "kournikova.jpg" and so appears to be an image file.
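The effect of hiding the extension can be modeled in a few lines of Python. This is a simplified model, not the actual behavior of any particular Windows version; the function names and the list of "harmless" extensions are assumptions made for the example.

```python
def displayed_name(filename: str) -> str:
    """Name as shown when the final extension is hidden (simplified model)."""
    stem, dot, _ = filename.rpartition(".")
    # Keep the full name if there is no dot or nothing in front of it.
    return stem if dot and stem else filename


def has_double_extension(filename: str, harmless=("jpg", "txt", "pdf")) -> bool:
    """True if the displayed name itself ends in a harmless-looking extension."""
    shown = displayed_name(filename)
    if shown == filename:
        return False
    _, dot, inner = shown.rpartition(".")
    return bool(dot) and inner.lower() in harmless
```

With these definitions, `displayed_name("kournikova.jpg.exe")` gives `"kournikova.jpg"`, and `has_double_extension` flags the file because the visible name suggests an image while the hidden extension determines how the file is actually handled.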

Interpretation of metadata

The only reliable method of determining the file format is to store or transmit metadata along with the file that defines the file format exactly. On the Internet, such metadata is transmitted in the form of MIME types. Some operating systems store such metadata in the file system.
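As an illustration of transmitted metadata, the sketch below reads a MIME type from the value of a `Content-Type` header using Python's standard `email.message` module. The helper name `parse_content_type` is hypothetical; how the header value is obtained (for example from an HTTP response or an e-mail) is outside the scope of the example.

```python
from email.message import Message
from typing import Optional, Tuple


def parse_content_type(header_value: str) -> Tuple[str, Optional[str]]:
    """Split a Content-Type header value into MIME type and charset parameter."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_type() returns "maintype/subtype" in lowercase;
    # get_content_charset() returns the charset parameter or None.
    return msg.get_content_type(), msg.get_content_charset()
```

Here the receiver does not need to inspect the contents or the file name at all: the sender states the format explicitly, for example `text/html` or `image/png`.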

Possible classifications

File formats can be classified according to many criteria. Common criteria include, among others:

  • Text versus binary
  • Data versus executable application
  • By type of content: text, image, sound, video formats
  • Open versus proprietary
  • Widespread versus rare

Proprietary formats

Copyrighted (proprietary) file formats sometimes create a dependency on the respective software manufacturer (and on the platforms it supports), especially when

  • The internal structure is also protected by software patents;
  • The format is not disclosed to the public, in order to protect the company's intellectual property and economic interests (customer loyalty).

In these cases, no third-party or open-source programs can be developed for the format.

This results in risks such as insolvency of the manufacturer, discontinuation of the product's development (at least for the chosen platform), or increases in license fees (see, e.g., the GIF patent fees) or prices.

Sometimes, however, proprietary or patented formats are used by third parties in return for license payments and thereby achieve a distribution that provides sufficient independence from a single provider (for example the binary image format GIF, although the patents on it expired in 10/2006).

Proprietary binary formats are therefore of only limited suitability for archiving data, unless the format is in common use. In addition, older documents that are to remain readable sometimes have to be converted to the new version of the format when the software is updated. This also happens in the development of free formats, but there the format of the old version is at least disclosed in principle.
