Comma-separated values


CSV stands for comma-separated values (more rarely character-separated values, since the delimiter does not necessarily have to be a comma). The format describes the structure of a text file for storing or exchanging simply structured data. The file name extension is .csv.

There is no general standard for the CSV file format, but its basics are described in RFC 4180. The character encoding to be used is likewise not specified; 7-bit ASCII is widely regarded as the lowest common denominator.

CSV files can represent tables or a list of lists of different lengths. More complicated data structures, for example nested ones, can be stored by means of additional rules or in linked CSV files. For storing such structures in a single file, however, other formats such as XML or EDI are better suited.

File structure

Within the text file, some characters have a special function for structuring the data.

  • A character is used to separate records. This is usually the line break of the operating system that generates the file; on Windows, this is in practice often two characters (carriage return and line feed).
  • A character is used to separate data fields (columns) within the records. In general, the comma is used for this. Depending on the software involved and the user settings, semicolons, colons, tabs, spaces or other characters are also common.
  • To allow special characters within the data (e.g. a comma in decimal values), a field delimiter (also: text delimiter) is used. Normally this field delimiter is the quotation mark ("). If the field delimiter itself occurs in the data, it is doubled within the data field (see escape character).
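The three structuring characters can be tried out with Python's standard csv module. The following is only a sketch: the sample rows are invented for illustration, and the comma, the quotation mark and the CR LF line break are chosen here as one common combination, not as the only valid one.

```python
import csv
import io

# Hypothetical sample rows: one field contains the separator (a decimal comma),
# another contains the field delimiter (a quotation mark).
rows = [
    ["name", "price", "note"],
    ["Widget", "3,50", 'the "large" model'],
]

buffer = io.StringIO()
writer = csv.writer(buffer, delimiter=",", quotechar='"', lineterminator="\r\n")
writer.writerows(rows)
text = buffer.getvalue()

# Fields containing the separator or the quotation mark are enclosed in
# quotation marks; quotation marks inside such a field are doubled.
print(text)

# Reading the text back restores the original field values.
parsed = list(csv.reader(io.StringIO(text, newline=""), delimiter=",", quotechar='"'))
print(parsed == rows)
```

Fields that contain none of the special characters are written without quotation marks, which is the default behaviour of the writer used here.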

The first record may be a header record that defines the column names.

According to RFC 4180, section 2, item 4, each record should contain the same number of columns, but this is not always observed.
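Because the rule on equal column counts is not always observed, a reader may want to check it before processing a file. A minimal sketch, assuming the first record sets the expected width; the function name inconsistent_records and the sample data are hypothetical:

```python
import csv
import io

def inconsistent_records(text, delimiter=","):
    """Return (record_number, field_count) for every record whose
    field count differs from that of the first record."""
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
    expected = None
    problems = []
    for number, record in enumerate(reader, start=1):
        if expected is None:
            expected = len(record)      # the first record sets the width
        elif len(record) != expected:
            problems.append((number, len(record)))
    return problems

# Hypothetical sample: record 3 is too short, record 4 is too long.
sample = "id,name,city\n1,Ada,London\n2,Linus\n3,Grace,Arlington,extra\n"
print(inconsistent_records(sample))
```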

Formatting of data fields

The formatting of the data itself is not fixed. This means that the formats used must be agreed between the participating users. Particularly affected are:

  • Date and time: The order of the individual components (year, month, day, hour, minute, second, ...) cannot always be clearly recognized.
  • A further complication is that the separators used, particularly in dates, differ from country to country.
  • The most harmless hurdle here is that numerical values may appear with or without a leading zero.

Examples: Is 3.4.02 March 4, 1902, April 3, 2002, March 2, 2004, or a completely different value? Does 8:09 mean "nine minutes past eight in the morning", "20:09" (8:09 in the evening), or "a duration of 8 minutes and 9 seconds"?
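The ambiguity can be made concrete with Python's datetime module. This sketch parses the same string under two assumed conventions; the labels in the dictionary are chosen for illustration only, and further readings are possible.

```python
from datetime import datetime

raw = "3.4.02"

# Two assumed conventions for the same text. Python's %y directive maps
# two-digit years 00-68 to 2000-2068, so a reading in the year 1902
# cannot be produced this way.
candidates = {
    "day.month.year": "%d.%m.%y",
    "month.day.year": "%m.%d.%y",
}

interpretations = {
    label: datetime.strptime(raw, fmt).date().isoformat()
    for label, fmt in candidates.items()
}
print(interpretations)
```

Both calls succeed without error, which is the core of the problem: nothing in the data itself reveals which convention the author intended.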

  • Texts: Unlike XML, CSV offers no way to indicate the character set used within the file. The character encoding should be agreed between all parties in advance.
  • According to the original specification of the CSV data format, number fields can be used with a fixed minimum width. Numerical values are then padded with zeros to reach the minimum width.
  • Different decimal and thousands separators have become established in different countries. Across borders, these characters may even be used with opposite meanings.
  • Sometimes no thousands separator is used.
  • The variety of formats for currency amounts is vast.
  • The field content "" is sometimes interpreted as empty content and sometimes as a single quotation mark.
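Since a CSV file does not say which separators its numbers use, that convention has to be passed to the reader explicitly. A minimal sketch, assuming both parties have agreed on the separators; parse_number is a hypothetical helper, not part of any library:

```python
from decimal import Decimal

def parse_number(text, decimal_sep, thousands_sep=""):
    """Convert a number written with the agreed separators to a Decimal."""
    cleaned = text.strip()
    if thousands_sep:
        cleaned = cleaned.replace(thousands_sep, "")   # drop grouping characters
    if decimal_sep != ".":
        cleaned = cleaned.replace(decimal_sep, ".")    # normalise the decimal mark
    return Decimal(cleaned)

german = parse_number("1.234,56", decimal_sep=",", thousands_sep=".")
english = parse_number("1,234.56", decimal_sep=".", thousands_sep=",")
padded = parse_number("000042", decimal_sep=".")       # zero-padded fixed-width field
print(german, english, padded)
```

The thousands separator is removed before the decimal mark is normalised; doing it in the opposite order would corrupt values such as 1.234,56.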

Special features of the import

CSV files are not always interpreted in the same way by spreadsheet programs:

  • Microsoft Excel, when the file is opened by importing from a text file: the column width is adjusted to the content.
  • The separator can be selected in the import dialog.
  • All columns have the same width.
  • The delimiter is a semicolon if the CSV file is stored in ANSI encoding.
  • The separator can also be a comma. This depends on the region and language settings of Windows. The delimiter can be specified explicitly with 'sep=' in the first line of the .csv file, e.g. 'sep=,' for the comma or 'sep=;' for the semicolon.
  • Similar to Microsoft Excel.
  • The import dialog is also invoked when data is inserted via the clipboard.
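The 'sep=' line is a convention of the spreadsheet program and not part of RFC 4180, so other readers generally treat it as ordinary data. A program that wants to honour it has to handle it itself. The following sketch shows one way to do so; read_with_sep_hint is a hypothetical helper, and falling back to the comma is an assumption.

```python
import csv
import io

def read_with_sep_hint(text, default_delimiter=","):
    """Parse CSV text, honouring an optional leading 'sep=<char>' line."""
    lines = text.splitlines(keepends=True)
    delimiter = default_delimiter
    if lines and lines[0].lower().startswith("sep="):
        hint = lines[0].rstrip("\r\n")[4:]
        if len(hint) == 1:          # only a single-character hint is accepted
            delimiter = hint
        lines = lines[1:]           # the hint line is not part of the data
    data = io.StringIO("".join(lines), newline="")
    return list(csv.reader(data, delimiter=delimiter))

with_hint = read_with_sep_hint("sep=;\na;b\n1;2\n")
without_hint = read_with_sep_hint("a,b\n1,2\n")
print(with_hint, without_hint)
```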

Calculations

The CSV format describes related data records line by line. Calculations are not provided for, yet many programs such as LibreOffice Calc, OpenOffice.org Calc, Excel and Gnumeric accept calculation expressions. Which expressions are accepted depends on the specific program. In the programs mentioned, for example, the following (first) line works:

100,200,=A1+B1

Named functions can also be used, depending on the locale.

Software

  • CSV files can be edited with any text editor or with a special program.
  • Spreadsheet programs such as LibreOffice Calc, OpenOffice.org Calc or Microsoft Excel and database systems such as Oracle or MySQL can usually import and export CSV files. As a rule, settings such as the encoding, the delimiter, an optional text delimiter and column headings in the first row can be adjusted.
  • To compare two CSV files, csvdiff can be used.

Applications

  • The CSV file format is often used to exchange data, such as database tables, between different computer programs.
  • The password file /etc/passwd of the UNIX user management is a CSV file with the delimiter ":".
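Reading such a colon-delimited file needs only a different delimiter. In the sketch below, the two sample entries are invented and the field names follow the usual seven-field layout of the password file; quoting is switched off because that file does not use a text delimiter.

```python
import csv
import io

# Hypothetical sample in the layout of /etc/passwd (not read from a real system).
sample_passwd = (
    "root:x:0:0:root:/root:/bin/bash\n"
    "alice:x:1000:1000:Alice Example:/home/alice:/bin/sh\n"
)

fields = ["name", "password", "uid", "gid", "gecos", "home", "shell"]
reader = csv.DictReader(
    io.StringIO(sample_passwd),
    fieldnames=fields,          # the file has no header record
    delimiter=":",
    quoting=csv.QUOTE_NONE,     # quotation marks have no special meaning here
)
users = {row["name"]: row["shell"] for row in reader}
print(users)
```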

Example

The following source text of a CSV file, with the semicolon (;) as the separator and with column headings in the first row:

Hour;Monday;Tuesday;Wednesday;Thursday;Friday
1;Math;German;English;Math;Art
2;Sport;French;History;Sport;History
3;Sport;"Religion ev;cath";Art;;Art

represents the following table:

Hour  Monday  Tuesday           Wednesday  Thursday  Friday
1     Math    German            English    Math      Art
2     Sport   French            History    Sport     History
3     Sport   Religion ev;cath  Art                  Art

In this example, the quotation marks are used to mark the semicolon between ev and cath in the last line as text. The third element of this line is therefore Religion ev;cath.
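The example can be checked with Python's csv module. The data is repeated in the code in a cleaned-up form so that the sketch is self-contained; the variable names are chosen for illustration.

```python
import csv
import io

timetable_csv = (
    "Hour;Monday;Tuesday;Wednesday;Thursday;Friday\n"
    "1;Math;German;English;Math;Art\n"
    "2;Sport;French;History;Sport;History\n"
    '3;Sport;"Religion ev;cath";Art;;Art\n'
)

# The first record supplies the column names; the semicolon is the separator.
rows = list(csv.DictReader(io.StringIO(timetable_csv), delimiter=";"))
third = rows[2]

print(third["Tuesday"])          # the quoted semicolon stays part of the field
print(repr(third["Thursday"]))   # two adjacent separators yield an empty field
```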
