Comma-separated values

From Wikipedia, the free encyclopedia

Comma separated list
Comma-separated values
Filename extension: .csv or .txt
Internet media type: text/csv; text/comma-separated-values (deprecated)
Type of format: multiplatform, serial data streams
Container for: database information organized as field separated lists
Standard(s): RFC 4180

A comma-separated values (CSV) file stores tabular data as plain text. Each line in the CSV file corresponds to a row in the table. Within a line, fields are separated by commas, each field belonging to one table column. Because the format is common and simple, CSV files are often used for moving tabular data between two different computer programs, for example between a database program and a spreadsheet program.

Technical background

A file format is a particular way to encode information for storage in a computer file. Particularly, files encoded using the CSV format are used to store tabular data. The format dates back to the early days of business computing and is widely used to pass data between computers with different internal word sizes, data formatting needs, and so forth. For this reason, CSV files are common on all computer platforms.

CSV is one implementation of a delimited text file, which uses a comma to separate values (although many CSV import/export tools allow an alternate separator to be used). However, CSV differs from other delimiter-separated file formats in using a " (double quote) character around fields that contain reserved characters (such as commas or newlines). Most other delimited formats either use an escape character such as a backslash, or have no support for reserved characters. The benefit of CSV is that it allows data to be transferred between different applications.

In computer science terms, this type of format is called a "flat file" because only one table can be stored in a CSV file. Most systems use a series of tables to store their information, which must be "flattened" into a single table, often with information repeated over several rows, to create a text file.
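The flattening step described above can be sketched as follows. This is only an illustration: the two tables, their column names, and the flatten helper are hypothetical and do not come from any particular system.

```python
import csv
import io

# Hypothetical data: two related tables, as a database might store them.
customers = {1: "Alice", 2: "Bob"}
orders = [
    (101, 1, "Widget"),
    (102, 1, "Gadget"),
    (103, 2, "Widget"),
]

def flatten(customers, orders):
    """Join the two tables into one list of rows, repeating customer names."""
    return [
        [order_id, customers[customer_id], item]
        for order_id, customer_id, item in orders
    ]

flat_rows = flatten(customers, orders)

# Write the single flattened table as CSV text.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["OrderId", "Customer", "Item"])
writer.writerows(flat_rows)
flat_csv = buffer.getvalue()
```

The customer name "Alice" appears on two rows of the output, which is the repetition that flattening typically introduces.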

History

Comma-separated value lists are very old technology and predate personal computers by more than a decade; the IBM Fortran (level G) compiler under OS/360 supported these in 1967, and they were not a new idea even then. Comma-separated value lists were often easier to type into punched cards than fixed-column-aligned data, and were less prone to producing incorrect results if a value was punched one-column-off from its intended location.

The comma separated list (CSL) is a data format originally known as comma-separated values (CSV) in the oldest days of simple computers. In the personal computer industry (personal computers then being more commonly known as "home computers"), the most common early use was by small businesses generating solicitations from boilerplate form letters and mailing lists.

Some early software applications, such as word processors, allowed a stream of "variable data" to be merged between two files: a form letter, and a CSL of names, addresses, and other data fields. Many still do, simply because tasks requiring human input (such as constructing lists) are natural and easy with comma delimiters. CSL/CSVs were also used to exchange data between desktop computers of different architectures, and for simple database uses.

Specification

Background

Comma separated lists date from before the earliest personal computers, but were widely used in the earliest pre-IBM PC era personal computers for tape storage backup and interchange of database information from machines of two different architectures. In that day, affordable hard drives did not exist, and many small businesses tried to achieve the benefits of computing using floppy disk based software. [1]

No general standard specification for CSV exists. Variations between CSV implementations in different programs are quite common and can lead to interoperation difficulties. For Internet communication of CSV files, an Informational IETF document (RFC 4180 from October 2005) describes the format for the "text/csv" MIME type registered with the IANA. Another relevant specification is provided by Fielded Text which also covers the CSV format.

Many informal documents exist that describe the CSV format. How To: The Comma Separated Value (CSV) File Format provides an overview of the CSV format in the most widely used applications and explains how it can best be used and supported.

Basic Rules

The basic rules from a lot of these specifications are as follows:

CSV is a delimited data format that has fields/columns separated by the comma character and records/rows separated by newlines. Fields that contain a special character (comma, newline, or double quote) must be enclosed in double quotes. If a line contains a single entry which is the empty string, it may be enclosed in double quotes. If a field's value contains a double quote character, it is escaped by placing another double quote character next to it. The CSV file format does not require a specific character encoding, byte order, or line terminator format.

  • Each record is one line terminated by a line feed (ASCII/LF=0x0A) or a carriage return and line feed pair (ASCII/CRLF=0x0D 0x0A); however, line breaks can be embedded within quoted fields.
  • Fields are separated by commas (although in locales where the comma is used as a decimal point, the semicolon is used as the delimiter instead, which causes problems when CSV files are exchanged, e.g. between France and the USA).
1997,Ford,E350
  • In some CSV implementations, leading and trailing spaces or tabs adjacent to commas are trimmed. This practice is contentious and is specifically prohibited by RFC 4180, which states, "Spaces are considered part of a field and should not be ignored."
1997,   Ford   , E350
is treated by trimming implementations the same as
1997,Ford,E350
  • Fields with embedded commas must be enclosed within double-quote characters.
1997,Ford,E350,"Super, luxurious truck"
  • Fields with embedded double-quote characters must be enclosed within double-quote characters, and each of the embedded double-quote characters must be represented by a pair of double-quote characters.
1997,Ford,E350,"Super ""luxurious"" truck"
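  • Fields with both embedded double-quote characters and commas follow the same two rules: the field is enclosed within double-quote characters, and each embedded double-quote character is doubled. In the following example the field value itself begins and ends with a double quote, so three consecutive double-quote characters appear at each end.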
1997,Ford,E350,"""Super, luxurious truck"""
  • Fields with embedded line breaks must be enclosed within double-quote characters.
1997,Ford,E350,"Go get one now
they are going fast"
  • In CSV implementations that trim leading or trailing spaces, fields with such spaces must be enclosed within double-quote characters. (See comment about leading and trailing spaces above.)
1997,Ford,E350,"  Super luxurious truck    "
  • Fields may always be enclosed within double-quote characters, whether necessary or not.
"1997","Ford","E350"
  • The first record in a CSV file may contain column names in each of the fields.
Year,Make,Model
1997,Ford,E350
2000,Mercury,Cougar
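The quoting rules listed above can be sketched as a small writer. This is a minimal illustration, not a complete CSV implementation; quote_field and format_record are hypothetical helper names, and the choice to quote fields with leading or trailing spaces is an assumption made so that trimming readers keep them.

```python
def quote_field(value: str) -> str:
    """Quote a field only when the rules above require it."""
    # Commas, double quotes, and line breaks are the reserved characters.
    needs_quotes = any(ch in value for ch in ',"\r\n')
    # Assumption: also quote leading/trailing spaces or tabs so that
    # implementations which trim whitespace still preserve them.
    if value != value.strip(" \t"):
        needs_quotes = True
    if not needs_quotes:
        return value
    # Each embedded double quote is represented by a pair of double quotes.
    return '"' + value.replace('"', '""') + '"'

def format_record(fields) -> str:
    """Join already-stringified fields into one CSV record (no terminator)."""
    return ",".join(quote_field(field) for field in fields)

record = format_record(["1997", "Ford", "E350", 'Super "luxurious" truck'])
```

With the inputs used in the list above, this sketch reproduces the same lines, for example the doubled quotes around luxurious.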

Example

1997 | Ford  | E350                                   | ac, abs, moon          | 3000.00
1999 | Chevy | Venture "Extended Edition"             |                        | 4900.00
1999 | Chevy | Venture "Extended Edition, Very Large" |                        | 5000.00
1996 | Jeep  | Grand Cherokee                         | MUST SELL!             | 4799.00
     |       |                                        | air, moon roof, loaded |

The above table of data may be represented in CSV format as follows:

1997,Ford,E350,"ac, abs, moon",3000.00
1999,Chevy,"Venture ""Extended Edition""","",4900.00
1999,Chevy,"Venture ""Extended Edition, Very Large""","",5000.00
1996,Jeep,Grand Cherokee,"MUST SELL!
air, moon roof, loaded",4799.00

This CSV example illustrates that:

  • fields that contain commas, double quotes, or line breaks must be quoted.
  • a quote within a field must be escaped by doubling it, that is, by placing an additional quote immediately before the literal quote.
  • a field that contains both a quote and a comma needs no further escaping: the field is quoted and each literal quote is doubled.
  • spaces before and after delimiter commas must not be trimmed under RFC 4180.
  • a line break within an element must be preserved.
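As a rough check of these points, the example can be read back with a quote-aware parser. The sketch below uses the csv module from the Python standard library; the variable names are illustrative only.

```python
import csv
import io

# The CSV example from above, with the embedded line break written as \n.
csv_text = (
    '1997,Ford,E350,"ac, abs, moon",3000.00\n'
    '1999,Chevy,"Venture ""Extended Edition""","",4900.00\n'
    '1999,Chevy,"Venture ""Extended Edition, Very Large""","",5000.00\n'
    '1996,Jeep,Grand Cherokee,"MUST SELL!\n'
    'air, moon roof, loaded",4799.00\n'
)

# csv.reader removes the enclosing quotes and un-doubles embedded quotes.
rows = list(csv.reader(io.StringIO(csv_text)))

record_count = len(rows)       # five physical lines, four records
model_with_quotes = rows[1][2]
multi_line_cell = rows[3][3]
```

Every record comes back with five fields, and the embedded comma, doubled quotes, and line break are restored to their literal values.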

Line break handling within CSV files

Many applications will not handle a line break within a cell as in the example above. Such applications may interpret the line break as a record delimiter and begin a new row. In this case, the layout of the CSV file will be disrupted or broken.[2]
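The failure mode can be reproduced with a naive reader that splits on every line break, compared with a quote-aware one. This is a sketch using Python's standard csv module; the naive approach is a deliberately simplified stand-in for such applications, not the behaviour of any specific product.

```python
import csv
import io

# One record whose fourth field contains an embedded line break.
record = '1996,Jeep,Grand Cherokee,"MUST SELL!\nair, moon roof, loaded",4799.00\n'

# Naive approach: every line break ends a record, every comma ends a field.
naive_rows = [line.split(",") for line in record.splitlines()]

# Quote-aware approach: line breaks inside double quotes stay in the field.
proper_rows = list(csv.reader(io.StringIO(record)))
```

The naive reader produces two malformed rows, while the quote-aware reader returns a single row of five fields.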

Application support

The CSV file format is very simple and supported by almost all spreadsheets and database management systems. Many programming languages have libraries available that support CSV files. Even modern software applications support CSV imports and/or exports because the format is so widely recognized. In fact, many applications allow .csv-named files to use any delimiter character.

References

  1. ^ Peachtree and Condor, to name two.
  2. ^ [1]

A comma-separated values or character-separated values (CSV) file is a simple text format for a database table. Each record in the table is one line of the text file. Each field value of a record is separated from the next by a character (typically a comma but some European countries use a semi-colon as a value separator instead of a comma). Implementations of CSV can often handle field values with embedded line breaks or separator characters by using quotation marks or escape sequences. CSV is a simple file format that is widely supported, so it is often used to move tabular data between different computer programs that support the format. For example, a CSV file might be used to transfer information from a database program to a spreadsheet.

Example of a USA/UK CSV file (where the decimal separator is a period/full stop and the value separator is a comma):

Year,Make,Model,Length
1997,Ford,E350,2.34
2000,Mercury,Cougar,2.38

Example of a German CSV file (where the decimal separator is a comma and the value separator is a semicolon):

Year;Make;Model;Length
1997;Ford;E350;2,34
2000;Mercury;Cougar;2,38
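A reader has to be told about both conventions to interpret the German file correctly. The sketch below uses Python's standard csv module with a semicolon delimiter and converts the decimal commas by simple string replacement; that conversion is an assumption that holds only when the numbers contain no thousands separators.

```python
import csv
import io

german_csv = (
    "Year;Make;Model;Length\n"
    "1997;Ford;E350;2,34\n"
    "2000;Mercury;Cougar;2,38\n"
)

reader = csv.reader(io.StringIO(german_csv), delimiter=";")
header = next(reader)  # first record holds the column names
length_index = header.index("Length")

# Replace the decimal comma with a period before converting to float.
lengths = [float(row[length_index].replace(",", ".")) for row in reader]
```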

Technical background

A file format is a particular way to encode information for storage in a computer file. Particularly, files encoded using the CSV format are used to store tabular data. The format dates back to the early days of business computing and is widely used to pass data between computers with different internal word sizes, data formatting needs, and so forth. For this reason, CSV files are common on all computer platforms.

CSV is a delimited text file that uses a comma to separate values (many implementations of CSV import/export tools allow other separators to be used). Simple CSV implementations will not allow field values that contain a comma or other special characters such as newlines. More sophisticated CSV implementations permit commas and other special characters in a field value. Many implementations use " (double quote) characters around values that contain reserved characters (such as commas, double quotes, or newlines); embedded double quote characters may be represented by a pair of consecutive double quotes. (Creativyst 2010) Some CSV implementations may use an escape character such as a backslash to encode reserved characters as an escape sequence.
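The escape-character variant mentioned above can be sketched with Python's standard csv module by turning quoting off and supplying a backslash as the escape character. This shows one possible dialect, not a standard one, and both sides of an exchange would have to agree on it.

```python
import csv
import io

rows = [["1997", "Ford", "E350", "Super, luxurious truck"]]

# Write without any quoting; the embedded comma is escaped as \, instead.
buffer = io.StringIO()
writer = csv.writer(
    buffer, quoting=csv.QUOTE_NONE, escapechar="\\", lineterminator="\n"
)
writer.writerows(rows)
escaped_text = buffer.getvalue()

# Read it back with the same settings to recover the original values.
parsed = list(
    csv.reader(io.StringIO(escaped_text), quoting=csv.QUOTE_NONE, escapechar="\\")
)
```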

In computer science terms, a CSV file is a "flat file".

History

Comma-separated values are old technology and predate personal computers by more than a decade: the IBM Fortran (level G) compiler under OS/360 supported them in 1967. Comma-separated value lists were often easier to type into punched cards than fixed-column-aligned data, and were less prone to producing incorrect results if a value was punched one column off from its intended location.

The comma separated list (CSL) is a data format originally known as comma-separated values (CSV) in the oldest days of simple computers. In the industry of personal computers (then more commonly known as "Home Computers"), the most common use was small businesses generating solicitations using boilerplate form letters and mailing lists.

Some early software applications, such as word processors, allowed a stream of "variable data" to be merged between two files: a form letter, and a CSL of names, addresses, and other data fields. Many applications still do, simply because tasks requiring human input (such as constructing a list) are natural and easy using comma delimiters. CSL/CSVs were also used for simple databases.

Specification

Background

Comma separated lists date from before the earliest personal computers, but were widely used in the earliest pre-IBM PC era personal computers for tape storage backup and interchange of database information from machines of two different architectures. In that day, affordable hard drives did not exist, and many small businesses tried to achieve the benefits of computing using floppy disk based software.[citation needed]

No general standard specification for CSV exists. Variations between CSV implementations in different programs are quite common and can lead to interoperation difficulties. For Internet communication of CSV files, an Informational IETF document (RFC 4180 from October 2005) describes the format for the "text/csv" MIME type registered with the IANA. (Shafranovich 2005) Another relevant specification is provided by Fielded Text which also covers the CSV format.

Many informal documents exist that describe the CSV format. Creativyst (2010) provides an overview of the CSV format in the most widely used applications and explains how it can best be used and supported.

Basic rules

The basic rules from a lot of these specifications are as follows:

CSV is a delimited data format that has fields/columns separated by the comma character and records/rows separated by newlines. Fields that contain a special character (comma, newline, or double quote), must be enclosed in double quotes. If a line contains a single entry which is the empty string, it may be enclosed in double quotes. If a field's value contains a double quote character it is escaped by placing another double quote character next to it. The CSV file format does not require a specific character encoding, byte order, or line terminator format.

Note: Although binary data is not prohibited, it is especially problematic to incorporate, because reserved CSV characters (comma, newline, double quote) are often present in binary data and are not typically 'escaped' or otherwise correctly preprocessed. By tradition, CSV file data is human-readable text, so binary numbers are converted to ASCII strings before being written to the file. Example: binary (as hexadecimal) 0x3FFF (two bytes, one of value 63 followed by another of value 255) would be represented in ASCII as 16383.
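The conversion in the example can be checked with a few lines of Python. The sketch assumes the two bytes are stored most significant byte first (big-endian), as the order given in the note suggests.

```python
# The two bytes from the note: value 63 (0x3F) followed by value 255 (0xFF).
raw = bytes([0x3F, 0xFF])

# Assumption: big-endian byte order, i.e. 0x3F is the high byte.
value = int.from_bytes(raw, byteorder="big")

# The CSV field holds the decimal digits as text, not the raw bytes.
field = str(value)
```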

  • Each record is one line terminated by a line feed (ASCII/LF=0x0A) or a carriage return and line feed pair (ASCII/CRLF=0x0D 0x0A); however, line breaks can be embedded within quoted fields.
  • Fields are separated by commas (although in locales where the comma is used as a decimal separator, the semicolon is used as the delimiter instead, which causes problems when CSV files are exchanged, e.g. between France and the USA).
1997,Ford,E350
  • In some CSV implementations, leading and trailing spaces or tabs, adjacent to commas, are trimmed. This practice is contentious and in fact is specifically prohibited by RFC 4180, which states, "Spaces are considered part of a field and should not be ignored."
1997, Ford , E350
is not the same as
1997,Ford,E350
  • Fields with embedded commas must be enclosed within double-quote characters.
1997,Ford,E350,"Super, luxurious truck"
  • Fields with embedded double-quote characters must be enclosed within double-quote characters, and each of the embedded double-quote characters must be represented by a pair of double-quote characters.
1997,Ford,E350,"Super ""luxurious"" truck"
  • Fields with embedded line breaks must be enclosed within double-quote characters.
1997,Ford,E350,"Go get one now
they are going fast"
  • In CSV implementations that trim leading or trailing spaces, fields with such spaces must be enclosed within double-quote characters. (See comment about leading and trailing spaces above.)
1997,Ford,E350," Super luxurious truck "
  • Fields may always be enclosed within double-quote characters, whether necessary or not.
"1997","Ford","E350"
  • The first record in a CSV file may contain column names in each of the fields.
Year,Make,Model
1997,Ford,E350
2000,Mercury,Cougar
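The rule about spaces in the list above can be observed with Python's standard csv module, which keeps spaces by default and offers an optional skipinitialspace setting. The explicit strip() pass is shown only to mimic implementations that trim both sides.

```python
import csv
import io

line = "1997, Ford , E350\n"

# Default behaviour: spaces are part of the field, as RFC 4180 states.
strict = next(csv.reader(io.StringIO(line)))

# skipinitialspace drops only the spaces that directly follow a delimiter.
lenient = next(csv.reader(io.StringIO(line), skipinitialspace=True))

# What a fully trimming implementation would produce.
trimmed = [field.strip() for field in strict]
```

The three results differ, which is why files that rely on trimming may not round-trip between applications.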

Example

Year | Make  | Model                                  | Description            | Price
1997 | Ford  | E350                                   | ac, abs, moon          | 3000.00
1999 | Chevy | Venture "Extended Edition"             |                        | 4900.00
1999 | Chevy | Venture "Extended Edition, Very Large" |                        | 5000.00
1996 | Jeep  | Grand Cherokee                         | MUST SELL!             | 4799.00
     |       |                                        | air, moon roof, loaded |

The above table of data may be represented in CSV format as follows:

Year,Make,Model,Description,Price
1997,Ford,E350,"ac, abs, moon",3000.00
1999,Chevy,"Venture ""Extended Edition""","",4900.00
1999,Chevy,"Venture ""Extended Edition, Very Large""","",5000.00
1996,Jeep,Grand Cherokee,"MUST SELL!
air, moon roof, loaded",4799.00

This CSV example illustrates that:

  • fields that contain commas, double-quotes, or line-breaks must be quoted.
  • a quote within a field must be escaped with an additional quote immediately preceding the literal quote.
  • spaces before and after delimiter commas must not be trimmed; RFC 4180 requires that they be kept as part of the field.
  • a line break within an element must be preserved.
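Because this version of the example starts with a record of column names, each later record can be addressed by name. The sketch below uses csv.DictReader from the Python standard library; summing the Price column is just an illustrative use of the named fields.

```python
import csv
import io

csv_text = (
    "Year,Make,Model,Description,Price\n"
    '1997,Ford,E350,"ac, abs, moon",3000.00\n'
    '1999,Chevy,"Venture ""Extended Edition""","",4900.00\n'
    '1999,Chevy,"Venture ""Extended Edition, Very Large""","",5000.00\n'
    '1996,Jeep,Grand Cherokee,"MUST SELL!\n'
    'air, moon roof, loaded",4799.00\n'
)

# DictReader takes the first record as the field names for all others.
records = list(csv.DictReader(io.StringIO(csv_text)))

makes = [record["Make"] for record in records]
total_price = sum(float(record["Price"]) for record in records)
```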

Line break handling within CSV files

Many applications will not handle a line break within a cell as in the example above. Such applications may interpret the line break as a record delimiter and begin a new row. In this case, the layout of the CSV file will be disrupted or broken.[1]

Application support

The CSV file format is very simple and supported by almost all spreadsheets and database management systems. Many programming languages have libraries available that support CSV files. Even modern software applications support CSV imports and/or exports because the format is so widely recognized. In fact, many applications allow .csv-named files to use any delimiter character.

Microsoft Excel will open .csv files, but depending on the system's regional settings, it may expect a semicolon as a separator instead of a comma, since in some languages the comma is used as the decimal separator.

When pasting text data into Excel, the tab character is used as a separator: if you copy "hello", a tab character, and "goodbye" to the clipboard and paste the text into Excel, it goes into two cells. "hello,goodbye" pasted into Excel goes into one cell, including the comma.
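The same splitting behaviour can be imitated with any tab-delimited parser. The sketch below uses Python's standard csv module with a tab delimiter; it only mimics the effect described and says nothing about how Excel is implemented. split_cells is a hypothetical helper name.

```python
import csv
import io

def split_cells(text: str, delimiter: str) -> list:
    """Split one line of text into cells using the given delimiter."""
    return next(csv.reader(io.StringIO(text), delimiter=delimiter))

# With a tab as the separator, only the tab splits the text into cells.
tab_cells = split_cells("hello\tgoodbye", "\t")
comma_cells = split_cells("hello,goodbye", "\t")
```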
