|Developed by||LDS FHD|
|Latest release||GEDCOM 5.5 Standard + Errata Sheet / 2 January 1996|
|Type of format||Genealogy data exchange|
GEDCOM, an acronym for GEnealogical Data COMmunication, is a de facto specification for exchanging genealogical data between different genealogy software. GEDCOM was developed by The Church of Jesus Christ of Latter-day Saints as an aid to genealogical research.
A GEDCOM file is plain text (usually either ANSEL or ASCII) containing genealogical information about individuals, and metadata linking these records together. Most genealogy software supports importing from and/or exporting to GEDCOM format. However, some genealogy programs use proprietary extensions to the GEDCOM format, such as the GEDCOM 5.5 EL (Extended Locations) specification, which are not always recognized by other genealogy programs.
GEDCOM uses a lineage-linked data model. This data model is based on the nuclear family and the individual. This contrasts with evidence-based models, where data are structured to reflect the supporting evidence. In the GEDCOM lineage-linked data model, all data are structured to reflect the believed reality, that is, actual (or hypothesized) nuclear families and individuals.
A GEDCOM file consists of a header section, records, and a trailer section. The records represent people (INDI records), families (FAM records), sources of information (SOUR records), and other miscellaneous data, including notes. Every line of a GEDCOM file begins with a level number. All top-level records (HEAD, TRLR, SUBN, and each INDI, FAM, OBJE, NOTE, REPO, SOUR, and SUBM) begin with a line at level 0, while all other lines have positive integer level numbers.
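The line layout described above can be illustrated with a minimal Python sketch. It assumes the common shape of a GEDCOM line (a level number, an optional `@ID@` cross-reference, a tag, and an optional value); the names `GedcomLine`, `parse_line`, and `top_level_tags` are invented for this example and are not part of any GEDCOM library. It is not a full validator and ignores details such as line continuation.

```python
import re
from typing import NamedTuple, Optional

# One GEDCOM line: level, optional @XREF@ identifier, tag, optional value.
LINE_RE = re.compile(r"^\s*(\d+)\s+(?:(@[^@]+@)\s+)?(\w+)(?: (.*))?$")


class GedcomLine(NamedTuple):
    level: int
    xref: Optional[str]
    tag: str
    value: str


def parse_line(text: str) -> GedcomLine:
    """Split one GEDCOM line into level, cross-reference ID, tag and value."""
    match = LINE_RE.match(text.rstrip("\r\n"))
    if match is None:
        raise ValueError(f"not a GEDCOM line: {text!r}")
    level, xref, tag, value = match.groups()
    return GedcomLine(int(level), xref, tag, value or "")


def top_level_tags(lines):
    """Return the tags of all level-0 records, in file order."""
    return [parsed.tag for parsed in map(parse_line, lines) if parsed.level == 0]
```

Calling `top_level_tags` on the lines of a file gives a quick view of its record sequence, which should start with HEAD and end with TRLR.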
Although it is theoretically possible to write a GEDCOM file by hand, the format was designed to be used with software and thus is not especially human-friendly. A GEDCOM validator that can be used to validate the structure of a GEDCOM file is included as part of the PhpGedView project, though it is not meant to be a standalone validator. For standalone validation you can use "The Windows GEDCOM Validator" or the older, unmaintained Gedcheck from the LDS.
During 2001, the GEDCOM TestBook Project evaluated how well four popular genealogy programs conformed to the GEDCOM 5.5 standard using the Gedcheck program. Findings showed that a number of problems existed and that the most commonly found fault leading to data loss was the failure to read the NOTE tag at all the possible levels at which it may appear. In 2005, the Genealogical Software Report Card evaluation by Bill Mumford (who participated in the original GEDCOM TestBook Project) also included testing conformance to the GEDCOM 5.5 standard using the Gedcheck program.
The following is a sample GEDCOM file. The number at the start of each line is its level, which indicates nesting depth.
0 HEAD
1 SOUR Reunion
2 VERS V8.0
2 CORP Leister Productions
1 DEST Reunion
1 DATE 11 FEB 2006
1 FILE test
1 GEDC
2 VERS 5.5
1 CHAR MACINTOSH
0 @I1@ INDI
1 NAME Bob /Cox/
1 SEX M
1 FAMS @F1@
1 CHAN
2 DATE 11 FEB 2006
0 @I2@ INDI
1 NAME Joann /Para/
1 SEX F
1 FAMS @F1@
1 CHAN
2 DATE 11 FEB 2006
0 @I3@ INDI
1 NAME Bobby Jo /Cox/
1 SEX M
1 FAMC @F1@
1 CHAN
2 DATE 11 FEB 2006
0 @F1@ FAM
1 HUSB @I1@
1 WIFE @I2@
1 MARR
1 CHIL @I3@
0 TRLR
The header (HEAD) includes the source program and version (Reunion, V8.0), the GEDCOM version (5.5), and the character encoding (MACINTOSH). That encoding value is invalid: according to the GEDCOM 5.5 specification, the valid choices are ANSEL, UNICODE, or ASCII.
The individual records (INDI) define Bob Cox (ID 1, written @I1@), Joann Para (ID 2), and Bobby Jo Cox (ID 3).
The family record (FAM) links the husband (HUSB), wife (WIFE), and child (CHIL) by their ID numbers.
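As a rough illustration of how the ID-based links work, the Python sketch below reads the individual and family records from the sample and resolves the HUSB, WIFE, and CHIL pointers of a family to names. The helper names `read_records` and `describe_family` are invented for this example, and the parsing is deliberately simplistic (one space between fields, no continuation lines).

```python
SAMPLE = """\
0 @I1@ INDI
1 NAME Bob /Cox/
1 SEX M
1 FAMS @F1@
0 @I2@ INDI
1 NAME Joann /Para/
1 SEX F
1 FAMS @F1@
0 @I3@ INDI
1 NAME Bobby Jo /Cox/
1 SEX M
1 FAMC @F1@
0 @F1@ FAM
1 HUSB @I1@
1 WIFE @I2@
1 MARR
1 CHIL @I3@
0 TRLR
"""


def read_records(text):
    """Group lines by level-0 record: {xref: (tag, [(level, tag, value), ...])}."""
    records = {}
    current = None
    for raw in text.splitlines():
        if not raw.strip():
            continue
        parts = raw.split(" ", 2)
        level = int(parts[0])
        if level == 0:
            current = None
            # Records with an ID look like "0 @I1@ INDI"; HEAD and TRLR have none.
            if len(parts) == 3 and parts[1].startswith("@"):
                current = records.setdefault(parts[1], (parts[2], []))
        elif current is not None:
            value = parts[2] if len(parts) > 2 else ""
            current[1].append((level, parts[1], value))
    return records


def describe_family(records, fam_id):
    """Resolve the HUSB, WIFE and CHIL pointers of one FAM record to names."""
    def name(xref):
        for level, tag, value in records[xref][1]:
            if level == 1 and tag == "NAME":
                return value.replace("/", "")  # slashes only mark the surname
        return xref

    roles = {"HUSB": [], "WIFE": [], "CHIL": []}
    for level, tag, value in records[fam_id][1]:
        if level == 1 and tag in roles:
            roles[tag].append(name(value))
    return roles


family = describe_family(read_records(SAMPLE), "@F1@")
```

Each role maps to a list because a family can have several CHIL lines; a real importer would also check that the FAMS and FAMC pointers on the individuals agree with the family record.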
The current version of the specification is GEDCOM 5.5, which was released on 2 January 1996. A subsequent draft GEDCOM 5.5.1 specification was issued in 1999, introducing nine new tags, including WWW, EMAIL and FACT, and adding UTF-8 as an approved character encoding. This draft has not been formally approved, but its provisions have been adopted in part by a number of genealogy programs, and it is used by FamilySearch.org. Although PAF 5.2 supports GEDCOM 5.5, it uses UTF-8 as its internal character set, a feature introduced in the GEDCOM 5.5.1 draft, and can output a UTF-8 GEDCOM file.
On 23 January 2002, a draft (beta) version of GEDCOM 6.0 was released for developers to study only. It was not a complete specification, and developers were advised not to begin implementing it in their software. For example, descriptions of the meaning and expected contents of tags were not included. GEDCOM 6.0 was to be the first version to store data in XML format, and was to change the preferred character set from ANSEL to Unicode.
Today, lineage-linked GEDCOM is still the deliberate de facto common denominator. Despite version 5.5 of the GEDCOM standard first being published in 1996, many genealogical software suppliers have yet to support the feature of multilingual Unicode text (instead of the ANSEL character set) introduced with that version of the specification. Uniform use of Unicode would allow for the usage of international character sets. An example is the storage of East Asian names in their original Chinese, Japanese and Korean (CJK) characters, without which they could be ambiguous and of little use for genealogical or historical research.
|GEDCOM Version||Release Date||Notes|
|2.0||Dec 1985||PAF 2.0|
|2.1||Feb 1987||GEDCOM for PAF 2.1|
|2.3 Draft||7 August 1985||with PAF2.0 GEDCOM implementation conventions|
|2.4 Draft||13 December 1985||with PAF2.0 GEDCOM implementation conventions|
|3.0 Standard||9 October 1987||PAF 2.0 and 2.1 implementation of 3.0|
|4.0 Standard||August 1989||PAF 2.1 - 2.31|
|4.2 Draft||25 January 1990||-|
|5.0 Draft||31 December 1991||lineage-linked structures were introduced.|
|5.1 Draft||18 September 1992||-|
|5.2 Draft||22 January 1992||-|
|5.3 Draft||4 November 1993||Unicode standard (ISO/IEC 10646) was introduced as an additional character set.|
|5.4 Draft||21 August 1995||-|
|5.5 Standard||11 December 1995||PAF 3, 4 and 5|
|5.5 Standard + Errata Sheet||2 January 1996||PAF 3, 4 and 5|
|GEDCOM (Future Direction) Draft||1 May 1998||"it used an entirely new data model"|
|5.5.1 Draft||2 October 1999||Used by FamilySearch.org. UTF-8 added as an approved character encoding.|
|5.6 Private Draft||-||"Jed Allen sent those two files to a few people only for sort of 'private comments'"|
|6.0 XML Draft||28 December 2001||Not a complete specification; developers were advised not to begin implementing it in their software.|
A GEDCOM file can contain information on events such as births, deaths, census records, marriages, etc.; a general rule of thumb is that an event is something that took place at a specific time, at a specific place (even if the time & place are not known). GEDCOM files can also contain attributes such as physical description, occupation, and total number of children; unlike events, attributes generally cannot be associated with a specific time or place.
The GEDCOM standard requires that each event or attribute is associated with exactly one individual (or family). This causes redundancy for events such as census records where the actual census entry often contains information on multiple individuals, but in the GEDCOM file a separate CENS event must be added for each individual referenced.
Notice that events can be associated with families, so (for example) marriage information is only stored in a GEDCOM once, as part of the family (FAM) record, and then both spouses are linked to that single family record.
For events that are recorded per individual, such as census entries, most genealogy programs also require the user to enter the data multiple times through their user interface.
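The census redundancy described above can be made concrete with a small Python sketch. The function name `census_events`, the IDs, the date, and the place are all invented for illustration; the point is only that one census entry covering a household turns into one identical CENS event per person.

```python
def census_events(individual_ids, date, place):
    """Build one identical CENS event (as GEDCOM lines) per listed individual."""
    return {
        xref: ["1 CENS", f"2 DATE {date}", f"2 PLAC {place}"]
        for xref in individual_ids
    }


# Hypothetical household of three people found on a single census page.
household = census_events(["@I1@", "@I2@", "@I3@"], "1 JUN 1900", "Springfield, USA")
```

Every value in `household` carries the same three lines, so a later correction to the date or place has to be repeated for each individual.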
The way GEDCOM handles places is also considered a weakness of the specification. Places are encoded as strings on the events. An example looks like the following:
1 BIRT
2 PLAC New York City, , New York, USA
Additional references to New York City are represented by additional strings, so changes (for example, to add the county or change spelling) require changing every occurrence throughout the file. This also leads to duplication of information if geo-coding or other subrecords are added to the place.
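To show what "changing every occurrence" means in practice, here is a hedged Python sketch of a place rename. The helper `rename_place` and the sample lines are invented for this example; because a place is only a string repeated on each event, the function has to scan every line of the file.

```python
def rename_place(lines, old, new):
    """Rewrite every PLAC line whose value equals `old`; return lines and a count."""
    changed = 0
    updated = []
    for line in lines:
        level, _, rest = line.partition(" ")
        tag, _, value = rest.partition(" ")
        if tag == "PLAC" and value == old:
            line = f"{level} PLAC {new}"
            changed += 1
        updated.append(line)
    return updated, changed


# Hypothetical fragment: two events share a place string, a third does not.
lines = [
    "0 @I1@ INDI",
    "1 BIRT",
    "2 PLAC New York City, , New York, USA",
    "1 DEAT",
    "2 PLAC New York City, , New York, USA",
    "0 @I2@ INDI",
    "1 BIRT",
    "2 PLAC Boston, , Massachusetts, USA",
]
updated, changed = rename_place(
    lines,
    "New York City, , New York, USA",
    "New York City, New York County, New York, USA",
)
```

An exact string match is used here, so spelling variants of the same place would be missed, which is part of the weakness the text describes.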
Sometimes the GEDCOM specification has been made purposefully flexible to support many ways of encoding data, particularly in the area of sources. This flexibility has led to a great deal of ambiguity, and has produced the side effect that some genealogy programs which import GEDCOM do not import all of the data from a file.
GEDCOM does not allow representation of many types of close interpersonal relationships, such as same-sex marriages, domestic partnerships, cohabitation, polyamory, polygamy or incest, most of which are increasingly recognized in modern Western societies as well as diverse cultures and historical contexts.
GEDCOM has many features that are not commonly used, and hence are sometimes thought to not exist. It is important to keep in mind that just because the GEDCOM standard allows a particular feature does not imply that any particular software package will also support that feature.
The GEDCOM standard does support the inclusion of multimedia objects (for example, photos of individuals). Such multimedia objects can be either included in the GEDCOM file itself (called the "embedded form") or in an external file where the name of the external file is specified in the GEDCOM file (called the "linked form"). Embedding multimedia directly in the GEDCOM file makes transmission of data easier, in that all of the information (including the multimedia data) is in one file, but the resulting file can be enormous. Linking multimedia keeps the size of the GEDCOM file under control, but then when transmitting the file, the multimedia objects must either be transmitted separately or archived together with the GEDCOM into one larger file.
The GEDCOM standard does allow for the specification of multiple opinions or conflicting data, simply by specifying multiple records of the same type. For example, if an individual's birth date was recorded as 10 January 1800 on their birth certificate, but 11 January 1800 on their death certificate, two BIRT records for that individual would be included, the first with the 10 January 1800 date and giving the birth certificate as the source, and the second with the 11 January 1800 date and giving the death certificate as the source.
Notice that when a record contains multiple instances of the same kind of entry, the preferred one should be listed first.
This example encoded in GEDCOM might look like this:
0 @I1@ INDI
1 NAME John /Doe/
1 BIRT
2 DATE 10 JAN 1800
2 SOUR @S1@
3 DATA
4 TEXT Transcription from birth certificate would go here
3 NOTE This birth record is preferred because it comes from the birth certificate
3 QUAY 2
1 BIRT
2 DATE 11 JAN 1800
2 SOUR @S2@
3 DATA
4 TEXT Transcription from death certificate would go here
3 QUAY 2
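A reader of such a file can apply the "preferred first" convention with very little code. The Python sketch below is an illustration only: the helper name `events` is invented, the record is a shortened copy of the example, and only the level-2 details of each event are collected.

```python
RECORD = """\
0 @I1@ INDI
1 NAME John /Doe/
1 BIRT
2 DATE 10 JAN 1800
2 SOUR @S1@
3 QUAY 2
1 BIRT
2 DATE 11 JAN 1800
2 SOUR @S2@
3 QUAY 2
"""


def events(record_text, tag):
    """Collect each level-1 `tag` event with its level-2 details, in file order."""
    found = []
    current = None
    for raw in record_text.splitlines():
        parts = raw.split(" ", 2)
        level, line_tag = int(parts[0]), parts[1]
        value = parts[2] if len(parts) > 2 else ""
        if level <= 1:
            # A new level-0 or level-1 line closes the event being collected.
            current = None
            if level == 1 and line_tag == tag:
                current = {}
                found.append(current)
        elif level == 2 and current is not None:
            current[line_tag] = value
    return found


births = events(RECORD, "BIRT")
preferred, alternatives = births[0], births[1:]
```

Keeping `alternatives` rather than discarding them preserves the conflicting evidence, which is the reason the standard allows repeated events in the first place.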
The GEDCOM standard supports internationalization in several ways. First, newer versions of the standard allow data to be stored in Unicode (or, more recently, UTF-8), so text in any language can be stored. Secondly, in the same way that you can have multiple events on a person, GEDCOM allows you to have multiple names for a person, so names can be stored in multiple languages (although there is no standardized way to indicate which instance is in which language). Finally, in the latest draft version (5.5.1, not yet in widespread use), the NAME field also supports a phonetic variation (FONE) and a romanized variation (ROMN) of the name.
Commsoft, the authors of the Roots series of genealogy software and Ultimate Family Tree, defined a version called Event-Oriented GEDCOM (also known as "Event GEDCOM" and originally called InterGED), which included events as first class (zero-level) items. Although it is event based, it is still a model built on assumed reality rather than evidence. Event GEDCOM was more flexible, as it allowed some separation between believed events and the participants. However, Event GEDCOM was not widely adopted by other developers due to its semantic differences. With Roots and Ultimate Family Tree no longer available, very few people today are using Event GEDCOM.
GEDCOM is a specification for exchanging genealogical data between different genealogical systems. GEDCOM is an acronym for GEnealogical Data COMmunication. A GEDCOM file is plain text (often in ASCII in the United States, although technically the standard mandates use of an obscure text encoding named ANSEL) containing records for each individual in the family tree, and data linking these records together.
Most (if not all) genealogy software supports importing from and/or exporting to GEDCOM format. Additionally, many tools exist to convert GEDCOM files to HTML pages.
Although it is theoretically possible to write a GEDCOM file by hand, the format was designed for export and import by software, and is not especially human-friendly.
Every line of a GEDCOM file begins with a level number, and all top level records (the header, trailer, and each INDI, FAM, SOUR, EVEN, or OTHR) begin with a line with level 0. All other level numbers are positive integers.
This file structure handles basic relationship information very well, but it is criticized by some genealogists.
Keeping track of records and events is just as important as keeping track of relationships, but GEDCOM stores these as details under the people and family records. This makes them more difficult to organize and add further details.
Another dilemma is which record should own an event. For example, the record for adoption details could be the child, the adopted parents, the birth parents, or the family the child becomes part of.
Many of these issues have been addressed by the open GRAMPS XML format.
This page uses Creative Commons licensed content from Wikipedia.