A grapheme (from the Greek: γράφω, gráphō, "write") is a fundamental unit in a written language. Examples of graphemes include alphabetic letters, Chinese characters, numerical digits, punctuation marks, and the individual symbols of any of the world's writing systems, although arguably a diacritical mark or ancillary glyph does not constitute a grapheme.

In a fully phonemic orthography, each grapheme corresponds to exactly one phoneme. In practice, however, this is very much the exception. In spelling systems that are to some extent non-phonemic, such as that of English, several graphemes may together represent a single phoneme. These combinations are called digraphs (two graphemes for a single phoneme) and trigraphs (three graphemes for a single phoneme). For example, the word ship contains four graphemes (s, h, i, and p) but only three phonemes, because sh is a digraph. Conversely, a single grapheme can represent more than one phoneme: the English word "box" has three graphemes but four phonemes, /bɒks/, because x stands for /ks/.[1] A grapheme may also represent no phoneme at all, as with 'silent' letters such as the k in knee.

Furthermore, a particular grapheme can represent different phonemes on different occasions, and vice versa. For instance, in English the sound /f/ can be represented by 'F', 'f', 'ff', 'FF', 'ph', 'PH', 'Ph', 'gh', 'GH', and, in some place names of Welsh origin, 'Ff', while the grapheme 'f' can also represent the phoneme /v/ (as in the word of).

A script such as Japanese katakana, by contrast, has an essentially phonemic orthography, yet in most cases one grapheme corresponds to a pair of phonemes (typically a consonant followed by a vowel, as in カ, ka).

In some languages, a group of more than one grapheme may be treated as a single unit for the purposes of collation. In a Hungarian dictionary, for example, words starting with cs come after all other words starting with c, while in a Welsh dictionary, words starting with ll come after all other words starting with l.
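The collation rule can be illustrated with a short Python sketch. It is a minimal, hypothetical example rather than a standard algorithm: it assumes the 29-letter modern Welsh alphabet and a greedy longest-match split of each word into letters, and the names WELSH_ALPHABET and collation_key are invented for illustration. Real Welsh sorting has exceptions the sketch ignores (for instance, ng sometimes stands for n followed by g), and a Hungarian version would need its own letter list plus further rules, such as the treatment of accented vowels.

```python
# Hypothetical, simplified Welsh collation. Assumption: each word is split
# greedily into the longest matching letters of the modern Welsh alphabet;
# real dictionaries handle exceptions (e.g. "ng" read as n + g) that this ignores.
WELSH_ALPHABET = [
    "a", "b", "c", "ch", "d", "dd", "e", "f", "ff", "g", "ng", "h",
    "i", "j", "l", "ll", "m", "n", "o", "p", "ph", "r", "rh", "s",
    "t", "th", "u", "w", "y",
]

def collation_key(word, alphabet=WELSH_ALPHABET):
    """Split a word into collation units and return their alphabet positions."""
    rank = {letter: i for i, letter in enumerate(alphabet)}
    longest = max(len(letter) for letter in alphabet)
    word = word.lower()
    key = []
    i = 0
    while i < len(word):
        # Try the longest unit first, so "ll" is preferred over "l".
        for size in range(longest, 0, -1):
            unit = word[i:i + size]
            if unit in rank:
                key.append(rank[unit])
                i += len(unit)
                break
        else:
            raise ValueError(f"{word[i]!r} is not in the alphabet")
    return key

words = ["llan", "lwc", "lawr", "chwech", "cyw"]
print(sorted(words, key=collation_key))
# → ['cyw', 'chwech', 'lawr', 'lwc', 'llan']
```

Plain character-by-character comparison would instead place chwech before cyw and llan before lwc, because it treats ch and ll as two separate letters.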

In addition, what counts as a sequence of graphemes in print may behave as a single grapheme in handwriting. In German handwriting, for example, the combination ch is usually written quite differently from c followed by h; since ch also has its own sound value, there is a strong argument for treating it as a single distinct grapheme.

In English and other languages, the choice of graphemes can also convey morphological relationships. For instance, the link between sign and signature is clearer in writing than in speech, because the g is kept in the spelling of sign even though it is silent there.

Different glyphs can represent the same grapheme; such glyphs are called allographs. For example, the lowercase letter a appears in two variants: with a hook at the top <a>, and without it <ɑ>. Not all graphemes represent phonemes: the logogram ampersand (&), for example, was derived from the Latin word et and is used for the word meaning "and" in many languages, so it does not directly represent any particular combination of phonemes. The same applies to Arabic numerals: 7, for instance, denotes a number rather than the sounds of any one language's word for it.

In some English personal names and place names, the relationship between spelling and pronunciation is so distant that it is impossible to identify which graphemes correspond to which phonemes. Examples are Marjoribanks (pronounced Marshbanks) and Featherstonehaugh (pronounced Fanshaw). More generally, in many words the pronunciation has changed since the spelling was fixed, so it can be argued that the phonemes represent the graphemes rather than the other way round. Likewise, in many technical vocabularies the primary medium of communication is writing rather than speech, which again suggests that the phonemes represent the graphemes.


