CJK Unified Ideographs is a range of Unicode code points assigned to the Han ideographs used in writing Chinese, Japanese and Korean. Since its introduction in Unicode 1.0, the repertoire of CJK ideographs has been extended across multiple blocks.
These ideographic characters appear in the following blocks:
Unicode includes support for CJKV radicals, strokes, punctuation, marks and symbols. Although some of these characters have (decomposable) counterparts in other blocks, their usage can differ:
Additional compatibility (discouraged use) characters appear in these blocks:
These compatibility characters are included for compatibility with legacy text-handling systems and other legacy character sets. They include forms of characters for vertical text layout and rich-text characters that Unicode recommends handling through other means.
Usually, compatibility characters are those that would not have been encoded except for compatibility and round-trip convertibility with other standards. However, the number of CJK ideographs in non-Unicode standards is too large to fit into Unicode's CJK Compatibility Ideographs blocks. Instead, code points are assigned when the affected characters have been approved by the Unicode Consortium but have not yet been assigned code points within the CJK Unified Ideographs blocks.
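As a rough illustration of how compatibility characters map back to their preferred forms, the following sketch uses Python's standard `unicodedata` module. The two sample characters (U+F900 from the CJK Compatibility Ideographs block and U+FE35, a vertical presentation form) are chosen here for illustration and are not taken from the text above; the helper name `to_preferred` is hypothetical.

```python
import unicodedata

def to_preferred(text):
    """Fold compatibility characters to their preferred equivalents (NFKC)."""
    return unicodedata.normalize("NFKC", text)

# U+F900 has a canonical decomposition to the unified ideograph U+8C48,
# so even the canonical form NFC replaces it.
compat_ideograph = "\uF900"
unified = unicodedata.normalize("NFC", compat_ideograph)
print(f"U+{ord(compat_ideograph):04X} -> U+{ord(unified):04X}")

# U+FE35 is a vertical-layout form of the left parenthesis; it only folds
# under the compatibility forms (NFKC/NFKD), not under NFC.
vertical_paren = "\uFE35"
print(repr(to_preferred(vertical_paren)))
```

Note that normalization is lossy for round-trip conversion: once U+F900 has been replaced by U+8C48, the original legacy code point can no longer be recovered from the text alone.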
The character U+4039 (䀹) was a unification of two different glyphs (one with the jiā 夾 phonetic and one with the shǎn 㚒 phonetic) until Unicode 5.0. However, the two were lexically distinct and should not have been unified: they have different pronunciations and different meanings.
The proposal to disunify U+4039 was accepted, and the new character was encoded at U+9FC3 in Unicode 5.1.
In CJK Unified Ideographs Extension B, hundreds of unifiable glyph variants were encoded, as well as five exact duplicates.
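One consequence of the disunification is that the two characters live in different blocks. The sketch below classifies a code point by block; the block boundaries are taken from the Unicode block list and are an assumption of this example, since the article text does not state them. The names `CJK_BLOCKS` and `cjk_block` are hypothetical.

```python
# Block boundaries as listed in the Unicode block list (assumed here,
# not stated in the surrounding text).
CJK_BLOCKS = [
    ("CJK Unified Ideographs Extension A", 0x3400, 0x4DBF),
    ("CJK Unified Ideographs", 0x4E00, 0x9FFF),
    ("CJK Unified Ideographs Extension B", 0x20000, 0x2A6DF),
    ("CJK Unified Ideographs Extension C", 0x2A700, 0x2B73F),
]

def cjk_block(code_point):
    """Return the name of the CJK ideograph block containing code_point, or None."""
    for name, first, last in CJK_BLOCKS:
        if first <= code_point <= last:
            return name
    return None

# The original character and the one split off from it in Unicode 5.1.
for cp in (0x4039, 0x9FC3):
    print(f"U+{cp:04X}: {cjk_block(cp)}")
```

A block range covers reserved as well as assigned code points, so a match here does not by itself mean a character is assigned at that position.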
| Unicode version | Addition | Plane | Characters | Total Characters |
|---|---|---|---|---|
| 1.0 | CJK Unified Ideographs | Basic Multilingual Plane (BMP) | 20,902 | 20,914 |
| | CJK Compatibility Ideographs | BMP | 12 | |
| 3.0 | CJK Unified Ideographs Extension A | BMP | 6,582 | 27,496 |
| 3.1 | CJK Unified Ideographs Extension B | Supplementary Ideographic Plane (SIP) | 42,711 | 70,207 |
| 4.1 | CJK Unified Ideographs: Ideographs from HKSCS-2004 and GB 18030-2000 not in ISO 10646 | BMP | 22 | 70,229 |
| 5.1 | CJK Unified Ideographs: Ideographs from Adobe Japan and disunification of U+4039 | BMP | 8 | 70,237 |
| 5.2 | CJK Unified Ideographs Extension C | SIP | 4,149 | 74,394 |
| | 8 other characters from ARIB #47, #95, #93 and HKSCS | BMP | 8 | |
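The running totals in the table can be cross-checked by taking the difference between consecutive versions. The following is a minimal sketch of that arithmetic; the totals are copied from the last column of the table, and the function name `additions` is hypothetical.

```python
# Cumulative totals per Unicode version, from the "Total Characters" column.
running_totals = [
    ("1.0", 20914),
    ("3.0", 27496),
    ("3.1", 70207),
    ("4.1", 70229),
    ("5.1", 70237),
    ("5.2", 74394),
]

def additions(totals):
    """Return the number of ideographs added in each version."""
    result = {}
    previous = 0
    for version, total in totals:
        result[version] = total - previous
        previous = total
    return result

added = additions(running_totals)
for version, count in added.items():
    print(f"Unicode {version}: +{count:,}")
```

The figure for 1.0 combines the 20,902 unified ideographs with the 12 from the compatibility block, and the figure for 5.2 combines the 4,149 Extension C characters with the 8 other additions.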
The code points in this region were assigned under the Source Separation Rule. The characters came from the following sources:
| Code | Standard | Character count | Note |
|---|---|---|---|
| G0 | GB 2312-80 | 6763 | |
| G1 | GB 12345-90 | 2352 | |
| G3 | GB 7589-87 unsimplified form | 7237 | |
| G5 | GB 7590-87 unsimplified form | 7039 | |
| G7 | Modern Chinese general character chart | 642 | |
| G8 | GB 8565-89 | 290 | |

| Code | Standard | Character count | Note |
|---|---|---|---|
| T1 | CNS 11643-1986 plane 1 | 5401+9 | |
| T2 | CNS 11643-1986 plane 2 | 7650 | |
| TE | CNS 11643-1986 plane 14 | 6319+239+10 | 239 from CCIII, 10 from XCCS |

| Code | Standard | Character count | Note |
|---|---|---|---|
| J0 | JIS X 0208-90 | 6335+1 | |
| J1 | JIS X 0212-90 | 5801 | |

| Code | Standard | Character count | Note |
|---|---|---|---|
| K0 | KS C 5601-87 | 4888 | includes 268 duplicates |
| K1 | KS C 5657-91 | 2856 | |

In Unicode 4.1, 14 HKSCS-2004 characters and 8 GB 18030 characters were assigned to the code points U+9FA6 through U+9FBB.
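The size of this range can be checked against the two character counts with a one-line calculation. This is only an illustrative sketch; `range_size` is a hypothetical helper.

```python
def range_size(first, last):
    """Number of code points in the inclusive range first..last."""
    return last - first + 1

# Inclusive range assigned in Unicode 4.1.
size = range_size(0x9FA6, 0x9FBB)
print(size, size == 14 + 8)
```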

| Code | Standard |
|---|---|
| GE | GB 16500-95 |
| GS | Singapore CJK ideographs |

| Code | Standard | Note |
|---|---|---|
| T3 | CNS 11643-1992 plane 3 | |
| T4 | CNS 11643-1992 plane 4 | |
| T5 | CNS 11643-1992 plane 5 | |
| T6 | CNS 11643-1992 plane 6 | |
| T7 | CNS 11643-1992 plane 7 | |
| TF | CNS 11643-1992 plane 15 | |

| Code | Standard | Note |
|---|---|---|
| JA | Unified Japanese IT Vendors Contemporary Ideographs, 1993 | |

Japan

| Code | Standard | Note |
|---|---|---|
| K2 | PKS C 5700-1:1994 | |
| K3 | PKS C 5700-2:1994 | |

| Code | Standard | Note |
|---|---|---|
| V0 | TCVN 5773:1993 | |
| V1 | TCVN 6056:1995 | |

As of April 2008, Extension C was under ballot within the International Organization for Standardization (ISO), with inclusion planned for Unicode 5.2.0. Its 4,149 characters are allocated to the code points U+2A700 to U+2B734. The characters are derived from the following sources:
Mainland China
Japan
South Korea
North Korea
Vietnam
UTC
According to the CJK editorial group report ISO/IEC JTC1/SC2/WG2/IRG N1266, characters come from at least the following sources:
Taiwan
Korea
Vietnam
Unicode
Others