CJK Unified Ideographs

  

From Wikipedia, the free encyclopedia

CJK Unified Ideographs is a range of Unicode code points assigned to the ideographs (Chinese characters) used in Chinese, Japanese, and Korean writing. Since their introduction in Unicode 1.0, the CJK ideographs have been extended across multiple blocks.

These ideographic characters appear in the following blocks:

  • CJK Unified Ideographs (4E00–9FFF) (chart)
  • CJK Unified Ideographs Extension A (3400–4DBF) (chart)
  • CJK Unified Ideographs Extension B (20000–2A6DF)
  • Enclosed CJK Letters and Months (3200–32FF) (chart)
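A code point can be matched against these blocks with a simple range test. The following is a minimal sketch in Python; the table name `LISTED_BLOCKS` and the function `listed_block` are illustrative names, not part of any standard library, and the ranges are copied from the list above. A block range also covers code points that are reserved but unassigned, so membership in a block does not by itself mean a character is assigned.

```python
# Inclusive block ranges, copied from the list above.
LISTED_BLOCKS = [
    ("CJK Unified Ideographs", 0x4E00, 0x9FFF),
    ("CJK Unified Ideographs Extension A", 0x3400, 0x4DBF),
    ("CJK Unified Ideographs Extension B", 0x20000, 0x2A6DF),
    ("Enclosed CJK Letters and Months", 0x3200, 0x32FF),
]


def listed_block(ch):
    """Return the name of the listed block containing ch, or None."""
    cp = ord(ch)
    for name, first, last in LISTED_BLOCKS:
        if first <= cp <= last:
            return name
    return None


if __name__ == "__main__":
    for ch in ("\u4e00", "\u3400", chr(0x20000), "A"):
        print(f"U+{ord(ch):04X}", listed_block(ch))
```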

Unicode also supports CJKV radicals, strokes, punctuation, marks, and symbols. Although some of these characters have (decomposable) counterparts in other blocks, their usage can differ.

Additional compatibility (discouraged use) characters appear in these blocks:

  • CJK Compatibility (3300–33FF) (chart)
  • CJK Compatibility Ideographs (F900–FAFF) (chart)
  • CJK Compatibility Ideographs Supplement (2F800–2FA1F) (chart)
  • CJK Compatibility Forms (FE30–FE4F) (chart)

These compatibility characters are included for compatibility with legacy text-handling systems and other legacy character sets. They include forms of characters for vertical text layout and rich text characters that Unicode recommends handling through other means.

CJK Compatibility Ideographs

Usually, compatibility characters are those that would not have been encoded except for compatibility and round-trip convertibility with other standards. However, the number of CJK ideographs in other (non-Unicode) standards is too large to fit into Unicode's CJK Compatibility Ideographs blocks. Instead, code points in these blocks are assigned when the affected characters have been approved by the Unicode Consortium but have not yet been assigned code points within the CJK Unified Ideographs blocks.

Known issues

Disunification of U+4039

The character U+4039 (䀹) was a unification of two different glyphs (one with the jiā 夾 phonetic and one with the shǎn 㚒 phonetic) until Unicode 5.0. However, the two are lexically distinct and should not have been unified: they have different pronunciations and different meanings.

The proposal to disunify U+4039 was accepted, and the new character was encoded at U+9FC3 in Unicode 5.1.

Hundreds of unifiable glyph variants and five exact duplicates in Extension B

In CJK Unified Ideographs Extension B, hundreds of unifiable glyph variants were encoded, as well as five exact duplicates:

  • U+34A8 㒨 = U+20457 𠑗
  • U+3DB7 㶷 = U+2420E 𤈎
  • U+8641 虁 = U+27144 𧅄
  • U+204F2 𠓲 = U+23515 𣔕
  • U+249BC 𤦼 = U+249E9 𤧩
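An application that wants to treat each duplicate pair as one character (for example, when searching or comparing text) can fold one member of each pair onto the other. The sketch below is one way to do this in Python. The names `EXT_B_DUPLICATES` and `fold_duplicates` are hypothetical, and the choice to map the right-hand code point of each pair onto the left-hand one is an assumption made for illustration, not a rule taken from the Unicode Standard.

```python
# Pairs from the list above, keyed by the right-hand code point.
# Direction of folding (right -> left) is an illustrative assumption.
EXT_B_DUPLICATES = {
    0x20457: 0x34A8,
    0x2420E: 0x3DB7,
    0x27144: 0x8641,
    0x23515: 0x204F2,
    0x249E9: 0x249BC,
}


def fold_duplicates(text):
    """Map each listed duplicate onto its counterpart; leave other text alone."""
    return text.translate(EXT_B_DUPLICATES)


if __name__ == "__main__":
    sample = chr(0x20457) + "abc" + chr(0x249E9)
    print([f"U+{ord(c):04X}" for c in fold_duplicates(sample)])
```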

Version history

| Unicode version | Addition | Plane | Characters | Total characters |
|---|---|---|---|---|
| 1.0 | CJK Unified Ideographs | Basic Multilingual Plane (BMP) | 20,902 | 20,914 |
| | CJK Compatibility Ideographs | BMP | 12 | |
| 3.0 | CJK Unified Ideographs Extension A | BMP | 6,582 | 27,496 |
| 3.1 | CJK Unified Ideographs Extension B | Supplementary Ideographic Plane (SIP) | 42,711 | 70,207 |
| 4.1 | CJK Unified Ideographs: Ideographs from HKSCS-2004 and GB 18030-2000 not in ISO 10646 | BMP | 22 | 70,229 |
| 5.1 | CJK Unified Ideographs: Ideographs from Adobe Japan and disunification of U+4039 | BMP | 8 | 70,237 |
| 5.2 | CJK Unified Ideographs Extension C | SIP | 4,149 | 74,394 |
| | 8 other characters from ARIB #47, #95, #93 and HKSCS | BMP | 8 | |
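The totals column is a running sum of the additions, which can be checked mechanically. The sketch below is illustrative only; `ADDITIONS` and `running_totals` are names invented here. It assumes 8 additions for Unicode 5.1, which is the figure the totals column implies (70,237 minus 70,229).

```python
# (version, [characters added in that version]), from the version history.
# The 5.1 figure of 8 is the value implied by the totals column.
ADDITIONS = [
    ("1.0", [20902, 12]),
    ("3.0", [6582]),
    ("3.1", [42711]),
    ("4.1", [22]),
    ("5.1", [8]),
    ("5.2", [4149, 8]),
]


def running_totals(additions):
    """Return {version: cumulative character count after that version}."""
    total = 0
    totals = {}
    for version, counts in additions:
        total += sum(counts)
        totals[version] = total
    return totals


if __name__ == "__main__":
    for version, total in running_totals(ADDITIONS).items():
        print(version, f"{total:,}")
```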

Sources

CJK Unified Ideographs

The code points in this block were assigned under the Source Separation Rule. The characters came from the following sources:

Mainland China

| Code | Standard | Character count | Note |
|---|---|---|---|
| G0 | GB 2312-80 | 6763 | |
| G1 | GB 12345-90 | 2352 | |
| G3 | GB 7589-87 unsimplified form | 7237 | |
| G5 | GB 7590-87 unsimplified form | 7039 | |
| G7 | Modern Chinese general character chart | 642 | |
| G8 | GB 8565-89 | 290 | |

Taiwan

| Code | Standard | Character count | Note |
|---|---|---|---|
| T1 | CNS 11643-1986 plane 1 | 5401+9 | |
| T2 | CNS 11643-1986 plane 2 | 7650 | |
| TE | CNS 11643-1986 plane 14 | 6319+239+10 | 239 from CCIII, 10 from XCCS |

Japan

| Code | Standard | Character count | Note |
|---|---|---|---|
| J0 | JIS X 0208-90 | 6335+1 | |
| J1 | JIS X 0212-90 | 5801 | |

South Korea

| Code | Standard | Character count | Note |
|---|---|---|---|
| K0 | KS C 5601-87 | 4888 | includes 268 duplicates |
| K1 | KS C 5657-91 | 2856 | |

Others

In Unicode 4.1, 14 HKSCS-2004 characters and 8 GB 18030 characters were assigned to the code points U+9FA6 through U+9FBB.
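The size of this range matches the two character counts. A short check in Python (the helper name `span_size` is illustrative):

```python
def span_size(first, last):
    """Number of code points in the inclusive range first..last."""
    return last - first + 1


if __name__ == "__main__":
    # 14 HKSCS-2004 characters plus 8 GB 18030 characters.
    print(span_size(0x9FA6, 0x9FBB), 14 + 8)
```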

CJK Unified Ideographs Extension A

Mainland China

| Code | Standard |
|---|---|
| GE | GB 16500-95 |
| GS | Singapore CJK ideographs |

Taiwan

| Code | Standard |
|---|---|
| T3 | CNS 11643-1992 plane 3 |
| T4 | CNS 11643-1992 plane 4 |
| T5 | CNS 11643-1992 plane 5 |
| T6 | CNS 11643-1992 plane 6 |
| T7 | CNS 11643-1992 plane 7 |
| TF | CNS 11643-1992 plane 15 |

Japan

| Code | Standard |
|---|---|
| JA | Unified Japanese IT Vendors Contemporary Ideographs, 1993 |

South Korea

| Code | Standard |
|---|---|
| K2 | PKS C 5700-1:1994 |
| K3 | PKS C 5700-2:1994 |

Vietnam

| Code | Standard |
|---|---|
| V0 | TCVN 5773:1993 |
| V1 | TCVN 6056:1995 |

CJK Unified Ideographs Extension B

CJK Unified Ideographs Extension C

As of April 2008, Extension C was under ballot within the International Organization for Standardization (ISO) and slated for inclusion in Unicode 5.2.0. Its 4,149 characters are allocated to the code points U+2A700 to U+2B734. The characters are derived from the following sources:

Mainland China

Japan

  • Japanese KOKUJI Collection

South Korea

  • Korean IRG Hanja Character Set 5th Edition: 2001

North Korea

  • KPS 10721:2003

Vietnam

  • Từ điển chữ Nôm (喃字典), Nguyễn Quang Hồng, 2006
  • Từ điển chữ Nôm Tày, Hoàng Triều Ân, 2003
  • Bảng tra chữ Nôm miền Nam, Vũ Văn Kính, 1994

UTC

  • ABC Chinese-English Dictionary, John DeFrancis (德范克), et al., eds., 2nd edition. (1998) Honolulu: University of Hawaii Press
  • The Church of Jesus Christ of Latter-day Saints Hong Kong division
  • Mathews' Chinese-English Dictionary, Robert H. Mathews (1975) Cambridge; Harvard University Press
  • Guangyun
  • Chinese bird system index (中国鸟类系统检索), Zheng Zhuoxin (郑作新), et al. (2000), Beijing, 科学出版社 (www.sciencep.com)
  • Annotated Shuowen Jiezi, Duan Yucai

CJK Unified Ideographs Extension D

According to the CJK editorial group report ISO/IEC JTC1/SC2/WG2/IRG N1266, Extension D includes at least characters from the following sources:

Taiwan

  • TD-454E
  • TC-5036
  • TD-624C
  • TD-5352
  • TC-4139
  • TC-4A76
  • TD-5C26

Korea

  • K5H00535
  • K5H00222
  • K5H00297
  • KP1-73E1
  • KP1-712E
  • KP1-70BE
  • KP1-6752
  • KP1-672B
  • KP1-6651
  • KP1-4B50
  • KP1-487E
  • KP1-4731

Vietnam

  • V04-5073

Unicode

  • UTC00103

Others

  • CJK Unified Ideographs Extension C Remainder list
  • Macao SAR (IRGN1249 with minor adjustment)
  • Unicode (IRGN1256 and IRGN1257, 472 char)
  • Mainland China (IRGN1264, 57)

CJK Compatibility Ideographs
