CCSID 930 (sometimes known as CP930 or codepage 930) is one of several Japanese EBCDIC code pages created by IBM to represent Japanese text. It is commonly used on the IBM z/OS and IBM System i operating systems.
It encodes halfwidth Katakana, fullwidth Katakana, Hiragana and Kanji.
Technical detail
CCSID 930 uses a stateful EBCDIC encoding scheme that uses 1 byte to encode halfwidth Katakana and Latin characters and 2 bytes to encode all other Japanese characters. The single-byte portion is CCSID 290, also known as EBCDIK (Extended Binary Coded Decimal Interchange Kana). The double-byte portion is CCSID 300, which is shared with CCSID 939.[1][2] If only halfwidth Katakana mixed with Latin characters is used, as was standard until the 1980s, CCSID 930 can be considered a pure 8-bit encoding. When other types of Japanese or fullwidth characters are used, it becomes a multibyte encoding in which the Shift-Out byte 0x0E marks the start of a double-byte sequence and the Shift-In byte 0x0F marks its end.
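To make the shift mechanism concrete, here is a minimal Python sketch that splits a mixed CCSID 930 byte string into single-byte and double-byte runs by tracking the Shift-Out (0x0E) and Shift-In (0x0F) bytes. It only separates the runs; turning the bytes into characters would additionally require the CCSID 290 and CCSID 300 mapping tables, which are not reproduced here. The function name `split_runs` is hypothetical, the sample bytes are arbitrary placeholders that are not meant to represent particular characters, and the sketch assumes that the data starts in single-byte mode.

```python
SO = 0x0E  # Shift-Out: following bytes are double-byte (CCSID 300) characters
SI = 0x0F  # Shift-In: return to single-byte (CCSID 290) characters


def split_runs(data: bytes) -> list[tuple[str, bytes]]:
    """Split mixed CCSID 930 bytes into ("SBCS", bytes) and ("DBCS", bytes) runs.

    The shift bytes themselves are dropped. Raises ValueError if a
    double-byte character is truncated or a double-byte run is never
    closed with SI.
    """
    runs: list[tuple[str, bytes]] = []
    mode = "SBCS"  # assumed initial state: single-byte mode
    start = i = 0
    while i < len(data):
        if mode == "SBCS":
            if data[i] == SO:
                if i > start:
                    runs.append(("SBCS", data[start:i]))
                mode, start = "DBCS", i + 1
            i += 1
        elif data[i] == SI:
            if i > start:
                runs.append(("DBCS", data[start:i]))
            mode, start = "SBCS", i + 1
            i += 1
        elif i + 1 < len(data):
            i += 2  # step over one two-byte character
        else:
            raise ValueError("truncated double-byte character")
    if mode == "DBCS":
        raise ValueError("double-byte run not closed with SI")
    if start < len(data):
        runs.append(("SBCS", data[start:]))
    return runs


sample = b"\xc1\xc2\x0e\x8a\x8b\x8c\x8d\x0f\xc3"  # placeholder bytes
print(split_runs(sample))
# [('SBCS', b'\xc1\xc2'), ('DBCS', b'\x8a\x8b\x8c\x8d'), ('SBCS', b'\xc3')]
```

Inside a double-byte run the loop advances two bytes at a time, so the state machine never looks at the second half of a character on its own.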
The most recent version of CCSID 930, CCSID 1390, supports JIS X 0213.
It was invented by Alan Lloyd Jones at IBM Hursley Laboratories, UK.
Practical considerations
CCSID 930 and its encoding scheme have a number of idiosyncrasies of practical relevance that make working with CCSID 930 difficult (see also EBCDIC for idiosyncrasies of the EBCDIC standard).
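- Because of the Shift-Out and Shift-In codes, parsing a byte sequence starting from the middle is hard. Interpreting a byte requires backing up until one of the shift bytes (or the start of the data) is encountered; inside a double-byte run, the distance from the Shift-Out byte also determines whether a byte is the first or the second half of a character.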
- Although CCSID 930 allows text that mixes halfwidth and fullwidth characters, many database schemas strictly distinguish between columns containing only single-byte halfwidth Katakana and columns containing only double-byte fullwidth characters. This convention makes it easier for software developers to predict how many characters fit in a column of a given size in bytes, and vice versa.
- The downside is that, for consistency, Latin text in such a fullwidth column has to be entered as, or converted into, fullwidth alphabetic characters so that it is encoded as double-byte characters. This matters for database searches, because a search term typed in ordinary halfwidth Latin letters does not match the stored fullwidth form unless one of them is converted first.
- When database columns are implicitly defined as pure fullwidth character text, the Shift-Out and Shift-In codes are often omitted, which strictly speaking results in an incorrect encoding. When the shift codes are missing, the data usually has to be treated as CCSID 290 (pure single-byte) or CCSID 300 (pure double-byte) to convert it correctly to another character set, such as the more portable Unicode.
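Three of the points above can be illustrated with a short Python sketch. `classify` backs up from an arbitrary offset to the nearest shift byte, as described in the first point; `add_shift_codes` re-attaches Shift-Out/Shift-In bytes to pure double-byte column data stored without them, so that it can be handled as mixed CCSID 930 data; and `to_fullwidth` shows the halfwidth-to-fullwidth Latin conversion at the Unicode level, with NFKC normalization folding it back for searching. All function names are hypothetical and the sample bytes are placeholders. The backward scan assumes that the bytes 0x0E and 0x0F never occur inside double-byte character data, and that data with no preceding shift byte is in single-byte mode.

```python
import unicodedata

SO = 0x0E  # Shift-Out: start of a double-byte run
SI = 0x0F  # Shift-In: end of a double-byte run


def classify(data: bytes, offset: int) -> str:
    """Classify data[offset] as "shift", "single", "lead" or "trail".

    Backs up to the nearest SO/SI byte; if none is found, the initial
    single-byte state is assumed. Assumes 0x0E/0x0F never occur inside
    double-byte character data.
    """
    if not 0 <= offset < len(data):
        raise IndexError("offset out of range")
    if data[offset] in (SO, SI):
        return "shift"
    for j in range(offset - 1, -1, -1):
        if data[j] == SI:
            return "single"
        if data[j] == SO:
            # Bytes after SO form pairs, so the distance from SO gives the alignment.
            return "lead" if (offset - j - 1) % 2 == 0 else "trail"
    return "single"


def add_shift_codes(dbcs: bytes) -> bytes:
    """Wrap pure double-byte column data (stored without shift codes) in SO/SI."""
    if len(dbcs) % 2:
        raise ValueError("double-byte data must have an even length")
    return (bytes([SO]) + dbcs + bytes([SI])) if dbcs else b""


def to_fullwidth(text: str) -> str:
    """Map printable ASCII to Unicode fullwidth forms (U+FF01..U+FF5E);
    the ASCII space becomes the ideographic space U+3000."""
    out = []
    for ch in text:
        if ch == " ":
            out.append("\u3000")
        elif "!" <= ch <= "~":
            out.append(chr(ord(ch) + 0xFEE0))
        else:
            out.append(ch)
    return "".join(out)


sample = b"\xc1\x0e\x8a\x8b\x8c\x8d\x0f\xc3"  # placeholder bytes, not real characters
print([classify(sample, i) for i in range(len(sample))])
# ['single', 'shift', 'lead', 'trail', 'lead', 'trail', 'shift', 'single']
print(add_shift_codes(b"\x8a\x8b"))  # b'\x0e\x8a\x8b\x0f'
stored = to_fullwidth("IBM z/OS")
print(unicodedata.normalize("NFKC", stored) == "IBM z/OS")  # True
```

NFKC also changes other characters (for example, it maps halfwidth Katakana to their fullwidth equivalents), so in this sketch it serves as a search-time folding step rather than a storage format.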
References
- Lunde, Ken. CJKV Information Processing. Sebastopol, Calif.: O'Reilly & Associates, 1998. ISBN 1-56592-224-7.
- [1] http://www.ibm.com/software/globalization/ccsid/ccsid930.jsp
- [2] http://www.ibm.com/software/globalization/ccsid/ccsid939.jsp
External links