UTF-EBCDIC

From Wikipedia, the free encyclopedia

UTF-EBCDIC is a character encoding used to represent Unicode characters. It is meant to be EBCDIC-friendly, so that legacy EBCDIC applications on mainframes may process the characters without much difficulty. Its advantages for existing EBCDIC-based systems are similar to UTF-8's advantages for existing ASCII-based systems. Details on UTF-EBCDIC are defined in Unicode Technical Report #16.

To produce the UTF-EBCDIC encoded version of a series of Unicode code points, an encoding based on UTF-8 (known in the specification as UTF-8-Mod) is applied first. The main difference between this encoding and UTF-8 is that it allows Unicode code points U+0080 through U+009F (the C1 control codes) to be represented as a single byte, so that they can later be mapped to the corresponding EBCDIC control codes. To achieve this, the later bytes of a multi-byte sequence use the format 101XXXXX instead of 10XXXXXX, which leaves the byte values 0x80 through 0x9F (100XXXXX) free for the C1 controls. Because each such byte holds only 5 bits rather than 6, UTF-EBCDIC generally produces larger output than UTF-8 for the same input data.
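The first step can be sketched in Python as follows. This is a minimal illustration rather than the reference algorithm: the function name `utf8_mod_encode` is made up for this example, and the lead-byte patterns and range limits used here (110yyyyy up to U+03FF, 1110zzzz up to U+3FFF, 11110www up to U+3FFFF, 1111100w up to U+10FFFF) are assumptions based on the usual description of UTF-8-Mod that should be checked against Technical Report #16. The sketch does not reject surrogate code points.

```python
def utf8_mod_encode(code_point: int) -> bytes:
    """Encode one Unicode code point as a UTF-8-Mod byte sequence (sketch)."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point outside the Unicode range")

    # U+0000..U+009F (ASCII plus the C1 controls) stay a single byte.
    if code_point <= 0x9F:
        return bytes([code_point])

    # (largest code point, sequence length, lead-byte marker bits);
    # these limits are an assumption, see the note above.
    for limit, length, marker in (
        (0x3FF, 2, 0xC0),
        (0x3FFF, 3, 0xE0),
        (0x3FFFF, 4, 0xF0),
        (0x10FFFF, 5, 0xF8),
    ):
        if code_point <= limit:
            break

    # Each trail byte is 101XXXXX and carries 5 bits, lowest bits last.
    trail = []
    for _ in range(length - 1):
        trail.append(0xA0 | (code_point & 0x1F))
        code_point >>= 5

    # Whatever bits remain go into the lead byte.
    return bytes([marker | code_point] + trail[::-1])
```

Under these assumptions, U+0085 (NEL) encodes to the single byte 85, where UTF-8 needs two bytes, while U+4E2D encodes to the four bytes F0 B3 B1 AD, where UTF-8 needs only three.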

This transformation leaves the data in an ASCII-based format, so a reversible byte-to-byte transformation is then applied using a lookup table, making the result as close to normal EBCDIC code pages as feasible. Both steps can easily be reversed to recover the Unicode code points.
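As an illustration of that reversibility, the sketch below decodes UTF-8-Mod bytes back into code points, which is the second half of the reversal (it assumes the byte-to-byte lookup has already been undone). The function name `utf8_mod_decode` is hypothetical, the lead-byte ranges are an assumption to be checked against Technical Report #16, and the sketch does not reject overlong sequences.

```python
def utf8_mod_decode(data: bytes) -> list[int]:
    """Decode UTF-8-Mod bytes into a list of code points (sketch)."""
    code_points = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead <= 0x9F:                 # single byte: ASCII or C1 control
            length, value = 1, lead
        elif lead <= 0xBF:               # 101XXXXX can only be a trail byte
            raise ValueError(f"unexpected trail byte at offset {i}")
        elif lead <= 0xDF:               # 110yyyyy
            length, value = 2, lead & 0x1F
        elif lead <= 0xEF:               # 1110zzzz
            length, value = 3, lead & 0x0F
        elif lead <= 0xF7:               # 11110www
            length, value = 4, lead & 0x07
        elif lead <= 0xF9:               # 1111100w
            length, value = 5, lead & 0x01
        else:
            raise ValueError(f"unsupported lead byte at offset {i}")

        trail = data[i + 1:i + length]
        if len(trail) != length - 1 or any((b & 0xE0) != 0xA0 for b in trail):
            raise ValueError(f"truncated or malformed sequence at offset {i}")

        # Each trail byte contributes its low 5 bits.
        for b in trail:
            value = (value << 5) | (b & 0x1F)

        code_points.append(value)
        i += length
    return code_points
```

For example, the bytes F0 B3 B1 AD decode to the single code point U+4E2D.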

In practice, this encoding is rarely used, even on the EBCDIC-based mainframes for which it was designed. IBM's EBCDIC-based mainframe operating systems, such as z/OS, usually use UTF-16 for complete Unicode support. For example, DB2 UDB, COBOL, PL/I, Java and the IBM XML toolkit support UTF-16 on IBM mainframes.

Codepage layout

There are 160 characters with single-byte encodings in UTF-EBCDIC; these are shown in the following table. The remaining 96 byte values are used only as parts of multi-byte sequences. The single-byte portion resembles IBM-1047 rather than IBM-37, as the location of the square brackets shows: CCSID 37 has [ and ] at hex BA and BB, whereas UTF-EBCDIC has them at hex AD and BD respectively.

UTF-EBCDIC

Row label = high nibble of the byte, column label = low nibble. Each cell shows the character above its Unicode code point (hex). Cells marked -- are byte values used only in multi-byte sequences.

     x0   x1   x2   x3   x4   x5   x6   x7   x8   x9   xA   xB   xC   xD   xE   xF
0x   NUL  SOH  STX  ETX  ST   HT   SSA  DEL  EPA  RI   SS2  VT   FF   CR   SO   SI
     0000 0001 0002 0003 009C 0009 0086 007F 0097 008D 008E 000B 000C 000D 000E 000F
1x   DLE  DC1  DC2  DC3  OSC  LF   BS   ESA  CAN  EM   PU2  SS3  FS   GS   RS   US
     0010 0011 0012 0013 009D 000A 0008 0087 0018 0019 0092 008F 001C 001D 001E 001F
2x   PAD  HOP  BPH  NBH  IND  NEL  ETB  ESC  HTS  HTJ  VTS  PLD  PLU  ENQ  ACK  BEL
     0080 0081 0082 0083 0084 0085 0017 001B 0088 0089 008A 008B 008C 0005 0006 0007
3x   DCS  PU1  SYN  STS  CCH  MW   SPA  EOT  SOS  SGCI SCI  CSI  DC4  NAK  PM   SUB
     0090 0091 0016 0093 0094 0095 0096 0004 0098 0099 009A 009B 0014 0015 009E 001A
4x   SP   --   --   --   --   --   --   --   --   --   --   .    <    (    +    |
     0020 ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- 002E 003C 0028 002B 007C
5x   &    --   --   --   --   --   --   --   --   --   !    $    *    )    ;    ^
     0026 ---- ---- ---- ---- ---- ---- ---- ---- ---- 0021 0024 002A 0029 003B 005E
6x   -    /    --   --   --   --   --   --   --   --   --   ,    %    _    >    ?
     002D 002F ---- ---- ---- ---- ---- ---- ---- ---- ---- 002C 0025 005F 003E 003F
7x   --   --   --   --   --   --   --   --   --   `    :    #    @    '    =    "
     ---- ---- ---- ---- ---- ---- ---- ---- ---- 0060 003A 0023 0040 0027 003D 0022
8x   --   a    b    c    d    e    f    g    h    i    --   --   --   --   --   --
     ---- 0061 0062 0063 0064 0065 0066 0067 0068 0069 ---- ---- ---- ---- ---- ----
9x   --   j    k    l    m    n    o    p    q    r    --   --   --   --   --   --
     ---- 006A 006B 006C 006D 006E 006F 0070 0071 0072 ---- ---- ---- ---- ---- ----
Ax   --   ~    s    t    u    v    w    x    y    z    --   --   --   [    --   --
     ---- 007E 0073 0074 0075 0076 0077 0078 0079 007A ---- ---- ---- 005B ---- ----
Bx   --   --   --   --   --   --   --   --   --   --   --   --   --   ]    --   --
     ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- 005D ---- ----
Cx   {    A    B    C    D    E    F    G    H    I    --   --   --   --   --   --
     007B 0041 0042 0043 0044 0045 0046 0047 0048 0049 ---- ---- ---- ---- ---- ----
Dx   }    J    K    L    M    N    O    P    Q    R    --   --   --   --   --   --
     007D 004A 004B 004C 004D 004E 004F 0050 0051 0052 ---- ---- ---- ---- ---- ----
Ex   \    --   S    T    U    V    W    X    Y    Z    --   --   --   --   --   --
     005C ---- 0053 0054 0055 0056 0057 0058 0059 005A ---- ---- ---- ---- ---- ----
Fx   0    1    2    3    4    5    6    7    8    9    --   --   --   --   --   APC
     0030 0031 0032 0033 0034 0035 0036 0037 0038 0039 ---- ---- ---- ---- ---- 009F
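The table can also be read as data. The sketch below is only an illustration: the names `SINGLE_BYTE_ROWS`, `encode_single_byte` and `decode_single_byte` are invented for this example. It transcribes the code points from the table above, builds the lookup in both directions, and covers only the 160 single-byte characters (U+0000 through U+009F). The mapping of the remaining 96 byte values is not given in this article, so text outside that range is rejected.

```python
# Code points from the table above, one row per high nibble of the
# UTF-EBCDIC byte; "--" marks bytes used only in multi-byte sequences.
SINGLE_BYTE_ROWS = """
00 01 02 03 9C 09 86 7F 97 8D 8E 0B 0C 0D 0E 0F
10 11 12 13 9D 0A 08 87 18 19 92 8F 1C 1D 1E 1F
80 81 82 83 84 85 17 1B 88 89 8A 8B 8C 05 06 07
90 91 16 93 94 95 96 04 98 99 9A 9B 14 15 9E 1A
20 -- -- -- -- -- -- -- -- -- -- 2E 3C 28 2B 7C
26 -- -- -- -- -- -- -- -- -- 21 24 2A 29 3B 5E
2D 2F -- -- -- -- -- -- -- -- -- 2C 25 5F 3E 3F
-- -- -- -- -- -- -- -- -- 60 3A 23 40 27 3D 22
-- 61 62 63 64 65 66 67 68 69 -- -- -- -- -- --
-- 6A 6B 6C 6D 6E 6F 70 71 72 -- -- -- -- -- --
-- 7E 73 74 75 76 77 78 79 7A -- -- -- 5B -- --
-- -- -- -- -- -- -- -- -- -- -- -- -- 5D -- --
7B 41 42 43 44 45 46 47 48 49 -- -- -- -- -- --
7D 4A 4B 4C 4D 4E 4F 50 51 52 -- -- -- -- -- --
5C -- 53 54 55 56 57 58 59 5A -- -- -- -- -- --
30 31 32 33 34 35 36 37 38 39 -- -- -- -- -- 9F
"""

cells = SINGLE_BYTE_ROWS.split()
assert len(cells) == 256  # 16 rows of 16 cells

# UTF-EBCDIC byte -> code point, and the inverse lookup.
EBCDIC_TO_CODE_POINT = {
    byte: int(cell, 16) for byte, cell in enumerate(cells) if cell != "--"
}
CODE_POINT_TO_EBCDIC = {cp: byte for byte, cp in EBCDIC_TO_CODE_POINT.items()}


def encode_single_byte(text: str) -> bytes:
    """Encode text limited to U+0000..U+009F using the table."""
    try:
        return bytes([CODE_POINT_TO_EBCDIC[ord(ch)] for ch in text])
    except KeyError as exc:
        raise ValueError("only U+0000..U+009F are covered by this sketch") from exc


def decode_single_byte(data: bytes) -> str:
    """Decode bytes that are all single-byte UTF-EBCDIC characters."""
    try:
        return "".join(chr(EBCDIC_TO_CODE_POINT[b]) for b in data)
    except KeyError as exc:
        raise ValueError("byte belongs to a multi-byte sequence") from exc
```

With this lookup, `encode_single_byte("[]")` gives the bytes AD BD, whereas Python's built-in `cp037` codec places the same brackets at BA BB, which matches the difference from CCSID 37 described above.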
