ISO/IEC 8859-1

From Wikipedia, the free encyclopedia

ISO/IEC 8859-1:1998, Information technology — 8-bit single-byte coded graphic character sets — Part 1: Latin alphabet No. 1, is part of the ISO/IEC 8859 series of ASCII-based standard character encodings, first edition published in 1987. It is informally referred to as Latin-1. It is generally intended for “Western European” languages (see below for a list).

ISO-8859-1 is the IANA preferred charset name for this standard when supplemented with the control codes from ISO/IEC 6429 for the C0 (0x00-0x1F) and C1 (0x80-0x9F) parts. Escape sequences (from ISO/IEC 6429 or ISO/IEC 2022) are not to be interpreted.

The Windows-1252 codepage coincides with ISO-8859-1 in the code ranges 0x00 to 0x7F and 0xA0 to 0xFF, but not for the range 0x80 to 0x9F.
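The claim about which byte ranges agree can be checked with Python's built-in `latin-1` and `cp1252` codecs. This is a minimal sketch that assumes Python's codec tables faithfully reflect both code pages. Python's strict `cp1252` decoder rejects the few byte values that Windows-1252 leaves unassigned, so those are recorded as `None`.

```python
# Compare Windows-1252 and ISO-8859-1 decoding of every single-byte value,
# using Python's built-in "cp1252" and "latin-1" codecs.
differing = []
for b in range(256):
    raw = bytes([b])
    try:
        win = raw.decode("cp1252")
    except UnicodeDecodeError:
        win = None  # byte has no assignment in Python's cp1252 table
    if win != raw.decode("latin-1"):
        differing.append(b)

# Expected: 32 differing bytes: 0x80-0x9F
print(f"{len(differing)} differing bytes: 0x{differing[0]:02X}-0x{differing[-1]:02X}")
```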

Coverage

ISO 8859-1 encodes what it refers to as "Latin alphabet no. 1", consisting of 191 characters from the Latin script. This character-encoding scheme is used throughout the Americas, Western Europe, Oceania, and much of Africa. It is also commonly used in most standard romanizations of East Asian languages.

Each character is encoded as a single eight-bit code value. These code values can be used in almost any data interchange system to communicate in the following European languages (with a few exceptions due to missing characters, as noted):

Modern languages with complete coverage of their alphabet
Languages commonly supported with nearly complete coverage of their alphabet
  • Dutch (missing the single-character ligatures Ĳ and ĳ, but these should always be represented as the two-letter sequences IJ and ij in electronic form)
  • Estonian (missing Š, š, Ž, ž for loan words)
    • Note that Windows-1252 and ISO-8859-15 do contain these
  • Archaic English and modern French (missing Œ, œ and the very rare Ÿ; they are generally replaced by 'OE' and 'oe' without the normally required ligature, and 'Y' without the diaeresis)
    • Note that Windows-1252 and ISO-8859-15 do contain these
  • Finnish (missing Š, š, Ž, ž for loan words)
    • Note that Windows-1252 and ISO-8859-15 do contain these
  • Hungarian (missing Ő, ő, Ű, ű)
  • Welsh (missing Ŵ, ŵ, Ŷ, ŷ)
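Because each character is a single eight-bit code value, encoding a string as ISO 8859-1 yields exactly one byte per character, and a missing character simply cannot be encoded. The sketch below uses Python's built-in `latin-1` codec; the helper name `latin1_missing` is hypothetical, chosen only for illustration.

```python
from typing import List


def latin1_missing(text: str) -> List[str]:
    """Return the distinct characters in `text` that ISO 8859-1 cannot encode."""
    missing = []
    for ch in text:
        try:
            ch.encode("latin-1")
        except UnicodeEncodeError:
            if ch not in missing:
                missing.append(ch)
    return missing


word = "Café"
encoded = word.encode("latin-1")
print(len(word), len(encoded), encoded)  # 4 4 b'Caf\xe9': one byte per character

print(latin1_missing("cœur"))  # French: the ligature œ is not in the set
print(latin1_missing("Győr"))  # Hungarian: ő is not in the set
```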
Coverage of punctuation signs and apostrophes

For some of the languages listed above, the correct typographical quotation marks are missing: only the guillemets « » and the straight ASCII quotation marks " and ' are included.

This encoding also lacks the correct characters for the apostrophe and the oriented single high quotation marks (the 6-shaped and 9-shaped quotation marks). Some texts use the spacing grave accent (`) and spacing acute accent (´), both part of ISO 8859-1, in their place; this works reliably only with font styles that display all these characters as slanted wedge glyphs.

See also: Alphabets derived from the Latin

History

ISO 8859-1 was based on the Multinational Character Set used by Digital Equipment Corporation in the popular VT220 terminal. It was developed within ECMA, the European Computer Manufacturers Association, and published in March 1985 as ECMA-94, by which name it is still sometimes known. The second edition of ECMA-94 (June 1986) also included ISO 8859-2, ISO 8859-3, and ISO 8859-4 as part of the specification.

In 1985, Commodore officially adopted the ANSI/ISO 8859-1 layout for the code page and all internal operations of its new AmigaOS operating system, in order to follow internationally approved standards rather than proprietary ones, as was then common with MS-DOS and Mac OS. The standard was therefore also used for the keyboard layout of the Amiga 1000 computer, launched in July 1985. All versions of AmigaOS up to 3.1 used ISO 8859-1. After the demise of Commodore International in 1994, all further versions of AmigaOS (3.5, 3.9) kept the ISO 8859-1 code page, enhanced with the euro currency sign. However, without a leading firm able to impose official standards, neither the Amiga nor its clone variants (MorphOS, AROS) officially moved to ISO 8859-15, nor did they follow a common approach when introducing the euro sign in 2001. MorphOS 2.0 and later versions are Unicode UTF-8 compliant.

Relationship to ISO/IEC 8859-15

Although ISO/IEC 8859-1 has enough characters for most French text, it lacks a few less common French letters (Œ, œ and Ÿ). It is also missing a single-character representation of the Dutch letter IJ, two letters used in Finnish for transcribing some foreign names and in a few loanwords (Š and Ž), typographic quotation marks and dashes, and common symbols such as the euro sign (€) and the dagger (†).

In order to provide some of these characters, ISO/IEC 8859-15 was developed as an update of ISO/IEC 8859-1. This required, however, removing some infrequently used characters from ISO/IEC 8859-1, including fraction symbols and letter-free diacritics: ¤, ¦, ¨, ´, ¸, ¼, ½, and ¾.
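The exact positions affected by this change can be listed by decoding every byte value with both of Python's built-in codecs, `latin-1` and `iso8859-15`, and reporting where they disagree. This is a sketch that relies on Python's codec tables; it should print eight lines, one for each removed character and its replacement, using Unicode character names.

```python
import unicodedata

# Find every byte value whose meaning differs between ISO/IEC 8859-1 and
# ISO/IEC 8859-15, using Python's built-in "latin-1" and "iso8859-15" codecs.
diffs = []
for b in range(256):
    raw = bytes([b])
    old, new = raw.decode("latin-1"), raw.decode("iso8859-15")
    if old != new:
        diffs.append((b, old, new))

for b, old, new in diffs:
    print(f"0x{b:02X}: {unicodedata.name(old)} -> {unicodedata.name(new)}")
```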

Codepage layout

Since all 191 characters encoded by ISO/IEC 8859-1 are 'graphic' (ISO's term for characters that are not control codes) and are compatible with most web browsers, they can be shown as glyphs in the following table. Since the space, no-break space, and soft hyphen characters would not normally be visible, they are represented by abbreviations for their names (SP, NBSP, SHY). All other characters are represented literally. Row and column headings indicate the hexadecimal digit combinations to produce the eight-bit code value; e.g., the letter L is at code value 4C.

The code value of each cell is its row digit followed by its column digit in hexadecimal; in decimal it is 16 × row + column (for example, L is 4C hexadecimal, 76 decimal, 114 octal).

ISO/IEC 8859-1 (Latin-1)
     -0   -1   -2   -3   -4   -5   -6   -7   -8   -9   -A   -B   -C   -D   -E   -F
0-   (not assigned)
1-   (not assigned)
2-   SP   !    "    #    $    %    &    '    (    )    *    +    ,    -    .    /
3-   0    1    2    3    4    5    6    7    8    9    :    ;    <    =    >    ?
4-   @    A    B    C    D    E    F    G    H    I    J    K    L    M    N    O
5-   P    Q    R    S    T    U    V    W    X    Y    Z    [    \    ]    ^    _
6-   `    a    b    c    d    e    f    g    h    i    j    k    l    m    n    o
7-   p    q    r    s    t    u    v    w    x    y    z    {    |    }    ~
8-   (not assigned)
9-   (not assigned)
A-   NBSP ¡    ¢    £    ¤    ¥    ¦    §    ¨    ©    ª    «    ¬    SHY  ®    ¯
B-   °    ±    ²    ³    ´    µ    ¶    ·    ¸    ¹    º    »    ¼    ½    ¾    ¿
C-   À    Á    Â    Ã    Ä    Å    Æ    Ç    È    É    Ê    Ë    Ì    Í    Î    Ï
D-   Ð    Ñ    Ò    Ó    Ô    Õ    Ö    ×    Ø    Ù    Ú    Û    Ü    Ý    Þ    ß
E-   à    á    â    ã    ä    å    æ    ç    è    é    ê    ë    ì    í    î    ï
F-   ð    ñ    ò    ó    ô    õ    ö    ÷    ø    ù    ú    û    ü    ý    þ    ÿ

Code values 00–1F, 7F–9F are not assigned to characters by ISO/IEC 8859-1.

The lower range 20 to 7E (the G0 subset) maps exactly to the same coded G0 subset of the ISO 646 US variant (commonly known as ASCII), whose ISO 2022 standard switch sequence is "ESC ( B". The higher range A0 to FF (the G1 subset) maps exactly to the same subset initiated by the ISO 2022 standard switch sequence "ESC . A".

Related character maps

The ISO/IEC 8859-1 standard has long been the basis of a number of character maps, also known as character sets, charsets, or code pages, the most popular being ISO-8859-1 (note the extra hyphen) and Windows-1252. Both of these maps are supersets of ISO/IEC 8859-1; they supplement the standard's 191 character assignments by mapping additional characters to at least some portion of the code value ranges 00–1F, 7F, and 80–9F.

ISO-8859-1

In 1992, the IANA registered the character map ISO_8859-1:1987, more commonly known by its preferred MIME name of ISO-8859-1 (note the extra hyphen compared with ISO 8859-1), a superset of ISO 8859-1, for use on the Internet. This map assigns the C0 and C1 control characters to the code values 00–1F, 7F, and 80–9F. It thus assigns a character to every one of the 256 possible 8-bit values.
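Python's built-in `latin-1` codec behaves like this IANA map: every one of the 256 byte values decodes without error, control codes included, and each byte becomes the Unicode code point with the same number. A short check, assuming only the standard codec:

```python
# Decode all 256 possible byte values with Python's "latin-1" codec,
# which assigns a character (C0/C1 controls included) to every byte.
every_byte = bytes(range(256))
text = every_byte.decode("latin-1")  # never raises: every byte value is assigned

print(len(text))                                             # 256
print(all(ord(ch) == b for ch, b in zip(text, every_byte)))  # True
print(text.encode("latin-1") == every_byte)                  # True: lossless round trip
```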

ISO-8859-1 is (according to the standards at least) the default encoding of documents delivered via HTTP with a MIME type beginning with "text/". It is the default encoding of the values of certain descriptive HTTP headers, and is the standard encoding used by the X Window System on most Unix machines in locales which use that character set. It was also the basis of the repertoire of characters allowed in HTML 3.2 documents (HTML 4.0, however, is based on Unicode).
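The HTTP default can be sketched as a small header-parsing function. It uses the standard library's `email.message.Message`, since HTTP headers share MIME header syntax (Python's own `http.client.HTTPMessage` subclasses it). The helper `http_body_charset` is hypothetical, and it encodes the original HTTP/1.1 rule described here; RFC 7231 later removed this default, so treat it as illustrative rather than current practice.

```python
from email.message import Message
from typing import Optional


def http_body_charset(content_type: str) -> Optional[str]:
    """Charset of an HTTP body, applying the old ISO-8859-1 default for text/*."""
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset()  # lower-cased charset parameter, or None
    if charset is None and msg.get_content_maintype() == "text":
        return "iso-8859-1"  # default for "text/" types with no charset given
    return charset


print(http_body_charset("text/html"))                  # iso-8859-1
print(http_body_charset("text/plain; charset=UTF-8"))  # utf-8
print(http_body_charset("application/octet-stream"))   # None
```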

Escape sequences (from ISO/IEC 6429 or ISO/IEC 2022) are not to be interpreted in documents labeled as ISO-8859-1 encoded. As well as the canonical name and preferred MIME name mentioned above, the following other aliases are registered for ISO-8859-1: ISO_8859-1, ISO-8859-1, iso-ir-100, csISOLatin1, latin1, l1, IBM819, CP819. ISO-8859-1 was also incorporated as the first 256 code points of ISO/IEC 10646 and Unicode.
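Python's codec registry happens to accept each of the registered names listed above. The sketch below assumes the standard `codecs.lookup` behavior, which normalizes case and punctuation before matching aliases, and checks that all of them resolve to one codec.

```python
import codecs

# Canonical name, preferred MIME name and registered aliases, as listed above.
ALIASES = ["ISO_8859-1:1987", "ISO_8859-1", "ISO-8859-1", "iso-ir-100",
           "csISOLatin1", "latin1", "l1", "IBM819", "CP819"]

names = {alias: codecs.lookup(alias).name for alias in ALIASES}
for alias, name in names.items():
    print(f"{alias!r:20} -> {name}")

# Every alias should resolve to the same codec.
print(len(set(names.values())) == 1)
```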

Code point Control character Abbreviation
00 Null NUL
01 Start Of Heading SOH
02 Start of Text STX
03 End of Text ETX
04 End Of Transmission EOT
05 Enquiry ENQ
06 Acknowledge ACK
07 Bell BEL
08 Backspace BS
09 Horizontal Tab HT
0A Line Feed LF
0B Vertical Tab VT
0C Form Feed FF
0D Carriage Return CR
0E Shift Out SO
0F Shift In SI
10 Data Link Escape DLE
11 Device Control 1 DC1
12 Device Control 2 DC2
13 Device Control 3 DC3
14 Device Control 4 DC4
15 Negative Acknowledge NAK
16 Synchronous idle SYN
17 End of Transmission Block ETB
18 Cancel CAN
19 End of Medium EM
1A Substitute (character) SUB
1B Escape character ESC
1C File separator FS
1D Group separator GS
1E Record separator RS
1F Unit separator US
7F Delete DEL
 
Code point Control character Abbreviation
80 Padding Character PAD
81 High Octet Preset HOP
82 Break Permitted Here BPH
83 No Break Here NBH
84 Index IND
85 Next Line NEL
86 Start of Selected Area SSA
87 End of Selected Area ESA
88 Character Tabulation Set HTS
89 Character Tabulation with Justification HTJ
8A Line Tabulation Set VTS
8B Partial Line Forward PLD
8C Partial Line Backward PLU
8D Reverse Line Feed RI
8E Single Shift 2 SS2
8F Single Shift 3 SS3
90 Device Control String DCS
91 Private Use 1 PU1
92 Private Use 2 PU2
93 Set Transmit State STS
94 Cancel Character CCH
95 Message Waiting MW
96 Start of Guarded Area SPA
97 End of Guarded Area EPA
98 Start of String SOS
99 Single Graphic Character Introducer SGCI
9A Single Character Introducer SCI
9B Control Sequence Introducer CSI
9C String Terminator ST
9D Operating System Command OSC
9E Privacy Message PM
9F Application Program Command APC

ISO-8859-1
     -0   -1   -2   -3   -4   -5   -6   -7   -8   -9   -A   -B   -C   -D   -E   -F
0-   NUL  SOH  STX  ETX  EOT  ENQ  ACK  BEL  BS   HT   LF   VT   FF   CR   SO   SI
1-   DLE  DC1  DC2  DC3  DC4  NAK  SYN  ETB  CAN  EM   SUB  ESC  FS   GS   RS   US
2-   SP   !    "    #    $    %    &    '    (    )    *    +    ,    -    .    /
3-   0    1    2    3    4    5    6    7    8    9    :    ;    <    =    >    ?
4-   @    A    B    C    D    E    F    G    H    I    J    K    L    M    N    O
5-   P    Q    R    S    T    U    V    W    X    Y    Z    [    \    ]    ^    _
6-   `    a    b    c    d    e    f    g    h    i    j    k    l    m    n    o
7-   p    q    r    s    t    u    v    w    x    y    z    {    |    }    ~    DEL
8-   PAD  HOP  BPH  NBH  IND  NEL  SSA  ESA  HTS  HTJ  VTS  PLD  PLU  RI   SS2  SS3
9-   DCS  PU1  PU2  STS  CCH  MW   SPA  EPA  SOS  SGCI SCI  CSI  ST   OSC  PM   APC
A-   NBSP ¡    ¢    £    ¤    ¥    ¦    §    ¨    ©    ª    «    ¬    SHY  ®    ¯
B-   °    ±    ²    ³    ´    µ    ¶    ·    ¸    ¹    º    »    ¼    ½    ¾    ¿
C-   À    Á    Â    Ã    Ä    Å    Æ    Ç    È    É    Ê    Ë    Ì    Í    Î    Ï
D-   Ð    Ñ    Ò    Ó    Ô    Õ    Ö    ×    Ø    Ù    Ú    Û    Ü    Ý    Þ    ß
E-   à    á    â    ã    ä    å    æ    ç    è    é    ê    ë    ì    í    î    ï
F-   ð    ñ    ò    ó    ô    õ    ö    ÷    ø    ù    ú    û    ü    ý    þ    ÿ

Most of these control characters are not intended for use in portable ISO-8859-1 encoded plain-text documents, but only within specific protocols or devices. The exceptions are a few controls whose behavior is standardized: HT (09, the tab), LF (0A), CR (0D) and NEL (85). All but the first are used to mark the end of a line or to separate paragraphs, and the tab is often treated as equivalent to whitespace. In addition, FF (0C) is commonly accepted by some applications that interpret plain-text documents as an extra ignorable whitespace at the beginning of a line, marking the position of an explicit page break when printing.
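Whether NEL (85) and FF (0C) are treated as line boundaries depends on the software. As one concrete case, Python's `bytes.splitlines()` only recognizes the ASCII boundaries LF, CR and CR LF, while `str.splitlines()` on the text decoded as ISO-8859-1 also breaks at NEL and FF. A small sketch with made-up sample data:

```python
data = b"one\r\ntwo\x85three\x0cfour"

# bytes.splitlines() only knows the ASCII line boundaries LF, CR and CR LF.
print(data.splitlines())                     # [b'one', b'two\x85three\x0cfour']

# After decoding as ISO-8859-1, str.splitlines() also treats NEL (U+0085)
# and FF (U+000C) as line boundaries.
print(data.decode("latin-1").splitlines())   # ['one', 'two', 'three', 'four']
```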

Some encodings also allow BS (08) to be used to create additional characters, emulating the superposition of multiple characters on printing devices.
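An overstruck letter can be turned back into a single character when converting such text to Unicode. The sketch below assumes a hypothetical convention of "letter, BS, spacing accent" and a hypothetical mapping from spacing accents to Unicode combining marks; real devices and conventions varied. Unicode NFC normalization then composes each pair into a precomposed character where one exists, including characters that ISO-8859-1 itself lacks.

```python
import unicodedata

# Hypothetical mapping from spacing accent characters (all present in
# ISO-8859-1) to the corresponding Unicode combining marks.
SPACING_TO_COMBINING = {
    "`": "\u0300",       # grave accent
    "\u00b4": "\u0301",  # acute accent (´)
    "^": "\u0302",       # circumflex
    "~": "\u0303",       # tilde
    "\u00a8": "\u0308",  # diaeresis (¨)
}


def resolve_overstrikes(text: str) -> str:
    """Replace 'letter, BS, spacing accent' triples with one composed character."""
    out = []
    i = 0
    while i < len(text):
        if (i + 2 < len(text) and text[i + 1] == "\x08"
                and text[i + 2] in SPACING_TO_COMBINING):
            out.append(text[i] + SPACING_TO_COMBINING[text[i + 2]])
            i += 3
        else:
            out.append(text[i])  # anything else is copied unchanged
            i += 1
    # NFC composes letter + combining mark into a precomposed character if one exists.
    return unicodedata.normalize("NFC", "".join(out))


print(resolve_overstrikes("a\x08`"))       # à (also in ISO-8859-1)
print(resolve_overstrikes("s\x08\u00b4"))  # ś (not in ISO-8859-1)
```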

Some ISO standards, such as ISO 2022, assign specific functions to certain controls: SO (0E), SI (0F), DLE (10), ESC (1B) and SS2 (8E) are used to control the encoding of the characters that follow them or to switch between multiple encodings.
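Python's `iso2022_jp` codec is one convenient example of an ISO 2022-based encoding that uses ESC sequences to switch character sets; it is used here purely for illustration. Decoding the same bytes as ISO-8859-1 does not interpret the escape sequences at all: ESC simply becomes the control character U+001B.

```python
# Encode two kanji with an ISO 2022-based codec: the output switches to a
# two-byte set with ESC $ B and returns to ASCII with ESC ( B at the end.
data = "日本".encode("iso2022_jp")
print(data)

# Decoding as ISO-8859-1 leaves every byte, including ESC, as a plain character.
as_latin1 = data.decode("latin-1")
print(repr(as_latin1))
```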

The NUL character (00) is commonly used as a string terminator in some programming languages, or as filler in database records, where it is ignored and is not part of the encoded text. STX (02) and ETX (03) are commonly used to delimit frames in some transmission protocols. SUB (1A) is also commonly used as a replacement character to mark errors detected in input transmission streams, and it may be rendered graphically. DC1 (11) and DC3 (13) are commonly used in the XON/XOFF flow-control protocol to resume and pause transmission. Finally, EM (19) or EOT (04) may be used as an end-of-file marker in some text file formats.
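The NUL-as-filler convention can be handled by cutting a field at its first NUL byte before decoding. This sketch assumes a hypothetical fixed-width 8-byte record field; the record layout is invented for illustration.

```python
# A hypothetical fixed-width 8-byte name field, padded with NUL (00) bytes.
record = b"Caf\xe9\x00\x00\x00\x00"

# Everything from the first NUL onward is filler, not part of the text.
field = record.split(b"\x00", 1)[0]
print(field.decode("latin-1"))  # Café
```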

ISO-8859-1 and Windows-1252 confusion

It is very common to mislabel text data with the charset label ISO-8859-1 even though the data is really Windows-1252 encoded. In Windows-1252, codes between 0x80 and 0x9F are used for letters and punctuation, whereas they are control codes in ISO-8859-1. Many web browsers and e-mail clients will interpret ISO-8859-1 control codes as Windows-1252 characters in order to accommodate such mislabeling, but this is not standard behaviour, and care should be taken to avoid generating these characters in ISO-8859-1 labeled content. However, the draft HTML 5 specification requires that documents advertised as ISO-8859-1 actually be parsed with the Windows-1252 encoding.[1]
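The browser behaviour can be sketched as a decoder that reads ISO-8859-1 labeled bytes as Windows-1252. The helper `decode_mislabeled_latin1` is hypothetical. Python's strict `cp1252` codec rejects the five byte values Windows-1252 leaves unassigned; here they fall back to their ISO-8859-1 (C1 control) meaning, which is intended to mirror the WHATWG Encoding Standard's windows-1252 decoder.

```python
def decode_mislabeled_latin1(data: bytes) -> str:
    """Decode bytes labeled ISO-8859-1 the way browsers do: as Windows-1252."""
    chars = []
    for b in data:
        try:
            chars.append(bytes([b]).decode("cp1252"))
        except UnicodeDecodeError:
            chars.append(chr(b))  # unassigned in cp1252: keep the Latin-1 value
    return "".join(chars)


sample = b"\x93quoted\x94 \x80"
print(repr(sample.decode("latin-1")))    # strict ISO-8859-1: C1 control characters
print(decode_mislabeled_latin1(sample))  # “quoted” €
```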

Similar character sets

The Apple Macintosh computer introduced a character encoding called Mac Roman, or Mac-Roman, in 1984. It was meant to be suitable for Western European desktop publishing. Like ISO-8859-1, it is a superset of ASCII, and it has most of the characters that are in ISO-8859-1, but in a totally different arrangement. A later version, registered with IANA as "Macintosh", replaced the generic currency sign ¤ with the euro sign €. The few printable characters that are in ISO 8859-1 but not in this set are often a source of trouble when editing text on websites using older Macintosh browsers (including the last version of Internet Explorer for Mac). However, most of the extra characters that Windows-1252 places in the C1 code point range are supported in Mac Roman (the exceptions include Š, š, Ž and ž). Apart from these and the few missing ISO-8859-1 characters, a Macintosh can send and receive files (and email) encoded and labeled as ISO-8859-1 (with the C1 control characters) or as Windows-1252 by remapping each glyph's code point.
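Which Windows-1252 extras lack a Mac Roman equivalent can be checked with Python's built-in `cp1252` and `mac_roman` codecs. This sketch assumes Python's `mac_roman` table matches the Mac Roman version in question; it collects the 27 characters Windows-1252 assigns to 0x80 to 0x9F and reports those that `mac_roman` cannot encode.

```python
# The characters Windows-1252 assigns to 0x80-0x9F (skipping its undefined bytes).
cp1252_extras = []
for b in range(0x80, 0xA0):
    try:
        cp1252_extras.append(bytes([b]).decode("cp1252"))
    except UnicodeDecodeError:
        pass  # byte is undefined in Python's cp1252 table

# Which of them have no encoding in Python's "mac_roman" codec?
not_in_mac_roman = []
for ch in cp1252_extras:
    try:
        ch.encode("mac_roman")
    except UnicodeEncodeError:
        not_in_mac_roman.append(ch)

print(len(cp1252_extras), not_in_mac_roman)
```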

DOS had code page 850, which had all printable characters that ISO-8859-1 had (albeit in a totally different arrangement) plus the most widely used graphic characters from code page 437.
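The coverage claim for code page 850 can be tested with Python's built-in `cp850` codec, assuming its table matches the DOS code page. The sketch builds the list of printable ISO-8859-1 characters (0x20 to 0x7E and 0xA0 to 0xFF, which comes to the 191 characters mentioned earlier) and tries to encode each one.

```python
# Printable ISO-8859-1 characters: 0x20-0x7E and 0xA0-0xFF.
printable = [chr(b) for b in list(range(0x20, 0x7F)) + list(range(0xA0, 0x100))]

missing_from_cp850 = []
for ch in printable:
    try:
        ch.encode("cp850")
    except UnicodeEncodeError:
        missing_from_cp850.append(ch)

print(len(printable), missing_from_cp850)  # count, then any characters cp850 lacks
```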

Notes
