Space character: Wikis


From Wikipedia, the free encyclopedia

 

In writing, a space ( ) is a blank area devoid of content, serving to separate words, letters, numbers, and punctuation. Conventions for interword and intersentence spaces vary among languages, and in some cases the spacing rules are quite complex.

Latin was originally written scripta continua, without any word separators. There was a brief use of interpuncts (centred dots) to make reading Latin easier, but that practice was abandoned sometime around the year 200 CE. In around 600–800 CE, blank spaces started being inserted between words in Latin, and that practice carried over to all languages using the Latin alphabet (e.g. English). In typesetting, spaces have historically been of multiple lengths with particular space-lengths being used for specific typographic purposes, such as separating words or separating sentences or separating punctuation from words. Following the invention of the typewriter and the subsequent overlap of designer style-preferences and computer-technology limitations, much of this reader-centric variation has been lost in normal use.

In computer representation of text, spaces of various sizes, styles, or language characteristics (different space characters) are indicated with unique code points.

Use of the space in natural languages

Spaces between words

Modern English uses a space to separate words, but not all languages follow this practice. Spaces were not used to separate words in Latin until roughly AD 600–AD 800. Ancient Hebrew and Arabic did use spaces, partly to compensate in clarity for the lack of vowels. Traditionally, all CJK languages have no spaces: modern Chinese and Japanese (except when written with little or no kanji) still do not, but modern Korean uses spaces.

Spaces between sentences

There are three main conventions relating to the number of spaces used to separate sentences within the same paragraph:

  • one normal-width space (or French spacing). This is the current convention in countries that use the modern Latin alphabet.
  • one widened space, typically two to three times wider than an inter-word space (traditional typography)
  • two normal-width spaces (double spacing, English spacing or American typewriter spacing). This is a historical American typing convention that has been reversed in modern print media.

"Double spacing" can also refer to a style of line spacing: the insertion of a full additional empty line between lines of text. This is commonly used for text which may incorporate later markup or modifications, such as proof-readers' copies, legal documents, or academic assignments for correction.

Spaces and unit symbols

The Canadian Style: A Guide to Writing and Editing gives the following rules:

  • When symbols are used, the prefix symbol and unit symbols are run together:
5 cm
7 hL
4 dag
13 kPa
  • When a symbol consists entirely of letters, leave a full space between the quantity and the symbol:
45 kg not 45kg
  • When the symbol includes a non-letter character as well as letter, leave no space:
32°C not 32° C or 32 °C
However, the International System of Units, or SI, requires a space to be used to separate the unit symbol from the numerical value, and this also applies to the symbol for the degree Celsius, as 32 °C. The only exceptions to this rule in the SI are for the symbols for degree, minute and second for plane angle, as 30° 22′ 8″.[1] Wikipedia's style guide also follows the SI standard.
  • For the sake of clarity, a hyphen may be inserted between a numeral and a symbol used adjectivally:
35-mm film
60-W bulb
However, some other style guides, including Wikipedia's, deprecate hyphenation in these cases. The SI allows a hyphen between the numeral and the unit only when the name of the unit is spelled out, as 35-millimetre film.[1]
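The SI spacing rule described above can be sketched in a few lines of Python. This is only an illustration: the helper name `format_quantity` and the constant `ANGLE_SYMBOLS` are hypothetical, and the sketch uses an ordinary space rather than a non-breaking one.

```python
# Symbols for plane angle, which the SI writes without a preceding space.
ANGLE_SYMBOLS = {"°", "′", "″"}


def format_quantity(value, symbol):
    """Join a number and a unit symbol following the SI spacing rule."""
    separator = "" if symbol in ANGLE_SYMBOLS else " "
    return f"{value}{separator}{symbol}"


print(format_quantity(45, "kg"))  # 45 kg
print(format_quantity(32, "°C"))  # 32 °C
print(format_quantity(30, "°"))   # 30°
```

Note that "°C" is a unit symbol in its own right, so it takes the space, while the bare degree sign for plane angle does not.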

Space characters and digital typography

Variable-width general-purpose space

In computer character encodings, there is a normal general-purpose space (Unicode character U+0020; 32 decimal) whose width will vary according to the design of the typeface. Typical values range from 1/5-em to 1/3-em (in digital typography an em is equal to the nominal size of the font, so for a 10-point font the space will probably be between 2 and 3.3 points). Sophisticated fonts may have differently sized spaces for bold, italic, and small-caps faces, and often compositors will manually adjust the width of the space depending on the size and prominence of the text.
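The arithmetic behind the "between 2 and 3.3 points" figure can be checked with a short sketch. The function name `space_width_range` is an assumption made for illustration, and the 1/5-em and 1/3-em bounds are simply the typical values quoted above, not a property of any particular font.

```python
from fractions import Fraction


def space_width_range(font_size_pt):
    """Return the (narrowest, widest) typical space width in points."""
    size = Fraction(font_size_pt)
    # One em equals the nominal font size; a space is typically 1/5 to 1/3 em.
    return size * Fraction(1, 5), size * Fraction(1, 3)


low, high = space_width_range(10)
print(float(low), round(float(high), 1))  # 2.0 3.3
```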

In addition to this general-purpose space, it is possible to encode a space of a specific width. See the table below for a complete list.

In monospaced proofreading copy, only em- and en-spaces are represented using this character (which is called an em-quad or an en-quad), while other types of spaces are represented with a number sign.

Breaking and non-breaking spaces

By default, computer programs usually assume that, in flowing text, a line break may as necessary be inserted at the position of a space. The non-breaking space, U+00A0 (160 decimal), is intended to render the same as a normal space but prevents line-wrapping at that position.
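As a rough illustration of the difference, Python's standard `textwrap` module treats only ASCII whitespace as a place where a line may be broken, so a non-breaking space keeps its neighbours together. This is a sketch of one program's behaviour, not a full implementation of Unicode line breaking; the sample strings are made up for the example.

```python
import textwrap

NBSP = "\u00a0"  # U+00A0 NO-BREAK SPACE

breaking = "walk 10 km"
non_breaking = "walk 10" + NBSP + "km"

# With an ordinary space, the line may break between "10" and "km".
print(textwrap.wrap(breaking, width=7))      # ['walk 10', 'km']

# With a non-breaking space, "10 km" is kept together on one line.
print(textwrap.wrap(non_breaking, width=7))  # ['walk', '10\xa0km']
```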

However, some programs do not follow this intent exactly. For example, Mozilla Firefox 3.5.3, a widespread web browser released in 2009, correctly suppresses line-wrapping when rendering the non-breaking space but incorrectly ignores the value of the CSS word-spacing property at that position. Other programs may have the same flaw. The following simple HTML code demonstrates the flaw in affected browsers (copy it into a text editor, save it as test.htm, and open it in your browser):

<html><body style="word-spacing:1em">
This paragraph shows extremely wide<br>
spaces between words, because of the<br>
1em word-spacing CSS style property.
<br><br>
This&nbsp;paragraph&nbsp;contains&nbsp;non-breaking<br>
spaces&nbsp;and&nbsp;should&nbsp;show&nbsp;the&nbsp;same<br>
spaces&nbsp;as&nbsp;the&nbsp;first&nbsp;one.<br>
</body></html>

Note: As of version 3.6, Firefox does respect the word-spacing CSS style property value.

The generic Unicode space is often considered insignificant when it appears at the end of a line of text or as part of a sequence of whitespace characters, so it may be omitted or "collapsed" in such circumstances. The non-breaking space is expressly non-collapsible and may be used to indent text, though best practice on the World Wide Web is to use CSS for this purpose.

Hair spaces around dashes

In American typography, both en dashes and em dashes are set continuous with the text (as illustrated by use in the Chicago Manual of Style, 6.80, 6.83–86). However, an em dash can optionally be surrounded with a so-called hair space, U+200A (8202 decimal), or thin space, U+2009 (8201 decimal). The latter can be written in HTML by using the named entity &thinsp; and the former can be written using numeric character reference &#x200A; or &#8202;. This space should be much thinner than a normal space, and is seldom used on its own.
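The entity and character references mentioned above can be checked with Python's standard `html.unescape`. The helper `spaced_em_dash` is a hypothetical name used only to show how a dash might be padded with hair spaces.

```python
import html

THIN_SPACE = "\u2009"
HAIR_SPACE = "\u200a"
EM_DASH = "\u2014"

# The named entity and both numeric references decode to the expected characters.
assert html.unescape("&thinsp;") == THIN_SPACE
assert html.unescape("&#x200A;") == HAIR_SPACE
assert html.unescape("&#8202;") == HAIR_SPACE


def spaced_em_dash(left, right, space=HAIR_SPACE):
    """Join two strings with an em dash padded by the given space character."""
    return f"{left}{space}{EM_DASH}{space}{right}"


print(ascii(spaced_em_dash("left", "right")))  # 'left\u200a\u2014\u200aright'
```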

Normal space versus hair space
(as rendered by your browser)
Normal space left right
Normal space with em dash left — right
Thin space with em dash left — right
Hair space with em dash left — right
No space with em dash left—right

Table of spaces

Unicode defines[2] several space characters with specific semantics and rendering characteristics, as shown in the table below. Depending on the browser and fonts used to view this table, not all spaces may display properly:

Space characters defined in Unicode
Code | Dec | Break | HTML | Name | Block | Display | Description
U+0020 | 32 | Yes | | Space | Basic Latin | ] [ | Normal space, same as ASCII character 0x20
U+00A0 | 160 | No | &nbsp; | No-Break Space | Latin-1 Supplement | ] [ | Identical to U+0020, but not a point at which a line may be broken
U+1680 | 5760 | Yes | | Ogham Space Mark | Ogham | ] [ | Used for interword separation in Ogham text. Normally a vertical line in vertical text or a horizontal line in horizontal text, but may also be a blank space in "stemless" fonts. Requires an Ogham font.
U+180E | 6158 | Yes | | Mongolian Vowel Separator (MVS) | Mongolian | ]᠎[ | A narrow space character (not to be confused with "thin space", below) used in Mongolian to cause the final two characters of a word to take on different shapes.[3]
U+2002 | 8194 | Yes | &ensp; | En Space, Nut | General Punctuation | ] [ | Width of one en (half of one em). U+2000 En Quad is canonically equivalent to this character (En Space is preferred).
U+2003 | 8195 | Yes | &emsp; | Em Space, Mutton | General Punctuation | ] [ | Width of one em. U+2001 Em Quad is canonically equivalent to this character (Em Space is preferred).
U+2004 | 8196 | Yes | | Three-Per-Em Space, Thick Space | General Punctuation | ] [ | One third of an em wide
U+2005 | 8197 | Yes | | Four-Per-Em Space, Mid Space | General Punctuation | ] [ | One fourth of an em wide
U+2006 | 8198 | Yes | | Six-Per-Em Space | General Punctuation | ] [ | One sixth of an em wide. In computer typography sometimes equated to U+2009.
U+2007 | 8199 | No | | Figure Space | General Punctuation | ] [ | In fonts with monospaced digits, equal to the width of one digit
U+2008 | 8200 | Yes | | Punctuation Space | General Punctuation | ] [ | As wide as the narrow punctuation in a font, i.e. the advance width of the period or comma.[4]
U+2009 | 8201 | Yes | &thinsp; | Thin Space | General Punctuation | ] [ | One fifth (sometimes one sixth) of an em wide. Recommended for use as a thousands separator for measures made with SI units. Unlike U+2002 to U+2008, its width may get adjusted in typesetting.[5]
U+200A | 8202 | Yes | | Hair Space | General Punctuation | ] [ | Thinner than a thin space
U+200B | 8203 | Yes | | Zero Width Space (ZWSP) | General Punctuation | ]​[ | Used to indicate word boundaries to text processing systems when using scripts that do not use explicit spacing.
U+200C | 8204 | Yes | &zwnj; | Zero Width Non Joiner (ZWNJ) | General Punctuation | ]‌[ | When placed between two characters that would otherwise be connected, a ZWNJ causes them to be printed in their final and initial forms, respectively.
U+200D | 8205 | Yes | &zwj; | Zero Width Joiner (ZWJ) | General Punctuation | ]‍[ | When placed between two characters that would otherwise not be connected, a ZWJ causes them to be printed in their connected forms.
U+202F | 8239 | No | | Narrow No-Break Space | General Punctuation | ] [ | Similar in function to U+00A0 No-Break Space. Introduced in Unicode 3.0 for Mongolian,[6] to separate a suffix from the word stem without indicating a word boundary. When used with Mongolian, its width is usually one third of the normal space; in other contexts, its width resembles that of the Thin Space (U+2009), at least with some fonts.
U+205F | 8287 | Yes | | Medium Mathematical Space (MMSP) | General Punctuation | ] [ | Used in mathematical formulae. Four-eighteenths of an em.[7] In mathematical typography, the widths of spaces are usually given in integral multiples of an eighteenth of an em, and 4/18 em may be used in several situations, for example between the a and the + and between the + and the b in the expression a + b.[8]
U+2060 | 8288 | No | | Word Joiner | General Punctuation | ]⁠[ | Identical to U+200B, but not a point at which a line may be broken. Introduced in Unicode 3.2 to replace the deprecated "zero width no-break space" function of the U+FEFF character.
U+3000 | 12288 | Yes | | Ideographic Space | CJK Symbols and Punctuation | ] [ | As wide as a CJK character cell (fullwidth)
U+FEFF | 65279 | No | | Zero Width No-Break Space = Byte Order Mark (BOM) | Arabic Presentation Forms-B | ][ | Used primarily as a Byte Order Mark. Use as an indication of non-breaking is deprecated as of Unicode 3.2; see U+2060 instead.
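A few of the table's entries can be inspected with Python's standard `unicodedata` module. This is a small sketch: the selection of code points is arbitrary, and the general categories printed come from whatever Unicode version the running Python ships, which may classify some characters differently from the older version of the standard cited here.

```python
import unicodedata

# Print the code point, general category and official name of a few spaces.
for code in (0x0020, 0x00A0, 0x2009, 0x200B, 0x3000):
    ch = chr(code)
    print(f"U+{code:04X}", unicodedata.category(ch), unicodedata.name(ch))

# En Quad and Em Quad are canonically equivalent to En Space and Em Space,
# so canonical normalization replaces them.
assert unicodedata.normalize("NFC", "\u2000") == "\u2002"
assert unicodedata.normalize("NFC", "\u2001") == "\u2003"
```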

Unicode also provides some visible characters to stand in for space when necessary in the "Control Pictures" block: the Symbol For Space (U+2420), the Blank Symbol (U+2422), and the Open Box (U+2423). The interpunct · is also often used to represent a space in word processing programs such as Microsoft Word.
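A minimal sketch of how a program might make spaces visible using one of these stand-ins follows. The function name `show_spaces` is hypothetical, and choosing the Open Box rather than one of the other symbols is an arbitrary decision for the example.

```python
OPEN_BOX = "\u2423"  # U+2423 OPEN BOX from the Control Pictures block


def show_spaces(text):
    """Replace each ordinary space (U+0020) with a visible stand-in."""
    return text.replace(" ", OPEN_BOX)


print(show_spaces("a b  c"))  # a␣b␣␣c
```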

Use of the space in computing

In programming language syntax, spaces are frequently used to explicitly separate tokens. Aside from this use, spaces and other whitespace characters are usually ignored by modern programming languages. Exceptions are Haskell, occam, ABC, and Python, which use the amount of whitespace in indentation to indicate the bounds of a block, and a whimsical language called Whitespace, where whitespace is the only meaningful syntactical element.
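The significance of indentation in Python can be demonstrated by asking the interpreter to compile two snippets that differ only in leading whitespace. The helper name `compiles` and the sample sources are made up for this sketch.

```python
def compiles(source):
    """Return True if the source compiles, False on an indentation error."""
    try:
        compile(source, "<example>", "exec")
    except IndentationError:
        return False
    return True


indented = "if True:\n    x = 1\n"  # block body is indented
flat = "if True:\nx = 1\n"          # same tokens, no indentation

print(compiles(indented), compiles(flat))  # True False
```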

Text editors, word processors, and desktop publishing software differ in how they represent whitespace on the screen, and how they represent spaces at the ends of lines longer than the screen or column width. In some cases, spaces are shown simply as blank space; in other cases they may be represented by an interpunct or other symbols. Many different characters (described above) could be used to produce spaces, and non-character functions (such as margins and tab settings) can also affect whitespace.

Space characters in markup languages

Generalised markup languages, such as SGML, do not treat space characters differently from other characters.

However, special-purpose markup languages may. In particular, web markup languages such as XML and HTML treat whitespace characters, including space characters, specially for programmers' convenience. Conforming display-time processors of those markup languages collapse one or more space characters to zero or one space, depending on the semantic context. For example, two or more consecutive spaces within text are collapsed to a single space, and spaces on either side of the "=" that separates an attribute name from its value have no effect on the interpretation of the document. Element end tags can contain trailing spaces, and empty-element tags in XML can contain spaces before the "/>".

In XML attribute values, each whitespace character (tab, line feed, or carriage return) is normalized to a space when the document is read by a parser; for attributes declared with a type other than CDATA, runs of spaces are then collapsed to a single space and leading and trailing spaces are removed.[9] Whitespace in XML element content is not changed in this way by the parser, but an application receiving information from the parser may choose to apply similar rules to element content. An XML document author can use the xml:space="preserve" attribute on an element to signal to downstream applications that whitespace in that element's content should be left unaltered.
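The parser behaviour can be observed with Python's standard `xml.etree.ElementTree`. This is a sketch under stated assumptions: the element and attribute names are invented, and because the sample has no DTD the attribute is treated as CDATA, so each whitespace character becomes a space without further collapsing.

```python
import xml.etree.ElementTree as ET

# The attribute value contains a line feed and a tab; the element content
# contains leading, trailing and doubled spaces.
doc = '<note title="one\n\ttwo">  padded  text  </note>'
root = ET.fromstring(doc)

# Attribute value: each whitespace character is normalized to a space.
print(repr(root.get("title")))  # 'one  two'

# Element content: whitespace is passed through unchanged by the parser.
print(repr(root.text))          # '  padded  text  '
```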

In most HTML elements, a sequence of whitespace characters is treated as a single inter-word separator, which may manifest as a single space character when rendering text in a language that normally inserts such space between words.[10] Conforming HTML renderers are required to apply a more literal treatment of whitespace within a few prescribed elements, such as the pre tag and any element for which CSS has been used to apply pre-like whitespace processing. In such elements, space characters will not be "collapsed" into inter-word separators.

In both XML and HTML, the non-breaking space character, along with other non-"standard" spaces, is not treated as collapsible "whitespace", so it is not subject to the rules above.
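The HTML-style collapsing described above can be approximated with a regular expression. This is a simplified sketch, not a renderer: the function name `collapse_whitespace` is hypothetical, and the set of collapsible characters (space, tab, line feed, form feed, carriage return) is an assumption about what counts as whitespace. The non-breaking space is deliberately left out of that set.

```python
import re

# Space, tab, line feed, form feed and carriage return (assumed whitespace set).
_COLLAPSIBLE = re.compile(r"[ \t\n\f\r]+")


def collapse_whitespace(text):
    """Collapse each run of collapsible whitespace to a single space."""
    return _COLLAPSIBLE.sub(" ", text)


# Ordinary whitespace collapses; the two non-breaking spaces survive.
print(repr(collapse_whitespace("a  \n b\u00a0\u00a0c")))  # 'a b\xa0\xa0c'
```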

References

  1. ^ a b The International System of Units (SI) (8 ed.). International Bureau of Weights and Measures (BIPM). 2006. p. 133. http://www.bipm.org/utils/common/pdf/si_brochure_8_en.pdf.
  2. ^ The Unicode Standard ver. 5.2.0 – section 6.2 table 6-2, and section 16.2 Line and Word Breaking
  3. ^ Gillam, Richard (2002). Unicode Demystified: A Practical Programmer's Guide to the Encoding Standard. Addison-Wesley. ISBN 0-201-70052-2. 
  4. ^ "Character design standards - space characters". Character design standards. Microsoft. 1998–1999. http://www.microsoft.com/typography/developers/fdsspec/spaces.htm. Retrieved 2009-05-18. 
  5. ^ The Unicode Standard 5.0, printed edition, p.205
  6. ^ ISO/IEC 10646-1:1993/FDAM 29:1999(E)
  7. ^ "General Punctuation" (PDF). The Unicode Standard 5.1. Unicode Inc. 1991–2008. http://www.unicode.org/charts/PDF/U2000.pdf. Retrieved 2009-05-13. 
  8. ^ Sargent, Murray III (2006-08-29). "Unicode Nearly Plain Text Encoding of Mathematics (Version 2)". Unicode Technical Note #28. Unicode Inc. pp. 19–20. http://www.unicode.org/notes/tn28/tn28-2.html. Retrieved 2009-05-19. 
  9. ^ http://www.w3.org/TR/REC-xml/#AVNormalize
  10. ^ http://www.w3.org/TR/html4/struct/text.html#h-9.1
