The Full Wiki

Underscore: Wikis


Encyclopedia

From Wikipedia, the free encyclopedia


The underscore [ _ ] (also called understrike, low line, or low dash) is a character that originally appeared on the typewriter and was primarily used to underline words. To produce an underlined word, the word was typed, the typewriter carriage was moved back to the beginning of the word, and the word was overtyped with the underscore character.

This character is sometimes used to create visual spacing within a sequence of characters where a white space character is not permitted, e.g., in computer filenames, e-mail addresses, and World Wide Web URLs. Some computer applications automatically underline text surrounded by underscores: _underlined_ is rendered as underlined text. The underscore is often used this way in ASCII-only media (e-mail, IRC, instant messaging). When the underscore is used for emphasis in this fashion, it is usually interpreted as indicating that the enclosed text is underlined or italicised (as opposed to bold, which is indicated by *asterisks*).
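The rendering convention described above can be sketched in C as follows. This is a deliberately simplified illustration, not how any particular application does it: the function name render_underscores, the helper append, and the choice of <u> tags as output are all assumptions made for the example. It pairs each underscore with the next one and ignores word boundaries, so an identifier with two or more underscores would be mangled; a real renderer would need stricter rules.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: appends n bytes of s to dst, keeping dst terminated.
 * Returns 0 on success, -1 if the buffer of size cap would overflow. */
static int append(char *dst, size_t cap, size_t *len, const char *s, size_t n)
{
    if (*len + n + 1 > cap)
        return -1;
    memcpy(dst + *len, s, n);
    *len += n;
    dst[*len] = '\0';
    return 0;
}

/* Copies src into dst, rewriting each _span_ as <u>span</u>.
 * An underscore with no later partner is copied unchanged.
 * Returns 0 on success, -1 if dst (of size cap) is too small. */
int render_underscores(const char *src, char *dst, size_t cap)
{
    size_t len = 0;

    if (cap == 0)
        return -1;
    dst[0] = '\0';

    while (*src != '\0') {
        /* Look for a closing underscore only when we are on an opening one. */
        const char *close = (*src == '_') ? strchr(src + 1, '_') : NULL;

        if (close != NULL && close > src + 1) {
            size_t span = (size_t)(close - src - 1);
            if (append(dst, cap, &len, "<u>", 3) != 0 ||
                append(dst, cap, &len, src + 1, span) != 0 ||
                append(dst, cap, &len, "</u>", 4) != 0)
                return -1;
            src = close + 1;            /* continue after the closing underscore */
        } else {
            if (append(dst, cap, &len, src, 1) != 0)
                return -1;
            src++;
        }
    }
    return 0;
}
```

Calling render_underscores("this is _underlined_ text", buf, sizeof buf) with a sufficiently large buf leaves "this is <u>underlined</u> text" in buf.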

The underscore is not the same character as the dash character, although one convention for text news wires is to use an underscore when an em-dash or en-dash is desired, or when other non-standard characters such as bullets would be appropriate. A series of underscores (like _________) may be used to create a blank to be filled in on a form. It is also sometimes used to create a horizontal line, if no other method is available; hyphens and dashes are often used for a similar purpose.

The ASCII value of this character is 95. On the standard US or UK 101/102 computer keyboard it shares a key with the hyphen on the top row, to the right of the 0 key.
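As a quick check of the code point mentioned above, the following C sketch compares a character against decimal 95 (hexadecimal 0x5F). The function name is_underscore is a hypothetical helper, and the comparison assumes an ASCII-compatible execution character set.

```c
#include <stdbool.h>

/* Returns true when c is the ASCII underscore, decimal 95 (hex 0x5F).
 * Assumes an ASCII-compatible execution character set. */
bool is_underscore(char c)
{
    return c == 95;
}
```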

Underscores as diacritic

The underscore is used as a diacritic mark, "combining low line", in some African and Native American languages.

The combining low line (Unicode U+0332) should not be confused with the combining macron below (U+0331).

Usage in computing

Origins of underscores in identifiers

In programs of any significant size, there is a need for descriptive (hence multi-word) identifiers, like "previous balance" or "end of file". However, spaces are not typically permitted inside identifiers, as they are treated as delimiters between tokens. Writing the words together as in "endoffile" is not satisfactory because the names often become unreadable. Therefore, the programming language COBOL allowed a hyphen ("-") to be used between words of compound identifiers, as in "END-OF-FILE".

Most programming languages, however, interpret the hyphen as a subtraction operator and do not allow the character in identifier names. The common punched card character sets of the time had no lower-case letters and no special character that would be adequate as a word separator in identifiers. However, by the late 1960s the ASCII character set standard had been established, allowing the designers of the C language to adopt the underscore character "_" as a word joiner. Underscore-separated compounds like "end_of_file" are still prevalent in C programs and libraries.
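The difference between the two separators can be seen in a short C sketch. The names below are chosen for illustration only: end_of_file is a single identifier, whereas the hyphenated spelling is read by a C compiler as three identifiers joined by subtraction.

```c
/* Underscore-separated compound: one token naming one variable. */
int end_of_file = 0;

/* With hyphens the same words are three separate identifiers,
 * and the expression is parsed as (end - of) - file. */
int hyphen_reading(int end, int of, int file)
{
    return end-of-file;
}
```

For instance, hyphen_reading(10, 3, 2) evaluates (10 - 3) - 2.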

Programmers working in the tradition of linkage-oriented languages, especially the Unix C tradition (and later C++), had many concerns to address. Early Unix systems (and early personal computers in general) provided linkage models in which external identifiers were limited to a short length, often as few as the initial eight characters. Clashes were likely within the external linkage namespace, which could mingle code generated by various high-level compilers, the runtime libraries required by each of these compilers, compiler-generated helper functions, and program startup code, some fraction of which was inevitably written in system assembly language. Within this collision domain the underscore character quickly became entrenched as the primary mechanism for partitioning the external linkage namespace. It was common practice for C compilers to prepend a leading underscore to every program identifier with external scope, to avert clashes with contributions from runtime language support. Furthermore, when a C or C++ compiler needed to introduce names into external linkage as part of the translation process, these names were often distinguished by some combination of multiple leading or trailing underscores.

This practice was later codified in the C and C++ language standards, which reserve identifiers with leading underscores for the implementation.
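The reservation can be made concrete with a small checker. In standard C, an identifier that begins with two underscores, or with an underscore followed by an upper-case letter, is reserved for any use, while any other identifier beginning with an underscore is reserved at file scope. The sketch below encodes just those two rules; the function names are hypothetical, and the additional C++ rules are not covered.

```c
#include <ctype.h>
#include <stdbool.h>

/* True if name begins with "__" or with '_' followed by an upper-case
 * letter: such identifiers are reserved for any use in C. */
bool is_always_reserved(const char *name)
{
    if (name[0] != '_')
        return false;
    return name[1] == '_' || isupper((unsigned char)name[1]);
}

/* True if name begins with an underscore at all: such identifiers are
 * reserved at file scope in C. */
bool is_reserved_at_file_scope(const char *name)
{
    return name[0] == '_';
}
```

Under these rules a program may safely use end_of_file, but should leave names such as __helper or _Helper to the compiler and its libraries.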

A second, independent collision domain was the C preprocessor. The C language preprocessor is unusual in that it does not respect any language-defined scoping model or reserved namespace, not even C language keywords. This problem was generally addressed by writing macros in macro case which mostly mixes upper case letters with dividing underscores:

#define OPEN_FILE_LIMIT  (15)  

Once again the implementation must often supply hidden macros, and once again dressing up these "hidden behind the scenes" identifiers with multiple leading or trailing underscores became accepted practice. As this practice became pervasive on both levels, the underscore gained a cognitive association with system level programming, hidden technicalities, and the messy entrails of language support.
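A short sketch may help show why macro names need their own convention. Below, BUFFER_SLOTS is a hypothetical macro defined inside one function body yet still visible in another, because the preprocessor runs before C scoping rules apply. __LINE__ is one of the macros the C standard requires the implementation to supply, and it carries the double underscores described above. All function names are assumptions made for the example.

```c
#define OPEN_FILE_LIMIT (15)   /* macro case, as in the text */

void configure(void)
{
#define BUFFER_SLOTS 4         /* hypothetical macro, defined inside a function body */
}

int slots_elsewhere(void)
{
    return BUFFER_SLOTS;       /* still visible: the preprocessor ignores block scope */
}

int open_file_limit(void)
{
    return OPEN_FILE_LIMIT;    /* expands to (15) before compilation proper */
}

int line_of_this_return(void)
{
    return __LINE__;           /* standard macro supplied by the implementation */
}
```

Had BUFFER_SLOTS been written in lower case, it could silently rewrite an ordinary variable of the same name anywhere later in the file, which is the clash that upper-case macro names are meant to avoid.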

The C linkage model further complicated matters because it did not support strong module-level linkage. In C the concept of a module was initially rather loose. To simplify the implementation, there was by default no language distinction between function names intended for linkage to other compilation units and function names intended for use only within a single compilation unit. C provides the static keyword, which makes it possible to hide names from external linkage, but it was rarely employed because it also hid those names from most runtime debugging tools.
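The effect of static on linkage can be sketched as follows. Both function names are hypothetical: count_flags has internal linkage and cannot be referenced from another compilation unit, while flags_are_sparse keeps the default external linkage.

```c
/* Internal linkage: visible only inside this compilation unit. */
static int count_flags(unsigned flags)
{
    int n = 0;
    while (flags != 0) {
        n += (int)(flags & 1u);   /* add the lowest bit */
        flags >>= 1;
    }
    return n;
}

/* External linkage (the default): other compilation units may call this. */
int flags_are_sparse(unsigned flags)
{
    return count_flags(flags) <= 2;
}
```

Without static, count_flags would be exported alongside flags_are_sparse and could collide with a function of the same name defined in another file.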

A common early convention was to use names (often prosaic) consisting mostly of lower-case letters and underscores, such as a local function named count_obscure_piddly_flags, for names in external linkage not intended for use by other translation units, and to use camel case or some variant, such as EditSaveFile, for primary application calls.

Ruby and Perl use $_ as a special variable, described as the "default input and pattern matching space": many operations read from or act on that variable when no explicit argument is given, so it may be omitted.
