Null character

  

From Wikipedia, the free encyclopedia

The null character (also null terminator) is a character with the value zero, present in the ASCII and Unicode character sets, and available in nearly all mainstream programming languages.[1]

The original meaning of this character was like NOP — when sent to a printer or a terminal, it does nothing (some terminals, however, incorrectly display it as space). On punched tape this character is represented with no holes at all, so a new unpunched tape is initially filled with null characters, and often text could be "inserted" at a reserved space of null characters by punching the new characters into the tape over the nulls.

Today the character is far more significant in C and its derivatives, and in many data formats, where it serves as a reserved character marking the end of a string[2], often called a null-terminated string[3]. This allows a string to be any length with only one byte of overhead. The alternative of storing a count either limits the string length to 255 (with a one-byte count) or costs more than one byte of overhead.
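As a minimal sketch of the convention described above, the following C functions find a string's length by scanning for the terminating null, much as the standard strlen does. The names terminated_length and terminated_storage are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: count the bytes before the first null
   character, the same scan the standard strlen performs. */
size_t terminated_length(const char *s)
{
    assert(s != NULL);
    size_t n = 0;
    while (s[n] != '\0') {
        n++;
    }
    return n;
}

/* Hypothetical helper: storage needed for a null-terminated copy,
   which is the payload plus exactly one byte for the terminator. */
size_t terminated_storage(const char *s)
{
    return terminated_length(s) + 1;
}
```

Because the scan stops at the first null, a null byte inside the data ends the string early: for the literal "ab\0cd" the reported length is 2.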

Contents

Representation

The null character is often represented as the escape sequence \0 in source code. Care must be taken when a digit follows this escape: in C up to two further octal digits are consumed, so "\012" is in fact a single byte with the octal value 12 (decimal 10, a newline), not a null followed by "12". This has often tripped up programmers, and is a common bug in programs designed to output source code. A safe sequence that always works in C is "\000", yet many other languages will treat that as a null byte followed by two ASCII '0' characters. A few (mostly obsolete) languages consume all following digits, which effectively requires every ASCII digit after the null to be escaped as well.
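The digit pitfall can be illustrated with a short C sketch. The array names and the check_escapes function are hypothetical; the behaviour relies only on C's rule that an octal escape takes at most three octal digits, and on adjacent string literals being concatenated after escapes are processed.

```c
#include <assert.h>

/* One byte: octal 12, which is decimal 10 (newline), plus the
   implicit terminator. NOT a null followed by "12". */
const char one_byte[] = "\012";

/* The escape stops after three octal digits, so this is a null
   byte followed by the characters '1' and '2'. */
const char padded[] = "\00012";

/* Splitting the literal has the same effect: escapes are resolved
   before the two pieces are joined. */
const char split[] = "\0" "12";

/* Hypothetical self-check of the layouts described above. */
void check_escapes(void)
{
    assert(sizeof one_byte == 2 && one_byte[0] == 10);
    assert(sizeof padded == 4 && padded[0] == 0);
    assert(padded[1] == '1' && padded[2] == '2');
    assert(sizeof split == 4 && split[0] == 0);
    assert(split[1] == '1' && split[2] == '2');
}
```

A generator that emits C source could use either the padded form or the split form whenever a digit follows a null.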

Other source code sequences that often work are \x00 or the Unicode representation \u0000.

In caret notation for control characters the null character is ^@. Since ^A is character 1, character 0 uses the ASCII character before 'A', which is '@'. In fact, on many keyboards a null character can be typed by holding down Ctrl and pressing @ (which usually also requires holding Shift and pressing another key such as 2 or P). It is also common to be able to type a null with Ctrl 2 or Ctrl space.
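The relationship between a caret-notation symbol and its control code can be sketched in C as follows. The function name caret_to_control is hypothetical, and the sketch assumes an ASCII execution character set, where the symbols '@' through '_' differ from control codes 0 through 31 only in bit 0x40.

```c
#include <assert.h>

/* Hypothetical helper: map the symbol used in caret notation
   ('@' in ^@, 'A' in ^A, ...) to the control code it denotes by
   flipping bit 0x40. Assumes ASCII; valid for '@' through '_'. */
unsigned char caret_to_control(char symbol)
{
    assert(symbol >= '@' && symbol <= '_');
    return (unsigned char)(symbol ^ 0x40);
}
```

Under that assumption, '@' maps to 0 (the null character) and 'A' maps to 1.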

In documentation it is often represented as a single-em-width symbol containing the letters "NUL". In Unicode, there is a character with a corresponding glyph for visual representation of the null character, "symbol for null", U+2400 (␀), not to be confused with the actual null character, U+0000.

Security exploit: Poison null byte

The term "poison null byte" was first used by Olaf Kirch in a Bugtraq post in October 1998. It was further explored in Phrack[4]. Actual exploits using the technique are much older.

The "poison null byte" exploit takes advantage of the fact that strings with a known length can contain null bytes, and of what happens when such a string is converted for use with an API that expects null-terminated strings. By carefully placing a null byte in the string, the attacker forces the string to end at that point, even after the application has appended more characters to it, for example a filename extension. Examples of poison null byte usage include:

  • Terminating a file name string, such as removing a mandatory file extension.
  • Terminating/commenting a SQL statement when executing code dynamically, such as Oracle EXECUTE IMMEDIATE.

Typically, the "poison null byte" is exploited along with another type of exploit such as directory traversal or SQL injection; poison null byte is often used to simplify or enhance other attacks.
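The file name case can be sketched in C under some stated assumptions: the application holds the name as a length-counted buffer, appends a mandatory ".txt" extension, and then passes the result to an API that expects a null-terminated string. The functions append_extension and has_embedded_null are hypothetical and illustrate the mechanism only; they are not taken from any real application.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy a length-counted name into out and
   append ".txt". Returns the counted length of the result. */
size_t append_extension(const char *name, size_t name_len,
                        char *out, size_t out_size)
{
    static const char ext[] = ".txt";
    size_t total = name_len + (sizeof ext - 1);
    assert(total + 1 <= out_size);
    memcpy(out, name, name_len);             /* embedded nulls are copied too */
    memcpy(out + name_len, ext, sizeof ext); /* extension plus its terminator */
    return total;
}

/* Hypothetical mitigation: detect a null byte inside the counted
   name before it reaches a null-terminated API. */
int has_embedded_null(const char *name, size_t name_len)
{
    return memchr(name, '\0', name_len) != NULL;
}
```

If the counted name is the seven bytes "passwd" followed by a null, the counted result is 11 bytes long and ends in ".txt", but any function that scans for the terminator sees only "passwd". Rejecting names for which has_embedded_null returns nonzero is one possible defence.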

References

  1. ^ "A byte with all bits set to 0, called the null character, shall exist in the basic execution character set; it is used to terminate a character string literal." — ANSI/ISO 9899:1990 (the ANSI C standard), section 5.2.1
  2. ^ "A string is a contiguous sequence of characters terminated by and including the first null character" — ANSI/ISO 9899:1990 (the ANSI C standard), section 7.1.1
  3. ^ "A null-terminated byte string, or NTBS, is a character sequence whose highest-addressed element with defined content has the value zero (the terminating null character)" — ISO/IEC 14882 (the ISO C++ standard), section 17.3.2.1.3.1
  4. ^ Issue 55, article 7

See also

You can see examples here: Nullbyte







