Shebang (Unix)

  

From Wikipedia, the free encyclopedia

In computing, a shebang (also called a hashbang, hashpling, pound bang, or crunchbang) refers to the characters "#!" when they are the first two characters in an interpreter directive as the first line of a text file. In a Unix-like operating system, the program loader takes the presence of these two characters as an indication that the file is a script, and tries to execute that script using the interpreter specified by the rest of the first line in the file[1]. For instance, shell scripts for the Bourne shell start with the first line:

#!/bin/sh

More precisely, a shebang line consists of a number sign and an exclamation point ("#!"), optionally followed by any amount of whitespace, followed by the (absolute) path to the interpreter program that will run the script. The shebang is looked for and used when a script is invoked directly (as with a regular executable); its main purpose is to make scripts look and behave like regular executables, both to the operating system and to the user.
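The structure described above can be illustrated with a short Python sketch. It is only a model of the parsing rules in this article, not the code any kernel uses; the function name parse_shebang is hypothetical, and the sketch treats only spaces and tabs as whitespace.

```python
import re
from typing import Optional, Tuple


def parse_shebang(first_line: bytes) -> Optional[Tuple[str, str]]:
    """Return (interpreter_path, rest_of_line), or None if there is no shebang.

    Hypothetical helper for illustration only.
    """
    # The line must begin with the two bytes "#!".
    if not first_line.startswith(b"#!"):
        return None
    # Drop the marker, cut at the line feed, and trim surrounding blanks.
    body = first_line[2:].split(b"\n", 1)[0].strip(b" \t")
    if not body:
        return None
    # The interpreter path ends at the first run of blanks; everything
    # after it is kept as a single string.
    parts = re.split(rb"[ \t]+", body, maxsplit=1)
    interpreter = parts[0].decode("utf-8", errors="replace")
    rest = parts[1].decode("utf-8", errors="replace") if len(parts) > 1 else ""
    return interpreter, rest
```

Called with b"#! /usr/bin/perl -w\n", the sketch yields the interpreter path /usr/bin/perl and the remainder -w; called with a line that does not start with "#!", it returns None.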

Because the "#" character is often used as the comment marker in scripting languages, the contents of the shebang line will be automatically ignored by the interpreter itself; the shebang line only exists to specify to the operating system the correct interpreter to use.

Etymology

The name shebang comes from an inexact contraction of SHArp bang or haSH bang, referring to the two typical Unix names of the two characters. Unix jargon uses sharp or hash (and sometimes even mesh) to refer to the number sign character and bang to refer to the exclamation point, hence shebang. Another theory is that the sh in shebang comes from the default shell, sh, which is usually invoked by means of a shebang.[2]

History

The shebang was introduced by Dennis Ritchie between Edition 7 and Edition 8 at Bell Laboratories. It was then also added to the BSD releases from Berkeley's Computer Science Research (present in 4BSD and activated by default by 4.2BSD).[3] Because AT&T Bell Labs Edition 8 Unix and later editions were not released to the public, the first widely known appearance of this feature was on BSD.

Portability

Shebangs specify absolute paths to system executables; this can cause problems on systems which have non-standard file system layouts (such as GoboLinux or NixOS). Even when systems have fairly standard paths, it is quite possible for variants of the same operating system to have different locations for the desired interpreter.

In the absence of rigidly standardized locations for each interpreter, a shebang may on some systems name a path at which no interpreter exists, and the script then fails to run. Shebangs can therefore limit the portability of a file.

For this reason it is not uncommon to have to edit the shebang line after copying a script from one computer to another: the path coded into the script may not apply on the new machine, depending on how consistently the interpreter has been placed by past convention. For this and other reasons, POSIX does not standardize the feature.

Often, the program /usr/bin/env can be used to circumvent this limitation, because it looks the interpreter up on the user's search path. This approach may introduce vulnerabilities that expose information or allow unauthorized root access, and it does not guarantee full portability.[4] For example, OpenServer 5.0.6 and Unicos 9.0.2 have only /bin/env and no /usr/bin/env.[5]
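The search-path lookup that /usr/bin/env relies on can be imitated in Python with the standard shutil.which function. The following is a rough sketch, not a description of how env itself is implemented; the helper names and the stand-in interpreter name demo-interp are hypothetical.

```python
import os
import shutil
import tempfile


def find_on_path(name, search_path):
    """Return the first executable called `name` in `search_path`, or None."""
    return shutil.which(name, path=search_path)


def make_fake_interpreter(directory, name):
    """Create an empty executable file that merely stands in for an interpreter."""
    path = os.path.join(directory, name)
    with open(path, "w") as handle:
        handle.write("")
    os.chmod(path, 0o755)
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as first, tempfile.TemporaryDirectory() as second:
        # "demo-interp" is a made-up name; it exists only in the second directory.
        make_fake_interpreter(second, "demo-interp")
        search_path = os.pathsep.join([first, second])
        print(find_on_path("demo-interp", search_path))
```

Because the first match on the search path wins, whoever controls an earlier directory on that path also controls which program is run, which is one way to read the security concern mentioned above.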

Another portability problem is the interpretation of the command arguments. Some systems do not split up the arguments; for example, when running a script whose first line is

#!/usr/bin/env python -c

it will be invoked as,

/usr/bin/env "python -c"

That is, "python -c" will be passed as one argument to /usr/bin/env, rather than two arguments, "python" and "-c". On Linux, this will lead to the error message,

/usr/bin/env: python -c: No such file or directory

Cygwin also behaves this way. Some other systems handle the arguments differently.
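The difference between the two behaviours can be modelled with a small Python sketch that builds the argument vector for the interpreter. This is a simplified illustration under the assumptions stated in the comments; build_argv and its split_args parameter are hypothetical names, and real systems differ in further details.

```python
def build_argv(shebang_line, script_path, split_args=False):
    """Model the argument vector built for a script with the given shebang.

    split_args=False models systems that pass the remainder of the line as
    one argument; split_args=True models systems that split it on whitespace.
    """
    if not shebang_line.startswith("#!"):
        raise ValueError("not a shebang line")
    # Remove the marker and surrounding whitespace, then separate the
    # interpreter path from whatever follows it.
    fields = shebang_line[2:].strip().split(None, 1)
    if not fields:
        raise ValueError("shebang names no interpreter")
    interpreter = fields[0]
    rest = fields[1] if len(fields) > 1 else ""
    if not rest:
        extra = []
    elif split_args:
        extra = rest.split()
    else:
        extra = [rest]
    # The path of the script itself is appended as the last argument.
    return [interpreter, *extra, script_path]
```

With the shebang line from the example above and split_args left at False, the sketch produces /usr/bin/env followed by the single argument "python -c", which matches the invocation that fails on Linux.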

Another common problem is scripts containing a DOS newline immediately after the shebang, perhaps as a result of being edited on a system that uses DOS newlines, such as Microsoft Windows. Some systems then interpret the carriage return character as part of the interpreter command, resulting in error messages like these when the script is executed:

sh: ./test.sh: not found [No such file or directory]
-bash: ./test.sh: /bin/sh^M: bad interpreter: No such file or directory
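A script can be checked for this problem before it is run. The Python sketch below assumes the whole script is available as bytes; the function names are hypothetical, and the conversion simply rewrites every CR LF pair in the file.

```python
def has_dos_newline(script: bytes) -> bool:
    """Report whether the shebang line ends in a carriage return."""
    first_line = script.split(b"\n", 1)[0]
    return first_line.startswith(b"#!") and first_line.endswith(b"\r")


def to_unix_newlines(script: bytes) -> bytes:
    """Replace every CR LF pair with a single LF."""
    return script.replace(b"\r\n", b"\n")
```

For a file whose first line is "#!/bin/sh" followed by CR LF, has_dos_newline returns True, and to_unix_newlines returns a copy in which the interpreter path is again plain /bin/sh.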

Plan 9 from Bell Labs demands that a shebang be present when trying to execute a text file to specify the interpreter, unlike many Unix-like systems which will assume it is /bin/sh when no shebang is present. For example, a shell script should have the line

#!/bin/rc

at the top. If the shebang is omitted, the system reports "exec header invalid".

In many Linux systems and recent releases of Mac OS X, /bin/sh is a symbolic link to /bin/bash, the GNU Bash shell. In other Unix-like systems and some Linux distributions, /bin/sh is often the original Bourne shell or a different Bourne-compatible shell. Some Debian-based systems use Debian Almquist shell, for instance, for better performance.

The syntax of Bash is mostly a superset of that of the Bourne shell, allowing most Bourne shell scripts to run unmodified. However, developers on systems with Bash may write scripts taking advantage of its numerous extensions which are incompatible with the Bourne shell and POSIX.

When developers erroneously use #!/bin/sh in scripts that contain Bash-specific constructs or commands (bashisms), the scripts will fail on systems that use a different shell as /bin/sh. The problem can be avoided by ensuring that only sh-compatible syntax is used or by explicitly stating #!/bin/bash as the shebang.
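A very rough way to spot some bashisms in a script that declares #!/bin/sh is to search for a few constructs that are not part of POSIX sh. The Python sketch below is a heuristic illustration only: the pattern list is a small, hand-picked assumption, the comment stripping is crude, and it will miss many cases and can report false positives.

```python
import re

# A few constructs that are not POSIX sh. This list is an illustrative
# assumption and is far from complete.
BASHISM_PATTERNS = {
    "[[ ... ]] test": re.compile(r"\[\[\s"),
    "function keyword": re.compile(r"^\s*function\s+\w+"),
    "here-string": re.compile(r"<<<"),
    "array assignment": re.compile(r"^\s*\w+=\("),
}


def find_bashisms(script: str) -> list:
    """Return (line_number, label) pairs for suspicious lines.

    Only scripts whose first line names /bin/sh are examined.
    """
    lines = script.splitlines()
    if not lines or not re.match(r"#!\s*/bin/sh\b", lines[0]):
        return []
    hits = []
    for number, line in enumerate(lines[1:], start=2):
        # Crude comment stripping: everything after the first "#" is ignored,
        # which is wrong for "#" inside strings or in "$#".
        code = line.split("#", 1)[0]
        for label, pattern in BASHISM_PATTERNS.items():
            if pattern.search(code):
                hits.append((number, label))
    return hits
```

A script that starts with #!/bin/bash is not examined at all, since using Bash extensions there is not an error.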

As magic number

The shebang is actually a human-readable instance of a magic number in the executable file, the magic byte string being 0x23 0x21, the two-character encoding in ASCII. (Executable files that do not require an interpreter program start with other magic combinations. See File format for more details of magic numbers.)
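The idea of recognising a file by its leading bytes can be sketched in Python as follows. The sketch checks only two magic values, the shebang and the ELF signature (0x7F followed by "ELF"); the function name classify_by_magic is hypothetical, and real program loaders know more formats.

```python
SHEBANG_MAGIC = bytes([0x23, 0x21])  # the ASCII encoding of "#!"
ELF_MAGIC = b"\x7fELF"               # signature at the start of ELF binaries


def classify_by_magic(header: bytes) -> str:
    """Classify a file from its first few bytes."""
    if header.startswith(SHEBANG_MAGIC):
        return "script"
    if header.startswith(ELF_MAGIC):
        return "ELF binary"
    return "unknown"
```

Passing the first bytes of a shell script such as b"#!/bin/sh\n" gives "script", while a header that matches neither signature gives "unknown".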

Nonetheless, interpreted text files using the shebang are still text files, not binary files; a text editor might use a character encoding that means the file does not start with 0x23 0x21. Problems will then occur if the program loader specifically looks for 0x23 0x21. In particular, UTF-8—the standard character encoding for text files on many Unix-like systems—is ASCII-compatible, assigning all characters in the ASCII character set to the same one-byte codes; but UTF-8 files may optionally begin with a three-byte byte order mark (0xEF 0xBB 0xBF). A program loader that does not ignore the byte order mark will interpret the shebang line as invalid. For this reason, use of the byte order mark is not recommended on Unix-like systems.[6][7]
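Whether a script is affected can be checked by looking at its first three bytes. The Python sketch below uses the standard constant codecs.BOM_UTF8 (the bytes 0xEF 0xBB 0xBF); the function names are hypothetical.

```python
import codecs


def starts_with_bom(data: bytes) -> bool:
    """Report whether the data begins with the UTF-8 byte order mark."""
    return data.startswith(codecs.BOM_UTF8)


def strip_bom(data: bytes) -> bytes:
    """Return the data without a leading UTF-8 byte order mark, if any."""
    if starts_with_bom(data):
        return data[len(codecs.BOM_UTF8):]
    return data
```

After strip_bom, a file that began with the byte order mark followed by "#!" starts with 0x23 0x21 again, so a loader that looks for exactly those two bytes can recognise it.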

There have been rumours that some old versions of UNIX look for the normal shebang followed by a space and a slash ("#! /"), but this appears to be untrue.[8]

On Unix-like operating systems, new program images are started by the "exec" family of functions. This is where the operating system detects whether an image file is a script or an executable binary. If a shebang is present, the specified executable (usually a scripting-language interpreter) is run instead. This is described in the Solaris and Linux man pages for "execve".

Examples

Some typical interpreters for shebang lines:

  • #!/bin/sh — Execute using sh, the Bourne shell (or a compatible shell)
  • #!/bin/csh — Execute using csh, the C shell (or a compatible shell)
  • #!/bin/bash — Execute using bash, the Bourne-again shell
  • #!/usr/bin/perl — Execute using Perl
  • #!/usr/bin/python — Execute using Python
  • #!/usr/bin/emacs --script — Execute using Emacs

On many systems, /bin/sh is linked to bash and /bin/csh is linked to tcsh, so specifying these interpreters actually runs these compatible or improved versions.

Shebang lines can also include specific options that will be passed to the interpreter; see the examples below. However, implementations differ widely on how options are parsed (see discussion above).

This file is a shell script:

#!/bin/sh
echo "Hi there.";
# The rest of the shell script.
# ...

This file is a Perl script, to be run with warnings enabled (as specified by -w):

#!/usr/bin/perl -w
print "Hello world!\n";
# The rest of the Perl script.
# ...

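This file is a self-deleting script. When it is run, /bin/rm is invoked with the path of the script as its argument, so the file is removed and none of the lines after the shebang are executed: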

#!/bin/rm
echo "This will never be printed"
# This command will never be executed 
:() { :|: & }; :

References

  1. ^ Welsh, Matt; Kaufman, Lar (August 1996) [1995]. "Programming Languages". in Oram, Andy. Running Linux (2nd ed.). Sebastopol, California: O'Reilly & Associates. p. 386. ISBN 1-56592-151-8. "Line 1 (#!/usr/bin/perl) tells the loader that this script should be executed through Perl" 
  2. ^ Jargon File entry for shebang
  3. ^ extracts from 4.0BSD /usr/src/sys/newsys/sys1.c
  4. ^ [1]
  5. ^ Details about '#!'
  6. ^ "FAQ - UTF-8, UTF-16, UTF-32 & BOM: Can a UTF-8 data stream contain the BOM character (in UTF-8 form)? If yes, then can I still assume the remaining UTF-8 bytes are in big-endian order?". http://unicode.org/faq/utf_bom.html#bom5. Retrieved 2009-01-04. 
  7. ^ Markus Kuhn (2007). "UTF-8 and Unicode FAQ for Unix/Linux: What different encodings are there?". http://www.cl.cam.ac.uk/~mgk25/unicode.html#ucsutf. Retrieved 20 January 2009. "Adding a UTF-8 signature at the start of a file would interfere with many established conventions such as the kernel looking for “#!” at the beginning of a plaintext executable to locate the appropriate interpreter." 
  8. ^ 32 bit shebang myth
