The Full Wiki

File extension: Wikis


Encyclopedia


From Wikipedia, the free encyclopedia

A filename extension is a suffix to the name of a computer file applied to indicate the encoding convention (file format) of its contents.

In some operating systems (for example Unix) an extension is optional, while in others (such as DOS) certain kinds of file, such as executable programs, require one. Some operating systems limit the length of the extension (DOS and OS/2, for example, to three characters) while others (such as Unix) do not. Some operating systems (for example RISC OS) do not use filename extensions at all. Unix accepts the dot as a legal filename character but gives it no special meaning at the OS level.


Usage

Filename extensions can be considered a type of metadata. They are commonly used to infer how the data in a file is stored. The exact rule for deciding which part of a file name is its extension belongs to the specific filesystem in use; usually the extension is the substring that follows the last dot, if any (for example, txt is the extension of readme.txt, and html is the extension of mysite.index.html). On the file systems of mainframe and minicomputer systems such as MVS and VMS, and of PC systems such as CP/M and its derivatives such as MS-DOS, the extension is a separate namespace from the filename. Under Microsoft's DOS and Windows, some extensions, including EXE, COM, BAT, CMD, and VBS, indicate that a file is an executable program. This differs from Unix-like operating systems, where a suffix is not a separate namespace and is entirely optional, because file system permissions decide whether a file is executable.
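The usual rule, taking the substring after the last dot, can be sketched in a few lines of Python. The function name extension_of is hypothetical, and treating a name that starts with a dot (such as .profile) as having no extension is an assumption of this sketch rather than a rule of any particular filesystem.

```python
def extension_of(filename: str) -> str:
    """Return the text after the last dot, or "" if there is none."""
    base, dot, ext = filename.rpartition(".")
    # No dot at all, or only a leading dot (e.g. ".profile"): no extension.
    if not dot or not base:
        return ""
    return ext


print(extension_of("readme.txt"))         # txt
print(extension_of("mysite.index.html"))  # html
print(extension_of("VMLINUZ"))            # (empty string)
```

Only the last dot matters here, which is why mysite.index.html yields html and not index.html.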

With the advent of graphical user interfaces, the issue of file management and interface behavior arose. Microsoft Windows allowed multiple applications to be associated with a given extension, and different actions were available for selecting the required application, such as a context menu offering a choice between viewing, editing or printing the file.

Pre-OS X versions of the Mac OS dispensed with filename extensions entirely, instead using a file type code to identify the file format. In addition, a creator code determined which application would be launched when the file's icon was double-clicked. Mac OS X, however, uses filename suffixes as well as type and creator codes, as a consequence of being derived from the Unix-like NEXTSTEP operating system, which did not support type or creator codes in its file system.

Historical limitations

Filename extensions were used in Digital Equipment Corporation (DEC) operating systems (for example, TOPS-10, OS/8 and RT-11). CP/M adopted the convention and MS-DOS, as a re-implementation of CP/M, did so as well.

The DEC operating systems internally split the filename into a "base name" and a filename extension, with the base name limited to a small fixed number of characters (initially six in TOPS-10 and RT-11, and nine in RSX and VMS) and the extension limited to two or three characters; when a filename and extension were typed in commands, a dot (.) was placed between them. CP/M worked the same way: the filename was limited to eight characters and the extension to three, with a dot between them. Early versions of the FAT filesystem used in MS-DOS and Microsoft Windows imposed the same limits. This is sometimes referred to as the 8.3 filename convention, and since the word basename is eight letters long and ext is a reasonable abbreviation for extension, it can be generalized as:

BASENAME.EXT

When doing a file listing, the base name and extension would be separated by spaces:

Volume in drive A: is LINUX BOOT 
 Volume Serial Number is 2410-07EF
 Directory for A:\

 LDLINUX  SYS      5480 1999-04-19  23:24 
 VMLINUZ         530921 1999-04-19  23:24 
 BOOT     MSG       559 1999-04-19  23:24 
 EXPERT   MSG       668 1999-04-19  23:24 
 GENERAL  MSG       986 1999-04-19  23:24 
 KICKIT   MSG       979 1999-04-19  23:24 
 PARAM    MSG       875 1999-04-19  23:24 
 RESCUE   MSG      1020 1999-04-19  23:24 
 SYSLINUX CFG       420 1999-04-19  23:24 
 INITRD   IMG    878502 1999-04-19  23:24 
        10 files           1,420,410 bytes
                              35,840 bytes free
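As a rough illustration of the 8.3 split, the following Python sketch checks the two length limits and renders a name in the space-separated form used in the listing. The function name to_listing_form is hypothetical, folding to upper case is an assumption based on the listing, and the sketch ignores the rules about which characters are allowed.

```python
def to_listing_form(filename: str) -> str:
    """Render an 8.3 name as a padded base name and extension."""
    base, _, ext = filename.partition(".")
    if not 1 <= len(base) <= 8 or len(ext) > 3 or "." in ext:
        raise ValueError(f"not an 8.3 filename: {filename!r}")
    # Base name padded to 8 columns, one space, extension padded to 3.
    return f"{base.upper():<8} {ext.upper():<3}"


print(to_listing_form("ldlinux.sys"))   # LDLINUX  SYS
print(to_listing_form("syslinux.cfg"))  # SYSLINUX CFG
```

A name such as index.html is rejected because its extension has four characters.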

Improvements

The filename extension was originally used to determine a file's generic type easily. The need to condense a file's type into three characters frequently led to inscrutable extensions. Examples include .GFX for graphics files, .TXT for plain text, and .MUS for music. However, because many different programs handle these data types (and others) in a variety of ways, filename extensions became closely associated with certain products, even specific product versions. For example, early WordStar files used .WS or .WSn, where n was the program's version number. Filename extensions also began to collide between unrelated formats. Examples are .rpm, used for both RPM Package Manager packages and RealPlayer Media files;[1] .qif, shared by DESQview fonts, Quicken financial ledgers, and QuickTime pictures;[2] and .gba, shared by GrabIt scripts and Game Boy Advance ROM images.[3]

Some other operating systems that used filename extensions, such as Multics, generally allowed much longer filenames. Many allowed full filename lengths of 14 or more characters, and maximum name lengths of up to 255 were not uncommon. The file systems of operating systems such as Unix stored the file name as a single string, not split into base name and extension components, with the '.' being just another character allowed in file names. Such systems generally allow variable-length filenames with more than one dot, and hence multiple suffixes. Some components of Multics and Unix, and applications running on them, used suffixes in some cases to indicate file types, but not as extensively; for example, executables and ordinary text files had no suffixes in their names.
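When the name is a single string, multiple suffixes are simply a matter of how a program chooses to read it. As a small illustration, Python's standard pathlib module exposes both views; the file name backup.tar.gz is a made-up example.

```python
from pathlib import PurePosixPath

name = PurePosixPath("backup.tar.gz")
print(name.suffix)    # .gz
print(name.suffixes)  # ['.tar', '.gz']

# A name with no dot has no suffixes at all.
print(PurePosixPath("Makefile").suffixes)  # []
```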

The High Performance File System (HPFS), used in Microsoft and IBM's OS/2, also supported long file names and did not divide the file name into a name and an extension. The convention of using suffixes continued, even though HPFS supported extended attributes, which allowed a file's type to be stored with the file.

NTFS, the native file system of Microsoft's Windows NT, supported long file names and did not divide the file name into a name and an extension, but again the convention of using suffixes to simulate extensions continued, for compatibility with existing versions of Windows.

When the Internet age first arrived, those using Windows systems that were still restricted to 8.3 filenames had to create web pages with names ending in .HTM, while those using Macintosh or Unix computers could use the recommended .html filename extension. The restriction also became a problem for programmers experimenting with the Java programming language, which requires source code files to have the four-letter suffix .java and writes its compiled output to files with the five-letter suffix .class.[4]

Eventually, Windows introduced support for long file names and removed the 8.3 name/extension split in an extended version of the commonly used FAT file system called VFAT. VFAT first appeared in Windows NT 3.5 and Windows 95. The internal implementation of long file names in VFAT is widely considered a kludge, but it removed the important length restriction and allowed file names to mix upper-case and lower-case letters on machines that would not run Windows NT well. However, the use of three-character extensions under Microsoft Windows has continued, originally for backward compatibility with older versions of Windows and now by habit, along with the problems it creates.

Command name issues

A filename extension occasionally appears in a command name, usually as a side effect of the command having been implemented as a script (in Bourne shell, Python, etc.) with the interpreter's name suffixed to the command name. The practice is common on systems like Windows and MacOS, which rely on globally set associations between filename extension and interpreter, but is sharply deprecated on Unix-derived systems like Linux and Apple's OS X, where the interpreter is normally specified as a header in the script.

On association-based systems, the filename extension is generally mapped to a single, system-wide choice of interpreter for that extension (such as ".py" meaning to use Python), and the command itself can be run from the command line even if the extension is omitted (assuming appropriate setup). If the implementation language changes, the command name's extension changes as well, and the OS provides a consistent API by allowing the same extension-less form of the command to be used in both cases. This method suffers somewhat from the essentially global nature of the association mapping, and from the fact that developers do not always omit extensions when calling programs and cannot be forced to. Windows is the only remaining widespread user of this mechanism.
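The idea of one global association table plus extension-less invocation can be modelled with a toy lookup in Python. This is only an illustration of the concept, not how Windows implements it; the table ASSOCIATIONS, the function resolve_command and the file names are all hypothetical.

```python
from typing import Optional, Set, Tuple

# Hypothetical system-wide association table: extension -> interpreter.
ASSOCIATIONS = {".py": "python", ".sh": "sh"}


def resolve_command(name: str, files: Set[str]) -> Optional[Tuple[str, str]]:
    """Find the script implementing `name` and the interpreter to run it."""
    for ext, interpreter in ASSOCIATIONS.items():
        candidate = name + ext
        if candidate in files:
            return interpreter, candidate
    return None


# The caller types "report" in both cases; only the stored file differs.
print(resolve_command("report", {"report.sh"}))  # ('sh', 'report.sh')
print(resolve_command("report", {"report.py"}))  # ('python', 'report.py')
```

A caller that hard-codes report.sh instead of report would break as soon as the script is rewritten in Python, which is the weakness the paragraph above describes.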

On systems with interpreter directives, command name extensions have no special significance and are, by standard practice, not used, since the primary way to set the interpreter for a script is to start it with a single line specifying the interpreter to use (which could be viewed as a degenerate resource fork).
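On Unix-like systems that first line begins with the characters "#!". A minimal Python sketch of reading it follows; the function name interpreter_of is hypothetical, and the sketch returns the whole remainder of the line without separating the interpreter path from any argument, as a real kernel would.

```python
def interpreter_of(script_text: str):
    """Return the interpreter named on a '#!' first line, or None."""
    first_line = script_text.split("\n", 1)[0]
    if not first_line.startswith("#!"):
        return None
    return first_line[2:].strip()


script = "#!/bin/sh\necho hello\n"
print(interpreter_of(script))           # /bin/sh
print(interpreter_of("echo hello\n"))   # None
```

Because the interpreter is read from the file's contents, the script can be called report rather than report.sh.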

Developers coming from an association-based culture to the interpreter-directive culture often make the distinctive error of including command name extensions. Embedding the implementation language in the name means that the language cannot later be changed (for example, from shell to C++) without either breaking every tool that refers to the old script name or retaining the now-inaccurate old extension, both of which are generally considered harmful.

Security issues

By default, Windows Explorer, the file browser provided with Microsoft Windows, does not show filename extensions. Malicious users have tried to spread computer viruses and computer worms by using file names formed like LOVE-LETTER-FOR-YOU.TXT.vbs. The hope is that the file will appear as LOVE-LETTER-FOR-YOU.TXT, a harmless text file, without alerting the user to the fact that it is a harmful computer program, in this case written in VBScript.
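The disguise can be expressed as a simple check: hide the final extension, then see whether what remains still appears to end in a harmless one. The Python sketch below assumes two short, hypothetical extension lists (the executable ones are taken from the Usage section) and is an illustration, not a description of how any antivirus product works.

```python
import os.path

# Hypothetical, deliberately short lists for illustration only.
EXECUTABLE_EXTS = {".exe", ".com", ".bat", ".cmd", ".vbs"}
HARMLESS_LOOKING_EXTS = {".txt", ".jpg", ".doc"}


def displayed_name(filename: str) -> str:
    """Name as shown when the final extension is hidden."""
    return os.path.splitext(filename)[0]


def looks_disguised(filename: str) -> bool:
    """True if hiding an executable extension reveals a benign-looking one."""
    shown, real_ext = os.path.splitext(filename)
    shown_ext = os.path.splitext(shown)[1]
    return (real_ext.lower() in EXECUTABLE_EXTS
            and shown_ext.lower() in HARMLESS_LOOKING_EXTS)


print(displayed_name("LOVE-LETTER-FOR-YOU.TXT.vbs"))   # LOVE-LETTER-FOR-YOU.TXT
print(looks_disguised("LOVE-LETTER-FOR-YOU.TXT.vbs"))  # True
print(looks_disguised("notes.txt"))                    # False
```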

Later Windows versions (starting with Windows XP Service Pack 2 and Windows Server 2003) included customizable lists of filename extensions that should be considered 'dangerous' in certain 'zones' of operation, such as when downloaded from the web or received as an e-mail attachment. Modern antivirus software systems also help to defend users against such attempted attacks where possible.

There have been instances of malware crafted to exploit vulnerabilities in some Windows applications which could cause a stack-based buffer overflow when opening a file with an overly long, unhandled filename extension.

See also: Criticism of Microsoft Windows hiding of filename extensions

Alternatives

In network contexts, files are regarded as streams of bits and do not have filenames or extensions.

On the internet, the type of a bitstream is stated as the internet media type of the stream (also called the MIME type or content type). This is given in a line of text preceding the stream, such as:

Content-type: text/plain
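Reading such a header line can be sketched with Python's standard email package, which parses a block of headers followed by a blank line and a body. The message text here is a made-up example.

```python
from email import message_from_string

# A header line, a blank line, then the stream itself.
raw = "Content-type: text/plain\n\nHello, world.\n"
message = message_from_string(raw)

print(message.get_content_type())  # text/plain
print(message.get_payload())       # Hello, world.
```

The receiver learns the type from the header alone; no filename or extension is involved.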

BeOS, whose BFS file system supports extended attributes, would tag a file with its internet media type as an extended attribute. The KDE and GNOME desktop environments associate an internet media type with a file by examining both the filename suffix and the contents of the file, in the fashion of the file command, as a heuristic. They choose the application to launch when a file is opened based on that internet media type, reducing the dependency on filename extensions. Mac OS X uses both filename extensions and media types, as well as file type codes, to select a Uniform Type Identifier by which to identify the file type internally.
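A heuristic that looks at both the contents and the suffix might be sketched as follows in Python. The signature table MAGIC, the function name guess_media_type and the choice to trust contents before the suffix are all assumptions made for illustration; the actual rules used by KDE, GNOME or the file command are more elaborate.

```python
import mimetypes

# A few well-known leading byte signatures; real tools know far more.
MAGIC = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"%PDF-", "application/pdf"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
]


def guess_media_type(filename: str, head: bytes):
    """Prefer a content signature; fall back to the filename suffix."""
    for signature, media_type in MAGIC:
        if head.startswith(signature):
            return media_type
    return mimetypes.guess_type(filename)[0]


# A PNG image that has been given a misleading .txt suffix.
print(guess_media_type("holiday.txt", b"\x89PNG\r\n\x1a\n\x00\x00"))  # image/png
print(guess_media_type("readme.txt", b"plain words"))                 # text/plain
```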


References

  1. ^ File Extension .RPM Details from filext.com
  2. ^ File Extension .QIF Details from filext.com
  3. ^ File Extension .GBA Details
  4. ^ "javac - Java programming language compiler". Sun Microsystems, Inc. 2004. http://java.sun.com/j2se/1.5.0/docs/tooldocs/windows/javac.html. Retrieved 2009-05-31. "Source code file names must have .java suffixes, class file names must have .class suffixes, and both source and class files must have root names that identify the class."



Simple English

A file extension is a way of showing the type of a computer file, and a clue to what program it should be opened with. File extensions are usually three letters long and come after the name of the file.

Some examples of common file extensions are:

  • TXT files are plain text files
  • JPG are picture files in the JPEG format
  • MP3 are music files in the MP3 format
  • MPEG are video files in a Moving Picture Experts Group (MPEG) format
  • HTM or HTML are Hyper Text Markup Language files such as web pages
  • PHP are web server scripts which create web pages
  • ODT are Open Document text files
  • DOC are text documents in Microsoft Word format
  • XLS are Microsoft Excel spreadsheet documents
  • PPT are Microsoft Power Point presentation files
  • EXE are Microsoft Windows executable program files
  • DLL are Dynamic Link Libraries, a type of executable file, in MS Windows
  • ZIP are compressed files (Lempel-Ziv algorithm in an archive)
  • Z are Unix / Linux compressed files
  • bz2 are bzip2 block compressed files (very good compression)
  • jar are Java archive files
  • JSP are Java server pages files
  • java are Java source code files
  • class are compiled Java class files
  • oc are Java run-time library files
  • tar are Unix / Linux tape archive files
  • sh are Unix / Linux shell script files
  • awk are source code files for AWK, a C-like pattern processing language on Unix
  • sed are Unix stream editor command files
  • lex are Unix / Linux lexical analyzer C code generator specification files
  • c are C programming language source files
  • o are compiled C object files
  • asp are Active Server Pages files

There are many other commonly used file extensions.


You can change the program defaults so that the computer knows which program should open each type of file.

