From Wikipedia, the free encyclopedia

7z
Filename extension: .7z
Internet media type: application/x-7z-compressed
Developed by: Igor Pavlov
Type of format: Data compression
Free file format: Yes (GNU Lesser General Public License)

7z is a compressed archive file format that supports several different data compression, encryption and pre-processing filters. The 7z format initially appeared as implemented by the 7-Zip archiver. The 7-Zip program is publicly available under the terms of the GNU Lesser General Public License. The LZMA SDK 4.62 was placed in the public domain in December 2008. The latest version of 7-Zip and LZMA SDK is version 4.65.

The MIME type of 7z is application/x-7z-compressed.

The official 7z file format specification is distributed with 7-Zip's source code. The specification can be found in plain text format in the doc\ subdirectory of the source code distribution.

Features and enhancements

The 7z format provides the following main features:

  • Open, modular architecture which allows any compression, conversion, or encryption method to be stacked.
  • High compression ratios (depending on the compression method used)
  • Strong Rijndael/AES-256 encryption.
  • Large file support (up to approximately 16 exabytes).
  • Unicode file names
  • Support for solid compression, where multiple files of like type are compressed within a single stream, in order to exploit the combined redundancy inherent in similar files.
  • Compression and encryption of archive headers.

The format's open architecture allows additional future compression methods to be added to the standard.
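The effect of solid compression can be illustrated with a minimal sketch. It uses Python's standard lzma module (which writes the .xz container, not a 7z archive) and synthetic "files" that share most of their content; the helper names and the test data are assumptions made for illustration only.

```python
import lzma
import random


def make_similar_files(count=8, size=20000, seed=7):
    """Build `count` synthetic files that differ from a common base in a few bytes."""
    rng = random.Random(seed)
    base = bytearray(rng.getrandbits(8) for _ in range(size))
    files = []
    for _ in range(count):
        variant = bytearray(base)
        # Change 50 random positions so the files are similar but not identical.
        for _ in range(50):
            variant[rng.randrange(size)] = rng.getrandbits(8)
        files.append(bytes(variant))
    return files


def separate_size(files):
    """Total size when every file is compressed as its own stream."""
    return sum(len(lzma.compress(f)) for f in files)


def solid_size(files):
    """Size when all files are concatenated and compressed as one stream."""
    return len(lzma.compress(b"".join(files)))


files = make_similar_files()
print("separate streams:", separate_size(files), "bytes")
print("single solid stream:", solid_size(files), "bytes")
```

In the single-stream case the compressor can refer back to earlier files, so the redundancy shared between files is stored only once. The trade-off is that extracting one file from a solid block requires decompressing the data that precedes it.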

Compression method filters

The following compression methods are currently defined:

  • LZMA – A variation of the LZ77 algorithm, using a sliding dictionary up to 4 GB in length for duplicate string elimination. The LZ stage is followed by entropy coding using a Markov chain based range coder and binary trees.
  • LZMA2 – A modified version of LZMA. It provides the following advantages: less expansion of incompressible data[1] and better multithreading support.
  • Bzip2 – The standard Burrows–Wheeler transform algorithm. Bzip2 uses two reversible transformations: BWT, then move-to-front, followed by Huffman coding for symbol reduction (the actual compression element).
  • PPMd – Dmitry Shkarin's 2002 PPMdH (PPMII/cPPMII) with small changes: PPMII is an improved version of the 1984 PPM compression algorithm (prediction by partial matching).
  • DEFLATE – Standard algorithm based on 32 kB LZ77 (LZSS actually) and Huffman coding. Deflate is found in several file formats including ZIP, gzip, PNG and PDF. 7-Zip contains a from-scratch DEFLATE encoder that frequently beats the de facto standard zlib version in compression size, but at the expense of CPU usage.

A suite of recompression tools called AdvanceCOMP contains a copy of the DEFLATE encoder from the 7-Zip implementation; these utilities can often be used to further compress the size of existing gzip, ZIP, PNG, or MNG files.
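Several of the methods listed above have counterparts in Python's standard library, which allows a rough side-by-side comparison. The sketch below is not a 7z implementation: it uses the lzma, bz2 and zlib modules with their own container formats, PPMd is omitted because the standard library has no equivalent, and the sample text is an arbitrary assumption.

```python
import bz2
import lzma
import zlib


def compressed_sizes(data: bytes) -> dict:
    """Return the compressed size of `data` for several stdlib codecs."""
    return {
        "LZMA (legacy .lzma container)": len(lzma.compress(data, format=lzma.FORMAT_ALONE)),
        "LZMA2 (.xz container)": len(lzma.compress(data, format=lzma.FORMAT_XZ)),
        "Bzip2": len(bz2.compress(data, 9)),
        "DEFLATE (zlib)": len(zlib.compress(data, 9)),
    }


sample = b"7z is a compressed archive file format with stackable filters. " * 2000
for method, size in compressed_sizes(sample).items():
    print(f"{method}: {size} bytes (input {len(sample)} bytes)")
```

The relative ranking depends heavily on the input, so results from one sample should not be generalised.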

Pre-processing filters

The LZMA SDK includes the BCJ and BCJ2 preprocessors, which allow later stages to achieve greater compression. For x86, ARM, PowerPC (PPC), IA64 and ARM Thumb processors, jump targets are normalized before compression by converting relative offsets into absolute addresses. For x86, this means that near jumps, near calls and near conditional jumps (but not short jumps and short conditional jumps) are converted from the machine-language "jump 1655 bytes backwards" style of notation to the normalized "jump to address 5554" style. Because every jump to the same destination then encodes to the same bytes, the LZ stage finds more repeated strings.
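The relative-to-absolute conversion can be sketched in a few lines. This is a deliberately simplified illustration, not the BCJ filter from the LZMA SDK: it assumes that every 0xE8 (near call) or 0xE9 (near jump) byte starts a 5-byte instruction, whereas a real filter applies additional heuristics to avoid converting data bytes. The function names and the `start` parameter are hypothetical.

```python
import struct

NEAR_CALL_JMP = (0xE8, 0xE9)  # x86 opcodes followed by a 32-bit relative offset


def bcj_x86_encode(code: bytes, start: int = 0) -> bytes:
    """Rewrite rel32 operands as absolute targets (simplified sketch)."""
    out = bytearray(code)
    i = 0
    while i + 5 <= len(out):
        if out[i] in NEAR_CALL_JMP:
            rel = struct.unpack_from("<i", out, i + 1)[0]
            # The offset is relative to the address of the next instruction.
            absolute = (rel + start + i + 5) & 0xFFFFFFFF
            struct.pack_into("<I", out, i + 1, absolute)
            i += 5
        else:
            i += 1
    return bytes(out)


def bcj_x86_decode(code: bytes, start: int = 0) -> bytes:
    """Inverse of bcj_x86_encode: turn absolute targets back into rel32."""
    out = bytearray(code)
    i = 0
    while i + 5 <= len(out):
        if out[i] in NEAR_CALL_JMP:
            absolute = struct.unpack_from("<I", out, i + 1)[0]
            rel = (absolute - (start + i + 5)) & 0xFFFFFFFF
            struct.pack_into("<I", out, i + 1, rel)
            i += 5
        else:
            i += 1
    return bytes(out)


# Two calls to the same target (address 0x100) from different positions,
# separated by NOP padding.
target = 0x100
code = (
    b"\xE8" + struct.pack("<i", target - 5)      # call at offset 0
    + b"\x90" * 27                               # NOPs, offsets 5..31
    + b"\xE8" + struct.pack("<i", target - 37)   # call at offset 32
)
encoded = bcj_x86_encode(code)
print("before:", code[0:5].hex(), code[32:37].hex())
print("after: ", encoded[0:5].hex(), encoded[32:37].hex())
```

Before the conversion the two call instructions differ because their offsets depend on where they sit in the code. After it, both carry the same absolute target, which gives the LZ stage an exact repeat to match.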

  • BCJ – Converter for 32-bit x86 executables. Normalizes the target addresses of near jumps and calls from relative distances to absolute destinations.
  • BCJ2 – Pre-processor for 32-bit x86 executables. BCJ2 is an improvement on BCJ that adds further x86 jump/call instruction processing. The targets of near jumps, near calls and conditional near jumps are split out and compressed separately in another stream.
  • Delta encoding – Delta filter, a basic preprocessor for multimedia data.
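The benefit of a delta filter on sample-like data can be sketched with the filter chains exposed by Python's standard lzma module. The output here is an .xz stream rather than a 7z archive, and the input is a synthetic 16-bit ramp chosen as an assumption for illustration; real multimedia data will behave less dramatically.

```python
import lzma
import struct


def ramp_samples(count: int = 40000, step: int = 3) -> bytes:
    """Synthetic 16-bit little-endian samples that rise steadily and wrap around."""
    return b"".join(struct.pack("<H", (i * step) & 0xFFFF) for i in range(count))


def compress_plain(data: bytes) -> bytes:
    """LZMA2 only, no pre-processing."""
    filters = [{"id": lzma.FILTER_LZMA2, "preset": 6}]
    return lzma.compress(data, format=lzma.FORMAT_XZ, filters=filters)


def compress_with_delta(data: bytes, distance: int = 2) -> bytes:
    """Delta filter first (each byte minus the byte `distance` earlier), then LZMA2."""
    filters = [
        {"id": lzma.FILTER_DELTA, "dist": distance},
        {"id": lzma.FILTER_LZMA2, "preset": 6},
    ]
    return lzma.compress(data, format=lzma.FORMAT_XZ, filters=filters)


pcm = ramp_samples()
print("input:", len(pcm), "bytes")
print("LZMA2 only:", len(compress_plain(pcm)), "bytes")
print("delta + LZMA2:", len(compress_with_delta(pcm)), "bytes")
```

A distance of 2 matches the 2-byte sample width, so each byte is compared with the corresponding byte of the previous sample. The differences repeat far more often than the raw values do.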

Similar executable pre-processing technology is included in other software; the RAR compressor features displacement compression for 32-bit x86 executables and IA64 Itanium executables, and the UPX runtime executable file compressor includes support for working with 16-bit values within DOS binary files.

Encryption

The 7z format supports encryption with the AES algorithm using a 256-bit key. The key is generated from a user-supplied passphrase using an algorithm based on the SHA-256 hash algorithm. SHA-256 is executed 2^18 (262144) times,[2] which causes a significant delay on slow PCs before compression or extraction starts. This technique, called key strengthening, makes a brute-force search for the passphrase more difficult. The 7z format also provides the option to encrypt the filenames of a 7z archive.
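Key strengthening of this kind can be sketched as follows. The text above only says that SHA-256 is applied repeatedly; the concrete layout used here (salt, then the passphrase encoded as UTF-16LE, then an 8-byte little-endian round counter, all fed into one running hash) is an assumption about how the 7-Zip source does it and should be checked against that source. The function name and parameters are hypothetical, and the default round count simply follows the figure quoted above.

```python
import hashlib


def derive_key(passphrase: str, salt: bytes = b"", cycles_power: int = 18) -> bytes:
    """Derive a 256-bit key by hashing the passphrase 2**cycles_power times.

    Assumed layout per round: salt + UTF-16LE passphrase + 64-bit round counter.
    """
    password_bytes = passphrase.encode("utf-16-le")
    digest = hashlib.sha256()
    for counter in range(1 << cycles_power):
        digest.update(salt + password_bytes + counter.to_bytes(8, "little"))
    return digest.digest()  # 32 bytes, suitable as an AES-256 key


key = derive_key("correct horse battery staple")
print(len(key) * 8, "bit key:", key.hex())
```

Every passphrase guess costs the attacker the same number of hash rounds, so each increment of the exponent roughly doubles the work of a brute-force search while adding only a short delay for a legitimate user.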

Limitations

The 7z format does not store UNIX owner/group permissions, and hence can be inappropriate for backup/archival purposes. A workaround is to convert data to a tar bitstream before compressing with 7z.
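The first half of that workaround can be sketched with Python's standard tarfile module, which records mode, owner and group for each member. The file name, mode and numeric IDs below are arbitrary assumptions; the resulting tar byte stream is what would then be handed to a 7z archiver as a single file.

```python
import io
import tarfile


def tar_with_metadata(name: str, payload: bytes,
                      mode: int = 0o640, uid: int = 1000, gid: int = 1000) -> bytes:
    """Build an in-memory tar stream holding one file with explicit UNIX metadata."""
    info = tarfile.TarInfo(name=name)
    info.size = len(payload)
    info.mode = mode
    info.uid = uid
    info.gid = gid
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        tar.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


def read_metadata(tar_bytes: bytes) -> list:
    """Return (name, mode, uid, gid) for every member of a tar stream."""
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r") as tar:
        return [(m.name, m.mode, m.uid, m.gid) for m in tar.getmembers()]


tar_stream = tar_with_metadata("notes.txt", b"hello")
for name, mode, uid, gid in read_metadata(tar_stream):
    print(f"{name}: mode={oct(mode)} uid={uid} gid={gid}")
```

Because the metadata lives inside the tar stream, it survives whatever container is used to compress that stream afterwards.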

The 7z format does not allow extraction of some "broken files". For example, if one has only the first segment of a multi-segment 7z archive, 7z cannot extract the beginning of the files within the archive; it must wait until all segments are downloaded. The 7z format also lacks recovery records, which can be a problem when limited file corruption has occurred.

References

  1. ^ Collin, Lasse. "lzma.h". LZMA source code header file. lines 36–38. http://www.google.com/codesearch/p?hl=en#iR1SCQLM-vQ/src/base-prerelease/R-latest.tar.gz%7Cec1_gA5QXsk/R-rc/src/extra/xz/api/lzma/lzma.h. Retrieved 2010-01-03. "Compared to LZMA1, LZMA2 adds support for LZMA_SYNC_FLUSH, uncompressed chunks (smaller expansion when trying to compress uncompressible data), possibility to change lc/lp/pb in the middle of encoding, and some other internal improvements." 
  2. ^ 7-Zip source code.

Salomon, David (2007). Data compression: the complete reference. Springer. p. 241. ISBN 1846286026. 
