The Full Wiki

Lzma: Wikis

Advertisements

Note: Many of our articles have direct quotes from sources you can cite, within the Wikipedia article! This article doesn't yet, but we're working on it! See more info or our list of citable articles.

Encyclopedia

(Redirected to Lempel-Ziv-Markov chain algorithm article)

From Wikipedia, the free encyclopedia

The Lempel-Ziv-Markov chain algorithm (LZMA) is an algorithm used to perform data compression. It has been under development since 1998[1][2] and is used in the 7z format of the 7-Zip archiver. This algorithm uses a dictionary compression scheme somewhat similar to LZ77 and features a high compression ratio (generally higher than bzip2[3][4]) and a variable compression-dictionary size (up to 4 GiB).[5]

Contents

Overview

LZMA uses an improved and optimized version of the LZ77 compression algorithm, backed by a range encoder. Streams of data, repeated-sequence size and repeated-sequence location seem to be compressed separately.[citation needed]

Algorithm

In LZMA compression, the compressed stream is a stream of bits, encoded using adaptive binary range coder. The stream is divided into packets, each packet describing either a single byte, or an LZ77 sequence with its length and distance implicitly or explicitly encoded.
There are 7 types of packets:[citation needed]

packed code (bit sequence) packet description
0 + byteCode A single byte encoded using an adaptive binary range coder. The range coder uses context based on some number of the most significant bits of the previous byte. Depending on the state machine, this can also be a single byte encoded as a difference from the byte at the last used LZ77 distance.
1+0 + len + dist A typical LZ77 sequence describing sequence length and distance.
1+1+0+0 A one-byte LZ77 sequence. Distance is equal to the last used LZ77 distance.
1+1+0+1 + len An LZ77 sequence. Distance is equal to the last used LZ77 distance.
1+1+1+0 + len An LZ77 sequence. Distance is equal to the second last used LZ77 distance.
1+1+1+1+0 + len An LZ77 sequence. Distance is equal to the third last used LZ77 distance.
1+1+1+1+1 + len An LZ77 sequence. Distance is equal to the fourth last used LZ77 distance.

The length is encoded as follows:

Length code (bit sequence) Description
0+ 3 bits The length encoded using 3 bits, gives the lengths range from 2 to 9.
1+0+ 3 bits The length encoded using 3 bits, gives the lengths range from 10 to 17.
1+1+ 8 bits The length encoded using 8 bits, gives the lengths range from 18 to 273.

The distance is encoded as follows:
First a distance class is encoded using 6 bits. The 5 other bits of the distance code encode the information about how many direct distance bits need to be extracted from the stream.

7-Zip reference implementation

The LZMA implementation extracted from 7-Zip is available as LZMA SDK and has been put by Igor Pavlov under the public domain.[6] It was originally distributed under the terms of both the LGPL and Common Public License, with a special exception for linked binaries. Version 4.61 beta was released into the public domain on November 23 2008. In this version there is also a LZMA2 version available ( Improved version of LZMA)

The reference open source LZMA compression library is written in C++ and has the following properties:

  • Compression speed: approximately 1 MB per second on a 2 GHz CPU
  • Decompression speed: 10-20 MB per second on a 2 GHz CPU
  • Support for multi-threading.

In addition to the original C++, the LZMA SDK contains reference implementations of LZMA compression and decompression ported to ANSI C, C#, and Java.[6] There also exists an implementation in Pascal and Python bindings for the C++ library.[7][8]

This (the 7-Zip) implementation uses several variants of hash chains, binary trees and Patricia tries as the basis for its dictionary search algorithm.

Decompression-only code for LZMA generally compiles to around 5 kiB and the amount of RAM required during decompression is principally determined by the size of the sliding window used during compression. Small code size and relatively low memory overhead, particularly with smaller dictionary lengths, and free source code make the LZMA decompression algorithm well-suited to embedded applications.

Uses

Software that uses or supports LZMA:

  • SQL Backup from Red-Gate. Simple compression and delivery system for SQL Server backups.
  • cramfs and SquashFS, with applied patches
  • Inno Setup
  • Nullsoft Scriptable Install System
  • Peazip, GUI frontend to command-line 7z and POSIX 7z binaries
  • UPX, from version 2.92 (beta) and above, features optional LZMA compression
  • The RPM Package Manager has experimental LZMA support since version 4.6.0[9] and stable support for LZMA/XZ since RPM 4.7[10]
  • PiSi (Packages Installed Successfully, as Intended) Pardus[citation needed]
  • The Slackware distribution switched from tgz (based on gzip) to txz (based on xz) package format from version 13.0
  • The Dragora distribution switched from tbz2 (based on bzip2) to tlz (based on lzip) package format from version 1.1-rc1
  • dpkg, from version 1.13.35.
  • Libarchive, BSD-licensed library used in FreeBSD and Mac OS X[11].
  • GNU Tar, from version 1.20 (using the "--lzma" switch or "--auto-compress" (-a) with the archive's filename ending in .tar.lzma or .tlz).
  • OpenCTM, lossless compression of 3D triangle meshes.
  • It is also a part of the Cygwin distribution for Windows and used there at least by MiKTeX.[12]
  • PowerArchiver
  • WinZip
  • GNU GRUB (version 2), for compressing core.img (only on x86 BIOS platforms).
  • Unity game engine uses LZMA for compressing web player data files.
  • Linux (as the compressed kernel image format, since version 2.6.30)
  • U-Boot bootloader as optional compression method for the kernel image files, since version v2008.10.
  • Coreboot
  • EDK II
  • Simplelinux uses LZMA compression for its core filesystems and module applications.
  • Esenthel Engine uses LZMA for data compression.
  • AVG Download Manager uses the LZMA compression for it's packages to minimize the installation overhead as much as possible

References

  1. ^ The SDK history file states that it was in development from 1996, and first used in 7-zip 2001-08-30. Aside from some unreferenced comments about 1998, the algorithm appears to have been unpublished before its use in 7-zip.
  2. ^ Igor Pavlov has asserted multiple times on SourceForge that the algorithm is his own creation.. "One of many forum posts with this claim.". Archived from the original on 25 August 2009. http://www.webcitation.org/5jImhbWcv. Retrieved 25 August 2009. 
  3. ^ Collin, Lasse (2005-05-31). "A Quick Benchmark: Gzip vs. Bzip2 vs. LZMA". The Tukaani Project. http://tukaani.org/lzma/benchmarks.html. Retrieved 2008-09-02. 
  4. ^ Klausmann, Tobias (2008-05-08). "Gzip, Bzip2 and Lzma compared". Blog of an Alpha animal. http://blog.i-no.de//archives/2008/05/08/index.html#e2008-05-08T16_35_13.txt. Retrieved 2008-09-02. 
  5. ^ Overview of the LZMA format 7z Format
  6. ^ a b http://www.7-zip.org/sdk.html
  7. ^ http://www.birtles.org.uk/programming/
  8. ^ http://www.joachim-bauch.de/projects/python/pylzma/
  9. ^ "RPM 4.6.0 (4.6.0-rc3)". rpm.org. http://rpm.org/wiki/Releases/4.6.0. Retrieved 2008-12-31. 
  10. ^ "RPM 4.7". rpm.org. http://rpm.org/wiki/Releases/4.7.0#XZpayloadcompression. Retrieved 2009-07-15. 
  11. ^ http://code.google.com/p/libarchive/
  12. ^ Schenk, Christian. "Creating a custom package repository". miktex.org. http://docs.miktex.org/packaging/. Retrieved 2008-10-15. 

External links

Advertisements

Advertisements






Got something to say? Make a comment.
Your name
Your email address
Message