| Filename extension | .djvu, .djv |
|---|---|
| Internet media type | image/vnd.djvu |
| Type code | DJVU |
| Developed by | AT&T Research |
| Initial release | 1996 |
| Latest release | Version 27[1] / July, 2006 |
| Type of format | Image file formats |
| Website | www.djvu.org |
DjVu (pronounced like déjà vu) is a computer file format designed primarily to store scanned documents, especially those containing a combination of text, line drawings, and photographs. It uses technologies such as image layer separation of text and background/images, progressive loading, arithmetic coding, and lossy compression for bitonal (monochrome) images. This allows for high-quality, readable images to be stored in a minimum of space, so that they can be made available on the web.
DjVu has been promoted as an alternative[2] to PDF, as it gives smaller files than PDF for most scanned documents. The DjVu developers report[3] that color magazine pages compress to 40–70 kB, black and white technical papers compress to 15–40 kB, and ancient manuscripts compress to around 100 kB; a satisfactory JPEG image typically requires 500 kB. Like PDF, DjVu can contain an OCR text layer, making it easy to perform cut and paste and text search operations.
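The figures above can be turned into size-reduction ratios with a little arithmetic. This sketch assumes a single fixed 500 kB baseline for a "satisfactory" JPEG page, which is a simplification, since real JPEG sizes vary with content and quality settings. It also treats the "around 100 kB" manuscript figure as a single point.

```python
# Worked arithmetic on the sizes reported by the DjVu developers.
# Assumption: a fixed 500 kB baseline for a "satisfactory" JPEG page.
JPEG_KB = 500

# (smallest, largest) reported DjVu size in kB for each document type;
# the manuscript figure is "around 100 kB", so it is used as a single point.
REPORTED_KB = {
    "color magazine page": (40, 70),
    "black-and-white technical paper": (15, 40),
    "ancient manuscript": (100, 100),
}


def reduction_range(smallest_kb: float, largest_kb: float,
                    baseline_kb: float = JPEG_KB) -> tuple[float, float]:
    """Return (least, most) times smaller than the baseline JPEG."""
    return baseline_kb / largest_kb, baseline_kb / smallest_kb


for kind, (small, large) in REPORTED_KB.items():
    least, most = reduction_range(small, large)
    print(f"{kind}: {least:.1f}x to {most:.1f}x smaller")
```

Under that assumption, color magazine pages come out roughly 7 to 12.5 times smaller than the JPEG, technical papers 12.5 to 33 times smaller, and manuscripts about 5 times smaller.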
The DjVu technology was originally developed[3] by Yann LeCun, Léon Bottou, Patrick Haffner, and Paul G. Howard at AT&T Laboratories in 1996.
Due to the high compression ratio and the ease with which large volumes of text can be converted into DjVu format, many of the academic texts circulated on the warez scene are also in DjVu format, with PDF a close second[citation needed].
DjVu divides a single image into several component images and compresses each one separately. To create a DjVu file, the initial image is first separated into three images: a background image, a foreground image, and a mask image. The background and foreground images are typically lower-resolution color images (e.g., 100 dpi); the mask image is a high-resolution bilevel image (e.g., 300 dpi) and is typically where the text is stored. The background and foreground images are then compressed using a wavelet-based compression algorithm named IW44[3].
The mask image is compressed using a method called JB2 (similar to JBIG2). The JB2 encoding method identifies nearly identical shapes on the page, such as multiple occurrences of a particular character in a given font, style, and size. It compresses the bitmap of each unique shape separately and then encodes the locations where each shape appears on the page. Thus, instead of compressing the letter "e" in a given font multiple times, it compresses it once (as a compressed bit image) and then records every place on the page where it occurs.
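The shape-dictionary idea behind JB2 can be illustrated with a toy Python sketch. This is not the JB2 codec. The same-size requirement, the pixel-count similarity test, and the `max_diff` threshold are hypothetical simplifications, and a real encoder would also entropy-code the stored shapes and their positions instead of keeping them as Python tuples.

```python
from typing import List, Optional, Tuple

# A bitmap is a tuple of rows, each row a tuple of 0/1 pixels.
Bitmap = Tuple[Tuple[int, ...], ...]
# A placement records which dictionary shape appears at page position (x, y).
Placement = Tuple[int, int, int]


def pixel_difference(a: Bitmap, b: Bitmap) -> Optional[int]:
    """Number of differing pixels, or None if the bitmaps differ in size."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        return None
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def build_shape_dictionary(
    glyphs: List[Tuple[int, int, Bitmap]], max_diff: int = 1
) -> Tuple[List[Bitmap], List[Placement]]:
    """Store each distinct shape once and record every place it occurs.

    A glyph within `max_diff` pixels of an existing shape reuses that shape
    (a lossy substitution); otherwise it becomes a new dictionary entry.
    """
    shapes: List[Bitmap] = []
    placements: List[Placement] = []
    for x, y, bitmap in glyphs:
        match = None
        for index, shape in enumerate(shapes):
            diff = pixel_difference(bitmap, shape)
            if diff is not None and diff <= max_diff:
                match = index
                break
        if match is None:
            shapes.append(bitmap)
            match = len(shapes) - 1
        placements.append((match, x, y))
    return shapes, placements


# Two slightly different "e" glyphs collapse into one stored shape.
e_1 = ((0, 1, 0), (1, 1, 1), (1, 0, 0))
e_2 = ((0, 1, 0), (1, 1, 1), (1, 0, 1))  # one pixel off from e_1
o = ((0, 1, 0), (1, 0, 1), (0, 1, 0))
shapes, placements = build_shape_dictionary([(0, 0, e_1), (10, 0, o), (20, 0, e_2)])
print(len(shapes))  # 2
print(placements)   # [(0, 0, 0), (1, 10, 0), (0, 20, 0)]
```

Only two bitmaps are kept for three glyphs; the page is then described by the list of (shape, x, y) placements.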
Optionally, these shapes may be mapped to ASCII codes (either by hand or potentially by a text recognition system), and stored in the DjVu file. If this mapping exists, it is possible to select and copy text.
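The following toy model shows why such a mapping makes text selectable and searchable. If each stored shape has a character assigned, the page's placements can be read out as a string, and a match in that string points back to page coordinates. The function names, the (shape ID, x, y) placement layout, and the reading-order rule are hypothetical. The model makes no claim about how a DjVu file actually encodes its text layer.

```python
from typing import Dict, List, Tuple

# (shape_id, x, y): which dictionary shape is drawn at which page position.
Placement = Tuple[int, int, int]


def page_text(
    placements: List[Placement], shape_to_char: Dict[int, str]
) -> Tuple[str, List[Tuple[int, int]]]:
    """Spell out the page in reading order and keep each character's position.

    Simplification: glyphs on one line are assumed to share the same y, so
    sorting by (y, x) gives top-to-bottom, left-to-right order. Shapes with
    no assigned character are rendered as '?'.
    """
    ordered = sorted(placements, key=lambda p: (p[2], p[1]))
    text = "".join(shape_to_char.get(shape_id, "?") for shape_id, _, _ in ordered)
    positions = [(x, y) for _, x, y in ordered]
    return text, positions


def locate(text: str, positions: List[Tuple[int, int]],
           term: str) -> List[Tuple[int, int]]:
    """Return the page positions covered by the first occurrence of `term`."""
    start = text.find(term)
    if start < 0:
        return []
    return positions[start:start + len(term)]


placements = [(1, 20, 0), (0, 0, 0), (2, 10, 0)]
text, positions = page_text(placements, {0: "c", 1: "t", 2: "a"})
print(text)                           # cat
print(locate(text, positions, "at"))  # [(10, 0), (20, 0)]
```

Without the shape-to-character mapping, the same placements would still render the page but could not produce `text`, which is what selection and search operate on.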
DjVu is a free file format[2].
In 2002, the DjVu file format was chosen by the Internet Archive as the format in which its Million Book Project provides scanned public domain books online (along with TIFF and PDF).[4]
The file format specification has been published, as has the source code for the reference library[citation needed].
The ownership rights to the commercial development of the encoding software have been transferred to different companies over the years, including AT&T, LizardTech, Celartem and Caminova.
The original authors maintain a GPLed implementation named "DjVuLibre".