LZ4 Frame Format Description
Notices
Copyright (c) 2013-2015 Yann Collet
Permission is granted to copy and distribute this document for any purpose and without charge, including translations into other languages and incorporation into compilations, provided that the copyright notice and this notice are preserved, and that any substantive changes or deletions from the original are clearly marked. Distribution of this document is unlimited.
Version
1.5.1 (31/03/2015)
Introduction
The purpose of this document is to define a lossless compressed data format that is independent of CPU type, operating system, file system and character set, and that is suitable for file compression, pipe compression and streaming compression using the LZ4 algorithm.
The data can be produced or consumed, even for an arbitrarily long sequentially presented input data stream, using only an a priori bounded amount of intermediate storage, and hence can be used in data communications. The format uses the LZ4 compression method, and optional xxHash-32 checksum method, for detection of data corruption.
The data format defined by this specification does not attempt to allow random access to compressed data.
This specification is intended for use by implementers of software to compress data into LZ4 format and/or decompress data from LZ4 format. The text of the specification assumes a basic background in programming at the level of bits and other primitive data representations.
Unless otherwise indicated below, a compliant compressor must produce data sets that conform to the specifications presented here. It doesn’t need to support all options though.
A compliant decompressor must be able to decompress at least one working set of parameters that conforms to the specifications presented here. It may also ignore checksums. Whenever it does not support a specific parameter within the compressed stream, it must produce a non-ambiguous error code and associated error message explaining which parameter is unsupported.
General Structure of LZ4 Frame format
| MagicNb | F. Descriptor | Block | (...) | EndMark | C. Checksum |
|---|---|---|---|---|---|
| 4 bytes | 3-11 bytes | | | 4 bytes | 4 bytes |
Magic Number
4 Bytes, Little endian format. Value : 0x184D2204
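As an illustration of the little-endian convention, the following Python sketch checks whether a buffer starts with the magic number. The helper name `has_lz4_magic` is hypothetical and not part of any library.

```python
import struct

LZ4_FRAME_MAGIC = 0x184D2204  # value given in this specification


def has_lz4_magic(data: bytes) -> bool:
    """Return True if data starts with the LZ4 frame magic number."""
    if len(data) < 4:
        return False
    (magic,) = struct.unpack_from("<I", data, 0)  # "<I" = little-endian uint32
    return magic == LZ4_FRAME_MAGIC


# In a file, the four bytes appear least significant byte first.
print(struct.pack("<I", LZ4_FRAME_MAGIC).hex())  # prints 04224d18
```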
Frame Descriptor
3 to 11 Bytes, to be detailed in the next part. Most important part of the spec.
Data Blocks
To be detailed later on. That’s where compressed data is stored.
EndMark
The flow of blocks ends when the last data block has a size of “0”. The size is expressed as a 32-bit value.
Content Checksum
Content Checksum verifies that the full content has been decoded correctly. The content checksum is the result of the xxh32() hash function digesting the original (decoded) data as input, with a seed of zero. Content checksum is only present when its associated flag is set in the frame descriptor. Content Checksum validates the result: that all blocks were fully transmitted in the correct order and without error, and also that the encoding/decoding process itself generated no distortion. Its usage is recommended.
Frame Concatenation
In some circumstances, it may be preferable to append multiple frames, for example in order to add new data to an existing compressed file without re-framing it.
In such case, each frame has its own set of descriptor flags. Each frame is considered independent. The only relation between frames is their sequential order.
The ability to decode multiple concatenated frames within a single stream or file is left outside of this specification. As an example, the reference lz4 command line utility behavior is to decode all concatenated frames in their sequential order.
Frame Descriptor
| FLG | BD | (Content Size) | HC |
|---|---|---|---|
| 1 byte | 1 byte | 0 - 8 bytes | 1 byte |
The descriptor uses a minimum of 3 bytes, and up to 11 bytes depending on optional parameters.
FLG byte
| BitNb | 7-6 | 5 | 4 | 3 | 2 | 1-0 |
|---|---|---|---|---|---|---|
| FieldName | Version | B.Indep | B.Checksum | C.Size | C.Checksum | Reserved |
BD byte
| BitNb | 7 | 6-5-4 | 3-2-1-0 |
|---|---|---|---|
| FieldName | Reserved | Block MaxSize | Reserved |
In the tables, bit 7 is highest bit, while bit 0 is lowest.
Version Number
2-bits field, must be set to “01”. Any other value cannot be decoded by this version of the specification. Other version numbers will use different flag layouts.
Block Independence flag
If this flag is set to “1”, blocks are independent. If this flag is set to “0”, each block depends on previous ones (up to LZ4 window size, which is 64 KB). In such case, it’s necessary to decode all blocks in sequence.
Block dependency improves compression ratio, especially for small blocks. On the other hand, it makes direct jumps or multi-threaded decoding impossible.
Block checksum flag
If this flag is set, each data block will be followed by a 4-bytes checksum, calculated by using the xxHash-32 algorithm on the raw (compressed) data block. The intention is to detect data corruption (storage or transmission errors) immediately, before decoding. Block checksum usage is optional.
Content Size flag
If this flag is set, the uncompressed size of data included within the frame will be present as an 8 bytes unsigned little endian value, after the flags. Content Size usage is optional.
Content checksum flag
If this flag is set, a content checksum will be appended after the EndMark.
Recommended value : “1” (content checksum is present)
Block Maximum Size
This information is intended to help the decoder allocate memory. Size here refers to the original (uncompressed) data size. Block Maximum Size is one value among the following table :
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| N/A | N/A | N/A | N/A | 64 KB | 256 KB | 1 MB | 4 MB |
The decoder may refuse to allocate block sizes above a (system-specific) size. Unused values may be used in a future revision of the spec. A decoder conformant to the current version of the spec is only able to decode blocksizes defined in this spec.
Reserved bits
Value of reserved bits must be 0 (zero). Reserved bits might be used in a future version of the specification, typically enabling new optional features. If this happens, a decoder respecting the current version of the specification shall not be able to decode such a frame.
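The bit layout of the FLG and BD bytes can be sketched in Python as follows. The names `FrameFlags` and `parse_flg_bd` are hypothetical, and the choice to raise `ValueError` for unsupported values is an assumption of this sketch rather than something the specification mandates.

```python
from typing import NamedTuple

# Block Maximum Size table from this specification (values 0-3 are not defined).
BLOCK_MAX_SIZES = {
    4: 64 * 1024,
    5: 256 * 1024,
    6: 1024 * 1024,
    7: 4 * 1024 * 1024,
}


class FrameFlags(NamedTuple):
    block_independence: bool
    block_checksum: bool
    content_size: bool
    content_checksum: bool
    block_max_size: int  # in bytes


def parse_flg_bd(flg: int, bd: int) -> FrameFlags:
    """Decode the FLG and BD bytes, rejecting anything this version cannot decode."""
    version = (flg >> 6) & 0x03          # bits 7-6
    if version != 0b01:
        raise ValueError(f"unsupported version field: {version}")
    if flg & 0x03:                       # bits 1-0 are reserved
        raise ValueError("reserved FLG bits must be zero")
    if bd & 0x8F:                        # bit 7 and bits 3-0 are reserved
        raise ValueError("reserved BD bits must be zero")
    size_code = (bd >> 4) & 0x07         # bits 6-5-4
    if size_code not in BLOCK_MAX_SIZES:
        raise ValueError(f"unsupported Block Maximum Size code: {size_code}")
    return FrameFlags(
        block_independence=bool(flg & 0x20),  # bit 5
        block_checksum=bool(flg & 0x10),      # bit 4
        content_size=bool(flg & 0x08),        # bit 3
        content_checksum=bool(flg & 0x04),    # bit 2
        block_max_size=BLOCK_MAX_SIZES[size_code],
    )
```

For example, `parse_flg_bd(0x64, 0x40)` describes a frame with independent blocks, a content checksum, no block checksum, no content size, and 64 KB blocks.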
Content Size
This is the original (uncompressed) size. This information is optional, and only present if the associated flag is set. Content size is provided as an unsigned 8-byte value, for a maximum of 16 exabytes. Format is little endian. This value is informational, typically for display or memory allocation. It can be skipped by a decoder, or used to validate content correctness.
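A minimal Python sketch of reading this optional field is shown below. It assumes the caller passes a buffer that starts at the FLG byte; the function name `read_content_size` is hypothetical.

```python
import struct
from typing import Optional, Tuple


def read_content_size(descriptor: bytes) -> Tuple[Optional[int], int]:
    """Return (content_size or None, total descriptor length in bytes).

    `descriptor` must start at the FLG byte, i.e. right after the magic number.
    """
    flg = descriptor[0]
    if flg & 0x08:  # C.Size flag, bit 3 of FLG
        # FLG (1 byte) + BD (1 byte), then 8 bytes little-endian unsigned.
        (size,) = struct.unpack_from("<Q", descriptor, 2)
        return size, 11  # FLG + BD + Content Size + HC
    return None, 3       # FLG + BD + HC
```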
Header Checksum
One-byte checksum of combined descriptor fields, including optional ones.
The value is the second byte of xxh32(), that is (xxh32()>>8) & 0xFF, using zero as a seed and the full Frame Descriptor as input (including optional fields when they are present).
A wrong checksum indicates an error in the descriptor. Header checksum is informational and can be skipped.
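The computation can be sketched in Python as follows. Since Python's standard library has no xxHash, this sketch includes a plain, unoptimized rendition of the XXH32 algorithm. The function names are hypothetical, and the sketch assumes the hashed input is the descriptor bytes that precede the checksum byte (FLG, BD and, if present, Content Size).

```python
import struct

P1, P2, P3, P4, P5 = 0x9E3779B1, 0x85EBCA77, 0xC2B2AE3D, 0x27D4EB2F, 0x165667B1
MASK = 0xFFFFFFFF


def _rotl(x: int, r: int) -> int:
    """Rotate a 32-bit value left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK


def xxh32(data: bytes, seed: int = 0) -> int:
    """Unoptimized XXH32, written for readability rather than speed."""
    n, i = len(data), 0
    if n >= 16:
        v = [(seed + P1 + P2) & MASK, (seed + P2) & MASK,
             seed & MASK, (seed - P1) & MASK]
        while i + 16 <= n:  # consume 16-byte stripes, four lanes at a time
            lanes = struct.unpack_from("<4I", data, i)
            for k in range(4):
                v[k] = (_rotl((v[k] + lanes[k] * P2) & MASK, 13) * P1) & MASK
            i += 16
        h = (_rotl(v[0], 1) + _rotl(v[1], 7)
             + _rotl(v[2], 12) + _rotl(v[3], 18)) & MASK
    else:
        h = (seed + P5) & MASK
    h = (h + n) & MASK
    while i + 4 <= n:  # remaining 4-byte words
        (w,) = struct.unpack_from("<I", data, i)
        h = (_rotl((h + w * P3) & MASK, 17) * P4) & MASK
        i += 4
    while i < n:       # remaining single bytes
        h = (_rotl((h + data[i] * P5) & MASK, 11) * P1) & MASK
        i += 1
    # Final avalanche.
    h ^= h >> 15
    h = (h * P2) & MASK
    h ^= h >> 13
    h = (h * P3) & MASK
    h ^= h >> 16
    return h


def header_checksum(descriptor_without_hc: bytes) -> int:
    """Second byte of xxh32() over the descriptor, with a seed of zero."""
    return (xxh32(descriptor_without_hc, 0) >> 8) & 0xFF
```

With FLG = 0x64 and BD = 0x40, `header_checksum(bytes([0x64, 0x40]))` evaluates to 0xA7. In production code, an existing xxHash implementation would normally be preferable to this hand-written one.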
Data Blocks
| Block Size | data | (Block Checksum) |
|---|---|---|
| 4 bytes | | 0 - 4 bytes |
Block Size
This field uses 4-bytes, format is little-endian.
The highest bit is “1” if data in the block is uncompressed.
The highest bit is “0” if data in the block is compressed by LZ4.
All other bits give the size, in bytes, of the following data block (the size does not include the block checksum if present).
Block Size shall never be larger than Block Maximum Size. Compressed output could otherwise exceed this limit when the source data is incompressible; in such a case, the data block shall be stored in uncompressed format.
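Splitting the 4-byte field into its flag and size parts can be sketched in Python as follows; `parse_block_size` is a hypothetical helper name.

```python
import struct
from typing import Tuple


def parse_block_size(field: bytes) -> Tuple[int, bool]:
    """Return (size_in_bytes, is_uncompressed) from the 4-byte Block Size field."""
    (raw,) = struct.unpack("<I", field)       # little-endian uint32
    is_uncompressed = bool(raw & 0x80000000)  # highest bit
    size = raw & 0x7FFFFFFF                   # all other bits
    return size, is_uncompressed
```

For example, the bytes `10 00 00 80` describe a 16-byte uncompressed block, while `00 00 00 00` has a size of zero.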
Data
Where the actual data to decode stands. It might be compressed or not, depending on previous field indications. Uncompressed size of Data can be any size, up to “block maximum size”. Note that data block is not necessarily full : an arbitrary “flush” may happen anytime. Any block can be “partially filled”.
Block checksum
Only present if the associated flag is set. This is a 4-bytes checksum value, in little endian format, calculated by using the xxHash-32 algorithm on the raw (undecoded) data block, and a seed of zero. The intention is to detect data corruption (storage or transmission errors) before decoding.
Block checksum is cumulative with Content checksum.
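Putting the three fields together, a decoder's walk over the data blocks of one frame might look like the Python sketch below. The function name `iter_blocks` is hypothetical; the sketch only separates the blocks, leaving LZ4 decompression and checksum verification out of scope, and it treats a size of zero as the EndMark.

```python
import struct
from typing import Iterator, Tuple


def iter_blocks(buf: bytes, offset: int,
                has_block_checksum: bool) -> Iterator[Tuple[bytes, bool]]:
    """Yield (raw_data, is_uncompressed) for each data block of a frame.

    `offset` points just past the Frame Descriptor. Iteration stops at the EndMark.
    """
    while True:
        (raw,) = struct.unpack_from("<I", buf, offset)
        offset += 4
        size = raw & 0x7FFFFFFF
        if size == 0:              # EndMark: a block size of zero
            return
        data = buf[offset:offset + size]
        if len(data) != size:
            raise ValueError("truncated data block")
        offset += size
        if has_block_checksum:
            offset += 4            # checksum is skipped, not verified, in this sketch
        yield data, bool(raw & 0x80000000)
```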
Skippable Frames
| Magic Number | Frame Size | User Data |
|---|---|---|
| 4 bytes | 4 bytes | |
Skippable frames allow the integration of user-defined data into a flow of concatenated frames. Their design is straightforward, with the sole objective of allowing the decoder to quickly skip over user-defined data and continue decoding.
For the purpose of facilitating identification, it is discouraged to start a flow of concatenated frames with a skippable frame. If there is a need to start such a flow with some user data encapsulated into a skippable frame, it’s recommended to start with a zero-byte LZ4 frame followed by a skippable frame. This will make it easier for file type identifiers.
Magic Number
4 Bytes, Little endian format. Value : 0x184D2A5X, which means any value from 0x184D2A50 to 0x184D2A5F. All 16 values are valid to identify a skippable frame.
Frame Size
This is the size, in bytes, of the following User Data (without including the magic number nor the size field itself). 4 Bytes, Little endian format, unsigned 32-bits. This means User Data can’t be bigger than (2^32-1) Bytes.
User Data
User Data can be anything. Data will just be skipped by the decoder.
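Skipping over such a frame can be sketched in Python as follows. The function name `skip_skippable_frame` is hypothetical, and raising `ValueError` on malformed input is a choice made for this sketch.

```python
import struct


def skip_skippable_frame(buf: bytes, offset: int) -> int:
    """Return the offset of the first byte after the skippable frame at `offset`."""
    magic, size = struct.unpack_from("<II", buf, offset)  # two little-endian uint32
    if (magic & 0xFFFFFFF0) != 0x184D2A50:  # accepts 0x184D2A50 .. 0x184D2A5F
        raise ValueError("not a skippable frame")
    end = offset + 8 + size  # 4 bytes magic + 4 bytes size + User Data
    if end > len(buf):
        raise ValueError("truncated skippable frame")
    return end
```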
Legacy frame
The Legacy frame format was defined in the initial versions of “LZ4Demo”. Newer compressors should not use this format anymore, as it is too restrictive.
Main characteristics of the legacy format :
- Fixed block size : 8 MB.
- All blocks must be completely filled, except the last one.
- All blocks are always compressed, even when compression is detrimental.
- The last block is detected either because it is followed by the “EOF” (End of File) mark, or because it is followed by a known Frame Magic Number.
- No checksum
- Convention is Little endian
| MagicNb | B.CSize | CData | B.CSize | CData | (...) | EndMark |
|---|---|---|---|---|---|---|
| 4 bytes | 4 bytes | CSize | 4 bytes | CSize | x times | EOF |
Magic Number
4 Bytes, Little endian format. Value : 0x184C2102
Block Compressed Size
This is the size, in bytes, of the following compressed data block. 4 Bytes, Little endian format.
Data
Where the actual compressed data stands. Data is always compressed, even when compression is detrimental.
EndMark
End of legacy frame is implicit only. It must be followed by a standard EOF (End Of File) signal, whether it is a file or a stream.
Alternatively, if the frame is followed by a valid Frame Magic Number, it is considered completed. It makes legacy frames compatible with frame concatenation.
Any other value will be interpreted as a block size, and trigger an error if it does not fit within acceptable range.
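The end-of-frame rules above can be sketched in Python as follows. The names are hypothetical. The specification does not state the "acceptable range"; this sketch assumes the LZ4 worst-case bound for an 8 MB block (`n + n/255 + 16`) as the upper limit. Blocks are only separated here, not decompressed.

```python
import struct
from typing import Iterator

LEGACY_MAGIC = 0x184C2102
LZ4_FRAME_MAGIC = 0x184D2204
LEGACY_BLOCK_SIZE = 8 * 1024 * 1024
# Assumption: worst-case compressed size of one full legacy block.
MAX_COMPRESSED = LEGACY_BLOCK_SIZE + LEGACY_BLOCK_SIZE // 255 + 16


def is_known_magic(value: int) -> bool:
    """True for the frame, legacy frame and skippable frame magic numbers."""
    return (value in (LEGACY_MAGIC, LZ4_FRAME_MAGIC)
            or (value & 0xFFFFFFF0) == 0x184D2A50)


def iter_legacy_blocks(buf: bytes, offset: int = 0) -> Iterator[bytes]:
    """Yield each compressed block of the legacy frame starting at `offset`."""
    (magic,) = struct.unpack_from("<I", buf, offset)
    if magic != LEGACY_MAGIC:
        raise ValueError("not a legacy frame")
    offset += 4
    while offset < len(buf):                  # EOF ends the frame implicitly
        (value,) = struct.unpack_from("<I", buf, offset)
        if is_known_magic(value):             # another frame follows
            return
        if value == 0 or value > MAX_COMPRESSED:
            raise ValueError("block size out of acceptable range")
        offset += 4
        block = buf[offset:offset + value]
        if len(block) != value:
            raise ValueError("truncated block")
        offset += value
        yield block
```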
Version changes
1.5.1 : changed format to MarkDown compatible
1.5 : removed Dictionary ID from specification
1.4.1 : changed wording from “stream” to “frame”
1.4 : added skippable streams, re-added stream checksum
1.3 : modified header checksum
1.2 : reduced choice of “block size”, to postpone decision on “dynamic size of BlockSize Field”.
1.1 : optional fields are now part of the descriptor
1.0 : changed “block size” specification, adding a compressed/uncompressed flag
0.9 : reduced scale of “block maximum size” table
0.8 : removed : high compression flag
0.7 : removed : stream checksum
0.6 : settled : stream size uses 8 bytes, endian convention is little endian
0.5 : added copyright notice
0.4 : changed format to Google Doc compatible OpenDocument