Define Compressed Data


Purpose

The Define Compressed Data function contains a compressed WHIP! data stream of opcode/operand pairs.

Syntax

Opcode format    Opcode   Operand format   Comments
Extended Binary  0x0010   [<D>]*}          Starts a compressed WHIP! data stream.
-                0x0123   [<D>]*}          Obsolete opcode for DWF files of version v00.30 or lower. Starts a compressed DWF stream.
Extended Binary  0x0011   [<D>]*}          Starts a zlib compressed WHIP! data stream.
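
For the zlib variant (opcode 0x0011), the operand can in principle be inflated with any RFC 1950 implementation. The following is a minimal sketch using Python's standard zlib module. It assumes the operand is an ordinary zlib stream followed directly by the "}" trailer. The payload bytes are placeholders rather than real WHIP! opcodes, and the extended binary framing that precedes the operand is not handled here.

```python
import zlib

# Placeholder payload standing in for uncompressed WHIP! opcodes and operands.
payload = b"example WHIP! opcodes and operands"

# Assumed operand layout: a zlib (RFC 1950) stream, then the "}" trailer.
operand = zlib.compress(payload) + b"}"

# A decompression object stops at the end of the zlib stream and keeps
# whatever follows it in unused_data, so the trailer is not consumed.
inflater = zlib.decompressobj()
decoded = inflater.decompress(operand)
trailer = inflater.unused_data

print(decoded == payload, trailer)
```

Using a decompression object rather than a one-shot call means the reader does not need to know the compressed length in advance.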

Notes

To read or write zlib compressed WHIP! data, an application needs the zlib library. See the ZLIB compression documentation and source code. The zlib format is also published as RFC 1950.

Details

This opcode allows for data compression in WHIP! data. The operand of the Define Compressed Data opcode is simply a compressed version of a normal WHIP! data stream of opcodes and operands. A WHIP! data reading application should decode the compressed data (the input stream) into an uncompressed form (the output stream) and execute the opcodes contained therein. During decoding, the application maintains a "history buffer" of the last 65,536 bytes that were transferred into the output stream.

Compression is achieved by finding repeating sequences of data and substituting shorter codes for these repeating patterns. WHIP! data’s use of relative logical coordinates for many drawing opcodes increases the number of repeating data sequences that may be compressed.
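
The effect of relative coordinates on repetition can be illustrated with a small experiment. The sketch below is illustrative only. It uses zlib as a stand-in compressor, and the 32-bit little-endian packing and the evenly spaced polyline are assumptions made for the demonstration, not the WHIP! operand encoding.

```python
import struct
import zlib

# A polyline whose vertices advance by the same step each time.
points = [(1000 + 7 * i, 2000 + 3 * i) for i in range(500)]

# Absolute encoding: every vertex stored as-is, so the bytes keep changing.
absolute = b"".join(struct.pack("<ii", x, y) for x, y in points)

# Relative encoding: first vertex, then the difference from the previous one.
deltas = [points[0]] + [
    (x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in zip(points, points[1:])
]
relative = b"".join(struct.pack("<ii", dx, dy) for dx, dy in deltas)

absolute_size = len(zlib.compress(absolute))
relative_size = len(zlib.compress(relative))
print(len(absolute), absolute_size, relative_size)
```

Both encodings occupy the same number of bytes before compression, but the relative form repeats the same eight-byte step over and over, which is exactly the kind of pattern a history-based compressor replaces with short codes.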

In the encoded stream, shown repeating in figure 1, each square represents one byte of data.

Figure 1. The encoded stream

Following the Define Compressed Data opcode, the operand (the compressed data stream) begins with a compression code byte (see figure 1), which is divided into two 4-bit parts. The lower four bits of the compression code, called the literal data run length, indicate how many bytes of literal data will follow, while the upper four bits, called the compressed data run length, indicate how many bytes of the original data stream were compressed by replacing them with an offset code.
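
Splitting the compression code byte into its two fields takes one shift and one mask. A minimal sketch follows. The function name is hypothetical, and the values returned are the raw 4-bit fields, not necessarily byte counts (in Table 1, a compressed data run length field of 2 corresponds to 5 compressed bytes).

```python
def split_compression_code(code):
    """Return (compressed_field, literal_field) for one compression code byte."""
    if not 0 <= code <= 0xFF:
        raise ValueError("compression code must fit in one byte")
    compressed_field = code >> 4   # upper four bits
    literal_field = code & 0x0F    # lower four bits
    return compressed_field, literal_field


# Compression codes taken from Table 1.
for code in (0x2F, 0x24, 0x06, 0x00):
    print(hex(code), split_compression_code(code))
```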

The range of the four-bit literal data run length value includes the following:

For each literal byte in the input stream’s sequence, if there are any, the decompressor simply copies the byte to the output stream directly, and updates the history buffer accordingly.

The range of the four-bit compressed data run length value includes the following:

Following the offset code, if any, is yet another compression code byte that repeats the entire decompression sequence.

To decompress the data to the output stream, the offset code is used as an index into the "history buffer". The WHIP! data reading application simply copies the specified number of bytes from the history buffer into the output stream starting at the position indicated by the offset code, where an offset code of zero selects the youngest byte in the history buffer. Just as with literal data, as bytes are copied to the output stream they should also be added to the end of the history buffer.

The entire repeating sequence of compression data is terminated whenever a compression code has a value of zero, that is, when both the compressed data run length value is zero and the literal data run length value is zero. This zero valued compression code terminates the operand of the Define Compressed Data opcode, and is followed by the normal extended binary termination character, "}".
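
The decoding loop described above can be sketched as follows. This is not a reference implementation. The function name is hypothetical, and two rules are inferred solely from Table 1 rather than stated in the text: a compressed data run length field of n is taken to mean n + 3 bytes (the table pairs a field of 2 with 5 bytes), and a literal field of 0xF is taken to mean 15 plus the value of the following byte (the table pairs an extension byte of 00 with 15 bytes). An extended compressed run length is not handled, because its encoding is not described here.

```python
def decode_compressed_operand(operand):
    """Decode an operand; return (output bytes, index just past the terminator)."""
    out = bytearray()  # doubles as the history buffer in this sketch
    pos = 0
    while True:
        code = operand[pos]
        pos += 1
        if code == 0:                 # zero compression code ends the operand
            return bytes(out), pos
        literal_len = code & 0x0F     # lower four bits
        compressed_field = code >> 4  # upper four bits

        if literal_len == 0x0F:
            # ASSUMPTION (from Table 1): extended length = 15 + next byte.
            literal_len = 15 + operand[pos]
            pos += 1
        out += operand[pos:pos + literal_len]  # literal bytes are copied as-is
        pos += literal_len

        if compressed_field == 0:
            continue                  # no offset code follows
        if compressed_field == 0x0F:
            raise NotImplementedError("extended compressed run length")
        run = compressed_field + 3    # ASSUMPTION (from Table 1): 2 means 5
        offset = int.from_bytes(operand[pos:pos + 2], "little")
        pos += 2
        start = len(out) - 1 - offset  # offset 0 selects the youngest byte
        if start < 0:
            raise ValueError("offset code points before the start of history")
        for i in range(run):
            out.append(out[start + i])  # copied bytes also extend the history


# Compressed column of Table 1, including the "}" trailer.
TABLE_1 = bytes.fromhex(
    "2F 00 53 48 45 20 53 45 4C 4C 53 20 53 45 41 53 48 09 00"
    " 24 42 59 20 54 16 00 06 41 53 48 4F 52 45 00 7D"
)
text, end = decode_compressed_operand(TABLE_1)
assert text == b"SHE SELLS SEASHELLS BY THE SEASHORE"
assert TABLE_1[end:end + 1] == b"}"
```

Because a two-byte offset code cannot exceed 65,535, keeping the whole output in memory never reaches further back than the 65,536-byte history buffer would. A streaming reader could use a fixed-size ring buffer instead.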

Example

Table 1 shows how the string "SHE SELLS SEASHELLS BY THE SEASHORE" would be compressed:

Table 1. String compression example

Original  ASCII (Hex)  Compressed (Hex)  Comments
S         53           2F                Compression code: 5 compressed bytes and an "extended" number of literal bytes.
H         48           00                Extended literal run length of 15.
E         45           53                S
(space)   20           48                H
S         53           45                E
E         45           20                (space)
L         4C           53                S
L         4C           45                E
S         53           4C                L
(space)   20           4C                L
S         53           53                S
E         45           20                (space)
A         41           53                S
S         53           45                E
H         48           41                A
E         45           53                S
L         4C           48                H
L         4C           09                Two-byte (little-endian) offset code of 0x0009 into the history buffer, low byte.
S         53           00                Offset code, high byte.
(space)   20           24                Compression code: 5 compressed bytes and 4 literal bytes.
B         42           42                B
Y         59           59                Y
(space)   20           20                (space)
T         54           54                T
H         48           16                Two-byte (little-endian) offset code of 0x0016 into the history buffer, low byte.
E         45           00                Offset code, high byte.
(space)   20           06                Compression code: 0 compressed bytes and 6 literal bytes.
S         53           41                A
E         45           53                S
A         41           48                H
S         53           4F                O
H         48           52                R
O         4F           45                E
R         52           00                Compression code terminator (zero compressed bytes, zero literal bytes).
E         45           7D                } character (Extended binary opcode trailer).

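Note: in this example the compressed stream, including its terminator and trailer, is as long as the 35-byte original. Longer strings would, of course, get much higher compression ratios.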