Quantum compression format

The Quantum compression format is a somewhat obscure method invented by a company called Cinematronics (written by an author identified as David Stafford, "davids a& cruzio.com"). Rights to it were purchased at some point by Microsoft, and Quantum is one of the possible compression methods in a CAB archive. This document does not describe the Quantum header nor the CAB details, only the compressed data.

Overview

The Quantum compressor is a large-window LZ77-type compressor combined with an integer arithmetic coder. In this it differs from gzip/zlib and LZX which use Huffman compressors instead of arithmetic. An arithmetic coder is typically more efficient, but slower. (As for the arithmetic coding patents, Quantum doesn't use skew coding or bit stuffing — it uses bit-plus-follow, which is apparently not patented. And the models are not those of the patented Q-coder. So you should be reasonably safe, but I Am Not A Lawyer)

Arithmetic basics

The Quantum arithmetic coder is similar to the bitwise coder described under "Practical Arithmetic Coding with Renormalization" at Charles Bloom's site. I will not describe it further here. What differentiates Quantum is the statistical models it uses as input to the coder.

Model Basics

All the models used by the Quantum compressor share the same structure and update technique. All have a table with some number of entries, each entry composed of a symbol and an associated cumulative frequency. The cumulative frequency is the frequency of all the symbols which are at a higher index in the table than that symbol — thus the last entry in the table has a cumulative frequency of 0. Each model also has a total frequency (the frequency of all the symbols in the table), and a number time_to_reorder.

All the models are initialized with the symbols in symbol order in the table, and with every symbol in the table having a frequency of 1. Thus the initial total frequency is equal to the number of entries in the table. The initial time_to_reorder value is 4.

The way the models are updated is as follows: Each time a symbol is encoded or decoded, all the cumulative frequencies up to (but not including) that symbol have 8 added to them. If this causes the total frequency to exceed 3800, time_to_reorder is decremented.

If time_to_reorder has not reached zero, each cumulative frequency in the table, starting with the last, is divided (with truncation) by 2. If this causes the cumulative frequency to become less than or equal to the cumulative frequency of the next entry, the entry cumulative frequency is instead set to the next entry cumulative frequency plus 1.

If time_to_reorder HAS reached zero, the frequencies (not cumulative frequencies) for each entry in the table are computed. Each frequency is divided by 2 with rounding. The table is then sorted by frequency (highest first) using an in-place selection sort (not stable!) and the cumulative frequencies recomputed. The time_to_reorder is set to 50.

Selector and literal models

To start decompressing a Quantum archive, read a symbol from the code stream using the selector model. This is a very simple model of 7 entries consisting of the symbols 0-6. What to do next depends on the value of the selector. If the selector is 0,1,2, or 3, a symbol is read using a literal model. These models encode symbols to be directly output. They are, respectively:

Selector Number of entries Starting symbol
0 64 0
1 64 64
2 64 128
3 64 192

LZ Compression

Selectors 4,5, and 6 implement Lempel-Ziv matches. Selector 4 encodes 3 character matches, selector 5 implements 4 character matches, and selector 6 encodes 5+ character matches. Two character matches are encoded as literals. All three selectors use a position_slot and extra_bits table given below, and selector 6 uses a length_slot and length_extra_bits table.

position_slot position_base extra_bits
0 0 0
1 1 0
2 2 0
3 3 0
4 4 1
5 6 1
6 8 2
7 12 2
8 16 3
9 24 3
10 32 4
11 48 4
12 64 5
13 96 5
14 128 6
15 192 6
16 256 7
17 384 7
18 512 8
19 768 8
20 1024 9
21 1536 9
22 2048 10
23 3072 10
24 4096 11
25 6144 11
26 8192 12
27 12288 12
28 16384 13
29 24576 13
30 32768 14
31 49152 14
32 65536 15
33 98304 15
34 131072 16
35 196608 16
36 262144 17
37 393216 17
38 524288 18
39 786432 18
40 1048576 19
41 1572864 19
length_slot length_base length_extra_bits
0 0 0
1 1 0
2 2 0
3 3 0
4 4 0
5 5 0
6 6 1
7 8 1
8 10 1
9 12 1
10 14 2
11 18 2
12 22 2
13 26 2
14 30 3
15 38 3
16 46 3
17 54 3
18 62 4
19 78 4
20 94 4
21 110 4
22 126 5
23 158 5
24 190 5
25 222 5
26 254 0

The match selectors use several models to encode match length slots and position slots. The size of each position model depends on the compression table size. For example, if the table size is 1K, there is no need to encode position slots of 1K or greater. The number of position slots for each table size, and thus the position model size, is given in the Appendix. The 4 and 5 selectors also have a smaller maximum position slot regardless of table size; this implies that small matches are not emitted beyond a certain point. (In fact, the original compressor often doesn't emit small matches even within the range of its position slot table, but that is an implementation detail.)

Selector Number of entries Starting symbol
4 max 24 0
5 max 36 0
6 (position) max 42 0
6 (length) 27 0
(note that the models are separate, even if they happen to be the same size. That is, they each update separately and are not affected by updates to each other)

To decompress a selector 4 or 5 match, read a symbol from the code stream using the appropriate model. Look up the symbol in the extra_bits table, and read that many bits. Add the position_base from the position_slot table to the extra bits, and add one. This is the match offset. The match length is 3 (selector 4) or 4 (selector 5).

To decompress a selector 6 match, read a symbol from the code stream using the selector 6 length model. Look up the symbol in the length_extra_bits table, and read that many bits. Add the length_base from the length_slot table to the extra bits, and add 5. This is the match length. Now read a symbol from the code stream using the selector 6 position model. Look up the symbol in the extra_bits table, and read that many bits. Add the position_base from the position_slot table to the extra bits just read, and add 1. This is the match offset.

Framing and Padding — Microsoft CAB

This paragraph is applicable only to the CAB implementation
In order to ensure that there are an integral number of uncompressed bytes per CAB block, no LZ match will cause the uncompressed stream to cross an $8000-byte boundary. Further, the arithmetic coder is stopped at each $8000 byte boundary, with any remaining bits in the block set to zero. The models and LZ window are not affected by crossing such a boundary.

Checksum — Cinematronics original

This paragraph is applicable only to the Cinematronics implementation
Each time you output a byte, whether from a literal or a match, XOR it into a checksum accumulator (inital value 0), then rotate the accumulator one bit left. There is no EOF symbol, so keep reading until you've output the number of bytes you know are in the file. Then read 16 bits from the stream. These 16 bits should match your checksum accumulator. You're done.

EOF when encoding

Note that because of the way the decoder works, it has already taken in 16 + bit_plus_follow bits past the point the encoder had stopped. Some of these bits are significant. The method used in the CACM87 coder to handle this does not seem to work. Instead, what seems to work is: put a 0. Put bit-plus-follow 1s. Put the bottom 15 bits of the low end of the current range. Then put the checksum. (this is not the actual method Quantum uses)

The method used in CACM87 is, after the last symbol is processed through the coder, the bit-plus-follow counter is incremented and the low end range value is checked. If it is below 1/4 (0x3FFF), a one (with follow) is output. Otherwise, a zero (with follow) is output. I'm not sure whether this is valid or works 'by accident' in their application.

I hit upon my method by considering what would happen if more symbols had been sent, with very low cumulative frequency. In that case, the high end of the range gets set equal (or nearly, anyway) to the low end of the range, and the bits output are exactly those from the low end of the range, with the bit-plus-follow bits from the last real symbol, if any, after the high bit of the low end of the range. (the constant 0 bit is because CS_L is always less than 1/2 at the end of the encoding loop).

Hmm... new method based on the MACM-96 encoder. That one outputs the code with the most trailing zeros. Not helpful efficiencywise for Quantum, but it cleans things up a bit. The important thing to realize is that all that is necessary is to send a code between CS_L and CS_H. But at termination, CS_H & 0x8000 = 1 and CS_L & 0x8000 = 0. So the code to be sent is simply 0x8000. Since we're sending a 1 bit first, the follow bits are 0 and therefore we send 0x8000 + bit_plus_follow zeros. This is also not the method Quantum uses; they appear to send a zero bit first.

If anyone knows the "correct" method, or the one Quantum uses, please let me know.

Appendix A: Quantum tables in C format

static int position_slot[] =
{
  0x00000, 0x00001, 0x00002, 0x00003, 0x00004, 0x00006, 0x00008, 0x0000c,
  0x00010, 0x00018, 0x00020, 0x00030, 0x00040, 0x00060, 0x00080, 0x000c0,
  0x00100, 0x00180, 0x00200, 0x00300, 0x00400, 0x00600, 0x00800, 0x00c00,
  0x01000, 0x01800, 0x02000, 0x03000, 0x04000, 0x06000, 0x08000, 0x0c000,
  0x10000, 0x18000, 0x20000, 0x30000, 0x40000, 0x60000, 0x80000, 0xc0000,
  0x100000, 0x180000
};
static int extra_bits[] = /* position extra_bits */
{
   0,  0,  0,  0,  1,  1,  2,  2,
   3,  3,  4,  4,  5,  5,  6,  6,
   7,  7,  8,  8,  9,  9, 10, 10,
  11, 11, 12, 12, 13, 13, 14, 14,
  15, 15, 16, 16, 17, 17, 18, 18,
  19, 19
};
static int length_slot[] =
{
  0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x08,
  0x0a, 0x0c, 0x0e, 0x12, 0x16, 0x1a, 0x1e, 0x26,
  0x2e, 0x36, 0x3e, 0x4e, 0x5e, 0x6e, 0x7e, 0x9e,
  0xbe, 0xde, 0xfe
};
static int length_extra_bits[] =
{
   0,  0,  0,  0,  0,  0,  1,  1,
   1,  1,  2,  2,  2,  2,  3,  3,
   3,  3,  4,  4,  4,  4,  5,  5,
   5,  5,  0
};
static int num_position_slots[] = /* number of position slots for (tsize - 10) */
{
  20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42
};

Appendix B -- the arithmetic coder (decoding section, partial)


ushort getfreq(ushort totfreq)
/* get frequency from code */
{
  ULONG range;
  ULONG freq;

  range = ((CS_H - CS_L)&0xFFFF) + 1;
  freq = ((CS_C - CS_L + 1) * totfreq - 1)/range;
  return freq & 0xFFFF;
}

int getcode(int cumfreqm1, int cumfreq, int totfreq)
/* the decoder renormalization loop */
{
  ULONG range;
  int uf;

  range = (CS_H - CS_L) + 1;
  CS_H = CS_L + ((cumfreqm1 * range)/totfreq) - 1;
  CS_L = CS_L + (cumfreq * range)/totfreq;

  while (1) {
    if ((CS_L & 0x8000) != (CS_H & 0x8000)) {
      if ((CS_L&0x4000) && !(CS_H&0x4000)) {
        /* underflow case */
        CS_C ^= 0x4000;
        CS_L &= 0x3FFF;
        CS_H |= 0x4000;
      }
      else 
        break;
    }
    CS_L <<= 1;
    CS_H = (CS_H << 1) | 1;
    CS_C  = (CS_C << 1) | getbit();
  }

struct modelsym {
  ushort sym;
  ushort cumfreq;
};

struct model {
  int entries; 
  struct modelsym syms[1];
};

getsym(struct model *model) {
  int freq;
  int i;
  int sym;

  freq = getfreq(model->syms[0].cumfreq);
  for (i = 1; i < model->entries; i++) {
    if (model->syms[i].cumfreq <= freq)
      break;
  }
  sym = model->syms[i-1].sym;

  getcode(model->syms[i-1].cumfreq,
          model->syms[i].cumfreq,
          model->syms[0].cumfreq);

  update_model(model, i);

  return sym;
}