Indexes and Compression

Compression Options:

LZW The default compression. A variant of the Lempel-Ziv-Welch algorithm.
ROW Compresses index entries based on differences between rows (for use with fixed-length records only). It typically does not compress as well as LZW, but takes up less space in memory because the rows are expanded on demand.
'inplace' Causes the index to be built using the inplace compression format. The payload defaults to using lz4 compression.
'inplace:lz4hc' Causes inplace compression on the key fields and lz4hc compression on the payload. The resulting index can be smaller than using lz4.
'inplace:lz4s' Causes inplace compression on the key fields and lz4s compression on the payload. This uses the LZ4 stream API to avoid recompressing the data and reduce the index build times.
'inplace:lz4shc' Causes inplace compression on the key fields and lz4shc compression on the payload. This uses the high compression (HC) version of the LZ4 stream API to avoid recompressing the data and reduce the index build times. This is the default compression for inplace indexes in versions after 9.6.90, 9.8.66, and 9.10.12.
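
As a rough illustration of how the inplace options above might be selected, the sketch below declares the same index twice with different COMPRESSED strings so the resulting file sizes can be compared. The dataset, record names, and logical file names are hypothetical and used only for illustration; the option strings are the ones listed above.

```ecl
// Hypothetical dataset and logical file names, for illustration only
People := DATASET('~demo::people',
                  {STRING2 st,STRING20 city,STRING20 lname},FLAT);

KeyFields := RECORD
  People.st;
  People.city;
END;
PayloadFields := RECORD
  People.lname;
END;

// Inplace key compression with the default payload compression
KeyInplace := INDEX(People,KeyFields,PayloadFields,
                    '~demo::key::people_inplace',
                    COMPRESSED('inplace'));

// Inplace key compression with lz4hc compression on the payload
KeyLz4hc := INDEX(People,KeyFields,PayloadFields,
                  '~demo::key::people_lz4hc',
                  COMPRESSED('inplace:lz4hc'));

// Build both indexes, one after the other
SEQUENTIAL(BUILD(KeyInplace),BUILD(KeyLz4hc));
```

Which variant is preferable depends on the data, so comparing the sizes and build times of the two resulting files is usually more informative than assuming one option is better.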

The inplace index compression format (introduced in version 9.2.0) improves compression of keyed fields and allows them to be searched without decompression. The original index compression implementation decompresses the rows when they are read from disk.

The lz4s and lz4shc inplace index compression formats (introduced in versions 9.6.90, 9.8.66, and 9.10.12 or later) improve compression and reduce build time. These formats require an engine that supports them. In other words, if you build an index using the lz4s or lz4shc formats, you must use a platform later than 9.6.90, 9.8.66, and 9.10.12 to read those indexes.

If you attempt to read an index with the inplace compression format on a system that does not support it, you will receive an error message.

Because the branch nodes can be searched without decompression, more branch nodes fit into memory, which can improve search performance. The lz4 compression used for the payload is significantly faster at decompressing leaf pages than the previous LZW compression. Whether performance is better with lz4hc (a high-compression variant of lz4) on the payload fields depends on the access characteristics of the data and how much of the index is cached in memory.
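
The kind of access this paragraph describes is a filtered read on the leading key fields. The following is a minimal sketch of such a read, assuming an index with the same layout as the example in this section has already been built; the logical file names and the filter values are hypothetical.

```ecl
// Hypothetical names throughout; assumes the index file already exists
Vehicles := DATASET('~demo::vehicles',
                    {STRING2 st,STRING20 city,STRING20 lname},FLAT);

KeyFields := RECORD
  Vehicles.st;
  Vehicles.city;
END;
PayloadFields := RECORD
  Vehicles.lname;
END;

VehicleKey := INDEX(Vehicles,KeyFields,PayloadFields,
                    '~demo::key::vehicles',
                    COMPRESSED('inplace'));

// KEYED marks each filter as one to be resolved using the key fields
Matches := VehicleKey(KEYED(st = 'FL'),KEYED(city = 'MIAMI'));

// Only the payload field is returned
OUTPUT(Matches,{lname});
```

How much a read like this benefits from a given payload compression depends, as noted above, on the access pattern and on how much of the index is cached.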

Compression Levels:

hclevel An integer between 3 and 12 to specify the level of compression. The default is 3. Higher levels increase the compression, but also increase the compression times. This may be cost effective depending on the length of time the data is stored, and the storage costs compared to the compute costs to build the index.
maxcompression The maximum desired compression ratio. This avoids the leaf nodes getting too large when expanded, but increases the size of some indexes. The default is 20.
maxrecompress Specifies the number of times the entire input dataset should be recompressed to free up space. Increasing the number decreases the size of the indexes, and will probably decrease the decompress time slightly (because there are fewer stream blocks), but will increase the build time. The default is 1.

Example:

Vehicles := DATASET('vehicles',
          {STRING2 st,STRING20 city,STRING20 lname},FLAT);

SearchTerms := RECORD
  Vehicles.st;
  Vehicles.city;
END;
Payload     := RECORD
  Vehicles.lname;
END;
// The COMPRESSED option string is kept on a single line
VehicleKey := INDEX(Vehicles,SearchTerms,Payload,'vkey::st.city',
  COMPRESSED('inplace:lz4shc,compressopt(hclevel=9,maxcompression=25,maxrecompress=4)'));
BUILD(VehicleKey);

See Also: DATASET, BUILDINDEX, JOIN, FETCH, KEYED/WILD