Dataset Compression Encoding Types

Encoding Types

Choosing an appropriate encoding type for GIS data, based on how the data will be used, improves system performance and saves storage space. Several encoding methods are provided, each with its own characteristics and applicable scenarios, as shown in the table below:

Encoding Types Description
Unencoded

No encoding method is used.

SGL

SGL (SuperMap Grid LZW) is a compression storage format defined by SuperMap. It is essentially an improved LZW encoding that compresses more efficiently than plain LZW.

SuperMap currently uses SGL for the compressed storage of grid datasets and DEM datasets. It is a lossless compression method suitable for raster datasets.

DCT

DCT (Discrete Cosine Transform) is a transformation encoding method widely used in image compression.

This transform offers a good balance between compression capability, reconstructed image quality, range of applicability, and algorithmic complexity, which has made it one of the most widely used image compression techniques.

Its principle is to reduce the strong correlation present in the original spatial domain representation of images through transformation, allowing signals to be expressed more compactly. This method has high compression ratios and performance, but the encoding is lossy.

Because image datasets are generally not used for precise analysis, this loss is usually acceptable, so DCT encoding is suitable for image datasets.

LZW

LZW is a widely adopted dictionary-based compression method, initially used for compressing text data.

The principle of LZW encoding is to replace a string with a code, so subsequent identical strings use the same code. This encoding method can compress both repetitive and non-repetitive data.

It is a lossless compression encoding method suitable for compressing indexed color images and is applicable to raster and image datasets.

PNG

The PNG compression encoding method supports images with various bit depths and is a lossless compression method suitable for image datasets.

JPEG

JPEG is a lossy compression method. With no perceptible difference in visual quality, the compression ratio can reach 1/20 to 1/40. Because of this high compression ratio, it is suitable for image data used as background images.

Compound

Compound mode achieves a compression ratio close to that of DCT encoding and mainly addresses the distortion of boundary image blocks caused by DCT compression. It applies to image datasets in RGB format.

4-byte

Int32 mode. Uses four bytes to store each coordinate value. Applies to vector datasets, but not to point, tabular, CAD, or 3D vector datasets.

3-byte

Int24 mode. Uses three bytes to store each coordinate value. Applies to vector datasets, but not to point, tabular, CAD, or 3D vector datasets.

2-byte

Int16 mode. Uses two bytes to store each coordinate value. Applies to vector datasets, but not to point, tabular, CAD, or 3D vector datasets.

Single-byte

Byte mode. Uses one byte to store each coordinate value. Applies to vector datasets, but not to point, tabular, CAD, or 3D vector datasets.
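
The dictionary principle described for LZW in the table above can be sketched in Python as follows. This is the textbook algorithm operating on bytes, shown only as an illustration; it is not SuperMap's LZW or SGL implementation, and the function names are hypothetical.

```python
def lzw_compress(data: bytes) -> list[int]:
    """Encode bytes as a list of integer codes using textbook LZW."""
    # Start with one dictionary entry per possible byte value.
    dictionary = {bytes([i]): i for i in range(256)}
    next_code = 256
    current = b""
    codes = []
    for value in data:
        candidate = current + bytes([value])
        if candidate in dictionary:
            # Keep extending the string while it is already known.
            current = candidate
        else:
            # Emit the code of the longest known string, then learn the new one.
            codes.append(dictionary[current])
            dictionary[candidate] = next_code
            next_code += 1
            current = bytes([value])
    if current:
        codes.append(dictionary[current])
    return codes


def lzw_decompress(codes: list[int]) -> bytes:
    """Rebuild the original bytes; the dictionary is reconstructed on the fly."""
    if not codes:
        return b""
    dictionary = {i: bytes([i]) for i in range(256)}
    next_code = 256
    previous = dictionary[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == next_code:
            # Special case: the code refers to the string being defined right now.
            entry = previous + previous[:1]
        else:
            raise ValueError(f"invalid LZW code: {code}")
        out.append(entry)
        dictionary[next_code] = previous + entry[:1]
        next_code += 1
        previous = entry
    return b"".join(out)


# A raster row with long runs of identical cell values compresses well.
row = bytes([7] * 16 + [3] * 16)
codes = lzw_compress(row)
print(len(row), "values ->", len(codes), "codes")
```

Because decompression rebuilds exactly the same dictionary, the round trip returns the original bytes unchanged, which is what makes this family of methods lossless.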

Applicable Scope of Encoding Types

Four compression encoding methods are supported for vector datasets: single-byte, 2-byte, 3-byte, and 4-byte encoding. All four share the same compression mechanism and are lossy; they differ only in compression ratio.

For raster data (image datasets and grid datasets), four compression encoding methods are available: DCT, SGL, LZW, and Compound. Among them, DCT and Compound encoding are lossy, while SGL and LZW are lossless.
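
To make the lossy side of this distinction concrete, the sketch below applies a one-dimensional orthonormal DCT to a row of pixel values, quantizes the coefficients, and reconstructs the row. It is a simplified illustration under stated assumptions: real image encoders work on two-dimensional blocks, the quantization step `step` is a hypothetical parameter, and none of this is taken from SuperMap's DCT implementation.

```python
import math


def dct(block):
    """Orthonormal DCT-II of a sequence of numbers."""
    n = len(block)
    out = []
    for k in range(n):
        total = sum(v * math.cos(math.pi * (i + 0.5) * k / n)
                    for i, v in enumerate(block))
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * total)
    return out


def idct(coeffs):
    """Inverse of dct() (orthonormal DCT-III)."""
    n = len(coeffs)
    out = []
    for i in range(n):
        total = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
            total += scale * c * math.cos(math.pi * (i + 0.5) * k / n)
        out.append(total)
    return out


def lossy_roundtrip(block, step):
    """Quantize DCT coefficients to multiples of `step`, then reconstruct."""
    quantized = [round(c / step) for c in dct(block)]   # information is discarded here
    return [round(v) for v in idct([q * step for q in quantized])]


pixels = [52, 55, 61, 66, 70, 61, 64, 73]
restored = lossy_roundtrip(pixels, step=16)
errors = [abs(a - b) for a, b in zip(pixels, restored)]
print(restored, "max error:", max(errors))
```

The transform itself is invertible; the loss comes from rounding the coefficients, which is also where the compression gain comes from, since the small integers that remain are cheap to store.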

Note that point datasets, tabular datasets, and CAD datasets do not support compression encoding.

As shown in the diagram, for an object in a line dataset with a minimum bounding rectangle width of Width and height of Height, the compression ratio for single-byte encoding is:

ratio = max(Width, Height) / 255

Here, 255 is the maximum value representable by a single byte. Assume the bottom-left corner of the line object is at (x0, y0). For a point (x, y) on the line object, the encoded coordinates are:
x' = byte[(x - x0) / ratio]
y' = byte[(y - y0) / ratio]

Each encoded coordinate value is therefore stored in a single byte instead of an 8-byte double-precision value, reducing coordinate storage to about 1/8 of the unencoded size. Encoding double-precision data into a single byte necessarily loses precision. The maximum precision loss for a coordinate value is the value of ratio.
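
The single-byte formulas above can be sketched in Python as follows. This is a minimal illustration, not SuperMap's implementation: the function names are hypothetical, and truncation toward zero is assumed for the byte[...] cast.

```python
def encode_single_byte(points):
    """Quantize (x, y) doubles to one byte per coordinate value."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x0, y0 = min(xs), min(ys)              # bottom-left corner of the bounding rectangle
    width, height = max(xs) - x0, max(ys) - y0
    ratio = max(width, height) / 255       # 255 is the largest single-byte value
    payload = bytearray()
    for x, y in points:
        for offset in (x - x0, y - y0):
            # Truncate like the byte[...] cast in the formulas (an assumption);
            # min() guards against floating-point overshoot past 255.
            code = 0 if ratio == 0 else min(255, int(offset / ratio))
            payload.append(code)
    return x0, y0, ratio, bytes(payload)


def decode_single_byte(x0, y0, ratio, payload):
    """Rebuild approximate coordinates from the stored bytes."""
    return [
        (x0 + payload[i] * ratio, y0 + payload[i + 1] * ratio)
        for i in range(0, len(payload), 2)
    ]


line = [(100.0, 200.0), (110.5, 203.25), (151.0, 210.0)]
x0, y0, ratio, payload = encode_single_byte(line)
restored = decode_single_byte(x0, y0, ratio, payload)
# Largest difference between an original and a restored coordinate value.
worst = max(max(abs(a - c), abs(b - d)) for (a, b), (c, d) in zip(line, restored))
print(len(payload), "bytes, ratio:", ratio, "worst error:", worst)
```

Note that the origin (x0, y0) and ratio must be kept alongside the bytes so that the coordinates can be restored; the sketch returns them together with the payload for that reason.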

For other vector compression encoding methods, the principle is the same. The maximum precision loss for compressed coordinates is:
ratio = max(Width, Height) / maxValue

Here:

  • Width and Height are the width and height of the geometric object's minimum bounding rectangle.
  • maxValue is the maximum value representable by the encoding data type (255 for single-byte encoding, 65,535 for 2-byte, 16,777,215 for 3-byte, and 4,294,967,295 for 4-byte encoding).
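
As a worked example of this formula, the following Python sketch computes the maximum precision loss for each vector encoding type. The dictionary keys and the function name are illustrative; the maxValue figures are the ones listed above.

```python
# Largest value representable by each encoding type (unsigned integers).
MAX_VALUE = {
    "single-byte": 2**8 - 1,    # 255
    "2-byte": 2**16 - 1,        # 65,535
    "3-byte": 2**24 - 1,        # 16,777,215
    "4-byte": 2**32 - 1,        # 4,294,967,295
}


def max_precision_loss(width, height, encoding):
    """ratio = max(Width, Height) / maxValue, in the dataset's coordinate units."""
    return max(width, height) / MAX_VALUE[encoding]


# Bounding rectangle of 10,000 x 4,000 coordinate units.
for name in MAX_VALUE:
    print(f"{name:>11}: {max_precision_loss(10_000.0, 4_000.0, name):.3g}")
```

For an object whose bounding rectangle is 10,000 units wide, the loss is roughly 39.2 units with single-byte encoding, 0.153 with 2-byte, 0.0006 with 3-byte, and 0.0000023 with 4-byte. Large objects therefore lose more precision than small ones under the same encoding type.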

Caution:

  • Raster Datasets: Only LZW and SGL are supported. If the original encoding method is DCT or Compound, SGL encoding can be chosen instead. Non-grayscale 8-bit palette images do not support DCT encoding and can be converted to LZW encoding.
  • Vector Compression Example: Take single-byte encoding of a line dataset as an example. Assume the spatial data of the uncompressed line dataset is stored as double-precision data. Single-byte encoding then compresses the storage as described above.
  • Images and Raster Datasets: Choosing a compression encoding method that suits the pixel format improves system performance and saves storage space. The table below lists suitable encoding methods for different pixel formats in image and raster datasets.
  • For example, 1-bit monochrome data supports LZW and PNG as the original encoding type. When the original encoding type is DCT, SGL, or Compound, LZW encoding can be selected instead.

Encoding Methods for Datasets

The encoding method can be set when creating, copying, importing, or exporting datasets. To check a dataset's current encoding method, right-click the dataset and open its properties window.

Create Dataset

When creating a new vector dataset, you can set its encoding method. A dropdown menu will provide available encoding methods based on the dataset type.

Copy Dataset

When copying a dataset, you can also set its encoding method. The dropdown menu will show encoding methods compatible with the dataset type.

Dataset Type        Encode Type
Vector Dataset      Single-byte, 2-byte, 3-byte, 4-byte
DEM/Grid Dataset    SGL, LZW
Image Dataset       LZW, DCT, PNG

Import Dataset

When importing external-format datasets, you can select the encoding method. The default encoding for image datasets is DCT, while other formats are unencoded by default.

Export Image Dataset

Right-click within the map window, choose Export as Image Dataset from the context menu, and select the encoding method: DCT, LZW, PNG, or None (no encoding).