Compression Formats

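The following table lists the quantization and sparsity compressors applied to a model during compression. The compressors used depend on the model's quantization scheme and sparsity type. In the scheme names, WxAy denotes x-bit weights and y-bit activations, and int or float indicates whether those values are integers or floating-point numbers. A sparsity of 2:4 refers to 2:4 semi-structured sparsity, in which at most two of every four consecutive weights are nonzero. For more details on the quantization schemes, see guides/compression_schemes.md.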

| Quantization  | Sparsity | Quant Compressor     | Sparsity Compressor |
|---------------|----------|----------------------|---------------------|
| W8A8 - int    | None     | int_quantized        | Dense               |
| W8A8 - float  | None     | float_quantized      | Dense               |
| W4A16 - float | None     | nvfp4_pack_quantized | Dense               |
| W4A4 - float  | None     | nvfp4_pack_quantized | Dense               |
| W4A16 - int   | None     | pack_quantized       | Dense               |
| W8A16 - int   | None     | pack_quantized       | Dense               |
| W8A16 - float | None     | naive_quantized      | Dense               |
| W8A8 - int    | 2:4      | int_quantized        | Sparse24            |
| W8A8 - float  | 2:4      | float_quantized      | Sparse24            |
| W4A16 - int   | 2:4      | marlin_24            | Dense               |
| W8A16 - int   | 2:4      | marlin_24            | Dense               |
| W8A16 - float | 2:4      | naive_quantized      | Dense               |
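Because each row pairs a quantization scheme and sparsity type with exactly one pair of compressors, the table can be read as a lookup. The sketch below transcribes it into a Python dictionary. The names `COMPRESSION_FORMATS` and `select_compressors` are hypothetical and are not part of the llm-compressor API; the library determines the format itself during compression, and this sketch does not reproduce that selection logic. It only mirrors the rows listed in the table.

```python
from typing import Dict, Optional, Tuple

# Hypothetical lookup transcribed from the table: maps
# (quantization scheme, sparsity) -> (quant compressor, sparsity compressor).
# Python None stands for the table's "None" (dense model); "2:4" is 2:4 sparsity.
COMPRESSION_FORMATS: Dict[Tuple[str, Optional[str]], Tuple[str, str]] = {
    ("W8A8 - int", None): ("int_quantized", "Dense"),
    ("W8A8 - float", None): ("float_quantized", "Dense"),
    ("W4A16 - float", None): ("nvfp4_pack_quantized", "Dense"),
    ("W4A4 - float", None): ("nvfp4_pack_quantized", "Dense"),
    ("W4A16 - int", None): ("pack_quantized", "Dense"),
    ("W8A16 - int", None): ("pack_quantized", "Dense"),
    ("W8A16 - float", None): ("naive_quantized", "Dense"),
    ("W8A8 - int", "2:4"): ("int_quantized", "Sparse24"),
    ("W8A8 - float", "2:4"): ("float_quantized", "Sparse24"),
    ("W4A16 - int", "2:4"): ("marlin_24", "Dense"),
    ("W8A16 - int", "2:4"): ("marlin_24", "Dense"),
    ("W8A16 - float", "2:4"): ("naive_quantized", "Dense"),
}


def select_compressors(scheme: str, sparsity: Optional[str] = None) -> Tuple[str, str]:
    """Return (quant_compressor, sparsity_compressor) for a table row.

    Hypothetical helper: raises ValueError for combinations the table does not list.
    """
    try:
        return COMPRESSION_FORMATS[(scheme, sparsity)]
    except KeyError:
        raise ValueError(
            f"No compression format listed for {scheme!r} with sparsity {sparsity!r}"
        ) from None


if __name__ == "__main__":
    print(select_compressors("W4A16 - int", "2:4"))  # ('marlin_24', 'Dense')
    print(select_compressors("W8A8 - float", "2:4"))  # ('float_quantized', 'Sparse24')
```

Raising an error for unlisted combinations, rather than falling back to a default, keeps the sketch faithful to the table: it answers only for the scheme and sparsity pairs documented above.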