Compression Formats

The following table lists the quantization and sparsity compression formats that can be applied to a model during compression. The format is determined by the quantization scheme and the sparsity type. For details on the quantization schemes, see guides/compression_schemes.md.

| Quantization  | Sparsity | Quant Compressor     | Sparsity Compressor |
|---------------|----------|----------------------|---------------------|
| W8A8 - int    | None     | int_quantized        | Dense               |
| W8A8 - float  | None     | float_quantized      | Dense               |
| W4A16 - float | None     | nvfp4_pack_quantized | Dense               |
| W4A4 - float  | None     | nvfp4_pack_quantized | Dense               |
| W4A16 - int   | None     | pack_quantized       | Dense               |
| W8A16 - int   | None     | pack_quantized       | Dense               |
| W8A16 - float | None     | naive_quantized      | Dense               |
| W8A8 - int    | 2:4      | int_quantized        | Sparse24            |
| W8A8 - float  | 2:4      | float_quantized      | Sparse24            |
| W4A16 - int   | 2:4      | marlin_24            | Dense               |
| W8A16 - int   | 2:4      | marlin_24            | Dense               |
| W8A16 - float | 2:4      | naive_quantized      | Dense               |
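
As a reading aid, the table can be encoded as a plain lookup keyed on the quantization scheme, data type, and sparsity pattern. This is a minimal sketch and not the library's own selection logic: the names `Formats`, `FORMAT_TABLE`, and `lookup_formats` are hypothetical, and the string values are copied verbatim from the table above.

```python
from typing import NamedTuple, Optional


class Formats(NamedTuple):
    """Pair of compressor names, as listed in the table (hypothetical container)."""
    quant_compressor: str
    sparsity_compressor: str


# Keys are (scheme, data type, sparsity); None means no sparsity.
# Values are transcribed from the table, not read from any library.
FORMAT_TABLE = {
    ("W8A8", "int", None): Formats("int_quantized", "Dense"),
    ("W8A8", "float", None): Formats("float_quantized", "Dense"),
    ("W4A16", "float", None): Formats("nvfp4_pack_quantized", "Dense"),
    ("W4A4", "float", None): Formats("nvfp4_pack_quantized", "Dense"),
    ("W4A16", "int", None): Formats("pack_quantized", "Dense"),
    ("W8A16", "int", None): Formats("pack_quantized", "Dense"),
    ("W8A16", "float", None): Formats("naive_quantized", "Dense"),
    ("W8A8", "int", "2:4"): Formats("int_quantized", "Sparse24"),
    ("W8A8", "float", "2:4"): Formats("float_quantized", "Sparse24"),
    ("W4A16", "int", "2:4"): Formats("marlin_24", "Dense"),
    ("W8A16", "int", "2:4"): Formats("marlin_24", "Dense"),
    ("W8A16", "float", "2:4"): Formats("naive_quantized", "Dense"),
}


def lookup_formats(scheme: str, dtype: str, sparsity: Optional[str] = None) -> Formats:
    """Return the compressor pair for a table row, or raise if the row is absent."""
    key = (scheme.upper(), dtype.lower(), sparsity)
    try:
        return FORMAT_TABLE[key]
    except KeyError:
        raise ValueError(f"No table entry for {key!r}") from None


if __name__ == "__main__":
    print(lookup_formats("W4A16", "int", "2:4"))
    print(lookup_formats("W8A8", "float"))
```

Note that, per the table, only the W8A8 schemes with 2:4 sparsity use the Sparse24 sparsity compressor; every other row lists Dense, including the 2:4 rows that use `marlin_24` or `naive_quantized`.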