llmcompressor.transformers.compression.sparsity_metadata_config
Classes:

- SparsityConfigMetadata – Class of helper functions for filling out a SparsityCompressionConfig with readable metadata from the model

SparsityConfigMetadata

Class of helper functions for filling out a SparsityCompressionConfig with readable metadata from the model
Methods:

- fill_config_details – Fills in informational sparsity parameters from a given model
- from_pretrained – Determines compression type and informational parameters for a given model
- infer_global_sparsity – Calculates the global percentage of sparse zero weights in the model
- infer_sparsity_structure – Determines what sparsity structure, if any, was applied
- is_sparse24_bitmask_supported – Determines if the sparse 2:4 bitmask compressor is supported for a given model
fill_config_details staticmethod

fill_config_details(
    config: SparsityCompressionConfig,
    model: Module,
    state_dict: Optional[Dict[str, Tensor]] = None,
)
Fills in informational sparsity parameters from a given model
Parameters:

- config (SparsityCompressionConfig) – sparsity config to fill in
- model (Module) – pytorch model to infer config parameters from
- state_dict (Optional[Dict[str, Tensor]], default: None) – optional state_dict to replace that in model, used for gathering global FSDP model info
Source code in llmcompressor/transformers/compression/sparsity_metadata_config.py
from_pretrained staticmethod

from_pretrained(
    model: Module,
    state_dict: Optional[Dict[str, Tensor]] = None,
    compress: bool = False,
    quantization_format: Optional[CompressionFormat] = None,
    disable_sparse_compression: bool = False,
    sparsity_structure: Optional[str] = None,
) -> Optional[SparsityCompressionConfig]
Determines compression type and informational parameters for a given model
Parameters:

- model (Module) – pytorch model to calculate sparsity config for
- state_dict (Optional[Dict[str, Tensor]], default: None) – optional state_dict to replace that in model, used for gathering global FSDP model info
- compress (bool, default: False) – whether or not to compress the model on disk
- quantization_format (Optional[CompressionFormat], default: None) – the quantization compression format being used for the model
- disable_sparse_compression (bool, default: False) – whether or not to skip compressing the model with sparse compressors; if True, the sparse compression format will be dense
- sparsity_structure (Optional[str], default: None) – sparsity structure for the model; providing it as input will skip the step to infer it from the model directly

Returns:

- Optional[SparsityCompressionConfig] – compression config inferred from the model
Source code in llmcompressor/transformers/compression/sparsity_metadata_config.py
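The decision flow these parameters describe can be sketched in plain Python. The helper below and its format strings are illustrative assumptions for intuition only, not llmcompressor's actual implementation:

```python
from typing import Optional

# Hypothetical sketch of from_pretrained's format selection: disabling
# sparse compression (or not compressing at all) yields a dense format,
# while 2:4 structured sparsity can use a bitmask compressor. The format
# names and this helper are assumptions, not the library's real code.
def choose_sparsity_format(
    sparsity_structure: str,
    compress: bool = False,
    disable_sparse_compression: bool = False,
) -> Optional[str]:
    if disable_sparse_compression or not compress:
        # Sparsity metadata is still recorded, but weights stay dense.
        return "dense"
    if sparsity_structure == "2:4":
        return "sparse-24-bitmask"
    return "sparse-bitmask"

print(choose_sparsity_format("2:4", compress=True))  # sparse-24-bitmask
print(choose_sparsity_format("2:4", compress=True, disable_sparse_compression=True))  # dense
```

Note how `disable_sparse_compression=True` forces the dense format, matching the parameter description above.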
infer_global_sparsity staticmethod

infer_global_sparsity(
    model: Module,
    state_dict: Optional[Dict[str, Tensor]] = None,
) -> float

Calculates the global percentage of sparse zero weights in the model
Parameters:

- model (Module) – pytorch model to infer sparsity of
- state_dict (Optional[Dict[str, Tensor]], default: None) – optional state_dict to replace that in model, used for gathering global FSDP model info

Returns:

- float – global sparsity of model
Source code in llmcompressor/transformers/compression/sparsity_metadata_config.py
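The quantity this method computes is simply the share of exactly-zero entries across all weight tensors. A torch-free sketch, using plain lists as a stand-in for tensors in a state_dict (the function name mirrors the method above but this is not its real implementation):

```python
from typing import Dict, List

# Sketch: global sparsity as the percentage of zero-valued weight entries
# aggregated over every tensor in a state_dict. Plain Python lists stand
# in for torch tensors here.
def infer_global_sparsity(state_dict: Dict[str, List[float]]) -> float:
    zeros = total = 0
    for weights in state_dict.values():
        zeros += sum(1 for w in weights if w == 0.0)
        total += len(weights)
    return 100.0 * zeros / total  # global sparsity as a percentage

state_dict = {
    "layer1.weight": [0.0, 0.5, 0.0, 1.2],
    "layer2.weight": [0.0, 0.0, 0.3, 0.0],
}
print(infer_global_sparsity(state_dict))  # 62.5
```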
infer_sparsity_structure staticmethod

infer_sparsity_structure(
    model: Optional[Module] = None,
    check_only_modifiers: Optional[bool] = False,
) -> str
Determines what sparsity structure, if any, was applied.

First, there is an attempt to deduce the sparsity structure from the currently active sparse session. If that fails, the sparsity structure is inferred from the model (if provided). Finally, if both fail, the sparsity structure is set to "unstructured"
Returns:

- str – sparsity structure as a string
Source code in llmcompressor/transformers/compression/sparsity_metadata_config.py
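The model-inspection fallback described above can be illustrated with a toy check: a weight tensor is consistent with 2:4 structured sparsity when every contiguous group of four values contains at least two zeros, and "unstructured" is the fallback otherwise. This is an illustrative sketch, not llmcompressor's implementation:

```python
from typing import List

# Toy structure inference: scan groups of four weights; if any group has
# fewer than two zeros the tensor cannot be 2:4, so fall back to the
# "unstructured" default described above.
def infer_structure_from_weights(weights: List[float]) -> str:
    for i in range(0, len(weights) - 3, 4):
        group = weights[i:i + 4]
        if sum(1 for w in group if w == 0.0) < 2:
            return "unstructured"
    return "2:4"

print(infer_structure_from_weights([0.0, 0.0, 1.1, 2.2, 0.0, 3.3, 0.0, 4.4]))  # 2:4
print(infer_structure_from_weights([1.0, 2.0, 3.0, 0.0]))  # unstructured
```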
is_sparse24_bitmask_supported staticmethod

is_sparse24_bitmask_supported(
    model: Module,
    sparsity_structure: Optional[str] = None,
) -> bool

Determines if the sparse 2:4 bitmask compressor is supported in vLLM for a given model and its sparsity structure
Parameters:

- model (Module) – pytorch model to check for sparse 2:4 bitmask support
- sparsity_structure (Optional[str], default: None) – sparsity structure of the model; if not supplied it will be inferred

Returns:

- bool – whether or not sparse 24 bitmask compression is supported in vLLM for the given model
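For intuition about what a bitmask compressor of the kind named here does, the sketch below stores only the nonzero values of a row plus one bit per position, then reconstructs the row. The names and the integer packing are illustrative assumptions, not the on-disk format used by compressed-tensors:

```python
from typing import List, Tuple

# Sketch of bitmask compression: keep the nonzero values and a bitmask
# marking which positions they came from.
def bitmask_compress(row: List[float]) -> Tuple[int, List[float]]:
    mask = 0
    values: List[float] = []
    for i, w in enumerate(row):
        if w != 0.0:
            mask |= 1 << i  # set bit i for a nonzero position
            values.append(w)
    return mask, values

def bitmask_decompress(mask: int, values: List[float], n: int) -> List[float]:
    out: List[float] = []
    vi = 0
    for i in range(n):
        if mask & (1 << i):
            out.append(values[vi])
            vi += 1
        else:
            out.append(0.0)
    return out

row = [0.0, 1.5, 0.0, -2.0]
mask, vals = bitmask_compress(row)
print(bin(mask), vals)  # 0b1010 [1.5, -2.0]
assert bitmask_decompress(mask, vals, len(row)) == row
```

For a 2:4 structured row, at most two of every four bits are set, which is what makes this layout attractive for that sparsity structure.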