llmcompressor.modifiers.pruning.wanda.base
Classes:
- WandaPruningModifier – Modifier for applying the one-shot WANDA algorithm to a model
WandaPruningModifier
Bases: SparsityModifierBase
Modifier for applying the one-shot WANDA algorithm to a model from the paper: https://arxiv.org/abs/2306.11695
Sample yaml:
    test_stage:
      sparsity_modifiers:
        WandaPruningModifier:
          sparsity: 0.5
          mask_structure: "2:4"
Lifecycle:
- on_initialize
    - register_hook(module, calibrate_module, "forward")
    - run_sequential / run_layer_sequential / run_basic
        - make_empty_row_scalars
        - accumulate_row_scalars
- on_sequential_batch_end
    - sparsify_weight
- on_finalize
    - remove_hooks()
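The lifecycle above collects one scalar per input channel during calibration and then uses those scalars to decide which weights to zero. As a rough, dependency-free sketch of the scoring rule described in the WANDA paper (each weight is scored by its magnitude times the L2 norm of the matching input feature, and the lowest-scoring weights in every output row are removed), something like the following could be used. The function name `wanda_prune_rows` and the plain-list representation are illustrative assumptions; they are not llmcompressor's API, which works on torch tensors.

```python
import math


def wanda_prune_rows(weight, row_scalars, sparsity):
    """Zero the lowest-scoring weights in each output row (hypothetical helper).

    weight:      list of rows, shape [out_features][in_features]
    row_scalars: squared L2 norm of each input feature, length in_features
    sparsity:    fraction of weights to remove per row, in [0, 1]
    """
    norms = [math.sqrt(s) for s in row_scalars]
    pruned = []
    for row in weight:
        # WANDA score: |W_ij| * ||X_j||_2
        scores = [abs(w) * n for w, n in zip(row, norms)]
        num_prune = int(len(row) * sparsity)
        # Indices of the num_prune smallest scores in this row.
        order = sorted(range(len(row)), key=lambda j: scores[j])
        drop = set(order[:num_prune])
        pruned.append([0.0 if j in drop else w for j, w in enumerate(row)])
    return pruned


weight = [[0.5, -2.0, 1.0, 0.1],
          [3.0, 0.2, -0.4, 1.5]]
row_scalars = [16.0, 0.01, 1.0, 4.0]  # squared norms of the four input features
pruned = wanda_prune_rows(weight, row_scalars, sparsity=0.5)
```

In the first row the weight `-2.0` has the largest magnitude but is still removed, because its input feature has a very small norm. This is the behaviour that separates activation-aware scoring from plain magnitude pruning.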
Parameters:
- sparsity – Sparsity to compress the model to
- sparsity_profile – Can be set to 'owl' to use Outlier Weighted Layerwise Sparsity (OWL); more information can be found in the paper https://arxiv.org/pdf/2310.05175
- mask_structure – String defining the structure of the mask to apply. Must be of the form N:M, where N and M are integers that define a custom block shape. Defaults to 0:0, which represents an unstructured mask.
- owl_m – Number of outliers to use for OWL
- owl_lmbda – Lambda value to use for OWL
- sequential_targets – List of layer names to compress during pruning, or 'ALL' to compress every layer in the model. Alias for targets.
- targets – List of layer names to compress during pruning, or 'ALL' to compress every layer in the model. Alias for sequential_targets.
- ignore – Optional list of module class names or submodule names to exclude from pruning even if they match a target. Defaults to an empty list.
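To make the `mask_structure` parameter more concrete, here is a minimal sketch of how an `N:M` string could be parsed and applied to one row of weights. It assumes that N is the number of weights zeroed in every consecutive group of M (so `"2:4"` removes the two lowest-scoring weights out of every four) and that `"0:0"` means no block structure is imposed. The helper names `parse_mask_structure` and `apply_n_m_mask` are hypothetical and are not part of llmcompressor.

```python
def parse_mask_structure(mask_structure):
    """Split an "N:M" string into two non-negative integers (hypothetical helper)."""
    n_str, m_str = mask_structure.split(":")
    n, m = int(n_str), int(m_str)
    if n < 0 or m < 0 or n > m:
        raise ValueError(f"invalid mask structure: {mask_structure!r}")
    return n, m


def apply_n_m_mask(scores, weights, mask_structure):
    """Zero the n lowest-scoring weights in every consecutive group of m.

    Assumption: N counts the weights removed per group of M.
    """
    n, m = parse_mask_structure(mask_structure)
    if m == 0:
        # "0:0": unstructured, so this helper leaves the row untouched.
        return list(weights)
    if len(weights) % m != 0:
        raise ValueError("row length must be a multiple of M")
    out = list(weights)
    for start in range(0, len(weights), m):
        group = range(start, start + m)
        for j in sorted(group, key=lambda j: scores[j])[:n]:
            out[j] = 0.0
    return out


scores = [4, 1, 3, 2, 1, 2, 3, 4]
weights = [0.4, 0.1, 0.3, 0.2, 0.1, 0.2, 0.3, 0.4]
masked = apply_n_m_mask(scores, weights, "2:4")
```

Each group of four keeps exactly two non-zero entries, which is the pattern that 2:4 sparse hardware kernels expect.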
Methods:
- calibrate_module – Calibration hook used to accumulate the row scalars of the input to the module
- compress_modules – Sparsify modules which have been calibrated
calibrate_module
Calibration hook used to accumulate the row scalars of the input to the module
Parameters:
- module (Module) – module being calibrated
- args (Tuple[Tensor, ...]) – inputs to the module, the first element of which is the canonical input
- _output (Tensor) – uncompressed module output, unused
Source code in llmcompressor/modifiers/pruning/wanda/base.py
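The hook receives one calibration batch at a time, so the per-feature scalars have to be updated incrementally. One way to do that, sketched here without any dependencies, is a running mean of squared activations per input feature. The name `accumulate_squared_norms`, its signature, and the choice to normalise by the number of samples seen are assumptions made for illustration, not the library's implementation. Because WANDA only ranks weights within a row, any positive rescaling of the scalars leaves the pruning decision unchanged.

```python
def accumulate_squared_norms(batch, row_scalars, num_samples):
    """Fold one batch into the running per-feature scalars (hypothetical helper).

    batch:       list of input rows, each a list of in_features floats
                 (taken from the first element of the hook's args)
    row_scalars: running mean of squared activations per input feature
    num_samples: number of input rows folded in so far
    Returns the updated (row_scalars, num_samples).
    """
    num_added = len(batch)
    total = num_samples + num_added
    # Sum of squares of each input feature over the new batch.
    batch_sq = [sum(row[j] ** 2 for row in batch) for j in range(len(row_scalars))]
    # Rescale the old mean to the new sample count, then add the new contribution.
    updated = [
        s * (num_samples / total) + sq / total
        for s, sq in zip(row_scalars, batch_sq)
    ]
    return updated, total


scalars, seen = [0.0, 0.0], 0
scalars, seen = accumulate_squared_norms([[1.0, 2.0], [3.0, 0.0]], scalars, seen)
scalars, seen = accumulate_squared_norms([[1.0, 2.0]], scalars, seen)
```

After both batches, `scalars` equals the total sum of squares for each feature divided by the three rows seen, the same value a single pass over all the data would give.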
compress_modules
Sparsify modules which have been calibrated