llmcompressor.modifiers.transform.spinquant.base
Classes:

- SpinQuantModifier – Implements the transforms according to "SpinQuant: LLM quantization with learned rotations"

SpinQuantModifier
Bases: Modifier
Implements the transforms according to "SpinQuant: LLM quantization with learned rotations" (https://arxiv.org/abs/2405.16406)
Transforms (rotations) are extra layers added to a model which reduce the accuracy loss induced by quantization. This is achieved by "rotating" weights and activations into a space with a smaller dynamic range of values, thus decreasing the range of scales required for quantization.
The SpinQuant authors describe four different rotations which can be applied to a model. R1 and R2 are "offline" rotations, meaning that they can be fused into existing weights and therefore do not induce runtime cost. R3 and R4 are "online" rotations, meaning that they require additional computation at runtime.
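For instance, limiting the rotations argument to R1 and R2 yields a purely offline configuration, while adding R3 and R4 registers online hooks. The snippet below is a minimal sketch assuming the class is imported from the module documented here (a shorter re-export may also exist); rotation names and defaults should be checked against your installed version.

```python
# Minimal sketch: offline-only vs. offline + online rotations.
# Import path taken from this module's documentation; a shorter re-export may exist.
from llmcompressor.modifiers.transform.spinquant.base import SpinQuantModifier

# R1 and R2 are fused into existing weights ("offline"), so they add no runtime cost.
offline_only = SpinQuantModifier(rotations=["R1", "R2"], transform_type="hadamard")

# R3 and R4 are "online" rotations and require extra computation at runtime via hooks.
with_online = SpinQuantModifier(
    rotations=["R1", "R2", "R3", "R4"],
    transform_type="hadamard",
)
```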
Lifecycle:

- on_initialize
    - infer SpinQuantMappings & NormMappings
    - as needed, create transform schemes for R1, R2, R3, & R4
- on_start
    - normalize embeddings
    - fuse norm layers into subsequent Linear layers
    - apply TransformConfig
        - fuse transforms into weights for mergeable transforms
        - add hooks for online transforms
- on sequential epoch end
- on_end
- on_finalize
Parameters:

- rotations – A list containing the names of rotations to apply to the model. Possible rotations include R1, R2, R3, and R4.
- transform_type – The type of transform to apply to the model. "hadamard" has the least performance cost but only supports sizes which are powers of two. "random-hadamard" has more performance cost, but supports a much larger set of sizes. "random-matrix" has the greatest performance cost, but supports any size. (See the usage sketch after this list.)
- randomize – If True, create distinct transforms for each application.
- learnable – If True, attach gradients to transform weights for training.
- precision – Precision at which all transforms should be applied. This applies to both weight fusing and online rotations.
- mappings – Specifies layers within a model to target for transforms. A mapping will be inferred if None is provided.
- norm_mappings – Specifies layers within a model to target for norm fusing. A mapping will be inferred if None is provided.
- transform_config – Optional transform config for overriding provided arguments.
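As a usage sketch, the modifier is typically composed with a quantization modifier in a oneshot recipe so that quantization benefits from the reduced dynamic range. The oneshot entrypoint, QuantizationModifier import path, and model ID below are assumptions drawn from the wider llmcompressor API and examples rather than from this class's documentation; verify them against your installed version.

```python
# Hedged usage sketch; entrypoints, schemes, and model ID are illustrative assumptions.
from transformers import AutoModelForCausalLM

from llmcompressor import oneshot  # assumed top-level entrypoint
from llmcompressor.modifiers.quantization import QuantizationModifier  # assumed import path
from llmcompressor.modifiers.transform.spinquant.base import SpinQuantModifier

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B-Instruct")

recipe = [
    # Offline rotations only: fused into weights, no runtime overhead.
    SpinQuantModifier(
        rotations=["R1", "R2"],
        transform_type="hadamard",  # cheapest option; sizes must be powers of two
        randomize=False,            # reuse the same transform for each application
        learnable=False,            # keep transform weights fixed (no gradients)
    ),
    # Quantize Linear layers after rotation.
    QuantizationModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"]),
]

oneshot(model=model, recipe=recipe)
```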