llmcompressor.modifiers.transform.quip.base
Classes:

- QuIPModifier – Implements the transforms according to QuIP and QuIP#
QuIPModifier
Bases: Modifier
Implements the transforms according to QuIP#: Even Better LLM Quantization with Hadamard Incoherence and Lattice Codebooks QuIP: 2-Bit Quantization of Large Language Models With Guarantees
Transforms (rotations) are extra layers added to a model which reduce the accuracy loss induced by quantization. This is achieved through "rotating" weights and activations into a space with a smaller dynamic range of values, thus decreasing the range of scales required for quantization.
QuIP and QuIP# apply transforms to every linear layer, two of which are fused into the model weights and two of which remain as online rotations computed at runtime.
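The key property behind these rotations is that an orthogonal transform inserted between activations and weights leaves a Linear layer's output unchanged, so part of the transform can be fused into the weights while the rest is applied online. The following is a minimal sketch of that identity using a normalized Hadamard matrix; it is illustrative only and does not reflect llmcompressor internals:

```python
# Illustrative sketch: an orthonormal Hadamard rotation H satisfies H @ H.T = I,
# so inserting it between activations and weights preserves the layer output
# while spreading activation outliers across dimensions.
import torch
from scipy.linalg import hadamard

d = 64
H = torch.tensor(hadamard(d), dtype=torch.float32) / d**0.5  # orthonormal Hadamard

x = torch.randn(8, d)       # activations
W = torch.randn(128, d)     # Linear weight, shape [out_features, in_features]

y_ref = x @ W.T             # original layer output
W_fused = W @ H             # rotation fused into the weight (offline)
x_rot = x @ H               # online rotation applied at runtime
y_rot = x_rot @ W_fused.T   # rotated path reproduces the original output

assert torch.allclose(y_ref, y_rot, atol=1e-4)
```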
Lifecycle:

- on_initialize
  - infer SpinQuantMappings & NormMappings
  - as needed, create transform schemes for R1, R2, R3, & R4
- on_start
  - normalize embeddings
  - fuse norm layers into subsequent Linear layers
  - apply TransformConfig
    - fuse transforms into weights for mergeable transforms
    - add hooks for online transforms
- on sequential epoch end
- on_end
- on_finalize
Parameters:

- transform_type – The type of transform to apply to the model. "hadamard" has the least performance cost but only supports sizes which are powers of two. "random-hadamard" has a higher performance cost, but supports a much larger set of sizes. "random-matrix" has the greatest performance cost, but supports any size.
- randomize – If true, create distinct transforms for each application.
- learnable – If true, attach gradients to transform weights for training.
- precision – Precision at which all transforms should be applied. This applies to both weight fusing and online rotations.
- ignore – Modules to ignore when attaching transforms.
- transform_config – Optional transform config for overriding provided arguments.
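As a usage sketch, these parameters can be passed directly to the modifier inside a oneshot recipe. This assumes the oneshot entrypoint and QuantizationModifier from llmcompressor; the model stub and parameter values below are illustrative, not prescribed by this module:

```python
# Hypothetical recipe combining QuIP-style transforms with weight quantization.
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier
from llmcompressor.modifiers.transform.quip.base import QuIPModifier

recipe = [
    # Attach rotations to Linear layers, skipping the output head (example choice).
    QuIPModifier(transform_type="random-hadamard", ignore=["lm_head"]),
    # Quantize the rotated weights; scheme and targets are illustrative.
    QuantizationModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"]),
]

# Model identifier is a placeholder; substitute the model you intend to compress.
oneshot(model="meta-llama/Llama-3.2-1B-Instruct", recipe=recipe)
```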