llmcompressor.modifiers.transform.quip
Modules:

- base
Classes:

- QuIPModifier – Implements the transforms according to QuIP and QuIP#
QuIPModifier
Bases: Modifier
Implements the transforms according to "QuIP#: Even Better LLM Quantization with Hadamard Incoherence and Lattice Codebooks" and "QuIP: 2-Bit Quantization of Large Language Models With Guarantees".
Transforms (rotations) are extra layers added to a model which reduce the accuracy loss induced by quantization. This is achieved through "rotating" weights and activations into a space with a smaller dynamic range of values, thus decreasing the range of scales required for quantization.
QuIP and QuIP# apply transforms to every linear layer, two of which are fused into the model weights and two of which remain as online rotations computed at runtime.
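As a concrete illustration of why a fused rotation does not change a layer's output, the following is a minimal sketch (not llmcompressor code): an orthogonal matrix Q can be folded into the weight while the matching rotation is applied to the input online.

```python
import torch

hidden_size = 8
W = torch.randn(hidden_size, hidden_size)  # hypothetical linear layer weight
x = torch.randn(hidden_size)               # hypothetical input activation

# Orthogonal stand-in for a Hadamard-based rotation
Q, _ = torch.linalg.qr(torch.randn(hidden_size, hidden_size))

original = W @ x
# Fuse Q into the weight; apply the matching rotation to the input at runtime
rotated = (W @ Q.T) @ (Q @ x)

assert torch.allclose(original, rotated, atol=1e-4)
```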
Lifecycle:

- on_initialize
    - as needed, create transform schemes for V (input) and U (output)
- on_start
    - apply TransformConfig
        - fuse transforms into weights for mergeable transforms
        - add hooks for online transforms
- on sequential epoch end
- on_end
- on_finalize
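The sketch below shows one way the modifier might be composed with a quantization recipe. The oneshot entrypoint, QuantizationModifier arguments, and the model name are assumptions for illustration and may differ across llmcompressor versions.

```python
from transformers import AutoModelForCausalLM

from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier
from llmcompressor.modifiers.transform import QuIPModifier

# Hypothetical model choice for illustration
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B-Instruct", torch_dtype="auto"
)

recipe = [
    # Rotate both the input ("v") and output ("u") sides of each weight
    QuIPModifier(rotations=["v", "u"], transform_type="random-hadamard"),
    # Quantize the rotated weights (scheme and targets are illustrative)
    QuantizationModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"]),
]

oneshot(model=model, recipe=recipe)
```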
Parameters:

- rotations – Which rotation schemes to apply to the model. Including "v" will rotate the input side of weights, and including "u" will rotate the output side of weights (note that v does not require u and vice versa)
- transform_type – The type of transform to apply to the model. "hadamard" has the least performance cost but only supports sizes which are powers of two. "random-hadamard" has more performance cost, but supports a much larger set of sizes. "random-matrix" has the greatest performance cost, but supports any size
- randomize – If True, create distinct transforms for each application
- learnable – If True, attach gradients to transform weights for training
- precision – Precision at which all transforms should be applied. This applies to both weight fusing and online rotations
- transform_block_size – Block size to use for rotation matrices. The model's hidden_size must be evenly divisible by transform_block_size. Layers will be transformed by a block-diagonal matrix where each block is a matrix of this size (see the sketch after this list). If None is provided, the model's hidden_size will be used
- ignore – Modules to ignore when attaching transforms
- transform_config – Optional transform config for overriding provided arguments
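To make the transform_block_size behavior concrete, the following sketch (not library code) builds a block-diagonal rotation from smaller Hadamard blocks; the values chosen are hypothetical, and the divisibility check mirrors the requirement stated above.

```python
import torch

hidden_size = 16
transform_block_size = 4  # hypothetical values for illustration
assert hidden_size % transform_block_size == 0, (
    "hidden_size must be evenly divisible by transform_block_size"
)

# Build a normalized Hadamard block via the Sylvester construction
H2 = torch.tensor([[1.0, 1.0], [1.0, -1.0]])
block = H2
while block.shape[0] < transform_block_size:
    block = torch.kron(H2, block)
block = block / transform_block_size ** 0.5  # normalize so the block is orthogonal

# Repeat the block along the diagonal to cover the full hidden dimension
num_blocks = hidden_size // transform_block_size
rotation = torch.block_diag(*([block] * num_blocks))

# The resulting block-diagonal matrix is itself orthogonal
assert torch.allclose(rotation.T @ rotation, torch.eye(hidden_size), atol=1e-5)
```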