llmcompressor.modifiers.transform.quip.base
Classes:

- QuIPModifier – Implements the transforms according to QuIP and QuIP#
QuIPModifier
Bases: Modifier
Implements the transforms according to QuIP#: Even Better LLM Quantization with Hadamard Incoherence and Lattice Codebooks QuIP: 2-Bit Quantization of Large Language Models With Guarantees
Transforms (rotations) are extra layers added to a model which reduce the accuracy loss induced by quantization. This is achieved through "rotating" weights and activations into a space with a smaller dynamic range of values, thus decreasing the range of scales required for quantization.
QuIP and QuIP# apply transforms to every linear layer, two of which are fused into the model weights and two of which remain as online rotations computed at runtime.
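The key property behind these rotations is that an orthogonal transform inserted between activations and weights leaves a Linear layer's output unchanged, so part of the transform can be fused into the weights while the rest is applied online. The following is a minimal sketch of that identity using a normalized Hadamard matrix; it is illustrative only and does not reflect llmcompressor internals:

```python
# Illustrative sketch: an orthonormal Hadamard rotation H satisfies H @ H.T = I,
# so inserting it between activations and weights preserves the layer output
# while spreading activation outliers across dimensions.
import torch
from scipy.linalg import hadamard

d = 64
H = torch.tensor(hadamard(d), dtype=torch.float32) / d**0.5  # orthonormal Hadamard

x = torch.randn(8, d)       # activations
W = torch.randn(128, d)     # Linear weight, shape [out_features, in_features]

y_ref = x @ W.T             # original layer output
W_fused = W @ H             # rotation fused into the weight (offline)
x_rot = x @ H               # online rotation applied at runtime
y_rot = x_rot @ W_fused.T   # rotated path reproduces the original output

assert torch.allclose(y_ref, y_rot, atol=1e-4)
```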
Lifecycle:

- on_initialize
  - infer SpinQuantMappings & NormMappings
  - as needed, create transform schemes for R1, R2, R3, & R4
- on_start
  - normalize embeddings
  - fuse norm layers into subsequent Linear layers
  - apply TransformConfig
    - fuse transforms into weights for mergeable transforms
    - add hooks for online transforms
- on sequential epoch end
- on_end
- on_finalize
Parameters:

- transform_type – The type of transform to apply to the model. "hadamard" has the least performance cost but only supports sizes which are powers of two. "random-hadamard" has a higher performance cost, but supports a much larger set of sizes. "random-matrix" has the greatest performance cost, but supports any size.
- randomize – If true, create distinct transforms for each application.
- learnable – If true, attach gradients to transform weights for training.
- precision – Precision at which all transforms should be applied. This applies to both weight fusing and online rotations.
- ignore – Modules to ignore when attaching transforms.
- transform_config – Optional transform config for overriding provided arguments.
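As a usage sketch, these parameters can be passed directly to the modifier inside a oneshot recipe. This assumes the oneshot entrypoint and QuantizationModifier from llmcompressor; the model stub and parameter values below are illustrative, not prescribed by this module:

```python
# Hypothetical recipe combining QuIP-style transforms with weight quantization.
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier
from llmcompressor.modifiers.transform.quip.base import QuIPModifier

recipe = [
    # Attach rotations to Linear layers, skipping the output head (example choice).
    QuIPModifier(transform_type="random-hadamard", ignore=["lm_head"]),
    # Quantize the rotated weights; scheme and targets are illustrative.
    QuantizationModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"]),
]

# Model identifier is a placeholder; substitute the model you intend to compress.
oneshot(model="meta-llama/Llama-3.2-1B-Instruct", recipe=recipe)
```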