
llmcompressor.modifiers.transform.quip.base

Classes:

QuIPModifier

Bases: Modifier

Implements the transforms according to QuIP# (Even Better LLM Quantization with Hadamard Incoherence and Lattice Codebooks) and QuIP (2-Bit Quantization of Large Language Models With Guarantees).

Transforms (rotations) are extra layers added to a model which reduce the accuracy loss induced by quantization. This is achieved through "rotating" weights and activations into a space with a smaller dynamic range of values, thus decreasing the range of scales required for quantization.
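To make the dynamic-range intuition concrete, the following self-contained sketch (illustrative only, not library code) rotates a vector containing one large outlier with a normalized Hadamard matrix. The outlier's magnitude is spread across all coordinates, collapsing the ratio between the largest and smallest magnitudes from 100x to roughly 1.1x:

```python
import torch

def hadamard(n: int) -> torch.Tensor:
    """Build a normalized n x n Hadamard matrix (n must be a power of two)."""
    h = torch.ones(1, 1)
    while h.shape[0] < n:
        h = torch.cat([torch.cat([h, h], dim=1), torch.cat([h, -h], dim=1)], dim=0)
    return h / (h.shape[0] ** 0.5)

x = torch.tensor([100.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0])  # one extreme outlier
H = hadamard(8)
rotated = H @ x  # outlier energy is now spread across all eight coordinates

print(f"before: max/min |x|  = {(x.abs().max() / x.abs().min()).item():.1f}")        # 100.0
print(f"after:  max/min |Hx| = {(rotated.abs().max() / rotated.abs().min()).item():.1f}")  # ~1.1
```

Because the rotation is orthogonal, it is exactly invertible (`H.T @ rotated` recovers `x`), so no information is lost before quantization.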

QuIP and QuIP# apply transforms to every linear layer, two of which are fused into the model weights and two of which remain as online rotations computed at runtime.
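The split between fused and online transforms can be seen with a toy linear layer. The sketch below is conceptual, using a random orthogonal matrix in place of a Hadamard transform, and does not reflect the library's internal code; it only shows why folding an orthogonal matrix V into the weight while rotating the input online leaves the layer's output unchanged:

```python
import torch

torch.manual_seed(0)
d = 8
W = torch.randn(d, d)  # weight of a toy linear layer (out_features x in_features)
x = torch.randn(d)     # an input activation

# Random orthogonal matrix standing in for a Hadamard-style rotation.
V, _ = torch.linalg.qr(torch.randn(d, d))

W_fused = W @ V              # "mergeable" half: folded into the checkpoint offline
y_ref = x @ W.T              # original layer output
y_new = (x @ V) @ W_fused.T  # online rotation of x, then the fused weight

print(torch.allclose(y_ref, y_new, atol=1e-5))  # True: output is preserved
```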

Lifecycle:

  • on_initialize
      • as needed, create transform schemes for V (input) and U (output)
  • on_start
      • apply TransformConfig
          • fuse transforms into weights for mergeable transforms
          • add hooks for online transforms
  • on sequential epoch end
  • on_end
  • on_finalize
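For context, this lifecycle is typically driven through a oneshot recipe. The snippet below is a sketch: the QuantizationModifier pairing, model name, and W4A16 scheme are assumptions drawn from common llmcompressor usage rather than anything stated on this page:

```python
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier
from llmcompressor.modifiers.transform.quip.base import QuIPModifier

# Sketch: rotate weights/activations with QuIP-style transforms, then
# quantize the rotated weights. Scheme and model below are assumptions.
recipe = [
    QuIPModifier(rotations=["v", "u"], transform_type="random-hadamard"),
    QuantizationModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"]),
]

oneshot(
    model="meta-llama/Llama-3.2-1B-Instruct",  # any HF causal LM (example only)
    recipe=recipe,
)
```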

Parameters:

  • rotations

    Which rotation schemes to apply to the model. Including "v" will rotate the input side of weights, and including "u" will rotate the output side of weights (note that "v" does not require "u", and vice versa).

  • transform_type

    The type of transform to apply to the model. "hadamard" has the least performance cost but only supports sizes which are powers of two. "random-hadamard" has a higher performance cost but supports a much larger set of sizes. "random-matrix" has the greatest performance cost but supports any size.

  • randomize

    If true, create distinct transforms for each application

  • learnable

    If true, attach gradients to transform weights for training

  • precision

    Precision at which all transforms should be applied. This applies to both weight fusing and online rotations

  • transform_block_size

    Block size to use for rotation matrices. The model's hidden_size must be evenly divisible by transform_block_size. Layers will be transformed by a block-diagonal matrix where each block is a matrix of this size. If None is provided, the model's hidden_size will be used (see the example after this list).

  • ignore

    Modules to ignore when attaching transforms

  • transform_config

    Optional transform config for overriding provided arguments
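Putting the parameters together, a hypothetical instantiation might look like the following. The values are illustrative rather than recommended defaults, and the assumption that precision accepts a torch dtype is based on common usage rather than anything stated here:

```python
import torch

from llmcompressor.modifiers.transform.quip.base import QuIPModifier

# Illustrative configuration exercising the documented parameters.
modifier = QuIPModifier(
    rotations=["v", "u"],              # rotate both input and output sides
    transform_type="random-hadamard",  # broader size support than "hadamard"
    randomize=False,                   # reuse one transform across applications
    learnable=False,                   # keep transform weights frozen
    precision=torch.float64,           # assumed: a torch dtype for fusing and online rotations
    transform_block_size=None,         # None: fall back to the model's hidden_size
    ignore=["lm_head"],                # skip attaching transforms to the output head
)
```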