llmcompressor.modifiers.transform.quip

Classes:

QuIPModifier

Bases: Modifier

Implements the transforms according to QuIP: 2-Bit Quantization of Large Language Models With Guarantees and QuIP#: Even Better LLM Quantization with Hadamard Incoherence and Lattice Codebooks.

Transforms (rotations) are extra layers added to a model which reduce the accuracy loss induced by quantization. This is achieved through "rotating" weights and activations into a space with a smaller dynamic range of values, thus decreasing the range of scales required for quantization.
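
To make the mechanism concrete, the sketch below (illustrative only, not code from this module) rotates a toy weight and its activations with the same normalized 4x4 Hadamard matrix: the layer output is unchanged, while an outlier channel in the weight is spread across columns, shrinking the dynamic range that quantization must cover.

```python
import torch

# Hypothetical 4x4 weight with one outlier column that inflates the dynamic range.
torch.manual_seed(0)
weight = torch.randn(4, 4)
weight[:, 0] *= 50.0

# Normalized 4x4 Hadamard matrix (orthogonal: H @ H.T == I).
hadamard = torch.tensor(
    [[1.,  1.,  1.,  1.],
     [1., -1.,  1., -1.],
     [1.,  1., -1., -1.],
     [1., -1., -1.,  1.]]
) / 2.0

x = torch.randn(2, 4)  # example activations

# Original linear-layer output: y = x @ W.T
y = x @ weight.T

# Rotate weight and activations with the same orthogonal matrix:
# (x @ H) @ (W @ H).T == x @ H @ H.T @ W.T == x @ W.T, so the output is preserved.
y_rotated = (x @ hadamard) @ (weight @ hadamard).T
assert torch.allclose(y, y_rotated, atol=1e-5)

# The rotation spreads the outlier channel across all columns, so the ratio of the
# largest weight magnitude to the typical magnitude drops after rotation.
print(weight.abs().max() / weight.abs().mean())
print((weight @ hadamard).abs().max() / (weight @ hadamard).abs().mean())
```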

QuIP and QuIP# apply transforms to every linear layer, two of which are fused into the model weights and two of which remain as online rotations computed at runtime.
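
A minimal sketch of that split, assuming plain PyTorch and a hand-rolled `register_online_rotation` helper that is not part of this module: one rotation is fused into the weight offline, and the complementary rotation is applied to activations at runtime through a forward pre-hook, mirroring how online transforms are attached as hooks.

```python
import torch

def register_online_rotation(linear: torch.nn.Linear, hadamard: torch.Tensor) -> None:
    """Illustrative helper: fuse one rotation into the weight and apply the
    complementary rotation online via a pre-forward hook."""
    # Mergeable transform: rewrite the weight once, offline.
    with torch.no_grad():
        linear.weight.copy_(linear.weight @ hadamard)

    # Online transform: rotate incoming activations on every forward pass.
    def rotate_input(module, args):
        (x,) = args
        return (x @ hadamard,)

    linear.register_forward_pre_hook(rotate_input)

# Outputs are preserved because (x @ H) @ (W @ H).T == x @ W.T for orthogonal H.
layer = torch.nn.Linear(4, 4, bias=False)
hadamard = torch.tensor(
    [[1.,  1.,  1.,  1.],
     [1., -1.,  1., -1.],
     [1.,  1., -1., -1.],
     [1., -1., -1.,  1.]]
) / 2.0
x = torch.randn(2, 4)
y_before = layer(x)
register_online_rotation(layer, hadamard)
assert torch.allclose(y_before, layer(x), atol=1e-5)
```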

Lifecycle:

  • on_initialize
      • infer SpinQuantMappings & NormMappings
      • as needed, create transform schemes for R1, R2, R3, & R4
  • on_start
      • normalize embeddings
      • fuse norm layers into subsequent Linear layers
      • apply TransformConfig
      • fuse transforms into weights for mergeable transforms
      • add hooks for online transforms
  • on sequential epoch end
  • on_end
  • on_finalize

Parameters:

  • transform_type

    The type of transform to apply to the model. "hadamard" has the least performance cost but only supports sizes which are powers of two. "random-hadamard" has more performance cost, but supports a much larger set of sizes. "random-matrix" has the greatest performance cost, but supports any size.

  • randomize

    If true, create distinct transforms for each application

  • learnable

    If true, attach gradients to transform weights for training

  • precision

    Precision at which all transforms should be applied. This applies to both weight fusing and online rotations

  • ignore

    Modules to ignore when attaching transforms

  • transform_config

    Optional transform config for overriding provided arguments
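
A minimal end-to-end usage sketch, assuming a Hugging Face causal LM and pairing the rotations with a separate QuantizationModifier, as in the repository's transform examples. The model id and the W4A16 scheme are illustrative choices, not requirements, and exact arguments may differ.

```python
from transformers import AutoModelForCausalLM

from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier
from llmcompressor.modifiers.transform.quip import QuIPModifier

# Placeholder model id; substitute any causal LM you have access to.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B-Instruct", torch_dtype="auto"
)

# Apply QuIP-style rotations first, then quantize the rotated weights.
recipe = [
    QuIPModifier(transform_type="random-hadamard", ignore=["lm_head"]),
    QuantizationModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"]),
]

oneshot(model=model, recipe=recipe)
model.save_pretrained("Llama-3.2-1B-Instruct-quip-w4a16")
```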