llmcompressor.modifiers.transform.spinquant.mappings
Classes:
- SpinQuantMapping – SpinQuant needs to know the entire architecture of the model.
SpinQuantMapping
Bases: BaseModel
SpinQuant needs to know the entire architecture of the model, as the R1, R2, R3, and R4 rotations must be applied to specific layers (https://arxiv.org/pdf/2405.16406, Fig. 1).
Parameters:
- embedding – name or regex of the embedding layer
- attn_q – name or regex of the q_proj layer in the attention block
- attn_k – name or regex of the k_proj layer in the attention block
- attn_v – name or regex of the v_proj layer in the attention block
- attn_o – name or regex of the o_proj layer in the attention block
- attn_head_dim – head_dim of the attention module; needed because R2 must be applied head-wise to v_proj and o_proj
- mlp_in – list of names or regexes for the layers that receive the input to the MLP block, usually up_proj and gate_proj
- mlp_out – list of names or regexes for the layers that constitute the output of the MLP block, usually down_proj