llmcompressor.modifiers.utils.pytorch_helpers
PyTorch-specific helper functions for model compression.
Provides utility functions for PyTorch model operations, including batch processing, padding mask application, and model architecture detection. Supports MoE (Mixture of Experts) models and specialized tensor operations for compression workflows.
Functions:

- apply_pad_mask_to_batch – Apply a mask to the input ids of a batch. This is used to zero out padding tokens.
- is_moe_model – Check if the model is a mixture-of-experts model.
apply_pad_mask_to_batch

Apply a mask to the input ids of a batch. This is used to zero out padding tokens so they do not contribute to the Hessian calculation in the GPTQ and SparseGPT algorithms. Assumes that attention_mask only contains zeros and ones.
Parameters:

- batch (Dict[str, Tensor]) – batch to apply padding to, if it exists
Returns:

- Dict[str, Tensor] – batch with padding zeroed out in the input_ids
Source code in llmcompressor/modifiers/utils/pytorch_helpers.py
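As a rough illustration of the behavior documented above, here is a minimal sketch, assuming the mask is applied by elementwise multiplication of input_ids with attention_mask; the actual implementation in pytorch_helpers.py may differ:

```python
from typing import Dict

import torch
from torch import Tensor


def apply_pad_mask_to_batch(batch: Dict[str, Tensor]) -> Dict[str, Tensor]:
    # Elementwise multiply: positions where attention_mask == 0 are zeroed,
    # so padding tokens contribute nothing to downstream Hessian estimates.
    batch["input_ids"] = batch["input_ids"] * batch["attention_mask"]
    return batch


# Toy usage: the second sequence ends with one padding token.
batch = {
    "input_ids": torch.tensor([[101, 7592, 2088], [101, 7592, 999]]),
    "attention_mask": torch.tensor([[1, 1, 1], [1, 1, 0]]),
}
print(apply_pad_mask_to_batch(batch)["input_ids"])
# tensor([[ 101, 7592, 2088],
#         [ 101, 7592,    0]])
```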
is_moe_model

Check if the model is a mixture-of-experts model.
Parameters:

- model (Module) – the model to check
Returns:

- bool – True if the model is a mixture-of-experts model
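The detection logic itself is not shown on this page; one plausible heuristic, sketched under the assumption that MoE architectures expose expert submodules whose class names carry markers such as "Moe" or "Expert" (e.g. transformers' MixtralSparseMoeBlock), is:

```python
from torch import nn


def is_moe_model(model: nn.Module) -> bool:
    # Heuristic: scan submodule class names for MoE markers such as
    # "MixtralSparseMoeBlock" or "Qwen2MoeSparseMoeBlock". The library's
    # actual check may use different criteria.
    for module in model.modules():
        name = type(module).__name__.lower()
        if "expert" in name or "moe" in name:
            return True
    return False
```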