llmcompressor.pytorch.utils.sparsification_info.helpers

Functions:

get_leaf_operations – Get the leaf operations in the model

get_precision_information – Get the information about the precision of the operation.

is_quantized – Check whether the operation is quantized (contains a quantization scheme)

get_leaf_operations

get_leaf_operations(
    model: Module,
    operations_to_skip: Optional[List[Module]] = None,
    operations_to_unwrap: Optional[List[Module]] = None,
) -> List[torch.nn.Module]

Get the leaf operations in the model (those that do not have operations as children)

Parameters:

model (Module) – the model to get the leaf operations from

operations_to_skip (Optional[List[Module]], default: None) – a list of leaf operation types to omit from the result. If None is passed, the Identity operation is skipped by default.

operations_to_unwrap (Optional[List[Module]], default: None) – a list of operation types to unwrap when collecting the leaf operations. Unwrapping means that the module wrapped by the operation (i.e. the operation's module attribute) is added directly to the list of leaf operations in place of the wrapper. If None is passed, the QuantWrapper operation is unwrapped by default.

Returns:

List[Module] – a list of the leaf operations
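The skip and unwrap behaviour described above can be tried out on a toy model. The following is only a minimal sketch, not the library's code: `collect_leaves` is a hypothetical re-implementation of the documented traversal and `ToyWrapper` is a hypothetical stand-in for QuantWrapper, so the example runs with PyTorch alone.

```python
from typing import List, Optional, Type

from torch import nn


class ToyWrapper(nn.Module):
    """Hypothetical stand-in for QuantWrapper: keeps the real op in `.module`."""

    def __init__(self, module: nn.Module):
        super().__init__()
        self.module = module

    def forward(self, x):
        return self.module(x)


def collect_leaves(
    model: nn.Module,
    skip: Optional[List[Type[nn.Module]]] = None,
    unwrap: Optional[List[Type[nn.Module]]] = None,
) -> List[nn.Module]:
    """Illustrative sketch of the documented traversal (not the library code)."""
    skip_types = tuple(skip if skip is not None else [nn.Identity])
    unwrap_types = tuple(unwrap if unwrap is not None else [ToyWrapper])

    children = list(model.children())
    if not children:
        # A module without children is itself a leaf.
        return [] if isinstance(model, skip_types) else [model]

    leaves: List[nn.Module] = []
    for child in children:
        if isinstance(child, unwrap_types):
            # Unwrapping: record the wrapped module instead of the wrapper.
            leaves.append(child.module)
        else:
            leaves.extend(
                collect_leaves(child, list(skip_types), list(unwrap_types))
            )
    return [op for op in leaves if not isinstance(op, skip_types)]


model = nn.Sequential(
    nn.Linear(4, 8),
    nn.Identity(),  # dropped by the default skip list
    nn.Sequential(nn.ReLU(), ToyWrapper(nn.Linear(8, 2))),
)

leaf_names = [type(op).__name__ for op in collect_leaves(model)]
print(leaf_names)  # ['Linear', 'ReLU', 'Linear']
```

Passing `skip=[]` keeps the Identity module in the result, which shows that an explicit empty list is treated differently from None.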

Source code in llmcompressor/pytorch/utils/sparsification_info/helpers.py

def get_leaf_operations(
    model: torch.nn.Module,
    operations_to_skip: Optional[List[torch.nn.Module]] = None,
    operations_to_unwrap: Optional[List[torch.nn.Module]] = None,
) -> List[torch.nn.Module]:
    """
    Get the leaf operations in the model
    (those that do not have operations as children)

    :param model: the model to get the leaf operations from
    :param operations_to_skip: a list of leaf operations that will be
        omitted when getting the leaf operations. If None passed, by
        default the Identity operation will be skipped
    :param operations_to_unwrap: a list of operations that will be unwrapped
        when getting the leaf operations. Unwrapping means that we directly
        add the module(s) that is/are wrapped by the operation (i.e. operation's
        `module` attribute) to the list
        of leaf operations. If None passed, by default the QuantWrapper
        operation will be unwrapped
    :return: a list of the leaf operations
    """
    if operations_to_skip is None:
        operations_to_skip = [Identity]

    if operations_to_unwrap is None:
        operations_to_unwrap = [QuantWrapper]

    leaf_operations = []
    children = list(model.children())

    if not children:
        # a module without children is itself a leaf; wrap it in a list
        # so the return type matches the annotation
        return [model]

    for child in children:
        if isinstance(child, tuple(operations_to_unwrap)):
            # unwrap: record the wrapped module instead of the wrapper
            leaf_operations.append(child.module)
            continue
        # pass the skip/unwrap settings down to nested modules
        leaf_operations.extend(
            get_leaf_operations(child, operations_to_skip, operations_to_unwrap)
        )
    return [
        op for op in leaf_operations if not isinstance(op, tuple(operations_to_skip))
    ]

get_precision_information

get_precision_information(
    operation: Module,
) -> Union[None, int, QuantizationScheme]

Get the information about the precision of the operation.

1) If the operation is quantized, returns the quantization scheme of the operation.
2) If the operation is not quantized, returns the number of bits of the operation's weights.
3) If the operation is not quantized and does not have a weight, returns None.

Parameters:

operation (Module) – the operation to get the quantization scheme from

Returns:

Union[None, int, QuantizationScheme] – the quantization scheme of the operation, the number of bits of the operation's weights, or None if the operation is not quantized and does not have a weight
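The three return cases can be illustrated with a small sketch. It is written under stated assumptions: `precision_of` is a hypothetical mirror of the documented logic, `dtype_bits` is a hypothetical stand-in for the private `_get_num_bits` helper (whose implementation is not shown on this page), and the dictionary assigned to `quantization_scheme` is a placeholder rather than a real QuantizationScheme.

```python
from typing import Any, Union

import torch
from torch import nn


def dtype_bits(dtype: torch.dtype) -> int:
    """Hypothetical stand-in for `_get_num_bits`.

    Assumes a real floating-point or integer dtype.
    """
    if dtype.is_floating_point:
        return torch.finfo(dtype).bits
    return torch.iinfo(dtype).bits


def precision_of(operation: nn.Module) -> Union[None, int, Any]:
    """Illustrative sketch of the three documented cases."""
    if hasattr(operation, "quantization_scheme"):
        return operation.quantization_scheme  # case 1: quantized
    if hasattr(operation, "weight"):
        return dtype_bits(operation.weight.dtype)  # case 2: bit width
    return None  # case 3: no scheme and no weight


fp32_linear = nn.Linear(4, 4)
fp16_linear = nn.Linear(4, 4).half()
no_weight = nn.ReLU()
quantized = nn.Linear(4, 4)
# Placeholder value only; the real attribute would hold a QuantizationScheme.
quantized.quantization_scheme = {"weights": "int8"}

results = [
    precision_of(op) for op in (fp32_linear, fp16_linear, no_weight, quantized)
]
print(results)  # [32, 16, None, {'weights': 'int8'}]
```

Because the scheme check comes first, a quantized operation reports its scheme even though it also has a weight.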

Source code in llmcompressor/pytorch/utils/sparsification_info/helpers.py

def get_precision_information(
    operation: torch.nn.Module,
) -> Union[None, int, "QuantizationScheme"]:  # noqa F821
    """
    Get the information about the precision of the operation.

    1)  If operation is quantized, returns the quantization
        scheme of the operation.
    2)  If operation is not quantized, returns the number of bits
        of the operation's weights.
    3)  If operation is not quantized and does not have a weight,
        returns None.

    :param operation: the operation to get the quantization scheme from
    :return: the quantization scheme of the operation, the number of bits
        of the operation's weights, or None if the operation is not quantized
        and does not have a weight
    """

    if hasattr(operation, "quantization_scheme"):
        return getattr(operation, "quantization_scheme")
    elif hasattr(operation, "weight"):
        return _get_num_bits(operation.weight.dtype)
    else:
        return None

is_quantized

is_quantized(operation: Module) -> bool

Check whether the operation is quantized (contains a quantization scheme)
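Since the check is an attribute test, it can be used to filter the modules of a model. The sketch below assumes a hypothetical `has_scheme` helper that mirrors the documented check, and attaches a placeholder object where a real QuantizationScheme would normally be.

```python
from torch import nn


def has_scheme(operation: nn.Module) -> bool:
    """Hypothetical mirror of the documented check."""
    return hasattr(operation, "quantization_scheme")


model = nn.Sequential(nn.Linear(4, 4), nn.ReLU(), nn.Linear(4, 2))
# Placeholder marker; a real model would carry a QuantizationScheme here.
model[2].quantization_scheme = object()

# named_modules() yields the model itself (name "") and every submodule.
quantized_names = [
    name for name, module in model.named_modules() if has_scheme(module)
]
print(quantized_names)  # ['2']
```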

Source code in llmcompressor/pytorch/utils/sparsification_info/helpers.py

def is_quantized(operation: torch.nn.Module) -> bool:
    """
    Check whether the operation is quantized (contains
    a quantization scheme)
    """
    return hasattr(operation, "quantization_scheme")
