llmcompressor.observers.min_max
Classes:
- MinMaxObserver – Implements a quantization observer that calculates scale and zero point based on the minimum and maximum values of the tensor being observed.
MinMaxObserver
Bases: Observer
Implements a quantization observer that calculates scale and zero point based on the minimum and maximum values of the tensor being observed. If averaging_constant is specified, then the scales are updated using a moving average.
Methods:
- calculate_gparam – Generate a global scale using the observed min and max.
- calculate_qparams – Generate a scale and zero-point using the observed min and max.
- calculate_updated_min_max – Updates the observed min and max using a moving average smoothed by the averaging_constant.
- get_qparams_along_dim – Calculate quantization parameters along the specified dimension.
- reset – Reset the state of the observer, including the minimum and maximum values.
calculate_gparam
Generate a global scale using the observed min and max.
Parameters:
- observed (Tensor) – observed tensor to calculate quantization parameters for

Returns:
- Tensor – updated global scale derived from the observed tensor
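The precise formula lives in the source; purely as an illustration of the idea, a global scale can be viewed as a single scalar that maps the largest observed magnitude into a fixed target range. A minimal sketch, where the helper name and the target_max constant are assumptions rather than the library's implementation:

```python
import torch

def global_scale_sketch(observed: torch.Tensor, target_max: float = 448.0) -> torch.Tensor:
    # Largest magnitude implied by the observed min and max.
    abs_max = torch.max(observed.max().abs(), observed.min().abs())
    # One scalar that rescales that magnitude into [-target_max, target_max].
    # target_max = 448.0 is a placeholder (roughly the FP8-E4M3 maximum); the
    # actual constant and formula used by the observer may differ.
    return target_max / abs_max.clamp(min=torch.finfo(observed.dtype).eps)
```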
calculate_qparams
calculate_qparams(
observed: Tensor,
reduce_dims: Optional[Tuple[int]] = None,
tensor_id: Optional[Any] = None,
global_scale: Optional[Tensor] = None,
) -> Tuple[torch.FloatTensor, torch.IntTensor]
Generate a scale and zero-point using the observed min and max.
Parameters:
- observed (Tensor) – observed tensor to calculate quantization parameters for
- reduce_dims (Optional[Tuple[int]], default: None) – optional tuple of dimensions to reduce along; the returned scale and zero point will be shaped (1,) along the reduced dimensions
- tensor_id (Optional[Any], default: None) – optional id if different ranges of observed tensors are passed, useful for sharding tensors by group_size
- global_scale (Optional[Tensor], default: None) – optional scale to further scale local quantization scales

Returns:
- Tuple[FloatTensor, IntTensor] – tuple of scale and zero point derived from the observed tensor
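The shape of this computation is standard asymmetric min-max quantization. A minimal pure-PyTorch sketch, assuming a signed 8-bit integer range and ignoring the quantization-args handling, strategies, and optional global_scale of the real method:

```python
import torch
from typing import Optional, Tuple

def min_max_qparams_sketch(
    observed: torch.Tensor,
    reduce_dims: Optional[Tuple[int, ...]] = None,
    q_min: int = -128,
    q_max: int = 127,
) -> Tuple[torch.Tensor, torch.Tensor]:
    # Reduce over the given dims (kept as size 1), or over the whole tensor.
    if reduce_dims is None:
        min_vals, max_vals = observed.min(), observed.max()
    else:
        min_vals = torch.amin(observed, dim=reduce_dims, keepdim=True)
        max_vals = torch.amax(observed, dim=reduce_dims, keepdim=True)

    # Include zero in the range so that the real value 0.0 maps exactly onto an integer.
    min_vals = torch.clamp(min_vals, max=0.0)
    max_vals = torch.clamp(max_vals, min=0.0)

    # Asymmetric mapping: the scale stretches the observed range over the
    # integer grid and the zero point shifts it.
    scales = (max_vals - min_vals) / float(q_max - q_min)
    scales = torch.clamp(scales, min=torch.finfo(observed.dtype).eps)
    zero_points = torch.round(q_min - min_vals / scales).to(torch.int32)
    return scales, zero_points
```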
calculate_updated_min_max
calculate_updated_min_max(
observed: Tensor,
reduce_dims: Optional[Tuple[int]] = None,
tensor_id: Optional[Any] = None,
)
Updates the observed min and max using a moving average smoothed by the averaging_constant. Set the averaging_constant to 1.0 to disable averaging.
Parameters:
- observed (Tensor) – observed tensor to calculate quantization parameters for
- reduce_dims (Optional[Tuple[int]], default: None) – optional tuple of dimensions to reduce along; the returned scale and zero point will be shaped (1,) along the reduced dimensions
- tensor_id (Optional[Any], default: None) – optional id if different ranges of observed tensors are passed, useful for sharding tensors by group_size

Returns:
- updated min and max values
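Spelled out, the update is a standard exponential moving average applied to the tracked minimum and maximum. A minimal sketch of just that step (the function and variable names are illustrative, not the observer's internal attributes):

```python
import torch

def moving_average_min_max_sketch(
    running_min: torch.Tensor,
    running_max: torch.Tensor,
    new_min: torch.Tensor,
    new_max: torch.Tensor,
    averaging_constant: float = 0.01,
):
    # With averaging_constant = 1.0 the running values are simply replaced by
    # the latest observation, i.e. averaging is disabled.
    c = averaging_constant
    updated_min = running_min + c * (new_min - running_min)
    updated_max = running_max + c * (new_max - running_max)
    return updated_min, updated_max
```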
get_qparams_along_dim
get_qparams_along_dim(
observed: Tensor,
dim: int,
tensor_id: Optional[Any] = None,
global_scale: Optional[Tensor] = None,
)
Calculate quantization parameters along the specified dimension.
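In terms of the sketch above, calculating parameters along a single dimension amounts to reducing over every other dimension, yielding one scale/zero-point pair per slice along dim (e.g. per output channel). A hypothetical helper reusing min_max_qparams_sketch from the calculate_qparams example:

```python
import torch

def qparams_along_dim_sketch(observed: torch.Tensor, dim: int):
    # Keep `dim` intact and reduce over every other dimension.
    dim = dim % observed.ndim  # support negative dims
    reduce_dims = tuple(i for i in range(observed.ndim) if i != dim)
    return min_max_qparams_sketch(observed, reduce_dims=reduce_dims)

# Example: per-row (per-output-channel) parameters for a weight matrix.
# scales, zero_points = qparams_along_dim_sketch(torch.randn(128, 512), dim=0)
```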