llmcompressor.observers.mse
Classes:
- MovingAverageMSEObserver – Implements a dynamic quantization observer that sets the scale and zero point based on a moving average of the mse-clipped min and max observed values
MovingAverageMSEObserver
MovingAverageMSEObserver(
    quantization_args: QuantizationArgs,
    maxshrink: float = 0.2,
    patience: int = 5,
    averaging_constant: float = 0.01,
    grid: float = 100.0,
    norm: float = 2.4,
    **kwargs,
)
Bases: Observer
Implements a dynamic quantization observer that sets the scale and zero point from a moving average of the mse-clipped min and max observed values.
Methods:
- calculate_mse_min_max – Computes the mse-clipped min and max values of the observed tensor by optimizing for quantization error
- calculate_qparams – Updates the mse-clipped min and max values of the observed tensor using a moving average smoothed by the averaging_constant
- calculate_updated_min_max – Updates the mse-clipped min and max values of the observed tensor using a moving average smoothed by the averaging_constant
- reset – Resets the state of the observer, including the min and max values
Source code in llmcompressor/observers/mse.py
calculate_mse_min_max
calculate_mse_min_max(
    observed: Tensor,
    reduce_dims: Optional[Tuple[int]] = None,
    global_scale: Optional[Tensor] = None,
)
Computes the mse-clipped min and max values of the observed tensor by optimizing for quantization error
Parameters:
- observed (Tensor) – observed tensor to calculate quantization parameters for
- reduce_dims (Optional[Tuple[int]], default: None) – optional tuple of dimensions to reduce along, returned values will be shaped (1,) along the reduced dimensions
- global_scale (Optional[Tensor], default: None) – optional scale to further scale local quantization scales
Returns:
- tuple of min and max values derived from the observed tensor
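The idea of clipping the range by optimizing for quantization error can be illustrated with a small grid search. The following is a minimal plain-Python sketch, not the library's tensor implementation: the helper names `fake_quantize`, `quant_error`, and `mse_clipped_min_max` are hypothetical, the unsigned 4-bit grid is an assumption, and the way `maxshrink`, `grid`, `patience`, and `norm` are used here is an assumed reading of the constructor arguments.

```python
def fake_quantize(x, min_val, max_val, num_bits=4):
    """Quantize x onto an unsigned integer grid over [min_val, max_val], then dequantize."""
    qmax = 2 ** num_bits - 1
    # The representable range must contain zero.
    min_val, max_val = min(min_val, 0.0), max(max_val, 0.0)
    scale = (max_val - min_val) / qmax
    if scale == 0.0:
        return 0.0
    zero_point = round(-min_val / scale)
    q = min(max(round(x / scale) + zero_point, 0), qmax)
    return (q - zero_point) * scale


def quant_error(values, min_val, max_val, num_bits=4, norm=2.4):
    """Sum of |dequantized - original| ** norm over all values."""
    return sum(
        abs(fake_quantize(v, min_val, max_val, num_bits) - v) ** norm
        for v in values
    )


def mse_clipped_min_max(values, maxshrink=0.2, patience=5, grid=100.0,
                        norm=2.4, num_bits=4):
    """Shrink the absolute min/max step by step and keep the lowest-error range."""
    abs_min, abs_max = min(values), max(values)
    best_err, best_range = float("inf"), (abs_min, abs_max)
    no_improve = 0
    for i in range(int(maxshrink * grid)):
        p = 1 - i / grid  # shrink factor, starting at 1.0 (no clipping)
        lo, hi = p * abs_min, p * abs_max
        err = quant_error(values, lo, hi, num_bits, norm)
        if err < best_err:
            best_err, best_range = err, (lo, hi)
            no_improve = 0
        else:
            no_improve += 1
            if no_improve >= patience:  # early stopping (assumed semantics)
                break
    return best_range


values = [-1.0, -0.4, -0.1, 0.0, 0.05, 0.2, 0.3, 2.0]
lo, hi = mse_clipped_min_max(values)
print(lo, hi)
```

Because the unclipped range is always the first candidate, the returned range never has a higher error than the plain min and max under this sketch's error measure.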
calculate_qparams
calculate_qparams(
    observed: Tensor,
    reduce_dims: Optional[Tuple[int]] = None,
    tensor_id: Optional[Any] = None,
    global_scale: Optional[Tensor] = None,
) -> Tuple[FloatTensor, IntTensor]
Updates the mse-clipped min and max values of the observed tensor using a moving average smoothed by the averaging_constant, then derives the scale and zero point from the updated values
Parameters:
- observed (Tensor) – observed tensor to calculate quantization parameters for
- reduce_dims (Optional[Tuple[int]], default: None) – optional tuple of dimensions to reduce along, returned scale and zero point will be shaped (1,) along the reduced dimensions
- tensor_id (Optional[Any], default: None) – optional id if different ranges of observed tensors are passed, useful for sharding tensors by group_size
- global_scale (Optional[Tensor], default: None) – optional scale to further scale local quantization scales
Returns:
- Tuple[FloatTensor, IntTensor] – tuple of scale and zero point derived from the observed tensor
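How a scale and zero point can be derived from a min/max pair is sketched below. This is a scalar, plain-Python illustration under stated assumptions: the function name `qparams_from_min_max` is hypothetical, a signed integer grid is assumed, and the symmetric and asymmetric formulas shown are the common textbook ones, not necessarily what the library computes from its `QuantizationArgs`.

```python
def qparams_from_min_max(min_val, max_val, num_bits=8, symmetric=True):
    """Return (scale, zero_point) for a signed integer grid (assumed convention)."""
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    # The representable range must contain zero.
    min_val, max_val = min(min_val, 0.0), max(max_val, 0.0)
    if symmetric:
        # Symmetric: the range is centered on zero, so the zero point is 0.
        max_abs = max(abs(min_val), abs(max_val))
        scale = max(2 * max_abs / (qmax - qmin), 1e-12)
        zero_point = 0
    else:
        # Asymmetric: map min_val to qmin and max_val to qmax.
        scale = max((max_val - min_val) / (qmax - qmin), 1e-12)
        zero_point = round(qmin - min_val / scale)
        zero_point = min(max(zero_point, qmin), qmax)
    return scale, zero_point


print(qparams_from_min_max(-1.0, 0.5))                   # symmetric
print(qparams_from_min_max(-1.0, 3.0, symmetric=False))  # asymmetric
```

The small lower bound on `scale` only guards against division by zero when the observed range collapses to a single value.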
calculate_updated_min_max
calculate_updated_min_max(
    observed: Tensor,
    reduce_dims: Optional[Tuple[int]] = None,
    tensor_id: Optional[Any] = None,
    global_scale: Optional[Tensor] = None,
) -> Tuple[FloatTensor, IntTensor]
Updates the mse-clipped min and max values of the observed tensor using a moving average smoothed by the averaging_constant
Parameters:
- observed (Tensor) – observed tensor to calculate quantization parameters for
- reduce_dims (Optional[Tuple[int]], default: None) – optional tuple of dimensions to reduce along, returned min and max values will be shaped (1,) along the reduced dimensions
- tensor_id (Optional[Any], default: None) – optional id if different ranges of observed tensors are passed, useful for sharding tensors by group_size
- global_scale (Optional[Tensor], default: None) – optional scale to further scale local quantization scales
Returns:
- Tuple[FloatTensor, IntTensor] – updated min and max values derived from the observed value
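The moving-average update smoothed by `averaging_constant` can be sketched with scalars as follows. The class name `MovingAverageMinMax` and its `update` method are hypothetical, and keeping one running pair per `tensor_id` in a dictionary (with the first observation stored as-is) is an assumption about how the state might be held, not a statement of the library's internals.

```python
class MovingAverageMinMax:
    """Tracks an exponential moving average of min/max per tensor_id (sketch)."""

    def __init__(self, averaging_constant=0.01):
        self.averaging_constant = averaging_constant
        self.min_val = {}
        self.max_val = {}

    def update(self, new_min, new_max, tensor_id="default"):
        if tensor_id not in self.min_val:
            # First observation for this id: nothing to average against yet.
            updated_min, updated_max = new_min, new_max
        else:
            c = self.averaging_constant
            old_min, old_max = self.min_val[tensor_id], self.max_val[tensor_id]
            # Move a fraction c of the way from the running value to the new one.
            updated_min = old_min + c * (new_min - old_min)
            updated_max = old_max + c * (new_max - old_max)
        self.min_val[tensor_id] = updated_min
        self.max_val[tensor_id] = updated_max
        return updated_min, updated_max

    def reset(self):
        """Forget all running values."""
        self.min_val.clear()
        self.max_val.clear()


tracker = MovingAverageMinMax(averaging_constant=0.01)
print(tracker.update(-1.0, 1.0))  # first call stores the values unchanged
print(tracker.update(-2.0, 3.0))  # later calls move only slightly toward new values
```

With a small constant such as the default 0.01, a single outlier batch shifts the running range by only 1% of the difference, which is what makes the resulting scale and zero point stable across calibration batches.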