llmcompressor.observers.mse
Classes:
- MovingAverageMSEObserver – Implements a dynamic quantization observer that sets the scale and zero point based on a moving average of the MSE-clipped min and max observed values
MovingAverageMSEObserver
MovingAverageMSEObserver(
quantization_args: QuantizationArgs,
maxshrink: float = 0.2,
patience: int = 5,
averaging_constant: float = 0.01,
grid: float = 100.0,
norm: float = 2.4,
**kwargs
)
Bases: Observer
Implements a dynamic quantization observer that sets the scale and zero point based on a moving average of the MSE-clipped min and max observed values.
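Methods:
- calculate_mse_min_max – Computes the MSE-clipped min and max values of the observed tensor by optimizing for quantization error
- calculate_qparams – Updates the MSE-clipped min and max values of the observed tensor using a moving average smoothed by the averaging_constant, and returns the resulting scale and zero point
- calculate_updated_min_max – Updates the MSE-clipped min and max values of the observed tensor using a moving average smoothed by the averaging_constant
- reset – Resets the state of the observer, including the min and max values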
Source code in llmcompressor/observers/mse.py
calculate_mse_min_max
calculate_mse_min_max(
observed: Tensor,
reduce_dims: Optional[Tuple[int]] = None,
global_scale: Optional[Tensor] = None,
)
Computes the MSE-clipped min and max values of the observed tensor by searching for the range that minimizes quantization error.
Parameters:
- observed (Tensor) – observed tensor to calculate quantization parameters for
- reduce_dims (Optional[Tuple[int]], default: None) – optional tuple of dimensions to reduce along; returned values will have size 1 along the reduced dimensions
- global_scale (Optional[Tensor], default: None) – optional scale to further scale local quantization scales
Returns:
- tuple of min and max values derived from the observed tensor
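The search described above can be sketched in plain Python on a flat list of floats. This is an illustration, not the library's implementation. It assumes that maxshrink bounds how far the range may shrink, grid sets the step size of the shrink factor, norm is the exponent applied to the absolute error, and patience stops the search after that many consecutive candidates without improvement. The helpers fake_quantize and quant_error and the num_bits argument are hypothetical names introduced for this example.

```python
def fake_quantize(x, lo, hi, num_bits=4):
    """Quantize then dequantize x on an unsigned affine grid spanning [lo, hi]."""
    qmin, qmax = 0, 2**num_bits - 1
    lo, hi = min(lo, 0.0), max(hi, 0.0)  # keep zero exactly representable
    scale = (hi - lo) / (qmax - qmin) or 1.0  # fall back to 1.0 for an empty range
    zero_point = round(qmin - lo / scale)
    q = min(max(round(x / scale) + zero_point, qmin), qmax)
    return (q - zero_point) * scale


def quant_error(values, lo, hi, norm=2.4, num_bits=4):
    """Sum of |dequantized - original| ** norm over all values."""
    return sum(abs(fake_quantize(v, lo, hi, num_bits) - v) ** norm for v in values)


def mse_clipped_min_max(values, maxshrink=0.2, patience=5, grid=100.0,
                        norm=2.4, num_bits=4):
    """Shrink [min, max] step by step and keep the range with the lowest error."""
    abs_min, abs_max = min(values), max(values)
    best_err, best_range = float("inf"), (abs_min, abs_max)
    no_improve = 0
    for i in range(int(maxshrink * grid)):
        p = 1 - i / grid  # shrink factor: 1.0, 0.99, 0.98, ...
        lo, hi = p * abs_min, p * abs_max
        err = quant_error(values, lo, hi, norm, num_bits)
        if err < best_err:
            best_err, best_range, no_improve = err, (lo, hi), 0
        else:
            no_improve += 1
            if no_improve >= patience:  # early stop, assumed role of `patience`
                break
    return best_range


values = [-1.0, -0.4, 0.0, 0.3, 0.9, 4.0]
lo, hi = mse_clipped_min_max(values)
print(lo, hi)
```

Because the first candidate is the unshrunk range, the returned range never has a higher error than plain min/max clipping under this sketch.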
calculate_qparams
calculate_qparams(
observed: Tensor,
reduce_dims: Optional[Tuple[int]] = None,
tensor_id: Optional[Any] = None,
global_scale: Optional[Tensor] = None,
) -> Tuple[FloatTensor, IntTensor]
Updates the MSE-clipped min and max values of the observed tensor using a moving average smoothed by the averaging_constant, then derives the scale and zero point from them.
Parameters:
- observed (Tensor) – observed tensor to calculate quantization parameters for
- reduce_dims (Optional[Tuple[int]], default: None) – optional tuple of dimensions to reduce along; returned scale and zero point will have size 1 along the reduced dimensions
- tensor_id (Optional[Any], default: None) – optional id if different ranges of observed tensors are passed, useful for sharding tensors by group_size
- global_scale (Optional[Tensor], default: None) – optional scale to further scale local quantization scales
Returns:
- Tuple[FloatTensor, IntTensor] – tuple of scale and zero point derived from the observed tensor
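As a rough illustration of how a min/max pair can be turned into a scale and zero point, the sketch below uses the common unsigned affine formula. It is an assumption for explanatory purposes: the library's own conversion may use a signed integer range, a symmetric scheme, or the global_scale argument, none of which are modeled here. The function name qparams_from_min_max and the num_bits argument are hypothetical.

```python
def qparams_from_min_max(min_val, max_val, num_bits=8):
    """Common unsigned affine mapping from a float range to (scale, zero_point)."""
    qmin, qmax = 0, 2**num_bits - 1
    # Widen the range so that 0.0 maps exactly onto an integer level.
    min_val, max_val = min(min_val, 0.0), max(max_val, 0.0)
    scale = (max_val - min_val) / (qmax - qmin)
    if scale == 0.0:
        return 1.0, 0  # degenerate all-zero range
    zero_point = round(qmin - min_val / scale)
    zero_point = min(max(zero_point, qmin), qmax)  # clamp into the integer range
    return scale, zero_point


scale, zero_point = qparams_from_min_max(-1.0, 3.0)
print(scale, zero_point)
```

The float scale and integer zero point mirror the Tuple[FloatTensor, IntTensor] return type, with one pair per reduced slice in the tensor case.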
calculate_updated_min_max
calculate_updated_min_max(
observed: Tensor,
reduce_dims: Optional[Tuple[int]] = None,
tensor_id: Optional[Any] = None,
global_scale: Optional[Tensor] = None,
) -> Tuple[FloatTensor, IntTensor]
Updates the MSE-clipped min and max values of the observed tensor using a moving average smoothed by the averaging_constant.
Parameters:
- observed (Tensor) – observed tensor to calculate quantization parameters for
- reduce_dims (Optional[Tuple[int]], default: None) – optional tuple of dimensions to reduce along; returned min and max values will have size 1 along the reduced dimensions
- tensor_id (Optional[Any], default: None) – optional id if different ranges of observed tensors are passed, useful for sharding tensors by group_size
- global_scale (Optional[Tensor], default: None) – optional scale to further scale local quantization scales
Returns:
- Tuple[FloatTensor, IntTensor] – updated min and max values derived from the observed tensor
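The moving-average update can be sketched with scalar values as follows. This is a minimal illustration under stated assumptions: the first observation for a given tensor_id is adopted as-is, later observations are blended as old + averaging_constant * (new - old), and a separate running state is kept per tensor_id. The class MovingAverageMinMax is a hypothetical stand-in, not part of the library.

```python
class MovingAverageMinMax:
    """Tracks exponentially smoothed min/max values, one state per tensor_id."""

    def __init__(self, averaging_constant=0.01):
        self.averaging_constant = averaging_constant
        self.min_val = {}
        self.max_val = {}

    def update(self, new_min, new_max, tensor_id="default"):
        if tensor_id in self.min_val:
            c = self.averaging_constant
            old_min, old_max = self.min_val[tensor_id], self.max_val[tensor_id]
            # Move a fraction c of the way from the running value to the new one.
            new_min = old_min + c * (new_min - old_min)
            new_max = old_max + c * (new_max - old_max)
        # First observation for this id: adopt the values directly (assumption).
        self.min_val[tensor_id] = new_min
        self.max_val[tensor_id] = new_max
        return new_min, new_max

    def reset(self):
        """Drop all running state, as a reset of the observer would."""
        self.min_val.clear()
        self.max_val.clear()


tracker = MovingAverageMinMax(averaging_constant=0.5)
print(tracker.update(-2.0, 2.0))
print(tracker.update(-4.0, 6.0))
```

A small averaging_constant such as the default 0.01 makes the running range react slowly to any single batch, which damps the effect of outlier activations.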