llmcompressor.entrypoints.oneshot
Oneshot compression entrypoint for post-training model optimization.
Provides the main oneshot compression entry point for applying quantization, pruning, and other compression techniques to pre-trained models without additional training. Supports calibration-based compression with various pipeline configurations for efficient model optimization.
Classes:
- Oneshot: Class responsible for carrying out one-shot calibration on a pretrained model.
Functions:
- oneshot: Performs oneshot calibration on a model.
Oneshot
Class responsible for carrying out one-shot calibration on a pretrained model.
This class handles the entire lifecycle of one-shot calibration, including preprocessing (model and tokenizer/processor initialization), model optimization (quantization or sparsification), and postprocessing (saving outputs). The instructions for model optimization are specified in a recipe.
Input Keyword Arguments:
`kwargs` are parsed into:
- model_args: Arguments for loading and configuring a pretrained model (e.g., AutoModelForCausalLM).
- dataset_args: Arguments for dataset-related configurations, such as calibration dataloaders.
- recipe_args: Arguments for defining and configuring recipes that specify optimization actions.
Parsers are defined in src/llmcompressor/args/.
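To make the routing concrete, here is a minimal stdlib sketch that assigns each keyword argument to whichever argument dataclass declares a field of that name. `ModelArgs`, `DatasetArgs`, `RecipeArgs`, and `split_kwargs` are hypothetical stand-ins carrying only a few of the documented fields; the real parsers in src/llmcompressor/args/ define many more fields and may validate them differently.

```python
from dataclasses import dataclass, fields
from typing import Any, Optional, Tuple


# Hypothetical, trimmed-down stand-ins for the real argument classes.
@dataclass
class ModelArgs:
    model: str
    precision: str = "auto"
    save_compressed: bool = True


@dataclass
class DatasetArgs:
    dataset: Optional[str] = None
    num_calibration_samples: int = 512
    max_seq_length: int = 384


@dataclass
class RecipeArgs:
    recipe: Optional[str] = None
    stage: Optional[str] = None


def split_kwargs(**kwargs: Any) -> Tuple[ModelArgs, DatasetArgs, RecipeArgs]:
    """Route each keyword argument to the dataclass that declares a field with that name."""
    remaining = dict(kwargs)
    groups = []
    for cls in (ModelArgs, DatasetArgs, RecipeArgs):
        names = {f.name for f in fields(cls)}
        # Pop the arguments this class owns so leftovers can be reported.
        selected = {k: remaining.pop(k) for k in list(remaining) if k in names}
        groups.append(cls(**selected))
    if remaining:
        raise TypeError(f"Unrecognized arguments: {sorted(remaining)}")
    return tuple(groups)
```

For example, `split_kwargs(model="my-model", num_calibration_samples=64, stage="quant")` yields three objects with the unspecified fields left at their defaults, and an unknown keyword raises a `TypeError` instead of being silently ignored.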
Lifecycle Overview: The oneshot calibration lifecycle consists of three steps:
- Preprocessing:
  - Instantiates a pretrained model and tokenizer/processor.
  - Ensures input and output embedding layers are untied if they share tensors.
  - Patches the model to include additional functionality for saving with quantization configurations.
- Oneshot Calibration:
  - Optimizes the model using a global CompressionSession and applies recipe-defined modifiers (e.g., GPTQModifier, SparseGPTModifier).
- Postprocessing:
  - Saves the model, tokenizer/processor, and configuration to the specified output_dir.
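The ordering of the three steps, and why untying comes first, can be sketched with toy objects. Everything here is hypothetical: `ToyModel` holds plain lists instead of tensors, and rounding stands in for whatever the recipe modifiers actually do. The point is that while the embeddings are tied, editing the output head would also edit the input embeddings.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Param:
    data: List[float]


@dataclass
class ToyModel:
    # Hypothetical model: embed and lm_head may point at the same Param (tied).
    embed: Param
    lm_head: Param
    events: List[str] = field(default_factory=list)


def untie_embeddings(model: ToyModel) -> None:
    """Give lm_head its own copy of the weights if it shares storage with embed."""
    if model.lm_head is model.embed:
        model.lm_head = Param(list(model.embed.data))
        model.events.append("untied")


def run_lifecycle(model: ToyModel, output_dir: Optional[str]) -> ToyModel:
    # 1. Preprocessing: make the embeddings independent before anything edits them.
    untie_embeddings(model)
    # 2. Calibration: rounding is a stand-in for applying recipe modifiers.
    model.lm_head.data = [round(x, 1) for x in model.lm_head.data]
    model.events.append("calibrated")
    # 3. Postprocessing: only "save" when an output directory was given.
    if output_dir is not None:
        model.events.append(f"saved to {output_dir}")
    return model
```

With a shared `Param`, the input embeddings keep their original values after calibration only because the untie step ran first; saving is recorded only when `output_dir` is set, mirroring the documented behavior of that argument.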
Usage:
Methods:
- __init__(**kwargs): Initializes the Oneshot object by parsing input arguments, performing preprocessing, and setting instance attributes.
- __call__(**kwargs): Performs the one-shot calibration process by preparing a calibration dataloader, applying recipe modifiers to the model, and executing postprocessing steps.
- save(): Saves the calibrated model and tokenizer/processor to the specified `output_dir`. Supports saving in compressed formats based on model arguments.
- apply_recipe_modifiers(calibration_dataloader, **kwargs): Applies lifecycle actions (e.g., `initialize`, `finalize`) using modifiers defined in the recipe. Each action is executed via the global `CompressionSession`.
Initializes the Oneshot class with provided arguments.
Parses the input keyword arguments into model_args, dataset_args, and recipe_args. Performs preprocessing to initialize the model and tokenizer/processor.
Parameters:
- model_args: ModelArguments parameters, responsible for controlling model loading and saving logic.
- dataset_args: DatasetArguments parameters, responsible for controlling dataset loading, preprocessing, and dataloader loading.
- recipe_args: RecipeArguments parameters, containing recipe-related parameters.
- output_dir: Path to save the output model after carrying out oneshot.
- log_dir (Optional[str], default: 'sparse_logs'): Path to save logs during the oneshot run. Nothing is logged to file if None.
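The documented log_dir behavior (a default of 'sparse_logs', and no file logging when it is None) can be imitated with the standard logging module. This is a sketch under that assumption only; `configure_file_logging` and the `oneshot.log` file name are hypothetical and do not describe how LLM Compressor itself writes its logs.

```python
import logging
import os
from typing import Optional


def configure_file_logging(
    log_dir: Optional[str], name: str = "oneshot"
) -> Optional[logging.FileHandler]:
    """Attach a file handler under log_dir, or do nothing when log_dir is None."""
    if log_dir is None:
        return None  # mirrors "Nothing is logged to file if None"
    os.makedirs(log_dir, exist_ok=True)
    # Hypothetical file name; the handler opens (and creates) the file immediately.
    handler = logging.FileHandler(os.path.join(log_dir, f"{name}.log"))
    logging.getLogger(name).addHandler(handler)
    return handler
```

Returning the handler lets the caller detach and close it after the run, which keeps repeated runs in one process from accumulating handlers.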
Methods:
- apply_recipe_modifiers: Applies recipe modifiers to the model during the lifecycle.
Source code in llmcompressor/entrypoints/oneshot.py
apply_recipe_modifiers
apply_recipe_modifiers(
calibration_dataloader: Optional[DataLoader],
recipe_stage: Optional[str] = None,
)
Applies recipe modifiers to the model during the lifecycle.
The modifiers are defined in the recipe and executed via lifecycle actions (initialize, finalize) through the global CompressionSession.
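A rough model of this flow: initialize the modifiers, pass calibration data through, then finalize. In this sketch, recipe_stage is assumed to select only the modifiers tagged with that stage, and a None dataloader simply skips the data pass. `ToyModifier`, `on_batch`, and `apply_modifiers` are hypothetical; the real method dispatches these lifecycle events through the global CompressionSession and a calibration pipeline rather than looping directly.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional


@dataclass
class ToyModifier:
    # Hypothetical modifier: records lifecycle calls instead of changing weights.
    name: str
    stage: str
    log: List[str] = field(default_factory=list)

    def initialize(self) -> None:
        self.log.append(f"{self.name}:initialize")

    def on_batch(self, batch: object) -> None:
        self.log.append(f"{self.name}:batch")

    def finalize(self) -> None:
        self.log.append(f"{self.name}:finalize")


def apply_modifiers(
    modifiers: Iterable[ToyModifier],
    calibration_batches: Optional[Iterable[object]],
    recipe_stage: Optional[str] = None,
) -> List[ToyModifier]:
    """Initialize the selected modifiers, feed calibration batches, then finalize."""
    # Assumption: a stage name filters modifiers; None means "use every modifier".
    active = [m for m in modifiers if recipe_stage is None or m.stage == recipe_stage]
    for m in active:
        m.initialize()
    for batch in calibration_batches or []:  # no dataloader -> no data pass
        for m in active:
            m.on_batch(batch)
    for m in active:
        m.finalize()
    return active
```

Running `apply_modifiers([gptq, sparsegpt], batches, recipe_stage="quant")` on modifiers tagged "quant" and "sparse" would touch only the first one, while the second would record no events.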
oneshot
oneshot(
model: Union[str, PreTrainedModel],
distill_teacher: Optional[str] = None,
config_name: Optional[str] = None,
tokenizer: Optional[
Union[str, PreTrainedTokenizerBase]
] = None,
processor: Optional[Union[str, ProcessorMixin]] = None,
cache_dir: Optional[str] = None,
use_auth_token: bool = False,
precision: str = "auto",
tie_word_embeddings: bool = False,
trust_remote_code_model: bool = False,
save_compressed: bool = True,
model_revision: str = "main",
recipe: Optional[Union[str, List[str]]] = None,
recipe_args: Optional[List[str]] = None,
clear_sparse_session: bool = False,
stage: Optional[str] = None,
dataset: Optional[
Union[str, Dataset, DatasetDict]
] = None,
dataset_config_name: Optional[str] = None,
dataset_path: Optional[str] = None,
num_calibration_samples: int = 512,
shuffle_calibration_samples: bool = True,
max_seq_length: int = 384,
pad_to_max_length: bool = True,
text_column: str = "text",
concatenate_data: bool = False,
streaming: bool = False,
overwrite_cache: bool = False,
preprocessing_num_workers: Optional[int] = None,
min_tokens_per_module: Optional[float] = None,
calibrate_moe_context: bool = False,
quantization_aware_calibration: bool = True,
output_dir: Optional[str] = None,
log_dir: Optional[str] = "sparse_logs",
**kwargs,
) -> PreTrainedModel
Performs oneshot calibration on a model.
Parameters:
Model arguments:
- model (Union[str, PreTrainedModel]): A pretrained model identifier from huggingface.co/models or a path to a local model. Required parameter.
- distill_teacher (Optional[str], default: None): Teacher model (a trained text generation model) for distillation.
- config_name (Optional[str], default: None): Pretrained config name or path, if not the same as the model.
- tokenizer (Optional[Union[str, PreTrainedTokenizerBase]], default: None): Pretrained tokenizer name or path, if not the same as the model.
- processor (Optional[Union[str, ProcessorMixin]], default: None): Pretrained processor name or path, if not the same as the model.
- cache_dir (Optional[str], default: None): Where to store the pretrained data downloaded from huggingface.co.
- use_auth_token (bool, default: False): Whether to use a Hugging Face auth token for private models.
- precision (str, default: 'auto'): Precision to cast model weights to.
- tie_word_embeddings (bool, default: False): Whether the model's input and output word embeddings should be tied.
- trust_remote_code_model (bool, default: False): Whether to allow custom models to execute their own modeling files.
- save_compressed (bool, default: True): Whether to compress sparse models during save.
- model_revision (str, default: 'main'): The specific model version to use (a branch name, tag, or commit id).
Recipe arguments:
- recipe (Optional[Union[str, List[str]]], default: None): Path to an LLM Compressor sparsification recipe.
- recipe_args (Optional[List[str]], default: None): List of recipe arguments to evaluate, in the format "key1=value1", "key2=value2".
- clear_sparse_session (bool, default: False): Whether to clear CompressionSession/CompressionLifecycle data between runs.
- stage (Optional[str], default: None): The stage of the recipe to use for oneshot.
Dataset arguments:
- dataset (Optional[Union[str, Dataset, DatasetDict]], default: None): The name of the dataset to use (via the datasets library).
- dataset_config_name (Optional[str], default: None): The configuration name of the dataset to use.
- dataset_path (Optional[str], default: None): Path to a custom dataset. Supports json, csv, dvc.
- num_calibration_samples (int, default: 512): Number of samples to use for one-shot calibration.
- shuffle_calibration_samples (bool, default: True): Whether to shuffle the dataset before calibration.
- max_seq_length (int, default: 384): Maximum total input sequence length after tokenization.
- pad_to_max_length (bool, default: True): Whether to pad all samples to max_seq_length.
- text_column (str, default: 'text'): Key to use as the text input to the tokenizer/processor.
- concatenate_data (bool, default: False): Whether to concatenate datapoints to fill max_seq_length.
- streaming (bool, default: False): Whether to stream data from a cloud dataset.
- overwrite_cache (bool, default: False): Whether to overwrite the cached preprocessed datasets.
- preprocessing_num_workers (Optional[int], default: None): Number of processes to use for preprocessing.
- min_tokens_per_module (Optional[float], default: None): Minimum percentage of tokens per module, relevant for MoE models.
- calibrate_moe_context (bool, default: False): Whether to enable the MoE context for the given model during calibration. This usually involves updating all MoE modules in the model for the duration of calibration.
- quantization_aware_calibration (bool, default: True): Whether to enable quantization-aware calibration in the sequential pipeline. When True, quantization is applied during the forward pass in calibration; when False, quantization is disabled during that forward pass.
Miscellaneous arguments:
- output_dir (Optional[str], default: None): Path to save the output model after calibration. Nothing is saved if None.
- log_dir (Optional[str], default: 'sparse_logs'): Path to save logs during the oneshot run. Nothing is logged to file if None.
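The interaction between num_calibration_samples, shuffle_calibration_samples, max_seq_length, concatenate_data, and pad_to_max_length can be illustrated on already-tokenized samples (lists of token ids). This is a hypothetical sketch: `prepare_calibration_samples`, the order of the steps, the pad id of 0, and the dropping of a final partial chunk when packing are all assumptions, not a description of the library's preprocessing code.

```python
import random
from typing import List, Optional


def prepare_calibration_samples(
    tokenized: List[List[int]],
    num_calibration_samples: int = 512,
    max_seq_length: int = 384,
    shuffle_calibration_samples: bool = True,
    concatenate_data: bool = False,
    pad_to_max_length: bool = True,
    pad_token_id: int = 0,  # assumption: real pad ids come from the tokenizer
    seed: Optional[int] = None,
) -> List[List[int]]:
    """Shuffle, truncate or pack, select, and pad token-id sequences."""
    samples = list(tokenized)
    if shuffle_calibration_samples:
        random.Random(seed).shuffle(samples)
    if concatenate_data:
        # Pack all tokens end to end, then cut into full max_seq_length chunks;
        # a trailing partial chunk is dropped in this sketch.
        stream = [tok for s in samples for tok in s]
        samples = [
            stream[i : i + max_seq_length]
            for i in range(0, len(stream) - max_seq_length + 1, max_seq_length)
        ]
    else:
        samples = [s[:max_seq_length] for s in samples]
    samples = samples[:num_calibration_samples]
    if pad_to_max_length:
        samples = [s + [pad_token_id] * (max_seq_length - len(s)) for s in samples]
    return samples
```

Packing trades sample boundaries for fully used sequences, which is why the padding step has nothing to do once concatenate_data is enabled in this sketch.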
Returns:
- PreTrainedModel: The calibrated PreTrainedModel.
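The recipe_args format ("key1=value1", "key2=value2") maps naturally onto a dictionary. The sketch below assumes values should be read as Python literals where possible and kept as strings otherwise; `parse_recipe_args` is hypothetical, and how LLM Compressor actually evaluates these values is not covered here.

```python
import ast
from typing import Any, Dict, List, Optional


def parse_recipe_args(recipe_args: Optional[List[str]]) -> Dict[str, Any]:
    """Turn ["key1=value1", ...] into a dict, reading literals like 0.5 or True as Python values."""
    parsed: Dict[str, Any] = {}
    for item in recipe_args or []:
        # Split on the first "=" only, so values may themselves contain "=".
        key, sep, raw = item.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"Expected 'key=value', got {item!r}")
        try:
            value = ast.literal_eval(raw.strip())
        except (ValueError, SyntaxError, TypeError):
            value = raw.strip()  # keep non-literal text such as W4A16 as a string
        parsed[key.strip()] = value
    return parsed
```

Using `ast.literal_eval` rather than `eval` keeps the parsing limited to literals, so an argument string cannot execute arbitrary code.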