llmcompressor.transformers
Tools for integrating LLM Compressor with transformers training flows.
Modules:
- compression
- finetune
- tracing
- utils – Utilities for applying sparsification algorithms to Hugging Face transformers flows
Classes:
- SessionManagerMixIn – Mix-In class to extend the Hugging Face Trainer class to support LLM Compressor recipes for one-shot and finetuning flows
- TextGenerationDataset – Base class for text datasets; applies transformations to prepare a dataset for loading by a dataloader
Functions:
- is_model_ct_quantized_from_path – Determine if a model from a path is quantized based on the config
SessionManagerMixIn
SessionManagerMixIn(
    recipe: str,
    model_args: ModelArguments,
    dataset_args: Optional[DatasetArguments] = None,
    teacher: Optional[Union[Module, str]] = None,
    recipe_args: Optional[Union[Dict[str, Any], str]] = None,
    **kwargs,
)
Mix-In class to extend the Hugging Face Trainer class to support LLM Compressor recipes for one-shot and finetuning flows.
Parameters:
- recipe (str) – path to recipe file to apply during training
- recipe_args (Optional[Union[Dict[str, Any], str]], default: None) – additional kwargs to use for evaluating the recipe
- dataset_args (Optional[DatasetArguments], default: None) – kwargs for configuring dataset loading
- teacher (Optional[Union[Module, str]], default: None) – optional teacher model to use for distillation
Methods:
- compute_loss – Override of compute_loss to trigger callbacks and filter columns
- create_optimizer – Override the optimizer to apply and update the recipe while training
- create_scheduler – Create an LR scheduler to work with the applied recipes
- finalize_session – Wrap up training by finalizing all modifiers initialized in the current session
- initialize_session – Initialize the CompressionSession from the specified epoch
- log_model_sparsification – Log the current model sparsification info, including pruned and quantized states
- maybe_log_model_sparsification – Log info on model sparsity and quantization if possible
- save_model – Override of save_model; expects it to exist in the parent class
- train – Run a sparsification training cycle
- training_step – Overrides the Trainer's training step to trigger the batch_start callback to the modifiers
Source code in llmcompressor/transformers/finetune/session_mixin.py
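A minimal sketch of how this mix-in is typically combined with a Hugging Face Trainer, before the individual overrides documented below. The import path follows the source location above; the recipe path and the model/argument objects are placeholders supplied by the caller, not values defined on this page.

```python
# Hedged sketch: mix SessionManagerMixIn into a Hugging Face Trainer so recipe
# modifiers run during finetuning. Placeholder objects are noted inline.
from transformers import Trainer

from llmcompressor.transformers.finetune.session_mixin import SessionManagerMixIn


class CompressionTrainer(SessionManagerMixIn, Trainer):
    """Trainer whose training loop initializes and finalizes a compression session."""


def build_trainer(model, training_args, train_dataset, model_args, dataset_args):
    """Placeholder factory: callers supply the model, HF TrainingArguments,
    tokenized dataset, and llmcompressor ModelArguments/DatasetArguments."""
    return CompressionTrainer(
        recipe="recipe.yaml",       # placeholder path to the recipe to apply
        model_args=model_args,
        dataset_args=dataset_args,
        model=model,                # remaining kwargs are forwarded to Trainer
        args=training_args,
        train_dataset=train_dataset,
    )
```

Calling train() on the resulting trainer initializes the compression session, runs the standard Trainer loop, and finalizes the session afterwards.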
compute_loss
compute_loss(
    model: Module,
    inputs: Dict[str, Any],
    return_outputs: bool = False,
    num_items_in_batch: Optional[Tensor] = None,
) -> Union[torch.Tensor, Tuple[torch.Tensor, Any]]
Override of compute_loss to trigger callbacks and filter columns before computing the loss
Parameters:
- model (Module) – the model to compute the loss for
- inputs (Dict[str, Any]) – the inputs to pass through the model for calculating the loss
- return_outputs (bool, default: False) – True to return the outputs with the loss, False otherwise
- num_items_in_batch (Optional[Tensor], default: None) – the number of items which contribute to the loss
Returns:
- Union[Tensor, Tuple[Tensor, Any]] – the resulting loss if not return_outputs, otherwise a tuple containing the loss and the model's outputs
Source code in llmcompressor/transformers/finetune/session_mixin.py
create_optimizer
Override the optimizer to apply and update the recipe while training. create_optimizer must exist in the parent class and should set self.optimizer to the optimizer state and optionally set self.scaler if using amp.
Source code in llmcompressor/transformers/finetune/session_mixin.py
create_scheduler
Create an LR scheduler to work with the applied recipes. This is a placeholder that just calls the super method, but would be expanded upon if we ever implement a LearningRateModifier.
Parameters:
- num_training_steps (int) – the total number of training steps
- optimizer (Optimizer, default: None) – pre-initialized optimizer
Source code in llmcompressor/transformers/finetune/session_mixin.py
finalize_session
Wrap up training by finalizing all modifiers initialized in the current session
Source code in llmcompressor/transformers/finetune/session_mixin.py
initialize_session
Initialize the CompressionSession from the specified epoch, evaluate the recipe, and initialize the modifiers for the training session
Parameters:
- epoch (float) – epoch to initialize the session from, usually 0 unless loading from a checkpoint
- checkpoint (Optional[str], default: None) – optional checkpoint to initialize from to continue training
- stage (Optional[str], default: None) – optional stage of the recipe to run, or None to run all stages
Source code in llmcompressor/transformers/finetune/session_mixin.py
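train() calls initialize_session and finalize_session automatically; the hedged sketch below only illustrates the pairing when driving the session by hand on a mixed-in trainer instance (the checkpoint path is a placeholder).

```python
# Illustrative lifecycle only; normally handled inside train().
trainer.initialize_session(epoch=0.0, checkpoint="output/checkpoint-1000", stage=None)
try:
    ...  # custom evaluation or a training loop run outside of train()
finally:
    trainer.finalize_session()  # finalize all modifiers initialized above
```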
log_model_sparsification
Log the current model sparsification info including pruned and quantized states
Source code in llmcompressor/transformers/finetune/session_mixin.py
maybe_log_model_sparsification
Log info on model sparsity and quantization if possible. Only print logs on the main process, and avoid logging for quantized FSDP models
Source code in llmcompressor/transformers/finetune/session_mixin.py
save_model
save_model(
    output_dir: str,
    _internal_call: bool = False,
    skip_sparsity_compression_stats: Optional[bool] = True,
)
Override of the save_model function; expects it to exist in the parent class. Calls into super() to save the model, and additionally saves any recipes that were used with the model into the model folder.
Parameters:
- output_dir (str) – the path to save the recipes into
- _internal_call (bool, default: False) – True if this is an internal call from the trainer in super(); called from self.save_model(output_dir, _internal_call=True) in transformers/trainer/Trainer::_save_checkpoint
Source code in llmcompressor/transformers/finetune/session_mixin.py
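A brief usage sketch; the output directory is a placeholder.

```python
# Saves the model via the parent Trainer and writes the recipes that were
# applied to the model into the same folder.
trainer.save_model(output_dir="./compressed-finetuned-model")
```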
train
Run a sparsification training cycle. Runs initialization for the sparse session before calling super().train() and finalization of the session after.
Logs sparsification details for the trained model.
Parameters:
- args – positional args to pass to super().train()
- stage (Optional[str], default: None) – optional stage of the recipe to run, or None to run all stages
- kwargs – keyword args to pass to super().train()
Returns:
- the output from super().train()
Source code in llmcompressor/transformers/finetune/session_mixin.py
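For example, a single stage of a multi-stage recipe could be run as below; the stage name is a placeholder that must match a stage defined in the loaded recipe.

```python
# Initializes the session for the named stage, trains, then finalizes it.
train_result = trainer.train(stage="finetuning_stage")
```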
training_step
training_step(
    model: Module,
    inputs: Dict[str, Union[Tensor, Any]],
    num_items_in_batch: Optional[int] = None,
) -> torch.Tensor
Overrides the Trainer's training step to trigger the batch_start callback to the modifiers, then calls the parent function.
Parameters:
- model (Module) – the model to compute the loss for
- inputs (Dict[str, Union[Tensor, Any]]) – the inputs to pass through the model for calculating the loss
Returns:
- Tensor – output of the model
Source code in llmcompressor/transformers/finetune/session_mixin.py
TextGenerationDataset
Bases: RegistryMixin
Base class for text datasets. Applies the following transformations to a dataset in order to prepare the dataset to be loaded by a dataloader
- Load dataset from Hugging Face Hub or local cache
- Preprocess dataset according to preprocess function or chat/dataset template
- Tokenize dataset using model tokenizer/processor
- Apply post processing such as grouping text and/or adding labels for finetuning
Parameters:
- dataset_args (DatasetArguments) – configuration settings for dataset loading
- split (str) – split from dataset to load, for instance test or train[:5%]
- processor (Processor) – processor or tokenizer to use on dataset
Methods:
- load_dataset – Load the raw dataset from Hugging Face, using cached copy if available
- map – Wrapper function around Dataset.map and IterableDataset.map
Attributes:
- preprocess (Union[Callable[[LazyRow], Any], None]) – The function must return keys which correspond to processor/tokenizer kwargs, optionally including PROMPT_KEY
Source code in llmcompressor/transformers/finetune/data/base.py
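A hedged sketch of resolving a registered dataset subclass through the RegistryMixin lookup and preparing it with the helpers documented below. The DatasetArguments import path, its fields, and the load_from_registry keyword names are assumptions, not confirmed by this page.

```python
# Assumed import locations and argument names; verify against the installed package.
from transformers import AutoTokenizer

from llmcompressor.args import DatasetArguments                    # assumed path
from llmcompressor.transformers.finetune.data import TextGenerationDataset

processor = AutoTokenizer.from_pretrained("gpt2")                  # placeholder tokenizer
dataset_args = DatasetArguments(dataset="wikitext", max_seq_length=512)  # assumed fields

# Subclasses register themselves by name; the registry lookup resolves the
# matching class, which then loads, preprocesses, and tokenizes the split.
wikitext = TextGenerationDataset.load_from_registry(
    dataset_args.dataset,
    dataset_args=dataset_args,
    split="train[:5%]",
    processor=processor,
)
```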
preprocess (cached property)
The function must return keys which correspond to processor/tokenizer kwargs, optionally including PROMPT_KEY
load_dataset
Load the raw dataset from Hugging Face, using cached copy if available
Parameters:
- cache_dir – disk location to search for cached dataset
Returns:
- the requested dataset
Source code in llmcompressor/transformers/finetune/data/base.py
map
map(
    dataset: Union[Dataset, IterableDataset],
    function: Callable[[Any], Any],
    **kwargs,
) -> Union[Dataset, IterableDataset]
Wrapper function around Dataset.map and IterableDataset.map.
If the dataset is streaming (in the case of IterableDataset), non-applicable arguments are ignored and the dataset features are resolved
Source code in llmcompressor/transformers/finetune/data/base.py
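An illustrative call from inside a subclass; self, raw_dataset, and tokenize_fn are placeholders.

```python
# Illustrative only: `self` is a TextGenerationDataset subclass and `tokenize_fn`
# returns keys matching the processor/tokenizer kwargs for each batch of rows.
tokenized = self.map(
    raw_dataset,
    function=tokenize_fn,
    batched=True,
    num_proc=8,  # kwargs that do not apply to streaming IterableDatasets are dropped
)
```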
is_model_ct_quantized_from_path
Determine if a model from a path is quantized based on the config
Parameters:
- path (str) – path to the model or HF stub
Returns:
- bool – True if the config at the given path contains a quantization_config
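For example, this check can gate whether a quantization recipe still needs to be applied. The model stub below is a placeholder, and the top-level import is assumed from this module listing.

```python
from llmcompressor.transformers import is_model_ct_quantized_from_path

# Placeholder HF stub; returns True only if the config found at this path
# contains a quantization_config entry.
if is_model_ct_quantized_from_path("example-org/model-w4a16"):
    print("Model already carries a quantization_config; skipping quantization.")
```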