llmcompressor.transformers
Tools for integrating LLM Compressor with transformers training flows.
Modules:
- compression
- finetune
- sparsification – Objects, classes, and methods for applying sparsification algorithms to Hugging Face transformers flows
- tracing
- utils – Utilities for applying sparsification algorithms to Hugging Face transformers flows
Classes:
- SessionManagerMixIn – Mix-In class to extend the Hugging Face Trainer class to support LLM Compressor recipes for one-shot and finetuning flows
- TextGenerationDataset – Base class for text datasets; applies transformations to prepare a dataset for loading by a dataloader
Functions:
- is_model_ct_quantized_from_path – Determine if a model from a path is quantized based on the config
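The classes and functions listed above are importable directly from this module. A minimal import sketch, assuming llmcompressor is installed:
# These names are the public members of llmcompressor.transformers listed above.
from llmcompressor.transformers import (
    SessionManagerMixIn,
    TextGenerationDataset,
    is_model_ct_quantized_from_path,
)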
SessionManagerMixIn
SessionManagerMixIn(
recipe: str,
model_args: ModelArguments,
dataset_args: Optional[DatasetArguments] = None,
teacher: Optional[Union[Module, str]] = None,
recipe_args: Optional[
Union[Dict[str, Any], str]
] = None,
**kwargs
)
Mix-In class to extend the Hugging Face Trainer class to support LLM Compressor recipes for one-shot and finetuning flows.
Parameters:
- recipe (str) – path to recipe file to apply during training
- recipe_args (Optional[Union[Dict[str, Any], str]], default: None) – additional kwargs to use for evaluating recipe
- dataset_args (Optional[DatasetArguments], default: None) – kwargs for configuring dataset loading
- teacher (Optional[Union[Module, str]], default: None) – optional teacher model to use for distillation
Methods:
- compute_loss – Override of compute_loss to trigger callbacks and filter input columns before computing the loss
- create_optimizer – Override the optimizer to apply and update the recipe while training
- create_scheduler – Create an LR scheduler to work with the applied recipes
- finalize_session – Wrap up training by finalizing all modifiers initialized in the current session
- initialize_session – Initialize the CompressionSession from the specified epoch and evaluate the recipe
- log_model_sparsification – Log the current model sparsification info including pruned and quantized states
- maybe_log_model_sparsification – Log info on model sparsity and quantization if possible
- save_model – Override of the save_model function; also saves any recipes used with the model
- train – Run a sparsification training cycle
- training_step – Overrides the Trainer's training step to trigger the batch_start callback to the modifiers
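A minimal usage sketch follows. The CompressionTrainer class name and the commented constructor call are illustrative assumptions; only the documented keyword arguments (recipe, model_args, dataset_args, recipe_args, teacher) come from the signature above.
# Sketch: combine the mix-in with the Hugging Face Trainer so recipes are
# applied during finetuning. Names other than SessionManagerMixIn are
# illustrative, not part of the library's API.
from transformers import Trainer as HFTrainer
from llmcompressor.transformers import SessionManagerMixIn

class CompressionTrainer(SessionManagerMixIn, HFTrainer):
    """Trainer that applies an LLM Compressor recipe while training."""

# trainer = CompressionTrainer(
#     recipe="recipe.yaml",          # recipe file applied during training
#     model_args=model_args,         # ModelArguments for the model being trained
#     dataset_args=dataset_args,     # optional DatasetArguments for data loading
#     teacher=teacher_model,         # optional distillation teacher
#     model=model,                   # standard Hugging Face Trainer kwargs follow
#     args=training_args,
#     train_dataset=train_dataset,
# )
# trainer.train()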
compute_loss
compute_loss(
model: Module,
inputs: Dict[str, Any],
return_outputs: bool = False,
num_items_in_batch: Optional[Tensor] = None,
) -> Union[torch.Tensor, Tuple[torch.Tensor, Any]]
Override of compute_loss to trigger callbacks and filter input columns before computing the loss
Parameters:
- model (Module) – the model to compute the loss for
- inputs (Dict[str, Any]) – the inputs to pass through the model for calculating the loss
- return_outputs (bool, default: False) – True to return the outputs with the loss, False otherwise
- num_items_in_batch (Optional[Tensor], default: None) – the number of items which contribute to loss
Returns:
- Union[Tensor, Tuple[Tensor, Any]] – the resulting loss if not return_outputs, otherwise a tuple containing the loss and the model's outputs
create_optimizer
Override the optimizer to apply and update the recipe while training. create_optimizer must exist in the parent class and should set self.optimizer to the optimizer state and optionally set self.scaler if using amp.
create_scheduler
Create an LR scheduler to work with the applied recipes. This is a placeholder that just calls the super method, but would be expanded upon if we ever implement a LearningRateModifier.
Parameters:
- num_training_steps (int) – the total number of training steps
- optimizer (Optimizer, default: None) – pre-initialized optimizer
finalize_session
Wrap up training by finalizing all modifiers initialized in the current session
initialize_session
Initialize the CompressionSession from the specified epoch, evaluate the recipe, and initialize the modifiers for the training session
Parameters:
- epoch (float) – Epoch to initialize session from, usually 0 unless loading from a checkpoint
- checkpoint (Optional[str], default: None) – Optional checkpoint to initialize from to continue training
- stage (Optional[str], default: None) – Optional stage of recipe to run, or None to run all stages
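A hedged sketch of driving the session by hand, based on the signatures above; train() normally calls these internally, and the stage name shown is hypothetical.
# Sketch only: initialize modifiers at epoch 0, run training, then finalize.
# "sparsity_stage" is a hypothetical stage name defined in a recipe.
trainer.initialize_session(epoch=0.0, checkpoint=None, stage="sparsity_stage")
# ... training or evaluation happens here ...
trainer.finalize_session()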
log_model_sparsification
Log the current model sparsification info including pruned and quantized states
maybe_log_model_sparsification
Log info on model sparsity and quantization if possible. Only print logs on the main process, and avoid logging for quantized FSDP models
save_model
save_model(
output_dir: str,
_internal_call: bool = False,
skip_sparsity_compression_stats: Optional[bool] = True,
)
Override of the save_model function and expects it to exist in the parent. Calls into super() to save the model and additionally saves any recipes that were used with the model within the model folder.
Parameters:
- output_dir (str) – the path to save the recipes into
- _internal_call (bool, default: False) – True if this is an internal call from the trainer in super(); called from self.save_model(output_dir, _internal_call=True) in transformers/trainer/Trainer::_save_checkpoint
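For example (a sketch; the output path is illustrative, and the effect of skip_sparsity_compression_stats is inferred from its name and default):
# Saves the model through the parent Trainer and writes the applied recipes
# into the same folder. Setting skip_sparsity_compression_stats=False is
# assumed to also compute sparsity statistics at save time.
trainer.save_model(
    output_dir="./compressed-model",
    skip_sparsity_compression_stats=False,
)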
train
Run a sparsification training cycle. Runs initialization for the sparse session before calling super().train() and finalization of the session after.
Logs sparsification details for the trained model.
Parameters:
- args – positional args to pass to super().train()
- stage (Optional[str], default: None) – Optional stage of recipe to run, or None to run all stages
- kwargs – keyword args to pass to super().train()
Returns:
- the output from super().train()
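A short sketch, assuming a recipe that defines a named stage; extra keyword arguments pass through to super().train().
# Sketch: run the full recipe, or restrict training to one hypothetical stage.
train_result = trainer.train(stage="finetuning_stage")
# train_result = trainer.train()  # run all stages; kwargs are forwarded to super().train()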
training_step
training_step(
model: Module,
inputs: Dict[str, Union[Tensor, Any]],
num_items_in_batch: Optional[int] = None,
) -> torch.Tensor
Overrides the Trainer's training step to trigger the batch_start callback to the modifiers, then calls the parent function.
Parameters:
- model (Module) – the model to compute the loss for
- inputs (Dict[str, Union[Tensor, Any]]) – the inputs to pass through the model for calculating the loss
Returns:
- Tensor – output of the model
TextGenerationDataset
Bases: RegistryMixin
Base class for text datasets. Applies the following transformations to a dataset in order to prepare the dataset to be loaded by a dataloader
- Load dataset from huggingface or local cache
- Preprocess dataset according to preprocess function or chat/dataset template
- Tokenize dataset using model tokenizer/processor
- Apply post processing such as grouping text and/or adding labels for finetuning
Parameters:
- dataset_args (DatasetArguments) – configuration settings for dataset loading
- split (str) – split from dataset to load, for instance test or train[:5%]
- processor (Processor) – processor or tokenizer to use on dataset
Methods:
- load_dataset – Load the raw dataset from Hugging Face, using cached copy if available
- map – Wrapper function around Dataset.map and IterableDataset.map
Attributes:
- preprocess (Union[Callable[[LazyRow], Any], None]) – The function must return keys which correspond to processor/tokenizer kwargs, optionally including PROMPT_KEY
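A hedged sketch of building a dataset through the registry; load_from_registry comes from the RegistryMixin base class, and the registry id and calling convention shown are assumptions.
# Sketch only: resolve a registered TextGenerationDataset subclass, then load,
# preprocess, tokenize, and post-process the split as documented above.
dataset_manager = TextGenerationDataset.load_from_registry(
    "wikitext",                 # hypothetical registry id
    dataset_args=dataset_args,  # DatasetArguments
    split="train[:5%]",
    processor=tokenizer,
)
tokenized_dataset = dataset_manager()  # calling convention is an assumption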
preprocess (cached property)
The function must return keys which correspond to processor/tokenizer kwargs, optionally including PROMPT_KEY
load_dataset
Load the raw dataset from Hugging Face, using cached copy if available
Parameters:
- cache_dir – disk location to search for cached dataset
Returns:
- the requested dataset
map
map(
dataset: Union[Dataset, IterableDataset],
function: Callable[[Any], Any],
**kwargs
) -> Union[Dataset, IterableDataset]
Wrapper function around Dataset.map and IterableDataset.map.
If the dataset is streaming (in the case of IterableDataset), non-applicable arguments are ignored and the dataset features are resolved
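For instance, a tokenization pass might be applied through this wrapper; the names below are illustrative.
# Sketch: forward a mapping function plus kwargs; for streaming datasets,
# non-applicable kwargs are ignored as described above.
tokenized = dataset_manager.map(
    raw_dataset,    # Dataset or IterableDataset
    tokenize_fn,    # callable applied to each example
    batched=True,
)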
is_model_ct_quantized_from_path
Determine if model from path is quantized based on the config
Parameters:
- path (str) – path to the model or HF stub
Returns:
- bool – True if the config at the given path contains a quantization_config
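A short usage sketch (the model stub is illustrative):
# Returns True when the config at the given path or HF stub declares a
# quantization_config.
from llmcompressor.transformers import is_model_ct_quantized_from_path

if is_model_ct_quantized_from_path("org/quantized-model-stub"):
    print("Model is already quantized; skipping quantization stage")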