llmcompressor.pipelines.cache
Classes:

- IntermediateValue – Dataclass which recursively defines offloaded values and which device to onload to
- IntermediatesCache – Cache which stores intermediate values (activations) produced by batched, sequential execution of models
IntermediateValue dataclass
Dataclass which recursively defines offloaded values and which device to onload to
Parameters:

- value (Union[Tensor, IntermediateValue, Any]) – either an offloaded Tensor, a primitive value, or a recursable value
- device (Union[device, None]) – if the value is a Tensor, the device to onload the tensor to; otherwise None
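A minimal sketch of constructing the dataclass, with field names taken from the parameter list above; the tensor shape and devices are illustrative only:

```python
import torch
from llmcompressor.pipelines.cache import IntermediateValue

tensor = torch.randn(4, 128)  # imagine this originally lived on "cuda:0"

# tensor values: store the offloaded copy plus the device to onload it to
offloaded = IntermediateValue(value=tensor.to("cpu"), device=torch.device("cuda:0"))

# primitive (non-tensor) values carry no onload device
flag = IntermediateValue(value=True, device=None)
```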
IntermediatesCache
IntermediatesCache(
    batch_intermediates: Optional[List[IntermediateValues]] = None,
    offload_device: Optional[device] = "cpu",
)
Cache which stores intermediate values (activations) produced by batched, sequential execution of models. Values are offloaded to the offload_device when stored in the cache and onloaded to their original device when fetched from the cache. If offload_device is None, values will not be offloaded at all.

Currently supports nested offloading of dataclass instances and tuples.

Construct using the empty and from_dataloader class methods.
Methods:

- append – Append new values to the cache. The new values will be assigned the next available batch index
- delete – Delete values from the cache
- empty – Construct an empty cache
- fetch – Fetch values belonging to a batch
- from_dataloader – Initialize a cache with data from the provided dataloader
- size – Returns the memory used by cached values, keyed by device, in bytes
- update – Update/put values belonging to a batch
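To make the fetch/update cycle concrete, here is a hedged sketch of how a sequential pipeline might drive the cache; the toy dataloader and the "hidden_states" key are illustrative stand-ins, not part of this API:

```python
import torch
from torch.utils.data import DataLoader
from llmcompressor.pipelines.cache import IntermediatesCache

# toy dataloader whose batches look like tokenized model inputs (illustrative)
samples = [{"input_ids": torch.randint(0, 100, (16,)),
            "attention_mask": torch.ones(16, dtype=torch.long)} for _ in range(4)]
dataloader = DataLoader(samples, batch_size=2)

cache = IntermediatesCache.from_dataloader(dataloader, model_device=torch.device("cpu"))

for batch_index in range(len(dataloader)):
    inputs = cache.fetch(batch_index)  # values onloaded to their original device
    # ... run one model layer on `inputs`, producing new activations ...
    outputs = {"hidden_states": torch.randn(2, 16, 64)}  # placeholder output
    cache.update(batch_index, outputs)  # values offloaded back to the CPU
```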
append
Append new values to the cache. The new values will be assigned the next available batch index
Parameters:

- values (Dict[str, Any]) – dictionary mapping keys to values used for update
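A small sketch, assuming a cache built with the empty classmethod and assuming that starting from zero batches is valid; each append occupies the next free batch index:

```python
import torch
from llmcompressor.pipelines.cache import IntermediatesCache

cache = IntermediatesCache.empty(num_batches=0, offload_device=torch.device("cpu"))
cache.append({"input_ids": torch.tensor([[1, 2, 3]])})  # stored at batch index 0
cache.append({"input_ids": torch.tensor([[4, 5, 6]])})  # stored at batch index 1
```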
delete
Delete values from the cache
Parameters:

- batch_index (int) – index of batch whose values will be deleted
- consumed_names (Optional[List[str]], default: None) – list of keys whose values will be deleted; defaults to removing all keys
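A sketch of both forms; the stored keys are hypothetical:

```python
import torch
from llmcompressor.pipelines.cache import IntermediatesCache

cache = IntermediatesCache.empty(num_batches=2, offload_device=torch.device("cpu"))
cache.update(0, {"hidden_states": torch.randn(2, 16), "position_ids": torch.arange(16)})

cache.delete(0, consumed_names=["hidden_states"])  # drop one consumed key from batch 0
cache.delete(0)                                    # drop everything left for batch 0
```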
empty classmethod
Construct an empty cache
Parameters:

- num_batches (int) – the expected number of batches to be stored
- offload_device (device) – device to offload values to
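For example, reserving space for a known number of batches (the count here is arbitrary):

```python
import torch
from llmcompressor.pipelines.cache import IntermediatesCache

# reserve slots for 8 batches; tensors will be offloaded to CPU when stored
cache = IntermediatesCache.empty(num_batches=8, offload_device=torch.device("cpu"))
```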
fetch
Fetch values belonging to a batch
Parameters:

- batch_index (int) – index of batch whose values are being fetched
- input_names (Optional[List[str]], default: None) – list of keys whose values are being fetched

Returns:

- Dict[str, Any] – dictionary mapping keys to onloaded values
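A sketch pairing fetch with update; "hidden_states" is again a hypothetical key:

```python
import torch
from llmcompressor.pipelines.cache import IntermediatesCache

cache = IntermediatesCache.empty(num_batches=1, offload_device=torch.device("cpu"))
cache.update(0, {"hidden_states": torch.randn(2, 16)})

batch = cache.fetch(0, input_names=["hidden_states"])  # fetch only this key
print(batch["hidden_states"].device)                   # onloaded, ready for the model
```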
from_dataloader classmethod
from_dataloader(
dataloader: DataLoader,
model_device: device = torch.device("cpu"),
mask_padding: bool = True,
offload_device: Optional[device] = torch.device("cpu"),
)
Initialize a cache with data from the provided dataloader
Parameters:

- dataloader (DataLoader) – dataloader which generates values to be cached
- model_device (device, default: torch.device("cpu")) – device which values will be onloaded to when fetched
- mask_padding (bool, default: True) – zero out padding tokens if True. This affects modifiers such as GPTQ and SparseGPT
- offload_device (Optional[device], default: torch.device("cpu")) – device to offload values to
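A sketch of a non-default configuration; per the class docs, passing offload_device=None disables offloading entirely (the toy dataloader is illustrative):

```python
import torch
from torch.utils.data import DataLoader
from llmcompressor.pipelines.cache import IntermediatesCache

samples = [{"input_ids": torch.randint(0, 100, (16,)),
            "attention_mask": torch.ones(16, dtype=torch.long)} for _ in range(4)]
dataloader = DataLoader(samples, batch_size=2)

cache = IntermediatesCache.from_dataloader(
    dataloader,
    model_device=torch.device("cpu"),  # device fetched values are onloaded to
    mask_padding=True,                 # zero out padding tokens (GPTQ/SparseGPT)
    offload_device=None,               # skip offloading; keep values in place
)
```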
size
Returns the memory used by cached values, keyed by device, in bytes
Returns:

- Dict[device, int] – dictionary mapping torch device to number of bytes in cache
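For instance, printing a per-device breakdown (the stored key is hypothetical):

```python
import torch
from llmcompressor.pipelines.cache import IntermediatesCache

cache = IntermediatesCache.empty(num_batches=1, offload_device=torch.device("cpu"))
cache.update(0, {"hidden_states": torch.randn(2, 16)})

for device, num_bytes in cache.size().items():
    print(f"{device}: {num_bytes / 1e6:.2f} MB")
```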
update
Update/put values belonging to a batch
Parameters:

- batch_index (int) – index of batch whose values will be updated
- values (Dict[str, Any]) – dictionary mapping keys to values used for update
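A minimal sketch of putting mixed tensor and primitive values for a batch; both keys are hypothetical:

```python
import torch
from llmcompressor.pipelines.cache import IntermediatesCache

cache = IntermediatesCache.empty(num_batches=1, offload_device=torch.device("cpu"))

# the tensor is offloaded to CPU on store; the boolean is kept as-is
cache.update(0, {"hidden_states": torch.randn(2, 16), "use_cache": False})
```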