llmcompressor.pipelines.cache
Classes:
- IntermediateValue – Dataclass which recursively defines offloaded values and which device to onload to
- IntermediatesCache – Cache which stores intermediate values (activations) produced by batched, sequential execution of models
IntermediateValue dataclass
Dataclass which recursively defines offloaded values and which device to onload to
Parameters:
- value (Union[Tensor, IntermediateValue, Any]) – either an offloaded Tensor, a primitive value, or a recursable value
- device (Union[device, None]) – if the value is a Tensor, the device to onload the tensor to, otherwise None
IntermediatesCache
IntermediatesCache(
batch_intermediates: Optional[
List[IntermediateValues]
] = None,
offload_device: Optional[device] = "cpu",
)
Cache which stores intermediate values (activations) produced by batched, sequential execution of models. Values are offloaded to the offload_device when stored in the cache and onloaded to their original device when fetched from the cache. If offload_device is None, values will not be offloaded at all.
Currently supports nested offloading of dataclass instances and tuples
Construct using the empty and from_dataloader class methods, as sketched below
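A minimal sketch of the offload/onload round trip described above; the key name and tensor shape are illustrative:

```python
import torch

from llmcompressor.pipelines.cache import IntermediatesCache

# Reserve space for two batches; stored tensors are offloaded to CPU
cache = IntermediatesCache.empty(num_batches=2, offload_device=torch.device("cpu"))

# Values are moved to the offload device when stored...
cache.update(batch_index=0, values={"hidden_states": torch.randn(4, 8)})

# ...and moved back to their original device when fetched
batch = cache.fetch(batch_index=0, input_names=["hidden_states"])
```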
Methods:
- append – Append new values to the cache. The new values will be assigned the next available batch index
- delete – Delete values from the cache
- empty – Construct an empty cache
- fetch – Fetch values belonging to a batch
- from_dataloader – Initialize a cache with data from the provided dataloader
- size – Returns the memory used by cached values, keyed by device, in bytes
- update – Update/put values belonging to a batch
append
Append new values to the cache. The new values will be assigned the next available batch index
Parameters:
- values (Dict[str, Any]) – dictionary mapping keys to values used for update
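For illustration, a sketch that appends activations for a new batch; the key name is hypothetical:

```python
import torch

from llmcompressor.pipelines.cache import IntermediatesCache

cache = IntermediatesCache.empty(num_batches=1, offload_device=torch.device("cpu"))

# The appended values are assigned the next available index, here batch 1
cache.append({"hidden_states": torch.randn(4, 8)})
```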
delete
Delete values from the cache
Parameters:
- batch_index (int) – index of batch whose values will be deleted
- consumed_names (Optional[List[str]], default: None) – list of keys whose values will be deleted, defaults to removing all keys
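A sketch of deleting only the consumed keys for a batch; key names are illustrative:

```python
import torch

from llmcompressor.pipelines.cache import IntermediatesCache

cache = IntermediatesCache.empty(num_batches=1, offload_device=torch.device("cpu"))
cache.update(0, {"hidden_states": torch.randn(4, 8), "attention_mask": torch.ones(4, 8)})

# Remove only the consumed key; "attention_mask" remains cached for batch 0
cache.delete(batch_index=0, consumed_names=["hidden_states"])
```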
empty classmethod
Construct an empty cache
Parameters:
- num_batches (int) – the expected number of batches to be stored
- offload_device (device) – device to offload values to
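A construction sketch; per the class description above, passing offload_device=None would keep values on their original devices instead:

```python
import torch

from llmcompressor.pipelines.cache import IntermediatesCache

# Expecting eight calibration batches, offloading stored tensors to CPU
cache = IntermediatesCache.empty(num_batches=8, offload_device=torch.device("cpu"))
```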
fetch
Fetch values belonging to a batch
Parameters:
- batch_index (int) – index of batch whose values are being fetched
- input_names (Optional[List[str]], default: None) – list of keys whose values are being fetched
Returns:
- Dict[str, Any] – dictionary mapping keys to onloaded values
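A fetch sketch; key names and shapes are illustrative:

```python
import torch

from llmcompressor.pipelines.cache import IntermediatesCache

cache = IntermediatesCache.empty(num_batches=1, offload_device=torch.device("cpu"))
cache.update(0, {"hidden_states": torch.randn(4, 8), "position_ids": torch.arange(8)})

# Only the requested keys are onloaded and returned
batch = cache.fetch(batch_index=0, input_names=["hidden_states"])
assert set(batch) == {"hidden_states"}
```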
from_dataloader classmethod
from_dataloader(
dataloader: DataLoader,
model_device: device = torch.device("cpu"),
mask_padding: bool = True,
offload_device: Optional[device] = torch.device("cpu"),
)
Initialize a cache with data from the provided dataloader
Parameters:
- dataloader (DataLoader) – dataloader which generates values to be cached
- model_device (device, default: device('cpu')) – device which values will be onloaded to when fetched
- mask_padding (bool, default: True) – zero out padding tokens if True. This affects modifiers such as GPTQ and SparseGPT
- offload_device (Optional[device], default: device('cpu')) – device to offload values to
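A sketch using a hypothetical pre-tokenized calibration set; the default collate_fn stacks the dict fields into batch tensors:

```python
import torch
from torch.utils.data import DataLoader

from llmcompressor.pipelines.cache import IntermediatesCache

# Hypothetical tokenized samples; the trailing zero in attention_mask marks padding
samples = [
    {
        "input_ids": torch.tensor([101, 2009, 2003, 0]),
        "attention_mask": torch.tensor([1, 1, 1, 0]),
    }
]
dataloader = DataLoader(samples, batch_size=1)

cache = IntermediatesCache.from_dataloader(
    dataloader,
    model_device=torch.device("cpu"),  # device batches are onloaded to when fetched
    mask_padding=True,                 # zero out token ids at padded positions
    offload_device=torch.device("cpu"),
)
batch = cache.fetch(batch_index=0, input_names=["input_ids"])
```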
size
Returns the memory used by cached values, keyed by device, in bytes
Returns:
- Dict[device, int] – dictionary mapping torch device to number of bytes in cache
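A sketch of inspecting cache memory usage; the expected byte count follows from the tensor's dtype and shape:

```python
import torch

from llmcompressor.pipelines.cache import IntermediatesCache

cache = IntermediatesCache.empty(num_batches=1, offload_device=torch.device("cpu"))
cache.update(0, {"hidden_states": torch.randn(4, 8)})  # 4 * 8 float32 values = 128 bytes

# Expect something like {device(type='cpu'): 128}
print(cache.size())
```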
update
Update/put values belonging to a batch
Parameters:
- batch_index (int) – index of batch whose values will be updated
- values (Dict[str, Any]) – dictionary mapping keys to values used for update
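An update sketch; the batch index must refer to an existing slot, here reserved via empty:

```python
import torch

from llmcompressor.pipelines.cache import IntermediatesCache

cache = IntermediatesCache.empty(num_batches=2, offload_device=torch.device("cpu"))

# Put (or overwrite) values for batch 1; other batches are untouched
cache.update(batch_index=1, values={"hidden_states": torch.randn(4, 8)})
```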