llmcompressor.transformers.finetune.data.peoples_speech
Classes:
- PeoplesSpeech – ML Commons People's Speech audio dataset
PeoplesSpeech
Bases: TextGenerationDataset
ML Commons People's Speech audio dataset
Unfortunately, due to the specialized nature of audio model preprocessing, some model-specific code must be defined here. This dataset has been tested with the WhisperForConditionalGeneration and Qwen2AudioForConditionalGeneration model classes.
Parameters:
- data_args – configuration settings for dataset loading
- split (str) – split from dataset to load, for instance test or train[:5%]
- processor (Processor) – processor or tokenizer to use on dataset
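The split parameter accepts either a bare split name (test) or a name with a slice (train[:5%]), in the style of the split strings used by the Hugging Face datasets library. The sketch below only illustrates how such a string breaks down into a name and optional bounds. The helper describe_split and its regular expression are hypothetical, written for this example; they are not part of llmcompressor or datasets, and the real parsing is done by the underlying dataset loader.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern for illustration: a split name, optionally followed
# by a [start:stop] slice whose bounds are integers or percentages.
_SPLIT_RE = re.compile(
    r"^(?P<name>\w+)(?:\[(?P<start>\d+%?)?:(?P<stop>\d+%?)?\])?$"
)


def describe_split(split: str) -> Tuple[str, Optional[str], Optional[str]]:
    """Return (name, start, stop) for a split string; bounds may be None."""
    match = _SPLIT_RE.match(split.strip())
    if match is None:
        raise ValueError(f"unrecognized split string: {split!r}")
    return match.group("name"), match.group("start"), match.group("stop")


print(describe_split("test"))        # ('test', None, None)
print(describe_split("train[:5%]"))  # ('train', None, '5%')
```

Here train[:5%] reads as "the train split, from the beginning up to 5% of its rows", which is a convenient way to load a small calibration subset.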