llmcompressor.transformers.finetune.data.peoples_speech
Classes:
- PeoplesSpeech – ML Commons People's Speech audio dataset
PeoplesSpeech
Bases: TextGenerationDataset
ML Commons People's Speech audio dataset
Unfortunately, due to the specialized nature of audio model preprocessing, some model-specific code must be defined here. This dataset has been tested with the WhisperForConditionalGeneration and Qwen2AudioForConditionalGeneration model classes.
Parameters:
- data_args – configuration settings for dataset loading
- split (str) – split from dataset to load, for instance test or train[:5%]
- processor (Processor) – processor or tokenizer to use on dataset
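The split parameter accepts either a bare split name or a name with a slice, as in train[:5%]. The sketch below illustrates which rows such a string might select. It is not the parser used by llmcompressor or by the Hugging Face datasets library: resolve_split and _to_index are hypothetical helpers written for this example, and rounding percentages to the nearest row is an assumption.

```python
import re

# Hypothetical helpers for illustration only; the real split parsing is
# handled by the dataset loading library, not by this code.
_SPLIT_RE = re.compile(
    r"^(?P<name>\w+)(\[(?P<start>-?\d+%?)?:(?P<stop>-?\d+%?)?\])?$"
)


def _to_index(bound, num_rows, default):
    """Convert a bound such as '5%' or '100' to an absolute row index."""
    if bound is None:
        return default
    if bound.endswith("%"):
        # Assumption: percentages round to the nearest row.
        value = round(int(bound[:-1]) * num_rows / 100)
    else:
        value = int(bound)
    if value < 0:
        # Negative bounds count back from the end of the split.
        value += num_rows
    # Clamp to the valid range [0, num_rows].
    return max(0, min(num_rows, value))


def resolve_split(spec, num_rows):
    """Return (split_name, start, stop) for a split string.

    num_rows is the total number of rows in the named split.
    """
    match = _SPLIT_RE.match(spec)
    if match is None:
        raise ValueError(f"unrecognized split spec: {spec!r}")
    start = _to_index(match.group("start"), num_rows, 0)
    stop = _to_index(match.group("stop"), num_rows, num_rows)
    return match.group("name"), start, stop


print(resolve_split("test", 1000))        # ('test', 0, 1000)
print(resolve_split("train[:5%]", 1000))  # ('train', 0, 50)
```

A small percentage slice such as train[:5%] is a convenient way to load only a handful of samples when a full split is not needed.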