Commit 5516993 (verified) by alkiskoudounas · 1 parent: f92a635

Created README

Files changed (1): README.md (added, +93 -0)

---
license: apache-2.0
base_model:
- facebook/hubert-large-ls960-ft
tags:
- intent-classification
- slu
- audio-classification
metrics:
- accuracy
- f1
model-index:
- name: hubert-large-slurp
  results: []
datasets:
- slurp
language:
- en
pipeline_tag: audio-classification
library_name: transformers
---

# hubert-large-slurp

This model is a fine-tuned version of [facebook/hubert-large-ls960-ft](https://huggingface.co/facebook/hubert-large-ls960-ft) on the SLURP dataset for the intent classification task.

## Model description

The base model is [Facebook's HuBERT](https://ai.facebook.com/blog/hubert-self-supervised-representation-learning-for-speech-recognition-generation-and-compression), pretrained on speech audio sampled at 16 kHz. When using this model, make sure that your speech input is also sampled at 16 kHz.
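
Recordings stored at another rate must be resampled before feature extraction. The sketch below illustrates that step using plain NumPy linear interpolation. It is an assumption for illustration, not the method used for this card. It also skips the anti-aliasing filtering that `librosa.load(..., sr=16000)` or `librosa.resample` perform, so prefer those for real audio. The helper name `resample_to_16k` is hypothetical.

```python
import numpy as np

TARGET_SR = 16_000  # sampling rate the HuBERT model expects


def resample_to_16k(audio: np.ndarray, orig_sr: int) -> np.ndarray:
    """Hypothetical helper: resample a mono waveform to 16 kHz by linear interpolation.

    Illustration only: no low-pass filter is applied, so downsampling can alias.
    """
    if orig_sr == TARGET_SR:
        return audio.astype(np.float32)
    duration = len(audio) / orig_sr
    n_out = int(round(duration * TARGET_SR))
    # Timestamps (in seconds) of the original and of the resampled samples
    t_in = np.arange(len(audio)) / orig_sr
    t_out = np.arange(n_out) / TARGET_SR
    # np.interp holds the last value for timestamps past the final input sample
    return np.interp(t_out, t_in, audio).astype(np.float32)


# Usage: one second of an 8 kHz tone becomes 16,000 samples at 16 kHz
tone_8k = np.sin(2 * np.pi * 440 * np.arange(8_000) / 8_000)
print(resample_to_16k(tone_8k, 8_000).shape)  # (16000,)
```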

## Task and dataset description

Intent Classification (IC) assigns each utterance to one of a set of predefined classes that capture the speaker's intent.
The dataset used here is [SLURP](https://arxiv.org/abs/2011.13205), in which each utterance is annotated with two intent labels: action and scenario.
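
SLURP annotations generally combine the two labels into a single intent string of the form `scenario_action`, for example `alarm_set`. This card does not state whether this checkpoint's label names follow that convention. If they do, a helper like the hypothetical `split_intent` below recovers both parts. Action names can themselves contain underscores, so the sketch splits only on the first one and assumes scenario names contain none.

```python
def split_intent(intent: str) -> tuple[str, str]:
    """Hypothetical helper: split a SLURP-style "scenario_action" intent label.

    Assumes scenario names contain no underscores, while action names may
    (e.g. "hue_lightoff"), so only the first underscore is used as separator.
    """
    scenario, sep, action = intent.partition("_")
    if not sep:
        raise ValueError(f"not a scenario_action label: {intent!r}")
    return scenario, action


print(split_intent("alarm_set"))         # ('alarm', 'set')
print(split_intent("iot_hue_lightoff"))  # ('iot', 'hue_lightoff')
```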

## Usage examples

You can use the model directly as follows:
```python
import torch
import librosa
from transformers import AutoModelForAudioClassification, AutoFeatureExtractor

# Load an audio file, resampling it to the 16 kHz rate the model expects
audio_array, sr = librosa.load("path_to_audio.wav", sr=16000)

# Load the model and the feature extractor
model = AutoModelForAudioClassification.from_pretrained("alkiskoudounas/hubert-large-slurp")
feature_extractor = AutoFeatureExtractor.from_pretrained("facebook/hubert-large-ls960-ft")
model.eval()

# Extract features
inputs = feature_extractor(audio_array.squeeze(), sampling_rate=feature_extractor.sampling_rate, padding=True, return_tensors="pt")

# Compute logits without tracking gradients (inference only)
with torch.no_grad():
    logits = model(**inputs).logits

# Map the highest-scoring class index to a label name; the names come from
# whatever id2label mapping the checkpoint's config stores
predicted_id = logits.argmax(dim=-1).item()
print(model.config.id2label[predicted_id])
```
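
Sometimes you need more than the single best class, for example to see how close the runner-up intents are. In that case the logits can be converted to probabilities and ranked. The sketch below does this in NumPy so it runs without downloading the model. It assumes you pass it `logits[0].numpy()` and `model.config.id2label` from the example above. The helper name `top_k_intents` and the toy labels and logits are hypothetical and are not real model output.

```python
import numpy as np


def top_k_intents(logits: np.ndarray, id2label: dict[int, str], k: int = 3) -> list[tuple[str, float]]:
    """Hypothetical helper: return the k most likely labels with softmax probabilities."""
    # Subtract the max before exponentiating for numerical stability
    shifted = logits - logits.max()
    probs = np.exp(shifted) / np.exp(shifted).sum()
    # Indices of the k largest probabilities, highest first
    top = np.argsort(probs)[::-1][:k]
    return [(id2label[int(i)], float(probs[i])) for i in top]


# Toy example with made-up labels and logits
toy_labels = {0: "alarm_set", 1: "play_music", 2: "weather_query"}
toy_logits = np.array([0.5, 2.0, 1.0])
for label, p in top_k_intents(toy_logits, toy_labels, k=2):
    print(f"{label}: {p:.3f}")
```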

## Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-04
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 128
- optimizer: AdamW with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- warmup_steps: 3000
- num_steps: 30000
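
The listed values are mutually consistent. The effective batch size is 32 × 4 = 128, and a warmup ratio of 0.1 over 30,000 steps gives the 3,000 warmup steps. The sketch below assumes the `linear` scheduler behaves like `get_linear_schedule_with_warmup` in Hugging Face Transformers: the rate rises linearly from 0 to the peak, then decays linearly to 0. Under that assumption it shows the learning rate at a few steps. The function name `linear_lr` is hypothetical.

```python
PEAK_LR = 5e-4
TRAIN_BATCH_SIZE = 32
GRAD_ACCUM_STEPS = 4
NUM_STEPS = 30_000
WARMUP_RATIO = 0.1

# Effective batch size per optimizer update: 32 * 4 = 128
TOTAL_TRAIN_BATCH_SIZE = TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS
# Warmup length implied by the ratio, rounded to whole steps: 0.1 * 30,000 = 3,000
WARMUP_STEPS = round(WARMUP_RATIO * NUM_STEPS)


def linear_lr(step: int) -> float:
    """Learning rate at an optimizer step under linear warmup followed by linear decay."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    remaining = max(0, NUM_STEPS - step)
    return PEAK_LR * remaining / max(1, NUM_STEPS - WARMUP_STEPS)


print("effective batch size:", TOTAL_TRAIN_BATCH_SIZE)
for s in (0, 1_500, 3_000, 16_500, 30_000):
    print(s, linear_lr(s))
```

Under this schedule the rate peaks at 5e-4 at step 3,000. It is back at half the peak (2.5e-4) at step 16,500, midway through the decay phase.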

## Framework versions

- Datasets 3.2.0
- Pytorch 2.1.2
- Tokenizers 0.20.3
- Transformers 4.45.2
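
To check whether a local environment matches these versions, you can read the installed package versions with the standard library's `importlib.metadata`. This is a minimal sketch with a hypothetical helper name, `version_mismatches`. It assumes the usual PyPI distribution names (`datasets`, `torch`, `tokenizers`, `transformers`) for the libraries listed above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional, Tuple

# Versions listed in this card, keyed by PyPI distribution name
EXPECTED = {
    "datasets": "3.2.0",
    "torch": "2.1.2",
    "tokenizers": "0.20.3",
    "transformers": "4.45.2",
}


def version_mismatches(expected: Dict[str, str]) -> Dict[str, Tuple[Optional[str], str]]:
    """Hypothetical helper: map each mismatched package to (installed, expected).

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = (installed, wanted)
    return mismatches


for name, (installed, wanted) in version_mismatches(EXPECTED).items():
    print(f"{name}: installed {installed}, card lists {wanted}")
```

The comparison is an exact string match. A local build suffix such as `2.1.2+cu121` for PyTorch is therefore reported as a mismatch even when the base version agrees.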

## BibTeX entry and citation info

```bibtex
@ARTICLE{koudounas2024taslp,
  author={Koudounas, Alkis and Pastor, Eliana and Attanasio, Giuseppe and Mazzia, Vittorio and Giollo, Manuel and Gueudre, Thomas and Reale, Elisa and Cagliero, Luca and Cumani, Sandro and de Alfaro, Luca and Baralis, Elena and Amberti, Daniele},
  journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing},
  title={Towards Comprehensive Subgroup Performance Analysis in Speech Models},
  year={2024},
  volume={32},
  number={},
  pages={1468-1480},
  keywords={Analytical models;Task analysis;Metadata;Speech processing;Behavioral sciences;Itemsets;Speech;Speech representation;E2E-SLU models;subgroup identification;model bias analysis;divergence},
  doi={10.1109/TASLP.2024.3363447}}
```