---
license: apache-2.0
base_model:
- facebook/wav2vec2-base
tags:
- intent-classification
- slu
- audio-classification
metrics:
- accuracy
- f1
model-index:
- name: wav2vec2-base-slurp
  results: []
datasets:
- slurp
language:
- en
pipeline_tag: audio-classification
library_name: transformers
---

# Wav2Vec2-base-SLURP

This model is a fine-tuned version of [facebook/wav2vec2-base](https://huggingface.co/facebook/wav2vec2-base) on the SLURP dataset for the intent classification task.

It achieves the following results on the test set:
- Accuracy: 0.696
- F1: 0.566

## Model description

This model builds on the base [Facebook's Wav2Vec2](https://ai.facebook.com/blog/wav2vec-20-learning-the-structure-of-speech-from-raw-audio/) checkpoint, which was pretrained on 16 kHz sampled speech audio. When using the model, make sure that your speech input is also sampled at 16 kHz.
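
If your audio is stored at a different sampling rate, you can resample it before feature extraction. A minimal sketch using librosa (the file path is a placeholder):

```python
import librosa

# Load at the file's native rate, then resample to the 16 kHz the model expects.
audio_array, sr = librosa.load("path_to_audio.wav", sr=None)  # placeholder path
if sr != 16000:
    audio_array = librosa.resample(audio_array, orig_sr=sr, target_sr=16000)
    sr = 16000
```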

## Task and dataset description

Intent Classification (IC) assigns each utterance to one of a set of predefined classes that capture the speaker's intent.
The dataset used here is [SLURP](https://arxiv.org/abs/2011.13205), where each utterance is tagged with two intent labels: action and scenario.

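As a purely hypothetical illustration of this two-label scheme (the utterance and labels below are invented for exposition, not drawn from SLURP):

```python
# Hypothetical SLURP-style labeling: a scenario and an action together
# define the utterance's intent.
utterance = "wake me up at seven tomorrow"  # invented example
scenario, action = "alarm", "set"
intent = f"{scenario}_{action}"  # -> "alarm_set"
```
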
## Usage examples

You can use the model directly as follows:

```python
import torch
import librosa
from transformers import AutoModelForAudioClassification, AutoFeatureExtractor

# Load an audio file, resampled to 16 kHz
audio_array, sr = librosa.load("path_to_audio.wav", sr=16000)

# Load model and feature extractor
model = AutoModelForAudioClassification.from_pretrained("alkiskoudounas/wav2vec2-base-slurp")
feature_extractor = AutoFeatureExtractor.from_pretrained("facebook/wav2vec2-base")

# Extract features
inputs = feature_extractor(audio_array, sampling_rate=feature_extractor.sampling_rate, padding=True, return_tensors="pt")

# Compute logits
with torch.no_grad():
    logits = model(**inputs).logits
```
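
To turn the logits into a predicted intent, take the argmax and decode it with the model's label mapping. A minimal sketch, assuming the fine-tuned checkpoint stores its intent label names in `config.id2label`:

```python
# Pick the highest-scoring class and map it back to its label name.
predicted_id = torch.argmax(logits, dim=-1).item()
print(model.config.id2label[predicted_id])
```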

## Training hyperparameters

The following hyperparameters were used during training (see the `TrainingArguments` sketch after this list):
- learning_rate: 5e-04
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 128
- optimizer: AdamW with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- warmup_steps: 3000
- num_steps: 30000
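
For reference, here is a minimal sketch of how these settings map onto `transformers.TrainingArguments`; the output directory is a hypothetical placeholder, and the dataset/trainer wiring is omitted:

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the reported hyperparameters.
training_args = TrainingArguments(
    output_dir="wav2vec2-base-slurp",  # placeholder
    learning_rate=5e-4,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    gradient_accumulation_steps=4,  # 32 x 4 = 128 total train batch size
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=3000,  # 0.1 of the 30,000 total steps
    max_steps=30000,
)
```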

## Framework versions

- Datasets 3.2.0
- PyTorch 2.1.2
- Tokenizers 0.20.3
- Transformers 4.45.2

## BibTeX entry and citation info

```bibtex
@ARTICLE{koudounas2024taslp,
  author={Koudounas, Alkis and Pastor, Eliana and Attanasio, Giuseppe and Mazzia, Vittorio and Giollo, Manuel and Gueudre, Thomas and Reale, Elisa and Cagliero, Luca and Cumani, Sandro and de Alfaro, Luca and Baralis, Elena and Amberti, Daniele},
  journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing},
  title={Towards Comprehensive Subgroup Performance Analysis in Speech Models},
  year={2024},
  volume={32},
  number={},
  pages={1468-1480},
  keywords={Analytical models;Task analysis;Metadata;Speech processing;Behavioral sciences;Itemsets;Speech;Speech representation;E2E-SLU models;subgroup identification;model bias analysis;divergence},
  doi={10.1109/TASLP.2024.3363447}}
```