
alkiskoudounas/hubert-large-slurp · Audio Classification · intent-classification


alkiskoudounas committed on Feb 6 · Commit 5516993 (verified) · 1 parent: f92a635

Created README

Files changed (1)

README.md +93 -0

README.md ADDED

	@@ -0,0 +1,93 @@

+---
+license: apache-2.0
+base_model:
+- facebook/hubert-large-ls960-ft
+tags:
+- intent-classification
+- slu
+- audio-classification
+metrics:
+- accuracy
+- f1
+model-index:
+- name: hubert-large-slurp
+  results: []
+datasets:
+- slurp
+language:
+- en
+pipeline_tag: audio-classification
+library_name: transformers
+---
+# hubert-large-slurp
+This model is a fine-tuned version of [facebook/hubert-large-ls960-ft](https://huggingface.co/facebook/hubert-large-ls960-ft) on the SLURP dataset for the intent classification task.
+## Model description
+The base model is [Facebook's HuBERT](https://ai.facebook.com/blog/hubert-self-supervised-representation-learning-for-speech-recognition-generation-and-compression), pretrained on speech audio sampled at 16 kHz. When using this model, make sure your speech input is also sampled at 16 kHz.
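If your recordings are not already at 16 kHz, resample them before feature extraction. The usage example in this card does this with `librosa.load(..., sr=16000)`. The NumPy function below only illustrates the idea. It uses plain linear interpolation and applies no anti-aliasing filter. `resample_linear` is a hypothetical helper name, so prefer a proper resampler such as librosa's for real inputs.

```python
import numpy as np

TARGET_SR = 16_000  # sampling rate the HuBERT base model expects


def resample_linear(audio: np.ndarray, orig_sr: int, target_sr: int = TARGET_SR) -> np.ndarray:
    """Resample a mono signal by linear interpolation (illustrative only, no anti-aliasing)."""
    audio = np.asarray(audio, dtype=np.float32)
    if orig_sr == target_sr:
        return audio
    # Keep the same duration at the new rate
    n_out = int(round(len(audio) * target_sr / orig_sr))
    # Time stamps, in seconds, of the input and output samples
    t_in = np.arange(len(audio)) / orig_sr
    t_out = np.arange(n_out) / target_sr
    return np.interp(t_out, t_in, audio).astype(np.float32)


# Example: one second of a 440 Hz tone sampled at 8 kHz, resampled to 16 kHz
tone_8k = np.sin(2 * np.pi * 440 * np.arange(8_000) / 8_000)
tone_16k = resample_linear(tone_8k, orig_sr=8_000)
print(tone_16k.shape)
```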
+## Task and dataset description
+Intent Classification (IC) classifies utterances into predefined classes to determine the intent of speakers.
+The dataset used here is [SLURP](https://arxiv.org/abs/2011.13205), where each utterance is tagged with two intent labels: action and scenario.
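In SLURP, the full intent label is conventionally the scenario and the action joined by an underscore, for example `alarm_set`. The sketch below assumes that convention and recovers the two parts from such a label. This card does not state whether this model's `model.config.id2label` uses these exact strings, so check that mapping before relying on it. `split_slurp_intent` is a hypothetical helper name.

```python
def split_slurp_intent(intent: str) -> tuple[str, str]:
    """Split a SLURP-style intent label into (scenario, action).

    Assumes the "scenario_action" naming convention. Scenario names are assumed
    to contain no underscore, so only the first underscore is used as the separator.
    """
    scenario, sep, action = intent.partition("_")
    if not sep or not action:
        raise ValueError(f"not a scenario_action label: {intent!r}")
    return scenario, action


for label in ("alarm_set", "iot_hue_lightoff"):
    print(label, "->", split_slurp_intent(label))
```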
+## Usage examples
+You can use the model directly in the following manner:
+```python
+import torch
+import librosa
+from transformers import AutoModelForAudioClassification, AutoFeatureExtractor
+
+# Load an audio file, resampled to 16 kHz
+audio_array, sr = librosa.load("path_to_audio.wav", sr=16000)
+
+# Load the fine-tuned model and the base model's feature extractor
+model = AutoModelForAudioClassification.from_pretrained("alkiskoudounas/hubert-large-slurp")
+feature_extractor = AutoFeatureExtractor.from_pretrained("facebook/hubert-large-ls960-ft")
+
+# Extract features
+inputs = feature_extractor(audio_array.squeeze(), sampling_rate=feature_extractor.sampling_rate, padding=True, return_tensors="pt")
+
+# Compute logits without tracking gradients
+with torch.no_grad():
+    logits = model(**inputs).logits
+
+# Map the highest-scoring class index to its label name from the model config
+predicted_id = logits.argmax(dim=-1).item()
+print(model.config.id2label[predicted_id])
+```
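You can turn the logits into probabilities to inspect more than the single best class. The following is a minimal NumPy sketch for one utterance. `top_k_intents` is a hypothetical helper name, and the labels in the toy call are illustrative only.

```python
import numpy as np


def top_k_intents(logits, id2label, k=3):
    """Return the k highest-probability (label, probability) pairs for one utterance."""
    logits = np.asarray(logits, dtype=np.float64)
    # Numerically stable softmax: subtract the maximum logit before exponentiating
    exp = np.exp(logits - logits.max())
    probs = exp / exp.sum()
    # Indices sorted from most to least probable, truncated to k
    top = np.argsort(probs)[::-1][:k]
    return [(id2label[int(i)], float(probs[i])) for i in top]


# Toy logits for three illustrative classes
print(top_k_intents([0.5, 2.0, 1.0], {0: "alarm_set", 1: "play_music", 2: "weather_query"}, k=2))
```

With the usage example, you would call `top_k_intents(logits[0].detach().numpy(), model.config.id2label)`. The `.detach()` call makes the conversion work whether or not the logits were computed under `torch.no_grad()`.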
+## Training hyperparameters
+The following hyperparameters were used during training:
+- learning_rate: 5e-04
+- train_batch_size: 32
+- eval_batch_size: 32
+- seed: 42
+- gradient_accumulation_steps: 4
+- total_train_batch_size: 128
+- optimizer: AdamW with betas=(0.9,0.999) and epsilon=1e-08
+- lr_scheduler_type: linear
+- lr_scheduler_warmup_ratio: 0.1
+- warmup_steps: 3000
+- num_steps: 30000
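The listed values are internally consistent. With gradient accumulation, each optimizer step sees 32 × 4 = 128 examples, assuming training ran on a single device. A warmup ratio of 0.1 over 30,000 steps also gives the listed 3,000 warmup steps. The sketch below re-implements a linear warmup-then-decay schedule, which is the kind of schedule `lr_scheduler_type: linear` refers to in Hugging Face `transformers`. It is an illustration of the listed settings, not the training code used for this model, and `linear_schedule_lr` is a hypothetical name.

```python
# Hyperparameters as listed in this card
LEARNING_RATE = 5e-4
TRAIN_BATCH_SIZE = 32
GRAD_ACCUM_STEPS = 4
NUM_STEPS = 30_000
WARMUP_RATIO = 0.1

# Examples per optimizer step, assuming training ran on a single device
EFFECTIVE_BATCH_SIZE = TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS
# Warmup length implied by the ratio; matches the listed warmup_steps
WARMUP_STEPS = round(WARMUP_RATIO * NUM_STEPS)


def linear_schedule_lr(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then linear decay to 0."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / max(1, WARMUP_STEPS)
    remaining = max(0, NUM_STEPS - step)
    return LEARNING_RATE * remaining / max(1, NUM_STEPS - WARMUP_STEPS)


for s in (0, 1_500, 3_000, 16_500, 30_000):
    print(s, linear_schedule_lr(s))
```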
+## Framework versions
+- Datasets 3.2.0
+- Pytorch 2.1.2
+- Tokenizers 0.20.3
+- Transformers 4.45.2
+## BibTeX entry and citation info
+```bibtex
+@ARTICLE{koudounas2024taslp,
+  author={Koudounas, Alkis and Pastor, Eliana and Attanasio, Giuseppe and Mazzia, Vittorio and Giollo, Manuel and Gueudre, Thomas and Reale, Elisa and Cagliero, Luca and Cumani, Sandro and de Alfaro, Luca and Baralis, Elena and Amberti, Daniele},
+  journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing},
+  title={Towards Comprehensive Subgroup Performance Analysis in Speech Models},
+  year={2024},
+  volume={32},
+  number={},
+  pages={1468-1480},
+  keywords={Analytical models;Task analysis;Metadata;Speech processing;Behavioral sciences;Itemsets;Speech;Speech representation;E2E-SLU models;subgroup identification;model bias analysis;divergence},
+  doi={10.1109/TASLP.2024.3363447}}
+```