Automatic Speech Recognition · ESPnet · multilingual · audio · speech-translation · language-identification
Committed by pyf98 · Commit f880d37 (verified) · Parent: e4d3e3d

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -11,8 +11,8 @@ datasets:
 license: cc-by-4.0
 ---
 
-[OWSM-CTC](https://arxiv.org/abs/2402.12654) is an encoder-only speech foundation model based on hierarchical multi-task self-conditioned CTC.
-It is trained on 180k hours of public audio data for multilingual speech recognition, any-to-any speech translation, and language identification, which follows the design of the previous [encoder-decoder OWSM](https://arxiv.org/abs/2401.16658).
+[OWSM-CTC](https://aclanthology.org/2024.acl-long.549/) (Peng et al., ACL 2024) is an encoder-only speech foundation model based on hierarchical multi-task self-conditioned CTC.
+It is trained on 180k hours of public audio data for multilingual speech recognition, any-to-any speech translation, and language identification, which follows the design of the project, [Open Whisper-style Speech Model (OWSM)](https://arxiv.org/abs/2401.16658).
 
 Due to time constraint, the model used in the paper was trained for 40 "epochs". The new model trained for 45 "epochs" (approximately three entire passes on the full data) is also added in this repo in order to match the setup of encoder-decoder OWSM. It can have better performance than the old one in many test sets.
 
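The hunk header `@@ -11,8 +11,8 @@` says the change starts at README line 11 and covers 8 lines on both the old and new side. Six of those lines are unchanged context (the license line, the `---` front-matter delimiter, the "Due to time constraint" paragraph, and three blank lines), which leaves 2 removed and 2 added lines. That matches the `README.md +2 -2` summary. Below is a minimal Python sketch of that bookkeeping, assuming standard unified-diff conventions (an omitted count defaults to 1). `tally_hunk` and `HUNK_LINES` are hypothetical names, and the long README lines are abbreviated with "..." so the sample stays short.

```python
import re
from typing import Dict, List

# Matches "@@ -old_start[,old_len] +new_start[,new_len] @@"; an omitted length means 1.
HEADER_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")

# Hypothetical sample: the hunk from this commit, with long README lines abbreviated.
HUNK_LINES: List[str] = [
    "@@ -11,8 +11,8 @@ datasets:",
    " license: cc-by-4.0",
    " ---",
    " ",
    "-[OWSM-CTC](https://arxiv.org/abs/2402.12654) is an encoder-only ...",
    "-It is trained on 180k hours of public audio data ...",
    "+[OWSM-CTC](https://aclanthology.org/2024.acl-long.549/) (Peng et al., ACL 2024) is an encoder-only ...",
    "+It is trained on 180k hours of public audio data ...",
    " ",
    " Due to time constraint, ...",
    " ",
]


def tally_hunk(lines: List[str]) -> Dict[str, int]:
    """Count added, removed, and context lines and check them against the header."""
    match = HEADER_RE.match(lines[0])
    if match is None:
        raise ValueError("first line is not a unified-diff hunk header")
    old_start, old_len = int(match.group(1)), int(match.group(2) or 1)
    new_start, new_len = int(match.group(3)), int(match.group(4) or 1)

    added = removed = context = 0
    for line in lines[1:]:
        tag = line[:1]
        if tag == "+":
            added += 1
        elif tag == "-":
            removed += 1
        elif tag == " ":
            context += 1
        elif tag == "\\":
            continue  # e.g. "\ No newline at end of file" is not a content line
        else:
            raise ValueError(f"unexpected hunk line: {line!r}")

    # The old side is context + removed lines; the new side is context + added lines.
    if context + removed != old_len or context + added != new_len:
        raise ValueError("hunk body does not match the header line counts")
    return {
        "old_start": old_start,
        "new_start": new_start,
        "added": added,
        "removed": removed,
        "context": context,
    }


if __name__ == "__main__":
    stats = tally_hunk(HUNK_LINES)
    print(f"README.md +{stats['added']} -{stats['removed']}")  # prints "README.md +2 -2"
```

Checking the header counts against the body catches hunks that were truncated or mangled in transit, which is the same consistency a patch tool relies on before applying a change.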