---
license: cc-by-nc-sa-4.0
language:
- en
- pt
- es
- zh
- nl
- fr
- de
- it
- ja
- pl
pipeline_tag: audio-to-audio
tags:
- audio
- voice
- voice conversion
- singing voice conversion
- vc
- svc
- multilingual
---

# FreeSVC: Zero-shot Multilingual Singing Voice Conversion

**FreeSVC** is a multilingual zero-shot singing voice conversion model. It converts singing voices across languages without requiring extensive language-specific training. See the [GitHub repository](https://github.com/freds0/free-svc) and the [arXiv pre-print](https://arxiv.org/abs/2501.05586).

## Supported Languages

| Language   | ID | Status     | Speech Data | Singing Data |
|------------|----|------------|-------------|--------------|
| Chinese    | 0  | ✅ Full    | 255h        | 70h          |
| Dutch      | 1  | ✅ Full    | Part of CML | -            |
| English    | 2  | ✅ Full    | 921h        | 47h          |
| French     | 3  | ✅ Full    | Part of CML | -            |
| German     | 4  | ✅ Full    | Part of CML | -            |
| Italian    | 5  | ✅ Full    | Part of CML | -            |
| Japanese   | 6  | ✅ Full    | 30h         | -            |
| Other*     | 7  | ⚠️ Partial | -           | 10h          |
| Polish     | 8  | ✅ Full    | Part of CML | -            |
| Portuguese | 9  | ✅ Full    | Part of CML | -            |
| Spanish    | 10 | ✅ Full    | Part of CML | -            |

*Note: The "Other" category covers recordings of vocal techniques that carry no linguistic content.
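
As a convenience, the ID column above can be captured in a small lookup table. The sketch below is only an illustration: the names `LANGUAGE_IDS` and `language_id` are hypothetical and are not taken from the FreeSVC code base, so check the repository for how the model actually expects the language to be passed.

```python
# Language IDs copied from the "Supported Languages" table.
# LANGUAGE_IDS and language_id are hypothetical names used for illustration;
# they are not part of the FreeSVC repository.
LANGUAGE_IDS = {
    "chinese": 0,
    "dutch": 1,
    "english": 2,
    "french": 3,
    "german": 4,
    "italian": 5,
    "japanese": 6,
    "other": 7,  # vocal techniques, see the note under the table
    "polish": 8,
    "portuguese": 9,
    "spanish": 10,
}


def language_id(name: str) -> int:
    """Return the numeric ID for a language name (case-insensitive)."""
    key = name.strip().lower()
    if key not in LANGUAGE_IDS:
        known = ", ".join(sorted(LANGUAGE_IDS))
        raise ValueError(f"Unknown language {name!r}; expected one of: {known}")
    return LANGUAGE_IDS[key]
```

The IDs follow the alphabetical order of the English language names, which is why "Other" sits between Japanese and Polish.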

## Model Overview
FreeSVC builds on an enhanced VITS architecture that integrates Speaker-invariant Clustering (SPIN) and the ECAPA2 speaker encoder. This combination separates speaker characteristics from linguistic content, which lets the model produce natural-sounding voice conversions across multiple languages.

## Training Datasets

FreeSVC was trained on a diverse set of speech and singing datasets covering multiple languages:

| **Dataset** | **Hours** | **Language**     | **Type**        |
|-------------|-----------|------------------|-----------------|
| AISHELL-1   | 170h      | Chinese          | Speech          |
| AISHELL-3   | 85h       | Chinese          | Speech          |
| CML-TTS     | 3.1k h    | 7 Languages      | Speech          |
| HiFiTTS     | 292h      | English          | Speech          |
| JVS         | 30h       | Japanese         | Speech          |
| LibriTTS-R  | 585h      | English          | Speech          |
| NUS (NHSS)  | 7h        | English          | Speech, Singing |
| OpenSinger  | 50h       | Chinese          | Singing         |
| Opencpop    | 5h        | Chinese          | Singing         |
| PopBuTFy    | 10h, 40h  | Chinese, English | Singing         |
| POPCS       | 5h        | Chinese          | Singing         |
| VCTK        | 44h       | English          | Speech          |
| VocalSet    | 10h       | Other            | Singing         |
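
The per-language totals in the Supported Languages table can be reproduced from these rows. The sketch below is a quick cross-check rather than an official accounting, and it rests on three assumptions: the PopBuTFy entry is read as 10h Chinese and 40h English (in the order listed), the 7h of NUS (NHSS) is counted entirely as singing, and CML-TTS is left out because its per-language split is not given here.

```python
from collections import defaultdict

# (dataset, hours, language, type) rows copied from the table above.
# Assumptions: PopBuTFy is split as 10h Chinese / 40h English, NUS (NHSS)
# is counted as singing only, and CML-TTS is omitted (no per-language split).
ROWS = [
    ("AISHELL-1", 170, "Chinese", "Speech"),
    ("AISHELL-3", 85, "Chinese", "Speech"),
    ("HiFiTTS", 292, "English", "Speech"),
    ("JVS", 30, "Japanese", "Speech"),
    ("LibriTTS-R", 585, "English", "Speech"),
    ("NUS (NHSS)", 7, "English", "Singing"),
    ("OpenSinger", 50, "Chinese", "Singing"),
    ("Opencpop", 5, "Chinese", "Singing"),
    ("PopBuTFy", 10, "Chinese", "Singing"),
    ("PopBuTFy", 40, "English", "Singing"),
    ("POPCS", 5, "Chinese", "Singing"),
    ("VCTK", 44, "English", "Speech"),
    ("VocalSet", 10, "Other", "Singing"),
]


def hours_by_language_and_type(rows):
    """Sum hours for each (language, type) pair."""
    totals = defaultdict(int)
    for _dataset, hours, language, kind in rows:
        totals[(language, kind)] += hours
    return dict(totals)


totals = hours_by_language_and_type(ROWS)
for (language, kind), hours in sorted(totals.items()):
    print(f"{language:<9} {kind:<8} {hours}h")
```

Under these assumptions the sums match the Supported Languages table: 255h of speech and 70h of singing for Chinese, 921h of speech and 47h of singing for English, 30h of speech for Japanese, and 10h of singing for "Other".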

## License

FreeSVC is released under the **Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)** license. This means:

- The model **can only be used for research and non-commercial purposes**. Any commercial use is strictly prohibited.
- Any derivative works must be **shared under the same license**.
- Proper attribution must be given when using the model.

Users must also **comply with the licenses of the original datasets** used for training. Some datasets may impose restrictions beyond those of CC BY-NC-SA 4.0, so review and adhere to their terms before using the model.

For full details, refer to the [CC BY-NC-SA 4.0 License](https://creativecommons.org/licenses/by-nc-sa/4.0/).

## Citation
```
@misc{ferreira2025freesvczeroshotmultilingualsinging,
      title={FreeSVC: Towards Zero-shot Multilingual Singing Voice Conversion}, 
      author={Alef Iury Siqueira Ferreira and Lucas Rafael Gris and Augusto Seben da Rosa and Frederico Santos de Oliveira and Edresson Casanova and Rafael Teixeira Sousa and Arnaldo Candido Junior and Anderson da Silva Soares and Arlindo Galvão Filho},
      year={2025},
      eprint={2501.05586},
      archivePrefix={arXiv},
      primaryClass={cs.SD},
      url={https://arxiv.org/abs/2501.05586}, 
}
```