Mingjie Chen, University of Sheffield
Introduction
This thesis studies disentanglement learning methods for Voice Conversion (VC). This demo page presents audio demos from three experiments in the thesis. The first experiment studies the VQ-WAE, IN-WAE, WAE and SVQ-WAE models on a many-to-many VC task on the VCTK dataset. The second experiment compares the robustness of WAGAN-VC and baseline models under different training data conditions. The third experiment explores four types of systems, each composed of a different linguistic encoder and decoder, on three VC tasks. Details of the experimental setup are given in each experiment's part.
In this experiment, we provide audio samples from two proposed models (IN-WAE, SVQ-WAE) and two baseline models (WAE and VQ-WAE).
Source | Target | VQ-WAE | WAE | IN-WAE | SVQ-WAE |
---|---|---|---|---|---|
In this experiment, three models (StarGAN-VC, StarGAN-VC2 and WAGAN-VC) are studied in two sessions. Session 1 explores six training data settings that vary the number of speakers (N) and the number of training samples per speaker (M) while keeping the total number of training samples roughly constant. Session 2 explores four training data settings with a fixed number of speakers and a decreasing number of training samples per speaker.
N | M | Source | Target | StarGAN-VC | StarGAN-VC2 | WAGAN-VC |
---|---|---|---|---|---|---|
109 | 35 | |||||
90 | 40 | |||||
60 | 60 | |||||
40 | 90 | |||||
20 | 180 | |||||
10 | 360 | |||||
N | M | Source | Target | StarGAN-VC | StarGAN-VC2 | WAGAN-VC |
---|---|---|---|---|---|---|
109 | 35 | |||||
109 | 20 | |||||
109 | 10 | |||||
109 | 5 | |||||
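The two sessions differ in how the overall amount of training data behaves, which can be checked with a little arithmetic. The following is a minimal sketch in Python, not code from the thesis: the (N, M) pairs are copied from the two tables above, the names `SESSION_1`, `SESSION_2` and `total_samples` are illustrative, and the total is assumed to be N × M, i.e. every speaker contributes exactly M samples.

```python
# (N, M) pairs copied from the Session 1 and Session 2 tables.
# N = number of speakers, M = training samples per speaker.
SESSION_1 = [(109, 35), (90, 40), (60, 60), (40, 90), (20, 180), (10, 360)]
SESSION_2 = [(109, 35), (109, 20), (109, 10), (109, 5)]


def total_samples(setups):
    """Return N * M for each (N, M) pair.

    Assumption: every speaker contributes exactly M training samples,
    so the total training set size is the product N * M.
    """
    return [n * m for n, m in setups]


if __name__ == "__main__":
    for name, setups in (("Session 1", SESSION_1), ("Session 2", SESSION_2)):
        for (n, m), total in zip(setups, total_samples(setups)):
            print(f"{name}: N={n:3d}, M={m:3d}, N*M={total}")
```

Under this assumption, the Session 1 totals stay close to one another as N shrinks and M grows, while the Session 2 totals fall in proportion to M because N is held at 109.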
In this section, four encoder-decoder VC systems are compared on three VC tasks. We first present the four VC systems, which combine different linguistic encoders and decoders. We then present audio demos on the three VC tasks: a many-to-many VC task on VCTK, an intra-lingual one-shot VC task on VCC2020 and a cross-lingual one-shot VC task on VCC2020.
System index | Linguistic encoder | Decoder |
---|---|---|
Sys-1 | VQ-Wav2vec | Taco-AR |
Sys-2 | VQ-Wav2vec | FastSpeech |
Sys-3 | ASR-BNE | Taco-AR |
Sys-4 | ASR-BNE | FastSpeech |
Source | Target | Sys-1 | Sys-2 | Sys-3 | Sys-4 |
---|---|---|---|---|---|
Source | Target | Sys-1 | Sys-2 | Sys-3 | Sys-4 |
---|---|---|---|---|---|
Source | Target | Sys-1 | Sys-2 | Sys-3 | Sys-4 |
---|---|---|---|---|---|