----------------------------> Model Architecture <-----------------------



Fig. 1 The training phase of the proposed CycleGAN-based cross-lingual VC framework. The blue box represents the cross-lingual VC model for prosody conversion, called F0-CycleGAN, and the yellow box represents the cross-lingual VC model for spectrum conversion, called MCEP-CycleGAN.


Fig. 2 The run-time conversion phase of the proposed CycleGAN-based cross-lingual VC framework. The colored boxes represent the models trained in Fig. 1.

Experimental Setup:

Baseline: The spectrum is converted with CycleGAN, and F0 is converted with a logarithm Gaussian (LG)-based linear transformation.
Proposed Method: The spectrum is converted with CycleGAN, and F0 is decomposed with the continuous wavelet transform (CWT) and then converted with CycleGAN.
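
For readers unfamiliar with the baseline's F0 conversion, the LG-based linear transformation is commonly written as a mean and variance normalization in the log-F0 domain. The snippet below is a minimal sketch of that common formulation, not the code used in these experiments. The function names `log_f0_stats` and `lg_f0_conversion`, and the convention that unvoiced frames carry F0 = 0, are assumptions made for illustration.

```python
import numpy as np


def log_f0_stats(f0):
    """Return (mean, std) of log-F0 over voiced frames (assumed: F0 > 0)."""
    f0 = np.asarray(f0, dtype=np.float64)
    log_f0 = np.log(f0[f0 > 0])
    return log_f0.mean(), log_f0.std()


def lg_f0_conversion(f0_src, src_stats, tgt_stats):
    """Map a source F0 contour (Hz) to the target speaker's log-F0 statistics.

    Voiced frames are normalized by the source mean/std in the log domain,
    rescaled to the target mean/std, and mapped back to Hz.
    Unvoiced frames (assumed to be marked with 0) are left at 0.
    """
    f0_src = np.asarray(f0_src, dtype=np.float64)
    mu_s, sigma_s = src_stats
    mu_t, sigma_t = tgt_stats

    voiced = f0_src > 0
    f0_conv = np.zeros_like(f0_src)
    log_f0 = np.log(f0_src[voiced])
    f0_conv[voiced] = np.exp((log_f0 - mu_s) / sigma_s * sigma_t + mu_t)
    return f0_conv


# Toy contours (hypothetical values, Hz); 0 marks an unvoiced frame.
src_f0 = np.array([0.0, 100.0, 200.0, 0.0, 150.0])
tgt_f0 = np.array([200.0, 0.0, 300.0, 400.0])

converted = lg_f0_conversion(src_f0, log_f0_stats(src_f0), log_f0_stats(tgt_f0))
```

In practice the statistics would be estimated from each speaker's training utterances rather than from a single contour. Because this mapping only shifts and scales log-F0 globally, it cannot change the shape of the contour, which is the limitation that motivates converting CWT-decomposed F0 with CycleGAN in the proposed method.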

----------------------------> Speech Samples <-----------------------------



Columns: Source | Baseline | Proposed Method | Target
Rows: English-to-Mandarin; Mandarin-to-English