Spectrum and Prosody Conversion for Cross-lingual Voice Conversion with CycleGAN

----------------------------> Model Architecture <-----------------------

Fig. 1 The training phase of the proposed CycleGAN-based cross-lingual VC framework. The blue box represents the cross-lingual VC model for prosody conversion, called F0-CycleGAN, and the yellow box represents the cross-lingual VC model for spectrum conversion, called MCEP-CycleGAN.

Fig. 2 The run-time conversion phase of the proposed CycleGAN-based cross-lingual VC framework. The colored boxes represent the models trained in Fig. 1.

Experimental Setup:

Baseline: The spectrum is converted with CycleGAN, and F0 is converted with a logarithm Gaussian (LG) normalized linear transformation.
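
For readers unfamiliar with the LG-based F0 transformation used in the baseline, the following is a minimal sketch of the usual form: the log-F0 of each voiced frame is normalized with the source speaker's log-F0 mean and standard deviation and rescaled with the target speaker's. The function names, the convention that unvoiced frames are marked with F0 = 0, and the use of NumPy are assumptions for illustration and are not taken from the original implementation.

```python
import numpy as np


def log_f0_stats(f0):
    """Return (mean, std) of log-F0 over voiced frames (F0 > 0 is assumed to mean voiced)."""
    f0 = np.asarray(f0, dtype=np.float64)
    log_f0 = np.log(f0[f0 > 0])
    return log_f0.mean(), log_f0.std()


def lg_f0_conversion(f0_src, mean_src, std_src, mean_tgt, std_tgt):
    """Linearly transform log-F0 from source statistics to target statistics.

    Unvoiced frames (F0 == 0) are left at 0.
    """
    f0_src = np.asarray(f0_src, dtype=np.float64)
    f0_conv = np.zeros_like(f0_src)
    voiced = f0_src > 0
    log_f0 = np.log(f0_src[voiced])
    # Normalize with source statistics, then rescale with target statistics.
    f0_conv[voiced] = np.exp((log_f0 - mean_src) / std_src * std_tgt + mean_tgt)
    return f0_conv


if __name__ == "__main__":
    # Toy contours in Hz; the values are made up for illustration only.
    src = np.array([0.0, 110.0, 120.0, 0.0, 130.0])
    tgt = np.array([210.0, 0.0, 230.0, 250.0, 0.0])
    m_s, s_s = log_f0_stats(src)
    m_t, s_t = log_f0_stats(tgt)
    print(lg_f0_conversion(src, m_s, s_s, m_t, s_t))
```

Because the transformation only shifts and scales log-F0 globally, it changes the pitch level and range but leaves the shape of the contour unchanged.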

Proposed Method: The spectrum is converted with CycleGAN, and F0 is decomposed with the continuous wavelet transform (CWT) and then converted with CycleGAN.
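
As a rough illustration of the CWT decomposition step, the sketch below interpolates F0 over unvoiced frames, normalizes log-F0 to zero mean and unit variance, and convolves it with a Mexican hat wavelet at dyadic scales. The choice of mother wavelet, the ten scales, the base scale, and all function names are assumptions made for this example; the page does not state the settings actually used.

```python
import numpy as np


def mexican_hat(t):
    """Mexican hat (Ricker) mother wavelet, normalized to unit energy."""
    norm = 2.0 / (np.sqrt(3.0) * np.pi ** 0.25)
    return norm * (1.0 - t ** 2) * np.exp(-0.5 * t ** 2)


def interpolate_unvoiced(f0):
    """Linearly interpolate F0 across unvoiced frames (assumed to be marked with 0)."""
    f0 = np.asarray(f0, dtype=np.float64)
    idx = np.arange(len(f0))
    voiced = f0 > 0
    return np.interp(idx, idx[voiced], f0[voiced])


def cwt_decompose(signal, num_scales=10, base_scale=1.0):
    """Return an array of shape (num_scales, len(signal)) of CWT coefficients.

    Scales are one octave apart: base_scale * 2**i, measured in frames.
    """
    signal = np.asarray(signal, dtype=np.float64)
    n = len(signal)
    coeffs = np.zeros((num_scales, n))
    for i in range(num_scales):
        scale = base_scale * 2.0 ** i
        half = int(np.ceil(5.0 * scale))  # the wavelet is negligible beyond 5 scales
        t = np.arange(-half, half + 1) / scale
        kernel = mexican_hat(t) / np.sqrt(scale)
        # Full convolution, then crop the centre so the output aligns with the input.
        full = np.convolve(signal, kernel, mode="full")
        coeffs[i] = full[half:half + n]
    return coeffs


if __name__ == "__main__":
    # Synthetic contour in Hz with unvoiced gaps; values are for illustration only.
    frames = np.arange(200)
    f0 = 120.0 + 20.0 * np.sin(2.0 * np.pi * frames / 50.0)
    f0[60:80] = 0.0
    log_f0 = np.log(interpolate_unvoiced(f0))
    log_f0_norm = (log_f0 - log_f0.mean()) / log_f0.std()
    print(cwt_decompose(log_f0_norm).shape)
```

Small scales capture fast, local pitch movements, while large scales capture slower, phrase-level trends, which is why a multi-scale representation can be a convenient input for prosody conversion.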

----------------------------> Speech Samples <-----------------------------

	Source	Baseline	Proposed Method	Target
English-to-Mandarin
Mandarin-to-English