----------------------------> Model Architecture <----------------------------
Baseline: The spectrum is converted with CycleGAN, and F0 is converted with a logarithm Gaussian (LG)-based linear transformation.
Proposed Method: The spectrum is converted with CycleGAN, and F0 is decomposed with the continuous wavelet transform (CWT) and then converted with CycleGAN.
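
For readers unfamiliar with the baseline's F0 step, the following is a minimal sketch of an LG-based linear transformation as it is commonly formulated: log-F0 is normalized with the source speaker's mean and standard deviation, then rescaled with the target speaker's. The function names (log_f0_stats, lg_convert_f0) are hypothetical, and the convention that unvoiced frames are stored as 0 is an assumption, not something stated on this page.

```python
import math
from statistics import fmean, pstdev


def log_f0_stats(f0):
    """Mean and standard deviation of log-F0 over voiced frames.

    Assumption: unvoiced frames are stored as 0 and are skipped.
    """
    log_f0 = [math.log(v) for v in f0 if v > 0]
    return fmean(log_f0), pstdev(log_f0)


def lg_convert_f0(f0, src_mean, src_std, tgt_mean, tgt_std):
    """Map a source F0 contour (Hz) toward the target speaker's log-F0 statistics."""
    converted = []
    for v in f0:
        if v <= 0:
            # Unvoiced frame: keep it unvoiced.
            converted.append(0.0)
        else:
            # Normalize in the log domain, then rescale to the target statistics.
            z = (math.log(v) - src_mean) / src_std
            converted.append(math.exp(z * tgt_std + tgt_mean))
    return converted


# Usage: statistics would normally come from each speaker's training data.
source_f0 = [0.0, 100.0, 200.0, 0.0, 150.0]
src_mean, src_std = log_f0_stats(source_f0)
# Illustrative target: same spread, mean shifted up by one octave.
converted_f0 = lg_convert_f0(source_f0, src_mean, src_std, src_mean + math.log(2.0), src_std)
```

Because this mapping only shifts and scales the log-F0 contour as a whole, it changes the overall pitch level and range but leaves the shape of the contour untouched, which is the limitation the CWT-based F0 conversion in the proposed method is meant to address.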
----------------------------> Speech Samples <-----------------------------
Columns: Source | Baseline | Proposed Method | Target
Rows: English-to-Mandarin, Mandarin-to-English