LSTM-Based Robust Voicing Decision Applied to DNN-Based Speech Synthesis



Abstract

The quality of statistical parametric speech synthesis (SPSS) depends on accurate voiced/unvoiced classification. Errors in the voicing decision can significantly degrade speech quality. This paper proposes a robust voicing detection method for SPSS based on the power spectrum and a long short-term memory (LSTM) network. The performance of the proposed method is evaluated on the CMU Arctic, Keele, and MIR-1K databases. Further, the effectiveness of the proposed method is analyzed for deep neural network (DNN)-based SPSS. The results show that the proposed method classifies voiced and unvoiced speech segments more accurately, which significantly improves speech quality.
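
As an illustration of the kind of input such a detector could use, the following is a minimal sketch of frame-wise power spectrum extraction. It is not the authors' implementation: the abstract does not specify the frame length, hop size, window, or normalization, so the function names, the Hann window, and the default values below are all assumptions chosen for the example. The resulting per-frame feature vectors are the sort of sequence that could be passed to a sequence classifier such as an LSTM.

```python
import cmath
import math


def frame_signal(samples, frame_len, hop):
    """Split samples into overlapping frames; a trailing partial frame is dropped."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def power_spectrum(frame):
    """Return |DFT|^2 / N for bins 0..N/2 of a Hann-windowed frame.

    Uses a naive O(N^2) DFT to stay dependency-free; assumes len(frame) >= 2.
    The window and the 1/N scaling are illustrative assumptions.
    """
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * k / (n - 1)))
                for k, x in enumerate(frame)]
    spectrum = []
    for b in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * b * k / n)
                  for k, x in enumerate(windowed))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def power_spectrum_features(samples, frame_len=64, hop=32):
    """Return one power-spectrum vector per frame (hypothetical defaults)."""
    return [power_spectrum(f) for f in frame_signal(samples, frame_len, hop)]


if __name__ == "__main__":
    # Synthetic 1 kHz tone sampled at 8 kHz: strongly periodic, like voiced speech.
    tone = [math.sin(2 * math.pi * 1000 * t / 8000) for t in range(256)]
    feats = power_spectrum_features(tone)
    print(len(feats), "frames,", len(feats[0]), "bins per frame")
```

In practice a fast FFT routine would replace the naive DFT; the pure-Python version is used here only so the sketch runs without external libraries.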

About the authors

R. Pradeep

Advanced Technology Development Center

Author for correspondence.
Email: rpradeep@iitkgp.ac.in
India, IIT Kharagpur, 721302

M. Kiran Reddy

Department of Computer Science and Engineering

Email: rpradeep@iitkgp.ac.in
India, IIT Kharagpur, 721302

K. Sreenivasa Rao

Department of Computer Science and Engineering

Email: rpradeep@iitkgp.ac.in
India, IIT Kharagpur, 721302

Copyright (c) 2019 Allerton Press, Inc.