Method for measuring voice source parameters for linear predictive speech coding systems
- Authors: Savchenko V. V., Savchenko L. V.
- Affiliations: National Research University Higher School of Economics
- Issue: Vol 74, No 6 (2025)
- Pages: 74-84
- Section: ACOUSTIC MEASUREMENTS
- URL: https://journals.rcsi.science/0368-1025/article/view/380356
- ID: 380356
Abstract
Within a current line of research in acoustic measurements, the non-invasive analysis of the voice source, the paper considers the problem of measuring the excitation parameters of a linear prediction vocoder. Known solutions based on the analysis-by-synthesis technique suffer from high computational complexity. To overcome this problem, a fast acoustic measurement method has been developed based on the criterion of the minimum sample mean value of the linear prediction error. It is shown that this criterion implements the principle of minimizing the energy the speaker expends on speech production. An example of a technical implementation of the method is considered, and estimates of its computational complexity are given. It is shown that, compared with the well-known multi-pulse excitation method for a linear prediction vocoder that uses two codebooks (adaptive and stochastic), the implementation costs of the proposed method are lower by several orders of magnitude. To confirm this conclusion, a full-scale experiment was conducted with the authors' software on a set of vowel phonemes from a control speaker. It is shown that optimizing the shape of the excitation signal significantly reduces the sample mean value of the linear prediction error. The results can be useful in developing new and upgrading existing systems and technologies for speech coding and synthesis, mobile speech communication, and other applications of digital speech signal processing with data compression based on the linear prediction model.
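The quantity the abstract's criterion is built on, the linear prediction error of a speech frame, can be illustrated with a minimal sketch. This is not the authors' method: it only fits a linear predictor with the standard autocorrelation method (Levinson-Durbin recursion) and reports the mean square of the resulting error. The toy signal, the function names, the filter coefficients, and the reading of "sample mean of the error" as a mean squared value are all assumptions made for illustration.

```python
def synth_voiced(n_samples=800, period=80, phi=(1.3, -0.6)):
    """Toy 'voiced' frame (assumption): unit pulse train through an all-pole filter."""
    x = []
    for n in range(n_samples):
        excitation = 1.0 if n % period == 0 else 0.0
        past = sum(phi[k] * x[n - 1 - k] for k in range(len(phi)) if n - 1 - k >= 0)
        x.append(excitation + past)
    return x


def autocorr(x, max_lag):
    """Biased (unnormalized) autocorrelation r[0..max_lag] of the frame."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the normal equations for x_hat[n] = sum_k a[k] * x[n-k].

    Returns (coefficients a[1..order], final prediction error energy).
    """
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err                      # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err


def prediction_error(x, coeffs):
    """Residual e[n] = x[n] - sum_k coeffs[k] * x[n-1-k], for n >= len(coeffs)."""
    p = len(coeffs)
    return [x[n] - sum(coeffs[k] * x[n - 1 - k] for k in range(p))
            for n in range(p, len(x))]


def mean_square(values):
    """Sample mean of the squared values."""
    return sum(v * v for v in values) / len(values)


if __name__ == "__main__":
    x = synth_voiced()
    coeffs, _ = levinson_durbin(autocorr(x, 2), 2)
    e = prediction_error(x, coeffs)
    print("LP coefficients:", [round(c, 3) for c in coeffs])
    print("mean square, signal:  ", mean_square(x))
    print("mean square, residual:", mean_square(e))
```

In this toy setting the residual is close to the pulse train that drove the filter, so its mean square is well below that of the signal. A method of the kind described in the abstract would, presumably, compare candidate excitation shapes by such an error statistic rather than resynthesize speech for each candidate.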
About the authors
V. V. Savchenko
National Research University Higher School of Economics
Email: vvsavchenko@yandex.ru
ORCID iD: 0000-0003-3045-3337
L. V. Savchenko
National Research University Higher School of Economics
Email: vvsavchenko@yandex.ru
ORCID iD: 0000-0002-2776-5471