Identification of speaker gender by voice characteristics against a background of multi-talker noise

Abstract

Psychophysical methods were used to study identification of speaker gender from voice characteristics under speech-like interference with stimulation through headphones. The stimuli were a set of speech signals and multi-talker noise taken from free-sound-field experiments with a spatial scene (Andreeva et al., 2019). The set comprised 8 disyllabic words spoken by 4 speakers: 2 male and 2 female voices with mean fundamental frequencies of 117, 139, 208 and 234 Hz. The multi-talker noise was produced by mixing all audio files (8 words × 4 speakers). The signal-to-noise ratio was 1:1, which subjectively corresponded to the maximum noise level of the spatial scene (SNR = –14 dB). Adult subjects aged 17 to 57 years (n = 42) took part in the experiments; three age subgroups were distinguished: 18.6 ± 1.5 years (n = 27), 28 ± 4.1 years (n = 7) and 46 ± 5.4 years (n = 8). All subjects had normal hearing. The results and their comparison with the data of the cited work confirmed the importance of voice characteristics for the auditory analysis of complex spatial (free sound field) and non-spatial (headphone) scenes, and also demonstrated the role of masking and binaural-perception mechanisms, in particular the high-frequency mechanism of spatial hearing. A relation was also found between the perceptual assessment of speaker gender by voice in noise, the age of the subjects, and the gender of the speakers (male/female voice). The results are of practical importance for organizing auditory-speech training, for early detection of impaired noise immunity of speech hearing, and for the development of noise-robust systems for automatic speaker verification and hearing aid technologies.
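One way to see why an equal-level mix of a single word against the sum of all 32 recordings lands in the vicinity of the reported –14 dB: the power of a sum of N independent, equal-power signals grows roughly as N. The sketch below illustrates this scaling with synthetic Gaussian tokens standing in for the recordings; it is an illustration of the principle, not the authors' stimulus-preparation procedure.

```python
# Illustrative sketch (not the study's code): SNR of one talker against
# a babble built by summing many independent, equal-RMS recordings.
import math
import random

random.seed(0)
N_TOKENS = 32   # 8 words x 4 speakers, as in the stimulus set
SAMPLES = 20000

# Independent unit-RMS "recordings" (Gaussian noise stand-ins)
tokens = [[random.gauss(0.0, 1.0) for _ in range(SAMPLES)]
          for _ in range(N_TOKENS)]

# Multi-talker noise: sample-wise sum of all tokens
noise = [sum(vals) for vals in zip(*tokens)]

def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))

# SNR of a single token relative to the combined babble
snr_db = 20.0 * math.log10(rms(tokens[0]) / rms(noise))
print(f"SNR of one talker vs {N_TOKENS}-talker mix: {snr_db:.1f} dB")
```

Since the summed power of N independent unit-power signals is about N, the expected value is 10·log10(1/32) ≈ –15 dB, the same order as the –14 dB cited for the spatial-scene experiments.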


About the authors

O. Labutina

Pavlov Institute of Physiology of the Russian Academy of Sciences

Email: ogorodnikovaea@infran.ru
Russian Federation, Saint Petersburg

S. Pak

Pavlov Institute of Physiology of the Russian Academy of Sciences

Email: ogorodnikovaea@infran.ru
Russian Federation, Saint Petersburg

E. Ogorodnikova

Pavlov Institute of Physiology of the Russian Academy of Sciences

Author for correspondence.
Email: ogorodnikovaea@infran.ru
Russian Federation, Saint Petersburg

References

  1. Balyakova A.A., Labutina O.V., Medvedev I.S., Pak S.P., Ogorodnikova Ye.A. Osobennosti raspoznavaniya rechevykh signalov v usloviyakh golosovoy konkurentsii v norme i pri narusheniyakh slukhorechevoy funktsii [Features of speech signal recognition in conditions of vocal competition with normal hearing and with hearing or speech disorders]. Sensornyye sistemy. 2023. V. 37. № 4. P. 342–347. doi: 10.31857/S0235009223040029.
  2. Koroleva I.V. Osnovy audiologii i slukhoprotezirovaniya. [Fundamentals of audiology and hearing aid]. St. Petersburg: KARO, 2022. 448 p. (in Russian).
  3. Koroleva I.V., Ogorodnikova E.A., Pak S.P., Levin S.V., Baliakova A.A., Shaporova A.V. Metodicheskiye podkhody k otsenke dinamiki razvitiya protsessov slukhorechevogo vospriyatiya u detey s kokhlearnymi implantami. [Methodological approaches to assessing the dynamics of the development of hearing and speech perception processes in children with cochlear implants] Russian Otorhinolaryngology. 2013. № 3. P. 75–85. (in Russian).
  4. Lopotko A.I., Berdnikova I.P., Boboshko M.Yu., Zhuravleva T.A., Zhuravskiy S.G., Kvasova T.V., Lomovatskaya L.G., Mal’tseva N.V., Molchanov A.P., Ryndina A.M., Savenko I.V., Slesarenko N.P., Soldatova G.Sh. Prakticheskoye rukovodstvo po surdologii [A practical guide to audiology]. St. Petersburg: Dialog, 2008. 273 p. (in Russian).
  5. Lyashevskaya O.N., Sharov S.A. Chastotnyy slovar’ sovremennogo russkogo yazyka (na materialakh Natsional’nogo korpusa russkogo yazyka) [Frequency dictionary of the modern Russian language (based on materials from the National Corpus of the Russian Language)]. Moscow: Azbukovnik, 2009. 1090 p. (in Russian).
  6. Ogorodnikova Ye.A., Labutina O.V., Andreyeva I.G., Gvozdeva A.P., Baulin Yu.A. Faktor prosodiki v vospriyatii kommunikativnoy stseny s prostranstvennym razdeleniyem istochnikov rechi i rechepodobnoy pomekhi [Prosody factor in the perception of a communicative scene with spatial separation of speech sources and speech-like interference]. Tezisy dokladov Mezhdunarodnoy konferentsii “Lingvisticheskiy forum 2020: Yazyk i iskusstvennyy intellekt” / Pod red. A.A. Kibrika, V. Yu. Guseva, D.A. Zalmanova. Moscow: Institut yazykoznaniya RAN, 2020. P. 127–128. (in Russian).
  7. Sapogova Ye.Ye. Psikhologiya razvitiya cheloveka [Psychology of human development]. M.: Aspekt press. 2001. 460 p. (in Russian).
  8. Khukhlayeva O.V. Psikhologiya razvitiya. Molodost’, zrelost’, starost’ [Developmental psychology. Youth, maturity, old age]. Moscow: Akademiya, 2006. 208 p. (in Russian).
  9. Andreeva I.G. Spatial selectivity of hearing in speech recognition in speech-shaped noise environment. Hum. Physiol. 2018. V. 44(2). P. 226–236. https://doi.org/10.1134/S0362119718020020
  10. Andreeva I.G., Dymnikowa M., Gvozdeva A.P., Ogorodnikova E.A., Pak S.P. Spatial separation benefit for speech detection in multi-talker babble-noise with different egocentric distances. Acta Acustica united with Acustica. 2019. V. 105. № 3. P. 484–491. https://doi.org/10.3813/AAA.919330
  11. Balling L.W., Mølgaard L.L., Townend O., Nielsen J.B.B. The collaboration between hearing aid users and artificial intelligence to optimize sound. Seminars in Hearing. 2021. № 42(3). P. 282–294. https://doi.org/10.1055/s-0041-1735135
  12. Bharathi R., Nalina H.D. Survey of Recent Advances in Hearing Aid Technologies and Trends. International Research Journal on Advanced Engineering Hub. 2024. V. 2. I. 2. P. 303–308. https://doi.org/10.47392/IRJAEH.2024.0046
  13. Bregman A.S. Auditory scene analysis: the perceptual organization of sound. Cambridge: MIT Press, 1990.
  14. Bronkhorst A.W. The cocktail-party problem revisited: Early processing and selection of multi-talker speech. Attention, Perception & Psychophysics. 2015. V. 77(5). P. 1465–1487. https://doi.org/10.3758/s13414-015-0882-9.
  15. Cherry E.C. Some experiments on the recognition of speech, with one and with two ears. J. Acoust. Soc. Am. 1953. V. 25. № 5. P. 975.
  16. Darwin C.J., Brungart D.S., Simpson B.D. Effects of fundamental frequency and vocal-tract length changes on attention to one or two simultaneous talkers. J. Acoust. Soc. Am. 2003. V. 114. P. 2913–2922.
  17. Davis A., McMahon C.M., Pichora-Fuller K.M., Russ S., Lin F., Olusanya B.O., Chadha S., Tremblay K.L. Aging and Hearing Health: The Life-course Approach. Gerontologist. 2016. № 56 (Suppl 2). Р. 256–267. https://doi.org/10.1093/geront/gnw033.
  18. Fostick L., Ben-Artzi E., Babkoff H. Aging and speech perception: beyond hearing threshold and cognitive ability. J. Basic Clin Physiol Pharmacol. 2013. № 24(3). Р. 175–183. https://doi.org/10.1515/jbcpp-2013-0048.
  19. Gutschalk A., Dykstra A.R. Functional imaging of auditory scene analysis. Hear. Res. 2014. V. 307. P. 98.
  20. Lesica N.A., Mehta N., Manjaly J.G., Deng L., Wilson B.S., Zeng F.-G. Harnessing the power of artificial intelligence to transform hearing healthcare and research. Nat. Mach. Intell. 2021. № 3. Р. 840–849. https://doi.org/10.1038/s42256-021-00394-z
  21. Moore B.C.J. An Introduction to the Psychology of Hearing. Leiden. Brill., 2012. 442 p.
  22. Musiek F.E., Chermak G.D. Handbook of central auditory processing disorder. San Diego. Plural Publishing, 2014. V. 1. Auditory neuroscience and diagnosis. 768 p.
  23. Pernet C.R., Belin P. The Role of Pitch and Timbre in Voice Gender Categorization. Front. Psychol. 2012. Sec. Perception Science. V. 3. https://doi.org/10.3389/fpsyg.2012.00023
  24. Popper A.N., Fay R.R. (Eds). Perspectives on auditory research. Springer handbook of auditory research. 2014. 680 p.
  25. Shamma S.A., Elhilali M., Micheyl C. Temporal coherence and attention in auditory scene analysis. Trends Neurosci. 2011. V. 34. P. 114.
  26. Smirnova V.A., Labutina O.V., Gvozdeva A.P. Chapter 9: Speech detection in spatially distributed speech-like noise. In: Neural Networks and Neurotechnologies (eds: Yu. Shelepin, E. Ogorodnikova, N. Solovyev, E. Yakimova). St. Petersburg, VVM, 2019. P. 52–60.
  27. Weston P., Hunter M.D., Sokhi D.S., Wilkinson I. Discrimination of voice gender in the human auditory cortex. NeuroImage. 2014. V. 105. P. 208–214. https://doi.org/10.1016/j.neuroimage.2014.10.056

Supplementary files

1. JATS XML
2. Fig. 1. Distribution of errors in determining voice gender in multi-talker noise across speakers. Horizontal axis: the conditional series of PDO values for the speakers' voices (male M1, M2 and female Zh1, Zh2) and the multi-talker noise. Vertical axis: percentage of errors in determining the speaker's gender. *, **: significance of differences at p < 0.05 and p < 0.01, respectively (Wilcoxon test)

3. Fig. 2. Percentage of correct responses for the same set of speech stimuli and multi-talker noise under different conditions. Non-spatial scene (NS): identification of speaker gender by voice with stimulation through headphones. Spatial scenes: detection of the speech signal in a free sound field with the speech and noise sources located 1 m from the listener (Sh1P1), and with the sources separated in distance from the listener: noise source at 1 m, speech source at 4 m (Sh1P4). **: significance of differences, p < 0.01 (Mann-Whitney test)

4. Fig. 3. Distributions of correct responses (%) for speakers with different voice characteristics (PSO). NS (non-spatial scene): data from the study of speaker gender identification by voice in multi-talker noise. SAR (spatial scene): results of speech signal detection in the spatial scene with separated speech and noise sources (Sh1P4) at maximum noise (SNR = –14 dB). **: significance of differences in the perception of female voices (p < 0.01, Mann-Whitney test)

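The figure captions report Mann-Whitney comparisons of correct-response rates between independent conditions. A minimal stdlib sketch of that test (two-sided, normal approximation, no tie correction; the sample values below are hypothetical percentages, not the study's measurements):

```python
# Sketch of a two-sided Mann-Whitney U test (normal approximation),
# of the kind named in the figure captions. Illustrative data only.
import math
from statistics import NormalDist

def _ranks(values):
    """1-based ranks with ties assigned their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def mann_whitney_u(a, b):
    """Return (U, two-sided p) using the normal approximation."""
    n1, n2 = len(a), len(b)
    ranks = _ranks(list(a) + list(b))
    r1 = sum(ranks[:n1])                  # rank sum of group a
    u1 = r1 - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)             # smaller of the two U statistics
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma                  # u <= mu, so z <= 0
    return u, min(2 * NormalDist().cdf(z), 1.0)

# Hypothetical correct-response rates (%) for two independent conditions
ns_scene = [70, 72, 68, 75, 71, 69]       # e.g. non-spatial scene
sp_scene = [55, 58, 52, 60, 57, 54]       # e.g. spatial scene
u, p = mann_whitney_u(ns_scene, sp_scene)
print(f"U = {u}, p = {p:.4f}")
```

With fully separated samples like these, U = 0 and p falls below 0.01, the significance level marked ** in the captions.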

版权所有 © Russian Academy of Sciences, 2024