Using convolutional neural networks for acoustic-based emergency vehicle detection

Andrey A. Lisov; Лисов Андрей Анатольевич; Askar Z. Kulganatov; Кулганатов Аскар Зайдакбаевич; Sergei A. Panishev; Панишев Сергей Алексеевич

doi:10.17816/transsyst20239195-107

Using convolutional neural networks for acoustic-based emergency vehicle detection

作者: Lisov A.A.¹, Kulganatov A.Z.¹, Panishev S.A.¹
隶属关系:
1. South Ural State University
期: 卷 9, 编号 1 (2023)
页面: 95-107
栏目: Original studies
URL: https://journals.rcsi.science/transj/article/view/126662
DOI: https://doi.org/10.17816/transsyst20239195-107
ID: 126662

如何引用文章

全文:

详细
全文:
作者简介
参考
补充文件
统计

详细

Background: A siren is a special signal given by emergency vehicles such as fire trucks, police cars and ambulances to warn drivers or pedestrians on the road. However, drivers sometimes may not hear the siren due to the sound insulation of a modern car, the noise of city traffic, or their own inattention. This problem can lead to a delay in the provision of emergency services or even to traffic accidents.

Aim: develop an acoustic method for detecting the presence of emergency vehicles on the road through the use of convolutional neural networks.

Materials and Methods: The algorithm of work is based on the conversion of sound from the external environment into its spectrogram, for analysis by a convolutional neural network. An open dataset (“Emergency Vehicle Siren Sounds”) from sources available on Internet sites such as Google and Youtube, saved in the “.wav” audio format, was used as a dataset for siren sounds and city traffic. The code was developed on the Google.Colab platform using cloud storage.

Results: The conducted experiments showed that the proposed method and model of the neural network make it possible to achieve an average efficiency of determining the type of sound with an accuracy of 93.3 % and a speed recognition of 0.0004±5 % of a second.

Conclusion: The use of the developed technology for recognizing siren sounds in city noize will improve traffic safety and increase the chances of preventing a dangerous situation. Also, this system can be an additional assistant for hearing-impaired people while driving and everyday life for timely notification of the presence of emergency services nearby.

关键词

machine learning, convolutional neural networks, emergency vehicle signal recognition

全文:

##article.viewOnOriginalSite##

作者简介

Andrey Lisov

South Ural State University

编辑信件的主要联系方式.
Email: lisov.andrey2013@yandex.ru
ORCID iD: 0000-0001-7282-8470
SPIN 代码: 1956-3662

postgraduate student

俄罗斯联邦, Chelyabinsk

Askar Kulganatov

South Ural State University

Email: kulganatov97@gmail.com
ORCID iD: 0000-0002-7576-7949
SPIN 代码: 7607-9723

postgraduate student

俄罗斯联邦, Chelyabinsk

Sergei Panishev

South Ural State University

Email: panishef.serega@mail.ru
ORCID iD: 0000-0003-2753-2341
SPIN 代码: 2676-5207

postgraduate student

俄罗斯联邦, Chelyabinsk

参考

Kanzaria HK, Probst MA, Hsia RY. Emergency department death rates dropped by nearly 50 percent, 1997–2011. Health Affairs. 2016 Jul 1;35(7):1303-8. doi: 10.1377/hlthaff.2015.1394
Lee J, Park J, Kim KL, Nam J. Sample-level deep convolutional neural networks for music auto-tagging using raw waveforms. arXiv preprint arXiv:1703.01789. 2017 Mar 6. doi: 10.48550/arXiv.1703.01789
Zhu Z, Engel JH, Hannun A. Learning multiscale features directly from waveforms. arXiv preprint arXiv:1603.09509. 2016 Mar 31. doi: 10.48550/arXiv.1603.09509
Choi K, Fazekas G, Sandler M. Automatic tagging using deep convolutional neural networks. arXiv preprint arXiv:1606.00298. 2016 Jun 1. doi: 10.48550/arXiv.1606.00298
Nasrullah Z, Zhao Y. Music artist classification with convolutional recurrent neural networks. In2019 International Joint Conference on Neural Networks (IJCNN) 2019 Jul 14 (pp. 1-8). IEEE. doi: 10.1109/IJCNN.2019.8851988
Wang Z, Muknahallipatna S, Fan M, et al. Music classification using an improved crnn with multi-directional spatial dependencies in both time and frequency dimensions. In2019 International Joint Conference on Neural Networks (IJCNN) 2019 Jul 14 (pp. 1-8). IEEE. doi: 10.1109/IJCNN.2019.8852128
Dieleman S, Brakel P, Schrauwen B. Audio-based music classification with a pretrained convolutional network. In12th International Society for Music Information Retrieval Conference (ISMIR-2011) 2011 (pp. 669-674). University of Miami.
Chen MT, Li BJ, Chi TS. CNN based two-stage multi-resolution end-to-end model for singing melody extraction. InICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2019 May 12 (pp. 1005-1009). IEEE. doi: 10.1109/ICASSP.2019.8683630
Phan H, Koch P, Katzberg F, et al. Audio scene classification with deep recurrent neural networks. arXiv preprint arXiv:1703.04770. 2017 Mar 14. doi: 10.48550/arXiv.1703.04770
Gimeno P, Viñals I, Ortega A, et al. Multiclass audio segmentation based on recurrent neural networks for broadcast domain data. EURASIP Journal on Audio, Speech, and Music Processing. 2020 Dec;2020:1-9.
Dai J, Liang S, Xue W, et al. Long short-term memory recurrent neural network based segment features for music genre classification. In2016 10th International Symposium on Chinese Spoken Language Processing (ISCSLP) 2016 Oct 17 (pp. 1-5). IEEE. doi: 10.1109/ISCSLP.2016.7918369
Zhang Z, Xu S, Zhang S, et al. Attention based convolutional recurrent neural network for environmental sound classification. Neurocomputing. 2021 Sep 17;453:896-903. doi: 10.1016/j.neucom.2020.08.069
Wang H, Zou Y, Chong D, Wang W. Environmental sound classification with parallel temporal-spectral attention. arXiv preprint arXiv:1912.06808. 2019 Dec 14. doi: 10.48550/arXiv.1912.06808
Sang J, Park S, Lee J. Convolutional recurrent neural networks for urban sound classification using raw waveforms. In2018 26th European Signal Processing Conference (EUSIPCO) 2018 Sep 3 (pp. 2444-2448). IEEE. doi: 10.23919/EUSIPCO.2018.8553247
Choi K, Fazekas G, Sandler M, Cho K. Convolutional recurrent neural networks for music classification. In2017 IEEE International conference on acoustics, speech and signal processing (ICASSP) 2017 Mar 5 (pp. 2392-2396). IEEE. doi: 10.1109/ICASSP.2017.7952585
Gwardys G, Grzywczak D. Deep image features in music information retrieval. International Journal of Electronics and Telecommunications. 2014;60:321-6. doi: 10.2478/eletel-2014-0042
Krizhevsky A, Sutskever I, Hinton GE. Imagenet classification with deep convolutional neural networks. Communications of the ACM. 2017 May 24;60(6):84-90. doi: 10.1145/3065386
Deng J, Dong W, Socher R, et al. Imagenet: A large-scale hierarchical image database. In2009 IEEE conference on computer vision and pattern recognition 2009 Jun 20 (pp. 248-255). IEEE. doi: 10.1109/CVPR.2009.5206848
Emergency Vehicle Siren Sounds [Internet]. Kaggle [cited 2023 February 23]. Available from: https://www.kaggle.com/vishnu0399/emergency-vehicle-siren-sounds
CNN for audio recognition. GitHub [cited 2023 February 23]. Available from: https://github.com/AnLiMan/CNN-for-audio-recognition