Signal and Information Processing

Environmental Sound Classification Method Based on Color Channel Feature Fusion

1. School of Mechatronics and Vehicle Engineering, Chongqing Jiaotong University, Chongqing 400074, China;
2. Chongqing Industrial Big Data Innovation Center Co. Ltd., Chongqing 400707, China

Received date: 2021-09-24

Online published: 2023-08-02

Abstract

To address the low accuracy of traditional neural networks in classifying complex environmental sounds, an environmental sound classification method based on color channel feature fusion is proposed. First, three acoustic features are extracted from the raw audio: the log-Mel spectrogram (LMS), Mel-frequency cepstral coefficients (MFCC) and the energy spectrum (ES). These three features are then used as the R, G and B color channels, respectively, and fused into a single, more representative spectrogram image that characterizes environmental sounds more comprehensively. Next, to avoid the poor generalization that a model trained on small datasets would otherwise suffer, a pre-trained VGG-16 network is fine-tuned. Finally, the effectiveness of the method is verified on two widely used environmental sound classification datasets and on audio collected in real scenarios, and its accuracy is compared with that of other models. The results show that the proposed method reaches accuracies of 88.2% on ESC-10 and 65.2% on ESC-50, and it also improves classification performance on the audio collected in real scenarios.
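The color channel fusion step described in the abstract can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the frame size (`n_fft=512`), hop (`256`), number of mel bands (`64`), treating the energy spectrum as a dB-scaled power spectrum on a linear frequency axis resampled to 64 rows, keeping all 64 DCT coefficients as the MFCC so that every channel has the same height, per-channel min-max scaling to the range 0 to 255, and all function names are hypothetical choices.

```python
import numpy as np

def stft_power(x, n_fft=512, hop=256):
    """Power spectrogram |STFT|^2 with a Hann window; shape (n_fft//2 + 1, frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)], axis=1)
    return np.abs(np.fft.rfft(frames, axis=0)) ** 2

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters spaced evenly on the mel scale; shape (n_mels, n_fft//2 + 1)."""
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    hz_points = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, len(fft_freqs)))
    for m in range(n_mels):
        left, center, right = hz_points[m], hz_points[m + 1], hz_points[m + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def dct_matrix(n):
    """Orthonormal DCT-II basis, so that mfcc = D @ log_mel."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    d = np.cos(np.pi * k * (2 * i + 1) / (2 * n)) * np.sqrt(2.0 / n)
    d[0] /= np.sqrt(2.0)
    return d

def resize_rows(spec, n_rows):
    """Linearly interpolate each frame along the frequency axis to n_rows values."""
    old = np.linspace(0.0, 1.0, spec.shape[0])
    new = np.linspace(0.0, 1.0, n_rows)
    return np.stack([np.interp(new, old, col) for col in spec.T], axis=1)

def to_uint8(feature):
    """Min-max scale one feature map to 0..255 so it can serve as a color channel."""
    lo, hi = feature.min(), feature.max()
    return np.round((feature - lo) / (hi - lo + 1e-12) * 255).astype(np.uint8)

def color_channel_fusion(x, sr, n_fft=512, hop=256, n_mels=64):
    """Stack LMS (R), MFCC (G) and ES (B) into one (n_mels, frames, 3) image."""
    power = stft_power(x, n_fft, hop)
    lms = 10.0 * np.log10(mel_filterbank(sr, n_fft, n_mels) @ power + 1e-10)
    mfcc = dct_matrix(n_mels) @ lms  # DCT of the log-mel energies
    es = resize_rows(10.0 * np.log10(power + 1e-10), n_mels)  # assumed ES definition
    return np.stack([to_uint8(lms), to_uint8(mfcc), to_uint8(es)], axis=-1)

if __name__ == "__main__":
    sr = 22050
    t = np.arange(sr) / sr
    x = np.sin(2 * np.pi * 440.0 * t) + 0.01 * np.random.default_rng(0).standard_normal(sr)
    img = color_channel_fusion(x, sr)
    print(img.shape, img.dtype)  # (64, 85, 3) uint8
```

The fused array can be treated as an ordinary RGB image. Before it is fed to VGG-16, which expects 224 × 224 RGB inputs, it would typically be resized to that resolution; the fine-tuning stage itself is not shown here.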

Cite this article

DONG Shaojiang, XIA Zhengfu, FANG Nengwei, XING Bin, HU Xiaolin. Environmental Sound Classification Method Based on Color Channel Feature Fusion[J]. Journal of Applied Sciences, 2023, 41(4): 669-681. DOI: 10.3969/j.issn.0255-8297.2023.04.011
