Digital audio preservation for Indonesian traditional vocal recognition based on machine learning: A literature review and bibliometric analysis

Open Access

Issue		SHS Web Conf. Volume 197, 2024 6th International Conference on Arts and Design Education (ICADE 2023)


Article Number		03002
Number of page(s)		18
Section		Optimizing Digital Literacy in Art Learning in Schools and Communities
DOI		https://doi.org/10.1051/shsconf/202419703002
Published online		06 September 2024

