Improving Japanese English pronunciation with speech recognition and a feedback system

For Japanese people, communicating with English speakers from abroad has become more common because of internationalization, and many people want to improve their English-speaking skills. However, there are few environments where they can speak English outside of the classroom, so Japanese students rarely have a chance to practice English pronunciation. Even if students do have a chance to take an English pronunciation class, teachers in a large class do not have enough time to teach pronunciation to each student individually. Computers and smartphones may therefore be a useful tool for addressing this problem. In this research, we develop a web-based application to help Japanese learners with their English pronunciation.


Introduction
According to Dizon [1], intelligent assistants that use speech recognition, such as Siri or Alexa, could be helpful tools for language learners, especially Japanese second-language (L2) English learners, who usually have few opportunities to use English outside of the classroom. Daniels and Iwago [2] compared Siri with Google Speech Recognition (GSR) and found that GSR is more accurate at transcribing Japanese undergraduates' L2 English speech. However, although intelligent assistants can recognize speech, they do not advise users about their pronunciation. The current research therefore focuses on creating a web-based feedback system for language learners. The feedback system advises users on articulation and correctness, based on typical errors in test results that we collected. The web application evaluates users' English pronunciation using the Web Speech API as documented by the Mozilla Developer Network (MDN) [3]. To keep the range of possible feedback manageable, the words and phrases used in the application are limited. They are taken from the "Wolf Story" [4], a phonetic passage adapted from an Aesop fable, because it covers a wide range of English phonemes and phoneme combinations. Teachers and students can access the system easily, because the web application is hosted on the University of Aizu's CLR Phonetics Lab website at http://clrlab1.u-aizu.ac.jp/acoustics.html.

Speech Recognition Interface
There are many speech recognition applications, so it is possible to choose an interface according to one's individual requirements. For example, Google Speech Recognition (GSR), IBM Cloud, Siri, Alexa, and the Web Speech API (MDN) are all popular. As noted above, GSR has been reported to be more accurate than Siri for Japanese L2 English speech [2], so we compared GSR, IBM Cloud, and the Web Speech API. Table 1 lists the advantages and disadvantages of these APIs.

* e-mail: s1240221@u-aizu.ac.jp
** e-mail: wilson@u-aizu.ac.jp

Method
The web application created in this research consists of HTML, CSS, and JavaScript. The HTML and CSS provide interface elements such as the recognition start button and the controls for the sample MP3 files, while JavaScript controls the speech recognition and the feedback system. Speech recognition uses the Web Speech API [3], a browser interface documented by the Mozilla Developer Network (MDN) that can be set to recognize many languages. We selected American English as the language setting because much of the English education in Japanese primary and secondary schools seems to focus on American English. Part of the web application is shown in Figure 1.

The application includes a feedback system and a lecture page. The feedback system must give advice to users, so we created an advice dataset corresponding to words from test participants that were commonly misrecognized by the speech recognition system. If the recognized word is not in the dataset, the feedback given is very general ("please try again and speak more clearly").

The lecture section of the application contains three pages: first, an explanation of what the International Phonetic Alphabet (IPA) is; second and third, lessons on how to pronounce vowels and consonants, respectively. For each type of sound, static MRI images of the articulators producing the sound can be seen. Although the main purpose of this study is a complete web application with a feedback system, we also made the lecture page to support the study of English pronunciation. The application is mainly for Japanese L2 English learners, so the lecture page is written in Japanese. According to Ono [5], an effective way of teaching pronunciation is for the student to repeat the teacher's English pronunciation while the teacher explains, in words and illustrations, how to produce the sounds.
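As an illustration of the recognition step described in this section, the following is a minimal sketch of how a recognizer might be configured through the Web Speech API. It is not the application's actual source: the function name startRecognition, its parameters, and the lower-casing of the transcript are assumptions made for this example. Only the lang, interimResults, and maxAlternatives properties, the onresult and onerror handlers, and the start() method come from the API itself.

```javascript
// Sketch only: startRecognition and its parameters are hypothetical names.
// RecognitionCtor is the speech recognition constructor supplied by the
// browser; it is passed in so the function can also be exercised with a stub.
function startRecognition(RecognitionCtor, onWord) {
  const recognition = new RecognitionCtor();
  recognition.lang = "en-US";          // American English, as chosen in the paper
  recognition.interimResults = false;  // report only final results
  recognition.maxAlternatives = 1;     // keep only the best transcript

  recognition.onresult = (event) => {
    // results[0][0] is the top alternative of the first result.
    const transcript = event.results[0][0].transcript;
    // Normalizing case and whitespace is an assumption of this sketch.
    onWord(transcript.trim().toLowerCase(), null);
  };

  recognition.onerror = (event) => {
    // Pass the error code on so the caller can show general advice.
    onWord(null, event.error);
  };

  recognition.start();
  return recognition;
}
```

In a browser, the constructor would typically be obtained as window.SpeechRecognition || window.webkitSpeechRecognition, because some browsers expose only the prefixed name.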

Feedback System
The web application can return advice for a word from a list of common errors made by Japanese speakers. Figures 2, 3, and 4 show examples of feedback. For example, if the "th" sound is missing from the word, the application returns appropriate advice such as "Put your tongue tip between your teeth to make "th" sound." If the recognized word is not in the list, the application returns general advice.

Common Pronunciation Errors
There are many common errors that Japanese speakers make when speaking English as a second language. Table 2 and Figure 5 show the consonants and vowels, respectively, that exist in Japanese and English. In Table 2, consonants used only in English are shown in blue, consonants used only in Japanese are shown in red, and consonants used in both languages are shown in black. Common pronunciation errors by Japanese speakers of English usually occur in vowels and consonants that are not found in Japanese (i.e., the ones shown in blue). In Figure 5, the vowels underlined in pink are those found in North American English but not in Japanese. The number of mismatches ("errors") between target word and transcription can be seen in the bottom row. Note that even for the native speaker (participant L), the system indicated an error for two words: threaten and fool.
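The mismatch count mentioned above can be illustrated with a short sketch. The helper name countMismatches and the sample transcriptions are hypothetical and are not the study's data; the sketch assumes that a mismatch is any target word whose transcription differs after trimming whitespace and ignoring case.

```javascript
// Hypothetical helper: counts the target words whose transcription differs.
function countMismatches(targets, transcriptions) {
  const normalize = (word) => word.trim().toLowerCase();
  let errors = 0;
  targets.forEach((target, i) => {
    const heard = transcriptions[i];
    // A missing transcription is also counted as a mismatch.
    if (heard === undefined || normalize(heard) !== normalize(target)) {
      errors += 1;
    }
  });
  return errors;
}

// Illustrative data only (not results from the study).
const sampleTargets = ["wolf", "threaten", "fool"];
const sampleHeard = ["Wolf", "sweeten", "full"];
console.log(countMismatches(sampleTargets, sampleHeard)); // 2
```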

Lecture Page
In the lecture page, learners are taught how to pronounce English vowels and consonants with the aid of MRI images. Figure 6 shows the top page of the web application, which has four tabs. The "About IPA", "Vowels", and "Consonants" pages explain how to read the IPA chart and how to make vowel and consonant sounds. On the "Word Practice" page, users can practice pronunciation and get feedback. We referred to [6] in making the "Vowels" and "Consonants" pages.

HTML Source Code
The following is the HTML source code that produces what is shown in Figure 1.

JavaScript Source Code
The following is part of the JavaScript source code. When the REC button is clicked, the speech recognition function is called and recognition starts. Each button has a number to identify it. The checkFunc function builds and returns advice according to the recognized word. If the recognized word has not been collected in the advice dataset, the default branch of the switch statement is taken.
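To make the logic described above concrete, the following is a minimal sketch of how a function like checkFunc might be structured. It is not the application's actual source. The button numbers, the targetWords table, the "Correct!" message, and the misrecognition "sweeten" are all assumptions made for illustration; only the two advice strings are quoted from this paper.

```javascript
// Hypothetical sketch, not the application's actual source.
// Fallback advice quoted in the Method section.
const GENERAL_ADVICE = "please try again and speak more clearly";

// Assumed mapping from a button number to the word the learner should say.
const targetWords = { 1: "threaten", 2: "fool" };

function checkFunc(buttonNumber, recognizedWord) {
  const target = targetWords[buttonNumber];
  const heard = recognizedWord.trim().toLowerCase();

  if (heard === target) {
    return "Correct!"; // assumed success message
  }

  // Each case pairs a target word with a known misrecognition of it.
  switch (`${target}>${heard}`) {
    case "threaten>sweeten": // assumed example of a lost "th" sound
      return 'Put your tongue tip between your teeth to make "th" sound.';
    default:
      // The recognized word is not in the advice dataset.
      return GENERAL_ADVICE;
  }
}
```

Keying each case on both the target and the recognized word keeps one misrecognition from triggering advice meant for a different target word.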

Future Work
We believe that this web application should be developed further, because its feedback system is currently based on too few test results. In the future, we must collect more recognized-word data and improve the feedback system.

Figure 1. Part of the "Word Practice" page of the resultant web application

Figure 2. Example showing feedback given to the user for the "th" sound

Figure 3. Example showing feedback given to the user for a vowel sound

Figure 4. Example showing feedback given to the user for the "f" sound

Table 1. Comparison of three APIs for speech recognition (GSR = Google Speech Recognition; IBM = IBM Cloud; MDN = Mozilla Developer Network's Web Speech API)

Table 2. International Phonetic Alphabet chart of consonants showing differences between English and Japanese. Red phonemes are found only in Japanese, blue ones are found only in English, and black ones are found in both languages