Last month, Emmersion celebrated the release of an Italian version of our TrueNorth Speaking Test. This version provides an automated solution for measuring a person’s ability to speak Italian. It joins the line-up as the seventh language available for testing speaking proficiency, along with English, Spanish, Portuguese, Japanese, German, and French.
Although we still have an ambitious list of languages that are in development, this seemed like an opportunity for us to highlight the rigorous and collaborative process that precedes the excitement of a new language release.
The TrueNorth Speaking Test uses an innovative test item type called elicited imitation (EI). Elicited imitation items are more commonly known as “listen and repeat” items. The test-taker listens to a sentence and then repeats it as accurately and completely as their language ability allows. Research has shown this to be a reliable way of measuring language ability. The cognitive explanation is that the greater a person’s language ability, the more effectively they can “chunk,” or group, words and phrases together, and so carry the sentence from hearing it to speaking it correctly.
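To make the listen-and-repeat idea concrete, here is a minimal sketch of how a single repetition could be compared with its target sentence. It is only an illustration, not a description of how the TrueNorth test is scored: the function name `repetition_accuracy` and the choice to count in-order word matches are assumptions made for this example.

```python
from difflib import SequenceMatcher


def repetition_accuracy(target: str, response: str) -> float:
    """Share of target words reproduced, in order, in the response.

    Hypothetical illustration only; the real scoring method is not
    described in this article.
    """
    target_words = target.lower().split()
    response_words = response.lower().split()
    if not target_words:
        return 0.0
    # Compare word lists (not characters) so that matches are whole words.
    matcher = SequenceMatcher(a=target_words, b=response_words, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(target_words)


full = repetition_accuracy("il gatto dorme sul divano", "il gatto dorme sul divano")
partial = repetition_accuracy("il gatto dorme sul divano", "il gatto sul divano")
```

A test-taker who drops one word of a five-word sentence keeps four of the five target words, so `partial` is lower than `full`. This sketch does not strip punctuation or handle mispronounced words, both of which a real system would need to address.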
One of the very first things we do when creating elicited imitation items for a new assessment is to complete a limitations test. For this limitations test, we take recordings of long and complex sentences and present them to native speakers in the same listen-and-repeat format. We look for patterns and thresholds of accuracy that indicate the natural maximum length of sentence for the highest ability users of the language. This helps us to understand what will be the most appropriate length of sentence for low, middle, and high (even native) ability test-takers.
With what we learn from the limitations test, we are ready to start curating our item bank. In order to get authentic sentences, we find a corpus for the test language. A corpus is a bank of texts that have been brought together to study a given language using computational linguistics. The corpus we used for the Italian item bank contained over 500,000 sentences. Using the target lengths identified from the limitations test and a systematic approach to filtering, we narrowed this corpus down to a much more manageable number of sentences divided across four likely ability categories (beginning, intermediate, advanced, and mastery).
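The length-based part of that filtering can be sketched in a few lines. The word-count ranges in `LENGTH_BANDS` below are invented for illustration; the real thresholds would come from the limitations test, and the actual filtering used more criteria than length alone.

```python
from collections import defaultdict

# Hypothetical word-count ranges (inclusive) for each ability category.
# These numbers are assumptions, not values from the Italian study.
LENGTH_BANDS = {
    "beginning": (3, 6),
    "intermediate": (7, 11),
    "advanced": (12, 17),
    "mastery": (18, 25),
}


def band_for(sentence: str):
    """Return the ability category whose length range fits the sentence."""
    word_count = len(sentence.split())
    for band, (shortest, longest) in LENGTH_BANDS.items():
        if shortest <= word_count <= longest:
            return band
    return None  # too short or too long to be useful as a test item


def bucket_corpus(sentences):
    """Group sentences by category, dropping those outside every range."""
    buckets = defaultdict(list)
    for sentence in sentences:
        band = band_for(sentence)
        if band is not None:
            buckets[band].append(sentence)
    return dict(buckets)


sample = [
    "Ciao",
    "Il gatto dorme.",
    "Domani andiamo al mercato a comprare la frutta",
]
buckets = bucket_corpus(sample)
```

In this sample, the one-word sentence is discarded, the three-word sentence lands in "beginning", and the eight-word sentence lands in "intermediate".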
We then work with experts in the study and teaching of the test language, who further filter the list down to the sentences that will provide us with the best linguistic coverage of the vocabulary and grammar representative of performance at each level. They also help flag any content that would be inappropriate or too specific to a single region or dialect.
Meet the language experts who contributed to this part of the Italian test development
Cristina Abbona-Sneider, Ph.D., is a Senior Lecturer in Italian Studies at Brown University, where she teaches language and culture courses at all levels. She also coordinates and administers all aspects of the language program and serves as supervisor and trainer for graduate teaching assistants. She is co-author of Trame. A Contemporary Italian Reader (Yale University Press, 2010).
Filomena Fantarella, Ph.D., is a Visiting Assistant Professor of Italian at Brown University, where she teaches language and culture courses. Her book, Un figlio per nemico. Gli affetti di Gaetano Salvemini alla prova dei fascismi, was published by Donzelli Press in July 2018. It was reviewed and praised widely in the Italian press.
Worldwide Calibration Study
Using the items that the team of language experts affirms as best suited for the test, we construct two types of calibration tests. The first is a general-form test of EI items that spans the entire difficulty range from beginning to mastery. We also construct four targeted calibration tests, one for each likely major ability group: beginning, intermediate, advanced, and mastery.
In addition to the general-form test, each calibration participant takes the targeted-form test that is most likely to match their ability level based on information that we have gathered about their background and profile. The targeted-form test, in addition to EI items, also presents open-response items that target language functions individuals at that ability level should be able to complete. These tasks provide us with spontaneous speech data to inform our scoring algorithm.
We work with participants around the world to collect hundreds of responses to these calibration test forms. They represent learners at each point in the ability spectrum from non-learners to native speakers and every step in between. Thus, we are able to inform the automatic speech recognition solutions, scoring algorithm, and item selection with a rich and deep data set.
Meet a few of the hundreds of language learners who contributed to the calibration testing
Gordon Wells first learned Italian 47 years ago as part of his two-year service mission in Italy for the Church of Jesus Christ of Latter-day Saints. He joined our calibration study shortly after returning to Italy for the first time, a trip that rekindled his love for the language and the people of Italy.
Elizabetta Innocenzi was born in Rome, Italy, and continues to live there. She was married in 1982 and has two lovely daughters. She graduated in Mathematics and worked as an I.T. Analyst, Project Leader and, for the last 6 years of her career, manager of the Audit Department for an insurance company. She’s now enjoying her retirement.
Training the ASR Solution
The TrueNorth Speaking Test is an automated speaking test. We use supercomputers, including IBM’s Watson, to support the scoring of the test through automatic speech recognition (ASR). While the speech recognition capacity of this technology is transforming many industries, we do not leave its accuracy to chance. To prove and improve the performance of an ASR solution, we also hand-rate our data bank of calibration test responses and then use these human-rated scores to ensure that the automated solution is reliable.
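One common way to check an automated score against human ratings is to measure how closely the two sets of scores move together. The sketch below computes a Pearson correlation for that purpose. It is an assumption that a correlation like this is among the checks used; the article does not name the statistic, and the scores shown are made-up values.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length (at least 2 scores)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when all scores are identical")
    return covariance / sqrt(spread_x * spread_y)


# Made-up example: one human rating and one automated score per response.
human_ratings = [1.0, 2.0, 3.0, 4.0, 5.0]
automated_scores = [1.2, 1.9, 3.1, 3.8, 5.1]
agreement = pearson(human_ratings, automated_scores)
```

A value near 1 means the automated scores rank and space responses much as the human raters do. A low value would signal that the ASR solution needs more training before it can be trusted.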
Meet a couple of the native speakers who contributed to this part of the Italian test development
Claudia Mencarelli is from Rome and is currently an MA TESOL student at Brigham Young University in Utah. She teaches ESL classes while she studies. Her ambition is to continue working in the English teaching field in Italy. This summer she completed an internship at Emmersion.
Lisa Cagnacci is from Genova, Italy, but has lived in the United States for the last six years. She is currently a full-time mom. In her free time, she enjoys reading English literature and working on her family history. As part of her future projects, she is looking forward to working on her Bachelor’s degree and learning Spanish.
Creating the Final Test Form
Once the ASR solution has been trained, we rescore the calibration database with it and use these scores to create a scoring algorithm and identify the best-performing items. After these psychometric analyses have been completed, the most reliable and discriminating items are put into the test form that will be released. To ensure that the audio recordings for the test are of the best quality, we vet the voice talent who will complete the recordings. From an audition pool of 10-20 voice professionals in a language, we select the one who is agreed to be the most accurate and authentic.
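To show what a "discriminating" item means in the item-selection step described above, here is a minimal sketch of a classic upper-lower discrimination index. This is one textbook statistic chosen for illustration, not necessarily the analysis used for the TrueNorth test; the function name, the 27% group size, and the sample scores are all assumptions.

```python
def discrimination_index(item_scores, total_scores, fraction=0.27):
    """Mean item score of the top group minus that of the bottom group.

    Groups are formed by ranking test-takers on their total test score.
    Values near 1 mean strong test-takers do well on the item and weak
    ones do not; values near 0 mean the item separates them poorly.
    """
    if len(item_scores) != len(total_scores) or len(total_scores) < 2:
        raise ValueError("need matching score lists for at least 2 test-takers")
    if not 0 < fraction <= 0.5:
        raise ValueError("fraction must be in (0, 0.5]")
    count = len(total_scores)
    group_size = max(1, round(count * fraction))
    # Indices of test-takers ordered from lowest to highest total score.
    ranked = sorted(range(count), key=lambda i: total_scores[i])
    bottom, top = ranked[:group_size], ranked[-group_size:]

    def mean_item_score(indices):
        return sum(item_scores[i] for i in indices) / len(indices)

    return mean_item_score(top) - mean_item_score(bottom)


# Made-up data for six test-takers: one item score and one total score each.
totals = [1, 2, 3, 4, 5, 6]
good_item = discrimination_index([0, 0, 0, 1, 1, 1], totals)
poor_item = discrimination_index([1, 0, 1, 0, 1, 0], totals)
```

In this made-up data, `good_item` is high because only the stronger test-takers succeed on it, while `poor_item` is low because success on it is unrelated to overall performance.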
With the automated test form complete, we pursue confirmation that the test and the scoring algorithm are working as designed. This confirmation testing includes participants from our calibration pool and test-takers who have not previously taken the test. In addition to their test data, our UX design team collects observation data to help ensure that the test form enhances the experience of the test-taker.
Meet the voice talent featured in the released Italian speaking test
Sabrina Carletti is a theater actor and acting instructor. Her voiceover work has included TV and radio advertising for Italian national and international brands as well as cultural and educational projects.
Our work does not end with the release of a new language product. We continue to learn how the test form and experience can be enhanced. For our international language products, we also continue to grow our item pool and enhance our scoring tools so that we can eventually offer a fully adaptive version of these automated speaking tests, as we have done with English. We are also excited to continue to learn about how we can measure other abilities in addition to speaking in a test language.
If you are interested in which languages we are working on or, even better, would like to help us as a calibration participant, we have information about what’s upcoming here!
The TrueNorth English Speaking Assessment was developed to modernize English language testing with patented artificial intelligence and machine learning. This technology allows for immediate results and scoring as opposed to all other language assessments that take 24 to 48 hours to grade. TrueNorth provides a convenient and immediate English testing solution that has been validated and calibrated to global testing standards and is also available in several other languages.
Some of the largest international companies rely on TrueNorth testing every day. With this technology, BPOs and international companies utilize real-time data and reporting to assist in attracting and hiring the best talent. Additionally, more than 650 universities, colleges, and training institutions around the globe use the TrueNorth platform for course placement and progress tracking. TrueNorth is delivered online and requires only headphones and a microphone.