A Shopper’s Guide for Automated Speaking Assessments

While we are proud of our accomplishments towards automated speaking assessment solutions and claim many first ascents in this area of language testing, we also recognize that other solutions are out there that use similar tools to our automated assessments. You might wonder what we think about these competitors. To be honest, we welcome the company! Being a Team Player is a part of our culture. It’s validating to see the work that others are doing that aligns with our own. 

We recognize that the global problems related to language learning and language assessment are big enough to require multiple solutions. Naturally, we want our solutions in the hands of any and all who can benefit from them. However, we haven’t tried to create a one-size-fits-all solution. Depending on a use case, there may be a better fit in a competitor’s product, and that is okay. Fit matters. 

We want our clients to be happy (and we have a strong record of success). Satisfied clients start with a solution that actually meets their needs, and we have good people who help ensure that it does. 

So having choices is a good thing. Even good things, though, come with challenges. Having a choice means you need to make a choice, and that can be difficult. What should you look for when choosing an automated speaking test? 

If you’re reading, I’ll assume that you’re asking. Here are a few things to consider. 

1. Transparency 

Innovative solutions like automated speaking tests can have a ‘magic’ effect. They take what was previously days or even weeks of work requiring the effort of many people and transform it into a couple of clicks and a few minutes’ wait. I’ve run human-rated speaking assessment events and I’ve herded sheep (not everyone can boast both). Herding sheep is less exhausting. There is a thrill when you see that there is a better way, one that saves everyone, test takers and test administrators alike, the pain of delay and the exhaustion of execution.

However, unlike street magic, there shouldn’t be tricks in testing, particularly testing that has consequences. Obviously, some particulars of how a solution works may not be relevant for you to know. But you should be able to ask questions and get answers. At Emmersion, with the release of each product, we also release a technical report. It is a part of our culture of truth-seeking. We want those who use our product to understand how it works and why it works. Regularly, these efforts to teach and document are met with surprise: “Others don’t do this. Thank you.” Hearing this feels good, but it also feels sad. “They should. You’re welcome.” 

If you’re pursuing an automated assessment solution and you can’t get more than marketing materials, beware. Increasingly, it’s becoming clear that there is danger in being complacent about transparency with technological solutions. Phrases like ‘AI-powered’ or ‘automated’ carry a need for accountability. Where the data used to make decisions came from is an important question to ask. 

2. Accuracy

Would you let a robot cut your hair? The answer to this shouldn’t be an immediate no. 

Robots can perform delicate surgeries and land airplanes. Clearly, they can do a lot when they have been trained. However, the answer shouldn’t be an immediate yes, either. Even if it were going to save me time and money, before I would let a robot start whirling blades around my head I’d ask, “Tell me about how s/he learned how to cut hair; how confident are you that at the end I’ll still have ears and can leave my house (or, in a pandemic, turn video on while Zooming)?”

Behind any automated speaking test solution is an automatic speech recognition (ASR) tool. Automatic speech recognition has come a long way. There are really powerful solutions out there that are remarkably accurate. We couldn’t do what we do without them. However, not all are created or calibrated equally (we will get to that later). 

In our use of ASR tools, we don’t just assume that because something ‘works’ it will work for the specific job we give it. We select the tools that will give us the best results and then confirm that they do. We go through a process of calibration where we ensure that the speech recognition tool performs very close to the levels of accuracy we’d see if we were relying on trained human raters. 

If you’re considering an ASR-backed automated speaking assessment solution, ask about the ASR they use and how they’ve confirmed that it’s accurate. If they don’t have clear answers, don’t sit in the chair and ask for just a little off the sides and top. 
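
One concrete way to ask “how accurate is the ASR?” is to compare its transcripts against transcripts produced by trained humans. The sketch below computes word error rate (WER), a widely used ASR metric. It is an illustration only: the function name and the sample transcripts are made up, and nothing here is meant to describe the specific checks Emmersion runs.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    `reference` is the human transcript, `hypothesis` is the ASR output.
    Both names are illustrative assumptions, not any vendor's API.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # prev[j] = edits needed to turn the reference words seen so far
    # into the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from ASR output
            insertion = curr[j - 1] + 1  # extra word in ASR output
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one dropped word and one misheard word out of ten.
human = "i would like a little off the sides and top"
asr = "i would like little off the side and top"
print(word_error_rate(human, asr))  # 0.2
```

A provider with clear answers should be able to tell you a number like this for speakers who resemble your test takers, not just for the ASR vendor’s benchmark audio.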

3. Calibration

Speaking of calibration, the ASR isn’t the only thing that should be calibrated. The test content should also be calibrated before you can have confidence that it is going to result in data that is valid. Items vary in what they reveal about ability and how reliably they reveal it. 

Be very concerned if a test publisher treats all test content as good test content. How will you know? Here’s one red flag: they let you bring your own. 

One of the most compelling reasons to move away from an in-house solution is that, no matter your level of experience, expertise alone cannot ensure that test content will work. DIY can very quickly end in DIWhy!?? [insert all the crying emojis].

Before we use test content towards producing score data, it has been presented to thousands of test-takers. From this process, we learn how the test content responds to differences in the test takers’ background and ability and how these differences represent difficulty. Understanding the difficulty of the content is critical when it comes to interpreting performance. 

Although we work with experts as we create test content, authored test content doesn’t always graduate. Some of it ends up in the bin. We learn from it and try to understand what went wrong, but we don’t use it. We won’t use content until it has been proven. An automated test is faster, but it shouldn’t cut corners. 
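
To make the idea of content that “doesn’t graduate” concrete, here is a minimal sketch of two classical item statistics computed from pilot responses: difficulty (the share of test takers who got the item right) and discrimination (how strongly success on the item tracks success on the rest of the test). The data, the function names, and the 0.2 cutoff are assumptions chosen for illustration; this is not a description of Emmersion’s calibration procedure, which works with far larger samples.

```python
from statistics import mean, pstdev


def item_statistics(responses):
    """Return (difficulty, discrimination) for each item.

    `responses` is a list with one row per test taker; each row is a
    list of 0/1 scores, one per item.
    """
    stats = []
    for k in range(len(responses[0])):
        item = [row[k] for row in responses]
        rest = [sum(row) - row[k] for row in responses]  # score on the other items
        difficulty = mean(item)
        sd_item, sd_rest = pstdev(item), pstdev(rest)
        if sd_item == 0 or sd_rest == 0:
            discrimination = 0.0  # no variation, so the item tells us nothing
        else:
            products = [i * r for i, r in zip(item, rest)]
            covariance = mean(products) - mean(item) * mean(rest)
            discrimination = covariance / (sd_item * sd_rest)
        stats.append((difficulty, discrimination))
    return stats


def flag_items(responses, min_discrimination=0.2):
    """Indices of items whose discrimination falls below an assumed cutoff."""
    return [k for k, (_, disc) in enumerate(item_statistics(responses))
            if disc < min_discrimination]


# Made-up pilot data: six test takers, four items.
pilot = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
]
print(flag_items(pilot))  # [3]
```

In this toy data, the last item is answered correctly only by the weakest test takers, so it is flagged. That is the kind of item that ends up in the bin, however sensible it looked to the expert who wrote it.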

4. Meaningfulness

An automated test solution is going to be faster at returning data. If it’s not, well…it’s not what it says it is. If it’s been constructed well, it will be consistent in the data that it returns. Speed and reliability in scoring should be non-negotiable. However, a quick response that is always the same but doesn’t answer the question you’re asking isn’t worth your money. 

The raw data from a test tells you one thing: how the person did on the test. If you’re using the test as most people use tests, to estimate performance BEYOND the test, a scoring solution must be more sophisticated than percent correct. Again, if you’ve understood the difficulty of the tasks presented, test results can start to reveal a lot about a person’s ability. 

This understanding of ability (when combined with additional detail about a person’s language performance) replicated across many different test-takers, with respect for and good use of psychometric techniques and tools like machine learning, can start to complete the picture you are looking for. 
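
Why isn’t percent correct enough? A small sketch using the Rasch model, a standard psychometric model in which the chance of success depends on the gap between a person’s ability and an item’s difficulty, shows the idea. The item difficulties and function name below are invented for illustration, and this is not a description of Emmersion’s scoring engine.

```python
import math


def rasch_ability(scores, difficulties, iterations=25):
    """Maximum-likelihood ability estimate under a Rasch (1PL) model.

    `scores` are 0/1 results; `difficulties` are item difficulties on the
    same logit scale, assumed to be known from earlier calibration.
    """
    if not 0 < sum(scores) < len(scores):
        raise ValueError("all-correct or all-wrong patterns have no finite estimate")
    theta = 0.0
    for _ in range(iterations):
        # Probability of success on each item at the current ability guess.
        probs = [1 / (1 + math.exp(-(theta - b))) for b in difficulties]
        gradient = sum(x - p for x, p in zip(scores, probs))
        information = sum(p * (1 - p) for p in probs)
        theta += gradient / information  # Newton-Raphson step
    return theta


# Two hypothetical test takers, both with 3 of 4 correct (75%).
scores = [1, 1, 1, 0]
easy_items = [-2.0, -1.0, -1.0, 0.0]
hard_items = [0.0, 1.0, 1.0, 2.0]

print(rasch_ability(scores, easy_items))
print(rasch_ability(scores, hard_items))
```

Both test takers have the same percent correct, but the one who earned it on harder items receives a higher ability estimate. Because the hard items in this example are simply the easy ones shifted up by two units, the estimate shifts up by the same amount.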

Can we predict the weather? With the right understanding and tools, we can. 

Can we predict a person’s language ability? Again, with the right understanding and tools, we can. 

Now, will there be instances of error? Yes. 

Even very good measures (automated and non-automated approaches alike) produce false positives and false negatives. A false positive is a test saying that a thing is so when it isn’t, and a false negative is a test saying that a thing is NOT so when it is. However, reducing error AND creating better techniques for interpreting test results are critical and rewarding parts of our job. If an automated speaking assessment provider can’t speak to both, with specific details on how they are improving, keep looking; you can do better. 
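
If a provider does share error figures, they will usually be some form of the two rates sketched below. The example assumes you have a trusted reference decision for each test taker (for instance, a panel of trained human raters); the names and data are hypothetical.

```python
def error_rates(test_says, truth):
    """False positive and false negative rates for pass/fail decisions.

    `test_says` holds the automated test's decisions and `truth` holds the
    trusted reference decisions, both as lists of booleans.
    """
    false_pos = sum(1 for t, y in zip(test_says, truth) if t and not y)
    false_neg = sum(1 for t, y in zip(test_says, truth) if not t and y)
    negatives = sum(1 for y in truth if not y)
    positives = sum(1 for y in truth if y)
    return {
        # Share of truly unqualified people the test passed.
        "false_positive_rate": false_pos / negatives if negatives else 0.0,
        # Share of truly qualified people the test failed.
        "false_negative_rate": false_neg / positives if positives else 0.0,
    }


# Made-up decisions for six test takers.
test_says = [True, True, False, False, True, False]
truth = [True, False, False, True, True, False]
print(error_rates(test_says, truth))
```

Which of the two rates matters more depends on your use case: passing someone who isn’t ready and turning away someone who is carry different costs.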

5. Cost

This one may be more personal. You may feel differently than I do and as a result, this reason may be less compelling to you. An automated solution that leverages technology that can scale far beyond any human-supported process creates efficiencies. These efficiencies create value, in part by reducing costs. Now I don’t expect that all of these cost savings should be returned to the consumer. An investment has been made and future expenses must be planned for. 

However, if a company moves towards using automated solutions BUT keeps their price model the same, their world (meaning their balance sheet) may change but THE World won’t. That should matter to us. Automating an industry should be a catalyst for many changes, one of which is increasing who can benefit from that thing. Keeping pricing static prevents that. Perhaps I’m being redundant. This comes up a lot for me. I guess that’s one way to show that something is important. 

So, if you’re in the market for an automated speaking assessment solution, I have a really good feeling about how close you are to a good one. However, in evaluating any option (including Emmersion’s own), here’s a quick summary of things that should guide your search. 

  1. Where can I learn more about how your automated test works? 
  2. What efforts were taken to make sure that your automated solution is accurate? What about the ASR solution you use?
  3. How have you calibrated your automated assessment content? Do you throw poorly performing items out?
  4. How do you make sure that your automated test gives meaningful results?
  5. How has automating your assessment changed who can afford your assessment? 

Emmersion certifies language ability for organizations around the world using a fully automated and adaptive language assessment engine. It’s revolutionizing the language testing process with instant, accurate scoring for speaking, grammar, and writing ability in 9 global languages and counting. With a scalable assessment solution they can count on, hundreds of global businesses are building successful teams, reducing turnover, and improving their customer satisfaction scores. Learn more at www.Emmersion.ai.
