Emmersion Assessments
Version Descriptions

TrueNorth Assessments

WebCAPE Assessments

TrueNorth — English Speaking

10/14/2020 — Version D.20.09

With this version, we release full adaptivity in Part 1 – Listen and Repeat. After each question in Part 1, the scoring engine updates its ability estimate and selects the next question based on which task will provide the most information about the test taker’s ability. As a test taker meets the challenge of a task, the next task will likely be more difficult; as the tasks become too difficult for the test taker, the next task will become easier. The introduction of adaptivity also brings the improvements listed below. You can find out more about this version update to the TrueNorth test here.
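The selection step described above is a form of maximum-information computerized adaptive testing. The Python sketch below illustrates the idea under simplifying assumptions that this note does not confirm: each task is scored right/wrong under a Rasch (one-parameter logistic) model, the item bank is a hypothetical dictionary of difficulties, and ability is re-estimated by expected a posteriori (EAP) estimation on a grid with a standard normal prior. The names `select_next_item` and `eap_estimate` are illustrative only.

```python
import math

# Assumption: tasks are scored 0/1 under a Rasch model with known difficulties.
# The real TrueNorth engine, item bank, and scoring scale may differ.


def rasch_prob(theta: float, b: float) -> float:
    """Probability of a correct response at ability theta for difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def item_information(theta: float, b: float) -> float:
    """Fisher information of a Rasch item at theta: P * (1 - P)."""
    p = rasch_prob(theta, b)
    return p * (1.0 - p)


def select_next_item(theta: float, bank: dict[str, float], used: set[str]) -> str:
    """Return the unused item that is most informative at the current estimate."""
    candidates = [item for item in bank if item not in used]
    return max(candidates, key=lambda item: item_information(theta, bank[item]))


def eap_estimate(responses: list[tuple[float, int]]) -> tuple[float, float]:
    """EAP ability estimate and posterior SD on a grid with a N(0, 1) prior.

    responses holds (difficulty, score) pairs for the tasks answered so far.
    """
    grid = [-4.0 + 0.05 * i for i in range(161)]  # theta from -4 to 4
    weights = []
    for t in grid:
        w = math.exp(-0.5 * t * t)  # unnormalized standard normal prior
        for b, x in responses:
            p = rasch_prob(t, b)
            w *= p if x == 1 else 1.0 - p  # multiply in the likelihood
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(grid, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(grid, weights)) / total
    return mean, math.sqrt(var)


# Usage with a hypothetical three-item bank.
bank = {"item_a": -1.5, "item_b": 0.0, "item_c": 1.5}
first = select_next_item(0.0, bank, used=set())  # "item_b", closest to theta = 0
theta, se = eap_estimate([(bank[first], 1)])  # a correct answer raises theta
second = select_next_item(theta, bank, used={first})  # "item_c", the harder item
```

Because a Rasch item is most informative when its difficulty is close to the current estimate, a correct answer pushes the next pick toward harder tasks and an incorrect answer toward easier ones, which is the behavior the paragraph above describes.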

NOTE: The user who confirms the update to this version will be recorded in the version history.

Technical Details

What’s improved

  • Reduced Part 1 – Listen and Repeat from 30 tasks to 12-20 tasks, depending on test-taker performance.
  • Increased accuracy with tasks dynamically adapting to match test-taker ability.
  • Enhanced barriers to cheating and test fraud.
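The 12-20 task range implies a stopping rule. One common approach in adaptive testing, assumed here rather than documented by Emmersion, is to end the section once the standard error of the ability estimate falls below a target, subject to a minimum and maximum test length. In this sketch the minimum and maximum come from the note above, while `SE_TARGET` and the function name `should_stop` are hypothetical.

```python
# MIN_ITEMS and MAX_ITEMS reflect the 12-20 range in the release note;
# SE_TARGET is a hypothetical precision target, not a published value.
MIN_ITEMS = 12
MAX_ITEMS = 20
SE_TARGET = 0.30


def should_stop(items_administered: int, standard_error: float) -> bool:
    """Decide whether Part 1 can end after the current task."""
    if items_administered < MIN_ITEMS:
        return False  # always administer the minimum number of tasks
    if items_administered >= MAX_ITEMS:
        return True  # never exceed the maximum
    return standard_error <= SE_TARGET  # end early once the estimate is precise enough
```

Under a rule like this, test takers whose responses quickly pin down their ability finish near 12 tasks, while less consistent response patterns run toward the 20-task cap.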

What’s fixed

  • Consistent prompt audio throughout the test through the use of a single voice.
  • Ensured continuity with Version C.19.3 by aligning test scores.
  • Removed the progress bar from Part 1.
  • Faster and more reliable audio downloads and uploads.

4/17/2019 — Version C.19.04

Version C.19.04 was a relatively minor incremental change that applied more aggressive item filtering to remove misfitting response patterns from the calibration data set. Removing the misfitting response patterns strengthened the criterion validity of the TrueNorth Score against the ACTFL scale. This in turn improved the agreement between the TrueNorth Score and the ACTFL levels obtained through an Oral Proficiency Interview (OPI).
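Agreement with OPI ratings is often summarized as exact agreement (same sublevel) and adjacent agreement (within one sublevel). The sketch below computes both over the ordered ACTFL sublevels. It illustrates the general idea and is not necessarily the metric Emmersion reported; the function name `agreement_rates` is hypothetical.

```python
# ACTFL sublevels in ascending order, as listed in the ACTFL Proficiency Guidelines.
ACTFL_LEVELS = [
    "Novice Low", "Novice Mid", "Novice High",
    "Intermediate Low", "Intermediate Mid", "Intermediate High",
    "Advanced Low", "Advanced Mid", "Advanced High",
    "Superior", "Distinguished",
]
RANK = {level: i for i, level in enumerate(ACTFL_LEVELS)}


def agreement_rates(predicted: list[str], opi: list[str]) -> tuple[float, float]:
    """Return (exact, adjacent) agreement between predicted and OPI levels.

    Adjacent agreement counts pairs that are at most one sublevel apart.
    """
    if len(predicted) != len(opi) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    gaps = [abs(RANK[p] - RANK[o]) for p, o in zip(predicted, opi)]
    exact = sum(g == 0 for g in gaps) / len(gaps)
    adjacent = sum(g <= 1 for g in gaps) / len(gaps)
    return exact, adjacent


# Invented example: one exact match and two one-sublevel misses.
exact, adjacent = agreement_rates(
    ["Novice High", "Intermediate Mid", "Advanced Low"],
    ["Novice High", "Intermediate Low", "Intermediate High"],
)
```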

Technical Details:

  • Improved data screening process incorporating Rasch infit/outfit statistics
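Infit and outfit are mean-square residual statistics. Outfit is the unweighted mean of squared standardized residuals, so a few surprising responses on far-off items can inflate it; infit weights each squared residual by its model variance, which makes it less sensitive to such outliers. Values near 1.0 indicate data consistent with the Rasch model. A minimal Python sketch for dichotomous (0/1) responses follows. The 0.5-1.5 acceptance band is a commonly cited rule of thumb assumed here, not the cutoff Emmersion used, and the function names are illustrative.

```python
import math


def rasch_prob(theta: float, b: float) -> float:
    """Rasch probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def fit_mean_squares(pairs: list[tuple[int, float]]) -> tuple[float, float]:
    """Return (infit, outfit) mean-squares for 0/1 observations.

    pairs holds (observed score, model probability) for one item across
    test-takers, or for one test-taker across items.
    """
    sq_residuals = [(x - p) ** 2 for x, p in pairs]
    variances = [p * (1.0 - p) for _, p in pairs]
    # Outfit: plain average of squared standardized residuals.
    outfit = sum(r / v for r, v in zip(sq_residuals, variances)) / len(pairs)
    # Infit: squared residuals weighted by model variance (information).
    infit = sum(sq_residuals) / sum(variances)
    return infit, outfit


def is_misfitting(pairs: list[tuple[int, float]], low: float = 0.5, high: float = 1.5) -> bool:
    """Flag a pattern whose infit or outfit falls outside [low, high]."""
    infit, outfit = fit_mean_squares(pairs)
    return not (low <= infit <= high and low <= outfit <= high)


# A test-taker at theta = 0 who fails easy items and passes hard ones.
difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0]
reversed_pattern = [0, 0, 0, 1, 1]
pairs = [(x, rasch_prob(0.0, b)) for x, b in zip(reversed_pattern, difficulties)]
flagged = is_misfitting(pairs)  # True: both statistics are far above 1.5
```

Dropping patterns like this one from the calibration set is the kind of screening the bullet describes; values well below 1.0 (overly predictable patterns) are also worth reviewing, though they distort calibration less.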

3/20/2019 — Version C.19.03

Version C.19.03 leveraged advanced statistical modeling to increase the accuracy of the TrueNorth Score. Previously, the TrueNorth Score was calculated from a test-taker’s overall performance in isolation. This update introduced a more advanced process (using Item Response Theory) that takes into account both the difficulty of each item in the assessment and the relative performance of other test-takers answering the same items. This approach provides more information that we can use to increase the accuracy of the scoring algorithm.

Technical Details:

  • Transitioned from using the mean performance across items for a single test-taker to using the theta estimate and its standard error for a test-taker in the context of calibrated item parameters
  • Replaced the mean with theta and its standard error as the predictor values in the regression algorithm for predicting score equivalencies
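The change from a mean score to a theta estimate can be sketched as follows, assuming dichotomous items with known (calibrated) Rasch difficulties. The maximum-likelihood theta is found by Newton-Raphson, and its standard error is the inverse square root of the test information at that theta. The final regression step uses hypothetical placeholder coefficients in `COEFFICIENTS`; the actual score-equivalency model and its coefficients are not published here, and the function names are illustrative.

```python
import math


def rasch_prob(theta: float, b: float) -> float:
    """Rasch probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def estimate_theta(scores: list[int], difficulties: list[float],
                   tol: float = 1e-8, max_iter: int = 100) -> tuple[float, float]:
    """Maximum-likelihood theta and its standard error under the Rasch model."""
    total = sum(scores)
    if total == 0 or total == len(scores):
        # The likelihood keeps increasing toward -inf or +inf, so no finite MLE exists.
        raise ValueError("MLE undefined for all-incorrect or all-correct patterns")
    theta = 0.0
    for _ in range(max_iter):
        probs = [rasch_prob(theta, b) for b in difficulties]
        gradient = sum(x - p for x, p in zip(scores, probs))  # d logL / d theta
        information = sum(p * (1.0 - p) for p in probs)  # -d2 logL / d theta2
        step = max(-1.0, min(1.0, gradient / information))  # damped Newton step
        theta += step
        if abs(step) < tol:
            break
    information = sum(rasch_prob(theta, b) * (1.0 - rasch_prob(theta, b))
                      for b in difficulties)
    return theta, 1.0 / math.sqrt(information)


# Hypothetical placeholder coefficients: intercept, theta slope, SE slope.
COEFFICIENTS = (50.0, 10.0, -5.0)


def predict_score(theta: float, se: float,
                  coefficients: tuple[float, float, float] = COEFFICIENTS) -> float:
    """Apply a linear regression that uses theta and its SE as predictors."""
    b0, b1, b2 = coefficients
    return b0 + b1 * theta + b2 * se
```

Unlike a raw mean, theta places two test-takers with the same number correct on different points of the scale if they answered items of different difficulty, and the standard error tells the regression how precise each estimate is.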

TrueNorth — German Speaking

1/28/2021 — Version B.21.01.27

Version B.21.01.27 consisted of some technical improvements to the assessment, including a more direct and stable connection to IBM Watson for scoring. Previously collected data was rescored through this connection and informed a slight update to the scoring algorithm.

Technical Details:
 
  • Direct connection to IBM Watson for scoring.
  • Scoring algorithm update.

5/05/2020 — Version B.20.05.05

Version B.20.05.05 was a minor update to test content. The items selected for the form were adjusted to favor those that most accurately reflect ability as scored by the automated scoring solution.

Technical Details:

  • Updated items in form to optimize scoring using IBM Watson. 

5/01/2020 — Version A.20.05.01

Version A.20.05.01 was the first version of the German Speaking Assessment. We conducted a calibration pilot to determine the item and ability parameters used to create an assessment form and an accompanying scoring algorithm that best predicts German speaking ability.

Technical Details:

  • Used Rasch infit/outfit statistics to screen calibration data to the most informative items and response patterns
  • Incorporated a regression model to predict ability with a high degree of reliability

TrueNorth — Japanese Speaking

7/07/2021 — Version C.21.07.07

 

Version C.21.07.07 consisted of some technical improvements to the assessment, including a more direct and stable connection to the most recent version of IBM Watson’s Japanese ASR for scoring. Previously collected data was rescored and informed an update to the scoring algorithm. Efforts were made to minimize disruptions to scoring and to the interpretation of score data.

Technical Details

What’s improved

  • Updated items in form to optimize scoring using the newest version of IBM Watson.
  • Scoring algorithm update
  • Ensured continuity with Version B.19.03.02 by aligning test scores

What’s fixed

  • Updated to the most recent version of IBM Watson’s Japanese ASR solution.
  • Removed inefficiencies in connecting to IBM Watson for Japanese scoring.

3/04/2020 — Version B.19.03.02

Version B.20.03.04 uses automatic speech recognition (ASR) to score the test immediately on completion, bringing this version of the Japanese test in line with the other TrueNorth language products.

Technical Details:

  • Transitioned scoring mechanism to ASR solution backed by IBM Watson.
  • Drastically reduced score return time

3/25/2019 — Version A.19.03.02

Version A.19.03.02 was a minor scoring algorithm update published shortly after the original Japanese scoring version to incrementally improve the accuracy.

Technical Details:

  • Used improved infit/outfit statistical analysis to select better-performing items
  • Transitioned from the Rasch model to a Graded Response Model, which better accounts for differences in item difficulty and discrimination, to improve scoring accuracy
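Under the graded response model (Samejima), each item has its own discrimination `a` and a set of ordered thresholds, whereas the Rasch model treats all items as equally discriminating. The sketch below computes GRM category probabilities for one item as the difference between adjacent cumulative logistic curves. It assumes items are scored in ordered categories (for example, partial credit for a partly repeated sentence); the parameter values are invented and the function name is illustrative.

```python
import math


def grm_category_probs(theta: float, a: float, thresholds: list[float]) -> list[float]:
    """Category probabilities for one item under Samejima's graded response model.

    An item with m strictly increasing thresholds has m + 1 ordered score
    categories, 0..m; a is the item's discrimination.
    """
    if any(hi <= lo for lo, hi in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    # Cumulative curves P(X >= k): 1 for k = 0, logistic for k = 1..m, 0 past the top.
    cumulative = [1.0]
    cumulative += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cumulative.append(0.0)
    # Each category's probability is the gap between adjacent cumulative curves.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]


# Invented parameters: a 3-category item (scores 0, 1, 2).
probs = grm_category_probs(theta=0.0, a=1.5, thresholds=[-1.0, 1.0])
```

A larger `a` makes the cumulative curves steeper, so the item separates test takers near its thresholds more sharply; this per-item slope is what the Rasch model cannot express.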

3/19/2019 — Version A.19.03

Version A.19.03 was the first version of the Japanese Speaking Assessment. We conducted a calibration pilot to determine the item and ability parameters used to create an assessment form and an accompanying scoring algorithm that best predicts Japanese speaking ability.

Technical Details:

  • Used Rasch infit/outfit statistics to screen calibration data to the most informative items and response patterns
  • Incorporated a regression model to predict ACTFL level with a high degree of agreement

TrueNorth — Portuguese Speaking

8/12/2021 — Version B.21.08

Version B.21.08 consisted of some technical improvements to the assessment, including a more direct and stable connection to the most recent version of IBM Watson’s Portuguese ASR for scoring. A psychometric transition from a partial credit model to a graded response model was adopted to better align with other language versions of Emmersion’s speaking tests. Data collected since Version A.19.07’s release was used to update item discrimination and threshold parameters and to generate an improved scoring model.
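In the partial credit model (Masters), category probabilities come from adjacent-category logits with step difficulties that need not be ordered, and all items share the same slope; the graded response model instead uses cumulative logits with item-specific discrimination and ordered thresholds. The following Python sketch shows the partial credit side of that comparison. The step difficulties are invented for illustration and the function name is hypothetical.

```python
import math


def pcm_category_probs(theta: float, step_difficulties: list[float]) -> list[float]:
    """Category probabilities for one item under Masters' partial credit model.

    Category k (0..m) has numerator exp(sum over j <= k of (theta - delta_j)),
    with the empty sum for k = 0 equal to 0.
    """
    log_numerators = [0.0]
    running = 0.0
    for delta in step_difficulties:
        running += theta - delta
        log_numerators.append(running)
    top = max(log_numerators)  # subtract the max before exponentiating for stability
    numerators = [math.exp(v - top) for v in log_numerators]
    total = sum(numerators)
    return [n / total for n in numerators]


# Invented step difficulties for a 3-category item (scores 0, 1, 2).
probs = pcm_category_probs(0.0, [-1.0, 1.0])
```

Because the partial credit model has no per-item discrimination parameter, moving to the graded response model adds the discrimination estimates that the release note says were updated from newly collected data.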

What’s improved

  • Updated items in form to optimize scoring using the newest version of IBM Watson.
  • Scoring algorithm update 

What’s fixed

  • Updated to the most recent version of IBM Watson’s Portuguese ASR solution.
  • Removed inefficiencies in connecting to IBM Watson for Portuguese scoring.
  • Adjusted the model to reduce the likelihood of misidentifying the ability of high-ability (mastery) and low-ability (beginner) test takers.

7/18/2019 — Version A.19.07

Version A.19.07 was the first version of the Portuguese Speaking Assessment. We conducted a calibration pilot to determine the item and ability parameters used to create an assessment form and an accompanying scoring algorithm that best predicts Portuguese speaking ability.

Technical Details:

  • Used Rasch infit/outfit statistics to screen calibration data to the most informative items and response patterns
  • Incorporated a regression model to predict ACTFL level with a high degree of agreement

TrueNorth — Spanish Speaking

5/02/2019 — Version C.19.05

Version C.19.05 was a minor incremental update to improve accuracy of the scoring algorithm by augmenting the calibration data set with additional response patterns to better represent a wider spectrum of language ability.

  • Improved lower score range by augmenting data with response patterns comprising zeros

3/21/2019 — Version C.19.03.02

Version C.19.03.02 was a minor incremental update to improve accuracy of the scoring algorithm by augmenting the calibration data set with additional response patterns to better represent a wider spectrum of language ability.

  • Improved upper score range by augmenting calibration data with high ability level response patterns
  • Used Rasch infit/outfit statistics to screen calibration data to the most informative items and response patterns

3/20/2019 — Version C.19.03

Version C.19.03 leveraged advanced statistical modeling to increase the accuracy of the TrueNorth Score. Previously, the TrueNorth Score was calculated from a test-taker’s overall performance in isolation. This update introduced a more advanced process (using Item Response Theory) that takes into account both the difficulty of each item in the assessment and the relative performance of other test-takers answering the same items. This approach provides more information that we can use to increase the accuracy of the scoring algorithm.

Technical Details:

  • Transitioned from using the mean performance across items for a single test-taker to using the theta estimate and its standard error for a test-taker in the context of calibrated item parameters
  • Replaced the mean with theta and its standard error as the predictor values in the regression algorithm for predicting score equivalencies

WebCAPE — Reading, Grammar, and Vocabulary

Brigham Young University originally developed the WebCAPE assessment. Its development required substantial effort and research. Content was developed and reviewed, followed by rigorous testing to determine the level and significance of each question. The process was repeated multiple times until each question was calibrated and weighted according to its difficulty. Spanish was the first language developed, followed by French, German, Russian, ESL, Chinese, and Italian. Each language has its own question bank, ranging from 400 questions (Russian) to as many as 1,000 (Spanish).

Emmersion Learning acquired the license for WebCAPE in 2019. Since the acquisition, no changes have been made to the scoring algorithm. Improvements to the WebCAPE algorithm are scheduled for early 2020 to improve reliability, especially for low-scoring and high-scoring candidates.