TQE: Transcription Quality Evaluation

Project team


Name	Affiliation
Helmer Strik	CLST, RU, Nijmegen
Eric Sanders	CLST, RU, Nijmegen
Robin Rutten	CLST, RU, Nijmegen
Joost van Doremalen	CLST, RU, Nijmegen
Robin Oostrum	CLST, RU, Nijmegen
Daan Broeder	MPI
Remco van Veenendaal	TST-C
Laura van Eerten	TST-C


Background

TQE (Transcription Quality Evaluation) makes it possible to automatically evaluate the quality of transcriptions. Pairs of files can be uploaded, where each pair consists of an audio file and its phone transcription (PT). Each pair is then processed as follows:

the audio signal and the phonetic transcription are aligned,
segment boundaries are derived for each phone, and
for each segment-phone combination, a TQE measure (a confidence measure) is determined: a number ranging from 0 to 100% that indicates how well the phone fits the segment, i.e. what the quality of the phone transcription is (see, e.g., Figures 1 and 2).

The higher the number, the better the fit is. The output of the TQE tool consists of a TQE measure and the segment boundaries for each phone in the corpus.
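
As an illustration of how such output might be used, the following Python sketch stores one record per phone (symbol, segment boundaries, TQE measure) and collects runs of consecutive phones whose score falls below a threshold. The record layout, the name `PhoneSegment`, the threshold of 50, and the example values are all assumptions made for this sketch; they do not describe the actual output format of the TQE tool.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PhoneSegment:
    """Hypothetical record for one phone in the TQE output."""
    phone: str    # phone symbol from the transcription
    start: float  # segment start time in seconds
    end: float    # segment end time in seconds
    score: float  # TQE measure, 0-100 (higher means a better fit)


def low_scoring_runs(segments: List[PhoneSegment],
                     threshold: float = 50.0) -> List[List[PhoneSegment]]:
    """Group consecutive segments whose score is below the threshold."""
    runs: List[List[PhoneSegment]] = []
    current: List[PhoneSegment] = []
    for seg in segments:
        if seg.score < threshold:
            current.append(seg)
        elif current:
            # A well-fitting segment ends the current run.
            runs.append(current)
            current = []
    if current:
        runs.append(current)
    return runs


# Made-up values, for illustration only.
segments = [
    PhoneSegment("d", 0.00, 0.08, 85.0),
    PhoneSegment("r", 0.08, 0.15, 39.0),
    PhoneSegment("i", 0.15, 0.30, 22.0),
    PhoneSegment("k", 0.30, 0.38, 91.0),
]

for run in low_scoring_runs(segments):
    symbols = " ".join(seg.phone for seg in run)
    print(f"check '{symbols}' between {run[0].start:.2f}s and {run[-1].end:.2f}s")
```

The threshold is arbitrary here; a suitable value would have to be chosen per corpus.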

Goal

The TQE tool thus makes it possible to find (sequences of) segments for which the phone symbols do not match the audio signal well. In other words, the TQE tool can be used to check the quality of phonetic transcriptions. This is useful for validating (manual) phonetic transcriptions, and also for comparing and selecting (‘competing’) transcriptions, e.g. to study pronunciation variation.
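
One simple way to select among competing transcriptions of the same audio is to compare an aggregate of their per-phone scores. The Python sketch below picks the candidate with the highest mean TQE measure. Using the mean as the selection criterion, the function name `best_transcription`, and the example scores are assumptions for illustration, not a procedure prescribed by the TQE tool.

```python
from statistics import mean
from typing import Dict, List


def best_transcription(candidates: Dict[str, List[float]]) -> str:
    """Return the name of the candidate with the highest mean score.

    `candidates` maps a transcription name to its per-phone TQE
    measures (0-100). Each score list must be non-empty.
    """
    return max(candidates, key=lambda name: mean(candidates[name]))


# Made-up per-phone scores for two competing transcriptions.
candidates = {
    "variant_a": [88.0, 90.0, 76.0],
    "variant_b": [41.0, 39.0, 52.0],
}

print(best_transcription(candidates))
```

A mean can hide a single badly matching phone, so inspecting the lowest per-phone score alongside it may be worthwhile.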

Examples

To give a better idea of how TQE works, two examples are provided in Figures 1 and 2. The audio signal is the same in both figures. However, in Figure 1 the correct transcription was used, while in Figure 2 we deliberately replaced it with an incorrect transcription. The TQE scores in Figure 2 are much lower, because the transcription symbols do not match the audio well. Note also that the score for the second transcription symbol drops from 90 to 39.

The reason is that the different phone transcription sequence in Figure 2 also yields a different segmentation: part of the /r/ segment in Figure 2 contains part of the vowel, so the match is poorer and the TQE score is lower.


Figure 1. TQE scores for a correct transcription
Figure 2. TQE scores for an incorrect transcription

Conclusion

TQE is useful for validating, obtaining, and selecting phone transcriptions, and for detecting phone strings (e.g. words) with deviating pronunciation. More generally, it can be applied in any research that involves audio and PTs, in various (sub-)fields of the humanities and of language and speech technology (L&ST).

Link	Description
PID	The TQE PID-site
Website	The project's website
Manual	Manual for the use of the TQE web services
Information	Additional information about TQE

CLARIN Centre

MPI

Project leader

Helmer Strik