Project team
Name | Affiliation |
---|---|
Helmer Strik | CLST, RU, Nijmegen |
Eric Sanders | CLST, RU, Nijmegen |
Robin Rutten | CLST, RU, Nijmegen |
Joost van Doremalen | CLST, RU, Nijmegen |
Robin Oostrum | CLST, RU, Nijmegen |
Daan Broeder | MPI |
Remco van Veenendaal | TST-C |
Laura van Eerten | TST-C |
TQE
Background
TQE (Transcription Quality Evaluation) makes it possible to evaluate the quality of transcriptions automatically. Files are uploaded in pairs, each pair consisting of an audio file and its phone transcription (PT). Each pair is then processed as follows:
- the audio signal and the phonetic transcription are aligned,
- segment boundaries are derived for each phone, and
- for each segment-phone combination a TQE measure (a confidence measure) is determined: a number ranging from 0 to 100% that indicates how well the phone fits the audio segment, i.e. what the quality of the phone transcription is (see, e.g., Figures 1 and 2).
The higher the number, the better the fit is. The output of the TQE tool consists of a TQE measure and the segment boundaries for each phone in the corpus.
Goal
The TQE tool thus makes it possible to find (sequences of) segments for which the match between the phone symbols and the audio signal is not optimal. In other words, the TQE tool can be used to check the quality of phonetic transcriptions. This is useful for validating (manual) phonetic transcriptions, and also for comparing and selecting (‘competing’) transcriptions, e.g. to study pronunciation variation.
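As a rough illustration of this use, the sketch below groups consecutive low-scoring phones into runs that a transcriber might want to re-check. The `PhoneSegment` record, the `low_scoring_runs` helper, the threshold of 50, and all example values are assumptions made for illustration only; the actual output format of the TQE web services may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PhoneSegment:
    """Hypothetical record for one phone in the TQE output."""
    phone: str    # phone symbol from the transcription
    start: float  # segment start in seconds
    end: float    # segment end in seconds
    score: float  # TQE measure, 0-100


def low_scoring_runs(
    segments: List[PhoneSegment], threshold: float = 50.0
) -> List[List[PhoneSegment]]:
    """Group consecutive segments whose score is below the threshold."""
    runs: List[List[PhoneSegment]] = []
    current: List[PhoneSegment] = []
    for seg in segments:
        if seg.score < threshold:
            current.append(seg)
        elif current:
            # A segment at or above the threshold closes the current run.
            runs.append(current)
            current = []
    if current:
        runs.append(current)
    return runs


# Invented example values, not taken from real TQE output.
segments = [
    PhoneSegment("d", 0.00, 0.08, 88.0),
    PhoneSegment("r", 0.08, 0.15, 39.0),
    PhoneSegment("i", 0.15, 0.30, 42.0),
    PhoneSegment("t", 0.30, 0.38, 91.0),
]

for run in low_scoring_runs(segments):
    phones = " ".join(seg.phone for seg in run)
    print(f"check '{phones}' between {run[0].start:.2f}s and {run[-1].end:.2f}s")
```

The threshold is a free parameter here; a suitable value would have to be chosen per corpus.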
Examples
To give a better idea of how TQE works, two examples are provided in Figures 1 and 2. The audio signal is the same in both figures. However, in Figure 1 the correct transcription was used, while in Figure 2 we deliberately replaced it with an incorrect transcription. The TQE scores in Figure 2 are much lower, because the transcription symbols do not match the audio well. Note also that the score for the second transcription symbol drops from 90 to 39.
The reason is that the different phone transcription sequence in Figure 2 also yields a different segmentation: part of the /r/ segment in Figure 2 contains part of the vowel, so the match is poorer and the TQE score is lower.
Figure 1. TQE scores for a correct transcription | Figure 2. TQE scores for an incorrect transcription |
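The comparison shown in Figures 1 and 2 could be automated along the following lines. This is a minimal sketch, assuming the per-phone TQE scores of each candidate transcription are available as a list of numbers; the `best_transcription` helper and the use of the mean score as selection criterion are assumptions, not something prescribed by the TQE tool. Only the scores 90 and 39 for the second symbol come from the figures; the other values are invented.

```python
from statistics import mean
from typing import Dict, List


def best_transcription(candidates: Dict[str, List[float]]) -> str:
    """Return the label of the candidate with the highest mean TQE score.

    Each candidate must have at least one score.
    """
    if not candidates:
        raise ValueError("no candidate transcriptions given")
    return max(candidates, key=lambda label: mean(candidates[label]))


# Per-phone scores for two competing transcriptions of the same audio.
# Only the second value of each list (90 and 39) is taken from the figures.
candidates = {
    "correct": [85.0, 90.0, 80.0],
    "incorrect": [40.0, 39.0, 35.0],
}

print(best_transcription(candidates))
```

Other criteria, such as the lowest per-phone score, could be substituted for the mean where a single badly matching phone should disqualify a transcription.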
Conclusion
TQE is useful for validating, obtaining, and selecting phone transcriptions, and for detecting phone strings (e.g. words) with deviating pronunciation. More generally, it can be applied in any research involving audio and PTs, in various (sub-)fields of the humanities and of language and speech technology (L&ST).
Link | Description |
---|---|
PID | The TQE PID-site |
Website | The project's website |
Manual | Manual for the use of the TQE-webservices |
Information | Additional information about TQE |