Project team
Louis ten Bosch | Eric Sanders | Lou Boves | Daan Broeder |
---|---|---|---|
CLST, RU, Nijmegen | CLST, RU, Nijmegen | CLST, RU, Nijmegen | Max Planck Instituut |
Background
Many speech recordings were made ‘in the field’ some time ago, for example by anthropologists documenting cultural scenes and events. Years later, often only the original recording and a global written description are available. A transcription of the speech is frequently lacking, which hampers access to and disclosure of the information in these recordings. Researchers have sought methods to annotate speech recordings so that they become searchable by text. In many cases the annotation must be done manually: no Automatic Speech Recognition (ASR) resources are available for automated speech-to-text conversion, because there is too little annotated speech material for the language to build a mature ASR system. This project attempts to provide an intermediate solution: automatic annotation at the level of phones rather than words.
Aim
The AAM-LR project aims to build a speech analysis tool that helps field researchers annotate and enrich multimodal audio and video recordings. The project targets ‘found’ recordings, e.g. historical recordings of speech in under-resourced or endangered languages for which few knowledge sources are available. Enriching such recordings is usually a time-consuming task, and the AAM-LR tool aims to support this annotation process. It allows the use of different tiers, along which time intervals can be defined that mark where speech events occur in a recording. The service will provide a global phonetic annotation, using phone models (and phonetic features) based on ASR-inspired techniques.
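The tier concept can be pictured with a small data structure. The following Python sketch is only an illustration, not the AAM-LR tool or any existing annotation API: the names `Interval`, `Tier`, `at` and `find_sequence` are hypothetical, and the phone labels and times are invented example data. It shows how time intervals on a phone tier could be stored, looked up by time, and searched for a phone sequence.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Interval:
    start: float  # seconds from the start of the recording
    end: float    # seconds from the start of the recording
    value: str    # annotation value, e.g. a phone label


@dataclass
class Tier:
    """A named layer of non-overlapping, time-aligned annotations (hypothetical)."""
    name: str
    intervals: List[Interval] = field(default_factory=list)

    def add(self, start: float, end: float, value: str) -> None:
        if end <= start:
            raise ValueError("interval must have positive duration")
        self.intervals.append(Interval(start, end, value))
        # Keep intervals ordered by start time so sequence search is meaningful.
        self.intervals.sort(key=lambda iv: iv.start)

    def at(self, time: float) -> Optional[Interval]:
        """Return the interval covering `time`, or None if there is none."""
        for iv in self.intervals:
            if iv.start <= time < iv.end:
                return iv
        return None

    def find_sequence(self, values: Sequence[str]) -> List[Tuple[float, float]]:
        """Return (start, end) spans where consecutive intervals carry `values`.

        Gaps between consecutive intervals are ignored in this sketch.
        """
        wanted = list(values)
        if not wanted:
            return []
        labels = [iv.value for iv in self.intervals]
        hits = []
        for k in range(len(labels) - len(wanted) + 1):
            if labels[k:k + len(wanted)] == wanted:
                last = self.intervals[k + len(wanted) - 1]
                hits.append((self.intervals[k].start, last.end))
        return hits


# Invented example data: four phone intervals on one tier.
phones = Tier("phones")
for start, end, label in [(0.00, 0.12, "m"), (0.12, 0.30, "a"),
                          (0.30, 0.41, "n"), (0.41, 0.60, "a")]:
    phones.add(start, end, label)

covering = phones.at(0.20)
hits = phones.find_sequence(["a", "n"])
print(covering)
print(hits)
```

Searching a phone tier in this way is one possible route to making a recording searchable when no word-level transcription exists.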
To define the conceptual setting of the project, novel categories were proposed in a scheme to be included in ISOCAT. This scheme was developed during the project and is currently being implemented in collaboration with Menzo Windhouwer and Ineke Schuurman.
Method
An ASR system was redesigned to run a free phone loop on speech data that are presented via CLAM to a dedicated web service. Because the found data are of unknown origin, word-based recognition is generally impossible. Instead, the algorithm provides a ‘best guess’ about the phones in the signal. The figure shows an example of free phone loop recognition for an input wave file; the picture is taken from the ELAN application. The tool segments the signal by placing boundaries based on its local acoustic content, and assigns a label to each segment between two consecutive boundaries.
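The relation between boundaries and labelled segments can be illustrated with a simplified Python sketch. It is not the project's algorithm: it assumes the recogniser yields one phone label per fixed-length frame (an assumption, as is the 10 ms frame shift) and places a boundary wherever the label changes. The names `Segment` and `frames_to_segments` and the example labels are hypothetical.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List, Sequence


@dataclass(frozen=True)
class Segment:
    start: float  # seconds
    end: float    # seconds
    label: str    # phone label assigned to the segment


def frames_to_segments(frame_labels: Sequence[str],
                       frame_shift: float = 0.01) -> List[Segment]:
    """Merge runs of identical frame labels into labelled segments.

    Assumes one label per frame and a constant frame shift (10 ms by default).
    """
    segments: List[Segment] = []
    index = 0  # index of the first frame of the current run
    for label, run in groupby(frame_labels):
        length = sum(1 for _ in run)
        start = round(index * frame_shift, 6)
        end = round((index + length) * frame_shift, 6)
        segments.append(Segment(start, end, label))
        index += length
    return segments


# Invented frame-level output: 3 frames of silence, 5 of "a", 2 of "m".
frames = ["sil"] * 3 + ["a"] * 5 + ["m"] * 2
segments = frames_to_segments(frames)
for seg in segments:
    print(f"{seg.start:.2f}\t{seg.end:.2f}\t{seg.label}")
```

Each resulting segment corresponds to one interval on a phone tier, which is the form in which such output would typically be inspected in an annotation tool.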
Type | Link |
---|---|
Webservice | http://hdl.handle.net/1839/00-SERV-0000-0000-004F-F |
Documentation | Final Report AAM-LR document, plus technical appendices; Proposal ISOCAT categories AAM-LR (xls) |