Nijmegen, 29-30 November 2012
Sign language research is a small branch of linguistics. Wouldn’t it therefore be extra easy for people to share and exchange resources? That was the thought that drove a recent workshop on sign language lexicons. With the recent stream of efforts to build sign language corpora, the creation of sign lexicons has taken on new importance, functioning not just as (learner) dictionaries but also as the vocabularies used to gloss sign language texts.
Signed dictionaries have been available for a relatively long time now. They started out as thick hard-copy volumes with drawings or photographs in the 1980s; when digital video arrived on the desktop in the 1990s, CD-ROMs and then DVDs and websites were produced. Although a series of workshops has been devoted to sign dictionaries over the past twenty years, most groups developed their own database underlying these dictionaries, featuring as many database structures as there were products and using almost as many different software packages. The linguists involved in making them talked to each other, but the software developers often didn’t. Not out of maliciousness, but because the deaf associations or educational centres that developed the dictionaries had little IT support. Just as an illustration: in the Netherlands it took 20 years before a sign dictionary was published by a standard lexicographic publisher (Van Dale), after the first one had been published by the Foundation for the Deaf and Hard of Hearing Child. Sign lexicography has rarely been a matter for established lexicographic centres, but rather for deaf-related organisations, and in terms of technology people went for whatever solution was locally feasible. This holds for most European countries, up until today.
Six European countries were represented at the workshop in November in Nijmegen, supplemented by the USA, Australia, New Zealand, and Hong Kong, making for a total of 34 participants over two days. They all brought an open attitude towards the idea that the Lexical Markup Framework (LMF) could serve as a common format for exchanging data between databases. No doubt there are intricate pros and cons to LMF as a format, and perhaps few if any popular lexicon tools use it. As one of the CLARIN standard formats, however, it appears to be a useful vehicle for us to exchange thoughts and plan for actual data sharing. The workshop communicated CLARIN’s goals to the sign community, and featured introductions to LMF, LEXUS and ISOcat. During the presentations and discussions, remarkably little disagreement was observed about the basic information categories that should be present in any sign lexicon. Rather than disagreement, we simply found that different centres put different emphasis on different domains. Most agreement can be seen in the way sign forms are described: the fruit of 50 years of sign language phonology research, starting with the seminal work of Bill Stokoe in 1960. While some lexicons contained extensive information on sociolinguistic variables (including the dialectal variation found in most deaf communities), others did so to a much lesser extent. It is especially in the domain of semantics that considerable variation in approaches can be observed. Remarkably, definitions are rarely if ever given in the sign language itself: the spoken language dominates the semantic information, with at best example sentences in the sign language. Effectively all sign dictionaries are bilingual, although the role of the spoken language varies. The shared impression from the workshop was that the decisions that are made are as often based on practical concerns as on principled linguistic choices, and that the different solutions are complementary rather than contradictory.
Taken together, the workshop provided more than enough encouragement for my team to proceed with the development of an LMF scheme for our lexicon of Sign Language of the Netherlands, which is linked to the Corpus NGT. By sharing both the structure and the content of the NGT lexicon, and by describing its fields explicitly in ISOcat, we have good hopes that we can promote the creation of an LMF core structure to serve as an exchange format for sign lexicons.
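To give a flavour of what such an exchange format might look like, here is a minimal, purely illustrative sketch of an LMF-serialised entry for a single sign. The container elements (LexicalResource, Lexicon, LexicalEntry, Lemma, Sense, feat) follow the LMF XML serialisation; the specific data categories (gloss, handshape, location, movement, writtenForm) are hypothetical placeholders, not the actual fields of the NGT lexicon or of any ISOcat entry.

```python
# Sketch of a hypothetical LMF-style entry for a sign, built with the
# Python standard library. Element names follow the LMF XML
# serialisation; the feature names are illustrative assumptions only.
import xml.etree.ElementTree as ET

def make_sign_entry(gloss, handshape, location, movement, written_form):
    """Build one LexicalEntry describing a sign."""
    entry = ET.Element("LexicalEntry")
    # The gloss used to identify the sign, e.g. in corpus annotation.
    lemma = ET.SubElement(entry, "Lemma")
    ET.SubElement(lemma, "feat", att="gloss", val=gloss)
    # Phonological description of the sign form: the area where, per the
    # workshop, lexicons agree most.
    form = ET.SubElement(entry, "Form")
    ET.SubElement(form, "feat", att="handshape", val=handshape)
    ET.SubElement(form, "feat", att="location", val=location)
    ET.SubElement(form, "feat", att="movement", val=movement)
    # Semantics: as noted above, typically routed through the spoken
    # language rather than defined in the sign language itself.
    sense = ET.SubElement(entry, "Sense")
    ET.SubElement(sense, "feat", att="writtenForm", val=written_form)
    return entry

resource = ET.Element("LexicalResource")
lexicon = ET.SubElement(resource, "Lexicon")
lexicon.append(make_sign_entry("HOUSE", "B", "neutral space",
                               "downward", "huis"))
print(ET.tostring(resource, encoding="unicode"))
```

The point of the sketch is not the particular feature names but the shape: a shared core structure into which each centre could map its own database fields, with ISOcat supplying the agreed definitions of those fields.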
Presentations from the workshop are available at the Sign Language Corpora wiki, http://www.signlanguagecorpora.org.