Call 2

The following projects have been accepted in Call 2:

Acronym Name
ArthurianFiction Arthurian Fiction in Medieval Europe: Narratives and Manuscripts
C-DSD Curating the Dutch Song Database
COAVA Cognition, Acquisition and Variation Tool
INPOLDER Integrated Parser and Lemmatizer Dutch in Retrospect
IPROSLA Integrating and publishing resources on sign language acquisition
NEHOL Negerhollands Database
VU-DNC VU Diachronic Newspaper Corpus
WAHSP Web-application for historical sentiment mining in public media
WIP War in Parliament
Title ArthurianFiction - Arthurian Fiction in Medieval Europe: Narratives and Manuscripts
Project coordinator Dr. A.A.M. Besamusca (Utrecht University)
Abstract The resources consist of two databases containing data for literary research on European Arthurian fiction. The databases are currently not available to researchers. Using the data and the technology of the project, scholars will be able to study their national literatures and book production from a European and interdisciplinary perspective. They will have access to data on a great many European languages and cultures, as well as on manuscripts and their art. The intended users include literary historians working on any European language, cultural historians, specialists in medieval books, and art historians. Arthurian scholarship in particular will benefit. The current datasets will be converted to an XML standard, from which metadata will be extracted and made available to CLARIN harvesters. The demonstrator will give users access to the data for searching, as well as for adding and correcting records.
Title C-DSD - Curating the Dutch Song Database
Project coordinator Dr. E. Stronks (Utrecht University)
Abstract The Dutch Song Database (DSD) is a database in the field of Literary Studies. It contains (meta)data on 140,000 songs and their 15,000 sources (songbooks, pamphlets, field recordings, etc.) from the Middle Ages to the present day. Built and rebuilt over a period of 25 years and consisting of four distinct datasets, the database was enlarged and enriched in many stages, with grants from NWO and OCW, among others. The first online version of the DSD was published in 2007. Because the DSD was built over a long period, parts of its metadata were adjusted to modern norms, but there was never time to curate the database as a whole, that is, to make it compliant with current metadata standards and protocols. The rationale behind the analysis of data and the production of metadata in the DSD is internationally renowned and regarded as exemplary. Yet, because the DSD is not based on current, widely adopted standards for encoding metadata, the opportunities to export the DSD model are limited, as are the opportunities for (international) collaboration with other projects working on similar source materials. The aim of the proposed project, ‘Curating the Dutch Song Database’, is to open up new perspectives for international collaboration. The DSD will be used in, and will participate in, the Multimedia/Multimodal Demonstrator of CLARIN-EU Search & Develop.
Title COAVA - Cognition, Acquisition and Variation Tool
Project coordinator Dr. L. Cornips (Meertens Institute)
Abstract This proposal for a CLARIN project targets interdisciplinary research into the relation between language acquisition and language variation by developing a tool for easily exploring the linguistic characteristics of objects. Tool development builds on tools developed in previous projects. The project will demonstrate how the CLARIN infrastructure can enable eHumanities-type research by making datasets from traditionally distinct subdisciplines (first language acquisition and historical dialectology) available in a standardized way and combining them with tools for data processing.
Title INPOLDER - Integrated Parser and Lemmatizer Dutch in Retrospect
Project coordinator Prof. dr. A. van Kemenade (Radboud University Nijmegen)
Abstract In this project we aim to fill a gap in the availability of syntactically analysed corpus material for Dutch. While such material exists for Modern Dutch, as well as for historical stages of various neighbouring languages, it is sorely lacking for historical Dutch. We propose to remedy this situation efficiently by bootstrapping from existing resources. The first processing steps will be carried out by the Adelheid tagger, which is currently being made available through the CLARIN infrastructure. The syntactic parsing will use a variant of the Penn-Helsinki parser for historical texts. This parser is trained on annotated corpus text by Dan Bikel's parser generator, and we estimate that with a limited amount of semi-automatic annotation we will be able to deliver a parser for historical Dutch of quite respectable quality. The parser will be made available to the scholarly community through a web interface that supports a workflow starting from raw text and covering tagging, lemmatizing and parsing, with optional manual correction at every interface point.
Title IPROSLA - Integrating and publishing resources on sign language acquisition
Project coordinator Prof. dr. P. Fikkert (Radboud University Nijmegen)
Abstract This resource curation project aims to integrate two different datasets on sign language acquisition into one central archive. The first is a diverse set of longitudinal data from deaf children of deaf and hearing parents, collected at the UvA over the last 20 years; the second is a new collection of longitudinal data, collected at the RU, from hearing children of deaf parents. Neither dataset has been properly documented with metadata descriptions, and neither has been safely archived anywhere. Some CLARIN-compliant annotation is available only for the latter. The goal of the project is to document both datasets with CMDI and to archive them at the MPI language archive.
Title NEHOL - Negerhollands Database
Project coordinator Prof. dr. P.C. Muysken (Radboud University Nijmegen)
Abstract This Resource Curation Project aims to make the data from the Dutch-lexifier Creole language Negerhollands available to the CLARIN community. Negerhollands, the now extinct Creole language of the Virgin Islands, is unique in that a rich digitized corpus of historical as well as almost contemporary texts is available, texts that have hardly been studied. In a previous NWO project the data were carefully edited and digitized, but so far they have not been available online. The data will follow the format of the existing SUCA database.
Title VU-DNC - VU Diachronic Newspaper Corpus
Project coordinator Prof. dr. W. Spooren (VU University Amsterdam)
Abstract The VU-DNC project has four main aims: 1) to make a unique diachronic corpus of articles from five major Dutch newspapers from 1950/1951 and 2002 (2 MW) available to the community of researchers in the humanities, 2) to extend the linguistic annotation of discourse with encoding for lexico-grammatical features of subjectivity and quotations, 3) to create a gold-standard benchmark for testing and training OCR post-correction tools by aligning uncorrected and corrected versions of the digitized printed newspaper articles from 1950/1951, and 4) to improve the development of metadata within CLARIN by mapping the data categories for part-of-speech and lemma coding to the data category registry, and by extending the ISOcat categories for historical spelling variation, subjectivity and quotations.
Title WAHSP - Web-application for historical sentiment mining in public media
Project coordinator Prof. dr. T. Pieters (Utrecht University)
Abstract This project aims to use and populate the basic CLARIN infrastructure to enable advanced forms of text mining in large historical datasets of newspapers and journals. The challenge is to turn a specific text mining technology, so-called ‘sentiment mining’, into an accessible, CLARIN-compliant web application that addresses the research questions of its intended users, historians and policy researchers. The demonstrator will build on the sentiment mining tools developed in the STEVIN DuOMAn project. The interdisciplinary project team (historians, linguists, computer scientists) will tailor existing tools to the specific needs of digital humanities research, with a special focus on opinions and perceptions regarding the use and abuse of drugs between 1900 and 1945. The development of this demonstrator prototype will also be used to draw up a list of requirements the CLARIN infrastructure should meet and desiderata it should preferably offer. The demonstrator will be hosted by the Huygens Institute, acting as a CLARIN A/B centre.
Title WIP - War in Parliament. The Second World War in Parliamentary Debates in the Netherlands
Project coordinator Dr. H. Piersma (NIOD Institute for War, Holocaust and Genocide Studies, Amsterdam)
Abstract References to the Second World War (WW II) shaped political debate in the Netherlands for many decades. However, we have no systematic knowledge of why, how often, when, by whom or from which political party, and in which context these references were made. Nor do we know what meanings politicians ascribed to the war years, what lessons the war was supposed to teach, and how all of this influenced political decision-making. Answering these questions will help us better understand the complex legacies of WW II.[1]

The WIP project aims to bridge the gap between current practices in history and the social sciences and the possibilities offered by large corpora and language resources, in particular the CLARIN tools for Dutch. We will do this by making a much-used dataset, the Handelingen der Staten-Generaal (the Dutch Hansard), compliant with CLARIN, ISOcat and ISO/TC 37/SC 4 standards. We will also create an advanced search engine for this dataset, with an intuitive and powerful query language based on XPath and with output that can be fed directly into further analysis programs such as SPSS. Integrating this technology with important historical research questions will contribute directly to new and innovative ways of writing history.

The demonstration value of the project is enhanced by the production of an enriched publication in which we answer the research questions on the basis of the curated data. URLs in the paper link to these research questions, operationalized as XQueries that can be executed on the curated data.


[1] This project can be applied and tested in the research project ‘Legacies of collaboration. The exclusion and integration of former National-Socialist milieus in Dutch Society’ (see: www.erfenissenvancollaboratie.nl) and fits in with current NIOD research as proposed in the application ‘The politics of emotional mobilisation. The Second World War in parliamentary debates in the Netherlands, England and West Germany’ for the Free Competition in the Humanities (NWO).