Call 4 Projects

The following projects have been accepted in Call 4:

Acronym	Title
@PhilosTEI	TICCLing Philosophy: a TEI corpus-building workflow towards a new computational methodology for philosophy
QuaMeRDES	Quantitative Content Analysis of Media Researchers’ Data
e-BNM+	e-BNM+ Linked Data on Middle Dutch Sources Kept Worldwide
VALID	VALID - Vulnerability in Acquisition: Language Impairments in Dutch. Curating five valuable data sets
COBWWWEB	Connections Between Women and Writings Within European Borders
SHEBANQ	System for HEBrew Text: ANnotations for Queries and Markup
DSS	Dutch Ships and Seamen
EXILSEA	Exploiting ISOcat's Language Sections in ELAN and ANNEX
ColTime	Collaboration on Time-Based Resources
RemBench	A Digital Workbench for Rembrandt Research
OpenSoNaR	Online Personal Exploration and Navigation of SoNaR

Additional Information

Title	TICCLing Philosophy: a TEI corpus-building workflow towards a new computational methodology for philosophy
Project coordinator	Dr. Arianna Betti
CLARIN Centre	Huygens ING
Budget	To be determined
Abstract	The step to e-research in philosophy depends on the availability of high-quality, easily accessible corpora in a sustainable format composed from multi-language, multi-script books from different historical periods. Corpora matching these needs are virtually non-existent at the moment. In this project we want to address this corpus building problem by developing and making available an open source, web-based, user-friendly workflow from textual digital images to TEI, based on an OCRopus/Tesseract webservice and a multilingual version of OCR-postcorrection webservice TICCLops. We shall demonstrate the tool on a multilingual, multi-script corpus of important 18th-20th-century European philosophical texts. These texts are of fundamental importance to understand the development of key scientific concepts such as explanation and truth in 18th-20th-century Europe. The tool will be of general interest and importance to solve problems of CLARIN-compliant corpora building.

Title	Quantitative Content Analysis of Media Researchers’ Data
Project coordinator	Dr. Jasmijn van Gorp
CLARIN Centre	The Netherlands Institute for Sound and Vision
Budget	€ 76K
Abstract	The QuaMeRDES demonstrator will enable quantitative content analysis of television and printed media. Key is providing meaningful links to various streams of information relevant to media studies scholars. This includes television broadcasts, newspaper archives, metadata records and also time-based information, notably subtitle files. QuaMeRDES will expand the existing tool MeRDES (Media Researchers’ Data Exploration Suite), developed in the NWO-CATCH project BRIDGE. MeRDES is an analysis tool that visualises trends in extensive catalogues maintained by archives. This tool will be expanded to support principles of quantitative content analysis and explore new data visualisation paradigms. The result will integrate wrappers that read and write metadata and provenance information provided by the CLARIN infrastructure. QuaMeRDES will be evaluated in a specific case study that enables media studies scholars to come to new insights about how representations of migrants on Dutch television are related to social, cultural and political values in Dutch society.

Title	e-BNM+ Linked Data on Middle Dutch Sources Kept Worldwide
Project coordinator	Dr. André Bouwman
CLARIN Centre	Huygens ING
Budget	€ 80K
Abstract	The e-BNM is a database which collects and presents textual, codicological and historical information about thousands of Middle Dutch manuscripts kept worldwide. e-BNM is one of the most important sources of knowledge for all scholars working in medieval studies pertaining to The Netherlands and Flanders. The proposed project provides e-BNM with a much-needed conversion into a flexible data structure that will turn e-BNM into a key open access resource to which many other resources can be easily linked. Persistence and long-term sustainability will be of main importance. The project will deliver a web application for consultation, using faceted search, and will enable collaborative editing.

Title	VALID - Vulnerability in Acquisition: Language Impairments in Dutch. Curating five valuable data sets
Project coordinator	Dr. Jetske Klatter
CLARIN Centre	Max Planck Institute for Psycholinguistics (MPI)
Budget	€ 64K
Abstract	Research groups from the universities of Nijmegen, Amsterdam (UvA), and Utrecht have decided to prepare a nationwide, open access multimedia archive of language pathology data collected in the Netherlands, primarily on Dutch. In this enterprise, as many other research groups as possible, both from universities and care and educational institutes, will be involved. Although data sharing is becoming increasingly important to widen the scope and depth of empirical research in all kinds of areas, no tradition has been established yet in language and speech pathology research. This means that in various places a wealth of relevant and precious language data exists that cannot be easily and optimally accessed and exploited. The aim of the present project is to curate five existing digital data sets in order to make them available for scientific research in a CLARIN-compatible format. This is a first, major step in the development of a VALID data archive.

Title	Connections Between Women and Writings Within European Borders
Project coordinator	Dr. Suzan van Dijk
CLARIN Centre	Huygens ING
Budget	€ 95K
Abstract	Recent NWO and COST projects funded the collection of data in view of research on the reception and internationalization of women’s literature in Europe up to the 20th century. The WomenWriters database currently contains references concerning 4,000 authors, their works and over 22,000 reception documents. To expand the research networks about female participation in the literary field we intend to: (1) connect this data to other national collections in women’s literature; (2) build a research application for scholars; and (3) create a set of standards to exchange data based on CLARIN guidelines for shared metadata and service-based infrastructures. These standards will be implemented by international partners: the Selma Lager Archive and the Literature Bank in Sweden; the Google Grant project Collaborative Annotation of Digitalized Literary Texts in Madrid; the Serbian Knjiženstvo; the Norwegian Female Robinsonades and the Swiss Women in Arcadia (1690-1800) databases.

Title	System for HEBrew Text: ANnotations for Queries and Markup
Project coordinator	Prof. dr. Wido van Peursen
CLARIN Centre	DANS
Budget	€ 101K
Abstract	The WIVU Hebrew Text Database contains the Hebrew text of the Old Testament enriched with many linguistic features at the morpheme level up to the discourse level. This work of decades is currently represented in an object database that is optimized for linguistically relevant queries. However, this resource is not readily available to researchers, and moreover, work based on this resource cannot be linked to it on the web. The curation part of this project will create a durable representation of the contents of the WIVU database in LAF plus an annotatable Linked Data export in RDF. There will be persistent addresses for fine-grained fragments. The demonstrator part will be a web application that enables researchers to perform linguistic queries on the web resource and preserve significant results as annotations to this resource. The more than 100 different features will be defined in ISOcat and used by CMDI profiles.

Title	Dutch Ships and Seamen
Project coordinator	Prof. dr. Lex Heerma van Voss
CLARIN Centre	Huygens ING
Budget	€ 98K
Abstract	The Netherlands is a seafaring nation, and a large portion of Dutch history is found on the water. However, much of the digitized historical source material is still scattered across many databases and archives. This curation and demonstrator project aims to bring together the rich maritime historical data preserved in the many different databases. We propose a (semantic) web-based infrastructure that will house various maritime-historical datasets. We will provide a tool chain and methodology for converting legacy datasets. The infrastructure includes common vocabularies to normalize and enrich existing data. Links are established between the datasets and to other relevant datasets on the Web. Although the infrastructure will be set up to facilitate 25+ identified datasets, we initially populate the infrastructure with four selected datasets. These will allow us to investigate two case studies in order to answer the historical research question “To what extent did patterns of shipping and recruitment in the Dutch maritime sector change over the course of the 18th and 19th centuries?”

Title	Exploiting ISOcat's Language Sections in ELAN and ANNEX
Project coordinator	Dr. Onno Crasborn
CLARIN Centre	Max Planck Institute for Psycholinguistics (MPI)
Budget	€ 65K
Abstract	This project aims to strengthen the CLARIN infrastructure by making annotated audiovisual resources more accessible for users of different languages. The multilingual features of ISOcat, CLARIN’s Data Category Registry, are not well exploited by current tools. This will be changed for ELAN and ANNEX, the audiovisual annotation and display tools of MPI, allowing users to select a display language for CMDI metadata and for Controlled Vocabularies. The curation part of the project enhances the Corpus NGT, the world’s first open access sign language corpus, by updating the existing IMDI metadata to CLARIN-standard CMDI descriptions using bilingual ISOcat categories, and likewise standardises the Controlled Vocabularies in the annotation files by using references to bilingual ISOcat categories.

Title	Collaboration on Time-Based Resources
Project coordinator	Dr. Onno Crasborn
CLARIN Centre	Max Planck Institute for Psycholinguistics (MPI)
Budget	€ 64K
Abstract	With the growing amount of online language resources, the need to exploit these in innovative ways is also increasing. This project aims to strengthen the CLARIN infrastructure by extending ELAN and ANNEX for the annotation and display of time-based resources such as audio and video with a referencing and note exchanging system. Precise and persistent references to time slices and annotations in the form of hyperlinks will be made possible. A system for exchanging notes containing these links and exploiting ISOcat will be created, allowing for efficient collaboration among researchers. ANNEX will be extended so that multiple references can be presented side-by-side, allowing for the display of multiple examples of the same phenomenon, for instance. While the system will be tested on the basis of linguistic data, the functionality can be exploited by any research that makes use of time-based resources, whether in the humanities or the social sciences.

Title	A Digital Workbench for Rembrandt Research
Project coordinator	Prof. dr. Volker Manuth
CLARIN Centre	Huygens ING
Budget	€ 121K
Abstract	The goal of this project is to demonstrate how (art) historians can benefit from linking a pivotal CLARIN resource, namely eLaborate, which adheres to ISOcat, to resources created by museums, archives and libraries, which adhere to other standards for metadata. For that purpose, we intend to construct a demonstrator that connects a number of databases centred around the life and art of Rembrandt van Rijn. In this project we will demonstrate how an initial version of the Rembrandt Documents (RemDoc) database, built as an eLaborate application, can serve as an integrated tool for art history research by virtue of its coupling with related resources created by the Rijksbureau voor Kunsthistorische Documentatie (RKD), as well as with a standard university library catalogue.

Title	Online Personal Exploration and Navigation of SoNaR
Project coordinator	Prof. dr. Max Louwerse
CLARIN Centre	INL (Institute for Dutch Lexicology)
Budget	€ 120K
Abstract	The OpenSoNaR project will provide end users with the online means for extracting information from the SoNaR-500 reference corpus of contemporary written Dutch. This includes exploring the texts and navigating through the SoNaR-500 corpus by way of the metadata. The project makes the contents of the new SoNaR-500 reference corpus available to laymen and specialist researchers alike. Based on the desiderata of four distinct CLARIN-NL priority groups, access to the corpus for navigation, exploration and exploitation in an online environment will be through a front-end, to be called WhiteLab, offering a range of interfaces that provide user-driven functionality. The back-end is the new retrieval engine BlackLab developed by INL (Institute for Dutch Lexicology), designed to provide access to corpora for linguistic and lexicographical use in the CLARIN infrastructure.