On Wednesday, January 23rd, the PoliMedia project team organized the symposium ‘Linking political debates and media’. At the symposium, the project team presented its current research and invited two speakers who also do computational research on political and news-media data.
The afternoon was opened by chair Laura Hollink (VUA), who asked for a show of hands. Both the humanities and computer sciences were well represented, mirroring the crossroads of political analysis by computational means.
The PoliMedia presentations were introduced by Martijn Kleppe (EUR) (SlideShare). As a historian, he described his interest in how political topics develop over time and how these topics are represented in the media. Currently, however, he would need to visit a physical archive, which requires a lot of traveling as well as manually searching the archive. The digital alternatives, LexisNexis (newspapers) and Academia (broadcasts), offer limited material (e.g., no photographs in LexisNexis) and require learning a different search system for each type of media. In contrast, PoliMedia aims to link the Dutch Hansard to the databases for newspapers (KB), radio (KB) and television (Academia), which would allow him to perform cross-media analysis of the coverage of debates through a uniform search interface.
How does PoliMedia link these databases? In his presentation (SlideShare), Damir Juric (TUD) explained the steps needed to create such links. As a start, he developed a semantic model to describe the items (i.e., speeches, newspaper articles, radio bulletins and television programmes) that is expressive enough to capture the important information: the people, topics, dates and media types. The challenge then is to create a representation of a speech in parliament that contains enough information to be used as a query to retrieve relevant media items from the archives. This query contains the speaker, a timeframe (for newspapers, 7 days after the speech), important terms (Named Entities) from the speech and important terms from the description of the debate. The results are then linked in RDF. One interesting finding in this process was that the definition of a link is rather vague: what does it mean for two items to be related? Moreover, journalists often summarize the important speeches in a debate and may use different words than the speakers did, which makes it difficult to search by speaker name and by the words used in the speech.
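The query construction described above can be sketched roughly as follows. This is only an illustration and not the PoliMedia implementation: the `Speech` class, the `build_query` function, the field names and the example speaker are all hypothetical, and the named entities are assumed to have been extracted beforehand. Only the ingredients (speaker, a 7-day window for newspapers, terms from the speech and from the debate description) come from the presentation.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import List


@dataclass
class Speech:
    """Hypothetical, minimal representation of one speech in a debate."""
    speaker: str
    spoken_on: date
    named_entities: List[str] = field(default_factory=list)  # assumed pre-extracted
    debate_terms: List[str] = field(default_factory=list)    # from the debate description


def build_query(speech: Speech, window_days: int = 7) -> dict:
    """Turn a speech into a search query for a media archive.

    The 7-day default follows the timeframe mentioned for newspapers;
    other media types might need a different window.
    """
    # Merge both term lists, dropping duplicates while keeping their order.
    terms = list(dict.fromkeys(speech.named_entities + speech.debate_terms))
    return {
        "speaker": speech.speaker,
        "date_from": speech.spoken_on,
        "date_to": speech.spoken_on + timedelta(days=window_days),
        "terms": terms,
    }


# Made-up example data, purely for illustration.
example = Speech(
    speaker="Jan Jansen",
    spoken_on=date(2012, 1, 10),
    named_entities=["Europese Unie", "Griekenland"],
    debate_terms=["begroting", "Griekenland"],
)
query = build_query(example)
```

A query built this way still depends on journalists repeating the speaker's name or wording, which is exactly the limitation noted above.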
With links available, how should we represent them so that they are usable for researchers analysing the coverage of political debates? The research into user requirements was presented by Max Kemman (EUR) (SlideShare), who explained how users were involved right from the start of developing the PoliMedia interface. A survey with 298 respondents from the Social Sciences and Humanities (organized in the AXES project) found that contemporary search is mainly performed using Google. This finding has two consequences: 1) people compare other search systems to their experience with Google, and 2) the search task is mainly performed using keywords. Following this survey, interviews with five researchers were conducted for PoliMedia specifically. The key finding was that it is very important to show clearly why search results are retrieved and how they are ranked; researchers want to understand and feel in control of the search results.
The PoliMedia presentations were completed by Jaap Blom (NISV) showing a live demo of the current prototype. Important features he showed are the presentation of search results and how these can be filtered by role of speaker (minister, member of parliament), politician, political party and decade. Each search result leads to a speech in the context of the entire debate, where other speeches can be collapsed to provide a better overview. All in all, although a lot of work is still to be done, the prototype was received positively.
Marieke van Erp (VUA) was the first keynote speaker and gave a talk about the Newsreader project. In this project, large volumes of news articles at LexisNexis are analysed to extract events (with a focus on financial-economic events such as takeovers and bankruptcies), find related events and create a storyline spanning a multitude of news articles. The goal is that decision makers can find the entire context of an event in the news and make informed decisions. An interesting difficulty is that newspapers also often report events that didn’t happen, based on rumours or speculation; the key research question is how to make sense of all the data.
The second keynote was by Maarten Marx (UVA), who presented the PoliticalMashup project. In this project, the Dutch Hansard at KB from 1914-1995 was combined with the archive of Parlando from 1995-2010 and the archive at overheid.nl from 2010 onwards, and formatted as structured data. This data can then be used by research projects such as PoliMedia. Other features made possible by structuring this data were demonstrated as well; in the debates, a user can now click on the name of a speaker and retrieve a picture and biography from parlement.com. Another feature shown was an n-gram viewer on the debate data, allowing for analysis of questions such as “When did social media become a topic in the Dutch parliament?”
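To give an idea of what an n-gram viewer computes, here is a minimal sketch that counts how often a phrase occurs per year. It is not the PoliticalMashup code: the function name, the `(year, text)` input format and the sample sentences are assumptions made for illustration, and tokenization is a plain whitespace split that ignores punctuation.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def ngram_counts_by_year(debates: Iterable[Tuple[int, str]], phrase: str) -> Dict[int, int]:
    """Count occurrences of `phrase` per year in (year, text) pairs."""
    target = tuple(phrase.lower().split())
    n = len(target)
    counts: Counter = Counter()
    for year, text in debates:
        tokens = text.lower().split()
        # Slide a window of n tokens over the text and compare with the phrase.
        for i in range(len(tokens) - n + 1):
            if tuple(tokens[i:i + n]) == target:
                counts[year] += 1
    return dict(sorted(counts.items()))


# Made-up sample data, purely for illustration.
sample = [
    (2008, "sociale media zijn nieuw"),
    (2010, "sociale media en nog eens sociale media"),
    (2010, "geen onderwerp"),
]
result = ngram_counts_by_year(sample, "sociale media")
```

The first year in which the count is non-zero gives a rough answer to a question like the one above; a real viewer would typically also normalize counts by the total number of words per year.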
All in all, it was a very interesting afternoon; we hope the attendees were inspired to come up with new research questions, using computational techniques to enable new and innovative analyses of parliamentary and news media data.