On March 14th, I presented a paper about the SIRUP project at IUI’17. IUI stands for Intelligent User Interfaces, and it is an international conference where the Human-Computer Interaction (HCI) community meets the Artificial Intelligence (AI) community. It is a highly competitive venue, with an acceptance rate below 25%. Our paper introduces a model for serendipity in recommender systems using curiosity theory. Here is the abstract of the paper:
In this paper, we propose a model to operationalise serendipity in content-based recommender systems. The model, called SIRUP, is inspired by Silvia’s curiosity theory, which is based on the fundamental theory of Berlyne, and aims at (1) measuring the novelty of an item with respect to the user profile, and (2) assessing whether the user is able to manage such a level of novelty (coping potential). The novelty of items is calculated with cosine similarities between items, using Linked Open Data paths. The coping potential of users is estimated by measuring the diversity of the items in the user profile. We deployed and evaluated the SIRUP model in a TV recommender use case using a dataset of BBC programs. Results show that the SIRUP model allows us to identify serendipitous recommendations and, at the same time, to achieve 71% precision.
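To make the two ingredients of the abstract more concrete, here is a minimal Python sketch. It is not the SIRUP implementation: the feature names, the "one minus the highest similarity" definition of novelty, the mean pairwise dissimilarity used as diversity, and the `min_novelty` threshold are all assumptions made for illustration. Items are represented as sparse vectors whose keys stand in for Linked Open Data paths.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def novelty(item, profile):
    """Assumed definition: 1 minus the highest similarity to a profile item."""
    if not profile:
        return 1.0
    return 1.0 - max(cosine(item, p) for p in profile)


def coping_potential(profile):
    """Assumed definition: mean pairwise dissimilarity of the profile items."""
    pairs = [(a, b) for i, a in enumerate(profile) for b in profile[i + 1:]]
    if not pairs:
        return 0.0
    return sum(1.0 - cosine(a, b) for a, b in pairs) / len(pairs)


def is_serendipity_candidate(item, profile, min_novelty=0.3):
    """Hypothetical rule: novel enough, but within the user's coping potential."""
    n = novelty(item, profile)
    return min_novelty <= n <= coping_potential(profile)


# Hypothetical feature keys standing in for Linked Open Data paths.
profile = [
    {"genre:drama": 1.0, "topic:history": 1.0},
    {"genre:comedy": 1.0, "topic:politics": 1.0},
    {"genre:drama": 1.0, "topic:politics": 1.0},
]
item = {"genre:documentary": 1.0, "topic:history": 1.0}

print(novelty(item, profile))
print(coping_potential(profile))
print(is_serendipity_candidate(item, profile))
```

In this toy setup, a user with a more diverse profile gets a higher coping potential, so more novel items qualify as candidates for that user.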
The paper is available here.
Here you can find the slides for my talk at ICT Open 2017:
The DIVE+ team is present on the 21st and 22nd of March at the ICT Open 2017 conference to present and showcase the latest developments of the tool. As part of these developments, DIVE+ is also integrated in the CLARIAH (Common Lab Research Infrastructure for the Arts and Humanities) research infrastructure, next to other media studies research tools (CLARIAH MediaSuite) that aim at supporting media studies researchers and scholars by providing access to digital data and tools. During the Meet the Demo sessions we also screencast the new DIVE+ interface, which provides support for the automatic generation of narratives and storylines. Below you can check the DIVE+ presentation.
For more insights, you can also check our short demo!
We are happy to announce that our project exploring relation extraction from natural language has 2 extended abstracts accepted at the Collective Intelligence conference this summer! Here are the papers:
- Crowdsourcing Ambiguity-Aware Ground Truth: we apply the CrowdTruth methodology to collect data over a set of diverse tasks: medical relation extraction, Twitter event identification, news event extraction and sound interpretation. We prove that capturing disagreement is essential for acquiring a high quality ground truth. We achieve this by comparing the quality of the data aggregated with CrowdTruth metrics with majority vote, a method which enforces consensus among annotators. By applying our analysis over a set of diverse tasks we show that, even though ambiguity manifests differently depending on the task, our theory of inter-annotator disagreement as a property of ambiguity is generalizable.
- Disagreement in Crowdsourcing and Active Learning for Better Distant Supervision Quality: we present ongoing work on combining active learning with the CrowdTruth methodology for further improving the quality of distant supervision (DS) training data. We report the results of a crowdsourcing experiment run on 2,500 sentences from the open domain. We show that modeling disagreement can be used to identify interesting types of errors caused by ambiguity in the TAC-KBP knowledge base, and we discuss how an active learning approach can incorporate these observations to utilize the crowd more efficiently.
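The contrast between majority vote and a disagreement-preserving aggregation, which both abstracts build on, can be illustrated with a small Python sketch. This is a simplified stand-in and not the CrowdTruth metrics themselves: the per-label score below is just the fraction of annotators who chose a label, and the relation labels and votes are made up for the example.

```python
from collections import Counter


def majority_vote(annotations):
    """Collapse all annotations for one unit into the single most frequent label."""
    return Counter(annotations).most_common(1)[0][0]


def label_scores(annotations):
    """Keep the disagreement: fraction of annotators that picked each label."""
    counts = Counter(annotations)
    total = len(annotations)
    return {label: n / total for label, n in counts.items()}


# Hypothetical votes from five annotators on one sentence.
votes = ["treats", "treats", "causes", "treats", "causes"]

print(majority_vote(votes))
print(label_scores(votes))
```

Majority vote reports only one label for the sentence, while the per-label scores keep the signal that two of the five annotators read the sentence differently, which is the kind of ambiguity the abstracts argue should be preserved.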
For those who are curious about the Big Data Europe technology stack and would rather watch videos than read descriptions and documentation, we have started a YouTube channel where BDE researchers explain the how, why and what of the BDE stack. Embedded below is a short clip of Hajira Jabeen explaining how BDE enables someone to get started with Big Data. More clips are available on the channel.
Source: Victor de Boer
Our paper “Social Network Analysis for Trust Prediction” by Davide Ceolin and Simone Potenza of KonnektID has been accepted at the IFIP Trust Management Conference 2017. This paper is the result of a Network Institute Voucher to establish a collaboration between our group and KonnektID, and it is also partly funded by the COMMIT Big Data Veracity project.
Abstract: From car rental to knowledge sharing, the connection between online and offline services is increasingly tightening. As a consequence, online trust management becomes crucial for the success of services run in the physical world. In this paper, we outline a framework for identifying social web users more inclined to trust others by looking at their profiles. We use user centrality measures as a proxy of trust, and we evaluate this framework on data from Konnektid, a knowledge-sharing social Web platform. We introduce four metrics for measuring trust. Performance achieved an accuracy between 43% and 99%.
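As an illustration of using a centrality measure as a proxy for trust, here is a minimal Python sketch. The abstract does not say which centrality measures the paper uses, so degree centrality is an assumption here, and the user names and connections are invented.

```python
def degree_centrality(edges):
    """Degree centrality for an undirected graph given as (user, user) pairs.

    Each score is the number of distinct neighbours divided by (n - 1),
    where n is the number of users that appear in the edge list.
    """
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    n = len(neighbours)
    if n < 2:
        return {node: 0.0 for node in neighbours}
    return {node: len(adj) / (n - 1) for node, adj in neighbours.items()}


# Hypothetical connections between users of a knowledge-sharing platform.
edges = [("ann", "bob"), ("ann", "cem"), ("ann", "dia"), ("bob", "cem")]

scores = degree_centrality(edges)
# Rank users from most to least central; under the assumption above,
# more central users are treated as more inclined to trust others.
ranked = sorted(scores, key=scores.get, reverse=True)

print(scores)
print(ranked)
```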
On February 24th, 2017, the kick-off meeting for the Linkflows project took place. The meeting was hosted by Vrije Universiteit Amsterdam. During this meeting, the partners involved in the project were introduced.
Linkflows is an innovation PhD project with two external contributors that introduces the timely topic of semantic publishing and scientific assessment, and links it to the existing research, collections and collaborations central in the Web & Media Group, e.g. linked data, crowdsourcing, quality assessment and multimedia collections.
The aim of the Linkflows project is to make scientific contributions on the Web, e.g. articles, reviews, blog posts, multimedia objects, datasets, individual data entries, annotations, discussions, etc., better valorized and efficiently assessed in a way that allows for their automated interlinking, quality evaluation and inclusion in scientific workflows.
The PhD candidate for this project is Cristina-Iulia Bucur. The daily supervisors are Tobias Kuhn and Davide Ceolin, and the co-promoter is Lora Aroyo.
The partners involved in the Linkflows project:
On the 10th of March “Narrativizing disruption”, a DIVE+ centered CLARIAH-funded research pilot, was presented at the CLARIAH toog day. The pilot (2017-2018) focuses on the question of how exploratory search can help media researchers interpret disruptive media events as lucid narratives. Disruptive media events, such as terrorist attacks or environmental disasters, are difficult to interpret due to an inability to grasp the story. This leads to problems for media scholars, who analyse how narratives construct different political, economic or cultural meanings around such events. Offering media scholars the ability to explore and create lucid narratives about media events therefore greatly supports their interpretative work.
This project studies how exploratory search can help to understand how ‘disruptive’ events are constructed as narratives across media, and instilled with specific cultural-political meanings. This project approaches this question by using CLARIAH components (DIVE+’s navigation and bookmarking pane) to examine how scholars use and create narratives to understand media events. Academic insights conclude how exploratory search supports narrative generation. Software-specific insights produce recommendations at the entity, interface and user level, provide starting points for media research, and recommendations for auto-generating narratives based on exploratory search practices.
Our ControCurator paper abstract titled “ControCurator: Understanding Controversy Using Collective Intelligence” has been accepted at Collective Intelligence 2017. In this paper we describe the aspects of controversy: time-persistence, emotion, multiple actors, polarity and openness. Using crowdsourcing, the ControCurator dataset of 31888 controversy annotations was obtained for the relevance of these aspects to 5048 Guardian articles. The results indicate that each of these aspects is a positive indicator of controversy, but also that there is a clear difference in their signal strength. Most notably, emotion was found to be the strongest indicator. Nevertheless, all the measured controversy aspects were found to correlate positively with controversy. These results suggest that the controversy model is accurate and useful for modeling controversy in news articles.
The full dataset with controversy annotations is available for download at https://github.com/ControCurator/controcurator-corpus/releases/tag/1.0
On the 7th of March the DIVE+ project was presented at Cross Media Café: Uit het Lab. DIVE+ is the result of a true interdisciplinary collaboration between computer scientists, humanities scholars, cultural heritage professionals and interaction designers. In this project, we use the CrowdTruth methodology and framework to crowdsource events for the news broadcasts from The Netherlands Institute for Sound and Vision (NISV) that are published under open licenses on the OpenImages platform.
As part of the digital humanities effort, DIVE+ is also integrated in the CLARIAH (Common Lab Research Infrastructure for the Arts and Humanities) research infrastructure, next to other media studies research tools that aim at supporting media studies researchers and scholars by providing access to digital data and tools. To develop this project we work together with the eScience Center, which is also funding the DIVE+ project.
Check the slides!