Collective Intelligence 2017 – Trip Report

On June 15-16 the Collective Intelligence conference took place at New York University. The CrowdTruth team was present with Lora Aroyo, Chris Welty and Benjamin Timmermans. Together with Anca Dumitrache and Oana Inel we published a total of six papers at the conference.


The first keynote was presented by Geoff Mulgan, CEO of NESTA. He set the context of the conference by stating that there is a problem with technological development, namely that it only takes knowledge out of society and does not put it back in. He also pointed out that many of the tools we see today, like Google Maps, are actually the result of companies that were bought and merged together; it is this combination that creates their power. He then named the biggest trends in collective intelligence: observation (e.g. citizen-generated data on floods), predictive models (e.g. fighting fires with data), memory (e.g. what works centers on crime reduction), and judgement (e.g. adaptive learning tools for schools). There are still a few open issues with collective intelligence, though: Who pays for all of this? What skills are needed for CI? What are the design principles of CI? What are the centers of expertise? None of these are clear yet. What is clear, however, is that a new field is emerging from the combination of AI with CI: Intelligence Design. We used to think that systems produce this intelligence by themselves, but we actually need to steer and design it.

In a plenary session there was an interesting talk on public innovation by Thomas Kalil. He defined the value of concreteness as things that happen when particular people or organisations take some action in pursuit of a goal. These actions are more likely to effect change if you can articulate who needs to do what. He said he would like to identify the current barriers to prediction markets and areas where governments could be a user and funder of collective intelligence. This can be achieved by connecting people who are working to solve similar problems locally, e.g. in local education. Change can then be driven realistically, by making clear who needs to do what. It was also noted, though, that people need to be willing and able for change to work.

Parallel Sessions

There were several interesting talks during the parallel sessions. Thomas Malone spoke about using contest webs to address the problem of global climate change. He claimed that funding science can be both straightforward and challenging: for instance, government policy does not always correctly address the needs of a domain, and conflicts of interest may even exist. It can also be tough to convince the general public of the use of fundamental research, as it is not sexy. Digital entrepreneurship is furthermore something that is often overlooked. There are hard problems, and there are new ways of solving them. It is essential now to split the problems up into parts, solve each of them with AI, and combine them back together.

#CrowdTruth at @cicon17 presented by @cawelty #Crowdsourcing Ambiguity-aware #GroundTruth

— Lora Aroyo (@laroyo) June 15, 2017

Chris Welty presented our work on Crowdsourcing Ambiguity Aware Ground Truth at Collective Intelligence 2017.

Mark Whiting also presented his work on Daemo, a new crowdsourcing platform that has a self-governing marketplace. He stressed the fact that crowdsourcing platforms are notoriously disconnected from user interests. His new platform has a user-driven design, in order to get rid of the flaws that exist in, for instance, Amazon Mechanical Turk.

Plenary Talks

Daniel Weld from the University of Washington presented his work on argumentation support in crowdsourcing. Their work uses argumentation support in crowd tasks to allow workers to reconsider their answers based on the argumentation of others. They found this to significantly increase the annotation quality of the crowd. He also claimed that humans will always need to stay in the loop of machine intelligence, for instance to define what the crowd should work on. Through this, hybrid human-machine systems are predicted to become very powerful.

Hila Lifshitz-Assaf of NYU Stern School of Business gave an interesting talk on changing innovation processes. The process of innovation has changed from a lone inventor, to labs, to collaborative networks, and now to open innovation platforms. The main issue with this is that the best practices of innovation fail in the new environment. In standard research and development there is a clearly defined and selectively permeable boundary, whereas with open innovation platforms this is not the case. Experts can participate from inside and outside the organisation. Open innovation means managing undefined and constantly changing knowledge in which anyone can participate. For this to work, you have to change from being a problem solver to a solution seeker. It is a shift from thinking "the lab is my world" to "the world is my lab". Still, problem formulation is key, as you need to define the problems in ways that cross boundaries. The question always remains: what is really the problem?

Poster Sessions

In the poster sessions there were several interesting works presented, for instance work on real-time synchronous crowdsourcing using “human swarms” by Louis Rosenberg. Their work allows people to change their answers under the influence of the rest of the swarm. Another interesting poster was by Jie Ren of Fordham University, who presented a method for comparing the divergent thinking and creative performance of crowds with that of experts. We ourselves had a total of five posters spread over both poster sessions, which were well received by the audience.

@8w @cawelty @laroyo presenting Part I of our #CrowdTruth posters with @oana_inel @anouk_anca at the @cicon17 #informationExtraction

— Lora Aroyo (@laroyo) June 15, 2017

Posted in CrowdTruth, Projects

ESWC 2017 – Trip Report

Between 28th of May and 1st of June 2017 the 14th Extended Semantic Web Conference took place in Portorož, Slovenia. As part of the CrowdTruth team and project, Oana Inel presented her paper, written together with Lora Aroyo, on the first day of the conference. More about the paper that was presented can be found in a previous post. On the last day of the conference, Lora was the keynote speaker.

The Semantic Web group at the Vrije Universiteit Amsterdam had other great presentations. During the Scientometrics Workshop Al Idrissou talked about the SMS platform that links and enriches data for studying science. During the poster and demo session people were invited to check SPARQL2Git: Transparent SPARQL and Linked Data API Curation via Git by Albert Meroño-Peñuela and Rinke Hoekstra. Furthermore, the Semantic Web group had a candidate paper for the 7-year impact award “OWL reasoning with WebPIE: calculating the closure of 100 billion triples”, by Jacopo Urbani, Spyros Kotoulas, Jason Maassen, Frank van Harmelen and Henri Bal.


I’ll start by writing a couple of words about the keynotes, which this year covered a wide range of areas, domains and subjects. In the first keynote presentation at ESWC 2017, on Tuesday, Kevin Crosby, from RavenPack, stressed the importance of data as a factor in decision making for financial markets. In his talk entitled “Bringing semantic intelligence to financial markets”, he focused on the current issues related to data analytics in decision making: the lack of skills and expertise, the quality and completeness of data and the timeliness of data. However, the most important issue is the fact that although we live in the age of data, only around 29% of the decisions in the financial market are made based on data.

The second keynote speaker was John Sheridan, the digital director of The National Archives in the UK. While giving a nice overview of British history, he talked about how semantic technologies are used to preserve history at The National Archives, in a talk entitled “Semantic Web technologies for Digital Archives”. Nowadays, semantic technologies are widely used to make cultural heritage collections publicly available online. However, people still struggle to search and browse through archives without having the context of the data. As a take-home message, we need to work towards second-generation digital archives that measure risks, provide trust evidence, redefine context, embrace uncertainty, and enable use and access.

On the last day of the conference Lora Aroyo gave her keynote presentation, “Disrupting the Semantic Comfort Zone”. Lora started her keynote by looking back at the history of the Semantic Web and AI and how her own journey embraced the changes along the way. One thing was clear: humans were always at the centre and they still continue to be. The second part of the presentation focused on introducing the underlying idea of the CrowdTruth project. As a final note, I’ll leave here the following question from Lora: “Will the next AI winter be the winter of human intelligence or not?”

NLP & ML Tracks

During the ML track, Federico Bianchi presented an approach that uses active learning to rank semantic associations. The problem is well known: there is an information overload in contextual KB exploration, and even for small amounts of text there is a lot of data to be considered. In order to determine which semantic associations are most interesting to users, the paper “Actively Learning to Rank Semantic Associations for Personalized Contextual Exploration of Knowledge Graphs” defines a ranking function based on a serendipity heuristic, i.e., relevance and unexpectedness.
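To make the serendipity idea a bit more concrete, here is a minimal sketch. It is not the ranking function from the paper, which is learned actively from user feedback. It simply assumes that every association already comes with a relevance and an unexpectedness score in [0, 1] and mixes them with a hypothetical weight `alpha`. The names `serendipity` and `rank_associations`, as well as the toy scores, are made up for illustration.

```python
def serendipity(relevance, unexpectedness, alpha=0.5):
    """Weighted mix of relevance and unexpectedness (alpha is an assumed weight)."""
    return alpha * relevance + (1 - alpha) * unexpectedness


def rank_associations(associations, alpha=0.5):
    """Return the associations sorted from most to least serendipitous."""
    return sorted(
        associations,
        key=lambda a: serendipity(a["relevance"], a["unexpectedness"], alpha),
        reverse=True,
    )


# Toy data: three semantic associations with hand-picked scores.
associations = [
    {"id": "A", "relevance": 0.9, "unexpectedness": 0.1},
    {"id": "B", "relevance": 0.6, "unexpectedness": 0.7},
    {"id": "C", "relevance": 0.2, "unexpectedness": 0.9},
]

ranked_ids = [a["id"] for a in rank_associations(associations)]
print(ranked_ids)
```

With equal weights the association that is both fairly relevant and fairly unexpected comes out on top; setting `alpha=1.0` falls back to a pure relevance ranking.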

The paper “All that Glitters Is Not Gold – Rule-Based Curation of Reference Datasets for Named Entity Recognition and Entity Linking” by Kunal Jha, Michael Röder and Axel-Cyrille Ngonga Ngomo draws attention to the current gold standards and makes claims similar to the ones we presented in our paper: the gold standards do not share a common set of rules for annotating named entities, they are not thoroughly checked, and they are not refined and updated to newer versions. Thus, the need for the EAGLET benchmark curation tool for named entities!

Using semantic annotations to provide better access to scientific publications is a subject that has recently caught the attention of many researchers. Sepideh Mesbah, PhD student at Delft University of Technology, presented “Semantic Annotation of Data Processing Pipelines in Scientific Publications”, a paper that proposes an approach and workflow for extracting semantically rich metadata from scientific publications, by classifying the content of scientific publications and extracting the named entities (objectives, datasets, methods, software, results).

Jose G. Moreno presented the paper “Combining Word and Entity Embeddings for Entity Linking” which introduces a natural idea for entity linking by using a combination of entity and word embeddings. The claims of the authors are the following: you shall know a word by the company it keeps and you shall know an entity by the company it keeps in a KB, word context by alignment, word/entity context by concatenation.
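The concatenation idea can be sketched in a few lines. This is only an illustration under strong assumptions, not the model from the paper: the two-dimensional vectors are invented toy values, and `combine`, `cosine` and `link` are hypothetical helper names. A mention is represented by its word-context vector concatenated with its entity-context vector, and the candidate entity with the most similar combined vector wins.

```python
import math


def combine(word_vec, entity_vec):
    """Concatenate a word-context vector and an entity-context vector."""
    return list(word_vec) + list(entity_vec)


def cosine(u, v):
    """Cosine similarity between two equally long vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def link(mention_vec, candidates):
    """Pick the candidate entity whose combined vector is closest to the mention."""
    return max(candidates, key=lambda name: cosine(mention_vec, candidates[name]))


# Toy vectors (invented): a mention of "Paris" in a text about France.
mention = combine([0.9, 0.1], [0.8, 0.2])
candidates = {
    "Paris_France": combine([1.0, 0.0], [0.9, 0.1]),
    "Paris_Hilton": combine([0.1, 0.9], [0.2, 0.8]),
}

best = link(mention, candidates)
print(best)
```

Concatenation keeps the word and entity signals in separate dimensions, so neither can wash out the other when the similarity is computed.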

Social Media Track

The Social Media track started with a presentation by Hassan Saif – “A Semantic Graph-based Approach for Radicalisation Detection on Social Media”. The approach presented in the paper uses a semantic graph representation in order to discover patterns among pro- and anti-ISIS users on social media. Overall, pro-ISIS users tend to discuss religion, historical events and ethnicity, while anti-ISIS users focus more on politics, geographical locations and intervention against ISIS. The second presentation – “Crowdsourced Affinity: A Matter of Fact or Experience” by Chun Lu – took us to a different domain: a travel destination recommendation scenario that is based on user-entity affinity, i.e., the likelihood of a user being attracted by an entity (book, film, artist) or performing an action (click, purchase, like, share). The main finding of the paper was that, in general, a knowledge graph helps to assess the affinity more accurately, while a folksonomy helps to increase its diversity and novelty. The Social Media Track had two papers nominated for best student research paper – the aforementioned paper and the paper “Linked Data Notifications” by Sarven Capadisli, Amy Guy, Christoph Lange, Sören Auer, Andrei Sambra and Tim Berners-Lee. The latter was also the winner!

Best student paper award of #eswc2017 goes to @csarven and @rhiaro for Linked Data Notifications

June 1, 2017

In-Use and Industrial Track

Social media was highly relevant for the In-Use track as well. The Swiss Armed Forces is developing a Social Media Analysis system aiming to detect events such as natural disasters and terrorist activity by performing semantic tweet analysis. If you want to know more, you can read the paper “ArmaTweet: Detecting Events by Semantic Tweet Analysis”. This track also had nominations for best in-use paper. The winning paper in this category was “smartAPI: Towards a More Intelligent Network of Web APIs”, presented by Amrapali Zaveri.

Won the best in-use paper award for our #smartAPI work! Congrats to all co-authors! #eswc2017 #api #FAIR

— Amrapali Zaveri (@AmrapaliZ) June 1, 2017

Open Knowledge Extraction Challenge

During the Open Knowledge Extraction challenge, Raphaël Troncy presented the participating system ADEL – an adaptable entity extraction and linking framework, which was also the challenge-winning entry. The ADEL framework can be adapted to a variety of generic or specific entity types that need to be extracted, as well as to the different knowledge bases to disambiguate against, such as DBpedia and MusicBrainz. Overall, this self-configurable system tries to solve a difficult problem with current NER tools, i.e., the fact that they are only tailored for specific data, scenarios and applications.

OKE Challenge winner @ #eswc2017 #oke2017 #benchmarking #bigdata #linkeddata #semanticweb #H2020

— Project HOBBIT (@hobbit_project) June 2, 2017


On Monday, during the second day of workshops, I attended two workshops: the 3rd international workshop on Semantic Web for Scientific Heritage, SW4SH 2017, and Semantic Deep Learning, SemDeep-17, now in its first edition. During the SW4SH 2017 workshop, Francesco Beretta gave a detailed keynote, entitled “Collaboratively Producing Interoperable Ontologies and Semantically Annotated Corpora”, in which he presented a couple of projects for digital humanities (the corpus analysis environment TXM, among others) and how linked (open) data, ontologies, automated tools for natural language processing and semantics are finding their place in the daily projects of humanities scholars. However, all these tools, approaches and technologies are not 100% embraced, as humanities scholars are seldom content with precision values of 90% and they feel the urge to manually tweak the data until it looks perfect.

During SemDeep-17, Sergio Oramas presented the paper “ELMDist: A vector space model with words and MusicBrainz entities”. The article points out that it is still unclear how NLP and semantic technologies can contribute to Music Information Retrieval areas such as music and artist recommendation and similarity. The approach presented uses NLP processing in order to disambiguate the entities in the musical texts and then runs the word2vec algorithm over this sense-level space. Overall, the results are promising, meaning that textual descriptions can be used to improve Music Information Retrieval. The last paper of the workshop, “On Semantics and Deep Learning for Event Detection in Crisis Situations”, was presented by Hassan Saif. As the title suggests, the paper tries to solve the problem of event detection in crisis situations from social media, using Dual-CNN, a semantically enhanced deep learning model. Although the model is successful in identifying the existence of events and their types, its performance drops significantly when identifying event-related information such as the number of people affected or the total damages.

Posted in CrowdTruth, Projects

Kickoff meeting Mixed Methods in the Humanities projects

Last week, the Volkswagen Stiftung-funded “‘Mixed Methods’ in the Humanities?” programme had its kickoff meeting for all funded projects in Hannover, Germany. Our ArchiMediaL project on enriching and linking historical architectural and urban image collections was one of the projects funded through this programme, and even though our project will only start in September, we already presented our approach, the challenges we will be facing and who will face them (our great team of post-docs Tino Mager, Seyran Khademi and Ronald Siebes).

Group picture. Can you spot all the humanities and computer science people?

Other interesting projects included the analysis of multi-religious spaces in the Medieval World (“Dhimmis and Muslims”), the “From Bach to Beatles” project on representing music and schemata to support musicological scholarship, as well as the nice Digital Plato project, which uses NLP technologies to map paraphrasing of Plato in the ancient world. An overarching theme was a discussion on the role of digital / quantitative / distant reading methods in humanities research. The projects will run for three years, so we have some time to say some sensible things about this in 2020.




Source: Victor de Boer

Posted in Staff Blogs, Victor de Boer

EVENTS2017 workshop at SEMANTiCS

An important role in the interpretation of cultural heritage collections is played by ‘historic events’. In the SEMANTiCS workshop Events2017: Understanding Events Semantics in Cultural Heritage, to be held on 11 Sept 2017, we will investigate and discuss challenges around identifying, representing, linking and reasoning about historical events. We invite full papers (8p) as well as short papers (4p) on this topic.

The call for papers is out now. You have until July 10, 2017 to submit your contribution. Contributions can include original research papers, position papers, or papers describing tools, demonstrators or datasets. Accepted contributions will be published on the CEUR-WS website (or equivalent).

More information at


Source: Victor de Boer

Posted in Staff Blogs, Victor de Boer

Harnessing Diversity in Crowds and Machines for Better NER Performance

Today, I presented my work entitled “Harnessing Diversity in Crowds and Machines for Better NER Performance” in the Research Track of ESWC 2017. Below you can find the abstract of the paper and the slides that I used during the presentation.


Over the last years, information extraction tools have gained a great popularity and brought significant performance improvement in extracting meaning from structured or unstructured data. For example, named entity recognition (NER) tools identify types such as people, organizations or places in text. However, despite their high F1 performance, NER tools are still prone to brittleness due to their highly specialized and constrained input and training data. Thus, each tool is able to extract only a subset of the named entities (NE) mentioned in a given text. In order to improve NE Coverage, we propose a hybrid approach, where we first aggregate the output of various NER tools and then validate and extend it through crowdsourcing. The results from our experiments show that this approach performs significantly better than the individual state-of-the-art tools (including existing tools that integrate individual outputs already). Furthermore, we show that the crowd is quite effective in (1) identifying mistakes, inconsistencies and ambiguities in currently used ground truth, as well as in (2) a promising approach to gather ground truth annotations for NER that capture a multitude of opinions.
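The two steps of the hybrid approach (aggregate the tools, then let the crowd validate and extend) can be sketched roughly as follows. This is a simplified illustration and not the pipeline from the paper: the tool names, entity spans and worker annotations are invented, and the plain vote-share threshold `min_support` is a stand-in assumption for whatever crowd aggregation is actually used.

```python
from collections import Counter


def aggregate(tool_outputs):
    """Union of the entity spans returned by all NER tools."""
    merged = set()
    for spans in tool_outputs.values():
        merged.update(spans)
    return merged


def validate_and_extend(candidates, worker_annotations, min_support=0.5):
    """Keep every span selected by at least `min_support` of the workers.

    Workers may confirm tool candidates (validate) or mark spans that no
    tool found (extend); both kinds are judged by the same threshold.
    """
    counts = Counter(span for ann in worker_annotations for span in set(ann))
    n_workers = len(worker_annotations)
    pool = set(candidates) | set(counts)
    return {s for s in pool if n_workers and counts[s] / n_workers >= min_support}


# Invented example: two tools and three crowd workers on one sentence.
tool_outputs = {
    "tool_a": {"Barack Obama"},
    "tool_b": {"Obama", "Hawaii"},
}
worker_annotations = [
    {"Barack Obama", "Hawaii"},
    {"Barack Obama", "Hawaii", "Honolulu"},
    {"Barack Obama", "Honolulu"},
]

candidates = aggregate(tool_outputs)
final = validate_and_extend(candidates, worker_annotations)
print(sorted(final))
```

In this toy run the crowd drops the partial span "Obama" that only one tool produced and adds "Honolulu", which none of the tools found.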

Posted in CrowdTruth, Projects

ICT4D at Sustainability day

During the National Day for Sustainability (Nationale dag voor duurzaamheid in het hoger onderwijs 2017), the ICT4D team presented our current research and educational activities to the many participants of this event, hosted at VU. Anna Bon and I presented our work on sustainable methodologies for ICT4D as well as current work on a small and sustainable ICT platform (Kasadaka); see the slides below.

After this, the participants got a chance to meet our students and their very nice projects up close in an interactive demonstration session. Selected ICT4D students presented the voice-accessible services.




All photos by SURFSara, more pictures of the event can be found on Flickr.


Source: Victor de Boer

Posted in Staff Blogs, Victor de Boer

Amsterdam Data Science – Coffee & Data: Controversy in Web Data

On 9th of June we are organising a Coffee & Data event with the Amsterdam Data Science community. The topic is “How to deal with controversy, bias, quality and opinions on the Web” and will be organised in the context of the COMMIT ControCurator project. In this project VU and UvA computer scientists and humanities researchers investigate jointly the computational modeling of controversial issues on the Web, and explore its application within real use cases in existing organisational pipelines, e.g. Crowdynews and Netherlands Institute for Sound and Vision.

The Agenda is as follows:

09:00-09:10 Coffee

Introduction & Chair by Lora Aroyo, Full Professor at the Web & Media group (VU, Computer Science)

09:10 – 09:25: Gerben van Eerten – Crowdynews deploying ControCurator

09:25 – 09:40: Kaspar Beelen – Detecting Controversies in Online News Media (UvA, Faculty of Humanities)

09:40 – 09:50: Benjamin Timmermans – Understanding Controversy Using Collective Intelligence (VU, Computer Science)

09:50 – 10:00: Davide Ceolin – (VU, Computer Science)

10:00 – 10:15: Damian Trilling – (UvA, Faculty of Social and Behavioural Sciences)

10:15 – 10:30: Daan Oodijk (Blendle)

10:30 – 10:45: Andy Tanenbaum – “Unskewed polls” in 2012

10:45 – 11:00: Q&A Coffee

The event takes place at the Kerkzaal (HG-16A00) on the top floor of the VU Amsterdam main building.

Posted in CrowdTruth, Projects

VU’s 4th ICT4D symposium: a look back

Yesterday, 18 May 2017, the 4th International ICT4D symposium was held at Vrije Universiteit Amsterdam. The event was organized by the W4RA team and supported by VU Network Institute, the Netherlands Research School for Information and Knowledge Systems SIKS, VU Computer Science Department and VU International Office. This year’s theme, “Sustainability and ICT4D”, was highlighted by invited speakers from Ghana, France and the Netherlands.

Keynote speaker Gayo Diallo from Université de Bordeaux discussed the possibilities of ICT for African Traditional Medicine (ATM). In his talk, he showed how semantic web technologies can play a role here to connect heterogeneous datasets for analytics and end-user services. Such services would need to be based on voice-interaction and localized technologies. His slides can be found here.

Chris van Aart from 2Coolmonkeys discussed a number of smartphone applications developed in the context of W4RA activities, including Mr. Jiri, a tree-counting application. He showed that there is a market for such applications in the African context (Slides).

After the break, Francis Dittoh from UDS Ghana discussed issues around sustainability for a meteo application he is currently developing for Northern Ghana (slides). Wendelien Tuijp from VU’s CIS then presented multiple perspectives on ICT4D (Slides). The symposium was closed by a video presentation from Aske Robenhagen, showcasing the ongoing work in Nepal around mapping knowledge networks and developing a smartphone application supporting information exchange for local accountability extension workers. More information on that project can be found at

The presentations of the day can be found through the links above. The entire symposium was live-streamed and you can watch it all on youtube or below.

Below is a list of the approximate starting times of the various speakers in the video:

  • 6m19 Dr. Gayo Diallo – Université de Bordeaux (FR): Towards a Digital African Traditional Healthcare using Semantic Web.
  • 56m28 Dr. Chris van Aart – 2CoolMonkeys BV (NL) : Developing Smartphone Apps for African farmers.
  • 1h30m00 break.
  • 1h52m00 Francis Dittoh – University for Development Studies (Ghana): ICT business development in rural Africa.
  • 2h23m00 Wendelien Tuyp – CIS-VU : Sustainable Community Initiatives and African Farmer Innovation.
  • 2h52m00 Aske Robenhagen Network Institute Academy Assistant VU – Building resilient applications for sustainable development. Better video of this can be found at


Source: Victor de Boer

Posted in Staff Blogs, Victor de Boer

Big Data Europe Platform paper at ICWE 2017

With the launch of the Big Data Europe platform behind us, we are telling the world about our nice platform and the many pilots in the societal challenge domains that we have executed and evaluated. We wrote everything down in one comprehensive paper, which was accepted at the 17th International Conference on Web Engineering (ICWE 2017), to be held in Rome next month.

High-level BDE architecture (copied from the paper Auer et al.)

The paper “The BigDataEurope Platform – Supporting the Variety Dimension of Big Data” is co-written by a very large team (see below) and it presents the BDE platform: an easy-to-deploy, easy-to-use and adaptable (cluster-based and standalone) platform for the execution of big data components and tools like Hadoop, Spark, Flink, Flume and Cassandra. To facilitate the processing of heterogeneous data, a particular innovation of the platform is the Semantic Layer, which makes it possible to directly process RDF data and to map and transform arbitrary data into RDF. The platform is based upon requirements gathered from seven of the societal challenges put forward by the European Commission in the Horizon 2020 programme and targeted by the BigDataEurope pilots. It is validated through pilot applications in each of these seven domains. A draft version of the paper can be found here.


The full reference is:

Sören Auer, Simon Scerri, Aad Versteden, Erika Pauwels, Angelos Charalambidis, Stasinos Konstantopoulos, Jens Lehmann, Hajira Jabeen, Ivan Ermilov, Gezim Sejdiu, Andreas Ikonomopoulos, Spyros Andronopoulos, Mandy Vlachogiannis, Charalambos Pappas, Athanasios Davettas, Iraklis A. Klampanos, Efstathios Grigoropoulos, Vangelis Karkaletsis, Victor de Boer, Ronald Siebes, Mohamed Nadjib Mami, Sergio Albani, Michele Lazzarini, Paulo Nunes, Emanuele Angiuli, Nikiforos Pittaras, George Giannakopoulos, Giorgos Argyriou, George Stamoulis, George Papadakis, Manolis Koubarakis, Pythagoras Karampiperis, Axel-Cyrille Ngonga Ngomo, Maria-Esther Vidal. The BigDataEurope Platform – Supporting the Variety Dimension of Big Data. Proceedings of The International Conference on Web Engineering (ICWE), ICWE2017, LNCS, Springer, 2017



Source: Victor de Boer

Posted in Staff Blogs, Victor de Boer

Trip report: Museums and the Web Conference 2017 (MW17)

Between 19-22 April 2017, the MW (Museums and the Web) conference took place in Cleveland, Ohio, USA. I was there to give a presentation about DigiBird, a valorization project supported by the Dutch national program COMMIT/. The project lasted for 6 months and the results were summarized in this paper (see proposal). We were given a slot in the panel session titled “How Can We Connect Online Audiences With Online Collections?”. In this panel session the presentations and discussions were focused on describing strategies that can be used to link and engage online users with online collections, but also on ways of connecting online collections together. The presentation of our project can be found below.

Next, I will tell you more about the conference itself and my experience there as an attendee.

The MW conference series has been running since 1997, and its history can be traced through more than 1000 papers from the past 20 years that can be accessed online. The conference takes place every year in North America and Asia and mostly brings together professionals from the cultural heritage domain. Its attendees also include, as the MW organizers mention: “webmasters, educators, curators, librarians, designers, managers, directors, scholars, consultants, programmers, analysts, publishers and developers from museums, galleries, libraries, science centers, and archives – as well as the companies, foundations and governments that support them”. Thus, the people who attend the conference come from very diverse backgrounds and have the cultural heritage domain as a common interest, be it from an artistic, a cultural or a technological point of view.

This diversity of the audience is also reflected in the organization of the conference itself. As shown in this year’s program, the conference hosted panel sessions for presentations of formal papers, professional forums, How-to-sessions, Lightning Talks (Pecha Kucha-style), “Birds of a Feather” round-tables and an exhibition. Also, during this conference, the GLAMi (formerly Best of the Web) awards are given to the best organizations or projects that innovate in the cultural heritage domain.

You can read the MW17 notes that I took during some of the presentations that I attended.

The MW17 conference was a great experience for me and I was happy to represent our DigiBird project there. As most of the presentations described US-related projects, ours brought a nice nuance of orange.

Finally, I would like to thank all the people who helped make this presentation and project happen, including Chris Dijkshoorn, Maarten Brinkerink, Sander Pieterse and, last but not least, Lora Aroyo.



Posted in Conferences, Papers, Trip Reports