Trip report: ISWC 2014

From the 19th to the 23rd of October I had the pleasure of participating in the International Semantic Web Conference in beautiful Riva del Garda, Italy. I was there together with Oana Inel and Lora Aroyo, my colleagues in the CrowdTruth team, to demo our platform and spread the word to the Semantic Web community about how to use crowdsourcing to collect ground truth data. Here’s a brief summary of what went on.


The official conference kicked off with a great talk by Prabhakar Raghavan, Google Engineering VP and one of the most important names in Information Retrieval (this book he wrote with Christopher Manning is still considered THE textbook of the field). He gave a quick, fun overview of IR and how it has developed over the last 20 years, starting with the first web crawls of 1994. Some key milestones to consider are:

  • the early emphasis on web search recall (i.e. the fraction of all relevant documents that a search actually returns),
  • the failure of these early web rankers (mainly due to the assumption that all content creators have the same motives for publishing on the Web – they do not; each agent writes in their own interest),
  • introduction of n-gram search (specifically bigrams for noun entities),
  • perhaps the most famous on this list, Google’s game-changing PageRank algorithm (measuring page quality independently of the query, through link patterns).
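The PageRank idea in that last bullet can be made concrete with a few lines of power iteration. This is a toy sketch of the original algorithm, not Google's production system, and the four-page link graph is made up for illustration:

```python
# Minimal PageRank via power iteration (toy example, not Google's implementation).
# links[i] lists the pages that page i links to.
links = {0: [1, 2], 1: [2], 2: [0], 3: [2]}
n = len(links)
d = 0.85                    # damping factor from the original PageRank paper
rank = [1.0 / n] * n        # start with a uniform distribution

for _ in range(100):
    new = [(1 - d) / n] * n
    for page, outgoing in links.items():
        share = rank[page] / len(outgoing)   # each page splits its rank among its outlinks
        for target in outgoing:
            new[target] += d * share
    rank = new

# Pages receiving many links from high-ranked pages score highest; the score
# depends only on the link structure, independently of any query.
print(rank)
```

Here page 2, which is linked to by three other pages, ends up with the highest score, while the unlinked page 3 gets only the baseline rank.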

He also outlined some of the current problems of interest in IR, specifically how to model the underlying need of the user behind the query (e.g. when someone googles a DSLR camera, it is implied they are interested in buying one). This switch from entity-based to semantically rich queries is the next frontier of web search, and Google is trying to reach it by employing the Knowledge Graph to browse through property links. This ties in nicely with the research we are doing with CrowdTruth: user data on page ranking is indeed noisy, but isn’t this some sort of indicator of the large variety of user needs?

"Crowdsourcing for ranking yields reasonable quality but extremely high variance" @WittedNote; maybe it's a feature ;) #CrowdTruth #iswc2014

— Anca Dumitrache (@anouk_anca) October 21, 2014

Yolanda Gil also gave an interesting talk on practical applications of semantic technology, this time in the field of organizing and scheduling software. She outlined 3 use cases to consider: (1) to-do lists, (2) knowledge-rich tasks in science, and (3) collaborative work environments, together with some challenges to consider for the future:

  • Can we build a semantically rich to-do list manager?
  • What about software coordinating multiple users’ lists?
  • Even more interesting, could we use linked scientific datasets of experimental methods and results to automatically generate research papers?
  • But perhaps a more realistic goal would be to use this data to build collaborative working environments for scientific research that are both open and do NOT rely on email (ok, maybe that is not very realistic either).

Like the previous keynote speaker, Gil also discussed the implied semantic relations in the data. For instance, to-do lists often lack the verb component (i.e. the semantic property), which has to be inferred from the types of the nouns in the list. Her proposed solutions also focus on understanding user (and crowd!) dynamics, collecting data through games with a purpose, or studying the way users organize their work, research or otherwise. For an overview of her work, check out this paper in Science.

Finally, Sir Nigel Shadbolt from the University of Southampton gave a talk on linked data use cases in open government (see also his talk at WS3 this year). The UK government has been doing a great job of publishing datasets of general interest, which Shadbolt and his team have turned into beautiful visualizations for public consumption. Success stories include:

  • birth rates, death rates, and obesity data, used to compare health trends across various cities (London seems to be doing much better than Manchester here),
  • prescription data released by physicians, showing an unsettling trend of GPs over-prescribing brand-name medication,
  • banking data, visualizing country size relative to the number of businesses that declare their taxes there (the Cayman Islands take the cake here),
  • the coordinates of fire stations in London, used for predictive analysis to make crisis management more efficient.

Once upon a time, this data was very difficult to get (you even had to pay for some of it), so the implications of making it freely available are quite big. And the work is far from over: the datasets published so far are only a small fraction of the information that governments currently possess. More data will also bring technical challenges; as data complexity increases, we need simplified interfaces to encourage users to keep publishing.

Selected talks and papers

Natural Language Processing was a big theme at this year’s conference, specifically how we can use NLP to generate and enrich data semantics. The conference had a great NLP track, but also several workshops around the theme, with papers published even in the Doctoral Consortium, showing that this is a topic of great future interest.

Roberto Navigli from the Sapienza University of Rome kicked off the discussions with his keynote in the NLP & DBpedia workshop. He gave an overview of the diverse problems that are tackled by his research group, from traditional NLP topics such as cross-lingual entity linking to more ambitious projects such as using Games with a Purpose to solve disambiguation. Some interesting ideas he explored are:

  • integrating knowledge from a variety of languages by combining the WordNet vocabulary with specialized concepts from Wikipedia,
  • using multiple languages to perform disambiguation for one specific language, using centrality measures in a network of cross-lingual entities,
  • attributing concepts to categories by identifying is-a relations in the entity network,
  • using gamification to build disambiguation sets WITHOUT having access to a gold standard (in their experiments, players performed slightly better than crowdsourcing, particularly for identifying negative examples).
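The cross-lingual centrality idea above can be sketched in a few lines. This is my own generic illustration of centrality-based disambiguation, not the actual algorithm or data used by Navigli's group; the node names and edges are invented for the example:

```python
# Toy sketch of centrality-based word sense disambiguation (generic illustration,
# not the actual BabelNet algorithm). The mention "bank" has two candidate senses;
# edges connect a candidate to translations/context entities in other languages
# that co-occur with it.
edges = {
    ("bank_financial", "banca_it"),    # Italian translation of the financial sense
    ("bank_financial", "banque_fr"),   # French translation of the financial sense
    ("bank_financial", "deposit_en"),  # English context word
    ("bank_river", "riva_it"),         # Italian translation of the river sense
}
candidates = ["bank_financial", "bank_river"]

def degree(node):
    # Degree centrality: how many cross-lingual neighbours support this sense.
    return sum(1 for a, b in edges if node in (a, b))

best = max(candidates, key=degree)
print(best)  # the sense with the most cross-lingual support wins
```

The point is that evidence from several languages accumulates on the correct sense, so a simple centrality measure over the multilingual network can already separate the candidates.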

How to deal with noise and disagreement in the semantic interpretation of data proved to be a recurring topic. Isabelle Augenstein from the University of Sheffield discussed a related topic in her Doctoral Consortium paper. Her work on extracting semantic relations from text in the musical domain using distant supervision (something we have also explored with CrowdTruth), then using them to train a relation extraction classifier, faces a familiar challenge: the performance of automated approaches is marred by ambiguous terms and incomplete knowledge bases. She addressed these issues by identifying unreliable data points in the training set, which generated good results in the evaluation. The question remains, though: what is the value of training on ambiguous data? Can we capture and model this disagreement in a good way? I suspect that, for more complex domains (like medical text), this could make a big difference in performance. Isabelle and I also had a nice chat during lunch :) and I am curious to try out her ambiguity metrics in comparison with our own CrowdTruth metrics.
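The distant supervision setup, and the noise it produces, can be shown in miniature. This is a generic sketch of the technique, not Augenstein's actual pipeline; the knowledge base entry and sentences are invented:

```python
# Toy sketch of distant supervision (generic illustration, not the actual
# pipeline from the paper): any sentence mentioning an entity pair known to
# the knowledge base is automatically labelled with that KB relation,
# yielding a cheap but noisy training set.
kb = {("The Beatles", "Liverpool"): "origin"}   # (entity1, entity2) -> relation

sentences = [
    "The Beatles formed in Liverpool in 1960.",
    "The Beatles played a concert in Liverpool.",  # mentions both, but not the 'origin' relation
]

training = []
for (e1, e2), relation in kb.items():
    for s in sentences:
        if e1 in s and e2 in s:
            training.append((s, relation))  # every co-occurrence inherits the KB label

print(training)
```

Both sentences end up labelled "origin" even though only the first one actually expresses it, which is exactly the kind of unreliable data point her approach tries to identify and filter out.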

Michelle Cheatham from Wright State University also discussed semantic ambiguity, this time from the perspective of the crowd. In her paper presented in the ontology alignment session, she discusses asking both a set of experts and the crowd on Amazon Mechanical Turk to perform ontology alignment tasks in the conference organization domain. Her experimental results have some important similarities to our own CrowdTruth work:

  • there is a LOT of disagreement, even between experts in the field (the only cases of clear agreement are between exact lexical matches),
  • crowdsourcing on AMT similarly generated a lot of noise, with the most complex sentences generating the most disagreement,
  • evaluation of this data using current benchmarks is made difficult by the assumption that there is always one correct answer — a mapping either exists or does not (and therefore most benchmark datasets are discrete, with 1 and 0 labels).

The conclusions of this work are very important — the gold standard for ontology alignment does not reflect the high degree of disagreement between domain experts (and the crowd). Again, I would be curious to try CrowdTruth metrics on this data, to see if/how the results change.
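To make the discrete-vs-graded point concrete, here is a toy illustration of my own — not the metrics from Cheatham's paper, and not the actual CrowdTruth formulas — showing what a 0/1 gold label throws away. The mapping names and votes are invented:

```python
# Toy illustration: graded agreement vs. a discrete 0/1 gold label.
# Each candidate ontology mapping has binary accept/reject votes from annotators.
votes = {
    "conference#Paper = edas#Paper":        [1, 1, 1, 1, 1],  # exact lexical match: clear agreement
    "conference#Chair = edas#SessionChair": [1, 0, 1, 0, 1],  # genuine disagreement
    "conference#Topic = edas#City":         [0, 0, 0, 0, 0],  # clear negative
}

for mapping, v in votes.items():
    support = sum(v) / len(v)           # fraction of annotators accepting the mapping
    gold = 1 if support >= 0.5 else 0   # what a discrete benchmark would record
    # distance from unanimity: 0.0 = full agreement, 0.5 = a coin flip
    ambiguity = min(support, 1 - support)
    print(f"{mapping}: support={support:.2f}, gold={gold}, ambiguity={ambiguity:.2f}")
```

The first two mappings receive the same gold label of 1, yet one is unanimous and the other is nearly a coin flip; a graded score preserves exactly the disagreement signal that a discrete benchmark erases.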

All in all, it was encouraging to see so many researchers tackling the same problems as us — I hope that by this time next year we will be able to set up some collaborations to extend our work in all of these different domains. Aside from that, there were many good presentations on a variety of topics; a few others I enjoyed include:

  • Gong Cheng’s work on Explass, exploring association search between linked-data entities by using top-K clustering to discover ontological patterns and facets,
  • Axel Ngonga’s best paper award work on AGDISTIS, a framework for multilingual named-entity disambiguation that can link entities against any Linked Data knowledge base.

CrowdTruth at ISWC

Last but not least, I want to talk about what we, the CrowdTruth team, did at the conference. We had a paper accepted in the Replication, Benchmark, Data and Software track about the CrowdTruth platform for managing crowdsourcing tasks and analyzing task ambiguity and inter-annotator disagreement. On Wednesday evening, we participated in the demo session, showing off our application. Here is our poster:

And here’s our conference booth:


Lora also presented our work in a short talk on Thursday:

Annotating data with ambiguity analysis on the #CrowdTruth platform – @laroyo talks at #iswc2014

— Anca Dumitrache (@anouk_anca) October 22, 2014

We received some nice feedback — everyone seemed impressed by how sleek the platform looks for an open source application! On the downside, there is still some work to be done to improve the user experience and make the platform more easily customizable for external users; this is a big goal for us for next year! Another big step for us: most of the visitors agreed with the theoretical premise of our work, which means that next time we will have to show them the hard numbers (i.e. experimental results).

Random thoughts

The workshops were particularly strong this year (almost as good as the main conference!), and the same goes for the Doctoral Consortium. Natasha Noy and Paul Groth gave some great advice there: reframe your research questions in terms of who cares about your work — this cannot be repeated often enough! The PhD mentoring lunch was also a success; it’s always nice to share stories and perspectives about what it’s like to work in academia. And finally, some travel highlights from the Lake Garda area:

  • Cascata Varone — a bit of a touristy site, but the actual waterfall in a cave is very impressive (Thomas Mann was also a fan),
  • the pasta and pizza are good, but you should also try the Limoncello,
  • many beautiful hiking spots, particularly if you’re into history: there are Austro-Hungarian military forts, World War I trenches, and chapels, and everything is at the top of a mountain.

