It’s been about a week since I got back from Australia, where I attended the International Semantic Web Conference (ISWC 2013). This is the premier forum for the latest in research on using semantics on the Web. Overall, it was a great conference – well run and with a good buzz. (Note, I’m probably a bit biased – I was chair of this year’s In-Use track.)
ISWC is a fairly hard conference to get into and the quality is strong.
More importantly, almost all the talks I went to were worth thinking about. You can find the proceedings of the conference online either as a complete zip here or published by Springer. You can find more stats on the conference here.
As an aside, before digging into the meat of the conference – Sydney was great. Really a fantastic city – very cosmopolitan and with great coffee. I suggest Single Origin Roasters. Also, Australia has wombats – wombats are like the chillest animal ever.
From my perspective, there were three main themes to take away from the conference:
- Impressive applications of semantic web technologies
- Core ontologies as the framework for connecting complex integration and retrieval tasks
- Starting to come to grips with messiness
We are really seeing how semantic technologies can power great applications. All three keynotes highlighted the use of Semantic Tech. I think Ramanathan Guha’s keynote probably highlighted this the best in his discussion of the growth of schema.org.
Beyond the slide above, he brought representatives from Yandex, Yahoo, and Microsoft on stage to join Google and tell how they are using schema.org. Drupal and WordPress will have schema.org in their cores in 2014. Schema.org is being used to drive everything from veteran-friendly job search to rich pins on Pinterest to letting OpenTable reservations be easily put into your calendar. So schema.org is clearly a success.
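For anyone who hasn't looked at schema.org markup, here is a rough sketch of what a reservation description along these lines might look like when expressed as JSON-LD. This is my own illustration, not something shown in the keynote: the helper name `reservation_jsonld` and all the values (restaurant, guest, time) are made up, and a real deployment would likely carry more properties.

```python
import json


def reservation_jsonld(restaurant, guest, start_time, party_size):
    """Build a schema.org FoodEstablishmentReservation as a JSON-LD dict.

    Hypothetical helper for illustration; the arguments are plain values
    that get wrapped in the corresponding schema.org types.
    """
    return {
        "@context": "https://schema.org",
        "@type": "FoodEstablishmentReservation",
        # The place the reservation is for.
        "reservationFor": {"@type": "FoodEstablishment", "name": restaurant},
        # The person the reservation is booked under.
        "underName": {"@type": "Person", "name": guest},
        # ISO 8601 timestamp, which is what a calendar would pick up.
        "startTime": start_time,
        "partySize": party_size,
    }


# Made-up example values.
doc = reservation_jsonld(
    "Example Bistro", "Jane Doe", "2013-10-25T19:30:00+11:00", 2
)
print(json.dumps(doc, indent=2))
```

The point is simply that the reservation becomes a small typed record with a shared vocabulary, which is what lets a mail client or calendar act on it without site-specific scraping.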
Peter Mika presented a paper on how Yahoo is using ontologies to drive entity recommendations in searches. For example, you search for Brad Pitt and they show you related entities like Angelina Jolie or Fight Club. The nice thing about the paper is that it showed how the deployment in production (in Yahoo! Web Search in the US) increases click-through rates.
I think it was probably Yves Raimond’s conference – he showed some amazing things being done at the BBC using semantic web technology. He had an excellent keynote at the COLD workshop – also highlighting some challenges on where we need to improve to ease the use of these technologies in production. I recommend you check out the slides above. Of all the applications, their work on mining the BBC World Service archive to enrich content being created stood out to me. This work won the Semantic Web Challenge.
In the biomedical domain, there were two papers showing how semantics can be embedded in tools that regular users use. One showed how the development of ICD-11 (ICD is the most widely used clinical classification, developed by the WHO) is supported using semtech. The other I liked was the use of Excel templates (developed using RightField) that transparently captured data according to a domain model for systems biology.
- Tania Tudorache, Csongor I Nyulas, Natasha F. Noy, Mark Musen
Using Semantic Web in ICD-11: Three Years Down the Road
- Katherine Wolstencroft, Stuart Owen, Olga Krebs, Quyen Nguyen, Jacky L. Snoep, Wolfgang Mueller, Carole Goble
Semantic Data and Models Sharing in Systems Biology: The Just Enough Results Model and the SEEK Platform
Finally, there was a neat application presented by Jane Hunter applying these technologies to art preservation: The Twentieth Century in Paint.
I did a review of all the in-use papers leading up to the conference, but suffice it to say that there were numerous impressive applications. Also, I think it says something about the health of the community when you see slides like this:
Core Ontologies + Other Methods
There were a number of interesting papers built around the idea of combining well-known ontologies with either record linkage or other machine learning methods to populate knowledge bases.
A paper that I like a lot (and that also won the best student paper award), Knowledge Graph Identification by Jay Pujara, Hui Miao, Lise Getoor and William Cohen, sums it up nicely:
Our approach, knowledge graph identification (KGI) combines the tasks of entity resolution, collective classification and link prediction mediated by rules based on ontological information.
Interesting papers under this theme were:
- Mohsen Taheriyan, Craig Knoblock, Pedro Szekely, José Luis Ambite
A Graph-Based Approach to Learn Semantic Descriptions of Data Sources
- Note to self: Karma is awesome!
- Daniel Gerber, Sebastian Hellmann, Lorenz Bühmann, Tommaso Soru, Axel-Cyrille Ngonga Ngomo, Ricardo Usbeck
Real-time RDF extraction from unstructured data streams
- Heiko Paulheim, Christian Bizer
Type Inference on Noisy RDF Data
- COLD workshop: On-the-fly Integration of Static and Dynamic Sources (Andreas Harth, Craig Knoblock, Steffen Stadtmüller, Rudi Studer and Pedro Szekely)
- Poster: Aldo Gangemi, Francesco Draicchio, Valentina Presutti, Andrea Giovanni Nuzzolese and Diego Reforgiato: A Machine Reader for the Semantic Web
From my perspective, it was also nice to see the use of the W3C Provenance Model (PROV) as one of these core ontologies in many different papers and two of the keynotes. People are using it as a substructure to build a number of different applications – I intend to write a whole post on this – but until then here’s proof by Twitter:
Coming to grips with messiness
It’s pretty evident that when dealing with the web, things are messy. There were a couple of papers that documented this empirically, either in terms of the availability of endpoints or just by looking at the heterogeneity of the markup available from web pages.
In some sense, the papers mentioned in the prior theme also try to deal with this messiness. Here are another couple of papers looking at essentially how to deal with or even use this messiness.
- Aibo Tian, Juan F. Sequeda, Daniel Miranker
QODI: Query as Context in Automatic Data Integration
- Alberto Tonon, Michele Catasta, Gianluca Demartini, Philippe Cudré-Mauroux, Karl Aberer
TRank: Ranking Entity Types Using the Web of Data
- Michelle Cheatham, Pascal Hitzler
String Similarity Metrics for Ontology Alignment
- Daniel M. Herzig, Roi Blanco, Peter Mika, Thanh Tran
Federated Entity Search using On-The-Fly Consolidation
One thing that seemed a lot more present in this year’s conference than last year was the term entity. This is obviously popular because of things like the Google Knowledge Graph – but in some sense maybe it gives a better description of what we are aiming to get out of the data we have – machine-readable descriptions of real-world concepts/things.
There are some things that are of interest that don’t fit neatly into the themes above. So I’ll just try a bulleted list.
- The workshops were very well attended. I attended the COLD workshop and organized the Linked Science Workshop (LISC). My colleague Albert has provided a nice workshop report on the SemStat workshop.
- We actually worked in the LISC workshop looking at how semtech can support scientific reproducibility. You can find videos, spreadsheets and our challenges to the community on figshare.
- I liked the methodological approach in Matthew Horridge, Tania Tudorache, Jennifer Vendetti, Csongor Nyulas, Mark Musen, Natasha F. Noy: Simplified Ontology Editing for the Web: Is WebProtege Enough? – something to remember when trying to develop and test user interfaces.
- The Semantic Web Jam Session was amazing.
- We won the Best Demo Paper Award for git2prov.org
- Our paper on using NoSQL stores for RDF went over very well. Congrats to Marcin for giving a good presentation.
- The format of mixing talks from different tracks by topic and having only 20 minutes per talk was great.
- VUA had a great showing – 3 main track papers, a bunch of workshop papers, a couple of different posters, 4 workshop organizers giving talks at the workshop summary session, 2 organizing committee members, alumni all over the place, plus a bunch of stuff I probably forgot to mention.
- The colocation with Web Directions South was great – it added a nice extra energy to the conference.
- There were best reviewer awards won by Oscar Corcho, Tania Tudorache, and Aidan Hogan
- Peter Fox seemed to give a keynote just for me – concept maps, PROV followed with abductive reasoning.
- Did I mention that the coffee in Sydney (and Newcastle) is really good and lots of places serve proper breakfast!