Our paper “Harnessing Diversity in Crowds and Machines for Better NER Performance” (Oana Inel and Lora Aroyo) has been accepted for the ESWC 2017 Research Track. The paper is to be published together with the proceedings of the conference.
Over the last years, information extraction tools have gained a great popularity and brought significant improvement in performance in extracting meaning from structured or unstructured data. For example, named entity recognition (NER) tools identify types such as people, organizations or places in text. However, despite their high F1 performance, NER tools are still prone to brittleness due to their highly specialized and constrained input and training data. Thus, each tool is able to extract only a subset of the named entities (NE) mentioned in a given text. In order to improve NE Coverage, we propose a hybrid approach, where we first aggregate the output of various NER tools and then validate and extend it through crowdsourcing. The results from our experiments show that this approach performs significantly better than the individual state-of-the-art tools (including existing tools that integrate individual outputs already). Furthermore, we show that the crowd is quite effective in (1) identifying mistakes, inconsistencies and ambiguities in currently used ground truth, as well as in (2) a promising approach to gather ground truth annotations for NER that capture a multitude of opinions.