Linking Birds: Converting the IOC World Bird List to RDF

In SEALINCMedia presentations about Accurator we often use the example of a print described as “bird near red leaf”. Although this description captures what is seen in the print, it could be much more precise. It leaves open questions such as what sort of bird is depicted and what kind of leaf the red leaf is.

This is an ideal case for the Accurator framework. We engage the appropriate niche (bird enthusiasts) to help annotate the bird prints of the Rijksmuseum with bird names from a structured vocabulary. The only problem was that we did not have such a structured vocabulary at hand.

This is where the experts at Naturalis came in. They pointed us to the IOC World Bird List and the Nederlands Soortenregister (the Dutch Species Register), and provided us with data from their own specimen collection. Since we aim to integrate these different datasets to create a comprehensive list of birds, we turned to RDF. In this blog post I describe the conversion of the IOC list.

The IOC World Bird List is available in multiple file formats. Using the ClioPatria server extended with the xmlrdf package, I started the conversion process by loading the available XML file. The xmlrdf package automatically turns the hierarchy embedded in the XML into a graph structure. Using rewrite rules such as the one below, the graph can be refined.

common_name_property @@
{ A, birds:englishName, B }
<=>
{ A, txn:commonName, B@en }.

As you can see, the rule above replaces the property created by xmlrdf with one from the TaxonConcept ontology. This ontology contains many concepts that are useful for modelling species data, and I reused as many of them as possible. Initially all the concepts in the graph are blank nodes. Using the same sort of rewrite rules, I created IRIs that consist of the namespace, the level in the hierarchy (e.g. genus or species) and the scientific name, for example birds:species-phoenicurus_auroreus.
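To make the two steps above concrete for readers who do not use xmlrdf, here is a minimal Python sketch of their effect on plain triples. It is not the xmlrdf implementation. The namespace URL, the helper names `rewrite_common_name` and `mint_iri`, the sample triples and the exact normalisation of the scientific name (lowercase, underscores) are assumptions made for illustration.

```python
import re

# Hypothetical namespace; the post does not state the real one.
BIRDS = "http://example.org/birds/"


def rewrite_common_name(triples):
    """Mimic the rule: replace birds:englishName with txn:commonName
    and tag the literal as English."""
    rewritten = []
    for subject, predicate, obj in triples:
        if predicate == "birds:englishName":
            rewritten.append((subject, "txn:commonName", (obj, "en")))
        else:
            rewritten.append((subject, predicate, obj))
    return rewritten


def mint_iri(level, scientific_name):
    """Build an IRI from namespace, hierarchy level and scientific name."""
    # Lowercase the name and collapse anything non-alphanumeric into "_".
    slug = re.sub(r"[^a-z0-9]+", "_", scientific_name.strip().lower()).strip("_")
    return f"{BIRDS}{level}-{slug}"


# Illustrative input: a blank node as it might look before refinement.
triples = [
    ("_:b1", "birds:englishName", "Daurian Redstart"),
    ("_:b1", "txn:scientificName", "Phoenicurus auroreus"),
]

rewritten = rewrite_common_name(triples)
species_iri = mint_iri("species", "Phoenicurus auroreus")
genus_iri = mint_iri("genus", "Phoenicurus")

print(rewritten)
print(species_iri)
print(genus_iri)
```

Deriving the IRI from the scientific name keeps it stable and predictable, which is what makes it possible to look up a species later from other sources that only share that name.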

Another useful resource is available on the IOC website: a spreadsheet with bird names in 19 different languages. Using the scientific names I found the corresponding species IRI in the graph and added the different commonNames with the corresponding language tags. An example of information linked to the birds:species-phoenicurus_auroreus resource:

Predicate                  Value
rdf:type                   txn:SpeciesConcept
txn:authority              “(Pallas, 1776)”
birds:breedingRegions      “EU”
birds:breedingSubregions   “c,e”
txn:commonName             “rehek mongolský”@cs, “Amurrødstjert”@da,
                           “Spiegelrotschwanz”@de, “Daurian Redstart”@en,
                           “Colirrojo Dáurico”@es, “mustselglepalind”@et,
                           “laaksoleppälintu”@fi, “Rougequeue aurore”@fr,
                           “tükrös rozsdafarkú”@hu, “Codirosso daurico”@it,
                           “ジョウビタキ”@ja, “Spiegelroodstaart”@nl,
                           “Aurorarødstjert”@no, “pleszka chińska”@pl,
                           “Сибирская горихвостка”@ru, “žltochvost zrkadlový”@sk,
                           “Svartryggad rödstjärt”@sv, “北红尾鸲”@zh
txn:inGenus                birds:genus-phoenicurus
birds:nonbreedingRegions   “s China, ne India”
txn:scientificName         “Phoenicurus auroreus”
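
The multilingual step described above, matching spreadsheet rows to species by scientific name and attaching language-tagged common names, could look roughly like the following Python sketch. The column layout of the spreadsheet, the `species_by_name` index and the `add_common_names` helper are assumptions for illustration. The actual conversion was done with ClioPatria, not with this code.

```python
import csv
import io

# Hypothetical excerpt of the spreadsheet; the real column layout may differ.
SHEET = """scientific_name,en,nl,de
Phoenicurus auroreus,Daurian Redstart,Spiegelroodstaart,Spiegelrotschwanz
Unknown species,Some Bird,,
"""

# Species already present in the graph, keyed by scientific name (illustrative).
species_by_name = {
    "Phoenicurus auroreus": "birds:species-phoenicurus_auroreus",
}


def add_common_names(sheet_text, index):
    """Return txn:commonName triples plus the names that found no species."""
    triples, unmatched = [], []
    for row in csv.DictReader(io.StringIO(sheet_text)):
        name = row.pop("scientific_name")
        iri = index.get(name)
        if iri is None:
            # No species with this scientific name in the graph: report it.
            unmatched.append(name)
            continue
        # Every remaining column header is used as the language tag.
        for lang, label in row.items():
            if label:
                triples.append((iri, "txn:commonName", f'"{label}"@{lang}'))
    return triples, unmatched


triples, unmatched = add_common_names(SHEET, species_by_name)
for triple in triples:
    print(*triple)
print("unmatched:", unmatched)
```

Collecting the unmatched names separately is a deliberate choice in this sketch: differences in spelling or taxonomy between the two sources would otherwise be dropped silently.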

Many of the objects are currently literals, while some of them could be linked to external vocabularies. Linking the regions to GeoNames is something I will look into in the future, although parsing the more specific regions will be troublesome (e.g. “w slope of the e Andes in c Colombia”).
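
As a first impression of what such parsing might involve, the Python sketch below splits a region literal into a compass qualifier and a place name, and sets aside anything more complex. The meanings assigned to the abbreviations (for example "c" as central) and the `parse_regions` helper are assumptions and are not taken from the IOC documentation. The lookup of the resulting place names in GeoNames is deliberately left out.

```python
# Assumed meanings of the abbreviations used in the region literals.
COMPASS = {
    "n": "north", "s": "south", "e": "east", "w": "west", "c": "central",
    "ne": "northeast", "nw": "northwest", "se": "southeast", "sw": "southwest",
}


def parse_regions(text):
    """Split a region literal into (qualifier, place) pairs.

    Only the simple shape "<abbreviation> <Place Name>" is handled;
    every other part is returned unchanged in the second list.
    """
    parsed, unparsed = [], []
    for part in text.split(","):
        words = part.split()
        if not words:
            continue
        qualifier = None
        if words[0] in COMPASS:
            qualifier = COMPASS[words[0]]
            words = words[1:]
        # Accept the part only if what remains looks like a plain place name.
        if words and all(word[0].isupper() for word in words):
            parsed.append((qualifier, " ".join(words)))
        else:
            unparsed.append(part.strip())
    return parsed, unparsed


simple, simple_rest = parse_regions("s China, ne India")
hard, hard_rest = parse_regions("w slope of the e Andes in c Colombia")

print(simple, simple_rest)
print(hard, hard_rest)
```

The second example ends up entirely in the unparsed list, which illustrates why the more specific descriptions would need a richer approach than this.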

In a follow-up blog post I will describe the conversion of the Naturalis collection data to RDF and how I link that information to the IOC World Bird List. This conversion was done at the Web & Media group at VU University Amsterdam. If the work sparked your interest, have a look at my site.

Source: Chris Dijkshoorn
