The Koninklijke Bibliotheek launched Delpher, a search engine giving access to millions of historical text resources, ranging from magazines and newspapers to books. The data is available via a public API and can also be downloaded as a complete dataset.
To us, Delpher is a great example of the opportunities that lie in making data publicly available. It serves a wide variety of users, from researchers in historical fields (art history, general history, anthropology, etc.) to a more general public who would like to look up something from their own past.
It is a tremendous effort to digitize, store and make available these enormous amounts of information. To make it accessible without annotation, Delpher uses Optical Character Recognition to build full-text indexes.
While the technological feats are great, it is important to be critical as well. First, Delpher states that it chooses quantity over quality and that the technology it uses is not yet capable of precise OCR, let alone of recognizing context or meaning. How these challenges are currently addressed is unclear.
Playing around with Delpher quickly shows that it is slow. This could be addressed by scaling up resources, by using different or better search algorithms, or both. Because Delpher is not open about either, it is unclear which approach would pay off more in terms of search time and investment cost.
A quick test also shows that Delpher can return a very large number of results. Querying "Philips" returns almost a million results. It is doubtful that these are all relevant, and in such a case, without filtering, prioritizing and ranking of results, the search engine becomes difficult to use and understand.
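To make the point about ranking concrete, here is a minimal sketch of one common way to filter and order search results: a simple TF-IDF score. This is not how Delpher works (its ranking is not documented in this post); the function names `tokenize` and `rank` and the three sample documents are invented purely for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def rank(query, documents):
    """Return (score, index) pairs for matching documents, best first."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    scored = []
    for index, tokens in enumerate(tokenized):
        if not tokens:
            continue
        counts = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if counts[term] == 0:
                continue
            tf = counts[term] / len(tokens)  # term frequency in this document
            # Smoothed inverse document frequency: rarer terms weigh more.
            idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0
            score += tf * idf
        if score > 0:  # filtering: drop documents without any query term
            scored.append((score, index))
    return sorted(scored, reverse=True)


# Invented sample texts, not real Delpher data.
documents = [
    "Philips opens a new lamp factory in Eindhoven",
    "Weather report: rain expected; Mr. Philips attended the council "
    "meeting on many other matters today",
    "Football results from Sunday",
]

for score, index in rank("Philips factory", documents):
    print(f"{score:.3f}  {documents[index]}")
```

In this toy example the short article that is actually about Philips and a factory ends up above the long text that only mentions the name in passing, and the unrelated text is dropped. A real engine would need far more than this (OCR noise handling, date and source filters, and so on), but even a basic score like this gives users a place to start in a list of a million hits.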
We also argue that the interaction and presentation of the front end are not very modern and that the overall UX quality is lacking. The aim of the project is to make historical texts available to the public, and with that comes the responsibility to make the data consumable.
At this time, the website still carries the BETA label, and we are curious to see improvements over time. Delpher is a great tool for anyone interested in historical texts and for data scientists. Head over and take a look at http://www.delpher.nl.