- Status: In Progress
- Type: Specific
- Work Package: WP6
- Research Coordinators: Lodewijk Petram
- Coordinators for CLARIAH: Katrien Depuydt, Jesse de Does
- Participating Institutes: HuC, VU, KB, INT, DANS
- End-users: Linguists, historians
- Developers: Sophie Arnoult, Dirk Roorda, Jesse de Does, ....
- Interest Groups: Text
- Task IDs: WP3/6 task Infrastructure for Historical Dutch
In the CLARIAH+ WP6 meeting on Tuesday 16 April 2019, it was decided to further develop an idea for a joint use case. The aim of this use case is to align the tools and methods of the different partners in WP6. The VOC was chosen as the topic, on the one hand because of the rich and versatile source material available about this company and its activities, and on the other hand because of the challenging and relevant historical research questions that can be answered with this use case.
How can CLARIAH text processing tools contribute to historical research questions such as:
- What shifts took place in the VOC's presence in the East Indies and in the Company's interaction with local rulers and their subjects (1600-1800)?
- How did the networks of VOC employees in Asia develop?
- How did the way in which official VOC documents wrote about the local East Indies population, and about the interaction between VOC employees and the local population, develop?
- How did the way in which secondary literature wrote about the VOC's presence in the East Indies, the local East Indies population and the interaction between the VOC employees and the local population develop?
- How did the way in which newspapers, popular magazines and pamphlets wrote about the VOC's presence in the East Indies, the local East Indies population and the interaction between VOC personnel and the local population develop?
Suitable tools were lacking for historical text processing, among others:
- For named entity recognition and resolution
- Basic linguistic annotation (lemma, PoS)
- Train better NER
- Apply state-of-the-art tools for historical enrichment mediated by the WP3/6 task Infrastructure for Historical Dutch, which may include tools like PIE and Deepfrog
Among others:
- Generale Missiven (TEI data converted from ABBYY XML, also NAF and Text-Fabric representations)
- Pieter van Dam, Beschryvinge van de Oostindische Compagnie
- Dagh-register gehouden int Casteel Batavia vant passerende daer ter plaetse als over geheel Nederlandts-India (1624-1682, published 1887-1931)
- De dagregisters van het kasteel Zeelandia, Taiwan (1629-1662, published 1986-2000)
- several relevant books, newspaper articles and periodicals
- a range of available relevant structured data sources
Currently deployed:
- Conversion pipeline ABBYY XML --> TEI --> NAF/XMI (Sophie Arnoult and Jesse de Does)
- Named entity recognition (developed by Sophie Arnoult)
- Text-Fabric analysis tools
- Others to be determined by lead researchers
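To illustrate how the TEI stage of the conversion pipeline could feed downstream tools such as named entity recognition, here is a minimal sketch that pulls plain paragraph text out of a TEI document using only the Python standard library. It is not the project's actual pipeline code: the function name `tei_paragraphs` and the sample fragment are invented for illustration, and the sketch assumes the text sits in `<p>` elements in the standard TEI namespace.

```python
import xml.etree.ElementTree as ET

# Standard TEI P5 namespace; assumed to be the one used by the converted data.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical TEI fragment, invented for illustration only
# (not taken from the Generale Missiven).
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <p>Het schip is op <date>3 maart</date> uit Batavia vertrokken.</p>
      <p>De lading bestond uit peper.</p>
    </body>
  </text>
</TEI>"""


def tei_paragraphs(tei_xml: str) -> list[str]:
    """Return the plain text of every TEI <p> element, in document order."""
    root = ET.fromstring(tei_xml)
    paragraphs = []
    for p in root.iterfind(".//tei:p", TEI_NS):
        # itertext() also yields text nested in inline markup such as <date>;
        # split/join collapses line breaks and repeated spaces.
        text = " ".join("".join(p.itertext()).split())
        paragraphs.append(text)
    return paragraphs


if __name__ == "__main__":
    for number, paragraph in enumerate(tei_paragraphs(SAMPLE_TEI), start=1):
        print(number, paragraph)
```

The resulting list of strings could then be passed to a tagger or written to another representation; real editions are likely to need extra handling for notes, page breaks and editorial markup.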
(if known, what existing software and services are involved, which need to be developed? Please link to the tools if possible and specify whether it can be used as is, needs extra work, needs to be developed from scratch etc.)
Evaluation by researcher.
References to related resources and publications and especially links to related use-cases: