GLOBALISE newsletter 5

13 December 2022
Review of 2022, new blog post and announcements
Blog: "The (Ground) Truth is out there: an introduction to GLOBALISE’s use of Handwritten Text Recognition"
Did you know that Abrekocken are a type of apricot?

GLOBALISE colleagues Maartje Hids and Kay Pepping share valuable HTR lessons in the latest blog post.
Call for Data Contributions

Historical reference data are a crucial part of our project: we collect available information about, for example, historical persons, places, and commodities, and link it to related mentions in archival records. These data also help to correctly categorize events and entities.

We are currently looking for data related to polities, persons, places, weights, measures and currencies – pertaining to the sphere of VOC activity, namely the Indian Ocean World, in the period between 1600 and 1800.

Datasets need not necessarily be based on VOC archives and can also draw on other early modern sources.

For more detailed information, see the PDF or write to us.
It has been (almost) a year…

… of guest lectures, seminars, dataset creation, HTR fine-tuning, annotation trials, papers (see most recent), and blogs.

This year, we dotted the i’s and crossed the t’s (pun intended) of our HTR Ground Truth set, and refined our NER annotation process.

We started work on four separate datasets for commodities, places, polities, and weights and measures. In addition, we published a research dataset on VOC prices in Dutch-Asian trade in our Dataverse.

We also held our very first user panel meeting with researchers who work with the VOC archives.

The team will continue this work in the new year – after some well-deserved rest 😉
Get to know us!

At GLOBALISE, we work in a growing team of historians and developers. We would like you to get to know the people behind the project better, which is why we will introduce each of our colleagues in this and upcoming newsletters. This time: Brecht Nijman from work package 3: Historical Contextualisation, and Sophie Arnoult from work package 4: Semantic Contextualisation.
Brecht Nijman
I am a data scientist and historian, and joined GLOBALISE as a junior researcher in the Historical Contextualisation team earlier this fall.

At the moment, I am trying to wrap my head around the many weights, measures, and currencies appearing in the VOC archives in order to start setting up a related dataset and conversion tool. This has already added a number of niche measurements to my vocabulary.
In my free time I enjoy playing music, and I have in fact perfected the essential skill of playing the accordion upside down. Up next: upside-down data entry.
Sophie Arnoult
I am a computational linguist at the VU, working in the Computational Linguistics and Text Mining Lab, as well as for the research office of the Faculty of Humanities.

I work on entity modelling for GLOBALISE as part of the Semantic Contextualisation team: one of our goals is to annotate entities in texts – ships, persons, commodities, etc. – and train machine learning models to learn and identify these entities and link them to knowledge graphs or thesauri.

I sing in the VU choir in my spare time, and I like to talk about Natural Language Processing, language, and classical music among other things.
Holiday lights…

While most of us retreat from work to rest and/or celebrate during this time of year, our server will be working overtime.

Over the holiday period, we plan to run the GLOBALISE HTR model on some 5,000,000 VOC scans to produce automated transcriptions of each page – stay tuned for the results!

Happy Holidays!