Barcelona Summer School 2025

The second Summer School of the PortADa project took place between January and March 2025, hosted by the University of Barcelona, and focused on a key stage in the project’s development: the refinement of extracted data. This phase marked the transition from generating text corpora to preparing them for structured analysis.

During this meeting, essential tasks were carried out to advance the processing of reports of ship arrivals at ports, including:

  • The optimisation of segmentation procedures, which allow for the correct organisation of the content on each digitised page.
  • The completion of the OCR process on digitised historical press images, resolving errors and refining the automatic recognition procedure.
  • The start of text refinement tasks, based on representative samples of the corpus. This work made it possible to identify and categorise frequent errors and to establish differentiated strategies depending on the type of entity mentioned (ports, vessels, people).
  • The design and use of dictionaries and controlled word lists, both for normalisation and for the automatic detection of errors and ambiguities.

At the end of the Summer School, an academic seminar was held, featuring international experts involved in leading projects in maritime history and in the application of digital technologies to the large-scale processing of data from digitised historical newspapers. The presentations included:

  • Silvia Marzagalli (Université Côte d’Azur – The Portic Project): The Portic experience: understanding maritime trade through digital humanities.
  • Miguel Ángel Bringas Gutiérrez (University of Cantabria): Commercial Traffic through the Port of Santander, 1881–1891.
  • Antoine Doucet (University of Ljubljana / La Rochelle Université): NewsEye – a digital investigator for historical newspapers.

These exchanges made it possible to compare the methodological approaches of the PortADa project with similar experiences, enrich the refinement process, and open up new avenues for collaboration.

As a result of this Summer School, the full text of virtually all newspapers corresponding to the study period was successfully obtained, an important milestone for the project. In addition, a work plan was outlined for the next stage: the development of an interactive digital platform that will enable collaborative refinement, validation, and data extraction tasks to be carried out in a distributed manner across the various research teams.

The Barcelona Summer School thus consolidated a new phase of the project, focused on the transition from text to data and on strengthening the tools required for a rigorous and sustainable exploitation of the generated corpus.