A fragmentary collection like the corpus of Ramesside papyri in the Museo Egizio in Turin is challenging for any expert: studying and analysing thousands of fragments, both front (recto) and back (verso), translating texts, identifying handwriting and finding potential joins consume a great deal of time, and no one can keep the writing features of the whole collection in mind throughout. As computers and algorithms grow in significance across all fields of science, it is a logical next step to use computer-based techniques to support and enhance the scientific analysis of such a large material corpus.
The main goal of this PhD research project, titled “The Classification and Reconstruction of Fragmentary Documents with Machine Learning. A Case Study in Ancient Egyptian Papyri”, is to use and develop state-of-the-art machine learning approaches to analyse the historical documents automatically. The algorithms will use not only features extracted from the RGB images (e.g. handwriting features, colours, layout and lines) but also additional meta-information provided by the Egyptological experts, such as transcriptions and translations of the texts. This data will be used to classify the individual fragments and, eventually, to propose joins and connections between them.
These machine learning results should be visible and useful to other scholars, so that they can base their own document reconstructions on the automatic proposals. The development of a “Virtual Light Table” is therefore also part of the PhD project. This tool will allow scholars to browse the available corpus and place a selection of fragments on a digital table, where they can be freely moved, rotated and flipped. Once available, the software will automatically propose the most likely candidates for additional joins to the fragments already on the table. Finally, users will be able to export their reconstructions and use them for further studies or publications. The “Virtual Light Table” will be published as open-source software at the end of the research project.
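To make the idea of proposing join candidates more concrete, the following is a minimal sketch and not the project's actual method. It assumes that each fragment has already been reduced to a numeric feature vector (for instance from handwriting or colour features) and ranks the fragments not yet on the table by their cosine similarity to those already placed. The function names, fragment identifiers and feature values are all hypothetical and serve only as an illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector carries no usable information
    return dot / (norm_a * norm_b)


def rank_join_candidates(on_table, corpus, top_k=3):
    """Rank corpus fragments by similarity to the fragments on the table.

    on_table and corpus map hypothetical fragment IDs to feature vectors.
    A candidate's score is its best similarity to any fragment on the table.
    """
    scores = []
    for frag_id, features in corpus.items():
        if frag_id in on_table:
            continue  # already placed, so not a candidate
        best = max(
            (cosine_similarity(features, placed) for placed in on_table.values()),
            default=0.0,
        )
        scores.append((frag_id, best))
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]


# Hypothetical toy data: three-dimensional feature vectors per fragment.
corpus = {
    "frag_A": [1.0, 0.0, 0.0],
    "frag_B": [0.9, 0.1, 0.0],
    "frag_C": [0.0, 1.0, 0.0],
    "frag_D": [0.5, 0.5, 0.0],
}
on_table = {"frag_A": corpus["frag_A"]}

candidates = rank_join_candidates(on_table, corpus, top_k=2)
for frag_id, score in candidates:
    print(f"{frag_id}: {score:.3f}")
```

A real system would presumably learn its features and similarity measure from the data and would also take physical constraints such as fragment edges and fibre patterns into account; the sketch only illustrates the ranking step.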