The Computer Analysis of Latin Texts: Topic Modeling of “Historia de Regibus Gothorum, Vandalorum et Suevorum” by Isidore of Seville
Aleksey Kuznetsov 
Affiliation: Institute of World History RAS
Address: Russian Federation, Moscow

The article applies modern text mining methods to the Latin text of “Historia de regibus Gothorum, Vandalorum et Suevorum” by Isidore of Seville. In particular, it uses topic modeling to extract hidden semantic structures from the text. The main task of the study was to clarify Isidore of Seville's attitude toward the three barbarian peoples. The text was analyzed with the R programming language, and Latent Dirichlet Allocation (LDA), a probabilistic topic model, was chosen as the topic modeling method. The main research tool was the UDPipe package for R. The text was annotated for topic modeling with a pre-trained model created as part of the Universal Dependencies project and based on the Index Thomisticus treebank. In building the topic model, particular attention was paid to the quality of text preprocessing and to selecting the optimal number of topics on the basis of the topic coherence metric. The article concludes with an analysis of how the identified topics are distributed across the sections of Isidore's text.


Keywords: Isidore of Seville, early Middle Ages historiography, computational text analysis, topic modeling, Latent Dirichlet Allocation, topic coherence
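The abstract selects the number of LDA topics by topic coherence but does not name the specific coherence measure. The following is a minimal, illustrative sketch, not the author's R/UDPipe pipeline. It assumes the UMass coherence measure (log co-document frequency of a topic's top words) and assumes that the top words for each candidate number of topics k already come from separate LDA fits, which are not shown. The function names `umass_coherence` and `best_num_topics` are hypothetical.

```python
import math


def umass_coherence(top_words, documents):
    """UMass coherence of a single topic (assumed measure, for illustration).

    top_words: the topic's top terms, ordered from most to least probable.
    documents: token lists, e.g. lemmatized sections of the text.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents that contain all of the given words.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    # Pair each word with every higher-ranked word; the denominator is the
    # document frequency of the higher-ranked word, and +1 smooths zero counts.
    for m in range(1, len(top_words)):
        for l in range(m):
            d_l = doc_freq(top_words[l])
            if d_l == 0:
                continue  # skip words that never occur in the corpus
            score += math.log((doc_freq(top_words[m], top_words[l]) + 1) / d_l)
    return score


def best_num_topics(topics_by_k, documents):
    """Return the k whose topics have the highest mean UMass coherence.

    topics_by_k: dict mapping k to a list of top-word lists, one per topic,
    assumed to come from LDA models fitted with k topics.
    """
    def mean_coherence(topics):
        return sum(umass_coherence(t, documents) for t in topics) / len(topics)

    return max(topics_by_k, key=lambda k: mean_coherence(topics_by_k[k]))
```

UMass scores are at most zero, so values closer to zero indicate topics whose top words tend to occur in the same documents. Other coherence measures would fit the same selection loop.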
