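Középkori okleveles adatbázisok adatbányászati módszerekkel való kutatásának módszertani problémáiról (On the Methodological Problems of Researching Medieval Charter Databases with Data-Mining Methods)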

Authors: Péter Szász

Publication: Medievisztikai Vándorkonferencia Tanulmányok 3.

Published: Jan 1, 2026

Source: Crossref


Abstract

In this study, I present a preliminary outline of my doctoral research topic. My goal is to develop a research methodology that supports the study of calendar/regesta-based source editions using artificial intelligence and text-mining tools. The source base is the corpus of charters published in the Anjou-kori Oklevéltár (Documents of the Angevin Period in Hungary; AÓkl), issued between 1301 and 1342, that is, during the reign of Charles I of Hungary. The corpus was completed in 2025 after 35 years of work. The study presents the main characteristics of the calendars/regesta published in the Anjou-kori Oklevéltár, as well as the challenges and limitations of text cleaning prior to digital processing. Particular attention is paid to so-called “dirty data”, a consequence of poorly executed text cleaning. Possible approaches to text preprocessing are also discussed. I briefly outline the fundamental characteristics of text-mining methods and describe the procedures that I intend to use in developing the methodology: named entity recognition, n-gram analysis, topic modeling, and TF-IDF. Finally, I present a case study based on my own research that concisely illustrates these problems and possibilities.
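Two of the procedures named in the abstract, n-gram analysis and TF-IDF, can be illustrated with a minimal sketch. It is not the author's pipeline: the sample sentences are invented English stand-ins for regesta (not taken from the AÓkl), the helper names `tokenize`, `ngrams`, and `tf_idf` are hypothetical, and the weighting shown is the plain textbook variant (relative term frequency times the logarithm of inverse document frequency), assumed here for illustration only.

```python
import math
import re
from collections import Counter

# Invented, regesta-like sample sentences (assumption: NOT from the AÓkl).
DOCS = [
    "The king donates the estate to the chapter.",
    "The chapter reports to the king on the estate boundary.",
    "The convent transcribes the charter of the king.",
]


def tokenize(text):
    """Lowercase the text and keep runs of letters only.

    The pattern is Unicode-aware, so accented Hungarian letters survive,
    while digits and punctuation are dropped (a crude text-cleaning step).
    """
    return re.findall(r"[^\W\d_]+", text.lower())


def ngrams(tokens, n):
    """Return all contiguous n-token sequences as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tf_idf(docs_tokens):
    """Textbook TF-IDF: (count / document length) * log(N / document frequency)."""
    n_docs = len(docs_tokens)
    doc_freq = Counter()
    for tokens in docs_tokens:
        doc_freq.update(set(tokens))  # count each term once per document
    scores = []
    for tokens in docs_tokens:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


tokens = [tokenize(doc) for doc in DOCS]
bigram_counts = Counter(bg for doc in tokens for bg in ngrams(doc, 2))
weights = tf_idf(tokens)

print(bigram_counts.most_common(3))
print(sorted(weights[2].items(), key=lambda kv: kv[1], reverse=True)[:3])
```

In this toy corpus, words that occur in every text (such as "the" or "king") receive a weight of zero, while words confined to a single text score highest. The same property means that spelling variants left behind by incomplete text cleaning would be counted as separate, artificially rare terms, which is one way "dirty data" could distort such measures.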

Keywords

study research text present outline

