Scientific context
Humanities scholars have long sought to understand how past people thought, felt, and interpreted the world. Traditional methods, such as close reading, archival analysis, and philology, offer rich, interpretive insights, but they are often labor-intensive and limited in scope. Quantitative approaches such as word frequency analysis, topic modelling, and word embedding analysis have expanded the methodological toolkit, but they remain indirect proxies for psychological or cultural traits [1-4].
Recent advances in artificial intelligence (AI) have opened up novel avenues for understanding the human experience across time. Among the most intriguing frontiers is the development of Historical Large Language Models (HLLMs) — language models trained on corpora of historical texts [5]. These models offer the potential to simulate plausible psychological responses and cultural representations from individuals who lived in past societies, effectively creating populations of ‘virtual ancestors’. An HLLM trained on a specific corpus — say, 18th-century French political tracts, or Qing dynasty administrative documents — can respond to prompts with outputs that reflect the linguistic and conceptual patterns present in its training data. These simulated responses can be interrogated using psychological instruments or thematic surveys, generating data that, while artificial, may reveal the distribution of beliefs or values latent in a cultural moment. One could, for instance, estimate levels of authoritarianism, concern for purity, or belief in free will.
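As a concrete illustration, the sketch below shows one way such an interrogation might be implemented: a single survey-style item is posed repeatedly to a historical language model and the sampled answers are pooled into a rough score. The checkpoint name, prompt wording, generation settings, and scoring rule are placeholders for illustration, not a validated instrument; a publicly available model such as the one introduced below could stand in for the placeholder.

```python
# Minimal sketch: administer one survey-style item to an HLLM and aggregate
# the sampled answers. The model name, prompt, and scoring rule are placeholders.
import re
from transformers import pipeline

# Placeholder checkpoint; any causal HLLM (e.g. the model discussed below)
# could be substituted here.
generator = pipeline("text-generation", model="some-org/historical-llm")

ITEM = (
    "On a scale from 1 (strongly disagree) to 5 (strongly agree), "
    "how much do you agree with the statement: "
    "'Obedience to rightful authority is the foundation of a good society'? "
    "Answer with a single number."
)

def sample_scores(n_samples: int = 50) -> list[int]:
    """Pose the item n_samples times and keep answers that parse as 1-5."""
    scores = []
    for _ in range(n_samples):
        out = generator(ITEM, max_new_tokens=10, do_sample=True, temperature=0.8)
        # The pipeline returns the prompt plus the completion; strip the prompt
        # and look for the first digit between 1 and 5 in the completion.
        completion = out[0]["generated_text"][len(ITEM):]
        match = re.search(r"[1-5]", completion)
        if match:
            scores.append(int(match.group()))
    return scores

scores = sample_scores()
if scores:
    print(f"n = {len(scores)}, mean agreement = {sum(scores) / len(scores):.2f}")
```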
MonadGPT (https://huggingface.co/Pclanglais/MonadGPT) provides an example of what we have in mind. It is a fine-tuned version of the Mistral-Hermes 2 model, trained on a corpus of 11,000 early modern texts in English, French, and Latin, primarily sourced from Early English Books Online (EEBO) and Gallica. The model is designed to emulate the language and conceptual frameworks of the 17th century, offering insights into the discourse of that era.
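As a rough sketch of how such a model might be queried in practice, the snippet below loads the public checkpoint with the Hugging Face transformers library and poses a single question. The system prompt, the generation settings, and the assumption that the tokenizer ships a chat template are illustrative choices rather than documented requirements of the model.

```python
# Sketch: querying the MonadGPT checkpoint with the transformers library.
# System prompt and generation settings are illustrative; the chat template
# is assumed to be provided by the model's tokenizer.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Pclanglais/MonadGPT"
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
).to(device)

messages = [
    {"role": "system", "content": "You are MonadGPT, a chatbot from the 17th century."},
    {"role": "user", "content": "What causes the plague?"},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(device)

output = model.generate(inputs, max_new_tokens=200, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```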