Workshop: “Narratology, Literature & Large Language Models”
École normale supérieure, Salle Jaurès, 29 rue d’Ulm 75005 Paris
Speakers: David Bamman (Berkeley), Evelyn Gius (Technical University Darmstadt), Enrique Manjavacas Arevalo (U. Leiden)
Pre-registration (required): https://forms.gle/yTmxufDLTtkUg1mF9
Talks will be held in English.
A Zoom link will be sent just before the workshop to registered participants who are unable to attend in person (depending on technical conditions – we recommend attending in person).
The workshop is organized with the support of EUR Translitterae (https://www.translitterae.psl.eu/) and PRAIRIE.
* 14h — Thierry Poibeau. Welcome and Introduction
* 14h05 — David Bamman (Berkeley): « The Promise and Peril of Large Language Models for Cultural Analytics »
Abstract: In this talk, I’ll discuss the role of large language models (such as ChatGPT, GPT-4 and open alternatives) in research in cultural analytics, both raising issues about the use of closed models for scholarly inquiry and charting the opportunities that such models present. I’ll discuss recent work carrying out a data archaeology to infer books that are known to ChatGPT and GPT-4 using a name cloze membership inference query, where we find that OpenAI models have memorized a wide collection of materials and that the degree of memorization is tied to the frequency with which passages of those books appear on the web. I’ll also detail the use of those models for downstream tasks in cultural analytics, illustrating their affordances for measuring difficult cultural phenomena, but also the risks that arise in establishing measurement validity. The rise of large pre-trained language models has the potential to radically transform the space of cultural analytics by both reducing the need for large-scale training data for new tasks and lowering the technical barrier to entry, but care is needed in establishing the reliability of results.
* 15h — Evelyn Gius (Technical University Darmstadt): « Events as minimal units in prose – A narrative theory-driven approach to event classification and narrativity »
Abstract: Narrative theory conceives of events as the smallest building blocks of narratives. Moreover, events are linked to plot by the concepts of tellability and narrativity. In this talk I will sketch an approach to narrativity and plot that builds on the different event concepts in narrative theory. While most approaches consider events to be changes of state, some theorists also include weaker notions in their definitions of events. By integrating these different accounts into our operationalization of events, we are working towards a strongly discourse-driven plot analysis. I will sketch our approach to event and narrativity analysis and discuss the implications for both narrative theory and applied computational narratology.
* 16h — Enrique Manjavacas Arevalo (U. Leiden): « Historical Language Models and their Application to Word Sense Disambiguation »
Abstract: Large Language Models (LLMs) have become the cornerstone of current methods in Computational Linguistics. As the Humanities look towards computational methods in order to analyse large quantities of text, the question arises as to how these models are best developed and applied to the specificities of their domains. In this talk, I will address the application of LLMs to Historical Languages, following up on the MacBERTh project. In the context of developing LLMs for Historical Languages, I will address how they can be efficiently fine-tuned to tackle the problem of Word Sense Disambiguation. In a series of experiments relying on data from the Oxford English Dictionary, I will highlight how non-parametric and metric learning approaches can be an interesting alternative to traditional fine-tuning methods that rely on classifiers learning to disambiguate specific lemmas.
David Bamman is an associate professor in the School of Information at UC Berkeley, where he works in the areas of natural language processing and cultural analytics, applying NLP and machine learning to empirical questions in the humanities and social sciences. His research focuses on improving the performance of NLP for underserved domains like literature (including LitBank and BookNLP) and exploring the affordances of empirical methods for the study of literature and culture. Before Berkeley, he received his PhD in the School of Computer Science at Carnegie Mellon University and was a senior researcher at the Perseus Project of Tufts University. Bamman’s work is supported by the National Endowment for the Humanities, National Science Foundation, the Mellon Foundation and an NSF CAREER award.
Evelyn Gius is a Professor of Digital Philology and Modern German Literature at Technical University Darmstadt and head of the fortext lab. Her research focuses on narrative theory, manual annotation, operationalization, segmentation, and conflict. She leads the development of the annotation platform CATMA as well as the platform fortext.net where beginner-friendly materials for Digital Humanities are provided. Her current research projects include EvENT, a project on events as minimal units of narration, and KatKit, a project on the operationalization of humanities concepts in the framework of applied category theory from mathematics.
Gius also serves as chair of the Digital Humanities Association for the German-speaking areas (“Digital Humanities im deutschsprachigen Raum”, DHd), as co-editor of the Open Access Journal of Computational Literary Studies (JCLS), and as co-editor of the Metzler/Springer Nature book series “Digital Literary Studies”.
Enrique Manjavacas Arevalo is currently a postdoctoral researcher at the University of Leiden, working on the MacBERTh project, which develops Large Language Models for Historical Languages. He obtained his PhD at the University of Antwerp (2021) with a dissertation on computational approaches to text reuse detection.