Insights in (Spoken) Multilingual Machine Translation: examining Continuous Learning and Fairness
Speaker: Marta Ruiz Costa-Jussa, Universitat Politècnica de Catalunya
Marta R. Costa-Jussà is a Ramon y Cajal Researcher at the Universitat Politècnica de Catalunya (UPC, Barcelona). She received her PhD from the UPC in 2008. Her research experience is mainly in Machine Translation. She has worked at LIMSI-CNRS (Paris), Barcelona Media Innovation Center, Universidade de São Paulo, Institute for Infocomm Research (Singapore), Instituto Politécnico Nacional (Mexico) and the University of Edinburgh. Recently, she has received an ERC Starting Grant 2020 and two Google Faculty Research Awards (2018 and 2019).
Multilingual Machine Translation is at the core of social communication. In everyday situations, we rely on free commercial services. These systems have improved their quality thanks to the use of deep learning techniques. Despite the considerable progress that machine translation is making, why do we still see that translation quality is much better in English to Portuguese than between spoken Dutch and Catalan? In addition to this, there are demographic biases widely affecting our systems e.g., from poorer speech recognition for women than for men to stereotyped translations, why neutral words as “doctor” tend to infer the “male” gender when translated into a language that requires gender flexion for this word?
In this talk, we will give some profound insights into (spoken) multilingual machine translation pursuing similar quality for all languages and allowing for incremental addition of new languages. Moreover, we will give details on the fairness challenge, focusing on producing multilingual balanced data in terms of gender; working towards transparency; and debiasing algorithms.