ALMAnaCH launches CamemBERT

CamemBERT is based on the RoBERTa architecture and was trained on 138 GB of French text. It establishes a new state of the art in part-of-speech (POS) tagging, dependency parsing and named entity recognition (NER), and achieves strong results in natural language inference (NLI). On most of these tasks it improves on previous monolingual and multilingual approaches, which confirms the effectiveness of large pretrained language models for French.

CamemBERT is the result of joint work between Inria and Facebook Research. It was trained and evaluated by Louis Martin, Benjamin Muller, Pedro Javier Ortiz Suárez, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot.

More information: https://camembert-model.fr