François Yvon
- Papers: 17
Authored papers (17)
GlotOCR Bench: OCR Models Still Struggle Beyond a Handful of Unicode Scripts
arXiv 2026
On Relation-Specific Neurons in Large Language Models
arXiv 2025
Understanding In-Context Machine Translation for Low-Resource Languages: A Case Study on Manchu
arXiv 2025
Tracing Multilingual Factual Knowledge Acquisition in Pretraining
arXiv 2025
How Programming Concepts and Neurons Are Shared in Code Language Models
arXiv 2025
MEXA: Multilingual Evaluation of English-Centric LLMs via Cross-Lingual Alignment
arXiv 2024
MaskLID: Code-Switching Language Identification through Iterative Masking
arXiv 2024
How Transliterations Improve Crosslingual Alignment
arXiv 2024
CroissantLLM: A Truly Bilingual French-English Language Model
arXiv 2024
MOSAIC: Multiple Observers Spotting AI Content, a Robust Approach to Machine-Generated Text Detection
arXiv 2024
GlotCC: An Open Broad-Coverage CommonCrawl Corpus and Pipeline for Minority Languages
arXiv 2024
GlotLID: Language Identification for Low-Resource Languages
arXiv 2023
Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages
arXiv 2023
Assessing Word Importance Using Models Trained for Semantic Tasks
arXiv 2023
GlotScript: A Resource and Tool for Low Resource Writing System Identification
arXiv 2023
Graph-Based Multilingual Label Propagation for Low-Resource Part-of-Speech Tagging
arXiv 2022
SimAlign: High Quality Word Alignments without Parallel Training Data using Static and Contextualized Embeddings
Findings of the Association for Computational Linguistics 2020
Frequent co-authors
10 from 17 papers
Hinrich Schütze
Amir Hossein Kargaran
Yihong Liu
Ayyoob Imani
Masoud Jalili Sabet
Mingyang Wang
André F. T. Martins
Chunlan Ma
Jana Diesner
Nafiseh Nikeghbal