Ivan Vulić

Cross-Tokenizer Distillation via Approximate Likelihood Matching

arXiv 2025

RAVENEA: A Benchmark for Multimodal Retrieval-Augmented Visual Culture Understanding

arXiv 2025

Lost in Embeddings: Information Loss in Vision-Language Models

arXiv 2025

Language Fusion for Parameter-Efficient Cross-lingual Transfer

arXiv 2025

Fleurs-SLU: A Massively Multilingual Benchmark for Spoken Language Understanding

arXiv 2025

Agentic Policy Optimization via Instruction-Policy Co-Evolution

arXiv 2025

Fairer Preferences Elicit Improved Human-Aligned Large Language Model Judgments

arXiv 2024

Retrofitting Large Language Models with Dynamic Tokenization

arXiv 2024

Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators

arXiv 2024

TopViewRS: Vision-Language Models as Top-View Spatial Reasoners

arXiv 2024

Segment Any Text: A Universal Approach for Robust, Efficient and Adaptable Sentence Segmentation

arXiv 2024

Pheme: Efficient and Conversational Speech Generation

arXiv 2024

Zero-Shot Tokenizer Transfer

arXiv 2024

Scaling Sparse Fine-Tuning to Large Language Models

arXiv 2024

DIALIGHT: Lightweight Multilingual Development and Evaluation of Task-Oriented Dialogue Systems with Large Language Models

arXiv 2024

Adapters: A Unified Library for Parameter-Efficient and Modular Transfer Learning

arXiv 2023

On Task Performance and Model Calibration with Supervised and Self-Ensembled In-Context Learning

arXiv 2023

Distilling Efficient Language-Specific Models for Cross-Lingual Transfer

arXiv 2023

Free Lunch: Robust Cross-Lingual Transfer via Model Checkpoint Averaging

arXiv 2023

AutoPEFT: Automatic Configuration Search for Parameter-Efficient Fine-Tuning

arXiv 2023

Multi3WOZ: A Multilingual, Multi-Domain, Multi-Parallel Dataset for Training and Evaluating Culturally Adapted Task-Oriented Dialog Systems

arXiv 2023

Survival of the Most Influential Prompts: Efficient Black-Box Prompt Search via Clustering and Pruning

arXiv 2023

CompoundPiece: Evaluating and Improving Decompounding Performance of Language Models

arXiv 2023

NLU++: A Multi-Label, Slot-Rich, Generalisable Dataset for Natural Language Understanding in Task-Oriented Dialogue

Findings (NAACL) 2022 7

2022

EVI: Multilingual Spoken Dialogue Tasks and Dataset for Knowledge-Based Enrolment, Verification, and Identification

Findings (NAACL) 2022 7

2022

Parameter-Efficient Neural Reranking for Cross-Lingual and Multilingual Retrieval

COLING 2022 10

2022

Composable Sparse Fine-Tuning for Cross-Lingual Transfer

ACL 2022 5

2021

Fast, Effective, and Self-Supervised: Transforming Masked Language Models into Universal Lexical and Sentence Encoders

EMNLP 2021 11

2021

xGQA: Cross-Lingual Visual Question Answering

Findings (ACL) 2022 5

2021

AdapterHub: A Framework for Adapting Transformers

EMNLP 2020 11

XCOPA: A Multilingual Dataset for Causal Commonsense Reasoning

EMNLP 2020 11

How Good is Your Tokenizer? On the Monolingual Performance of Multilingual Language Models

ACL 2021 5

UNKs Everywhere: Adapting Multilingual Language Models to New Scripts

EMNLP 2021 11