Hinrich Schütze
- Papers: 49
Authored papers
Locate, Steer, and Improve: A Practical Survey of Actionable Mechanistic Interpretability in Large Language Models
arXiv 2026
Crosslingual On-Policy Self-Distillation for Multilingual Reasoning
arXiv 2026
Large Reasoning Models Are (Not Yet) Multilingual Latent Reasoners
arXiv 2026
GlotOCR Bench: OCR Models Still Struggle Beyond a Handful of Unicode Scripts
arXiv 2026
NoLiMa: Long-Context Evaluation Beyond Literal Matching
arXiv 2025
How Programming Concepts and Neurons Are Shared in Code Language Models
arXiv 2025
Understanding In-Context Machine Translation for Low-Resource Languages: A Case Study on Manchu
arXiv 2025
ImpliRet: Benchmarking the Implicit Fact Retrieval Challenge
arXiv 2025
On Relation-Specific Neurons in Large Language Models
arXiv 2025
Tracing Multilingual Factual Knowledge Acquisition in Pretraining
arXiv 2025
XCOMPS: A Multilingual Benchmark of Conceptual Minimal Pairs
arXiv 2025
Consistent Document-Level Relation Extraction via Counterfactuals
arXiv 2024
HiFT: A Hierarchical Full Parameter Fine-Tuning Strategy
arXiv 2024
GlotCC: An Open Broad-Coverage CommonCrawl Corpus and Pipeline for Minority Languages
arXiv 2024
TransliCo: A Contrastive Learning Framework to Address the Script Barrier in Multilingual Pretrained Language Models
arXiv 2024
Political Compass or Spinning Arrow? Towards More Meaningful Evaluations for Values and Opinions in Large Language Models
arXiv 2024
TurkishMMLU: Measuring Massive Multitask Language Understanding in Turkish
arXiv 2024
TransMI: A Framework to Create Strong Baselines from Multilingual Pretrained Language Models for Transliterated Data
arXiv 2024
LangSAMP: Language-Script Aware Multilingual Pretraining
arXiv 2024
CRAFT Your Dataset: Task-Specific Synthetic Dataset Generation Through Corpus Retrieval and Augmentation
arXiv 2024
MEXA: Multilingual Evaluation of English-Centric LLMs via Cross-Lingual Alignment
arXiv 2024
MemLLM: Finetuning LLMs to Use An Explicit Read-Write Memory
arXiv 2024
MaskLID: Code-Switching Language Identification through Iterative Masking
arXiv 2024
How Transliterations Improve Crosslingual Alignment
arXiv 2024
MURI: High-Quality Instruction Tuning Datasets for Low-Resource Languages via Reverse Instructions
arXiv 2024
MaiBaam: A Multi-Dialectal Bavarian Universal Dependency Treebank
arXiv 2024
ToPro: Token-Level Prompt Decomposition for Cross-Lingual Sequence Labeling Tasks
arXiv 2024
A Unified Data Augmentation Framework for Low-Resource Multi-Domain Dialogue Generation
arXiv 2024
GlotLID: Language Identification for Low-Resource Languages
arXiv 2023
RET-LLM: Towards a General Read-Write Memory for Large Language Models
arXiv 2023
A Survey of Corpora for Germanic Low-Resource Languages and Dialects
arXiv 2023
GlotScript: A Resource and Tool for Low Resource Writing System Identification
arXiv 2023
OFA: A Framework of Initializing Unseen Subword Embeddings for Efficient Large-scale Multilingual Continued Pretraining
arXiv 2023
Taxi1500: A Multilingual Dataset for Text Classification in 1500 Languages
arXiv 2023
mPLM-Sim: Better Cross-Lingual Similarity and Transfer in Multilingual Pretrained Language Models
arXiv 2023
LongForm: Effective Instruction Tuning with Reverse Instructions
arXiv 2023
Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages
arXiv 2023
GIRT-Data: Sampling GitHub Issue Report Templates
arXiv 2023
MenuCraft: Interactive Menu System Design with Large Language Models
arXiv 2023
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
TMLR
Graph-Based Multilingual Label Propagation for Low-Resource Part-of-Speech Tagging
arXiv 2022
Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP
arXiv 2021
Multilingual LAMA: Investigating Knowledge in Multilingual Pretrained Language Models
EACL 2021
Data Centric Domain Adaptation for Historical Text with OCR Errors
arXiv 2021
SimAlign: High Quality Word Alignments without Parallel Training Data using Static and Contextualized Embeddings
Findings of the Association for Computational Linguistics 2020
It's Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners
NAACL 2021
Attentive Mimicking: Better Word Embeddings by Attending to Informative Contexts
BERTRAM: Improved Word Embeddings Have Big Impact on Contextualized Model Performance
textTOvec: Deep Contextualized Neural Autoregressive Topic Models of Language with Distributed Compositional Prior
Frequent co-authors
10 from 49 papers