Hinrich Schütze

Papers
49

Authored papers

49

Locate, Steer, and Improve: A Practical Survey of Actionable Mechanistic Interpretability in Large Language Models

arXiv 2026

2026

Crosslingual On-Policy Self-Distillation for Multilingual Reasoning

arXiv 2026

2026

Large Reasoning Models Are (Not Yet) Multilingual Latent Reasoners

arXiv 2026

2026

GlotOCR Bench: OCR Models Still Struggle Beyond a Handful of Unicode Scripts

arXiv 2026

2026

NoLiMa: Long-Context Evaluation Beyond Literal Matching

arXiv 2025

2025

How Programming Concepts and Neurons Are Shared in Code Language Models

arXiv 2025

2025

Understanding In-Context Machine Translation for Low-Resource Languages: A Case Study on Manchu

arXiv 2025

2025

ImpliRet: Benchmarking the Implicit Fact Retrieval Challenge

arXiv 2025

2025

On Relation-Specific Neurons in Large Language Models

arXiv 2025

2025

Tracing Multilingual Factual Knowledge Acquisition in Pretraining

arXiv 2025

2025

XCOMPS: A Multilingual Benchmark of Conceptual Minimal Pairs

arXiv 2025

2025

Consistent Document-Level Relation Extraction via Counterfactuals

arXiv 2024

2024

HiFT: A Hierarchical Full Parameter Fine-Tuning Strategy

arXiv 2024

2024

GlotCC: An Open Broad-Coverage CommonCrawl Corpus and Pipeline for Minority Languages

arXiv 2024

2024

TransliCo: A Contrastive Learning Framework to Address the Script Barrier in Multilingual Pretrained Language Models

arXiv 2024

2024

Political Compass or Spinning Arrow? Towards More Meaningful Evaluations for Values and Opinions in Large Language Models

arXiv 2024

2024

TurkishMMLU: Measuring Massive Multitask Language Understanding in Turkish

arXiv 2024

2024

TransMI: A Framework to Create Strong Baselines from Multilingual Pretrained Language Models for Transliterated Data

arXiv 2024

2024

LangSAMP: Language-Script Aware Multilingual Pretraining

arXiv 2024

2024

CRAFT Your Dataset: Task-Specific Synthetic Dataset Generation Through Corpus Retrieval and Augmentation

arXiv 2024

2024

MEXA: Multilingual Evaluation of English-Centric LLMs via Cross-Lingual Alignment

arXiv 2024

2024

MemLLM: Finetuning LLMs to Use An Explicit Read-Write Memory

arXiv 2024

2024

MaskLID: Code-Switching Language Identification through Iterative Masking

arXiv 2024

2024

How Transliterations Improve Crosslingual Alignment

arXiv 2024

2024

MURI: High-Quality Instruction Tuning Datasets for Low-Resource Languages via Reverse Instructions

arXiv 2024

2024

MaiBaam: A Multi-Dialectal Bavarian Universal Dependency Treebank

arXiv 2024

2024

ToPro: Token-Level Prompt Decomposition for Cross-Lingual Sequence Labeling Tasks

arXiv 2024

2024

A Unified Data Augmentation Framework for Low-Resource Multi-Domain Dialogue Generation

arXiv 2024

2024

GlotLID: Language Identification for Low-Resource Languages

arXiv 2023

2023

RET-LLM: Towards a General Read-Write Memory for Large Language Models

arXiv 2023

2023

A Survey of Corpora for Germanic Low-Resource Languages and Dialects

arXiv 2023

2023

GlotScript: A Resource and Tool for Low Resource Writing System Identification

arXiv 2023

2023

OFA: A Framework of Initializing Unseen Subword Embeddings for Efficient Large-scale Multilingual Continued Pretraining

arXiv 2023

2023

Taxi1500: A Multilingual Dataset for Text Classification in 1500 Languages

arXiv 2023

2023

mPLM-Sim: Better Cross-Lingual Similarity and Transfer in Multilingual Pretrained Language Models

arXiv 2023

2023

LongForm: Effective Instruction Tuning with Reverse Instructions

arXiv 2023

2023

Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages

arXiv 2023

2023

GIRT-Data: Sampling GitHub Issue Report Templates

arXiv 2023

2023

MenuCraft: Interactive Menu System Design with Large Language Models

arXiv 2023

2023

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

TMLR

2022

Graph-Based Multilingual Label Propagation for Low-Resource Part-of-Speech Tagging

arXiv 2022

2022

Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP

arXiv 2021

2021

Multilingual LAMA: Investigating Knowledge in Multilingual Pretrained Language Models

EACL 2021

2021

Data Centric Domain Adaptation for Historical Text with OCR Errors

arXiv 2021

2021

SimAlign: High Quality Word Alignments without Parallel Training Data using Static and Contextualized Embeddings

Findings of the Association for Computational Linguistics 2020

2020

It's Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners

NAACL 2021

2020

Attentive Mimicking: Better Word Embeddings by Attending to Informative Contexts

2019

BERTRAM: Improved Word Embeddings Have Big Impact on Contextualized Model Performance

2019

textTOvec: Deep Contextualized Neural Autoregressive Topic Models of Language with Distributed Compositional Prior

2018

Affiliations

No known affiliations.
