Bill Yuchen Lin

Research scientist at Meta FAIR; previously at AI2. Created WildBench, WildChat, and ZebraLogic.

Role: researcher
Papers: 20

Authored papers

20

TinyV: Reducing False Negatives in Verification Improves RL for LLM Reasoning

arXiv 2025

2025

Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing

preprint

2024

RewardBench: Evaluating Reward Models for Language Modeling

arXiv 2024

2024

WildGuard: Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs

arXiv 2024

2024

The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism

arXiv 2024

2024

ASCIIEval: Benchmarking Models' Visual Perception in Text Strings via ASCII Art

arXiv 2024

2024

ChatBug: A Common Vulnerability of Aligned LLMs Induced by Chat Templates

arXiv 2024

2024

WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild

arXiv 2024

2024

Trial and Error: Exploration-Based Trajectory Optimization for LLM Agents

arXiv 2024

2024

SafeDecoding: Defending against Jailbreak Attacks via Safety-Aware Decoding

arXiv 2024

2024

LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition

arXiv 2023

2023

Faith and Fate: Limits of Transformers on Compositionality

2023

Personalized Soups: Personalized Large Language Model Alignment via Post-hoc Parameter Merging

arXiv 2023

2023

Agent Lumos: Unified and Modular Training for Open-Source Language Agents

arXiv 2023

2023

Inference-Time Policy Adapters (IPA): Tailoring Extreme-Scale LMs without Fine-tuning

arXiv 2023

2023

LLM-Blender: Ensembling Large Language Models with Pairwise Ranking and Generative Fusion

arXiv 2023

2023

Suspicion-Agent: Playing Imperfect Information Games with Theory of Mind Aware GPT-4

arXiv 2023

2023

TIGERScore: Towards Building Explainable Metric for All Text Generation Tasks

arXiv 2023

2023

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

TMLR

2022

Common Sense Beyond English: Evaluating and Improving Multilingual Language Models for Commonsense Reasoning

ACL 2021

2021

Frequent co-authors: 10 (from 20 papers)