Elias Stengel-Eskin
- Papers: 26
Authored papers
Skill-Based Mixture-of-Experts: Adaptive Routing for Heterogeneous Reasoning via Inferred Skills
arXiv 2025
MERRIN: A Benchmark for Multimodal Evidence Retrieval and Reasoning in Noisy Web Environments
arXiv 2026
Playing Along: Learning a Double-Agent Defender for Belief Steering via Theory of Mind
arXiv 2026
Multimodal Fact-Level Attribution for Verifiable Reasoning
arXiv 2026
Cog-DRIFT: Exploration on Adaptively Reformulated Instances Enables Learning from Hard Reasoning Problems
arXiv 2026
On the Trustworthiness of Generative Foundation Models: Guideline, Assessment, and Perspective
arXiv 2025
Retrieval-Augmented Generation with Conflicting Evidence
arXiv 2025
PRInTS: Reward Modeling for Long-Horizon Information Seeking
arXiv 2025
CAPTURe: Evaluating Spatial Reasoning in Vision Language Models via Occluded Object Counting
ICCV 2025
Learning to Generate Unit Tests for Automated Debugging
arXiv 2025
RotBench: Evaluating Multimodal Large Language Models on Identifying Image Rotation
arXiv 2025
GenerationPrograms: Fine-grained Attribution with Executable Programs
arXiv 2025
UPCORE: Utility-Preserving Coreset Selection for Balanced Unlearning
arXiv 2025
The Sum Leaks More Than Its Parts: Compositional Privacy Risks and Mitigations in Multi-Agent Collaboration
arXiv 2025
One Life to Learn: Inferring Symbolic World Models for Stochastic Environments from Unguided Exploration
arXiv 2025
DataEnvGym: Data Generation Agents in Teacher Environments with Student Feedback
arXiv 2024
ReGAL: Refactoring Programs to Discover Generalizable Abstractions
arXiv 2024
Soft Self-Consistency Improves Language Model Agents
arXiv 2024
MAgICoRe: Multi-Agent, Iterative, Coarse-to-Fine Refinement for Reasoning
arXiv 2024
LACIE: Listener-Aware Finetuning for Confidence Calibration in Large Language Models
arXiv 2024
AdaCAD: Adaptively Decoding to Balance Conflicts between Contextual and Parametric Knowledge
arXiv 2024
See It from My Perspective: Diagnosing the Western Cultural Bias of Large Vision-Language Models in Image Understanding
arXiv 2024
GTBench: Uncovering the Strategic Reasoning Limitations of LLMs via Game-Theoretic Evaluations
arXiv 2024
Teaching Models to Balance Resisting and Accepting Persuasion
arXiv 2024
Rephrase, Augment, Reason: Visual Grounding of Questions for Vision-Language Models
arXiv 2023
Zero and Few-shot Semantic Parsing with Ambiguous Inputs
arXiv 2023
Frequent co-authors
10 from 26 papers