Yanghua Xiao

ARM: Adaptive Reasoning Model

arXiv 2025

CoSER: Coordinating LLM-Based Persona Simulation of Established Roles

arXiv 2025

Reward Shaping to Mitigate Reward Hacking in RLHF

arXiv 2025

Beyond the Trade-off: Self-Supervised Reinforcement Learning for Reasoning Models' Instruction Following

arXiv 2025

ARIA: Training Language Agents with Intention-Driven Reward Aggregation

arXiv 2025

AdaptiveLog: An Adaptive Log Analysis Framework with the Collaboration of Large and Small Language Model

arXiv 2025

MCiteBench: A Multimodal Benchmark for Generating Text with Citations

arXiv 2025

ToReMi: Topic-Aware Data Reweighting for Dynamic Pre-Training Data Selection

arXiv 2025

A Stitch in Time Saves Nine: Proactive Self-Refinement for Language Models

arXiv 2025

Mind the Generation Process: Fine-Grained Confidence Estimation During LLM Generation

arXiv 2025

Order Matters: Investigate the Position Bias in Multi-constraint Instruction Following

arXiv 2025

Step-by-Step Mastery: Enhancing Soft Constraint Following Ability of Large Language Models

arXiv 2025

TravelPlanner: A Benchmark for Real-World Planning with Language Agents

arXiv 2024

AutoScraper: A Progressive Understanding Web Agent for Web Scraper Generation

arXiv 2024

From Complex to Simple: Enhancing Multi-Constraint Complex Instruction Following Ability of Large Language Models

arXiv 2024

How Easily do Irrelevant Inputs Skew the Responses of Large Language Models?

arXiv 2024

Revealing the Barriers of Language Agents in Planning

arXiv 2024

ESC-Eval: Evaluating Emotion Support Conversations in Large Language Models

arXiv 2024

QUILL: Quotation Generation Enhancement of Large Language Models

arXiv 2024

Past Meets Present: Creating Historical Analogy with Large Language Models

arXiv 2024

MultiLingPoT: Enhancing Mathematical Reasoning with Multilingual Program Fine-tuning

arXiv 2024

InCharacter: Evaluating Personality Fidelity in Role-Playing Agents through Psychological Interviews

arXiv 2023

Xiezhi: An Ever-Updating Benchmark for Holistic Domain Knowledge Evaluation

arXiv 2023

Can Large Language Models Understand Real-World Complex Instructions?

arXiv 2023

Distilling Script Knowledge from Large Language Models for Constrained Language Planning

arXiv 2023

BBT-Fin: Comprehensive Construction of Chinese Financial Domain Pre-trained Language Model, Corpus and Benchmark

arXiv 2023

QUERT: Continual Pre-training of Language Model for Query Understanding in Travel Domain Search

arXiv 2023