Xuanjing Huang
- Papers: 94
Authored papers
World Action Models: The Next Frontier in Embodied AI
arXiv 2026
AI Can Learn Scientific Taste
arXiv 2026
Reward Hacking in the Era of Large Models: Mechanisms, Emergent Misalignment, Challenges
arXiv 2026
ToolCUA: Towards Optimal GUI-Tool Path Orchestration for Computer Use Agents
arXiv 2026
Locate, Steer, and Improve: A Practical Survey of Actionable Mechanistic Interpretability in Large Language Models
arXiv 2026
CL-bench: A Benchmark for Context Learning
arXiv 2026
Muse: Towards Reproducible Long-Form Song Generation with Fine-Grained Style Control
arXiv 2026
FRoM-W1: Towards General Humanoid Whole-Body Control with Language Instructions
arXiv 2026
SciAgentGym: Benchmarking Multi-Step Scientific Tool-use in LLM Agents
arXiv 2026
LLMEval-Logic: A Solver-Verified Chinese Benchmark for Logical Reasoning of LLMs with Adversarial Hardening
arXiv 2026
OpenNovelty: An LLM-powered Agentic System for Verifiable Scholarly Novelty Assessment
arXiv 2026
CCTU: A Benchmark for Tool Use under Complex Constraints
arXiv 2026
Outcome Accuracy is Not Enough: Aligning the Reasoning Process of Reward Models
arXiv 2026
OdysseyArena: Benchmarking Large Language Models For Long-Horizon, Active and Inductive Interactions
arXiv 2026
Which Reasoning Trajectories Teach Students to Reason Better? A Simple Metric of Informative Alignment
arXiv 2026
AstroReason-Bench: Evaluating Unified Agentic Planning across Heterogeneous Space Planning Problems
arXiv 2026
Can Deep Research Agents Find and Organize? Evaluating the Synthesis Gap with Expert Taxonomies
arXiv 2026
Code2Logic: Game-Code-Driven Data Synthesis for Enhancing VLMs General Reasoning
arXiv 2025
SocioVerse: A World Model for Social Simulation Powered by LLM Agents and A Pool of 10 Million Real-World Users
arXiv 2025
WorldPM: Scaling Human Preference Modeling
arXiv 2025
Thus Spake Long-Context Large Language Model
arXiv 2025
A Multi-Dimensional Constraint Framework for Evaluating and Improving Instruction Following in Large Language Models
arXiv 2025
EcoLANG: Efficient and Effective Agent Communication Language Induction for Social Simulation
arXiv 2025
Better Process Supervision with Bi-directional Rewarding Signals
arXiv 2025
Nex-N1: Agentic Models Trained via a Unified Ecosystem for Large-Scale Environment Construction
arXiv 2025
AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning
arXiv 2025
Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm
arXiv 2025
BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping
arXiv 2025
Pre-Trained Policy Discriminators are General Reward Models
arXiv 2025
BMMR: A Large-Scale Bilingual Multimodal Multi-Discipline Reasoning Dataset
arXiv 2025
Feedback-Driven Tool-Use Improvements in Large Language Models via Automated Build Environments
arXiv 2025
R-Horizon: How Far Can Your Large Reasoning Model Really Go in Breadth and Depth?
arXiv 2025
Critique-RL: Training Language Models for Critiquing through Two-Stage Reinforcement Learning
arXiv 2025
Measuring Data Diversity for Instruction Tuning: A Systematic Analysis and A Reliable Metric
arXiv 2025
EgoNight: Towards Egocentric Vision Understanding at Night with a Challenging Benchmark
arXiv 2025
Multi-hop Reasoning via Early Knowledge Alignment
arXiv 2025
PFDial: A Structured Dialogue Instruction Fine-tuning Method Based on UML Flowcharts
arXiv 2025
Multi-Agent Simulator Drives Language Models for Legal Intensive Interaction
arXiv 2025
EasyJailbreak: A Unified Framework for Jailbreaking Large Language Models
arXiv 2024
Searching for Best Practices in Retrieval-Augmented Generation
arXiv 2024
Distill Visual Chart Reasoning Ability from LLMs to MLLMs
arXiv 2024
From Individual to Society: A Survey on Social Simulation Driven by Large Language Model-based Agents
arXiv 2024
Length Generalization of Causal Transformers without Position Encoding
arXiv 2024
StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback
arXiv 2024
Multi-Programming Language Sandbox for LLMs
arXiv 2024
Llama Scope: Extracting Millions of Features from Llama-3.1-8B with Sparse Autoencoders
arXiv 2024
Cross-Modality Safety Alignment
arXiv 2024
Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning
arXiv 2024
Are Large Language Models Good Prompt Optimizers?
arXiv 2024
Secrets of RLHF in Large Language Models Part II: Reward Modeling
arXiv 2024
AgentGym: Evolving Large Language Model-based Agents across Diverse Environments
arXiv 2024
LongHeads: Multi-Head Attention is Secretly a Long Context Processor
arXiv 2024
MouSi: Poly-Visual-Expert Vision-Language Models
arXiv 2024
ToolEyes: Fine-Grained Evaluation for Tool Learning Capabilities of Large Language Models in Real-world Scenarios
arXiv 2024
Exploring the Compositional Deficiency of Large Language Models in Mathematical Reasoning
arXiv 2024
VLSBench: Unveiling Visual Leakage in Multimodal Safety
arXiv 2024
RMB: Comprehensively Benchmarking Reward Models in LLM Alignment
arXiv 2024
SPA-VL: A Comprehensive Safety Preference Alignment Dataset for Vision Language Model
arXiv 2024
Mitigating Tail Narrowing in LLM Self-Improvement via Socratic-Guided Sampling
arXiv 2024
Beyond Boundaries: Learning a Universal Entity Taxonomy across Datasets and Languages for Open Named Entity Recognition
arXiv 2024
Synergistic Multi-Agent Framework with Trajectory Learning for Knowledge-Intensive Tasks
arXiv 2024
EmbSpatial-Bench: Benchmarking Spatial Understanding for Embodied Tasks with Large Vision-Language Models
arXiv 2024
ALaRM: Align Language Models via Hierarchical Rewards Modeling
arXiv 2024
TransferTOD: A Generalizable Chinese Multi-Domain Task-Oriented Dialogue System with Transfer Capabilities
arXiv 2024
Case2Code: Learning Inductive Reasoning with Synthetic Data
arXiv 2024
LongAgent: Scaling Language Models to 128k Context through Multi-Agent Collaboration
arXiv 2024
AgentSense: Benchmarking Social Intelligence of Language Agents through Interactive Scenarios
arXiv 2024
Aggregation of Reasoning: A Hierarchical Framework for Enhancing Answer Selection in Large Language Models
arXiv 2024
F-Eval: Assessing Fundamental Abilities with Refined Evaluation Methods
arXiv 2024
On the Tip of the Tongue: Analyzing Conceptual Representation in Large Language Models with Reverse-Dictionary Probe
arXiv 2024
TL-Training: A Task-Feature-Based Framework for Training Large Language Models in Tool Use
arXiv 2024
SafeAligner: Safety Alignment against Jailbreak Attacks via Response Disparity Guidance
arXiv 2024
Self-Demos: Eliciting Out-of-Demonstration Generalizability in Large Language Models
arXiv 2024
DISC-FinLLM: A Chinese Financial Large Language Model based on Multiple Experts Fine-tuning
arXiv 2023
LoRAMoE: Alleviate World Knowledge Forgetting in Large Language Models via MoE-Style Plugin
arXiv 2023
DISC-LawLLM: Fine-tuning Large Language Models for Intelligent Legal Services
arXiv 2023
DISC-MedLLM: Bridging General Large Language Models and Real-World Medical Consultation
arXiv 2023
Orthogonal Subspace Learning for Language Model Continual Learning
arXiv 2023
The Rise and Potential of Large Language Model Based Agents: A Survey
arXiv 2023
Do Large Language Models Know What They Don't Know?
arXiv 2023
TRACE: A Comprehensive Benchmark for Continual Learning in Large Language Models
arXiv 2023
CodeIE: Large Code Generation Models are Better Few-Shot Information Extractors
arXiv 2023
Self-Polish: Enhance Reasoning in Large Language Models via Problem Refinement
arXiv 2023
Aligning Large Language Models with Human Preferences through Representation Engineering
arXiv 2023
From Hypergraph Energy Functions to Hypergraph Neural Networks
arXiv 2023
Exchange-of-Thought: Enhancing Large Language Model Capabilities through Cross-Model Communication
arXiv 2023
CoNT: Contrastive Neural Text Generation
arXiv 2022
DiffusionBERT: Improving Generative Masked Language Models with Diffusion Models
arXiv 2022
BBTv2: Towards a Gradient-Free Future with Large Language Models
arXiv 2022
Towards Efficient NLP: A Standard Evaluation and A Strong Baseline
NAACL 2022
Pre-trained Models for Natural Language Processing: A Survey
arXiv 2020
Tasty Burgers, Soggy Fries: Probing Aspect Robustness in Aspect-Based Sentiment Analysis
EMNLP 2020
How to Fine-Tune BERT for Text Classification?
arXiv 2019
GlossBERT: BERT for Word Sense Disambiguation with Gloss Knowledge