Shihan Dou
- Papers: 26
Authored papers (26)
Reward Hacking in the Era of Large Models: Mechanisms, Emergent Misalignment, Challenges
arXiv 2026
CL-bench: A Benchmark for Context Learning
arXiv 2026
Muse: Towards Reproducible Long-Form Song Generation with Fine-Grained Style Control
arXiv 2026
SciAgentGym: Benchmarking Multi-Step Scientific Tool-use in LLM Agents
arXiv 2026
FRoM-W1: Towards General Humanoid Whole-Body Control with Language Instructions
arXiv 2026
LLMEval-Logic: A Solver-Verified Chinese Benchmark for Logical Reasoning of LLMs with Adversarial Hardening
arXiv 2026
OpenNovelty: An LLM-powered Agentic System for Verifiable Scholarly Novelty Assessment
arXiv 2026
Can Deep Research Agents Find and Organize? Evaluating the Synthesis Gap with Expert Taxonomies
arXiv 2026
Pre-Trained Policy Discriminators are General Reward Models
arXiv 2025
Measuring Data Diversity for Instruction Tuning: A Systematic Analysis and A Reliable Metric
arXiv 2025
PFDial: A Structured Dialogue Instruction Fine-tuning Method Based on UML Flowcharts
arXiv 2025
EasyJailbreak: A Unified Framework for Jailbreaking Large Language Models
arXiv 2024
StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback
arXiv 2024
RMB: Comprehensively Benchmarking Reward Models in LLM Alignment
arXiv 2024
TransferTOD: A Generalizable Chinese Multi-Domain Task-Oriented Dialogue System with Transfer Capabilities
arXiv 2024
Linear Alignment: A Closed-form Solution for Aligning Human Preferences without Tuning and Feedback
arXiv 2024
DocFusion: A Unified Framework for Document Parsing Tasks
arXiv 2024
Aligning Large Language Models from Self-Reference AI Feedback with one General Principle
arXiv 2024
Secrets of RLHF in Large Language Models Part II: Reward Modeling
arXiv 2024
Multi-Programming Language Sandbox for LLMs
arXiv 2024
Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning
arXiv 2024
MouSi: Poly-Visual-Expert Vision-Language Models
arXiv 2024
ToolEyes: Fine-Grained Evaluation for Tool Learning Capabilities of Large Language Models in Real-world Scenarios
arXiv 2024
SafeAligner: Safety Alignment against Jailbreak Attacks via Response Disparity Guidance
arXiv 2024
LoRAMoE: Alleviate World Knowledge Forgetting in Large Language Models via MoE-Style Plugin
arXiv 2023
The Rise and Potential of Large Language Model Based Agents: A Survey
arXiv 2023
Frequent co-authors: 10 (from 26 papers)