0

Shihan Dou

Papers
26

Cite

Notes

Only stored in your browser.

Attribution

Affiliations & profile
Semantic Scholar
Attribution policy →
26papers

Authored papers

26

Reward Hacking in the Era of Large Models: Mechanisms, Emergent Misalignment, Challenges

arXiv 2026

2026

CL-bench: A Benchmark for Context Learning

arXiv 2026

2026

Muse: Towards Reproducible Long-Form Song Generation with Fine-Grained Style Control

arXiv 2026

2026

SciAgentGym: Benchmarking Multi-Step Scientific Tool-use in LLM Agents

arXiv 2026

2026

FRoM-W1: Towards General Humanoid Whole-Body Control with Language Instructions

arXiv 2026

2026

LLMEval-Logic: A Solver-Verified Chinese Benchmark for Logical Reasoning of LLMs with Adversarial Hardening

arXiv 2026

2026

OpenNovelty: An LLM-powered Agentic System for Verifiable Scholarly Novelty Assessment

arXiv 2026

2026

Can Deep Research Agents Find and Organize? Evaluating the Synthesis Gap with Expert Taxonomies

arXiv 2026

2026

Pre-Trained Policy Discriminators are General Reward Models

arXiv 2025

2025

Measuring Data Diversity for Instruction Tuning: A Systematic Analysis and A Reliable Metric

arXiv 2025

2025

PFDial: A Structured Dialogue Instruction Fine-tuning Method Based on UML Flowcharts

arXiv 2025

2025

EasyJailbreak: A Unified Framework for Jailbreaking Large Language Models

arXiv 2024

2024

StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback

arXiv 2024

2024

RMB: Comprehensively Benchmarking Reward Models in LLM Alignment

arXiv 2024

2024

TransferTOD: A Generalizable Chinese Multi-Domain Task-Oriented Dialogue System with Transfer Capabilities

arXiv 2024

2024

Linear Alignment: A Closed-form Solution for Aligning Human Preferences without Tuning and Feedback

arXiv 2024

2024

DocFusion: A Unified Framework for Document Parsing Tasks

arXiv 2024

2024

Aligning Large Language Models from Self-Reference AI Feedback with one General Principle

arXiv 2024

2024

Secrets of RLHF in Large Language Models Part II: Reward Modeling

arXiv 2024

2024

Multi-Programming Language Sandbox for LLMs

arXiv 2024

2024

Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning

arXiv 2024

2024

MouSi: Poly-Visual-Expert Vision-Language Models

arXiv 2024

2024

ToolEyes: Fine-Grained Evaluation for Tool Learning Capabilities of Large Language Models in Real-world Scenarios

arXiv 2024

2024

SafeAligner: Safety Alignment against Jailbreak Attacks via Response Disparity Guidance

arXiv 2024

2024

LoRAMoE: Alleviate World Knowledge Forgetting in Large Language Models via MoE-Style Plugin

arXiv 2023

2023

The Rise and Potential of Large Language Model Based Agents: A Survey

arXiv 2023

2023

Affiliations

No known affiliations.

Frequent co-authors

10

from 26 papers