Hongning Wang

Parametric Retrieval Augmented Generation

arXiv 2025

Guiding not Forcing: Enhancing the Transferability of Jailbreaking Attacks on LLMs via Removing Superfluous Constraints

arXiv 2025

SocialEval: Evaluating Social Intelligence of Large Language Models

arXiv 2025

Crisp: Cognitive Restructuring of Negative Thoughts through Multi-turn Supportive Dialogues

arXiv 2025

Be Careful When Fine-tuning On Open-Source LLMs: Your Fine-tuning Data Could Be Secretly Stolen!

arXiv 2025

Data-Efficient RLVR via Off-Policy Influence Guidance

arXiv 2025

Trust-Region Adaptive Policy Optimization

arXiv 2025

LongSafety: Evaluating Long-Context Safety of Large Language Models

arXiv 2025

How Should We Enhance the Safety of Large Reasoning Models: An Empirical Study

arXiv 2025

Human Decision-making is Susceptible to AI-driven Manipulation

arXiv 2025

Evaluating Intelligence via Trial and Error

arXiv 2025

VPO: Aligning Text-to-Video Generation Models with Prompt Optimization

ICCV 2025

BARREL: Boundary-Aware Reasoning for Factual and Reliable LRMs

arXiv 2025

ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools

arXiv 2024

ShieldLM: Empowering LLMs as Aligned, Customizable and Explainable Safety Detectors

arXiv 2024

Benchmarking Complex Instruction-Following with Multiple Constraints Composition

arXiv 2024

Towards Efficient Exact Optimization of Language Model Alignment

arXiv 2024

SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models

arXiv 2024

AMOR: A Recipe for Building Adaptable Modular Knowledge Agents Through Process Feedback

arXiv 2024

LogicGame: Benchmarking Rule-Based Reasoning Abilities of Large Language Models

arXiv 2024

CharacterBench: Benchmarking Character Customization of Large Language Models

arXiv 2024

Unlocking Reasoning Potential in Large Language Models by Scaling Code-form Planning

arXiv 2024

Agent-SafetyBench: Evaluating the Safety of LLM Agents

arXiv 2024

AutoDetect: Towards a Unified Framework for Automated Weakness Detection in Large Language Models

arXiv 2024

From Theft to Bomb-Making: The Ripple Effect of Unlearning in Defending Against Jailbreak Attacks

arXiv 2024

Knowledge-to-Jailbreak: Investigating Knowledge-driven Jailbreaking Attacks for Large Language Models

arXiv 2024

AlignBench: Benchmarking Chinese Alignment of Large Language Models

arXiv 2023

Black-Box Prompt Optimization: Aligning Large Language Models without Model Training

arXiv 2023

Defending Large Language Models Against Jailbreaking Attacks Through Goal Prioritization

arXiv 2023

CritiqueLLM: Towards an Informative Critique Generation Model for Evaluation of Large Language Model Generation

arXiv 2023