Zhexin Zhang
- Papers: 18
Authored papers
Guiding not Forcing: Enhancing the Transferability of Jailbreaking Attacks on LLMs via Removing Superfluous Constraints
arXiv 2025
LongSafety: Evaluating Long-Context Safety of Large Language Models
arXiv 2025
Be Careful When Fine-tuning On Open-Source LLMs: Your Fine-tuning Data Could Be Secretly Stolen!
arXiv 2025
BARREL: Boundary-Aware Reasoning for Factual and Reliable LRMs
arXiv 2025
How Should We Enhance the Safety of Large Reasoning Models: An Empirical Study
arXiv 2025
ShieldLM: Empowering LLMs as Aligned, Customizable and Explainable Safety Detectors
arXiv 2024
Agent-SafetyBench: Evaluating the Safety of LLM Agents
arXiv 2024
From Theft to Bomb-Making: The Ripple Effect of Unlearning in Defending Against Jailbreak Attacks
arXiv 2024
Seeker: Towards Exception Safety Code Generation with Intermediate Language Agents Framework
arXiv 2024
Knowledge-to-Jailbreak: Investigating Knowledge-driven Jailbreaking Attacks for Large Language Models
arXiv 2024
Ethicist: Targeted Training Data Extraction Through Loss Smoothed Soft Prompting and Calibrated Confidence Estimation
arXiv 2023
Unveiling the Implicit Toxicity in Large Language Models
arXiv 2023
Safety Assessment of Chinese Large Language Models
arXiv 2023
SafetyBench: Evaluating the Safety of Large Language Models
arXiv 2023
Defending Large Language Models Against Jailbreaking Attacks Through Goal Prioritization
arXiv 2023
MoralDial: A Framework to Train and Evaluate Moral Dialogue Systems via Moral Discussions
arXiv 2022
Persona-Guided Planning for Controlling the Protagonist's Persona in Story Generation
arXiv 2022
OpenMEVA: A Benchmark for Evaluating Open-ended Story Generation Metrics
ACL 2021 5
Frequent co-authors
10 from 18 papers