Zhexin Zhang

Papers: 18

Authored papers


Guiding not Forcing: Enhancing the Transferability of Jailbreaking Attacks on LLMs via Removing Superfluous Constraints
arXiv 2025

LongSafety: Evaluating Long-Context Safety of Large Language Models
arXiv 2025

Be Careful When Fine-tuning On Open-Source LLMs: Your Fine-tuning Data Could Be Secretly Stolen!
arXiv 2025

BARREL: Boundary-Aware Reasoning for Factual and Reliable LRMs
arXiv 2025

How Should We Enhance the Safety of Large Reasoning Models: An Empirical Study
arXiv 2025

ShieldLM: Empowering LLMs as Aligned, Customizable and Explainable Safety Detectors
arXiv 2024

Agent-SafetyBench: Evaluating the Safety of LLM Agents
arXiv 2024

From Theft to Bomb-Making: The Ripple Effect of Unlearning in Defending Against Jailbreak Attacks
arXiv 2024

Seeker: Towards Exception Safety Code Generation with Intermediate Language Agents Framework
arXiv 2024

Knowledge-to-Jailbreak: Investigating Knowledge-driven Jailbreaking Attacks for Large Language Models
arXiv 2024

Ethicist: Targeted Training Data Extraction Through Loss Smoothed Soft Prompting and Calibrated Confidence Estimation
arXiv 2023

Unveiling the Implicit Toxicity in Large Language Models
arXiv 2023

Safety Assessment of Chinese Large Language Models
arXiv 2023

SafetyBench: Evaluating the Safety of Large Language Models
arXiv 2023

Defending Large Language Models Against Jailbreaking Attacks Through Goal Prioritization
arXiv 2023

MoralDial: A Framework to Train and Evaluate Moral Dialogue Systems via Moral Discussions
arXiv 2022

Persona-Guided Planning for Controlling the Protagonist's Persona in Story Generation
arXiv 2022

OpenMEVA: A Benchmark for Evaluating Open-ended Story Generation Metrics
ACL 2021

Affiliations

No known affiliations.

Frequent co-authors: 10 (from 18 papers)