Maksym Andriushchenko
Authored papers (18)
FutureSim: Replaying World Events to Evaluate Adaptive Agents
arXiv 2026
HalluHard: A Hard Multi-Turn Hallucination Benchmark
arXiv 2026
OS-Harm: A Benchmark for Measuring Safety of Computer Use Agents
arXiv 2025
Monitoring Decomposition Attacks in LLMs with Lightweight Sequential Monitors
arXiv 2025
Capability-Based Scaling Laws for LLM Red-Teaming
arXiv 2025
JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models
arXiv 2024
Improving Alignment and Robustness with Circuit Breakers
arXiv 2024
Competition Report: Finding Universal Jailbreak Backdoors in Aligned LLMs
arXiv 2024
AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents
arXiv 2024
Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks
arXiv 2024
Long Is More for Alignment: A Simple but Tough-to-Beat Baseline for Instruction Fine-Tuning
arXiv 2024
Does Refusal Training in LLMs Generalize to the Past Tense?
arXiv 2024
Is In-Context Learning Sufficient for Instruction Following in LLMs?
arXiv 2024
A Modern Look at the Relationship between Sharpness and Generalization
arXiv 2023
Layer-wise Linear Mode Connectivity
arXiv 2023
Transferable Adversarial Robustness for Categorical Data via Universal Robust Embeddings
Why Do We Need Weight Decay in Modern Deep Learning?
arXiv 2023
SGD with Large Step Sizes Learns Sparse Features
arXiv 2022
Frequent co-authors
10 from 18 papers