Patrick Schramowski

Papers
25

Attribution

Affiliations & profile
Semantic Scholar

Authored papers

25

Judging Quality Across Languages: A Multilingual Approach to Pretraining Data Filtering with Language Models

arXiv 2025

2025

MSTS: A Multimodal Safety Test Suite for Vision-Language Models

arXiv 2025

2025

Introducing v0.5 of the AI Safety Benchmark from MLCommons

arXiv 2024

2024

ALERT: A Comprehensive Benchmark for Assessing Large Language Models' Safety through Red Teaming

arXiv 2024

2024

T-FREE: Subword Tokenizer-Free Generative LLMs via Sparse Representations for Memory-Efficient Embeddings

arXiv 2024

2024

SCAR: Sparse Conditioned Autoencoders for Concept Detection and Steering in LLMs

arXiv 2024

2024

Multilingual Text-to-Image Generation Magnifies Gender Stereotypes and Prompt Engineering May Not Help You

arXiv 2024

2024

LlavaGuard: An Open VLM-based Framework for Safeguarding Vision Datasets and Models

arXiv 2024

2024

AtMan: Understanding Transformer Predictions Through Memory Efficient Attention Manipulation

NeurIPS 2023 11

2023

Class Attribute Inference Attacks: Inferring Sensitive Class Information by Diffusion-Based Attribute Manipulations

arXiv 2023

2023

Fair Diffusion: Instructing Text-to-Image Generation Models on Fairness

arXiv 2023

2023

Safe Latent Diffusion: Mitigating Inappropriate Degeneration in Diffusion Models

CVPR 2023 1

2022

Exploiting Cultural Biases via Homoglyphs in Text-to-Image Synthesis

arXiv 2022

2022

A Typology for Exploring the Mitigation of Shortcut Behavior

arXiv 2022

2022

ILLUME: Rationalizing Vision-Language Models through Human Interactions

arXiv 2022

2022

Revision Transformers: Instructing Language Models to Change their Values

arXiv 2022

2022

Speaking Multiple Languages Affects the Moral Bias of Language Models

arXiv 2022

2022

Can Machines Help Us Answering Question 16 in Datasheets, and In Turn Reflecting on Inappropriate Content?

arXiv 2022

2022

Does CLIP Know My Face?

arXiv 2022

2022

Inferring Offensiveness In Images From Natural Language Supervision

2021

Adaptive Rational Activations to Boost Deep Reinforcement Learning

arXiv 2021

2021

Large Pre-trained Language Models Contain Human-like Biases of What is Right and Wrong to Do

arXiv 2021

2021

Interactively Providing Explanations for Transformer Language Models

2021

Making deep neural networks right for the right scientific reasons by interacting with their explanations

arXiv 2020

2020

Padé Activation Units: End-to-end Learning of Flexible Activation Functions in Deep Networks

ICLR 2020 1

2019

Affiliations

No known affiliations.

Frequent co-authors

10

from 25 papers