Nanyun Peng
Papers: 48
Authored papers
LongMemEval-V2: Evaluating Long-Term Agent Memory Toward Experienced Colleagues
arXiv 2026
Unify-Agent: A Unified Multimodal Agent for World-Grounded Image Synthesis
arXiv 2026
OpenVLThinkerV2: A Generalist Multimodal Reasoning Model for Multi-domain Visual Tasks
arXiv 2026
OpenVLThinker: An Early Exploration to Complex Vision-Language Reasoning via Iterative Self-Improvement
arXiv 2025
Do "New Snow Tablets" Contain Snow? Large Language Models Over-Rely on Names to Identify Ingredients of Chinese Drugs
arXiv 2025
CaKE: Circuit-aware Editing Enables Generalizable Knowledge Learners
arXiv 2025
FronTalk: Benchmarking Front-End Development as Conversational Code Generation with Multi-Modal Feedback
arXiv 2025
DialectGen: Benchmarking and Improving Dialect Robustness in Multimodal Generation
arXiv 2025
MMGR: Multi-Modal Generative Reasoning
arXiv 2025
Matryoshka Query Transformer for Large Vision-Language Models
arXiv 2024
On Prompt-Driven Safeguarding for Large Language Models
arXiv 2024
Weak-to-Strong Extrapolation Expedites Alignment
arXiv 2024
New Job, New Gender? Measuring the Social Bias in Image Generation Models
arXiv 2024
Comparing Bad Apples to Good Oranges: Aligning Large Language Models via Joint Preference Optimization
arXiv 2024
Re-ReST: Reflection-Reinforced Self-Training for Language Agents
arXiv 2024
Verbalized Representation Learning for Interpretable Few-Shot Generalization
ICCV 2025
VDebugger: Harnessing Execution Feedback for Debugging Visual Programs
arXiv 2024
Synchronous Faithfulness Monitoring for Trustworthy Retrieval-Augmented Generation
arXiv 2024
On the Loss of Context-awareness in General Instruction Fine-tuning
arXiv 2024
Guiding Through Complexity: What Makes Good Supervision for Hard Reasoning Tasks?
arXiv 2024
Adaptable Logical Control for Large Language Models
arXiv 2024
Model Editing Harms General Abilities of Large Language Models: Regularization to the Rescue
arXiv 2024
ConTextual: Evaluating Context-Sensitive Text-Rich Visual Reasoning in Large Multimodal Models
arXiv 2024
Con-ReCall: Detecting Pre-training Data in LLMs via Contrastive Decoding
arXiv 2024
Evaluating Cultural and Social Awareness of LLM Web Agents
arXiv 2024
Active Instruction Tuning: Improving Cross-Task Generalization by Training on Prompt Sensitive Tasks
arXiv 2023
Tractable Control for Autoregressive Language Generation
arXiv 2023
"Kelly is a Warm Person, Joseph is a Role Model": Gender Biases in LLM-Generated Reference Letters
arXiv 2023
Evaluating Large Language Models on Controlled Generation Tasks
arXiv 2023
ACQUIRED: A Dataset for Answering Counterfactual Questions In Real-Life Videos
arXiv 2023
Localizing Active Objects from Egocentric Vision with Symbolic World Knowledge
arXiv 2023
RLCD: Reinforcement Learning from Contrastive Distillation for Language Model Alignment
arXiv 2023
AMPERE: AMR-Aware Prefix for Generation-Based Event Argument Extraction Model
arXiv 2023
ACCENT: An Automatic Event Commonsense Evaluation Metric for Open-Domain Dialogue Systems
arXiv 2023
Are Personalized Stochastic Parrots More Dangerous? Evaluating Persona Biases in Dialogue Systems
arXiv 2023
Identifying Informational Sources in News Articles
arXiv 2023
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
TMLR
Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone
Fantastic Questions and Where to Find Them: FairytaleQA -- An Authentic Dataset for Narrative Comprehension
arXiv 2022
Controllable Text Generation with Neurally-Decomposed Oracle
arXiv 2022
Multilingual Generative Language Models for Zero-Shot Cross-Lingual Event Argument Extraction
ACL 2022
GENEVA: Benchmarking Generalizability for Event Argument Extraction with Hundreds of Event Types and Argument Roles
arXiv 2022
Generalized Decoding for Pixel, Image, and Language
CVPR 2023
Re3: Generating Longer Stories With Recursive Reprompting and Revision
arXiv 2022
NewsEdits: A News Article Revision Dataset and a Document-Level Reasoning Challenge
arXiv 2022
DEGREE: A Data-Efficient Generation-Based Event Extraction Model
NAACL 2022
On the Safety of Conversational Models: Taxonomy, Dataset, and Benchmark
Findings (ACL) 2022
Socially Aware Bias Measurements for Hindi Language Representations
NAACL 2022