Ximing Lu
- Papers: 32
Authored papers
ProRL Agent: Rollout-as-a-Service for RL Training of Multi-Turn LLM Agents
arXiv 2026
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization
arXiv 2026
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
arXiv 2025
ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration
arXiv 2025
Goedel-Prover-V2: Scaling Formal Theorem Proving with Scaffolded Data Synthesis and Self-Correction
arXiv 2025
Verifying the Verifiers: Unveiling Pitfalls and Potentials in Fact Verifiers
arXiv 2025
The Surprising Effectiveness of Membership Inference with Simple N-Gram Coverage
arXiv 2025
WildTeaming at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models
arXiv 2024
A Roadmap to Pluralistic Alignment
arXiv 2024
StyleRemix: Interpretable Authorship Obfuscation via Distillation and Perturbation of Style Elements
arXiv 2024
AI as Humanity's Salieri: Quantifying Linguistic Creativity of Language Models via Systematic Attribution of Machine Text against Web Text
arXiv 2024
Faith and Fate: Limits of Transformers on Compositionality
STEER: Unified Style Transfer with Expert Reinforcement
arXiv 2023
In Search of the Long-Tail: Systematic Generation of Long-Tail Inferential Knowledge via Logical Rule Guided Search
arXiv 2023
Tailoring Self-Rationalizers with Multi-Reward Distillation
arXiv 2023
Localized Symbolic Knowledge Distillation for Visual Commonsense Models
Phenomenal Yet Puzzling: Testing Inductive Reasoning Capabilities of Language Models with Hypothesis Refinement
arXiv 2023
Leftover Lunch: Advantage-based Offline Reinforcement Learning for Language Models
arXiv 2023
Inference-Time Policy Adapters (IPA): Tailoring Extreme-Scale LMs without Fine-tuning
arXiv 2023
Value Kaleidoscope: Engaging AI with Pluralistic Human Values, Rights, and Duties
arXiv 2023
ClarifyDelphi: Reinforced Clarification Questions with Defeasibility Rewards for Social and Moral Situations
arXiv 2022
SODA: Million-scale Dialogue Distillation with Social Commonsense Contextualization
arXiv 2022
Quark: Controllable Text Generation with Reinforced Unlearning
arXiv 2022
ProsocialDialog: A Prosocial Backbone for Conversational Agents
arXiv 2022
NaturalProver: Grounded Mathematical Proof Generation with Language Models
arXiv 2022
Rainier: Reinforced Knowledge Introspector for Commonsense Question Answering
arXiv 2022
Multimodal Knowledge Alignment with Reinforcement Learning
arXiv 2022
Symbolic Knowledge Distillation: from General Language Models to Commonsense Models
NAACL 2022 7
DExperts: Decoding-Time Controlled Text Generation with Experts and Anti-Experts
ACL 2021 5
Generated Knowledge Prompting for Commonsense Reasoning
ACL 2022 5
NeuroLogic A*esque Decoding: Constrained Text Generation with Lookahead Heuristics
NAACL 2022 7
Connecting the Dots between Audio and Text without Parallel Data through Visual Knowledge Transfer
arXiv 2021
Affiliations
Frequent co-authors
10 from 32 papers
Yejin Choi
professor
Liwei Jiang
Peter West
Sean Welleck
Jack Hessel
researcher
Nouha Dziri
researcher
Ronan Le Bras
Skyler Hallinan
Youngjae Yu
Chandra Bhagavatula