Ximing Lu

Papers
32

Affiliations & profile
Semantic Scholar
Authored papers

32

ProRL Agent: Rollout-as-a-Service for RL Training of Multi-Turn LLM Agents

arXiv 2026

2026

GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization

arXiv 2026

2026

ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models

arXiv 2025

2025

ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration

arXiv 2025

2025

Goedel-Prover-V2: Scaling Formal Theorem Proving with Scaffolded Data Synthesis and Self-Correction

arXiv 2025

2025

Verifying the Verifiers: Unveiling Pitfalls and Potentials in Fact Verifiers

arXiv 2025

2025

The Surprising Effectiveness of Membership Inference with Simple N-Gram Coverage

arXiv 2025

2025

WildTeaming at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models

arXiv 2024

2024

A Roadmap to Pluralistic Alignment

arXiv 2024

2024

StyleRemix: Interpretable Authorship Obfuscation via Distillation and Perturbation of Style Elements

arXiv 2024

2024

AI as Humanity's Salieri: Quantifying Linguistic Creativity of Language Models via Systematic Attribution of Machine Text against Web Text

arXiv 2024

2024

Faith and Fate: Limits of Transformers on Compositionality

2023

STEER: Unified Style Transfer with Expert Reinforcement

arXiv 2023

2023

In Search of the Long-Tail: Systematic Generation of Long-Tail Inferential Knowledge via Logical Rule Guided Search

arXiv 2023

2023

Tailoring Self-Rationalizers with Multi-Reward Distillation

arXiv 2023

2023

Localized Symbolic Knowledge Distillation for Visual Commonsense Models

2023

Phenomenal Yet Puzzling: Testing Inductive Reasoning Capabilities of Language Models with Hypothesis Refinement

arXiv 2023

2023

Leftover Lunch: Advantage-based Offline Reinforcement Learning for Language Models

arXiv 2023

2023

Inference-Time Policy Adapters (IPA): Tailoring Extreme-Scale LMs without Fine-tuning

arXiv 2023

2023

Value Kaleidoscope: Engaging AI with Pluralistic Human Values, Rights, and Duties

arXiv 2023

2023

ClarifyDelphi: Reinforced Clarification Questions with Defeasibility Rewards for Social and Moral Situations

arXiv 2022

2022

SODA: Million-scale Dialogue Distillation with Social Commonsense Contextualization

arXiv 2022

2022

Quark: Controllable Text Generation with Reinforced Unlearning

arXiv 2022

2022

ProsocialDialog: A Prosocial Backbone for Conversational Agents

arXiv 2022

2022

NaturalProver: Grounded Mathematical Proof Generation with Language Models

arXiv 2022

2022

Rainier: Reinforced Knowledge Introspector for Commonsense Question Answering

arXiv 2022

2022

Multimodal Knowledge Alignment with Reinforcement Learning

arXiv 2022

2022

Symbolic Knowledge Distillation: from General Language Models to Commonsense Models

NAACL 2022

2021

DExperts: Decoding-Time Controlled Text Generation with Experts and Anti-Experts

ACL 2021

2021

Generated Knowledge Prompting for Commonsense Reasoning

ACL 2022

2021

NeuroLogic A*esque Decoding: Constrained Text Generation with Lookahead Heuristics

NAACL 2022

2021

Connecting the Dots between Audio and Text without Parallel Data through Visual Knowledge Transfer

arXiv 2021

2021

Affiliations

No known affiliations.

Frequent co-authors: 10 (from 32 papers)