Quanquan Gu

On the Design of KL-Regularized Policy Gradient Algorithms for LLM Reasoning

arXiv 2025

MARS-M: When Variance Reduction Meets Matrices

arXiv 2025

Group Representational Position Encoding

arXiv 2025

Higher-order Linear Attention

arXiv 2025

TrustLLM: Trustworthiness in Large Language Models

arXiv 2024

Diffusion Language Models Are Versatile Protein Learners

arXiv 2024

General Preference Modeling with Preference Representations for Aligning Language Models

arXiv 2024

MARS: Unleashing the Power of Variance Reduction for Training Large Models

arXiv 2024

Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models

arXiv 2024

Self-Play Preference Optimization for Language Model Alignment

arXiv 2024

Enhancing Large Vision Language Models with Self-Training on Image Comprehension

arXiv 2024

Mitigating Object Hallucination in Large Vision-Language Models via Classifier-Free Guidance

arXiv 2024

CryoFM: A Flow-based Foundation Model for Cryo-EM Densities

arXiv 2024

Structure-informed Language Models Are Protein Designers

arXiv 2023

Personalized Federated Learning under Mixture of Distributions

arXiv 2023

Variance-Aware Regret Bounds for Stochastic Contextual Dueling Bandits

arXiv 2023

Rephrase and Respond: Let Large Language Models Ask Better Questions for Themselves

arXiv 2023

Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning

arXiv 2023