Min Lin
- Papers
- 48
Authored papers
Revisiting Parameter Server in LLM Post-Training
arXiv 2026
Rethinking the Trust Region in LLM Reinforcement Learning
arXiv 2026
Understanding R1-Zero-Like Training: A Critical Perspective
arXiv 2025
PipeOffload: Improving Scalability of Pipeline Parallelism with Memory Optimization
arXiv 2025
StereoGen: High-quality Stereo Image Generation from a Single Image
ICCV 2025
Optimizing Anytime Reasoning via Budget Relative Policy Optimization
arXiv 2025
FlowReasoner: Reinforcing Query-Level Meta-Agents
arXiv 2025
Sailor2: Sailing in South-East Asia with Inclusive Multilingual LLMs
arXiv 2025
SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning
arXiv 2025
Variational Reasoning for Language Models
arXiv 2025
Defeating the Training-Inference Mismatch via FP16
arXiv 2025
Language Models Can Learn from Verbal Feedback Without Scalar Rewards
arXiv 2025
Reinforcing General Reasoning without Verifiers
arXiv 2025
Lifelong Safety Alignment for Language Models
arXiv 2025
GEM: A Gym for Agentic LLMs
arXiv 2025
When Attention Sink Emerges in Language Models: An Empirical View
arXiv 2024
Sample-Efficient Alignment for LLMs
arXiv 2024
Sailor: Open Language Models for South-East Asia
arXiv 2024
Scaling up Masked Diffusion Models on Text
arXiv 2024
Improved Techniques for Optimization-Based Jailbreaking on Large Language Models
arXiv 2024
Pipeline Parallelism with Controllable Memory
arXiv 2024
Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast
arXiv 2024
Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates
arXiv 2024
SimLayerKV: A Simple Framework for Layer-Level KV Cache Reduction
arXiv 2024
Bootstrapping Language Models with DPO Implicit Rewards
arXiv 2024
Balancing Pipeline Parallelism with Vocabulary Parallelism
arXiv 2024
Beyond Memorization: The Challenge of Random Memory Access in Language Models
arXiv 2024
Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies
arXiv 2024
RegMix: Data Mixture as Regression for Language Model Pre-training
arXiv 2024
Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs
arXiv 2024
Stochastic Taylor Derivative Estimator: Efficient amortization for arbitrary differential operators
arXiv 2024
LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition
arXiv 2023
Instant3D: Instant Text-to-3D Generation
arXiv 2023
Cleanba: A Reproducible and Efficient Distributed Reinforcement Learning Platform
arXiv 2023
Automatic Functional Differentiation in JAX
arXiv 2023
From Zero to Hero: Examining the Power of Symbolic Tasks in Instruction Tuning
arXiv 2023
Finetuning Text-to-Image Diffusion Models for Fairness
arXiv 2023
Bag of Tricks for Training Data Extraction from Language Models
arXiv 2023
On Calibrating Diffusion Probabilistic Models
Intriguing Properties of Data Attribution on Diffusion Models
arXiv 2023
BAFFLE: A Baseline of Backpropagation-Free Federated Learning
arXiv 2023
NU-MCC: Multiview Compressive Coding with Neighborhood Decoder and Repulsive UDF
Nonparametric Generative Modeling with Conditional Sliced-Wasserstein Flows
arXiv 2023
On Evaluating Adversarial Robustness of Large Vision-Language Models
NeurIPS 2023
A Recipe for Watermarking Diffusion Models
arXiv 2023
Better Diffusion Models Further Improve Adversarial Training
arXiv 2023
EnvPool: A Highly Parallel Reinforcement Learning Environment Execution Engine
arXiv 2022
Robustness and Accuracy Could Be Reconcilable by (Proper) Definition
arXiv 2022
Frequent co-authors
10 from 48 papers