Xiaodan Liang
- Papers
- 45
Authored papers (45)
SeePhys Pro: Diagnosing Modality Transfer and Blind-Training Effects in Multimodal RLVR for Physics Reasoning
arXiv 2026
FastFit: Accelerating Multi-Reference Virtual Try-On via Cacheable Diffusion Models
arXiv 2025
SeePhys: Does Seeing Help Thinking? -- Benchmarking Vision-Based Physics Reasoning
arXiv 2025
SPC: Evolving Self-Play Critic via Adversarial Games for LLM Reasoning
arXiv 2025
Can Atomic Step Decomposition Enhance the Self-structured Reasoning of Multimodal Large Models?
arXiv 2025
TreeRPO: Tree Relative Policy Optimization
arXiv 2025
CatV2TON: Taming Diffusion Transformers for Vision-Based Virtual Try-On with Temporal Concatenation
arXiv 2025
ComposeAnyone: Controllable Layout-to-Human Generation with Decoupled Multimodal Conditions
arXiv 2025
ConsistentID: Portrait Generation with Multimodal Fine-Grained Identity Preserving
arXiv 2024
Realistic and Efficient Face Swapping: A Unified Approach with Diffusion Models
arXiv 2024
DreamFit: Garment-Centric Human Generation via a Lightweight Anything-Dressing Encoder
arXiv 2024
DriveMM: All-in-One Large Multimodal Model for Autonomous Driving
arXiv 2024
Sitcom-Crafter: A Plot-Driven Human Motion Generation System in 3D Scenes
arXiv 2024
EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions
CVPR 2025 1
HumanRefiner: Benchmarking Abnormal Human Generation and Refining with Coarse-to-fine Pose-Reversible Guidance
arXiv 2024
PhysGame: Uncovering Physical Commonsense Violations in Gameplay Videos
MUSTARD: Mastering Uniform Synthesis of Theorem and Proof Data
arXiv 2024
FVEL: Interactive Formal Verification Environment with Large Language Models via Theorem Proving
arXiv 2024
AutoStudio: Crafting Consistent Subjects in Multi-turn Interactive Image Generation
arXiv 2024
CatVTON: Concatenation Is All You Need for Virtual Try-On with Diffusion Models
arXiv 2024
FancyVideo: Towards Dynamic and Consistent Video Generation via Cross-frame Textual Guidance
arXiv 2024
Qihoo-T2X: An Efficient Proxy-Tokenized Diffusion Transformer for Text-to-Any-Task
arXiv 2024
Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs
arXiv 2024
OV-DINO: Unified Open-Vocabulary Detection with Language-Aware Selective Fusion
arXiv 2024
OptiBench Meets ReSocratic: Measure and Improve LLMs for Optimization Modeling
arXiv 2024
MLP Can Be A Good Transformer Learner
CVPR 2024 1
Learning Interaction-aware 3D Gaussian Splatting for One-shot Hand Avatars
arXiv 2024
Surfer: Progressive Reasoning with World Models for Robotic Manipulation
arXiv 2023
CTP: Towards Vision-Language Continual Pretraining via Compatible Momentum Contrast and Topology Preservation
arXiv 2023
Fashion Matrix: Editing Photos by Just Talking
arXiv 2023
DQ-LoRe: Dual Queries with Low Rank Approximation Re-ranking for In-Context Learning
arXiv 2023
TRIGO: Benchmarking Formal Mathematical Proof Reduction for Generative Language Models
arXiv 2023
AlignedCoT: Prompting Large Language Models via Native-Speaking Demonstrations
arXiv 2023
Improving Multi-turn Emotional Support Dialogue Generation with Lookahead Strategy Planning
arXiv 2022
Composable Text Controls in Latent Space with ODEs
arXiv 2022
UniGeo: Unifying Geometry Logical Reasoning via Reformulating Mathematical Expression
arXiv 2022
LogicSolver: Towards Interpretable Math Word Problem Solving with Logical Prompt-enhanced Learning
arXiv 2022
GeoQA: A Geometric Question Answering Benchmark Towards Multimodal Numerical Reasoning
Findings (ACL) 2021 8
BossNAS: Exploring Hybrid CNN-transformers with Block-wisely Self-supervised Neural Architecture Search
ICCV 2021 10
UltraPose: Synthesizing Dense Pose with 1 Billion Points by Human-body Decoupling 3D Model
Towards Quantifiable Dialogue Coherence Evaluation
ACL 2021 5
Inter-GPS: Interpretable Geometry Problem Solving with Formal Language and Symbolic Reasoning
ACL 2021 5
Don't Take It Literally: An Edit-Invariant Sequence Loss for Text Generation
Dynamic Knowledge Routing Network For Target-Guided Open-Domain Conversation
arXiv 2020
Rethinking Knowledge Graph Propagation for Zero-Shot Learning
Frequent co-authors
10 from 45 papers