Xiu Li
- Papers: 45
Authored papers
KVPO: ODE-Native GRPO for Autoregressive Video Alignment via KV Semantic Exploration
arXiv 2026
Making Avatars Interact: Towards Text-Driven Human-Object Interaction for Controllable Talking Avatars
arXiv 2026
SpatialEdit: Benchmarking Fine-Grained Image Spatial Editing
arXiv 2026
MoKus: Leveraging Cross-Modal Knowledge Transfer for Knowledge-Aware Concept Customization
arXiv 2026
BiManiBench: A Hierarchical Benchmark for Evaluating Bimanual Coordination of Multimodal Large Language Models
arXiv 2026
ViGoR-Bench: How Far Are Visual Generative Models From Zero-Shot Visual Reasoners?
arXiv 2026
Audio-visual Controlled Video Diffusion with Masked Selective State Spaces Modeling for Natural Talking Head Generation
ICCV 2025
MindOmni: Unleashing Reasoning Generation in Vision Language Models with RGPO
arXiv 2025
Densely Connected Parameter-Efficient Tuning for Referring Image Segmentation
arXiv 2025
MVInverse: Feed-forward Multi-view Inverse Rendering in Seconds
arXiv 2025
Steering Vision-Language-Action Models as Anti-Exploration: A Test-Time Scaling Approach
arXiv 2025
Puppeteer: Rig and Animate Your 3D Models
arXiv 2025
MIND-V: Hierarchical Video Generation for Long-Horizon Robotic Manipulation with RL-based Physical Alignment
arXiv 2025
Controllable Layer Decomposition for Reversible Multi-Layer Image Generation
arXiv 2025
GIVEPose: Gradual Intra-class Variation Elimination for RGB-based Category-Level Object Pose Estimation
CVPR 2025 1
ASPO: Asymmetric Importance Sampling Policy Optimization
arXiv 2025
SpeakerVid-5M: A Large-Scale High-Quality Dataset for Audio-Visual Dyadic Interactive Human Generation
arXiv 2025
Stabilizing Knowledge, Promoting Reasoning: Dual-Token Constraints for RLVR
arXiv 2025
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling
arXiv 2025
GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning
arXiv 2025
S$^2$-Guidance: Stochastic Self Guidance for Training-Free Enhancement of Diffusion Models
arXiv 2025
SkillMimic: Learning Basketball Interaction Skills from Demonstrations
CVPR 2025 1
GrootVL: Tree Topology is All You Need in State Space Model
arXiv 2024
MambaTalk: Efficient Holistic Gesture Synthesis with Selective State Space Models
arXiv 2024
Bridging the Divide: Reconsidering Softmax and Linear Attention
arXiv 2024
Efficient Diffusion Transformer with Step-wise Dynamic Attention Mediators
arXiv 2024
CreativeSynth: Cross-Art-Attention for Artistic Image Synthesis with Multimodal Diffusion
arXiv 2024
COVE: Unleashing the Diffusion Feature Correspondence for Consistent Video Editing
arXiv 2024
UniQA: Unified Vision-Language Pre-training for Image Quality and Aesthetic Assessment
arXiv 2024
SEABO: A Simple Search-Based Method for Offline Imitation Learning
arXiv 2024
Dora: Sampling and Benchmarking for 3D Shape Variational Auto-Encoders
Diffusion Models in Low-Level Vision: A Survey
arXiv 2024
Follow-Your-Click: Open-domain Regional Image Animation via Short Prompts
arXiv 2024
Taming Rectified Flow for Inversion and Editing
arXiv 2024
Lodge: A Coarse to Fine Diffusion Network for Long Dance Generation Guided by the Characteristic Dance Primitives
CVPR 2024 1
Magic-Boost: Boost 3D Generation with Multi-View Conditioned Diffusion
arXiv 2024
MultiBooth: Towards Generating All Your Concepts in an Image from Text
arXiv 2024
BoxSnake: Polygonal Instance Segmentation with Box Supervision
ICCV 2023 1
SOC: Semantic-Assisted Object Cluster for Referring Video Object Segmentation
Strategic Preys Make Acute Predators: Enhancing Camouflaged Object Detectors by Generating Camouflaged Objects
arXiv 2023
Efficient Meshy Neural Fields for Animatable Human Avatars
arXiv 2023
Follow Your Pose: Pose-Guided Text-to-Video Generation using Pose-Free Videos
arXiv 2023
GPT4Tools: Teaching Large Language Model to Use Tools via Self-instruction
NeurIPS 2023 11
Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model
CVPR 2024 1
FineDance: A Fine-grained Choreography Dataset for 3D Full Body Dance Generation
ICCV 2023 1
Frequent co-authors
10 from 45 papers