Xiu Li

Papers
45

Attribution

Affiliations & profile: Semantic Scholar

Authored papers

KVPO: ODE-Native GRPO for Autoregressive Video Alignment via KV Semantic Exploration

arXiv 2026

2026

Making Avatars Interact: Towards Text-Driven Human-Object Interaction for Controllable Talking Avatars

arXiv 2026

2026

SpatialEdit: Benchmarking Fine-Grained Image Spatial Editing

arXiv 2026

2026

MoKus: Leveraging Cross-Modal Knowledge Transfer for Knowledge-Aware Concept Customization

arXiv 2026

2026

BiManiBench: A Hierarchical Benchmark for Evaluating Bimanual Coordination of Multimodal Large Language Models

arXiv 2026

2026

ViGoR-Bench: How Far Are Visual Generative Models From Zero-Shot Visual Reasoners?

arXiv 2026

2026

Audio-visual Controlled Video Diffusion with Masked Selective State Spaces Modeling for Natural Talking Head Generation

ICCV 2025

2025

MindOmni: Unleashing Reasoning Generation in Vision Language Models with RGPO

arXiv 2025

2025

Densely Connected Parameter-Efficient Tuning for Referring Image Segmentation

arXiv 2025

2025

MVInverse: Feed-forward Multi-view Inverse Rendering in Seconds

arXiv 2025

2025

Steering Vision-Language-Action Models as Anti-Exploration: A Test-Time Scaling Approach

arXiv 2025

2025

Puppeteer: Rig and Animate Your 3D Models

arXiv 2025

2025

MIND-V: Hierarchical Video Generation for Long-Horizon Robotic Manipulation with RL-based Physical Alignment

arXiv 2025

2025

Controllable Layer Decomposition for Reversible Multi-Layer Image Generation

arXiv 2025

2025

GIVEPose: Gradual Intra-class Variation Elimination for RGB-based Category-Level Object Pose Estimation

CVPR 2025 1

2025

ASPO: Asymmetric Importance Sampling Policy Optimization

arXiv 2025

2025

SpeakerVid-5M: A Large-Scale High-Quality Dataset for Audio-Visual Dyadic Interactive Human Generation

arXiv 2025

2025

Stabilizing Knowledge, Promoting Reasoning: Dual-Token Constraints for RLVR

arXiv 2025

2025

Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling

arXiv 2025

2025

GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning

arXiv 2025

2025

S$^2$-Guidance: Stochastic Self Guidance for Training-Free Enhancement of Diffusion Models

arXiv 2025

2025

SkillMimic: Learning Basketball Interaction Skills from Demonstrations

CVPR 2025 1

2024

GrootVL: Tree Topology is All You Need in State Space Model

arXiv 2024

2024

MambaTalk: Efficient Holistic Gesture Synthesis with Selective State Space Models

arXiv 2024

2024

Bridging the Divide: Reconsidering Softmax and Linear Attention

arXiv 2024

2024

Efficient Diffusion Transformer with Step-wise Dynamic Attention Mediators

arXiv 2024

2024

CreativeSynth: Cross-Art-Attention for Artistic Image Synthesis with Multimodal Diffusion

arXiv 2024

2024

COVE: Unleashing the Diffusion Feature Correspondence for Consistent Video Editing

arXiv 2024

2024

UniQA: Unified Vision-Language Pre-training for Image Quality and Aesthetic Assessment

arXiv 2024

2024

SEABO: A Simple Search-Based Method for Offline Imitation Learning

arXiv 2024

2024

Dora: Sampling and Benchmarking for 3D Shape Variational Auto-Encoders

2024

Diffusion Models in Low-Level Vision: A Survey

arXiv 2024

2024

Follow-Your-Click: Open-domain Regional Image Animation via Short Prompts

arXiv 2024

2024

Taming Rectified Flow for Inversion and Editing

arXiv 2024

2024

Lodge: A Coarse to Fine Diffusion Network for Long Dance Generation Guided by the Characteristic Dance Primitives

CVPR 2024 1

2024

Magic-Boost: Boost 3D Generation with Multi-View Conditioned Diffusion

arXiv 2024

2024

MultiBooth: Towards Generating All Your Concepts in an Image from Text

arXiv 2024

2024

BoxSnake: Polygonal Instance Segmentation with Box Supervision

ICCV 2023 1

2023

SOC: Semantic-Assisted Object Cluster for Referring Video Object Segmentation

2023

Strategic Preys Make Acute Predators: Enhancing Camouflaged Object Detectors by Generating Camouflaged Objects

arXiv 2023

2023

Efficient Meshy Neural Fields for Animatable Human Avatars

arXiv 2023

2023

Follow Your Pose: Pose-Guided Text-to-Video Generation using Pose-Free Videos

arXiv 2023

2023

GPT4Tools: Teaching Large Language Model to Use Tools via Self-instruction

NeurIPS 2023 11

2023

Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model

CVPR 2024 1

2023

FineDance: A Fine-grained Choreography Dataset for 3D Full Body Dance Generation

ICCV 2023 1

2022

Affiliations

No known affiliations.

Frequent co-authors

10

from 45 papers