Huchuan Lu
Authored papers (39)
Perception or Prejudice: Can MLLMs Go Beyond First Impressions of Personality?
arXiv 2026
Think3D: Thinking with Space for Spatial Reasoning
arXiv 2026
VISTA-Bench: Do Vision-Language Models Really Understand Visualized Text as Well as Pure Text?
arXiv 2026
Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge
arXiv 2025
MultiShotMaster: A Controllable Multi-Shot Video Generation Framework
arXiv 2025
EVEv2: Improved Baselines for Encoder-Free Vision-Language Models
ICCV 2025
EvMic: Event-based Non-contact sound recovery from effective spatial-temporal modeling
arXiv 2025
How Far are VLMs from Visual Spatial Intelligence? A Benchmark-Driven Perspective
arXiv 2025
VFXMaster: Unlocking Dynamic Visual Effect Generation via In-Context Learning
arXiv 2025
The Devil is in Temporal Token: High Quality Video Reasoning Segmentation
CVPR 2025
OASIS: Open Agent Social Interaction Simulations with One Million Agents
arXiv 2024
PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation
arXiv 2024
Autoregressive Video Generation without Vector Quantization
arXiv 2024
Unity is Strength: Unifying Convolutional and Transformeral Features for Better Person Re-Identification
arXiv 2024
High-Precision Dichotomous Image Segmentation via Probing Diffusion Capacity
Towards Real-Time Open-Vocabulary Video Instance Segmentation
arXiv 2024
StableIdentity: Inserting Anybody into Anywhere at First Sight
arXiv 2024
CharacterFactory: Sampling Consistent Characters with GANs for Diffusion Models
arXiv 2024
Multi-view Aggregation Network for Dichotomous Image Segmentation
CVPR 2024
DreamMix: Decoupling Object Attributes for Enhanced Editability in Customized Image Inpainting
arXiv 2024
ReNeg: Learning Negative Embedding with Reward Guidance
CVPR 2025
Multi-modal Instruction Tuned LLMs with Fine-grained Visual Perception
CVPR 2024
Deep Boosting Learning: A Brand-new Cooperative Approach for Image-Text Matching
arXiv 2024
GSSF: Generalized Structural Sparse Function for Deep Cross-modal Metric Learning
arXiv 2024
SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning
arXiv 2024
Open-Vocabulary Camouflaged Object Segmentation
arXiv 2023
CiteTracker: Correlating Image and Text for Visual Tracking
ICCV 2023
Exploring Lightweight Hierarchical Vision Transformers for Efficient Visual Tracking
ICCV 2023
Tracking Anything in High Quality
arXiv 2023
M²SNet: Multi-scale in Multi-scale Subtraction Network for Medical Image Segmentation
arXiv 2023
MetaBEV: Solving Sensor Failures for BEV Detection and Map Segmentation
arXiv 2023
UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces
arXiv 2023
Plug-and-Play Regulators for Image-Text Matching
arXiv 2023
Towards Deeply Unified Depth-aware Panoptic Segmentation with Bi-directional Guidance Learning
ICCV 2023
UniPT: Universal Parallel Tuning for Transfer Learning with Efficient Parameter and Memory
CVPR 2024
Isomer: Isomerous Transformer for Zero-shot Video Object Segmentation
ICCV 2023
HS-Diffusion: Semantic-Mixing Diffusion for Head Swapping
arXiv 2022
Similarity Reasoning and Filtration for Image-Text Matching
arXiv 2021
LightTrack: Finding Lightweight Neural Networks for Object Tracking via One-Shot Architecture Search
CVPR 2021
Frequent co-authors
10 from 39 papers