Xiang Li
- Papers: 96
Authored papers (96)
Project Imaging-X: A Survey of 1000+ Open-Access Medical Imaging Datasets for Foundation Model Development
arXiv 2026
AndroTMem: From Interaction Trajectories to Anchored Memory in Long-Horizon GUI Agents
arXiv 2026
LiveMedBench: A Contamination-Free Medical Benchmark for LLMs with Automated Rubric Evaluation
arXiv 2026
GradientStabilizer: Fix the Norm, Not the Gradient
arXiv 2025
StereoAdapter-2: Globally Structure-Consistent Underwater Stereo Depth Estimation
arXiv 2026
SAM 3D: 3Dfy Anything in Images
arXiv 2025
Reinforcement Learning Optimization for Large-Scale Learning: An Efficient and User-Friendly Scaling Library
arXiv 2025
Masked Autoencoders Are Effective Tokenizers for Diffusion Models
arXiv 2025
M4-SAR: A Multi-Resolution, Multi-Polarization, Multi-Scene, Multi-Source Dataset and Benchmark for Optical-SAR Fusion Object Detection
arXiv 2025
Robust Latent Matters: Boosting Image Generation with Sampling Error
arXiv 2025
VideoMultiAgents: A Multi-Agent Framework for Video Question Answering
arXiv 2025
MotionSight: Boosting Fine-Grained Motion Understanding in Multimodal LLMs
arXiv 2025
On the Trustworthiness of Generative Foundation Models: Guideline, Assessment, and Perspective
arXiv 2025
DanceTogether! Identity-Preserving Multi-Person Interactive Video Generation
arXiv 2025
MedSAM3: Delving into Segment Anything with Medical Concepts
arXiv 2025
Let It Flow: Agentic Crafting on Rock and Roll, Building the ROME Model within an Open Agentic Learning Ecosystem
arXiv 2025
SAMed-2: Selective Memory Enhanced Medical Segment Anything Model
arXiv 2025
Revisiting End-to-End Learning with Slide-level Supervision in Computational Pathology
arXiv 2025
Image Tokenizer Needs Post-Training
arXiv 2025
Enhancing Cognition and Explainability of Multimodal Foundation Models with Self-Synthesized Data
arXiv 2025
"See the World, Discover Knowledge": A Chinese Factuality Evaluation for Large Vision Language Models
arXiv 2025
See through the Dark: Learning Illumination-affined Representations for Nighttime Occupancy Prediction
arXiv 2025
REOBench: Benchmarking Robustness of Earth Observation Foundation Models
arXiv 2025
Emotion-Qwen: Training Hybrid Experts for Unified Emotion and General Vision-Language Understanding
arXiv 2025
AirSim360: A Panoramic Simulation Platform within Drone View
arXiv 2025
MedReason: Eliciting Factual Medical Reasoning Steps in LLMs via Knowledge Graphs
arXiv 2025
NUDT4MSTAR: A Large Dataset and Benchmark Towards Remote Sensing Object Recognition in the Wild
arXiv 2025
Measuring the Robustness of Audio Deepfake Detectors
arXiv 2025
ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows
arXiv 2025
ATPrompt: Textual Prompt Learning with Embedded Attributes
ICCV 2025
TrustLLM: Trustworthiness in Large Language Models
arXiv 2024
ControlVAR: Exploring Controllable Visual Autoregressive Modeling
arXiv 2024
HazyDet: Open-source Benchmark for Drone-view Object Detection with Depth-cues in Hazy Scenes
arXiv 2024
SVIPTR: Fast and Efficient Scene Text Recognition with Vision Permutable Extractor
arXiv 2024
Unified Dual-Intent Translation for Joint Modeling of Search and Recommendation
arXiv 2024
ECHOPulse: ECG controlled echocardiograms video generation
arXiv 2024
UMETTS: A Unified Framework for Emotional Text-to-Speech Synthesis with Multimodal Prompts
arXiv 2024
Biomedical SAM 2: Segment Anything in Biomedical Images and Videos
arXiv 2024
PA-RAG: RAG Alignment via Multi-Perspective Preference Optimization
arXiv 2024
QDA-SQL: Questions Enhanced Dialogue Augmentation for Multi-Turn Text-to-SQL
arXiv 2024
Cross-model Control: Improving Multiple Large Language Models in One-time Training
arXiv 2024
RELIEF: Reinforcement Learning Empowered Graph Feature Prompt Tuning
arXiv 2024
$\text{R}^2$-Bench: Benchmarking the Robustness of Referring Perception Models under Perturbations
arXiv 2024
MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs
arXiv 2024
Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent
arXiv 2024
SARDet-100K: Towards Open-Source Benchmark and ToolKit for Large-Scale SAR Object Detection
arXiv 2024
TableGPT2: A Large Multimodal Model with Tabular Data Integration
arXiv 2024
OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation
arXiv 2024
PromptKD: Unsupervised Prompt Distillation for Vision-Language Models
CVPR 2024 1
A Survey of Neural Code Intelligence: Paradigms, Advances and Beyond
arXiv 2024
LLaRA: Supercharging Robot Learning Data for Vision-Language Policy
arXiv 2024
SARATR-X: Toward Building A Foundation Model for SAR Target Recognition
arXiv 2024
Automated Peer Reviewing in Paper SEA: Standardization, Evaluation, and Analysis
arXiv 2024
GeoGround: A Unified Large Vision-Language Model for Remote Sensing Visual Grounding
arXiv 2024
CM-TTS: Enhancing Real Time Text-to-Speech Synthesis Efficiency through Weighted Samplers and Consistency Models
arXiv 2024
PhoneLM: an Efficient and Capable Small Language Model Family through Principled Pre-training
arXiv 2024
Idea23D: Collaborative LMM Agents Enable 3D Model Generation from Interleaved Multimodal Inputs
arXiv 2024
Understanding Long Videos with Multimodal Language Models
arXiv 2024
Cascade Prompt Learning for Vision-Language Model Adaptation
arXiv 2024
Deep Height Decoupling for Precise Vision-based 3D Occupancy Prediction
arXiv 2024
HopTrack: A Real-time Multi-Object Tracking System for Embedded Devices
arXiv 2024
$\textit{SKIntern}$: Internalizing Symbolic Knowledge for Distilling Better CoT Capabilities into Small Language Models
arXiv 2024
Why does in-context learning fail sometimes? Evaluating in-context learning on open and closed questions
arXiv 2024
Open-domain Implicit Format Control for Large Language Model Generation
arXiv 2024
InstanceCap: Improving Text-to-Video Generation via Instance-aware Structured Caption
CVPR 2025 1
MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning
arXiv 2023
MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models
arXiv 2023
ADNet: Lane Shape Prediction via Anchor Decomposition
ICCV 2023 1
Corex: Pushing the Boundaries of Complex Reasoning through Multi-Model Collaboration
arXiv 2023
nanoLM: an Affordable LLM Pre-training Benchmark via Accurate Loss Prediction across Scales
arXiv 2023
Boosting Language Models Reasoning with Chain-of-Knowledge Prompting
arXiv 2023
Soulstyler: Using Large Language Model to Guide Image Style Transfer for Target Object
arXiv 2023
MediViSTA: Medical Video Segmentation via Temporal Fusion SAM Adaptation for Echocardiography
arXiv 2023
DialCoT Meets PPO: Decomposing and Exploring Reasoning Paths in Smaller Language Models
arXiv 2023
Creative Birds: Self-Supervised Single-View 3D Style Transfer
ICCV 2023 1
Make Prompt-based Black-Box Tuning Colorful: Boosting Model Generalization from Three Orthogonal Perspectives
arXiv 2023
Leveraging Large Language Models for Node Generation in Few-Shot Learning on Text-Attributed Graphs
arXiv 2023
BiomedGPT: A Generalist Vision-Language Foundation Model for Diverse Biomedical Tasks
arXiv 2023
Large Selective Kernel Network for Remote Sensing Object Detection
ICCV 2023 1
Video ChatCaptioner: Towards Enriched Spatiotemporal Descriptions
arXiv 2023
Decoding Natural Images from EEG for Object Recognition
arXiv 2023
CrossKD: Cross-Head Knowledge Distillation for Object Detection
CVPR 2024 1
u-LLaVA: Unifying Multi-Modal Tasks via Large Language Model
arXiv 2023
Crossway Diffusion: Improving Diffusion-based Visuomotor Policy via Self-supervised Learning
arXiv 2023
Fine-Grained Visual Prompting
NeurIPS 2023 11
DeID-GPT: Zero-shot Medical Text De-Identification by GPT-4
arXiv 2023
Multimodal ChatGPT for Medical Applications: an Experimental Study of GPT-4V
arXiv 2023
TransCoder: Towards Unified Transferable Code Representation Learning Inspired by Human Skills
arXiv 2023
The NeurIPS 2022 Neural MMO Challenge: A Massively Multiagent Competition with Specialization and Trade
arXiv 2023
CAT-probing: A Metric-based Approach to Interpret How Pre-trained Models for Programming Language Attend Code Structure
arXiv 2022
Towards Robust Referring Video Object Segmentation with Cyclic Relational Consensus
arXiv 2022
Heterogeneous Graph Contrastive Learning with Meta-path Contexts and Adaptively Weighted Negative Samples
arXiv 2022
Lexical Knowledge Internalization for Neural Dialog Generation
ACL 2022 5
Does Self-supervised Learning Really Improve Reinforcement Learning from Pixels?
arXiv 2022
PVT v2: Improved Baselines with Pyramid Vision Transformer
arXiv 2021
Selective Kernel Networks
Affiliations
Frequent co-authors
10 from 96 papers