Xin Li
- Papers: 77
Authored papers (77)
InternAgent-1.5: A Unified Agentic Framework for Long-Horizon Autonomous Scientific Discovery
arXiv 2026
RynnBrain: Open Embodied Foundation Models
arXiv 2026
Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters
arXiv 2026
UniRecGen: Unifying Multi-View 3D Reconstruction and Generation
arXiv 2026
ArenaRL: Scaling RL for Open-Ended Agents via Tournament-based Relative Ranking
arXiv 2026
Youtu-VL: Unleashing Visual Potential via Unified Vision-Language Supervision
arXiv 2026
InterLV-Search: Benchmarking Interleaved Multimodal Agentic Search
arXiv 2026
BubbleRAG: Evidence-Driven Retrieval-Augmented Generation for Black-Box Knowledge Graphs
arXiv 2026
Youtu-Parsing: Perception, Structuring and Recognition via High-Parallelism Decoding
arXiv 2026
Retrieval-Infused Reasoning Sandbox: A Benchmark for Decoupling Retrieval and Reasoning Capabilities
arXiv 2026
VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding
arXiv 2025
Citrus: Leveraging Expert Cognitive Pathways in a Medical Language Model for Advanced Medical Decision Support
arXiv 2025
OmniCaptioner: One Captioner to Rule Them All
arXiv 2025
Kolmogorov-Arnold Attention: Is Learnable Attention Better For Vision Transformers?
arXiv 2025
FreshRetailNet-50K: A Stockout-Annotated Censored Demand Dataset for Latent Demand Recovery and Forecasting in Fresh Retail
arXiv 2025
Long-Context Inference with Retrieval-Augmented Speculative Decoding
arXiv 2025
Learning Fused State Representations for Control from Multi-View Observations
arXiv 2025
CMPhysBench: A Benchmark for Evaluating Large Language Models in Condensed Matter Physics
arXiv 2025
HunyuanVideo 1.5 Technical Report
arXiv 2025
RynnVLA-002: A Unified Vision-Language-Action and World Model
arXiv 2025
Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks
arXiv 2025
LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization
arXiv 2025
Beyond Correctness: Evaluating Subjective Writing Preferences Across Cultures
arXiv 2025
HunyuanImage 3.0 Technical Report
arXiv 2025
RynnVLA-001: Using Human Demonstrations to Improve Robot Manipulation
arXiv 2025
WorldVLA: Towards Autoregressive Action World Model
arXiv 2025
Reconstructing 4D Spatial Intelligence: A Survey
arXiv 2025
RynnEC: Bringing MLLMs into Embodied World
arXiv 2025
MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources
arXiv 2025
Re:Form -- Reducing Human Priors in Scalable Formal Software Verification with RL in LLMs: A Preliminary Study on Dafny
arXiv 2025
COIG-Writer: A High-Quality Dataset for Chinese Creative Writing with Thought Processes
arXiv 2025
EOC-Bench: Can MLLMs Identify, Recall, and Forecast Objects in an Egocentric World?
arXiv 2025
Controllable 3D Outdoor Scene Generation via Scene Graphs
ICCV 2025
Benchmarking Multimodal Mathematical Reasoning with Explicit Visual Dependency
arXiv 2025
UniFit: Towards Universal Virtual Try-on with MLLM-Guided Semantic Alignment
arXiv 2025
DiaBlo: Diagonal Blocks Are Sufficient For Finetuning
arXiv 2025
RLMiniStyler: Light-weight RL Style Agent for Arbitrary Sequential Neural Style Generation
arXiv 2025
AHA: Aligning Large Audio-Language Models for Reasoning Hallucinations via Counterfactual Hard Negatives
arXiv 2025
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
arXiv 2024
Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss
arXiv 2024
Raindrop Clarity: A Dual-Focused Dataset for Day and Night Raindrop Removal
arXiv 2024
Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective
arXiv 2024
VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM
CVPR 2025 1
Learning Latent Dynamic Robust Representations for World Models
arXiv 2024
The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio
arXiv 2024
V2X-R: Cooperative LiDAR-4D Radar Fusion for 3D Object Detection with Denoising Diffusion
arXiv 2024
UU-Mamba: Uncertainty-aware U-Mamba for Cardiovascular Segmentation
arXiv 2024
HunyuanVideo: A Systematic Framework For Large Video Generative Models
arXiv 2024
Scalable Autoregressive Image Generation with Mamba
arXiv 2024
Can Large Language Models Analyze Graphs like Professionals? A Benchmark, Datasets and Models
arXiv 2024
COMQ: A Backpropagation-Free Algorithm for Post-Training Quantization
arXiv 2024
FlamePINN-1D: Physics-informed neural networks to solve forward and inverse problems of 1D laminar flames
arXiv 2024
Towards Multi-modal Transformers in Federated Learning
arXiv 2024
Can ChatGPT Detect DeepFakes? A Study of Using Multimodal Large Language Models for Media Forensics
arXiv 2024
WebCiteS: Attributed Query-Focused Summarization on Chinese Web Search Results with Citations
arXiv 2024
SeaLLMs 3: Open Foundation and Chat Multilingual Large Language Models for Southeast Asian Languages
arXiv 2024
UniAff: A Unified Representation of Affordances for Tool Usage and Articulation with Vision-Language Models
arXiv 2024
DetZero: Rethinking Offboard 3D Object Detection with Long-term Sequential Point Clouds
ICCV 2023 1
Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
arXiv 2023
UniSeg: A Unified Multi-Modal LiDAR Segmentation Network and the OpenPCSeg Codebase
ICCV 2023 1
CiteTracker: Correlating Image and Text for Visual Tracking
ICCV 2023 1
Drive Like a Human: Rethinking Autonomous Driving with Large Language Models
arXiv 2023
On the Road with GPT-4V(ision): Early Explorations of Visual-Language Model on Autonomous Driving
arXiv 2023
DiLu: A Knowledge-Driven Approach to Autonomous Driving with Large Language Models
arXiv 2023
SeaLLMs -- Large Language Models for Southeast Asia
arXiv 2023
Learning Distortion Invariant Representation for Image Restoration from A Causality Perspective
CVPR 2023 1
LMR: A Large-Scale Multi-Reference Dataset for Reference-based Super-Resolution
ICCV 2023 1
CLEX: Continuous Length Extrapolation for Large Language Models
arXiv 2023
Constraining Depth Map Geometry for Multi-View Stereo: A Dual-Depth Approach with Saddle-shaped Depth Cells
ICCV 2023 1
Fast Full-frame Video Stabilization with Iterative Optimization
ICCV 2023 1
Hyp-OW: Exploiting Hierarchical Structure Learning with Hyperbolic Distance Enhances Open World Object Detection
arXiv 2023
LoGoNet: Towards Accurate 3D Object Detection with Local-to-Global Cross-Modal Fusion
CVPR 2023 1
AQE: Argument Quadruplet Extraction via a Quad-Tagging Augmented Generative Approach
arXiv 2023
From Cloze to Comprehension: Retrofitting Pre-trained Masked Language Model to Pre-trained Machine Reader
arXiv 2022
Paint Transformer: Feed Forward Neural Painting with Stroke Prediction
ICCV 2021 10
NTIRE 2021 Challenge on Quality Enhancement of Compressed Video: Methods and Results
arXiv 2021
Partial Order Pruning: for Best Speed/Accuracy Trade-off in Neural Architecture Search
Affiliations
Frequent co-authors
10 from 77 papers