Xin Li

Papers
77

Authored papers

77

InternAgent-1.5: A Unified Agentic Framework for Long-Horizon Autonomous Scientific Discovery

arXiv 2026

2026

RynnBrain: Open Embodied Foundation Models

arXiv 2026

2026

Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters

arXiv 2026

2026

UniRecGen: Unifying Multi-View 3D Reconstruction and Generation

arXiv 2026

2026

ArenaRL: Scaling RL for Open-Ended Agents via Tournament-based Relative Ranking

arXiv 2026

2026

Youtu-VL: Unleashing Visual Potential via Unified Vision-Language Supervision

arXiv 2026

2026

InterLV-Search: Benchmarking Interleaved Multimodal Agentic Search

arXiv 2026

2026

BubbleRAG: Evidence-Driven Retrieval-Augmented Generation for Black-Box Knowledge Graphs

arXiv 2026

2026

Youtu-Parsing: Perception, Structuring and Recognition via High-Parallelism Decoding

arXiv 2026

2026

Retrieval-Infused Reasoning Sandbox: A Benchmark for Decoupling Retrieval and Reasoning Capabilities

arXiv 2026

2026

VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding

arXiv 2025

2025

Citrus: Leveraging Expert Cognitive Pathways in a Medical Language Model for Advanced Medical Decision Support

arXiv 2025

2025

OmniCaptioner: One Captioner to Rule Them All

arXiv 2025

2025

Kolmogorov-Arnold Attention: Is Learnable Attention Better For Vision Transformers?

arXiv 2025

2025

FreshRetailNet-50K: A Stockout-Annotated Censored Demand Dataset for Latent Demand Recovery and Forecasting in Fresh Retail

arXiv 2025

2025

Long-Context Inference with Retrieval-Augmented Speculative Decoding

arXiv 2025

2025

Learning Fused State Representations for Control from Multi-View Observations

arXiv 2025

2025

CMPhysBench: A Benchmark for Evaluating Large Language Models in Condensed Matter Physics

arXiv 2025

2025

HunyuanVideo 1.5 Technical Report

arXiv 2025

2025

RynnVLA-002: A Unified Vision-Language-Action and World Model

arXiv 2025

2025

Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks

arXiv 2025

2025

LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization

arXiv 2025

2025

Beyond Correctness: Evaluating Subjective Writing Preferences Across Cultures

arXiv 2025

2025

HunyuanImage 3.0 Technical Report

arXiv 2025

2025

RynnVLA-001: Using Human Demonstrations to Improve Robot Manipulation

arXiv 2025

2025

WorldVLA: Towards Autoregressive Action World Model

arXiv 2025

2025

Reconstructing 4D Spatial Intelligence: A Survey

arXiv 2025

2025

RynnEC: Bringing MLLMs into Embodied World

arXiv 2025

2025

MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources

arXiv 2025

2025

Re:Form -- Reducing Human Priors in Scalable Formal Software Verification with RL in LLMs: A Preliminary Study on Dafny

arXiv 2025

2025

COIG-Writer: A High-Quality Dataset for Chinese Creative Writing with Thought Processes

arXiv 2025

2025

EOC-Bench: Can MLLMs Identify, Recall, and Forecast Objects in an Egocentric World?

arXiv 2025

2025

Controllable 3D Outdoor Scene Generation via Scene Graphs

ICCV 2025

2025

Benchmarking Multimodal Mathematical Reasoning with Explicit Visual Dependency

arXiv 2025

2025

UniFit: Towards Universal Virtual Try-on with MLLM-Guided Semantic Alignment

arXiv 2025

2025

DiaBlo: Diagonal Blocks Are Sufficient For Finetuning

arXiv 2025

2025

RLMiniStyler: Light-weight RL Style Agent for Arbitrary Sequential Neural Style Generation

arXiv 2025

2025

AHA: Aligning Large Audio-Language Models for Reasoning Hallucinations via Counterfactual Hard Negatives

arXiv 2025

2025

VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs

arXiv 2024

2024

Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss

arXiv 2024

2024

Raindrop Clarity: A Dual-Focused Dataset for Day and Night Raindrop Removal

arXiv 2024

2024

Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective

arXiv 2024

2024

VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM

CVPR 2025 1

2024

Learning Latent Dynamic Robust Representations for World Models

arXiv 2024

2024

The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio

arXiv 2024

2024

V2X-R: Cooperative LiDAR-4D Radar Fusion for 3D Object Detection with Denoising Diffusion

arXiv 2024

2024

UU-Mamba: Uncertainty-aware U-Mamba for Cardiovascular Segmentation

arXiv 2024

2024

HunyuanVideo: A Systematic Framework For Large Video Generative Models

arXiv 2024

2024

Scalable Autoregressive Image Generation with Mamba

arXiv 2024

2024

Can Large Language Models Analyze Graphs like Professionals? A Benchmark, Datasets and Models

arXiv 2024

2024

COMQ: A Backpropagation-Free Algorithm for Post-Training Quantization

arXiv 2024

2024

FlamePINN-1D: Physics-informed neural networks to solve forward and inverse problems of 1D laminar flames

arXiv 2024

2024

Towards Multi-modal Transformers in Federated Learning

arXiv 2024

2024

Can ChatGPT Detect DeepFakes? A Study of Using Multimodal Large Language Models for Media Forensics

arXiv 2024

2024

WebCiteS: Attributed Query-Focused Summarization on Chinese Web Search Results with Citations

arXiv 2024

2024

SeaLLMs 3: Open Foundation and Chat Multilingual Large Language Models for Southeast Asian Languages

arXiv 2024

2024

UniAff: A Unified Representation of Affordances for Tool Usage and Articulation with Vision-Language Models

arXiv 2024

2024

DetZero: Rethinking Offboard 3D Object Detection with Long-term Sequential Point Clouds

ICCV 2023 1

2023

Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding

arXiv 2023

2023

UniSeg: A Unified Multi-Modal LiDAR Segmentation Network and the OpenPCSeg Codebase

ICCV 2023 1

2023

CiteTracker: Correlating Image and Text for Visual Tracking

ICCV 2023 1

2023

Drive Like a Human: Rethinking Autonomous Driving with Large Language Models

arXiv 2023

2023

On the Road with GPT-4V(ision): Early Explorations of Visual-Language Model on Autonomous Driving

arXiv 2023

2023

DiLu: A Knowledge-Driven Approach to Autonomous Driving with Large Language Models

arXiv 2023

2023

SeaLLMs -- Large Language Models for Southeast Asia

arXiv 2023

2023

Learning Distortion Invariant Representation for Image Restoration from A Causality Perspective

CVPR 2023 1

2023

LMR: A Large-Scale Multi-Reference Dataset for Reference-based Super-Resolution

ICCV 2023 1

2023

CLEX: Continuous Length Extrapolation for Large Language Models

arXiv 2023

2023

Constraining Depth Map Geometry for Multi-View Stereo: A Dual-Depth Approach with Saddle-shaped Depth Cells

ICCV 2023 1

2023

Fast Full-frame Video Stabilization with Iterative Optimization

ICCV 2023 1

2023

Hyp-OW: Exploiting Hierarchical Structure Learning with Hyperbolic Distance Enhances Open World Object Detection

arXiv 2023

2023

LoGoNet: Towards Accurate 3D Object Detection with Local-to-Global Cross-Modal Fusion

CVPR 2023 1

2023

AQE: Argument Quadruplet Extraction via a Quad-Tagging Augmented Generative Approach

arXiv 2023

2023

From Cloze to Comprehension: Retrofitting Pre-trained Masked Language Model to Pre-trained Machine Reader

arXiv 2022

2022

Paint Transformer: Feed Forward Neural Painting with Stroke Prediction

ICCV 2021 10

2021

NTIRE 2021 Challenge on Quality Enhancement of Compressed Video: Methods and Results

arXiv 2021

2021

Partial Order Pruning: for Best Speed/Accuracy Trade-off in Neural Architecture Search

2019

Affiliations

No known affiliations.

Frequent co-authors

10

from 77 papers