Ran Xu

Papers
25

Attribution

Affiliations & profile
Semantic Scholar

Authored papers

25

AutoResearch AI: Towards AI-Powered Research Automation for Scientific Discovery

arXiv 2026

2026

Future Optical Flow Prediction Improves Robot Control & Video Generation

arXiv 2026

2026

VLAA-GUI: Knowing When to Stop, Recover, and Search, A Modular Framework for GUI Automation

arXiv 2026

2026

BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset

arXiv 2025

2025

MedAgentGym: Training LLM Agents for Code-Based Medical Reasoning at Scale

arXiv 2025

2025

VLM2Vec-V2: Advancing Multimodal Embedding for Videos, Images, and Visual Documents

arXiv 2025

2025

Collab-RAG: Boosting Retrieval-Augmented Generation for Complex Question Answering via White-Box and Black-Box LLM Collaboration

arXiv 2025

2025

Scaling Agentic Reinforcement Learning for Tool-Integrated Reasoning in VLMs

arXiv 2025

2025

GTA1: GUI Test-time Scaling Agent

arXiv 2025

2025

UNIDOC-BENCH: A Unified Benchmark for Document-Centric Multimodal RAG

arXiv 2025

2025

AceSearcher: Bootstrapping Reasoning and Search for LLMs via Reinforced Self-Play

arXiv 2025

2025

CoDA: Coding LM via Diffusion Adaptation

arXiv 2025

2025

TrustLLM: Trustworthiness in Large Language Models

arXiv 2024

2024

ProVision: Programmatically Scaling Vision-centric Instruction Data for Multimodal Language Models

arXiv 2024

2024

FOFO: A Benchmark to Evaluate LLMs' Format-Following Capability

arXiv 2024

2024

BMRetriever: Tuning Large Language Models as Better Biomedical Text Retrievers

arXiv 2024

2024

MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens

arXiv 2024

2024

SQ-LLaVA: Self-Questioning for Large Vision-Language Assistant

arXiv 2024

2024

ULIP-2: Towards Scalable Multimodal Pre-training for 3D Understanding

CVPR 2024

2023

UniControl: A Unified Diffusion Model for Controllable Visual Generation In the Wild

2023

Knowledge-Infused Prompting: Assessing and Advancing Clinical Text Data Generation with Large Language Models

arXiv 2023

2023

Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization

arXiv 2023

2023

BOLAA: Benchmarking and Orchestrating LLM-augmented Autonomous Agents

arXiv 2023

2023

LayoutDETR: Detection Transformer Is a Good Multimodal Layout Designer

arXiv 2022

2022

ApproxDet: Content and Contention-Aware Approximate Object Detection for Mobiles

arXiv 2020

2020

Affiliations

No known affiliations.

Frequent co-authors

10

from 25 papers