Xue Yang
- Papers: 41
Authored papers: 41
SkillOpt: Executive Strategy for Self-Evolving Agent Skills
arXiv 2026
From Raw Experience to Skill Consumption: A Systematic Study of Model-Generated Agent Skills
arXiv 2026
SpaceDG: Benchmarking Spatial Intelligence under Visual Degradation
arXiv 2026
MM-WebAgent: A Hierarchical Multimodal Web Agent for Webpage Generation
arXiv 2026
Holi-Spatial: Evolving Video Streams into Holistic 3D Spatial Intelligence
arXiv 2026
RISE-Video: Can Video Generators Decode Implicit World Rules?
arXiv 2026
Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding
arXiv 2026
PhotoFlow: Agentic 3D Virtual Photography Missions
arXiv 2026
Modality Gap-Driven Subspace Alignment Training Paradigm For Multimodal Large Language Models
arXiv 2026
BizGenEval: A Systematic Benchmark for Commercial Visual Content Generation
arXiv 2026
InternVL-U: Democratizing Unified Multimodal Models for Understanding, Reasoning, Generation and Editing
arXiv 2026
Trust Your Critic: Robust Reward Modeling and Reinforcement Learning for Faithful Image Editing and Generation
arXiv 2026
GRADE: Benchmarking Discipline-Informed Reasoning in Image Editing
arXiv 2026
Quantum Kernel Advantage over Classical Collapse in Medical Foundation Model Embeddings
arXiv 2026
DreamWorld: Unified World Modeling in Video Generation
arXiv 2026
ComfyUI-R1: Exploring Reasoning Models for Workflow Generation
arXiv 2025
FilmAgent: A Multi-Agent Framework for End-to-End Film Automation in Virtual 3D Spaces
arXiv 2025
Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing
arXiv 2025
Visionary: The World Model Carrier Built on WebGPU-Powered Gaussian Splatting Platform
arXiv 2025
Can Generative Geospatial Diffusion Models Excel as Discriminative Geospatial Foundation Models?
ICCV 2025
MM-HELIX: Boosting Multimodal Long-Chain Reflective Reasoning with Holistic Platform and Adaptive Hybrid Policy Optimization
arXiv 2025
Mono-InternVL-1.5: Towards Cheaper and Faster Monolithic Multimodal Large Language Models
arXiv 2025
ProCLIP: Progressive Vision-Language Alignment via LLM-based Embedder
arXiv 2025
Parameter-Inverted Image Pyramid Networks for Visual Perception and Multimodal Understanding
arXiv 2025
A Simple Aerial Detection Baseline of Multimodal Language Models
arXiv 2025
A Simple Aerial Detection Baseline of Multimodal Language Models
arXiv 2025
Earth-Adapter: Bridge the Geospatial Domain Gaps with Mixture of Frequency Adaptation
arXiv 2025
SA-Occ: Satellite-Assisted 3D Occupancy Prediction in Real World
arXiv 2025
When Large Vision-Language Model Meets Large Remote Sensing Imagery: Coarse-to-Fine Text-Guided Token Pruning
ICCV 2025
Co-Training Vision Language Models for Remote Sensing Multi-task Learning
arXiv 2025
A Unified Agentic Framework for Evaluating Conditional Image Generation
arXiv 2025
SpaCE-10: A Comprehensive Benchmark for Multimodal Large Language Models in Compositional Spatial Intelligence
arXiv 2025
Breaking Bad Molecules: Are MLLMs Ready for Structure-Level Molecular Detoxification?
arXiv 2025
Probing Scientific General Intelligence of LLMs with Scientist-Aligned Workflows
arXiv 2025
STAR: A First-Ever Dataset and A Large-Scale Benchmark for Scene Graph Generation in Large-Size Satellite Imagery
arXiv 2024
GeoGround: A Unified Large Vision-Language Model for Remote Sensing Visual Grounding
arXiv 2024
FLoRA: Low-Rank Core Space for N-dimension
arXiv 2024
Point2RBox: Combine Knowledge from Synthetic Visual Patterns for End-to-end Oriented Object Detection with Single Point Supervision
CVPR 2024
ARS-DETR: Aspect Ratio-Sensitive Detection Transformer for Aerial Oriented Object Detection
arXiv 2023
PointOBB: Learning Oriented Object Detection via Single Point Supervision
CVPR 2024
H2RBox: Horizontal Box Annotation is All You Need for Oriented Object Detection
arXiv 2022
Affiliations
Frequent co-authors
10 from 41 papers