Xiangxiang Chu
- Papers: 40
Authored papers (40)
Generation Enhances Understanding in Unified Multimodal Models via Multi-Representation Generation
arXiv 2026
SkillClaw: Let Skills Evolve Collectively with Agentic Evolver
arXiv 2026
SaaSBench: Exploring the Boundaries of Coding Agents in Long-Horizon Enterprise SaaS Engineering
arXiv 2026
Extending One-Step Image Generation from Class Labels to Text via Discriminative Text Representation
arXiv 2026
Learning Agentic Policy from Action Guidance
arXiv 2026
Geometry-Guided Reinforcement Learning for Multi-view Consistent 3D Scene Editing
arXiv 2026
TransitLM: A Large-Scale Dataset and Benchmark for Map-Free Transit Route Generation
arXiv 2026
Urban Socio-Semantic Segmentation with Vision-Language Reasoning
arXiv 2026
Harder Is Better: Boosting Mathematical Reasoning via Difficulty-Aware GRPO and Multi-Aspect Question Reformulation
arXiv 2026
Elucidating the SNR-t Bias of Diffusion Probabilistic Models
arXiv 2026
MobilityBench: A Benchmark for Evaluating Route-Planning Agents in Real-World Mobility Scenarios
arXiv 2026
Thinking with Map: Reinforced Parallel Map-Augmented Agent for Geolocalization
arXiv 2026
MACE-Dance: Motion-Appearance Cascaded Experts for Music-Driven Dance Video Generation
arXiv 2026
Everything in Its Place: Benchmarking Spatial Intelligence of Text-to-Image Models
arXiv 2026
Code2World: A GUI World Model via Renderable Code Generation
arXiv 2026
Omni-WorldBench: Towards a Comprehensive Interaction-Centric Evaluation for World Models
arXiv 2026
LLaTiSA: Towards Difficulty-Stratified Time Series Reasoning from Visual Perception to Semantics
arXiv 2026
FLUX-Text: A Simple and Advanced Diffusion Transformer Baseline for Scene Text Editing
arXiv 2025
USP: Unified Self-Supervised Pretraining for Image Generation and Understanding
ICCV 2025
GPG: A Simple and Strong Reinforcement Learning Baseline for Model Reasoning
arXiv 2025
VMBench: A Benchmark for Perception-Aligned Video Motion Generation
ICCV 2025
Taming Hallucinations: Boosting MLLMs' Video Understanding via Counterfactual Video Generation
arXiv 2025
Tree Search for LLM Agent Reinforcement Learning
arXiv 2025
Advancing End-to-End Pixel Space Generative Modeling via Self-supervised Pre-training
arXiv 2025
From Editor to Dense Geometry Estimator
arXiv 2025
Omni-Effects: Unified and Spatially-Controllable Visual Effects Generation
arXiv 2025
UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning
arXiv 2025
ImagerySearch: Adaptive Test-Time Search for Video Generation Beyond Semantic Dependency Constraints
arXiv 2025
S$^2$-Guidance: Stochastic Self Guidance for Training-Free Enhancement of Diffusion Models
arXiv 2025
MobileVLM V2: Faster and Stronger Baseline for Vision Language Model
arXiv 2024
MMGenBench: Evaluating the Limits of LMMs from the Text-to-Image Generation Perspective
arXiv 2024
LiDAR-PTQ: Post-Training Quantization for Point Cloud 3D Object Detection
arXiv 2024
VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks
arXiv 2024
Adaptive Task Balancing for Visual Instruction Tuning via Inter-Task Contribution and Intra-Task Difficulty
arXiv 2024
Lenna: Language Enhanced Reasoning Detection Assistant
arXiv 2023
YOLOv6 v3.0: A Full-Scale Reloading
arXiv 2023
PromptDet: Towards Open-vocabulary Detection using Uncurated Images
arXiv 2022
Twins: Revisiting the Design of Spatial Attention in Vision Transformers
NeurIPS 2021 12
Conditional Positional Encodings for Vision Transformers
arXiv 2021
MixPath: A Unified Approach for One-shot Neural Architecture Search
ICCV 2023 1
Frequent co-authors
10 from 40 papers