Junchi Yan
- Papers: 50
Authored papers (50)
TodoEvolve: Learning to Architect Agent Planning Systems
arXiv 2026
Trust Your Critic: Robust Reward Modeling and Reinforcement Learning for Faithful Image Editing and Generation
arXiv 2026
GRADE: Benchmarking Discipline-Informed Reasoning in Image Editing
arXiv 2026
FineRMoE: Dimension Expansion for Finer-Grained Expert with Its Upcycling Approach
arXiv 2026
MemOS: A Memory OS for AI System
arXiv 2025
VideoREPA: Learning Physics for Video Generation through Relational Alignment with Foundation Models
arXiv 2025
Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing
arXiv 2025
Visionary: The World Model Carrier Built on WebGPU-Powered Gaussian Splatting Platform
arXiv 2025
LoPA: Scaling dLLM Inference via Lookahead Parallel Decoding
arXiv 2025
EtCon: Edit-then-Consolidate for Reliable Knowledge Editing
arXiv 2025
MM-HELIX: Boosting Multimodal Long-Chain Reflective Reasoning with Holistic Platform and Adaptive Hybrid Policy Optimization
arXiv 2025
ProCLIP: Progressive Vision-Language Alignment via LLM-based Embedder
arXiv 2025
Beyond Theorem Proving: Formulation, Framework and Benchmark for Formal Problem-Solving
arXiv 2025
Rethinking Video Tokenization: A Conditioned Diffusion-based Approach
arXiv 2025
When AI Agents Collude Online: Financial Fraud Risks by Collaborative LLM Agents on Social Platforms
arXiv 2025
Co-Training Vision Language Models for Remote Sensing Multi-task Learning
arXiv 2025
TrustGeoGen: Formal-Verified Data Engine for Trustworthy Multi-modal Geometric Problem Solving
arXiv 2025
SpaCE-10: A Comprehensive Benchmark for Multimodal Large Language Models in Compositional Spatial Intelligence
arXiv 2025
Towards More Diverse and Challenging Pre-training for Point Cloud Learning: Self-Supervised Cross Reconstruction with Decoupled Views
ICCV 2025
Probing Scientific General Intelligence of LLMs with Scientist-Aligned Workflows
arXiv 2025
Derail Yourself: Multi-turn LLM Jailbreak Attack through Self-discovered Clues
arXiv 2024
CodeAttack: Revealing Safety Generalization Challenges of Large Language Models via Code Completion
arXiv 2024
Training-Free Adaptive Diffusion with Bounded Difference Approximation Strategy
arXiv 2024
PreRoutGNN for Timing Prediction with Order Preserving Partition: Global Circuit Pre-training, Local Delay Learning and Attentional Cell Modeling
arXiv 2024
ChartX & ChartVLM: A Versatile Benchmark and Foundation Model for Complicated Chart Reasoning
arXiv 2024
DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Language Models
arXiv 2024
Closed-Loop Visuomotor Control with Generative Expectation for Robotic Manipulation
arXiv 2024
STAR: A First-Ever Dataset and A Large-Scale Benchmark for Scene Graph Generation in Large-Size Satellite Imagery
arXiv 2024
Continuous-Multiple Image Outpainting in One-Step via Positional Query and A Diffusion-based Approach
arXiv 2024
Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models
arXiv 2024
RoboSense: Large-scale Dataset and Benchmark for Egocentric Robot Perception and Navigation in Crowded and Unstructured Environments
CVPR 2025
TerDiT: Ternary Diffusion Models with Transformers
arXiv 2024
Leveraging Hallucinations to Reduce Manual Prompt Dependency in Promptable Segmentation
arXiv 2024
QuEST: Low-bit Diffusion Model Quantization via Efficient Selective Finetuning
ICCV 2025
GeoX: Geometric Problem Solving Through Unified Formalized Vision-Language Pre-training
arXiv 2024
Re-TASK: Revisiting LLM Tasks from Capability, Skill, and Knowledge Perspectives
arXiv 2024
OpenLane-V2: A Topology Reasoning Benchmark for Unified 3D HD Mapping
Point2RBox: Combine Knowledge from Synthetic Visual Patterns for End-to-end Oriented Object Detection with Single Point Supervision
CVPR 2024
ReSimAD: Zero-Shot 3D Domain Transfer for Autonomous Driving with Source Reconstruction and Target Simulation
arXiv 2023
Graph-based Topology Reasoning for Driving Scenes
arXiv 2023
LaneSegNet: Map Learning with Lane Segment Perception for Autonomous Driving
arXiv 2023
ARS-DETR: Aspect Ratio-Sensitive Detection Transformer for Aerial Oriented Object Detection
arXiv 2023
LLM4Drive: A Survey of Large Language Models for Autonomous Driving
arXiv 2023
Geometric-aware Pretraining for Vision-centric 3D Object Detection
arXiv 2023
DriveAdapter: Breaking the Coupling Barrier of Perception and Planning in End-to-End Autonomous Driving
ICCV 2023
PointOBB: Learning Oriented Object Detection via Single Point Supervision
CVPR 2024
Transformers in Time Series: A Survey
arXiv 2022
PersFormer: 3D Lane Detection via Perspective Transformer and the OpenLane Benchmark
H2RBox: Horizontal Box Annotation is All You Need for Oriented Object Detection
arXiv 2022
GFTE: Graph-based Financial Table Extraction
arXiv 2020
Frequent co-authors
10 from 50 papers