Siyuan Li
- Papers: 37
Authored papers
NanoResearch: Co-Evolving Skills, Memory, and Policy for Personalized Research Automation
arXiv 2026
PaperFit: Vision-in-the-Loop Typesetting Optimization for Scientific Documents
arXiv 2026
Zooming without Zooming: Region-to-Image Distillation for Fine-Grained Multimodal Perception
arXiv 2026
Programming with Data: Test-Driven Data Engineering for Self-Improving LLMs from Raw Corpora
arXiv 2026
The Trinity of Consistency as a Defining Principle for General World Models
arXiv 2026
RankE: End-to-End Post-Training for Discrete Text-to-Image Generation with Decoder Co-Evolution
arXiv 2026
PAGER: Bridging the Semantic-Execution Gap in Point-Precise Geometric GUI Control
arXiv 2026
UniK3D: Universal Camera Monocular 3D Estimation
CVPR 2025 1
MiniCPM4: Ultra-Efficient LLMs on End Devices
arXiv 2025
UniDepthV2: Universal Monocular Metric Depth Estimation Made Simpler
arXiv 2025
Skywork Open Reasoner 1 Technical Report
arXiv 2025
Hybrid Global-Local Representation with Augmented Spatial Guidance for Zero-Shot Referring Image Segmentation
CVPR 2025 1
One2Any: One-Reference 6D Pose Estimation for Any Object
CVPR 2025 1
SAM 3: Segment Anything with Concepts
arXiv 2025
Envision: Benchmarking Unified Understanding & Generation for Causal World Process Insights
arXiv 2025
Rep-MTL: Unleashing the Power of Representation-level Task Saliency for Multi-Task Learning
arXiv 2025
Every Activation Boosted: Scaling General Reasoner to 1 Trillion Open Language Foundation
arXiv 2025
Taming LLMs by Scaling Learning Rates with Gradient Grouping
arXiv 2025
GGBench: A Geometric Generative Reasoning Benchmark for Unified Multimodal Models
arXiv 2025
Multi-View 3D Point Tracking
ICCV 2025
MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization
CVPR 2025 1
Enhancing Image Generation Fidelity via Progressive Prompts
arXiv 2025
Matching Anything by Segmenting Anything
CVPR 2024 1
A Survey on Mixup Augmentations and Beyond
arXiv 2024
SLAck: Semantic, Location, and Appearance Aware Open-Vocabulary Tracking
arXiv 2024
Switch EMA: A Free Lunch for Better Flatness and Sharpness
arXiv 2024
Peer Review as A Multi-Turn and Long-Context Dialogue with Role-Based Interactions
arXiv 2024
Unveiling the Backbone-Optimizer Coupling Bias in Visual Representation Learning
arXiv 2024
OpenSTL: A Comprehensive Benchmark of Spatio-Temporal Predictive Learning
InstructUIE: Multi-task Instruction Tuning for Unified Information Extraction
arXiv 2023
Cascade-DETR: Delving into High-Quality Universal Object Detection
ICCV 2023 1
SemiReward: A General Reward Model for Semi-supervised Learning
arXiv 2023
RDesign: Hierarchical Data-efficient Representation Learning for Tertiary Structure-based RNA Design
arXiv 2023
Boosting the Power of Small Multimodal Reasoning Models to Match Larger Models with Self-Consistency Training
arXiv 2023
Behavior Contrastive Learning for Unsupervised Skill Discovery
arXiv 2023
SimVPv2: Towards Simple yet Powerful Spatiotemporal Predictive Learning
arXiv 2022
UMT: Unified Multi-modal Transformers for Joint Video Moment Retrieval and Highlight Detection
CVPR 2022 1
Affiliations
Frequent co-authors
10 from 37 papers