Siyuan Li

Authored papers

37

NanoResearch: Co-Evolving Skills, Memory, and Policy for Personalized Research Automation

arXiv 2026

2026

PaperFit: Vision-in-the-Loop Typesetting Optimization for Scientific Documents

arXiv 2026

2026

Zooming without Zooming: Region-to-Image Distillation for Fine-Grained Multimodal Perception

arXiv 2026

2026

Programming with Data: Test-Driven Data Engineering for Self-Improving LLMs from Raw Corpora

arXiv 2026

2026

The Trinity of Consistency as a Defining Principle for General World Models

arXiv 2026

2026

RankE: End-to-End Post-Training for Discrete Text-to-Image Generation with Decoder Co-Evolution

arXiv 2026

2026

PAGER: Bridging the Semantic-Execution Gap in Point-Precise Geometric GUI Control

arXiv 2026

2026

UniK3D: Universal Camera Monocular 3D Estimation

CVPR 2025

2025

MiniCPM4: Ultra-Efficient LLMs on End Devices

arXiv 2025

2025

UniDepthV2: Universal Monocular Metric Depth Estimation Made Simpler

arXiv 2025

2025

Skywork Open Reasoner 1 Technical Report

arXiv 2025

2025

Hybrid Global-Local Representation with Augmented Spatial Guidance for Zero-Shot Referring Image Segmentation

CVPR 2025

2025

One2Any: One-Reference 6D Pose Estimation for Any Object

CVPR 2025

2025

SAM 3: Segment Anything with Concepts

arXiv 2025

2025

Envision: Benchmarking Unified Understanding & Generation for Causal World Process Insights

arXiv 2025

2025

Rep-MTL: Unleashing the Power of Representation-level Task Saliency for Multi-Task Learning

arXiv 2025

2025

Every Activation Boosted: Scaling General Reasoner to 1 Trillion Open Language Foundation

arXiv 2025

2025

Taming LLMs by Scaling Learning Rates with Gradient Grouping

arXiv 2025

2025

GGBench: A Geometric Generative Reasoning Benchmark for Unified Multimodal Models

arXiv 2025

2025

Multi-View 3D Point Tracking

ICCV 2025

2025

MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization

CVPR 2025

2025

Enhancing Image Generation Fidelity via Progressive Prompts

arXiv 2025

2025

Matching Anything by Segmenting Anything

CVPR 2024

2024

A Survey on Mixup Augmentations and Beyond

arXiv 2024

2024

SLAck: Semantic, Location, and Appearance Aware Open-Vocabulary Tracking

arXiv 2024

2024

Switch EMA: A Free Lunch for Better Flatness and Sharpness

arXiv 2024

2024

Peer Review as A Multi-Turn and Long-Context Dialogue with Role-Based Interactions

arXiv 2024

2024

Unveiling the Backbone-Optimizer Coupling Bias in Visual Representation Learning

arXiv 2024

2024

OpenSTL: A Comprehensive Benchmark of Spatio-Temporal Predictive Learning

2023

InstructUIE: Multi-task Instruction Tuning for Unified Information Extraction

arXiv 2023

2023

Cascade-DETR: Delving into High-Quality Universal Object Detection

ICCV 2023

2023

SemiReward: A General Reward Model for Semi-supervised Learning

arXiv 2023

2023

RDesign: Hierarchical Data-efficient Representation Learning for Tertiary Structure-based RNA Design

arXiv 2023

2023

Boosting the Power of Small Multimodal Reasoning Models to Match Larger Models with Self-Consistency Training

arXiv 2023

2023

Behavior Contrastive Learning for Unsupervised Skill Discovery

arXiv 2023

2023

SimVPv2: Towards Simple yet Powerful Spatiotemporal Predictive Learning

arXiv 2022

2022

UMT: Unified Multi-modal Transformers for Joint Video Moment Retrieval and Highlight Detection

CVPR 2022

2022

Affiliations

No known affiliations.

Frequent co-authors

10

from 37 papers