Zheng Ge

Papers
27

Attribution: affiliations and profile from Semantic Scholar

Authored papers

27

Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters

arXiv 2026

2026

SpatialEvo: Self-Evolving Spatial Intelligence via Deterministic Geometric Environments

arXiv 2026

2026

WebVR: Benchmarking Multimodal LLMs for WebPage Recreation from Videos via Human-Aligned Visual Rubrics

arXiv 2026

2026

PaCoRe: Learning to Scale Test-Time Compute with Parallel Coordinated Reasoning

arXiv 2026

2026

GEBench: Benchmarking Image Generation Models as GUI Environments

arXiv 2026

2026

STEP3-VL-10B Technical Report

arXiv 2026

2026

Step1X-Edit: A Practical Framework for General Image Editing

arXiv 2025

2025

Unhackable Temporal Rewarding for Scalable Video MLLMs

arXiv 2025

2025

NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale

arXiv 2025

2025

Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning

arXiv 2025

2025

Step-Audio: Unified Understanding and Generation in Intelligent Speech Interaction

arXiv 2025

2025

Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model

arXiv 2025

2025

Step-Video-TI2V Technical Report: A State-of-the-Art Text-Driven Image-to-Video Generation Model

arXiv 2025

2025

M-DocSum: Do LVLMs Genuinely Comprehend Interleaved Image-Text in Document Summarization?

arXiv 2025

2025

Step-GUI Technical Report

arXiv 2025

2025

Perception-R1: Pioneering Perception Policy with Reinforcement Learning

arXiv 2025

2025

General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model

arXiv 2024

2024

Slow Perception: Let's Perceive Geometric Figures Step-by-step

arXiv 2024

2024

OneChart: Purify the Chart Structural Extraction via One Auxiliary Token

arXiv 2024

2024

Reconstructive Visual Instruction Tuning

arXiv 2024

2024

ShapeLLM: Universal 3D Object Understanding for Embodied Interaction

arXiv 2024

2024

DreamLLM: Synergistic Multimodal Comprehension and Creation

arXiv 2023

2023

Vary: Scaling up the Vision Vocabulary for Large Vision-Language Models

arXiv 2023

2023

Contrast with Reconstruct: Contrastive 3D Representation Learning Guided by Generative Pretraining

arXiv 2023

2023

Autoencoders as Cross-Modal Teachers: Can Pretrained 2D Image Transformers Help 3D Representation Learning?

arXiv 2022

2022

MatrixVT: Efficient Multi-Camera to BEV Transformation for 3D Perception

ICCV 2023 1

2022

YOLOX: Exceeding YOLO Series in 2021

arXiv 2021

2021

Affiliations

No known affiliations.

Frequent co-authors

10

from 27 papers