Yu Zhou

Papers
29

Authored papers

29

Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters

arXiv 2026

2026

Semantic Audio-Visual Navigation in Continuous Environments

arXiv 2026

2026

STEP3-VL-10B Technical Report

arXiv 2026

2026

Visual Text Processing: A Comprehensive Review and Unified Evaluation

arXiv 2025

2025

When Semantics Mislead Vision: Mitigating Large Multimodal Models Hallucinations in Scene Text Spotting and Understanding

arXiv 2025

2025

UniPercept: Towards Unified Perceptual-Level Image Understanding across Aesthetics, Quality, Structure, and Texture

arXiv 2025

2025

HunyuanOCR Technical Report

arXiv 2025

2025

ENACT: Evaluating Embodied Cognition with World Modeling of Egocentric Interaction

arXiv 2025

2025

SAM 3: Segment Anything with Concepts

arXiv 2025

2025

MuSc-V2: Zero-Shot Multimodal Industrial Anomaly Classification and Segmentation with Mutual Scoring of Unlabeled Samples

arXiv 2025

2025

NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale

arXiv 2025

2025

Step-Audio 2 Technical Report

arXiv 2025

2025

Step-Audio: Unified Understanding and Generation in Intelligent Speech Interaction

arXiv 2025

2025

Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model

arXiv 2025

2025

Step-Video-TI2V Technical Report: A State-of-the-Art Text-Driven Image-to-Video Generation Model

arXiv 2025

2025

VidText: Towards Comprehensive Evaluation for Video Text Understanding

arXiv 2025

2025

DialectGen: Benchmarking and Improving Dialect Robustness in Multimodal Generation

arXiv 2025

2025

MuSc: Zero-Shot Industrial Anomaly Classification and Segmentation with Mutual Scoring of the Unlabeled Images

arXiv 2024

2024

TextCtrl: Diffusion-based Scene Text Editing with Prior Guidance Control

arXiv 2024

2024

Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making

arXiv 2024

2024

SeaS: Few-shot Industrial Anomaly Image Generation with Separation and Sharing Fine-tuning

ICCV 2025

2024

AnomalyNCD: Towards Novel Anomaly Class Discovery in Industrial Scenarios

CVPR 2025 1

2024

Track the Answer: Extending TextVQA from Image to Video with Spatio-Temporal Clues

arXiv 2024

2024

Focus, Distinguish, and Prompt: Unleashing CLIP for Efficient and Flexible Scene Text Retrieval

arXiv 2024

2024

Toward Real Text Manipulation Detection: New Dataset and New Solution

arXiv 2023

2023

UATVR: Uncertainty-Adaptive Text-Video Retrieval

ICCV 2023 1

2023

Non-Sequential Graph Script Induction via Multimedia Grounding

arXiv 2023

2023

Localizing Active Objects from Egocentric Vision with Symbolic World Knowledge

arXiv 2023

2023

Weakly Supervised Semantic Segmentation via Progressive Patch Learning

arXiv 2022

2022

Affiliations

No known affiliations.

Frequent co-authors

10 (from 29 papers)