Yuchen Zhang
- Papers: 24
Authored papers
P1-VL: Bridging Visual Perception and Scientific Reasoning in Physics Olympiads
arXiv 2026
SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning
arXiv 2025
The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models
arXiv 2025
PaSa: An LLM Agent for Comprehensive Academic Paper Search
arXiv 2025
Foundation Models in Autonomous Driving: A Survey on Scenario Generation and Scenario Analysis
arXiv 2025
UFM: A Simple Path towards Unified Dense Correspondence with Flow
arXiv 2025
xbench: Tracking Agents Productivity Scaling with Profession-Aligned Real-World Evaluations
arXiv 2025
SSRL: Self-Search Reinforcement Learning
arXiv 2025
MiMo-Embodied: X-Embodied Foundation Model Technical Report
arXiv 2025
A Survey of Reinforcement Learning for Large Reasoning Models
arXiv 2025
Target-Bench: Can World Models Achieve Mapless Path Planning with Semantic Targets?
arXiv 2025
P1: Mastering Physics Olympiads with Reinforcement Learning
arXiv 2025
Tarsier2: Advancing Large Vision-Language Models from Detailed Video Description to Comprehensive Video Understanding
arXiv 2025
From f(x) and g(x) to f(g(x)): LLMs Learn New Skills in RL by Composing Old Ones
arXiv 2025
Is your VLM Sky-Ready? A Comprehensive Spatial Intelligence Benchmark for UAV Navigation
arXiv 2025
Any4D: Unified Feed-Forward Metric 4D Reconstruction
arXiv 2025
Are AI-Generated Text Detectors Robust to Adversarial Perturbations?
arXiv 2024
StoryTeller: Improving Long Video Description through Global Audio-Visual Character Identification
arXiv 2024
Self-Control of LLM Behaviors by Compressing Suffix Gradient into Prefix Controller
arXiv 2024
A Benchmark Dataset for Multimodal Prediction of Enzymatic Function Coupling DNA Sequences and Natural Language
arXiv 2024
Rethinking Human Evaluation Protocol for Text-to-Video Models: Enhancing Reliability, Reproducibility, and Practicality
arXiv 2024
AGILE: A Novel Reinforcement Learning Framework of LLM Agents
arXiv 2024
Llama 2: Open Foundation and Fine-Tuned Chat Models
arXiv 2023
AutoEval-Video: An Automatic Benchmark for Assessing Large Vision Language Models in Open-Ended Video Question Answering
arXiv 2023