Hangyu Guo
Papers: 13
Authored papers (13)
STEP3-VL-10B Technical Report
arXiv 2026
Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters
arXiv 2026
AgentVista: Evaluating Multimodal Agents in Ultra-Challenging Realistic Visual Scenarios
arXiv 2026
WebVR: Benchmarking Multimodal LLMs for WebPage Recreation from Videos via Human-Aligned Visual Rubrics
arXiv 2026
Towards On-Policy Data Evolution for Visual-Native Multimodal Deep Search Agents
arXiv 2026
A Comprehensive Survey on Long Context Language Modeling
arXiv 2025
UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning
arXiv 2025
Thinking with Images for Multimodal Reasoning: Foundations, Methods, and Future Frontiers
arXiv 2025
OmniBench: Towards The Future of Universal Omni-Language Models
arXiv 2024
MTU-Bench: A Multi-granularity Tool-Use Benchmark for Large Language Models
arXiv 2024
PhysGame: Uncovering Physical Commonsense Violations in Gameplay Videos
Images are Achilles' Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking Multimodal Large Language Models
arXiv 2024
ING-VP: MLLMs cannot Play Easy Vision-based Games Yet
arXiv 2024
Frequent co-authors
10 from 13 papers