Yi Zhang
- Papers: 32
Authored papers (32)
VitaBench 2.0: Evaluating Personalized and Proactive Agents in Long-Term User Interactions
arXiv 2026
Code2Worlds: Empowering Coding LLMs for 4D World Generation
arXiv 2026
LongCat-Flash-Thinking-2601 Technical Report
arXiv 2026
SWE-Skills-Bench: Do Agent Skills Actually Help in Real-World Software Engineering?
arXiv 2026
NextFlow: Unified Sequential Modeling Activates Multimodal Understanding and Generation
arXiv 2026
Salt: Self-Consistent Distribution Matching with Cache-Aware Training for Fast Video Generation
arXiv 2026
Grid2Matrix: Revealing Digital Agnosia in Vision-Language Models
arXiv 2026
CLARE: Continual Learning for Vision-Language-Action Models via Autonomous Adapter Routing and Expansion
arXiv 2026
DreamO: A Unified Framework for Image Customization
arXiv 2025
SenseFlow: Scaling Distribution Matching for Flow-based Text-to-Image Distillation
arXiv 2025
Quadratic Interest Network for Multimodal Click-Through Rate Prediction
arXiv 2025
Knowing You Don't Know: Learning When to Continue Search in Multi-round RAG through Self-Practicing
arXiv 2025
Phantom-Data: Towards a General Subject-Consistent Video Generation Dataset
arXiv 2025
M-ABSA: A Multilingual Dataset for Aspect-Based Sentiment Analysis
arXiv 2025
Qwen3Guard Technical Report
arXiv 2025
Every Activation Boosted: Scaling General Reasoner to 1 Trillion Open Language Foundation
arXiv 2025
Improving Retrieval-Augmented Generation through Multi-Agent Reinforcement Learning
arXiv 2025
GLM-TTS Technical Report
arXiv 2025
Language Representations Can be What Recommenders Need: Findings and Potentials
arXiv 2024
CMM-Math: A Chinese Multimodal Math Dataset To Evaluate and Enhance the Mathematics Reasoning of Large Multimodal Models
arXiv 2024
Read Anywhere Pointed: Layout-aware GUI Screen Reading with Tree-of-Lens Grounding
arXiv 2024
Feature Re-Embedding: Towards Foundation Model-Level Performance in Computational Pathology
CVPR 2024 1
TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization
arXiv 2024
Right this way: Can VLMs Guide Us to See More to Answer Questions?
arXiv 2024
3D-Aware Neural Body Fitting for Occlusion Robust 3D Human Pose Estimation
ICCV 2023 1
NatCS: Eliciting Natural Customer Support Dialogues
arXiv 2023
PairingNet: A Learning-based Pair-searching and -matching Network for Image Fragments
arXiv 2023
Clinical Prompt Learning with Frozen Language Models
arXiv 2022
Injecting Domain Knowledge in Language Models for Task-Oriented Dialogue Systems
arXiv 2022
The Euclidean Space is Evil: Hyperbolic Attribute Editing for Few-shot Image Generation
ICCV 2023 1
What Makes Convolutional Models Great on Long Sequence Modeling?
arXiv 2022
Multi-Task Pre-Training for Plug-and-Play Task-Oriented Dialogue System
ACL 2022 5
Frequent co-authors
10 from 32 papers