Shuai Bai
- Papers: 18
Authored papers (18)
Qwen3-VL-Embedding and Qwen3-VL-Reranker: A Unified Framework for State-of-the-Art Multimodal Retrieval and Ranking
arXiv 2026
CUA-Gym: Scaling Verifiable Training Environments and Tasks for Computer-Use Agents
arXiv 2026
Qwen-Image Technical Report
arXiv 2025
Qwen3-Omni Technical Report
arXiv 2025
Qwen2.5-VL Technical Report
arXiv 2025
Qwen3-VL Technical Report
arXiv 2025
Soft Adaptive Policy Optimization
arXiv 2025
FLUX-Reason-6M & PRISM-Bench: A Million-Scale Text-to-Image Reasoning Dataset and Comprehensive Benchmark
arXiv 2025
Qwen2.5-Omni Technical Report
arXiv 2025
Multimodal Representation Alignment for Image Generation: Text-Image Interleaved Control Is Easier Than You Think
arXiv 2025
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution
arXiv 2024
Qwen2 Technical Report
arXiv 2024
An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models
arXiv 2024
Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey
arXiv 2024
Qwen Technical Report
arXiv 2023
Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond
arXiv 2023
TouchStone: Evaluating Vision-Language Models by Language Models
arXiv 2023
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
arXiv 2023
Frequent co-authors
10 from 18 papers