Songyang Zhang
- Papers: 29
Authored papers (29)
Project Imaging-X: A Survey of 1000+ Open-Access Medical Imaging Datasets for Foundation Model Development
arXiv 2026
InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency
arXiv 2025
Rectifying LLM Thought from Lens of Optimization
arXiv 2025
CompassVerifier: A Unified and Robust Verifier for LLMs Evaluation and Outcome Reward
arXiv 2025
CompassJudger-2: Towards Generalist Judge Model via Verifiable Rewards
arXiv 2025
Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination
arXiv 2025
Rethinking Verification for LLM Code Generation: From Generation to Testing
arXiv 2025
PM4Bench: A Parallel Multilingual Multi-Modal Multi-task Benchmark for Large Vision Language Model
arXiv 2025
Deciphering Trajectory-Aided LLM Reasoning: An Optimization Perspective
arXiv 2025
SciEvalKit: An Open-source Evaluation Toolkit for Scientific General Intelligence
arXiv 2025
Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning
arXiv 2025
Condor: Enhance LLM Alignment with Knowledge-Driven Data Synthesis and Refinement
arXiv 2025
NeedleBench: Can LLMs Do Retrieval and Reasoning in Information-Dense Context?
arXiv 2024
UrBench: A Comprehensive Benchmark for Evaluating Large Multimodal Models in Multi-View Urban Scenarios
arXiv 2024
Are Your LLMs Capable of Stable Reasoning?
arXiv 2024
Ada-LEval: Evaluating long-context LLMs with length-adaptable benchmarks
arXiv 2024
CIBench: Evaluating Your LLMs with a Code Interpreter Plugin
arXiv 2024
InternLM-Law: An Open Source Chinese Legal Large Language Model
arXiv 2024
GTA: A Benchmark for General Tool Agents
arXiv 2024
Adapting LLaMA Decoder to Vision Transformer
arXiv 2024
HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models
arXiv 2024
ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs
arXiv 2024
Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs
arXiv 2024
LLaST: Improved End-to-end Speech Translation System Leveraged by Large Language Models
arXiv 2024
Improving Pixel-based MIM by Reducing Wasted Modeling Capability
ICCV 2023
BotChat: Evaluating LLMs' Capabilities of Having Multi-Turn Dialogues
arXiv 2023
T-Eval: Evaluating the Tool Utilization Capability of Large Language Models Step by Step
arXiv 2023
Fake Alignment: Are LLMs Really Aligned Well?
arXiv 2023
Expanding Language-Image Pretrained Models for General Video Recognition
arXiv 2022
Frequent co-authors
10 from 29 papers