Songyang Zhang

Papers
29

Attribution: affiliations and profile data from Semantic Scholar.

Authored papers

29

Project Imaging-X: A Survey of 1000+ Open-Access Medical Imaging Datasets for Foundation Model Development

arXiv 2026

InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency

arXiv 2025

Rectifying LLM Thought from Lens of Optimization

arXiv 2025

CompassVerifier: A Unified and Robust Verifier for LLMs Evaluation and Outcome Reward

arXiv 2025

CompassJudger-2: Towards Generalist Judge Model via Verifiable Rewards

arXiv 2025

Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination

arXiv 2025

Rethinking Verification for LLM Code Generation: From Generation to Testing

arXiv 2025

PM4Bench: A Parallel Multilingual Multi-Modal Multi-task Benchmark for Large Vision Language Model

arXiv 2025

Deciphering Trajectory-Aided LLM Reasoning: An Optimization Perspective

arXiv 2025

SciEvalKit: An Open-source Evaluation Toolkit for Scientific General Intelligence

arXiv 2025

Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning

arXiv 2025

Condor: Enhance LLM Alignment with Knowledge-Driven Data Synthesis and Refinement

arXiv 2025

NeedleBench: Can LLMs Do Retrieval and Reasoning in Information-Dense Context?

arXiv 2024

UrBench: A Comprehensive Benchmark for Evaluating Large Multimodal Models in Multi-View Urban Scenarios

arXiv 2024

Are Your LLMs Capable of Stable Reasoning?

arXiv 2024

Ada-LEval: Evaluating long-context LLMs with length-adaptable benchmarks

arXiv 2024

CIBench: Evaluating Your LLMs with a Code Interpreter Plugin

arXiv 2024

InternLM-Law: An Open Source Chinese Legal Large Language Model

arXiv 2024

GTA: A Benchmark for General Tool Agents

arXiv 2024

Adapting LLaMA Decoder to Vision Transformer

arXiv 2024

HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models

arXiv 2024

ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs

arXiv 2024

Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs

arXiv 2024

LLaST: Improved End-to-end Speech Translation System Leveraged by Large Language Models

arXiv 2024

Improving Pixel-based MIM by Reducing Wasted Modeling Capability

ICCV 2023 1

BotChat: Evaluating LLMs' Capabilities of Having Multi-Turn Dialogues

arXiv 2023

T-Eval: Evaluating the Tool Utilization Capability of Large Language Models Step by Step

arXiv 2023

Fake Alignment: Are LLMs Really Aligned Well?

arXiv 2023

Expanding Language-Image Pretrained Models for General Video Recognition

arXiv 2022

Affiliations

No known affiliations.

Frequent co-authors: 10 (from 29 papers)