Weixiang Yan
7 papers
Authored papers
ClawsBench: Evaluating Capability and Safety of LLM Productivity Agents in Simulated Workspaces
arXiv 2026
Soft Thinking: Unlocking the Reasoning Potential of LLMs in Continuous Concept Space
arXiv 2025
Think-RM: Enabling Long-Horizon Reasoning in Generative Reward Models
arXiv 2025
CodeHalu: Investigating Code Hallucinations in LLMs via Execution-based Verification
arXiv 2024
CodeJudge-Eval: Can Large Language Models be Good Judges in Code Understanding?
arXiv 2024
CodeScope: An Execution-based Multilingual Multitask Multidimensional Benchmark for Evaluating LLMs on Code Understanding and Generation
arXiv 2023
CodeTransOcean: A Comprehensive Multilingual Benchmark for Code Translation
arXiv 2023
Affiliations
No known affiliations.
Frequent co-authors
10 from 7 papers