Cunxiang Wang
- Papers: 16
Authored papers (16)
GLM-5: from Vibe Coding to Agentic Engineering
arXiv 2026
Beyond Literal Mapping: Benchmarking and Improving Non-Literal Translation Evaluation
arXiv 2026
IF-RewardBench: Benchmarking Judge Models for Instruction-Following Evaluation
arXiv 2026
GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models
arXiv 2025
Exploring the Evolution of Physics Cognition in Video Generation: A Survey
arXiv 2025
Deep Research: A Systematic Survey
arXiv 2025
TrustJudge: Inconsistencies of LLM-as-a-Judge and How to Alleviate Them
arXiv 2025
LongSafety: Evaluating Long-Context Safety of Large Language Models
arXiv 2025
RAGChecker: A Fine-grained Framework for Diagnosing Retrieval-Augmented Generation
arXiv 2024
Knowledge Conflicts for LLMs: A Survey
arXiv 2024
SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models
arXiv 2024
NovelQA: Benchmarking Question Answering on Documents Exceeding 200K Tokens
arXiv 2024
A Survey on Evaluation of Large Language Models
arXiv 2023
TRAMS: Training-free Memory Selection for Long-range Language Modeling
arXiv 2023
PandaLM: An Automatic Evaluation Benchmark for LLM Instruction Tuning Optimization
arXiv 2023
Survey on Factuality in Large Language Models: Knowledge, Retrieval and Domain-Specificity
arXiv 2023
Frequent co-authors
10 from 16 papers