Wangchunshu Zhou
- Papers: 49
Authored papers (49)
EcoGym: Evaluating LLMs for Long-Horizon Plan-and-Execute in Interactive Economies
arXiv 2026
Search More, Think Less: Rethinking Long-Horizon Agentic Search for Efficiency and Generalization
arXiv 2026
MemEvolve: Meta-Evolution of Agent Memory Systems
arXiv 2025
TaskCraft: Automated Generation of Agentic Tasks
arXiv 2025
A Comprehensive Survey on Long Context Language Modeling
arXiv 2025
CodeCriticBench: A Holistic Code Critique Benchmark for Large Language Models
arXiv 2025
A Survey on Latent Reasoning
arXiv 2025
How Far Are We from Genuinely Useful Deep Research Agents?
arXiv 2025
Agent KB: Leveraging Cross-Domain Experience for Agentic Problem Solving
arXiv 2025
Efficient Agents: Building Effective Agents While Reducing Cost
arXiv 2025
MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents
arXiv 2025
Flash-Searcher: Fast and Effective Web Agents via DAG-Based Parallel Execution
arXiv 2025
A^2FM: An Adaptive Agent Foundation Model for Tool-Aware Hybrid Reasoning
arXiv 2025
Reverse-Engineered Reasoning for Open-Ended Generation
arXiv 2025
M+: Extending MemoryLLM with Scalable Long-Term Memory
arXiv 2025
ChemAgent: Self-updating Library in Large Language Models Improves Chemical Reasoning
arXiv 2025
COIG-P: A High-Quality and Large-Scale Chinese Preference Dataset for Alignment with Human Values
arXiv 2025
Towards Personalized Deep Research: Benchmarks and Evaluations
arXiv 2025
Towards Faithful and Controllable Personalization via Critique-Post-Edit Reinforcement Learning
arXiv 2025
Beyond Correctness: Evaluating Subjective Writing Preferences Across Cultures
arXiv 2025
VeriGUI: Verifiable Long-Chain GUI Dataset
arXiv 2025
OS Agents: A Survey on MLLM-based Agents for General Computing Devices Use
arXiv 2025
Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL
arXiv 2025
TreePO: Bridging the Gap of Policy Optimization and Efficacy and Inference Efficiency with Heuristic Tree-based Modeling
arXiv 2025
OThink-R1: Intrinsic Fast/Slow Thinking Mode Switching for Over-Reasoning Mitigation
arXiv 2025
COIG-Writer: A High-Quality Dataset for Chinese Creative Writing with Thought Processes
arXiv 2025
KORGym: A Dynamic Game Platform for LLM Reasoning Evaluation
arXiv 2025
ScaleLong: A Multi-Timescale Benchmark for Long Video Understanding
arXiv 2025
Symbolic Learning Enables Self-Evolving Agents
arXiv 2024
MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series
arXiv 2024
AI PERSONA: Towards Life-long Personalization of LLMs
arXiv 2024
OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models
arXiv 2024
A Comparative Study on Reasoning Patterns of OpenAI's o1 Model
arXiv 2024
MIO: A Foundation Model on Multimodal Tokens
arXiv 2024
AutoKaggle: A Multi-Agent Framework for Autonomous Data Science Competitions
arXiv 2024
AutoAct: Automatic Agent Learning from Scratch for QA via Self-Planning
arXiv 2024
HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models
arXiv 2024
ML-Bench: Evaluating Large Language Models and Agents for Machine Learning Tasks on Repository-Level Code
arXiv 2023
RecurrentGPT: Interactive Generation of (Arbitrarily) Long Text
arXiv 2023
How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs
arXiv 2023
Let's Synthesize Step by Step: Iterative Dataset Synthesis with Large Language Models by Extrapolating Errors from Small Models
arXiv 2023
RoleLLM: Benchmarking, Eliciting, and Enhancing Role-Playing Abilities of Large Language Models
arXiv 2023
Struc-Bench: Are Large Language Models Really Good at Generating Complex Structured Data?
arXiv 2023
Towards a Mechanistic Interpretation of Multi-Step Reasoning Capabilities of Language Models
arXiv 2023
Mixup Your Own Pairs
arXiv 2023
Evaluating Large Language Models on Controlled Generation Tasks
arXiv 2023
Improving Sequence-to-Sequence Pre-training via Sequence Span Rewriting
EMNLP 2021 11
BERT-of-Theseus: Compressing BERT by Progressive Module Replacing
EMNLP 2020 11
BERT Loses Patience: Fast and Robust Inference with Early Exit
NeurIPS 2020 12