Wangchunshu Zhou

Papers
49

Authored papers

49

EcoGym: Evaluating LLMs for Long-Horizon Plan-and-Execute in Interactive Economies

arXiv 2026

2026

Search More, Think Less: Rethinking Long-Horizon Agentic Search for Efficiency and Generalization

arXiv 2026

2026

MemEvolve: Meta-Evolution of Agent Memory Systems

arXiv 2025

2025

TaskCraft: Automated Generation of Agentic Tasks

arXiv 2025

2025

A Comprehensive Survey on Long Context Language Modeling

arXiv 2025

2025

CodeCriticBench: A Holistic Code Critique Benchmark for Large Language Models

arXiv 2025

2025

A Survey on Latent Reasoning

arXiv 2025

2025

How Far Are We from Genuinely Useful Deep Research Agents?

arXiv 2025

2025

Agent KB: Leveraging Cross-Domain Experience for Agentic Problem Solving

arXiv 2025

2025

Efficient Agents: Building Effective Agents While Reducing Cost

arXiv 2025

2025

MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents

arXiv 2025

2025

Flash-Searcher: Fast and Effective Web Agents via DAG-Based Parallel Execution

arXiv 2025

2025

A^2FM: An Adaptive Agent Foundation Model for Tool-Aware Hybrid Reasoning

arXiv 2025

2025

Reverse-Engineered Reasoning for Open-Ended Generation

arXiv 2025

2025

M+: Extending MemoryLLM with Scalable Long-Term Memory

arXiv 2025

2025

ChemAgent: Self-updating Library in Large Language Models Improves Chemical Reasoning

arXiv 2025

2025

COIG-P: A High-Quality and Large-Scale Chinese Preference Dataset for Alignment with Human Values

arXiv 2025

2025

Towards Personalized Deep Research: Benchmarks and Evaluations

arXiv 2025

2025

Towards Faithful and Controllable Personalization via Critique-Post-Edit Reinforcement Learning

arXiv 2025

2025

Beyond Correctness: Evaluating Subjective Writing Preferences Across Cultures

arXiv 2025

2025

VeriGUI: Verifiable Long-Chain GUI Dataset

arXiv 2025

2025

OS Agents: A Survey on MLLM-based Agents for General Computing Devices Use

arXiv 2025

2025

Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL

arXiv 2025

2025

TreePO: Bridging the Gap of Policy Optimization and Efficacy and Inference Efficiency with Heuristic Tree-based Modeling

arXiv 2025

2025

OThink-R1: Intrinsic Fast/Slow Thinking Mode Switching for Over-Reasoning Mitigation

arXiv 2025

2025

COIG-Writer: A High-Quality Dataset for Chinese Creative Writing with Thought Processes

arXiv 2025

2025

KORGym: A Dynamic Game Platform for LLM Reasoning Evaluation

arXiv 2025

2025

ScaleLong: A Multi-Timescale Benchmark for Long Video Understanding

arXiv 2025

2025

Symbolic Learning Enables Self-Evolving Agents

arXiv 2024

2024

MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series

arXiv 2024

2024

AI PERSONA: Towards Life-long Personalization of LLMs

arXiv 2024

2024

OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models

arXiv 2024

2024

A Comparative Study on Reasoning Patterns of OpenAI's o1 Model

arXiv 2024

2024

MIO: A Foundation Model on Multimodal Tokens

arXiv 2024

2024

AutoKaggle: A Multi-Agent Framework for Autonomous Data Science Competitions

arXiv 2024

2024

AutoAct: Automatic Agent Learning from Scratch for QA via Self-Planning

arXiv 2024

2024

HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models

arXiv 2024

2024

ML-Bench: Evaluating Large Language Models and Agents for Machine Learning Tasks on Repository-Level Code

arXiv 2023

2023

RecurrentGPT: Interactive Generation of (Arbitrarily) Long Text

arXiv 2023

2023

How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs

arXiv 2023

2023

Let's Synthesize Step by Step: Iterative Dataset Synthesis with Large Language Models by Extrapolating Errors from Small Models

arXiv 2023

2023

RoleLLM: Benchmarking, Eliciting, and Enhancing Role-Playing Abilities of Large Language Models

arXiv 2023

2023

Struc-Bench: Are Large Language Models Really Good at Generating Complex Structured Data?

arXiv 2023

2023

Towards a Mechanistic Interpretation of Multi-Step Reasoning Capabilities of Language Models

arXiv 2023

2023

Mixup Your Own Pairs

arXiv 2023

2023

Evaluating Large Language Models on Controlled Generation Tasks

arXiv 2023

2023

Improving Sequence-to-Sequence Pre-training via Sequence Span Rewriting

EMNLP 2021 11

2021

BERT-of-Theseus: Compressing BERT by Progressive Module Replacing

EMNLP 2020 11

2020

BERT Loses Patience: Fast and Robust Inference with Early Exit

NeurIPS 2020 12

2020

Affiliations

No known affiliations.

Frequent co-authors

10

from 49 papers