Ion Stoica

UC Berkeley CS professor; co-founder of Databricks, Anyscale, and LMSYS / LMArena; advisor on the academic side of the Arena infrastructure.

Role: professor / co-founder
Currently at: University of California, Berkeley
Twitter: twitter.com/istoica05
GitHub: github.com/istoica
Scholar: scholar.google.com/citations
Papers: 45

Attribution

Affiliations & profile: scholar.google.com/citations

45 papers · 1 eval contribution

Authored papers

45

K-Search: LLM Kernel Generation via Co-Evolving Intrinsic World Model

arXiv 2026

Flash-KMeans: Fast and Memory-Efficient Exact K-Means

arXiv 2026

ClawEnvKit: Automatic Environment Generation for Claw-Like Agents

arXiv 2026

AstraFlow: Dataflow-Oriented Reinforcement Learning for Agentic LLMs

arXiv 2026

VisGym: Diverse, Customizable, Scalable Environments for Multimodal Agents

arXiv 2026

DeepScholar-Bench: A Live Benchmark and Automated Evaluation for Generative Research Synthesis

arXiv 2025

Fast Video Generation with Sliding Tile Attention

arXiv 2025

lmgame-Bench: How Good are LLMs at Playing Games?

arXiv 2025

Sparse VideoGen2: Accelerate Video Generation with Sparse Attention via Semantic-Aware Permutation

arXiv 2025

Why Do Multi-Agent LLM Systems Fail?

arXiv 2025

S*: Test Time Scaling for Code Generation

arXiv 2025

Prompt-to-Leaderboard

arXiv 2025

Sleep-time Compute: Beyond Inference Scaling at Test-time

arXiv 2025

TurboDiffusion: Accelerating Video Diffusion Models by 100-200 Times

arXiv 2025

FrontierCS: Evolving Challenges for Evolving Intelligence

arXiv 2025

GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning

arXiv 2025

SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse-Linear Attention

arXiv 2025

Radial Attention: $O(n\log n)$ Sparse Attention with Energy Decay for Long Video Generation

arXiv 2025

Optimizing Model Selection for Compound AI Systems

arXiv 2025

GSO: Challenging Software Optimization Tasks for Evaluating SWE-Agents

arXiv 2025

The Danger of Overthinking: Examining the Reasoning-Action Dilemma in Agentic Tasks

arXiv 2025

Efficient Long-context Language Model Training by Core Attention Disaggregation

arXiv 2025

From Crowdsourced Data to High-Quality Benchmarks: Arena-Hard and BenchBuilder Pipeline

preprint

Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference

ICML

LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code

NeurIPS

RouteLLM: Learning to Route LLMs with Preference Data

arXiv 2024

Break the Sequential Dependency of LLM Inference Using Lookahead Decoding

arXiv 2024

GoEX: Perspectives and Designs Towards a Runtime for Autonomous LLM Applications

arXiv 2024

JudgeBench: A Benchmark for Evaluating LLM-based Judges

arXiv 2024

Efficient LLM Scheduling by Learning to Rank

arXiv 2024

How to Evaluate Reward Models for RLHF

arXiv 2024

Mélange: Cost Efficient Large Language Model Serving by Exploiting GPU Heterogeneity

arXiv 2024

Post-Training Sparse Attention with Double Sparsity

arXiv 2024

OR-Bench: An Over-Refusal Benchmark for Large Language Models

arXiv 2024

Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena

NeurIPS

Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90% ChatGPT Quality

blog

SGLang: Efficient Execution of Structured Language Model Programs

arXiv 2023

Efficient Memory Management for Large Language Model Serving with PagedAttention

arXiv 2023

S-LoRA: Serving Thousands of Concurrent LoRA Adapters

arXiv 2023

Online Speculative Decoding

arXiv 2023

FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU

arXiv 2023

Rethinking Benchmark and Contamination for Language Models with Rephrased Samples

arXiv 2023

DISTFLASHATTN: Distributed Memory-efficient Attention for Long-context LLMs Training

arXiv 2023

CLUTR: Curriculum Learning via Unsupervised Task Representation Learning

arXiv 2022

Ray: A Distributed Framework for Emerging AI Applications

arXiv 2017

Eval contributions

1

MT-Bench

LMArena

80 two-turn open-ended questions across 8 categories, graded by GPT-4 as judge to score multi-turn dialogue quality.

Saturated · Multi Turn Dialog · Instruction Following · LLM Judging

Affiliations

Currently at

University of California, Berkeley

professor / co-founder · university lab

Previously

Anyscale (infra) · Databricks (infra)

Frequent co-authors

10

from 45 papers

Joseph E. Gonzalez

20 shared papers

Hao Zhang

professor

11 shared papers

Wei-Lin Chiang

co-founder / President

11 shared papers

Lianmin Zheng

grad-student

9 shared papers

Ying Sheng

researcher

8 shared papers

Shuo Yang

7 shared papers

Dacheng Li

grad-student

6 shared papers

Kurt Keutzer

6 shared papers

Siyuan Zhuang

researcher

5 shared papers

Banghua Zhu

professor

4 shared papers