Joseph E. Gonzalez

S*: Test Time Scaling for Code Generation

arXiv 2025

Generate, but Verify: Reducing Hallucination in Vision-Language Models with Retrospective Resampling

arXiv 2025

Search Arena: Analyzing Search-Augmented LLMs

arXiv 2025

Sleep-time Compute: Beyond Inference Scaling at Test-time

arXiv 2025

TurboDiffusion: Accelerating Video Diffusion Models by 100-200 Times

arXiv 2025

FrontierCS: Evolving Challenges for Evolving Intelligence

arXiv 2025

ParaStudent: Generating and Evaluating Realistic Student Code by Teaching LLMs to Struggle

arXiv 2025

The Danger of Overthinking: Examining the Reasoning-Action Dilemma in Agentic Tasks

arXiv 2025

Adaptive Semantic Prompt Caching with VectorQ

arXiv 2025

SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse-Linear Attention

arXiv 2025

RouteLLM: Learning to Route LLMs with Preference Data

arXiv 2024

From Crowdsourced Data to High-Quality Benchmarks: Arena-Hard and BenchBuilder Pipeline

preprint

VibeCheck: Discover and Quantify Qualitative Differences in Large Language Models

arXiv 2024

Post-Training Sparse Attention with Double Sparsity

arXiv 2024

GoEX: Perspectives and Designs Towards a Runtime for Autonomous LLM Applications

arXiv 2024

Text2SQL is Not Enough: Unifying AI and Databases with TAG

arXiv 2024

Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models

arXiv 2024

SuperCorrect: Supervising and Correcting Language Models with Error-Driven Insights

arXiv 2024

LLoCO: Learning Long Contexts Offline

arXiv 2024

How to Evaluate Reward Models for RLHF

arXiv 2024

Visual Haystacks: A Vision-Centric Needle-In-A-Haystack Benchmark

arXiv 2024

CLAIR-A: Leveraging Large Language Models to Judge Audio Captions

arXiv 2024

SGLang: Efficient Execution of Structured Language Model Programs

arXiv 2023

Efficient Memory Management for Large Language Model Serving with PagedAttention

arXiv 2023

S-LoRA: Serving Thousands of Concurrent LoRA Adapters

arXiv 2023

Rethinking Benchmark and Contamination for Language Models with Rephrased Samples

arXiv 2023

DISTFLASHATTN: Distributed Memory-efficient Attention for Long-context LLMs Training

arXiv 2023

The Wisdom of Hindsight Makes Language Models Better Instruction Followers

arXiv 2023

Describing Differences in Image Sets with Natural Language

CVPR 2024

FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU

arXiv 2023

Multitask Vision-Language Prompt Tuning

arXiv 2022

SkipNet: Learning Dynamic Routing in Convolutional Networks

2017

Affiliations

No known affiliations.

Frequent co-authors

from 36 papers

Ion Stoica

professor / co-founder

20 shared papers

Trevor Darrell

professor

8 shared papers

Lianmin Zheng

grad-student

6 shared papers

Tianjun Zhang

researcher

6 shared papers

Kurt Keutzer

Shiyi Cao

Wei-Lin Chiang

co-founder / president

Ying Sheng

researcher