Joseph E. Gonzalez
Papers (36)
Authored papers
K-Search: LLM Kernel Generation via Co-Evolving Intrinsic World Model
arXiv 2026
FrontierSmith: Synthesizing Open-Ended Coding Problems at Scale
arXiv 2026
VisGym: Diverse, Customizable, Scalable Environments for Multimodal Agents
arXiv 2026
Why Do Multi-Agent LLM Systems Fail?
arXiv 2025
Adaptive Semantic Prompt Caching with VectorQ
arXiv 2025
Generate, but Verify: Reducing Hallucination in Vision-Language Models with Retrospective Resampling
arXiv 2025
Search Arena: Analyzing Search-Augmented LLMs
arXiv 2025
Sleep-time Compute: Beyond Inference Scaling at Test-time
arXiv 2025
TurboDiffusion: Accelerating Video Diffusion Models by 100-200 Times
arXiv 2025
FrontierCS: Evolving Challenges for Evolving Intelligence
arXiv 2025
SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse-Linear Attention
arXiv 2025
ParaStudent: Generating and Evaluating Realistic Student Code by Teaching LLMs to Struggle
arXiv 2025
S*: Test Time Scaling for Code Generation
arXiv 2025
The Danger of Overthinking: Examining the Reasoning-Action Dilemma in Agentic Tasks
arXiv 2025
RouteLLM: Learning to Route LLMs with Preference Data
arXiv 2024
From Crowdsourced Data to High-Quality Benchmarks: Arena-Hard and BenchBuilder Pipeline
preprint
Post-Training Sparse Attention with Double Sparsity
arXiv 2024
GoEX: Perspectives and Designs Towards a Runtime for Autonomous LLM Applications
arXiv 2024
How to Evaluate Reward Models for RLHF
arXiv 2024
Visual Haystacks: A Vision-Centric Needle-In-A-Haystack Benchmark
arXiv 2024
CLAIR-A: Leveraging Large Language Models to Judge Audio Captions
arXiv 2024
VibeCheck: Discover and Quantify Qualitative Differences in Large Language Models
arXiv 2024
Text2SQL is Not Enough: Unifying AI and Databases with TAG
arXiv 2024
Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models
arXiv 2024
SuperCorrect: Supervising and Correcting Language Models with Error-Driven Insights
arXiv 2024
LLoCO: Learning Long Contexts Offline
arXiv 2024
FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU
arXiv 2023
SGLang: Efficient Execution of Structured Language Model Programs
arXiv 2023
Efficient Memory Management for Large Language Model Serving with PagedAttention
arXiv 2023
S-LoRA: Serving Thousands of Concurrent LoRA Adapters
arXiv 2023
Rethinking Benchmark and Contamination for Language Models with Rephrased Samples
arXiv 2023
DISTFLASHATTN: Distributed Memory-efficient Attention for Long-context LLMs Training
arXiv 2023
The Wisdom of Hindsight Makes Language Models Better Instruction Followers
arXiv 2023
Describing Differences in Image Sets with Natural Language
CVPR 2024
Multitask Vision-Language Prompt Tuning
arXiv 2022
SkipNet: Learning Dynamic Routing in Convolutional Networks
Frequent co-authors
Top 10, from 36 papers
Ion Stoica
professor / co-founder
Trevor Darrell
professor
Lianmin Zheng
grad-student
Tianjun Zhang
researcher
Kurt Keutzer
Shiyi Cao
Wei-Lin Chiang
co-founder / president
Ying Sheng
researcher
Dacheng Li
grad-student
David M. Chan