
Beidi Chen

Papers
36


Authored papers


The Last Human-Written Paper: Agent-Native Research Artifacts

arXiv 2026

2026

AstraFlow: Dataflow-Oriented Reinforcement Learning for Agentic LLMs

arXiv 2026

2026

STEM: Scaling Transformers with Embedding Modules

arXiv 2026

2026

Kinetics: Rethinking Test-Time Scaling Laws

arXiv 2025

2025

Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation

arXiv 2025

2025

HeadInfer: Memory-Efficient LLM Inference by Head-wise Offloading

arXiv 2025

2025

Speculative Prefill: Turbocharging TTFT with Lightweight and Training-Free Token Importance Estimation

arXiv 2025

2025

APE: Faster and Longer Context-Augmented Generation via Adaptive Parallel Encoding

arXiv 2025

2025

Sequoia: Scalable, Robust, and Hardware-aware Speculative Decoding

arXiv 2024

2024

GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection

arXiv 2024

2024

LLM Inference Unveiled: Survey and Roofline Model Insights

arXiv 2024

2024

Memory Mosaics

arXiv 2024

2024

KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache

arXiv 2024

2024

TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding

arXiv 2024

2024

MagicPIG: LSH Sampling for Efficient LLM Generation

arXiv 2024

2024

LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding

arXiv 2024

2024

MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding

arXiv 2024

2024

SpecExec: Massively Parallel Speculative Decoding for Interactive LLM Inference on Consumer Devices

arXiv 2024

2024

Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding

arXiv 2024

2024

Mini-Sequence Transformer: Optimizing Intermediate Memory for Long Sequences Training

arXiv 2024

2024

Get More with LESS: Synthesizing Recurrence with KV Cache Compression for Efficient LLM Inference

arXiv 2024

2024

Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length

arXiv 2024

2024

On the Surprising Effectiveness of Attention Transfer for Vision Transformers

arXiv 2024

2024

Sirius: Contextual Sparsity with Correction for Efficient LLMs

arXiv 2024

2024

LoCoCo: Dropping In Convolutions for Long Context Compression

arXiv 2024

2024

Model-GLUE: Democratized LLM Scaling for A Large Model Zoo in the Wild

arXiv 2024

2024

Nearest Neighbor Speculative Decoding for LLM Generation and Attribution

arXiv 2024

2024

It Takes Two: On the Seamlessness between Reward and Policy Model in RLHF

arXiv 2024

2024

ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference

arXiv 2024

2024

Efficient Streaming Language Models with Attention Sinks

arXiv 2023

2023

H$_2$O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models

arXiv 2023

2023

Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time

arXiv 2023

2023

FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU

arXiv 2023

2023

Monarch: Expressive Structured Matrices for Efficient and Accurate Training

arXiv 2022

2022

Pixelated Butterfly: Simple and Efficient Sparse training for Neural Network Models


2021

Scatterbrain: Unifying Sparse and Low-rank Attention Approximation


2021

Affiliations

No known affiliations.

Frequent co-authors: 10 (from 36 papers)