Mike Lewis

Papers
25

Attribution

Affiliations & profile: Semantic Scholar

Authored papers

25

FlexOlmo: Open Language Models for Flexible Data Use

arXiv 2025

2025

Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models

arXiv 2024

2024

Law of the Weakest Link: Cross Capabilities of Large Language Models

arXiv 2024

2024

Byte Latent Transformer: Patches Scale Better Than Tokens

arXiv 2024

2024

Efficient Streaming Language Models with Attention Sinks

arXiv 2023

2023

FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation

arXiv 2023

2023

Scaling Expert Language Models with Unsupervised Domain Discovery

arXiv 2023

2023

In-context Pretraining: Language Modeling Beyond Document Boundaries

arXiv 2023

2023

LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale

arXiv 2022

2022

Measuring and Narrowing the Compositionality Gap in Language Models

arXiv 2022

2022

InCoder: A Generative Model for Code Infilling and Synthesis

arXiv 2022

2022

Coder Reviewer Reranking for Code Generation

arXiv 2022

2022

Contrastive Decoding: Open-ended Text Generation as Optimization

arXiv 2022

2022

Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?

arXiv 2022

2022

Nonparametric Masked Language Modeling

arXiv 2022

2022

Improving Passage Retrieval with Zero-Shot Question Generation

arXiv 2022

2022

Branch-Train-Merge: Embarrassingly Parallel Training of Expert Language Models

arXiv 2022

2022

Questions Are All You Need to Train a Dense Passage Retriever

arXiv 2022

2022

Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation

2021

8-bit Optimizers via Block-wise Quantization

2021

MetaICL: Learning to Learn In Context

NAACL 2022

2021

DEMix Layers: Disentangling Domains for Modular Language Modeling

NAACL 2022

2021

Multilingual Denoising Pre-training for Neural Machine Translation

arXiv 2020

2020

Shortformer: Better Language Modeling using Shorter Inputs

ACL 2021

2020

Deal or No Deal? End-to-End Learning for Negotiation Dialogues

arXiv 2017

2017

Affiliations

No known affiliations.

Frequent co-authors

10

from 25 papers