Wen-tau Yih

Authored papers

29

Anchored Decoding: Provably Reducing Copyright Risk for Any Language Model

arXiv 2026

2026

ReasonIR: Training Retrievers for Reasoning Tasks

arXiv 2025

2025

SelfCite: Self-Supervised Alignment for Context Attribution in Large Language Models

arXiv 2025

2025

Data-Efficient Pretraining with Group-Level Data Influence Modeling

arXiv 2025

2025

DR Tulu: Reinforcement Learning with Evolving Rubrics for Deep Research

arXiv 2025

2025

Meta CLIP 2: A Worldwide Scaling Recipe

arXiv 2025

2025

FlexOlmo: Open Language Models for Flexible Data Use

arXiv 2025

2025

DRAMA: Diverse Augmentation from Large Language Models to Smaller Dense Retrievers

arXiv 2025

2025

Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models

arXiv 2024

2024

OpenScholar: Synthesizing Scientific Literature with Retrieval-augmented LMs

arXiv 2024

2024

CRAG -- Comprehensive RAG Benchmark

arXiv 2024

2024

Memory Layers at Scale

arXiv 2024

2024

Instruction-tuned Language Models are Better Knowledge Learners

arXiv 2024

2024

Nearest Neighbor Speculative Decoding for LLM Generation and Attribution

arXiv 2024

2024

FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation

arXiv 2023

2023

LEVER: Learning to Verify Language-to-Code Generation with Execution

arXiv 2023

2023

Expand, Rerank, and Retrieve: Query Reranking for Open-Domain Question Answering

arXiv 2023

2023

Autoregressive Search Engines: Generating Substrings as Document Identifiers

arXiv 2022

2022

One Embedder, Any Task: Instruction-Finetuned Text Embeddings

arXiv 2022

2022

InCoder: A Generative Model for Code Infilling and Synthesis

arXiv 2022

2022

DiffCSE: Difference-based Contrastive Learning for Sentence Embeddings

NAACL 2022

2022

Nonparametric Masked Language Modeling

arXiv 2022

2022

Task-aware Retrieval with Instructions

arXiv 2022

2022

Improving Passage Retrieval with Zero-Shot Question Generation

arXiv 2022

2022

Coder Reviewer Reranking for Code Generation

arXiv 2022

2022

The Web Is Your Oyster -- Knowledge-Intensive NLP against a Very Large Web Corpus

arXiv 2021

2021

Dense Passage Retrieval for Open-Domain Question Answering

EMNLP 2020

2020

TaBERT: Pretraining for Joint Understanding of Textual and Tabular Data

arXiv 2020

2020

Answering Complex Open-Domain Questions with Multi-Hop Dense Retrieval

ICLR 2021

2020

Affiliations

No known affiliations.

Frequent co-authors

10

from 29 papers