Jimmy Lin

Papers
43

Authored papers

Beyond Semantic Similarity: Rethinking Retrieval for Agentic Search via Direct Corpus Interaction

arXiv 2026

NanoKnow: How to Know What Your Language Model Knows

arXiv 2026

Rethinking Agentic Search with Pi-Serini: Is Lexical Retrieval Sufficient?

arXiv 2026

Tevatron 2.0: Unified Document Retrieval Toolkit across Scale, Language, and Modality

arXiv 2025

Rank-R1: Enhancing Reasoning in LLM-based Document Rerankers via Reinforcement Learning

arXiv 2025

DRAMA: Diverse Augmentation from Large Language Models to Smaller Dense Retrievers

arXiv 2025

Fixing Data That Hurts Performance: Cascading LLMs to Relabel Hard Negatives for Robust Information Retrieval

arXiv 2025

Document Screenshot Retrievers are Vulnerable to Pixel Poisoning Attacks

arXiv 2025

Conventional Contrastive Learning Often Falls Short: Improving Dense Retrieval with Cross-Encoder Listwise Distillation and Synthetic Data

arXiv 2025

Teaching Dense Retrieval Models to Specialize with Listwise Distillation and LLM Data Augmentation

arXiv 2025

BrowseComp-Plus: A More Fair and Transparent Evaluation Benchmark of Deep-Research Agent

arXiv 2025

Chatbot Arena Meets Nuggets: Towards Explanations and Diagnostics in the Evaluation of LLM Responses

arXiv 2025

CURE: A dataset for Clinical Understanding & Retrieval Evaluation

arXiv 2024

Ragnarök: A Reusable RAG Framework and Baselines for TREC 2024 Retrieval-Augmented Generation Track

arXiv 2024

Nearest Neighbor Speculative Decoding for LLM Generation and Attribution

arXiv 2024

UniRAG: Universal Retrieval Augmentation for Large Vision Language Models

arXiv 2024

MIRAGE-Bench: Automatic Multilingual Benchmark Arena for Retrieval-Augmented Generation Systems

arXiv 2024

PromptReps: Prompting Large Language Models to Generate Dense and Sparse Representations for Zero-Shot Document Retrieval

arXiv 2024

Resources for Brewing BEIR: Reproducible Reference Models and an Official Leaderboard

arXiv 2023

SLIM: Sparsified Late Interaction for Multi-Vector Retrieval with Inverted Indexes

arXiv 2023

RankZephyr: Effective and Robust Zero-Shot Listwise Reranking is a Breeze!

arXiv 2023

Leveraging LLMs for Synthesizing Training Data Across Many Languages in Multilingual Dense Retrieval

arXiv 2023

"Knowing When You Don't Know": A Multilingual Relevance Assessment Dataset for Robust Retrieval-Augmented Generation

arXiv 2023

GAIA Search: Hugging Face and Pyserini Interoperability for NLP Training Data Exploration

arXiv 2023

Found in the Middle: Permutation Self-Consistency Improves Listwise Ranking in Large Language Models

arXiv 2023

What Do Llamas Really Think? Revealing Preference Biases in Language Model Representations

arXiv 2023

HAGRID: A Human-LLM Collaborative Dataset for Generative Information-Seeking with Attribution

arXiv 2023

Spacerini: Plug-and-play Search Engines with Pyserini and Hugging Face

arXiv 2023

What the DAAM: Interpreting Stable Diffusion Using Cross Attention

arXiv 2022

Precise Zero-Shot Dense Retrieval without Relevance Labels

arXiv 2022

Aggretriever: A Simple Approach to Aggregate Textual Representations for Robust Dense Passage Retrieval

arXiv 2022

Making a MIRACL: Multilingual Information Retrieval Across a Continuum of Languages

arXiv 2022

Efficiently Teaching an Effective Dense Retriever with Balanced Topic Aware Sampling

arXiv 2021

Improving Query Representations for Dense Retrieval with Pseudo Relevance Feedback: A Reproducibility Study

arXiv 2021

Mr. TyDi: A Multi-lingual Benchmark for Dense Retrieval

EMNLP (MRL) 2021 11

DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference

2020

Howl: A Deployed, Open-Source Wake Word Detection System

EMNLP (NLPOSS) 2020 11

The Archives Unleashed Project: Technology, Process, and Community to Improve Scholarly Access to Web Archives

arXiv 2020

Showing Your Work Doesn't Always Work

2020

Inserting Information Bottlenecks for Attribution in Transformers

Findings of the Association for Computational Linguistics 2020

DocBERT: BERT for Document Classification

arXiv 2019

End-to-End Open-Domain Question Answering with BERTserini

2019

Deep Residual Learning for Small-Footprint Keyword Spotting

arXiv 2017

Affiliations

No known affiliations.

Frequent co-authors

10

from 43 papers