Chenyan Xiong

Papers: 26

Attribution

Affiliations & profile: Semantic Scholar

Authored papers

26

SkillLearnBench: Benchmarking Continual Learning Methods for Agent Skill Generation on Real-World Tasks

arXiv 2026

Auto Research with Specialist Agents Develops Effective and Non-Trivial Training Recipes

arXiv 2026

Benchmark Test-Time Scaling of General LLM Agents

arXiv 2026

Craw4LLM: Efficient Web Crawling for LLM Pretraining

arXiv 2025

Data-Efficient Pretraining with Group-Level Data Influence Modeling

arXiv 2025

FLAME-MoE: A Transparent End-to-End Research Platform for Mixture-of-Experts Language Models

arXiv 2025

What Generative Search Engines Like and How to Optimize Web Content Cooperatively

arXiv 2025

RePro: Training Language Models to Faithfully Recycle the Web for Pretraining

arXiv 2025

ParamMute: Suppressing Knowledge-Critical FFNs for Faithful Retrieval-Augmented Generation

arXiv 2025

MS MARCO Web Search: a Large-scale Information-rich Web Dataset with Millions of Real Click Labels

arXiv 2024

Cleaner Pretraining Corpus Curation with Neural Web Scraping

arXiv 2024

Dwell in the Beginning: How Language Models Embed Long Documents for Dense Retrieval

arXiv 2024

MATES: Model-Aware Data Selection for Efficient Pretraining with Data Influence Models

arXiv 2024

Montessori-Instruct: Generate Influential Training Data Tailored for Student Learning

arXiv 2024

Say More with Less: Understanding Prompt Learning Behaviors through Gist Compression

arXiv 2024

An In-depth Look at Gemini's Language Abilities

arXiv 2023

Fusion-in-T5: Unifying Document Ranking Signals for Improved Information Retrieval

arXiv 2023

Augmentation-Adapted Retriever Improves Generalization of Language Models as Generic Plug-In

arXiv 2023

Model-Generated Pretraining Signals Improves Zero-Shot Generalization of Text-to-Text Transformers

arXiv 2023

COCO-DR: Combating Distribution Shifts in Zero-Shot Dense Retrieval with Contrastive and Distributionally Robust Learning

arXiv 2022

Reduce Catastrophic Forgetting of Dense Retrieval Training with Teleportation Negatives

arXiv 2022

Improving Query Representations for Dense Retrieval with Pseudo Relevance Feedback

arXiv 2021

COCO-LM: Correcting and Contrasting Text Sequences for Language Model Pretraining

NeurIPS 2021 12

Less is More: Pre-train a Strong Text Encoder for Dense Retrieval Using a Weak Decoder

arXiv 2021

Approximate Nearest Neighbor Negative Contrastive Learning for Dense Text Retrieval

ICLR 2021 1

Selective Weak Supervision for Neural Information Retrieval

arXiv 2020

Affiliations

No known affiliations.

Frequent co-authors

10

from 26 papers

Zhiyuan Liu

professor

7 shared papers

Zichun Yu

7 shared papers

Arnold Overwijk

5 shared papers

Zhenghao Liu

5 shared papers

Paul Bennett

4 shared papers

Shi Yu

4 shared papers

Ge Yu

3 shared papers

Hao Kang

3 shared papers

Xiaochuan Li

researcher

3 shared papers

Yukun Yan

3 shared papers