Chenyan Xiong

Papers
26

Attribution

Affiliations & profile
Semantic Scholar

Authored papers

26

SkillLearnBench: Benchmarking Continual Learning Methods for Agent Skill Generation on Real-World Tasks

arXiv 2026

2026

Auto Research with Specialist Agents Develops Effective and Non-Trivial Training Recipes

arXiv 2026

2026

Benchmark Test-Time Scaling of General LLM Agents

arXiv 2026

2026

Craw4LLM: Efficient Web Crawling for LLM Pretraining

arXiv 2025

2025

Data-Efficient Pretraining with Group-Level Data Influence Modeling

arXiv 2025

2025

FLAME-MoE: A Transparent End-to-End Research Platform for Mixture-of-Experts Language Models

arXiv 2025

2025

What Generative Search Engines Like and How to Optimize Web Content Cooperatively

arXiv 2025

2025

RePro: Training Language Models to Faithfully Recycle the Web for Pretraining

arXiv 2025

2025

PIP-KAG: Mitigating Knowledge Conflicts in Knowledge-Augmented Generation via Parametric Pruning

arXiv 2025

2025

MS MARCO Web Search: a Large-scale Information-rich Web Dataset with Millions of Real Click Labels

arXiv 2024

2024

Cleaner Pretraining Corpus Curation with Neural Web Scraping

arXiv 2024

2024

Say More with Less: Understanding Prompt Learning Behaviors through Gist Compression

arXiv 2024

2024

Dwell in the Beginning: How Language Models Embed Long Documents for Dense Retrieval

arXiv 2024

2024

MATES: Model-Aware Data Selection for Efficient Pretraining with Data Influence Models

arXiv 2024

2024

Montessori-Instruct: Generate Influential Training Data Tailored for Student Learning

arXiv 2024

2024

Augmentation-Adapted Retriever Improves Generalization of Language Models as Generic Plug-In

arXiv 2023

2023

Fusion-in-T5: Unifying Document Ranking Signals for Improved Information Retrieval

arXiv 2023

2023

An In-depth Look at Gemini's Language Abilities

arXiv 2023

2023

Model-Generated Pretraining Signals Improves Zero-Shot Generalization of Text-to-Text Transformers

arXiv 2023

2023

Reduce Catastrophic Forgetting of Dense Retrieval Training with Teleportation Negatives

arXiv 2022

2022

COCO-DR: Combating Distribution Shifts in Zero-Shot Dense Retrieval with Contrastive and Distributionally Robust Learning

arXiv 2022

2022

COCO-LM: Correcting and Contrasting Text Sequences for Language Model Pretraining

NeurIPS 2021 12

2021

Less is More: Pre-train a Strong Text Encoder for Dense Retrieval Using a Weak Decoder

arXiv 2021

2021

Improving Query Representations for Dense Retrieval with Pseudo Relevance Feedback

arXiv 2021

2021

Selective Weak Supervision for Neural Information Retrieval

arXiv 2020

2020

Approximate Nearest Neighbor Negative Contrastive Learning for Dense Text Retrieval

ICLR 2021 1

2020

Affiliations

No known affiliations.

Frequent co-authors

10

from 26 papers