Chenyan Xiong
- Papers: 26
Authored papers (26)
SkillLearnBench: Benchmarking Continual Learning Methods for Agent Skill Generation on Real-World Tasks
arXiv 2026
Auto Research with Specialist Agents Develops Effective and Non-Trivial Training Recipes
arXiv 2026
Benchmark Test-Time Scaling of General LLM Agents
arXiv 2026
Craw4LLM: Efficient Web Crawling for LLM Pretraining
arXiv 2025
Data-Efficient Pretraining with Group-Level Data Influence Modeling
arXiv 2025
FLAME-MoE: A Transparent End-to-End Research Platform for Mixture-of-Experts Language Models
arXiv 2025
What Generative Search Engines Like and How to Optimize Web Content Cooperatively
arXiv 2025
RePro: Training Language Models to Faithfully Recycle the Web for Pretraining
arXiv 2025
PIP-KAG: Mitigating Knowledge Conflicts in Knowledge-Augmented Generation via Parametric Pruning
arXiv 2025
MS MARCO Web Search: a Large-scale Information-rich Web Dataset with Millions of Real Click Labels
arXiv 2024
Cleaner Pretraining Corpus Curation with Neural Web Scraping
arXiv 2024
Say More with Less: Understanding Prompt Learning Behaviors through Gist Compression
arXiv 2024
Dwell in the Beginning: How Language Models Embed Long Documents for Dense Retrieval
arXiv 2024
MATES: Model-Aware Data Selection for Efficient Pretraining with Data Influence Models
arXiv 2024
Montessori-Instruct: Generate Influential Training Data Tailored for Student Learning
arXiv 2024
Augmentation-Adapted Retriever Improves Generalization of Language Models as Generic Plug-In
arXiv 2023
Fusion-in-T5: Unifying Document Ranking Signals for Improved Information Retrieval
arXiv 2023
An In-depth Look at Gemini's Language Abilities
arXiv 2023
Model-Generated Pretraining Signals Improves Zero-Shot Generalization of Text-to-Text Transformers
arXiv 2023
Reduce Catastrophic Forgetting of Dense Retrieval Training with Teleportation Negatives
arXiv 2022
COCO-DR: Combating Distribution Shifts in Zero-Shot Dense Retrieval with Contrastive and Distributionally Robust Learning
arXiv 2022
COCO-LM: Correcting and Contrasting Text Sequences for Language Model Pretraining
NeurIPS 2021
Less is More: Pre-train a Strong Text Encoder for Dense Retrieval Using a Weak Decoder
arXiv 2021
Improving Query Representations for Dense Retrieval with Pseudo Relevance Feedback
arXiv 2021
Selective Weak Supervision for Neural Information Retrieval
arXiv 2020
Approximate Nearest Neighbor Negative Contrastive Learning for Dense Text Retrieval
ICLR 2021
Frequent co-authors
10 from 26 papers