Xiangru Tang
- Papers: 40
Authored papers
The Last Human-Written Paper: Agent-Native Research Artifacts
arXiv 2026
EvoClaw: Evaluating AI Agents on Continuous Software Evolution
arXiv 2026
LatentChem: From Textual CoT to Latent Thinking in Chemical Reasoning
arXiv 2026
Agentic Reasoning for Large Language Models
arXiv 2026
Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems
arXiv 2025
LocAgent: Graph-Guided LLM Agents for Code Localization
arXiv 2025
Med-PRM: Medical Reasoning Models with Stepwise, Guideline-verified Process Rewards
arXiv 2025
MedAgentGym: Training LLM Agents for Code-Based Medical Reasoning at Scale
arXiv 2025
Agent KB: Leveraging Cross-Domain Experience for Agentic Problem Solving
arXiv 2025
Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL
arXiv 2025
ChemAgent: Self-updating Library in Large Language Models Improves Chemical Reasoning
arXiv 2025
MMVU: Measuring Expert-Level Multi-Discipline Video Understanding
CVPR 2025
MedAgentsBench: Benchmarking Thinking Models and Agent Frameworks for Complex Medical Reasoning
arXiv 2025
AutoEnv: Automated Environments for Measuring Cross-Environment Agent Learning
preprint
MultiAgentBench: Evaluating the Collaboration and Competition of LLM agents
arXiv 2025
ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows
arXiv 2025
Probing Scientific General Intelligence of LLMs with Scientist-Aligned Workflows
arXiv 2025
CellForge: Agentic Design of Virtual Cell Models
arXiv 2025
SciArena: An Open Evaluation Platform for Foundation Models in Scientific Literature Tasks
arXiv 2025
Improving Context Fidelity via Native Retrieval-Augmented Reasoning
arXiv 2025
KORGym: A Dynamic Game Platform for LLM Reasoning Evaluation
arXiv 2025
InteractComp: Evaluating Search Agents With Ambiguous Queries
arXiv 2025
OpenHands: An Open Platform for AI Software Developers as Generalist Agents
arXiv 2024
ChatCell: Facilitating Single-Cell Analysis with Natural Language
arXiv 2024
PRESTO: Progressive Pretraining Enhances Synthetic Chemistry Outcomes
arXiv 2024
A Survey of Generative AI for de novo Drug Design: New Frontiers in Molecule and Protein Generation
arXiv 2024
RWKV: Reinventing RNNs for the Transformer Era
arXiv 2023
ML-Bench: Evaluating Large Language Models and Agents for Machine Learning Tasks on Repository-Level Code
arXiv 2023
ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs
arXiv 2023
MedAgents: Large Language Models as Collaborators for Zero-shot Medical Reasoning
arXiv 2023
OctoPack: Instruction Tuning Code Large Language Models
arXiv 2023
Investigating Table-to-Text Generation Capabilities of LLMs in Real-World Information Seeking Scenarios
arXiv 2023
Struc-Bench: Are Large Language Models Really Good at Generating Complex Structured Data?
arXiv 2023
BioCoder: A Benchmark for Bioinformatics Code Generation with Large Language Models
arXiv 2023
DocMath-Eval: Evaluating Math Reasoning Capabilities of LLMs in Understanding Long and Specialized Documents
arXiv 2023
QTSumm: Query-Focused Summarization over Tabular Data
arXiv 2023
Igniting Language Intelligence: The Hitchhiker's Guide From Chain-of-Thought Reasoning to Language Agents
arXiv 2023
Survey on Factuality in Large Language Models: Knowledge, Retrieval and Domain-Specificity
arXiv 2023
PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts
ACL 2022
Crosslingual Generalization through Multitask Finetuning
arXiv 2022
Frequent co-authors
10 from 40 papers