Philip S. Yu

CausalEmbed: Auto-Regressive Multi-Vector Generation in Latent Space for Visual Document Embedding

arXiv 2026

AutoResearch AI: Towards AI-Powered Research Automation for Scientific Discovery

arXiv 2026

Rethinking Memory Mechanisms of Foundation Agents in the Second Half: A Survey

arXiv 2026

CHI-Bench: Can AI Agents Automate End-to-End, Long-Horizon, Policy-Rich Healthcare Workflows?

arXiv 2026

A Survey of Large Audio Language Models: Generalization, Trustworthiness, and Outlook

arXiv 2026

When Users Change Their Mind: Evaluating Interruptible Agents in Long-Horizon Web Navigation

arXiv 2026

EpochX: Building the Infrastructure for an Emergent Agent Civilization

arXiv 2026

Large Language Model Agent: A Survey on Methodology, Applications and Challenges

arXiv 2025

Harnessing Multiple Large Language Models: A Survey on LLM Ensemble

arXiv 2025

A Survey on Large Language Model based Human-Agent Systems

arXiv 2025

Large Language Models Post-training: Surveying Techniques from Alignment to Reasoning

arXiv 2025

Towards Agentic RAG with Deep Reasoning: A Survey of RAG-Reasoning Systems in LLMs

arXiv 2025

From Web Search towards Agentic Deep Research: Incentivizing Search with Reasoning Agents

arXiv 2025

A Call for Collaborative Intelligence: Why Human-Agent Systems Should Precede AI Autonomy

arXiv 2025

Judge Anything: MLLM as a Judge Across Any Modality

arXiv 2025

ProjectTest: A Project-level LLM Unit Test Generation Benchmark and Impact of Error Fixing Mechanisms

arXiv 2025

Seeking and Updating with Live Visual Knowledge

arXiv 2025

TestNUC: Enhancing Test-Time Computing Approaches through Neighboring Unlabeled Data Consistency

arXiv 2025

A Comprehensive Survey in LLM(-Agent) Full Stack Safety: Data, Training and Deployment

arXiv 2025

On the Trustworthiness of Generative Foundation Models: Guideline, Assessment, and Perspective

arXiv 2025

One Example Shown, Many Concepts Known! Counterexample-Driven Conceptual Reasoning in Mathematical LLMs

arXiv 2025

Can Multimodal LLMs Perform Time Series Anomaly Detection?

arXiv 2025

TrustLLM: Trustworthiness in Large Language Models

arXiv 2024

Recent Advances of Multimodal Continual Learning: A Comprehensive Survey

arXiv 2024

ImplicitAVE: An Open-Source Dataset and Multimodal LLMs Benchmark for Implicit Attribute Value Extraction

arXiv 2024

LLMs Assist NLP Researchers: Critique Paper (Meta-)Reviewing

arXiv 2024

DAMe: Personalized Federated Social Event Detection with Dual Aggregation Mechanism

arXiv 2024

Towards Effective, Efficient and Unsupervised Social Event Detection in the Hyperbolic Space

arXiv 2024

Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent

arXiv 2024

FineFake: A Knowledge-Enriched Dataset for Fine-Grained Multi-Domain Fake News Detection

arXiv 2024

Trustworthiness in Retrieval-Augmented Generation Systems: A Survey

arXiv 2024

IGL-Bench: Establishing the Comprehensive Benchmark for Imbalanced Graph Learning

arXiv 2024