Yangqiu Song

MemLens: Benchmarking Multimodal Long-Term Memory in Large Vision-Language Models

arXiv 2026

Do Reasoning Models Enhance Embedding Models?

arXiv 2026

NOVA: NOise-aware Verbal Confidence CAlibration for Robust Large Language Models in RAG Systems

arXiv 2026

NGDBench: Towards Neural Graph Data Management

arXiv 2026

From Automation to Autonomy: A Survey on Large Language Models in Scientific Discovery

arXiv 2025

MMLongBench: Benchmarking Long-Context Vision-Language Models Effectively and Thoroughly

arXiv 2025

Enhancing Transformers for Generalizable First-Order Logical Entailment

arXiv 2025

From Web Search towards Agentic Deep Research: Incentivizing Search with Reasoning Agents

arXiv 2025

Towards Agentic RAG with Deep Reasoning: A Survey of RAG-Reasoning Systems in LLMs

arXiv 2025

LogiDynamics: Unraveling the Dynamics of Logical Inference in Large Language Model Reasoning

arXiv 2025

DivScene: Benchmarking LVLMs for Object Navigation with Diverse Scenes and Objects

arXiv 2024

AbsInstruct: Eliciting Abstraction Ability from LLMs through Explanation Tuning with Plausibility Estimation

arXiv 2024

Persona Knowledge-Aligned Prompt Tuning Method for Online Debate

arXiv 2024

ECon: On the Detection and Resolution of Evidence Conflicts

arXiv 2024

KCTS: Knowledge-Constrained Tree Search Decoding with Token-Level Hallucination Detection

arXiv 2023

AbsPyramid: Benchmarking the Abstraction Ability of Language Models with a Unified Entailment Graph

arXiv 2023

Rethinking Complex Queries on Knowledge Graphs with Neural Link Predictors

arXiv 2023

Multi-step Jailbreaking Privacy Attacks on ChatGPT

arXiv 2023

CKBP v2: Better Annotation and Reasoning for Commonsense Knowledge Base Population

arXiv 2023

CAR: Conceptualization-Augmented Reasoner for Zero-Shot Commonsense Question Answering

arXiv 2023

StoryAnalogy: Deriving Story-level Analogies from Large Language Models to Unlock Analogical Understanding

arXiv 2023