Peng Xia
- Papers: 24
Authored papers (24)
MetaClaw: Just Talk -- An Agent That Meta-Learns and Evolves in the Wild
arXiv 2026
SimpleMem: Efficient Lifelong Memory for LLM Agents
arXiv 2026
SkillRL: Evolving Agents via Recursive Skill-Augmented Reinforcement Learning
arXiv 2026
AgentVista: Evaluating Multimodal Agents in Ultra-Challenging Realistic Visual Scenarios
arXiv 2026
AutoResearchClaw: Self-Reinforcing Autonomous Research with Human-AI Collaboration
arXiv 2026
ClawArena: Benchmarking AI Agents in Evolving Information Environments
arXiv 2026
MDocAgent: A Multi-Modal Multi-Agent Framework for Document Understanding
arXiv 2025
ChemMLLM: Chemical Multimodal Large Language Model
arXiv 2025
Thinking with Images for Multimodal Reasoning: Foundations, Methods, and Future Frontiers
arXiv 2025
Agent0-VL: Exploring Self-Evolving Agent for Tool-Integrated Vision-Language Reasoning
arXiv 2025
Agent0: Unleashing Self-Evolving Agents from Zero Data via Tool-Integrated Reasoning
arXiv 2025
Skywork-R1V4: Toward Agentic Multimodal Intelligence through Interleaved Thinking with Images and DeepResearch
arXiv 2025
Agent KB: Leveraging Cross-Domain Experience for Agentic Problem Solving
arXiv 2025
WebWatcher: Breaking New Frontier of Vision-Language Deep Research Agent
arXiv 2025
Multiplayer Nash Preference Optimization
arXiv 2025
RULE: Reliable Multimodal RAG for Factuality in Medical Vision Language Models
arXiv 2024
MMed-RAG: Versatile Multimodal RAG System for Medical Vision Language Models
arXiv 2024
MMedPO: Aligning Medical Vision-Language Models with Clinical-Aware Multimodal Preference Optimization
arXiv 2024
MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models
arXiv 2024
Generalizing to Unseen Domains in Diabetic Retinopathy with Disentangled Representations
arXiv 2024
CARES: A Comprehensive Benchmark of Trustworthiness in Medical Vision Language Models
arXiv 2024
OphNet: A Large-Scale Video Benchmark for Ophthalmic Surgical Workflow Understanding
arXiv 2024
LMPT: Prompt Tuning with Class-Specific Embedding Loss for Long-tailed Multi-Label Visual Recognition
arXiv 2023
HGCLIP: Exploring Vision-Language Models with Graph Representations for Hierarchical Understanding
Frequent co-authors
10 from 24 papers