Cihang Xie
- Papers: 34
Authored papers
AutoResearchClaw: Self-Reinforcing Autonomous Research with Human-AI Collaboration
arXiv 2026
ClinSeekAgent: Automating Multimodal Evidence Seeking for Agentic Clinical Reasoning
arXiv 2026
MetaClaw: Just Talk -- An Agent That Meta-Learns and Evolves in the Wild
arXiv 2026
SimpleMem: Efficient Lifelong Memory for LLM Agents
arXiv 2026
SkillRL: Evolving Agents via Recursive Skill-Augmented Reinforcement Learning
arXiv 2026
From Seeing to Thinking: Decoupling Perception and Reasoning Improves Post-Training of Vision-Language Models
arXiv 2026
ClawMark: A Living-World Benchmark for Multi-Turn, Multi-Day, Multimodal Coworker Agents
arXiv 2026
In-Context Reinforcement Learning for Tool Use in Large Language Models
arXiv 2026
Target-Oriented Pretraining Data Selection via Neuron-Activated Graph
arXiv 2026
ClawArena: Benchmarking AI Agents in Evolving Information Environments
arXiv 2026
Chasing the Public Score: User Pressure and Evaluation Exploitation in Coding Agent Workflows
arXiv 2026
Your Agent, Their Asset: A Real-World Safety Analysis of OpenClaw
arXiv 2026
VLAA-GUI: Knowing When to Stop, Recover, and Search, A Modular Framework for GUI Automation
arXiv 2026
OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning
ICCV 2025
$\texttt{Complex-Edit}$: CoT-Like Instruction Generation for Complexity-Controllable Image Editing Benchmark
arXiv 2025
GPT-IMAGE-EDIT-1.5M: A Million-Scale, GPT-Generated Image Dataset
arXiv 2025
AHELM: A Holistic Evaluation of Audio-Language Models
arXiv 2025
Alignment Tipping Process: How Self-Evolution Pushes LLM Agents Off the Rails
arXiv 2025
Safety at Scale: A Comprehensive Survey of Large Model Safety
arXiv 2025
SpatialThinker: Reinforcing 3D Reasoning in Multimodal LLMs via Spatial Rewards
arXiv 2025
Scaling Laws in Patchification: An Image Is Worth 50,176 Tokens And More
arXiv 2025
OpenVision 2: A Family of Generative Pretrained Visual Encoders for Multimodal Learning
arXiv 2025
MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine
arXiv 2024
What If We Recaption Billions of Web Images with LLaMA-3?
arXiv 2024
HQ-Edit: A High-Quality Dataset for Instruction-based Image Editing
arXiv 2024
Autoregressive Pretraining with Mamba in Vision
arXiv 2024
M-VAR: Decoupled Scale-wise Autoregressive Modeling for High-Quality Image Generation
arXiv 2024
CLIPA-v2: Scaling CLIP Training with 81.1% Zero-shot ImageNet Accuracy within a \$10,000 Budget; An Extra \$4,000 Unlocks 81.8% Accuracy
arXiv 2023
Rejuvenating image-GPT as Strong Visual Representation Learners
arXiv 2023
How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs
arXiv 2023
Unleashing the Power of Visual Prompting At the Pixel Level
arXiv 2022
Masked Autoencoders Enable Efficient Knowledge Distillers
CVPR 2023
iBOT: Image BERT Pre-Training with Online Tokenizer
arXiv 2021
Adversarial Attacks and Defences Competition
arXiv 2018