Zihan Wang

GLM-5V-Turbo: Toward a Native Foundation Model for Multimodal Agents

arXiv 2026

EmbodiedSplat: Online Feed-Forward Semantic 3DGS for Open-Vocabulary 3D Scene Understanding

arXiv 2026

WorldReasonBench: Human-Aligned Stress Testing of Video Generators as Future World-State Predictors

arXiv 2026

Theory of Space: Can Foundation Models Construct Spatial Beliefs through Active Exploration?

arXiv 2026

SIN-Bench: Tracing Native Evidence Chains in Long-Context Multimodal Scientific Interleaved Literature

arXiv 2026

How Vulnerable Are AI Agents to Indirect Prompt Injections? Insights from a Large-Scale Public Competition

arXiv 2026

Kimi K2.5: Visual Agentic Intelligence

arXiv 2026

RAGEN-2: Reasoning Collapse in Agentic RL

arXiv 2026

GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models

arXiv 2025

RAGEN: Understanding Self-Evolution in LLM Agents via Multi-Turn Reinforcement Learning

arXiv 2025

Re-thinking Temporal Search for Long-Form Video Understanding

CVPR 2025

Kimina-Prover Preview: Towards Large Formal Reasoning Models with Reinforcement Learning

arXiv 2025

NavRAG: Generating User Demand Instructions for Embodied Navigation through Retrieval-Augmented LLM

arXiv 2025

GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

arXiv 2025

CRISP: Contact-Guided Real2Sim from Monocular Video with Planar Scene Primitives

arXiv 2025

TiViBench: Benchmarking Think-in-Video Reasoning for Video Generative Models

arXiv 2025

Memento: Fine-tuning LLM Agents without Fine-tuning LLMs

arXiv 2025

Spatial Mental Modeling from Limited Views

arXiv 2025

Technical Report of TeleChat2, TeleChat2.5 and T1

arXiv 2025

NVSMask3D: Hard Visual Prompting with Camera Pose Interpolation for 3D Open Vocabulary Instance Segmentation

arXiv 2025

A Simple "Try Again" Can Elicit Multi-Turn LLM Reasoning

arXiv 2025

NL2Repo-Bench: Towards Long-Horizon Repository Generation Evaluation of Coding Agents

arXiv 2025

Chain-of-Experts: Unlocking the Communication Power of Mixture-of-Experts Models

arXiv 2025

GitChameleon 2.0: Evaluating AI Code Generation Against Python Library Version Incompatibilities

arXiv 2025

InstructPart: Task-Oriented Part Segmentation with Instruction Reasoning

arXiv 2025

FullStack Bench: Evaluating LLMs as Full Stack Coders

arXiv 2024

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

arXiv 2024

Let the Expert Stick to His Last: Expert-Specialized Fine-Tuning for Sparse Architectural Large Language Models

arXiv 2024

g3D-LF: Generalizable 3D-Language Feature Fields for Embodied Tasks

CVPR 2025

Is Mamba Effective for Time Series Forecasting?

arXiv 2024

Evaluating the Smooth Control of Attribute Intensity in Text Generation with LLMs

arXiv 2024

ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools

arXiv 2024

CogVLM2: Visual Language Models for Image and Video Understanding

arXiv 2024

SciInstruct: a Self-Reflective Instruction Annotated Dataset for Training Scientific Language Models

arXiv 2024

Answer is All You Need: Instruction-following Text Embedding via Answering the Question

arXiv 2024

ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline

arXiv 2024

BPKD: Boundary Privileged Knowledge Distillation For Semantic Segmentation

arXiv 2023

RETA-LLM: A Retrieval-Augmented Large Language Model Toolkit

arXiv 2023

EmojiLM: Modeling the New Emoji Language

arXiv 2023

CodeGeeX: A Pre-Trained Model for Code Generation with Multilingual Benchmarking on HumanEval-X

arXiv 2023

CogAgent: A Visual Language Model for GUI Agents

CVPR 2024

ClusterLLM: Large Language Models as a Guide for Text Clustering

arXiv 2023

GridMM: Grid Memory Map for Vision-and-Language Navigation

ICCV 2023

Guiding Pretraining in Reinforcement Learning with Large Language Models

arXiv 2023

Goal-Driven Explainable Clustering via Language Descriptions

arXiv 2023