Tong Zhang

Papers
52

Attribution

Affiliations & profile
Semantic Scholar

Authored papers

52

Code as Agent Harness

arXiv 2026

2026

Channel-wise Vector Quantization

arXiv 2026

2026

Orchard: An Open-Source Agentic Modeling Framework

arXiv 2026

2026

Recursive Multi-Agent Systems

arXiv 2026

2026

AgentSPEX: An Agent SPecification and EXecution Language

arXiv 2026

2026

GUI-Libra: Training Native GUI Agents to Reason and Act with Action-aware Supervision and Partially Verifiable RL

arXiv 2026

2026

PRL: Process Reward Learning Improves LLMs' Reasoning Ability and Broadens the Reasoning Boundary

arXiv 2026

2026

EmbodiedBench: Comprehensive Benchmarking Multi-modal Large Language Models for Vision-Driven Embodied Agents

arXiv 2025

2025

Thinking vs. Doing: Agents that Reason by Scaling Test-Time Interaction

arXiv 2025

2025

A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce

arXiv 2025

2025

GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents

arXiv 2025

2025

RM-R1: Reward Modeling as Reasoning

arXiv 2025

2025

MMHCL: Multi-Modal Hypergraph Contrastive Learning for Recommendation

arXiv 2025

2025

Monte Carlo Diffusion for Generalizable Learning-Based RANSAC

arXiv 2025

2025

LongCat-Video Technical Report

arXiv 2025

2025

Chain-of-Experts: Unlocking the Communication Power of Mixture-of-Experts Models

arXiv 2025

2025

Beyond Correctness: Harmonizing Process and Outcome Rewards through RL Training

arXiv 2025

2025

GAR: Generative Adversarial Reinforcement Learning for Formal Theorem Proving

arXiv 2025

2025

Self-rewarding correction for mathematical reasoning

arXiv 2025

2025

MA-LoT: Multi-Agent Lean-based Long Chain-of-Thought Reasoning enhances Formal Theorem Proving

arXiv 2025

2025

Optimizing Chain-of-Thought Reasoners via Gradient Variance Minimization in Rejection Sampling and RL

arXiv 2025

2025

Prioritizing Image-Related Tokens Enhances Vision-Language Pre-Training

arXiv 2025

2025

Adapt-Pruner: Adaptive Structural Pruning for Efficient Small Language Model Training

arXiv 2025

2025

Self-Ensembling Gaussian Splatting for Few-Shot Novel View Synthesis

ICCV 2025

2024

LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning

arXiv 2024

2024

Image Textualization: An Automatic Framework for Creating Accurate and Detailed Image Descriptions

arXiv 2024

2024

Arithmetic Control of LLMs for Diverse User Preferences: Directional Preference Alignment with Multi-Objective Rewards

arXiv 2024

2024

Coherent and Multi-modality Image Inpainting via Latent Space Optimization

arXiv 2024

2024

Personalized Visual Instruction Tuning

arXiv 2024

2024

Leveraging Locality to Boost Sample Efficiency in Robotic Manipulation

arXiv 2024

2024

MatchDiffusion: Training-free Generation of Match-cuts

ICCV 2025

2024

Scaling Mesh Generation via Compressive Tokenization

CVPR 2025 1

2024

TheoremLlama: Transforming General-Purpose LLMs into Lean4 Experts

arXiv 2024

2024

Regularizing Hidden States Enables Learning Generalizable Reward Model for LLMs

arXiv 2024

2024

SINDER: Repairing the Singular Defects of DINOv2

arXiv 2024

2024

Entropy-Regularized Process Reward Model

arXiv 2024

2024

TAGCOS: Task-agnostic Gradient Clustered Coreset Selection for Instruction Tuning Data

arXiv 2024

2024

Active Prompting with Chain-of-Thought for Large Language Models

arXiv 2023

2023

RAGTruth: A Hallucination Corpus for Developing Trustworthy Retrieval-Augmented Language Models

arXiv 2023

2023

CVTHead: One-shot Controllable Head Avatar with Vertex-feature Transformer

arXiv 2023

2023

R-Tuning: Instructing Large Language Models to Say 'I Don't Know'

arXiv 2023

2023

Mixture-of-Domain-Adapters: Decoupling and Injecting Domain Knowledge to Pre-trained Language Models Memories

arXiv 2023

2023

Automatic Prompt Augmentation and Selection with Chain-of-Thought from Labeled Data

arXiv 2023

2023

What is Essential for Unseen Goal Generalization of Offline Goal-conditioned RL?

arXiv 2023

2023

TempSAL -- Uncovering Temporal Information for Deep Saliency Prediction

arXiv 2023

2023

Towards Robust Offline Reinforcement Learning under Diverse Data Corruption

arXiv 2023

2023

Mitigating the Alignment Tax of RLHF

arXiv 2023

2023

Plum: Prompt Learning using Metaheuristic

arXiv 2023

2023

VolRecon: Volume Rendering of Signed Ray Distance Functions for Generalizable Multi-View Reconstruction

CVPR 2023 1

2022

Involution: Inverting the Inherence of Convolution for Visual Recognition

CVPR 2021 1

2021

ZEN 2.0: Continue Training and Adaption for N-gram Enhanced Text Encoders

arXiv 2021

2021

Weakly Supervised Disentangled Generative Causal Representation Learning

disentangled-generative-causal-representation

2020

Affiliations

No known affiliations.

Frequent co-authors

10

from 52 papers