Tao Zhang

Papers
37

Authored papers

37

SAMTok: Representing Any Mask with Two Words

arXiv 2026

2026

SWE-Master: Unleashing the Potential of Software Engineering Agents via Post-Training

arXiv 2026

2026

SWE-World: Building Software Engineering Agents in Docker-Free Environments

arXiv 2026

2026

Pixel-SAIL: Single Transformer For Pixel-Grounded Understanding

arXiv 2025

2025

Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos

arXiv 2025

2025

Diffusion Model as a Noise-Aware Latent Reward Model for Step-Level Preference Optimization

arXiv 2025

2025

Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology

arXiv 2025

2025

HunyuanImage 3.0 Technical Report

arXiv 2025

2025

DenseWorld-1M: Towards Detailed Dense Grounded Caption in the Real World

arXiv 2025

2025

Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs

arXiv 2025

2025

Every Step Evolves: Scaling Reinforcement Learning for Trillion-Scale Thinking Model

arXiv 2025

2025

Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence

arXiv 2025

2025

Baichuan-Omni-1.5 Technical Report

arXiv 2025

2025

On Path to Multimodal Generalist: General-Level and General-Bench

arXiv 2025

2025

IF-Bench: Benchmarking and Enhancing MLLMs for Infrared Images with Generative Visual Prompting

arXiv 2025

2025

Mixed-R1: Unified Reward Perspective For Reasoning Capability in Multimodal Large Language Models

arXiv 2025

2025

S2SBench: A Benchmark for Quantifying Intelligence Degradation in Speech-to-Speech Large Language Models

arXiv 2025

2025

Native Hybrid Attention for Efficient Sequence Modeling

arXiv 2025

2025

Baichuan-Audio: A Unified Framework for End-to-End Speech Interaction

arXiv 2025

2025

UNIP: Rethinking Pre-trained Attention Patterns for Infrared Semantic Segmentation

arXiv 2025

2025

Ocean-OCR: Towards General OCR Application via a Vision-Language Model

arXiv 2025

2025

Are They the Same? Exploring Visual Correspondence Shortcomings of Multimodal LLMs

ICCV 2025

2025

An Empirical Study of GPT-4o Image Generation Capabilities

arXiv 2025

2025

Wavelet Diffusion Neural Operator

arXiv 2024

2024

CFBench: A Comprehensive Constraints-Following Benchmark for LLMs

arXiv 2024

2024

Mamba or RWKV: Exploring High-Quality and High-Efficiency Segment Anything Model

arXiv 2024

2024

TableGPT2: A Large Multimodal Model with Tabular Data Integration

arXiv 2024

2024

Baichuan-Omni Technical Report

arXiv 2024

2024

DVIS-DAQ: Improving Video Segmentation via Dynamic Anchor Queries

arXiv 2024

2024

Generative Regression Based Watch Time Prediction for Short-Video Recommendation

arXiv 2024

2024

Point Cloud Mamba: Point Cloud Learning via State Space Model

arXiv 2024

2024

Compositional Generative Inverse Design

arXiv 2024

2024

SysBench: Can Large Language Models Follow System Messages?

arXiv 2024

2024

MuChin: A Chinese Colloquial Description Benchmark for Evaluating Language Models in the Field of Music

arXiv 2024

2024

Baichuan 2: Open Large-scale Language Models

arXiv 2023

2023

DVIS: Decoupled Video Instance Segmentation Framework

ICCV 2023

2023

DVIS++: Improved Decoupled Framework for Universal Video Segmentation

arXiv 2023

2023

Affiliations

No known affiliations.

Frequent co-authors

10

from 37 papers