Shengbang Tong

Papers: 10

Attribution

Affiliations & profile: Semantic Scholar

Authored papers

Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders

arXiv 2026

From Intention to Execution: Probing the Generalization Boundaries of Vision-Language-Action Models

arXiv 2025

Thinking vs. Doing: Agents that Reason by Scaling Test-Time Interaction

arXiv 2025

Seeing from Another Perspective: Evaluating Multi-View Understanding in MLLMs

arXiv 2025

Diffusion Transformers with Representation Autoencoders

arXiv 2025

From Pixels to Feelings: Aligning MLLMs with Human Cognitive Perception of Images

arXiv 2025

Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs

arXiv 2024

Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs

CVPR 2024

Image Clustering via the Principle of Rate Reduction in the Age of Pretrained Models

arXiv 2023

Mass-Producing Failures of Multimodal Systems with Language Models

2023

Affiliations

No known affiliations.

Frequent co-authors

from 10 papers

Saining Xie

4 shared papers

Yann LeCun

VP & Chief AI Scientist

Yi Ma

BoYang Zheng

Ellis Brown

Jihan Yang

Nanye Ma

Rob Fergus

Tianzhe Chu

Yuexiang Zhai