Heng Wang

Authored papers

30

Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters

arXiv 2026

Adaptation of Agentic AI

arXiv 2025

Kimi-VL Technical Report

arXiv 2025

Cosmos World Foundation Model Platform for Physical AI

arXiv 2025

Step-DeepResearch Technical Report

arXiv 2025

OpenCUA: Open Foundations for Computer-Use Agents

arXiv 2025

The Collapse of Patches

arXiv 2025

Step-Audio: Unified Understanding and Generation in Intelligent Speech Interaction

arXiv 2025

Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model

arXiv 2025

Open-Qwen2VL: Compute-Efficient Pre-Training of Fully-Open Multimodal LLMs on Academic Resources

arXiv 2025

BannerAgency: Advertising Banner Design with Multimodal LLM Agents

arXiv 2025

CoSER: Coordinating LLM-Based Persona Simulation of Established Roles

arXiv 2025

Reward Shaping to Mitigate Reward Hacking in RLHF

arXiv 2025

Fast Prompt Alignment for Text-to-Image Generation

arXiv 2024

HQ-Edit: A High-Quality Dataset for Instruction-based Image Editing

arXiv 2024

Autoregressive Pretraining with Mamba in Vision

arXiv 2024

Gotta Hear Them All: Sound Source Aware Vision to Audio Generation

arXiv 2024

DELL: Generating Reactions and Explanations for LLM-Based Misinformation Detection

arXiv 2024

Multimodal Causal Reasoning Benchmark: Challenging Vision Large Language Models to Infer Causal Links Between Siamese Images

arXiv 2024

Shot2Story20K: A New Benchmark for Comprehensive Understanding of Multi-shot Videos

arXiv 2023

V2A-Mapper: A Lightweight Solution for Vision-to-Audio Generation by Connecting Foundation Models

arXiv 2023

Can Language Models Solve Graph Problems in Natural Language?

NeurIPS 2023 11

Progressive Volume Distillation with Active Learning for Efficient NeRF Architecture Conversion

arXiv 2023

Why Is Prompt Tuning for Vision-Language Models Robust to Noisy Labels?

ICCV 2023 1

Vista-LLaMA: Reliable Video Narrator via Equal Distance to Visual Tokens

arXiv 2023

TwiBot-22: Towards Graph-Based Twitter Bot Detection

arXiv 2022

Open-World Instance Segmentation: Exploiting Pseudo Ground Truth From Learned Pairwise Affinity

CVPR 2022 1

Spatiality-guided Transformer for 3D Dense Captioning on Point Clouds

arXiv 2022

Is Space-Time Attention All You Need for Video Understanding?

arXiv 2021

A Closer Look at Spatiotemporal Convolutions for Action Recognition

2017

Affiliations

No known affiliations.

Frequent co-authors

10

from 30 papers