Yi Wang

Papers
61

Attribution

Affiliations & profile
Semantic Scholar

Authored papers

61

Qwen-Image-VAE-2.0 Technical Report

arXiv 2026

2026

Urban Socio-Semantic Segmentation with Vision-Language Reasoning

arXiv 2026

2026

RIVER: A Real-Time Interaction Benchmark for Video LLMs

arXiv 2026

2026

AcademiClaw: When Students Set Challenges for AI Agents

arXiv 2026

2026

LLaTiSA: Towards Difficulty-Stratified Time Series Reasoning from Visual Perception to Semantics

arXiv 2026

2026

Code2World: A GUI World Model via Renderable Code Generation

arXiv 2026

2026

Qwen-Image Technical Report

arXiv 2025

2025

InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

arXiv 2025

2025

InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling

arXiv 2025

2025

VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning

arXiv 2025

2025

Make Your Training Flexible: Towards Deployment-Efficient Video Models

ICCV 2025

2025

ImplicitRDP: An End-to-End Visual-Force Diffusion Policy with Structural Slow-Fast Learning

arXiv 2025

2025

InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency

arXiv 2025

2025

Let It Flow: Agentic Crafting on Rock and Roll, Building the ROME Model within an Open Agentic Learning Ecosystem

arXiv 2025

2025

VRBench: A Benchmark for Multi-Step Reasoning in Long Narrative Videos

ICCV 2025

2025

ExpVid: A Benchmark for Experiment Video Understanding & Reasoning

arXiv 2025

2025

Vchitect-2.0: Parallel Transformer for Scaling Up Video Diffusion Models

arXiv 2025

2025

EVA02-AT: Egocentric Video-Language Understanding with Spatial-Temporal Rotary Positional Embeddings and Symmetric Optimization

arXiv 2025

2025

Synthetic Generation and Latent Projection Denoising of Rim Lesions in Multiple Sclerosis

arXiv 2025

2025

Towards a Unified Copernicus Foundation Model for Earth Vision

ICCV 2025

2025

PATS: Process-Level Adaptive Thinking Mode Switching

arXiv 2025

2025

GeoLangBind: Unifying Earth Observation with Agglomerative Vision-Language Foundation Models

arXiv 2025

2025

Syn-GRPO: Self-Evolving Data Synthesis for MLLM Perception Reasoning

arXiv 2025

2025

VKnowU: Evaluating Visual Knowledge Understanding in Multimodal LLMs

arXiv 2025

2025

TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning

arXiv 2024

2024

VideoMamba: State Space Model for Efficient Video Understanding

arXiv 2024

2024

ChatMusician: Understanding and Generating Music Intrinsically with LLM

arXiv 2024

2024

VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling

arXiv 2024

2024

Neural Plasticity-Inspired Multimodal Foundation Model for Earth Observation

arXiv 2024

2024

Internal Consistency and Self-Feedback in Large Language Models: A Survey

arXiv 2024

2024

ESP-MedSAM: Efficient Self-Prompting SAM for Universal Domain-Generalized Medical Image Segmentation

arXiv 2024

2024

Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment

CVPR 2025

2024

CaRe-Ego: Contact-aware Relationship Modeling for Egocentric Interactive Hand-object Segmentation

arXiv 2024

2024

OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text

arXiv 2024

2024

Recurrent Drafter for Fast Speculative Decoding in Large Language Models

arXiv 2024

2024

Explaining Time Series via Contrastive and Locally Sparse Perturbations

arXiv 2024

2024

SGIFormer: Semantic-guided and Geometric-enhanced Interleaving Transformer for 3D Instance Segmentation

arXiv 2024

2024

ComposerX: Multi-Agent Symbolic Music Composition with LLMs

arXiv 2024

2024

Bootstrapping Language-Guided Navigation Learning with Self-Refining Data Flywheel

arXiv 2024

2024

Multi-Label Guided Soft Contrastive Learning for Efficient Earth Observation Pretraining

arXiv 2024

2024

XTRUST: On the Multilingual Trustworthiness of Large Language Models

arXiv 2024

2024

Tracking the Feature Dynamics in LLM Training: A Mechanistic Study

arXiv 2024

2024

SSL4EO-L: Datasets and Foundation Models for Landsat Imagery

2023

VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking

CVPR 2023

2023

MVBench: A Comprehensive Multi-modal Video Understanding Benchmark

CVPR 2024

2023

LAVIE: High-Quality Video Generation with Cascaded Latent Diffusion Models

arXiv 2023

2023

NeRFLiX: High-Quality Neural View Synthesis by Learning a Degradation-Driven Inter-viewpoint MiXer

CVPR 2023

2023

SimPLe: Similarity-Aware Propagation Learning for Weakly-Supervised Breast Cancer Segmentation in DCE-MRI

arXiv 2023

2023

Unmasked Teacher: Towards Training-Efficient Video Foundation Models

ICCV 2023

2023

Scaling Data Generation in Vision-and-Language Navigation

ICCV 2023

2023

Decoupling Common and Unique Representations for Multimodal Self-supervised Learning

arXiv 2023

2023

Enhancing NeRF akin to Enhancing LLMs: Generalizable NeRF Transformer with Mixture-of-View-Experts

ICCV 2023

2023

GAMUS: A Geometry-aware Multi-modal Semantic Segmentation Benchmark for Remote Sensing Data

arXiv 2023

2023

Feature Guided Masked Autoencoder for Self-supervised Learning in Remote Sensing

arXiv 2023

2023

MAT: Mask-Aware Transformer for Large Hole Image Inpainting

CVPR 2022

2022

NeuralLift-360: Lifting An In-the-wild 2D Photo to A 3D Object with 360° Views

arXiv 2022

2022

UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer

arXiv 2022

2022

SSL4EO-S12: A Large-Scale Multi-Modal, Multi-Temporal Dataset for Self-Supervised Learning in Earth Observation

arXiv 2022

2022

VCNet: A Robust Approach to Blind Image Inpainting

ECCV 2020

2020

Image Inpainting via Generative Multi-column Convolutional Neural Networks

2018

Deep Speech 2: End-to-End Speech Recognition in English and Mandarin

arXiv 2015

2015

Affiliations

No known affiliations.

Frequent co-authors

10

from 61 papers