Donglin Wang

Papers
17

Authored papers

17

FRAPPE: Infusing World Modeling into Generalist Policies via Multiple Future Representation Alignment

arXiv 2026

2026

MMaDA-VLA: Large Diffusion Vision-Language-Action Model with Unified Multi-Modal Instruction and Generation

arXiv 2026

2026

Text-Only Data Synthesis for Vision Language Model Training

arXiv 2025

2026

Exploring the Evolution of Physics Cognition in Video Generation: A Survey

arXiv 2025

2025

OpenHelix: A Short Survey, Empirical Analysis, and Open-Source Dual-System VLA Model for Robotic Manipulation

arXiv 2025

2025

SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning

arXiv 2025

2025

HiF-VLA: Hindsight, Insight and Foresight through Motion Representation for Vision-Language-Action Models

arXiv 2025

2025

VLA-Adapter: An Effective Paradigm for Tiny-Scale Vision-Language-Action Model

arXiv 2025

2025

Spatial Forcing: Implicit Spatial Representation Alignment for Vision-language-action Model

arXiv 2025

2025

Unified Diffusion VLA: Vision-Language-Action Model via Joint Discrete Denoising Diffusion Process

arXiv 2025

2025

VLA-RFT: Vision-Language-Action Reinforcement Fine-tuning with Verified Rewards in World Simulators

arXiv 2025

2025

Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference

arXiv 2024

2024

PiTe: Pixel-Temporal Alignment for Large Video-Language Model

arXiv 2024

2024

Multi-Level Correlation Network For Few-Shot Image Classification

arXiv 2024

2024

Troika: Multi-Path Cross-Modal Traction for Compositional Zero-Shot Learning

CVPR 2024 1

2023

Beyond Reward: Offline Preference-guided Policy Optimization

arXiv 2023

2023

VoP: Text-Video Co-operative Prompt Tuning for Cross-Modal Retrieval

CVPR 2023 1

2022

Affiliations

No known affiliations.

Frequent co-authors

10

from 17 papers