Jingdong Wang

Papers
44

Authored papers

44

RefAlign: Representation Alignment for Reference-to-Video Generation

arXiv 2026

2026

SAMA: Factorized Semantic Anchoring and Motion Alignment for Instruction-Guided Video Editing

arXiv 2026

2026

No Other Representation Component Is Needed: Diffusion Transformers Can Provide Representation Guidance by Themselves

arXiv 2025

2025

Hallo4: High-Fidelity Dynamic Portrait Animation via Direct Preference Optimization and Temporal Motion Modulation

arXiv 2025

2025

Agentic Learner with Grow-and-Refine Multimodal Semantic Memory

arXiv 2025

2025

Can Understanding and Generation Truly Benefit Together -- or Just Coexist?

arXiv 2025

2025

Hallo3: Highly Dynamic and Realistic Portrait Image Animation with Video Diffusion Transformer

CVPR 2025 1

2024

LION: Linear Group RNN for 3D Object Detection in Point Clouds

arXiv 2024

2024

LaMI-DETR: Open-Vocabulary Detection with Language Model Instruction

arXiv 2024

2024

LW-DETR: A Transformer Replacement to YOLO for Real-Time Detection

arXiv 2024

2024

Evaluation of Text-to-Video Generation Models: A Dynamics Perspective

arXiv 2024

2024

Dense Connector for MLLMs

arXiv 2024

2024

MS-DETR: Efficient DETR Training with Mixed Supervision

CVPR 2024 1

2024

MonoFormer: One Transformer for Both Diffusion and Autoregression

arXiv 2024

2024

OPEN: Object-wise Position Embedding for Multi-view 3D Object Detection

arXiv 2024

2024

TryOn-Adapter: Efficient Fine-Grained Clothing Identity Adaptation for High-Fidelity Virtual Try-On

arXiv 2024

2024

Make Your ViT-based Multi-view 3D Detectors Faster via Token Compression

arXiv 2024

2024

Training-Free Unsupervised Prompt for Vision-Language Models

arXiv 2024

2024

Hallo2: Long-Duration and High-Resolution Audio-Driven Portrait Image Animation

arXiv 2024

2024

Descriptive Caption Enhancement with Visual Specialists for Multimodal Perception

arXiv 2024

2024

A Survey of Reasoning with Foundation Models

arXiv 2023

2023

HAP: Structure-Aware Masked Image Modeling for Human-Centric Perception

2023

Delicate Textured Mesh Recovery from NeRF via Adaptive Surface Refinement

ICCV 2023 1

2023

StrucTexTv2: Masked Visual-Textual Prediction for Document Image Pre-training

arXiv 2023

2023

Leveraging Vision-Centric Multi-Modal Expertise for 3D Object Detection

2023

PLIP: Language-Image Pre-training for Person Representation Learning

arXiv 2023

2023

What Can Simple Arithmetic Operations Do for Temporal Modeling?

ICCV 2023 1

2023

UATVR: Uncertainty-Adaptive Text-Video Retrieval

ICCV 2023 1

2023

Group Pose: A Simple Baseline for End-to-End Multi-person Pose Estimation

ICCV 2023 1

2023

CPCM: Contextual Point Cloud Modeling for Weakly-supervised Point Cloud Semantic Segmentation

ICCV 2023 1

2023

Unified Pre-training with Pseudo Texts for Text-To-Image Person Re-identification

2023

IRGen: Generative Modeling for Image Retrieval

arXiv 2023

2023

Boosting Few-shot Action Recognition with Graph-guided Hybrid Matching

ICCV 2023 1

2023

Context Autoencoder for Self-Supervised Representation Learning

arXiv 2022

2022

DaViT: Dual Attention Vision Transformers

arXiv 2022

2022

Few-Shot Font Generation by Learning Fine-Grained Local Styles

CVPR 2022 1

2022

Group DETR: Fast DETR Training with Group-Wise One-to-Many Assignment

ICCV 2023 1

2022

Beyond Attentive Tokens: Incorporating Token Importance and Diversity for Efficient Vision Transformers

CVPR 2023 1

2022

SPANN: Highly-efficient Billion-scale Approximate Nearest Neighbor Search

2021

Conditional DETR for Fast Training Convergence

ICCV 2021 10

2021

Semi-Supervised Semantic Segmentation with Cross Pseudo Supervision

CVPR 2021 1

2021

Lite-HRNet: A Lightweight High-Resolution Network

CVPR 2021 1

2021

Deep High-Resolution Representation Learning for Human Pose Estimation

2019

Segmentation Transformer: Object-Contextual Representations for Semantic Segmentation

ECCV 2020 8

2019

Affiliations

No known affiliations.
