Jiankang Deng
- Papers: 37
Authored papers
RAVEN: Real-time Autoregressive Video Extrapolation with Consistency-model GRPO
arXiv 2026
LLaVA-OneVision-2: Towards Next-Generation Perceptual Intelligence
arXiv 2026
OneVision-Encoder: Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence
arXiv 2026
Do You See What I Am Pointing At? Gesture-Based Egocentric Video Question Answering
arXiv 2026
RealSyn: An Effective and Scalable Multimodal Interleaved Document Transformation Paradigm
arXiv 2025
Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs
arXiv 2025
An Anatomy of Vision-Language-Action Models: From Modules to Milestones and Challenges
arXiv 2025
LLaVA-OneVision-1.5: Fully Open Framework for Democratized Multimodal Training
arXiv 2025
ForCenNet: Foreground-Centric Network for Document Image Rectification
ICCV 2025
Region-based Cluster Discrimination for Visual Representation Learning
ICCV 2025
"Principal Components" Enable A New Language of Images
ICCV 2025
Chirpy3D: Creative Fine-grained 3D Object Fabrication via Part Sampling
arXiv 2025
Beyond Description: Cognitively Benchmarking Fine-Grained Action for Embodied Agents
arXiv 2025
UniME-V2: MLLM-as-a-Judge for Universal Multimodal Embedding Learning
arXiv 2025
WiLoR: End-to-end 3D Hand Localization and Reconstruction in-the-wild
CVPR 2025
Multilinear Mixture of Experts: Scalable Expert Specialization through Factorization
arXiv 2024
Multi-label Cluster Discrimination for Visual Representation Learning
arXiv 2024
RWKV-CLIP: A Robust Vision-Language Representation Learner
arXiv 2024
$V_kD$: Improving Knowledge Distillation using Orthogonal Projections
arXiv 2024
Adaptive Parametric Activation
arXiv 2024
Spatio-temporal Prompting Network for Robust Video Feature Extraction
Croc: Pretraining Large Multimodal Models with Cross-Modal Comprehension
arXiv 2024
VeLoRA: Memory Efficient Training using Rank-1 Sub-Token Projections
arXiv 2024
OVGaussian: Generalizable 3D Gaussian Segmentation with Open Vocabularies
arXiv 2024
Fractal Calibration for long-tailed object detection
CVPR 2025
TransFace: Calibrating Transformer Training for Face Recognition from a Data-Centric Perspective
ICCV 2023
Boosting Object Detection with Zero-Shot Day-Night Domain Adaptation
CVPR 2024
ALIP: Adaptive Language-Image Pre-training with Synthetic Caption
ICCV 2023
FitMe: Deep Photorealistic 3D Morphable Model Avatars
CVPR 2023
DiffTAD: Temporal Action Detection with Proposal Denoising Diffusion
ICCV 2023
Perspective Reconstruction of Human Faces by Joint Mesh and Landmark Regression
arXiv 2022
Deep Face Restoration: A Survey
arXiv 2022
3DGazeNet: Generalizing Gaze Estimation with Weak-Supervision from Synthetic Views
arXiv 2022
Domain-General Crowd Counting in Unseen Scenarios
arXiv 2022
Redesigning Multi-Scale Neural Network for Crowd Counting
arXiv 2022
Long-tailed Instance Segmentation using Gumbel Optimized Loss
arXiv 2022
Inverse Image Frequency for Long-tailed Image Recognition
arXiv 2022
Frequent co-authors
10 from 37 papers