Feng Zheng
- Papers: 13
Authored papers (13)
InternVLA-M1: A Spatially Guided Vision-Language-Action Framework for Generalist Robot Policy
arXiv 2025
MesaTask: Towards Task-Driven Tabletop Scene Generation via 3D Spatial Reasoning
arXiv 2025
MMAD: The First-Ever Comprehensive Benchmark for Multimodal Large Language Models in Industrial Anomaly Detection
arXiv 2024
LongVALE: Vision-Audio-Language-Event Benchmark Towards Time-Aware Omni-Modal Perception of Long Videos
CVPR 2025 1
Negative Label Guided OOD Detection with Pretrained Vision-Language Models
arXiv 2024
Tuning-Free Image Customization with Image and Text Guidance
arXiv 2024
Deep Industrial Image Anomaly Detection: A Survey
arXiv 2023
Track Anything: Segment Anything Meets Videos
arXiv 2023
Video Understanding with Large Language Models: A Survey
arXiv 2023
Transferable Decoding with Visual Entities for Zero-Shot Image Captioning
ICCV 2023 1
Set-level Guidance Attack: Boosting Adversarial Transferability of Vision-Language Pre-training Models
ICCV 2023 1
Learning Cross-Modal Affinity for Referring Video Object Segmentation Targeting Limited Samples
ICCV 2023 1
Unified Multivariate Gaussian Mixture for Efficient Neural Image Compression
CVPR 2022 1
Frequent co-authors
10 from 13 papers