Junke Wang
Papers: 9
Authored papers
VideoLoom: A Video Large Language Model for Joint Spatial-Temporal Understanding
arXiv 2026
Perception Encoder: The best visual embeddings are not at the output of the network
arXiv 2025
SimpleAR: Pushing the Frontier of Autoregressive Visual Generation through Pretraining, SFT, and RL
arXiv 2025
Pix2Cap-COCO: Advancing Visual Comprehension via Pixel-Level Captioning
arXiv 2025
OmniTokenizer: A Joint Image-Video Tokenizer for Visual Generation
arXiv 2024
MouSi: Poly-Visual-Expert Vision-Language Models
arXiv 2024
OmniVid: A Generative Framework for Universal Video Understanding
CVPR 2024
To See is to Believe: Prompting GPT-4V for Better Visual Instruction Tuning
arXiv 2023
M2TR: Multi-modal Multi-scale Transformers for Deepfake Detection
arXiv 2021
Frequent co-authors: 10 (from 9 papers)