Jaemin Cho
- Papers: 24
Authored papers
MolmoAct2: Action Reasoning Models for Real-world Deployment
arXiv 2026
EPiC: Efficient Video Camera Control Learning with Precise Anchor-Video Guidance
arXiv 2025
WildDet3D: Scaling Promptable 3D Detection in the Wild
arXiv 2026
VFIG: Vectorizing Complex Figures in SVG with Vision-Language Models
arXiv 2026
PhyMotion: Structured 3D Motion Reward for Physics-Grounded Human Video Generation
arXiv 2026
V-Co: A Closer Look at Visual Representation Alignment via Co-Denoising
arXiv 2026
AnchorWeave: World-Consistent Video Generation with Retrieved Local Spatial Memories
arXiv 2026
Training-free Guidance in Text-to-Video Generation via Multimodal Planning and Structured Noise Initialization
arXiv 2025
Video-Skill-CoT: Skill-based Chain-of-Thoughts for Domain-Adaptive Video Reasoning
arXiv 2025
RotBench: Evaluating Multimodal Large Language Models on Identifying Image Rotation
arXiv 2025
One Life to Learn: Inferring Symbolic World Models for Stochastic Environments from Unguided Exploration
arXiv 2025
CAPTURe: Evaluating Spatial Reasoning in Vision Language Models via Occluded Object Counting
ICCV 2025
Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model
arXiv 2024
DataEnvGym: Data Generation Agents in Teacher Environments with Student Feedback
arXiv 2024
Diagnostic Benchmark and Iterative Inpainting for Layout-Guided Image Generation
arXiv 2023
Self-Chained Image-Language Model for Video Localization and Question Answering
Hierarchical Video-Moment Retrieval and Step-Captioning
CVPR 2023 1
Fine-grained Image Captioning with CLIP Reward
Findings (NAACL) 2022 7
LST: Ladder Side-Tuning for Parameter and Memory Efficient Transfer Learning
arXiv 2022
DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generation Models
ICCV 2023 1
TVLT: Textless Vision-Language Transformer
arXiv 2022
Unifying Vision-and-Language Tasks via Text Generation
arXiv 2021
VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks
CVPR 2022 1
VidLanKD: Improving Language Understanding via Video-Distilled Knowledge Transfer
NeurIPS 2021 12
Affiliations
Frequent co-authors
10 from 24 papers