Jaemin Cho

Papers
24

Authored papers

24

MolmoAct2: Action Reasoning Models for Real-world Deployment

arXiv 2026

2026

EPiC: Efficient Video Camera Control Learning with Precise Anchor-Video Guidance

arXiv 2025

2026

WildDet3D: Scaling Promptable 3D Detection in the Wild

arXiv 2026

2026

VFIG: Vectorizing Complex Figures in SVG with Vision-Language Models

arXiv 2026

2026

PhyMotion: Structured 3D Motion Reward for Physics-Grounded Human Video Generation

arXiv 2026

2026

V-Co: A Closer Look at Visual Representation Alignment via Co-Denoising

arXiv 2026

2026

AnchorWeave: World-Consistent Video Generation with Retrieved Local Spatial Memories

arXiv 2026

2026

Training-free Guidance in Text-to-Video Generation via Multimodal Planning and Structured Noise Initialization

arXiv 2025

2025

Video-Skill-CoT: Skill-based Chain-of-Thoughts for Domain-Adaptive Video Reasoning

arXiv 2025

2025

RotBench: Evaluating Multimodal Large Language Models on Identifying Image Rotation

arXiv 2025

2025

One Life to Learn: Inferring Symbolic World Models for Stochastic Environments from Unguided Exploration

arXiv 2025

2025

CAPTURe: Evaluating Spatial Reasoning in Vision Language Models via Occluded Object Counting

ICCV 2025

2025

Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model

arXiv 2024

2024

DataEnvGym: Data Generation Agents in Teacher Environments with Student Feedback

arXiv 2024

2024

Diagnostic Benchmark and Iterative Inpainting for Layout-Guided Image Generation

arXiv 2023

2023

Self-Chained Image-Language Model for Video Localization and Question Answering

2023

Hierarchical Video-Moment Retrieval and Step-Captioning

CVPR 2023

2023

Fine-grained Image Captioning with CLIP Reward

Findings of NAACL 2022

2022

LST: Ladder Side-Tuning for Parameter and Memory Efficient Transfer Learning

arXiv 2022

2022

DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generation Models

ICCV 2023

2022

TVLT: Textless Vision-Language Transformer

arXiv 2022

2022

Unifying Vision-and-Language Tasks via Text Generation

arXiv 2021

2021

VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks

CVPR 2022

2021

VidLanKD: Improving Language Understanding via Video-Distilled Knowledge Transfer

NeurIPS 2021

2021

Affiliations

No known affiliations.

Frequent co-authors

10

from 24 papers