Jinfa Huang
- Papers: 17
Authored papers
SocialOmni: Benchmarking Audio-Visual Social Interactivity in Omni Models
arXiv 2026
SpecEyes: Accelerating Agentic Multimodal LLMs via Speculative Perception and Planning
arXiv 2026
OpenS2V-Nexus: A Detailed Benchmark and Million-Scale Dataset for Subject-to-Video Generation
arXiv 2025
QuoTA: Query-oriented Token Assignment via CoT Query Decouple for Long Video Comprehension
arXiv 2025
A Survey on Latent Reasoning
arXiv 2025
MoE-LLaVA: Mixture of Experts for Large Vision-Language Models
arXiv 2024
Autoregressive Models in Vision: A Survey
arXiv 2024
Identity-Preserving Text-to-Video Generation by Frequency Decomposition
CVPR 2025
Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension
arXiv 2024
MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators
arXiv 2024
Continuous-Multiple Image Outpainting in One-Step via Positional Query and A Diffusion-based Approach
arXiv 2024
LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context Inference
arXiv 2024
ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of Text-to-Time-lapse Video Generation
arXiv 2024
A Survey of Large Language Models in Medicine: Progress, Application, and Challenge
arXiv 2023
Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning
CVPR 2023
Text-Video Retrieval with Disentangled Conceptualization and Set-to-Set Alignment
arXiv 2023
GPT-4V(ision) as A Social Media Analysis Engine
arXiv 2023
Frequent co-authors: 10 (from 17 papers)