Qibin Hou
- Papers: 32
Authored papers (32)
InternAgent-1.5: A Unified Agentic Framework for Long-Horizon Autonomous Scientific Discovery
arXiv 2026
See What I Mean: Aligning Vision and Language Representations for Video Fine-grained Object Understanding
arXiv 2026
Mixture of Style Experts for Diverse Image Stylization
arXiv 2026
Towards Universal Video MLLMs with Attribute-Structured and Quality-Verified Instructions
arXiv 2026
Mutual Forcing: Dual-Mode Self-Evolution for Fast Autoregressive Audio-Video Character Generation
arXiv 2026
GeoAgent: Learning to Geolocate Everywhere with Reinforced Geographic Characteristics
arXiv 2026
Rethinking Token-Level Policy Optimization for Multimodal Chain-of-Thought
arXiv 2026
DFormerv2: Geometry Self-Attention for RGBD Semantic Segmentation
CVPR 2025 1
K-LoRA: Unlocking Training-Free Fusion of Any Subject and Style LoRAs
CVPR 2025 1
Enhancing Visual Grounding for GUI Agents via Self-Evolutionary Reinforcement Learning
arXiv 2025
The Consistency Critic: Correcting Inconsistencies in Generated Images via Reference-Guided Attentive Alignment
arXiv 2025
Depth Anything at Any Condition
arXiv 2025
A Glimpse to Compress: Dynamic Visual Token Pruning for Large Vision-Language Models
arXiv 2025
LLaVA-Scissor: Token Compression with Semantic Connected Components for Video LLMs
arXiv 2025
TempSamp-R1: Effective Temporal Sampling with Reinforcement Fine-Tuning for Video LLMs
arXiv 2025
LLaVA-Octopus: Unlocking Instruction-Driven Adaptive Projector Fusion for Video Understanding
arXiv 2025
StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation
arXiv 2024
DenseVLM: A Retrieval and Decoupled Alignment Framework for Open-Vocabulary Dense Prediction
ICCV 2025
Get What You Want, Not What You Don't: Image Content Suppression for Text-to-Image Diffusion Models
arXiv 2024
Cascade-CLIP: Cascaded Vision-Language Embeddings Alignment for Zero-Shot Semantic Segmentation
arXiv 2024
SARDet-100K: Towards Open-Source Benchmark and ToolKit for Large-Scale SAR Object Detection
arXiv 2024
YOLO-MS: Rethinking Multi-Scale Representation Learning for Real-time Object Detection
arXiv 2023
StyleDiffusion: Prompt-Embedding Inversion for Text-Based Editing
arXiv 2023
Large Selective Kernel Network for Remote Sensing Object Detection
ICCV 2023 1
ChatAnything: Facetime Chat with LLM-Enhanced Personas
arXiv 2023
SRFormerV2: Taking a Closer Look at Permuted Self-Attention for Image Super-Resolution
ICCV 2023 1
AMT: All-Pairs Multi-Field Transforms for Efficient Frame Interpolation
CVPR 2023 1
MCANet: Medical Image Segmentation with Multi-Scale Cross-Axis Attention
arXiv 2023
CrossKD: Cross-Head Knowledge Distillation for Object Detection
CVPR 2024 1
CorrMatch: Label Propagation via Correlation Matching for Semi-Supervised Semantic Segmentation
CVPR 2024 1
Referring Camouflaged Object Detection
arXiv 2023
Rotate to Attend: Convolutional Triplet Attention Module
arXiv 2020
Frequent co-authors
10 from 32 papers