Qibin Hou

Papers
32

Attribution: affiliations and profile from Semantic Scholar

Authored papers

32

InternAgent-1.5: A Unified Agentic Framework for Long-Horizon Autonomous Scientific Discovery

arXiv 2026

2026

See What I Mean: Aligning Vision and Language Representations for Video Fine-grained Object Understanding

arXiv 2026

2026

Mixture of Style Experts for Diverse Image Stylization

arXiv 2026

2026

Towards Universal Video MLLMs with Attribute-Structured and Quality-Verified Instructions

arXiv 2026

2026

Mutual Forcing: Dual-Mode Self-Evolution for Fast Autoregressive Audio-Video Character Generation

arXiv 2026

2026

GeoAgent: Learning to Geolocate Everywhere with Reinforced Geographic Characteristics

arXiv 2026

2026

Rethinking Token-Level Policy Optimization for Multimodal Chain-of-Thought

arXiv 2026

2026

DFormerv2: Geometry Self-Attention for RGBD Semantic Segmentation

CVPR 2025 1

2025

K-LoRA: Unlocking Training-Free Fusion of Any Subject and Style LoRAs

CVPR 2025 1

2025

Enhancing Visual Grounding for GUI Agents via Self-Evolutionary Reinforcement Learning

arXiv 2025

2025

The Consistency Critic: Correcting Inconsistencies in Generated Images via Reference-Guided Attentive Alignment

arXiv 2025

2025

Depth Anything at Any Condition

arXiv 2025

2025

A Glimpse to Compress: Dynamic Visual Token Pruning for Large Vision-Language Models

arXiv 2025

2025

LLaVA-Scissor: Token Compression with Semantic Connected Components for Video LLMs

arXiv 2025

2025

TempSamp-R1: Effective Temporal Sampling with Reinforcement Fine-Tuning for Video LLMs

arXiv 2025

2025

LLaVA-Octopus: Unlocking Instruction-Driven Adaptive Projector Fusion for Video Understanding

arXiv 2025

2025

StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation

arXiv 2024

2024

DenseVLM: A Retrieval and Decoupled Alignment Framework for Open-Vocabulary Dense Prediction

ICCV 2025

2024

Get What You Want, Not What You Don't: Image Content Suppression for Text-to-Image Diffusion Models

arXiv 2024

2024

Cascade-CLIP: Cascaded Vision-Language Embeddings Alignment for Zero-Shot Semantic Segmentation

arXiv 2024

2024

SARDet-100K: Towards Open-Source Benchmark and ToolKit for Large-Scale SAR Object Detection

arXiv 2024

2024

YOLO-MS: Rethinking Multi-Scale Representation Learning for Real-time Object Detection

arXiv 2023

2023

StyleDiffusion: Prompt-Embedding Inversion for Text-Based Editing

arXiv 2023

2023

Large Selective Kernel Network for Remote Sensing Object Detection

ICCV 2023 1

2023

ChatAnything: Facetime Chat with LLM-Enhanced Personas

arXiv 2023

2023

SRFormerV2: Taking a Closer Look at Permuted Self-Attention for Image Super-Resolution

ICCV 2023 1

2023

AMT: All-Pairs Multi-Field Transforms for Efficient Frame Interpolation

CVPR 2023 1

2023

MCANet: Medical Image Segmentation with Multi-Scale Cross-Axis Attention

arXiv 2023

2023

CrossKD: Cross-Head Knowledge Distillation for Object Detection

CVPR 2024 1

2023

CorrMatch: Label Propagation via Correlation Matching for Semi-Supervised Semantic Segmentation

CVPR 2024 1

2023

Referring Camouflaged Object Detection

arXiv 2023

2023

Rotate to Attend: Convolutional Triplet Attention Module

arXiv 2020

2020

Affiliations

No known affiliations.

Frequent co-authors

10

from 32 papers