Yujie Zhong

HyperSeg: Towards Universal Visual Segmentation with Large Language Model

arXiv 2024

DriveMM: All-in-One Large Multimodal Model for Autonomous Driving

arXiv 2024

LinVT: Empower Your Image-level Large Language Model to Understand Videos

arXiv 2024

UniMD: Towards Unifying Moment Retrieval and Temporal Action Detection

arXiv 2024

Intelligent Grimm -- Open-ended Visual Storytelling via Latent Diffusion Models

arXiv 2023

TriDet: Temporal Action Detection with Relative Boundary Modeling

CVPR 2023

SoccerNet 2023 Challenges Results

arXiv 2023

Open-Vocabulary Semantic Segmentation with Decoupled One-Pass Network

ICCV 2023

SoccerNet 2022 Challenges Results

arXiv 2022

PromptDet: Towards Open-vocabulary Detection using Uncurated Images

arXiv 2022

CounTR: Transformer-based Generalised Visual Counting

arXiv 2022

ReAct: Temporal Action Detection with Relational Queries

arXiv 2022