
Weidi Xie


Authored papers (47)

OmniStream: Mastering Perception, Reconstruction and Action in Continuous Streams

arXiv 2026

2026

Real-World Point Tracking with Verifier-Guided Pseudo-Labeling

arXiv 2026

2026

SpotSound: Enhancing Large Audio-Language Models with Fine-Grained Temporal Grounding

arXiv 2026

2026

Multi-Agent System for Comprehensive Soccer Understanding

arXiv 2025

2025

ChestX-Reasoner: Advancing Radiology Foundation Models with Reasoning through Step-by-Step Verification

arXiv 2025

2025

SpatialScore: Towards Unified Evaluation for Multimodal Spatial Understanding

arXiv 2025

2025

Evolving Diagnostic Agents in a Virtual Clinical Environment

arXiv 2025

2025

EHR-R1: A Reasoning-Enhanced Foundational Language Model for Electronic Health Record Analysis

arXiv 2025

2025

SceneGen: Single-Image 3D Scene Generation in One Feedforward Pass

arXiv 2025

2025

Rethinking Whole-Body CT Image Interpretation: An Abnormality-Centric Approach

arXiv 2025

2025

End-to-End Agentic RAG System Training for Traceable Diagnostic Reasoning

arXiv 2025

2025

A Knowledge-enhanced Pathology Vision-language Foundation Model for Cancer Diagnosis

arXiv 2024

2024

MatchTime: Towards Automatic Soccer Game Commentary Generation

arXiv 2024

2024

RaTEScore: A Metric for Radiology Report Generation

arXiv 2024

2024

Towards Universal Soccer Video Understanding

CVPR 2025

2024

Moving Object Segmentation: All You Need Is SAM (and Flow)

arXiv 2024

2024

LamRA: Large Multimodal Model as Your Advanced Retrieval Assistant

CVPR 2025

2024

Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos

arXiv 2024

2024

A Sanity Check for AI-generated Image Detection

arXiv 2024

2024

Towards Evaluating and Building Versatile Large Language Models for Medicine

arXiv 2024

2024

MRGen: Diffusion-based Controllable Data Engine for MRI Segmentation towards Unannotated Modalities

ICCV 2025

2024

Towards Building Multilingual Language Model for Medicine

arXiv 2024

2024

VISA: Reasoning Video Object Segmentation via Large Language Models

arXiv 2024

2024

Towards Generalist Foundation Model for Radiology by Leveraging Web-scale 2D&3D Medical Data

arXiv 2023

2023

Intelligent Grimm -- Open-ended Visual Storytelling via Latent Diffusion Models

arXiv 2023

2023

PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering

arXiv 2023

2023

One Model to Rule them All: Towards Universal Segmentation for Medical Images with Text Prompts

arXiv 2023

2023

OvarNet: Towards Open-vocabulary Object Attribute Recognition

CVPR 2023

2023

arXiVeri: Automatic table verification with GPT

arXiv 2023

2023

PMC-LLaMA: Towards Building Open-source Language Models for Medicine

arXiv 2023

2023

Towards Open-Vocabulary Video Instance Segmentation

ICCV 2023

2023

PMC-CLIP: Contrastive Language-Image Pre-training using Biomedical Documents

arXiv 2023

2023

AutoAD: Movie Description in Context

CVPR 2023

2023

Grounded Question-Answering in Long Egocentric Videos

CVPR 2024

2023

Zero-shot Composed Text-Image Retrieval

arXiv 2023

2023

Joint-Relation Transformer for Multi-Person Motion Prediction

ICCV 2023

2023

Boost Video Frame Interpolation via Motion Adaptation

arXiv 2023

2023

ReCo: Retrieve and Co-segment for Zero-shot Transfer


2022

PromptDet: Towards Open-vocabulary Detection using Uncurated Images

arXiv 2022

2022

CounTR: Transformer-based Generalised Visual Counting

arXiv 2022

2022

K-Space Transformer for Undersampled MRI Reconstruction

arXiv 2022

2022

Open-vocabulary Semantic Segmentation with Frozen Vision-Language Models

arXiv 2022

2022

Label, Verify, Correct: A Simple Few Shot Object Detection Method

CVPR 2022

2021

Prompting Visual-Language Models for Efficient Video Understanding

arXiv 2021

2021

All you need are a few pixels: semantic segmentation with PixelPick

arXiv 2021

2021

Self-supervised Co-training for Video Representation Learning

NeurIPS 2020

2020

VGGSound: A Large-scale Audio-Visual Dataset

arXiv 2020

2020

Affiliations

No known affiliations.

Frequent co-authors: 10 (from 47 papers)