Shilong Liu

Papers
26

Authored papers

26

MemEye: A Visual-Centric Evaluation Framework for Multimodal Agent Memory

arXiv 2026

2026

Alita: Generalist Agent Enabling Scalable Agentic Reasoning with Minimal Predefinition and Maximal Self-Evolution

arXiv 2025

2025

Web World Models

arXiv 2025

2025

A Survey of Self-Evolving Agents: On Path to Artificial Super Intelligence

arXiv 2025

2025

On Path to Multimodal Historical Reasoning: HistBench and HistAgent

arXiv 2025

2025

Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks

arXiv 2024

2024

DINO-X: A Unified Vision Model for Open-World Object Detection and Understanding

arXiv 2024

2024

Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection

arXiv 2024

2024

T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy

arXiv 2024

2024

CRAB: Cross-environment Agent Benchmark for Multimodal Language Model Agents

arXiv 2024

2024

MMedAgent: Learning to Use Medical Tools with Multi-modal Agent

arXiv 2024

2024

Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection

arXiv 2023

2023

detrex: Benchmarking Detection Transformers

arXiv 2023

2023

LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents

arXiv 2023

2023

A Simple Framework for Open-Vocabulary Segmentation and Detection

ICCV 2023 1

2023

Detection Transformer with Stable Matching

ICCV 2023 1

2023

Visual In-Context Prompting

CVPR 2024 1

2023

Recognize Anything: A Strong Image Tagging Model

arXiv 2023

2023

LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models

arXiv 2023

2023

Interfacing Foundation Models' Embeddings

arXiv 2023

2023

Neural Interactive Keypoint Detection

ICCV 2023 1

2023

InstructPix2NeRF: Instructed 3D Portrait Editing from a Single Image

arXiv 2023

2023

Semantic-SAM: Segment and Recognize Anything at Any Granularity

arXiv 2023

2023

DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection

2022

Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation

2022

DAB-DETR: Dynamic Anchor Boxes are Better Queries for DETR

2022

Affiliations

No known affiliations.

Frequent co-authors

10

from 26 papers