Wenqi Shao

Papers
36

Attribution

Affiliations & profile
Semantic Scholar

Authored papers

36

AgentDoG: A Diagnostic Guardrail Framework for AI Agent Safety and Security

arXiv 2026

2026

Project Imaging-X: A Survey of 1000+ Open-Access Medical Imaging Datasets for Foundation Model Development

arXiv 2026

2026

RetroAgent: From Solving to Evolving via Retrospective Dual Intrinsic Feedback

arXiv 2026

2026

InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

arXiv 2025

2025

CPGD: Toward Stable Rule-based Reinforcement Learning for Language Models

arXiv 2025

2025

Enhance-A-Video: Better Generated Video for Free

arXiv 2025

2025

InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency

arXiv 2025

2025

MM-ACT: Learn from Multimodal Parallel Generation to Act

arXiv 2025

2025

OWMM-Agent: Open World Mobile Manipulation With Multi-modal Agentic Data Synthesis

arXiv 2025

2025

UniFork: Exploring Modality Alignment for Unified Multimodal Understanding and Generation

arXiv 2025

2025

MM-PRM: Enhancing Multimodal Mathematical Reasoning with Scalable Step-Level Supervision

arXiv 2025

2025

GMAI-VL-R1: Harnessing Reinforcement Learning for Multimodal Medical Reasoning

arXiv 2025

2025

MDK12-Bench: A Multi-Discipline Benchmark for Evaluating Reasoning in Multimodal Large Language Models

2025

Needle In A Multimodal Haystack

arXiv 2024

2024

EfficientQAT: Efficient Quantization-Aware Training for Large Language Models

arXiv 2024

2024

SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models

arXiv 2024

2024

HiAgent: Hierarchical Working Memory Management for Solving Long-Horizon Agent Tasks with Large Language Model

arXiv 2024

2024

OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation

CVPR 2025 1

2024

OmniMedVQA: A New Large-Scale Comprehensive Evaluation Benchmark for Medical LVLM

CVPR 2024 1

2024

PrefixQuant: Eliminating Outliers by Prefixed Tokens for Large Language Models Quantization

arXiv 2024

2024

Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation

arXiv 2024

2024

Rethinking Human Evaluation Protocol for Text-to-Video Models: Enhancing Reliability, Reproducibility, and Practicality

arXiv 2024

2024

HRVMamba: High-Resolution Visual State Space Model for Dense Prediction

arXiv 2024

2024

ChartAssisstant: A Universal Chart Multimodal Language Model via Chart-to-Table Pre-training and Multitask Instruction Tuning

arXiv 2024

2024

Diffree: Text-Guided Shape Free Object Inpainting with Diffusion Model

arXiv 2024

2024

Adapting LLaMA Decoder to Vision Transformer

arXiv 2024

2024

Prompt-A-Video: Prompt Your Video Diffusion Model via Preference-Aligned LLM

arXiv 2024

2024

DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model

CVPR 2024 1

2024

BESA: Pruning Large Language Models with Blockwise Parameter-Efficient Sparsity Allocation

arXiv 2024

2024

Dynamic Multimodal Evaluation with Flexible Complexity by Vision-Language Bootstrapping

arXiv 2024

2024

ImageBind-LLM: Multi-modality Instruction Tuning

arXiv 2023

2023

GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest

arXiv 2023

2023

DiffRate: Differentiable Compression Rate for Efficient Vision Transformers

ICCV 2023 1

2023

OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models

arXiv 2023

2023

MLLMs-Augmented Visual-Language Representation Learning

arXiv 2023

2023

Beyond One-to-One: Rethinking the Referring Image Segmentation

ICCV 2023 1

2023

Affiliations

No known affiliations.

Frequent co-authors

10

from 36 papers