Wenqi Shao
Authored papers (36)
AgentDoG: A Diagnostic Guardrail Framework for AI Agent Safety and Security
arXiv 2026
Project Imaging-X: A Survey of 1000+ Open-Access Medical Imaging Datasets for Foundation Model Development
arXiv 2026
RetroAgent: From Solving to Evolving via Retrospective Dual Intrinsic Feedback
arXiv 2026
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
arXiv 2025
CPGD: Toward Stable Rule-based Reinforcement Learning for Language Models
arXiv 2025
Enhance-A-Video: Better Generated Video for Free
arXiv 2025
InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency
arXiv 2025
MM-ACT: Learn from Multimodal Parallel Generation to Act
arXiv 2025
OWMM-Agent: Open World Mobile Manipulation With Multi-modal Agentic Data Synthesis
arXiv 2025
UniFork: Exploring Modality Alignment for Unified Multimodal Understanding and Generation
arXiv 2025
MM-PRM: Enhancing Multimodal Mathematical Reasoning with Scalable Step-Level Supervision
arXiv 2025
GMAI-VL-R1: Harnessing Reinforcement Learning for Multimodal Medical Reasoning
arXiv 2025
MDK12-Bench: A Multi-Discipline Benchmark for Evaluating Reasoning in Multimodal Large Language Models
Needle In A Multimodal Haystack
arXiv 2024
EfficientQAT: Efficient Quantization-Aware Training for Large Language Models
arXiv 2024
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models
arXiv 2024
HiAgent: Hierarchical Working Memory Management for Solving Long-Horizon Agent Tasks with Large Language Model
arXiv 2024
OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation
CVPR 2025
OmniMedVQA: A New Large-Scale Comprehensive Evaluation Benchmark for Medical LVLM
CVPR 2024
PrefixQuant: Eliminating Outliers by Prefixed Tokens for Large Language Models Quantization
arXiv 2024
Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation
arXiv 2024
Rethinking Human Evaluation Protocol for Text-to-Video Models: Enhancing Reliability, Reproducibility, and Practicality
arXiv 2024
HRVMamba: High-Resolution Visual State Space Model for Dense Prediction
arXiv 2024
ChartAssisstant: A Universal Chart Multimodal Language Model via Chart-to-Table Pre-training and Multitask Instruction Tuning
arXiv 2024
Diffree: Text-Guided Shape Free Object Inpainting with Diffusion Model
arXiv 2024
Adapting LLaMA Decoder to Vision Transformer
arXiv 2024
Prompt-A-Video: Prompt Your Video Diffusion Model via Preference-Aligned LLM
arXiv 2024
DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model
CVPR 2024
BESA: Pruning Large Language Models with Blockwise Parameter-Efficient Sparsity Allocation
arXiv 2024
Dynamic Multimodal Evaluation with Flexible Complexity by Vision-Language Bootstrapping
arXiv 2024
ImageBind-LLM: Multi-modality Instruction Tuning
arXiv 2023
GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
arXiv 2023
DiffRate: Differentiable Compression Rate for Efficient Vision Transformers
ICCV 2023
OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models
arXiv 2023
MLLMs-Augmented Visual-Language Representation Learning
arXiv 2023
Beyond One-to-One: Rethinking the Referring Image Segmentation
ICCV 2023