Peng Wang
- Papers: 47
Authored papers (47)
AgentDoG: A Diagnostic Guardrail Framework for AI Agent Safety and Security
arXiv 2026
CodePercept: Code-Grounded Visual STEM Perception for MLLMs
arXiv 2026
From Narrow to Panoramic Vision: Attention-Guided Cold-Start Reshapes Multimodal Reasoning
arXiv 2026
HLE-Verified: A Systematic Verification and Structured Revision of Humanity's Last Exam
arXiv 2026
Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability
arXiv 2026
Qwen-Image Technical Report
arXiv 2025
Qwen3-Omni Technical Report
arXiv 2025
Qwen2.5-VL Technical Report
arXiv 2025
Qwen3 Technical Report
preprint
Qwen3-VL Technical Report
arXiv 2025
MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining
arXiv 2025
Towards Reasoning Era: A Survey of Long Chain-of-Thought for Reasoning Large Language Models
arXiv 2025
MiMo-VL Technical Report
arXiv 2025
A Multi-Dimensional Constraint Framework for Evaluating and Improving Instruction Following in Large Language Models
arXiv 2025
ByteMorph: Benchmarking Instruction-Guided Image Editing with Non-Rigid Motions
arXiv 2025
GP-GS: Gaussian Processes for Enhanced Gaussian Splatting
arXiv 2025
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution
arXiv 2024
Qwen2 Technical Report
arXiv 2024
Building a Family of Data Augmentation Models for Low-cost LLM Fine-tuning on the Cloud
arXiv 2024
Pruning All-Rounder: Rethinking and Improving Inference Efficiency for Large Vision Language Models
arXiv 2024
Diffusion Models Learn Low-Dimensional Distributions via Subspace Clustering
arXiv 2024
LiveIdeaBench: Evaluating LLMs' Scientific Creativity and Idea Generation with Minimal Context
arXiv 2024
TL-Training: A Task-Feature-Based Framework for Training Large Language Models in Tool Use
arXiv 2024
VSFormer: Mining Correlations in Flexible View Set for Multi-view 3D Shape Understanding
arXiv 2024
A Plug-and-Play Method for Rare Human-Object Interactions Detection by Bridging Domain Gap
arXiv 2024
Diffusion Models as Optimizers for Efficient Planning in Offline RL
arXiv 2024
BAD-Gaussians: Bundle Adjusted Deblur Gaussian Splatting
arXiv 2024
GraphTranslator: Aligning Graph Model to Large Language Model for Open-ended Tasks
arXiv 2024
HQ-Edit: A High-Quality Dataset for Instruction-based Image Editing
arXiv 2024
Autoregressive Pretraining with Mamba in Vision
arXiv 2024
Qwen Technical Report
arXiv 2023
Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond
arXiv 2023
AerialVLN: Vision-and-Language Navigation for UAVs
ICCV 2023 1
F$^{2}$-NeRF: Fast Neural Radiance Field Training with Free Camera Trajectories
arXiv 2023
MVDream: Multi-view Diffusion for 3D Generation
arXiv 2023
MVDiffusion: Enabling Holistic Multi-view Image Generation with Correspondence-Aware Diffusion
AirBirds: A Large-scale Challenging Dataset for Bird Strike Prevention in Real-world Airports
arXiv 2023
PERF: Panoramic Neural Radiance Field from a Single Panorama
arXiv 2023
TouchStone: Evaluating Vision-Language Models by Language Models
arXiv 2023
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
arXiv 2023
Multi-Granularity Prediction for Scene Text Recognition
arXiv 2022
Transferring General Multimodal Pretrained Models to Text Recognition
arXiv 2022
BAD-NeRF: Bundle Adjusted Deblur Neural Radiance Fields
CVPR 2023 1
Dual Modality Prompt Tuning for Vision-Language Pre-Trained Model
arXiv 2022
NeuS: Learning Neural Implicit Surfaces by Volume Rendering for Multi-view Reconstruction
NeurIPS 2021 12
StyleNeRF: A Style-based 3D-Aware Generator for High-resolution Image Synthesis
WikiAsp: A Dataset for Multi-domain Aspect-based Summarization
arXiv 2020