Wenwei Zhang
- Papers: 33
Authored papers
Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data, and Metric Perspectives
ICCV 2025
InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model
arXiv 2025
Mask-DPO: Generalizable Fine-grained Factuality Alignment of LLMs
arXiv 2025
CompassVerifier: A Unified and Robust Verifier for LLMs Evaluation and Outcome Reward
arXiv 2025
MM-HELIX: Boosting Multimodal Long-Chain Reflective Reasoning with Holistic Platform and Adaptive Hybrid Policy Optimization
arXiv 2025
Pre-Trained Policy Discriminators are General Reward Models
arXiv 2025
Rethinking Verification for LLM Code Generation: From Generation to Testing
arXiv 2025
Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning
arXiv 2025
InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency
arXiv 2025
Harmonizing Visual Representations for Unified Multimodal Understanding and Generation
ICCV 2025
MindSearch: Mimicking Human Minds Elicits Deep AI Searcher
arXiv 2024
Are Your LLMs Capable of Stable Reasoning?
arXiv 2024
OMG-Seg: Is One Model Good Enough For All Segmentation?
CVPR 2024
F-LMM: Grounding Frozen Large Multimodal Models
CVPR 2025
Can AI Assistants Know What They Don't Know?
arXiv 2024
CriticEval: Evaluating Large Language Model as Critic
arXiv 2024
4D Contrastive Superflows are Dense 3D Representation Learners
arXiv 2024
AlchemistCoder: Harmonizing and Eliciting Code Capability by Hindsight Tuning on Multi-source Data
arXiv 2024
InternLM-Law: An Open Source Chinese Legal Large Language Model
arXiv 2024
CIBench: Evaluating Your LLMs with a Code Interpreter Plugin
arXiv 2024
Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models
arXiv 2024
Unified Human-Scene Interaction via Prompted Chain-of-Contacts
arXiv 2023
GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
arXiv 2023
MultiModal-GPT: A Vision and Language Model for Dialogue with Humans
arXiv 2023
Evaluating Hallucinations in Chinese Large Language Models
arXiv 2023
MV-JAR: Masked Voxel Jigsaw and Reconstruction for LiDAR-Based Self-Supervised Pre-Training
CVPR 2023
CLIM: Contrastive Language-Image Mosaic for Region Representation
arXiv 2023
Fake Alignment: Are LLMs Really Aligned Well?
arXiv 2023
Segment Any Point Cloud Sequences by Distilling Vision Foundation Models
NeurIPS 2023
OV-PARTS: Towards Open-Vocabulary Part Segmentation
NeurIPS 2023
T-Eval: Evaluating the Tool Utilization Capability of Large Language Models Step by Step
arXiv 2023
CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction
arXiv 2023
RTMDet: An Empirical Study of Designing Real-Time Object Detectors
arXiv 2022
Frequent co-authors
10 from 33 papers