Ji Zhang
- Papers: 30
Authored papers
Share More, Search Less: Collaborative Parallel Thinking for Efficient Test-Time Scaling
arXiv 2026
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning
arXiv 2025
WritingBench: A Comprehensive Benchmark for Generative Writing
arXiv 2025
SORT3D: Spatial Object-centric Reasoning Toolbox for Zero-Shot 3D Grounding Using Large Language Models
arXiv 2025
MUSEG: Reinforcing Video Temporal Understanding via Timestamp-Aware Multi-Segment Grounding
arXiv 2025
IRef-VLA: A Benchmark for Interactive Referential Grounding with Imperfect Language in 3D Scenes
arXiv 2025
AdaMMS: Model Merging for Heterogeneous Multimodal Large Language Models with Unsupervised Coefficient Optimization
CVPR 2025
Scaling External Knowledge Input Beyond Context Windows of LLMs via Multi-Agent Collaboration
arXiv 2025
A Survey on Efficient Vision-Language-Action Models
arXiv 2025
Enhancing Language Multi-Agent Learning with Multi-Agent Credit Re-Assignment for Interactive Environment Generalization
arXiv 2025
Skip Tuning: Pre-trained Vision-Language Models are Effective and Efficient Adapters Themselves
CVPR 2025
Small LLMs Are Weak Tool Learners: A Multi-LLM Agent
arXiv 2024
Model Composition for Multimodal Large Language Models
arXiv 2024
From Skepticism to Acceptance: Simulating the Attitude Dynamics Toward Fake News
arXiv 2024
SymDPO: Boosting In-Context Learning of Large Multimodal Models with Symbol Demonstration Direct Preference Optimization
CVPR 2025
Browse and Concentrate: Comprehending Multimodal Content via prior-LLM Context Fusion
arXiv 2024
DISC: Plug-and-Play Decoding Intervention with Similarity of Characters for Chinese Spelling Check
arXiv 2024
mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models
arXiv 2024
mPLUG-DocOwl2: High-resolution Compressing for OCR-free Multi-page Document Understanding
arXiv 2024
AMBER: An LLM-free Multi-dimensional Benchmark for MLLMs Hallucination Evaluation
arXiv 2023
mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video
arXiv 2023
TiMix: Text-aware Image Mixing for Effective Vision-Language Pre-training
arXiv 2023
Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Dataset for Pre-training and Benchmarks
arXiv 2023
UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model
arXiv 2023
A Closer Look at Few-shot Classification Again
arXiv 2023
Evaluation and Analysis of Hallucination in Large Vision-Language Models
arXiv 2023
DETA: Denoised Task Adaptation for Few-Shot Learning
ICCV 2023
CValues: Measuring the Values of Chinese Large Language Models from Safety to Responsibility
arXiv 2023
mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration
CVPR 2024
mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections
arXiv 2022
Frequent co-authors
10 from 30 papers