Fan Zhang
- Papers (25)
Authored papers (25)
SAM3-LiteText: An Anatomical Study of the SAM3 Text Encoder for Efficient Vision-Language Segmentation
arXiv 2026
UDM-GRPO: Stable and Efficient Group Relative Policy Optimization for Uniform Discrete Diffusion Models
arXiv 2026
Maestro: Reinforcement Learning to Orchestrate Hierarchical Model-Skill Ensembles
arXiv 2026
Ebisu: Benchmarking Large Language Models in Japanese Finance
arXiv 2026
Large Language Model Agent: A Survey on Methodology, Applications and Challenges
arXiv 2025
GeoPix: Multi-Modal Large Language Model for Pixel-level Image Understanding in Remote Sensing
arXiv 2025
Uniform Discrete Diffusion with Metric Path for Video Generation
arXiv 2025
Emu3.5: Native Multimodal Models are World Learners
arXiv 2025
ShotBench: Expert-Level Cinematic Understanding in Vision-Language Models
arXiv 2025
GPAS: Accelerating Convergence of LLM Pretraining via Gradient-Preserving Activation Scaling
arXiv 2025
Double-Checker: Enhancing Reasoning of Slow-Thinking LLMs via Self-Critical Fine-Tuning
arXiv 2025
VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness
arXiv 2025
Compressing Chain-of-Thought in LLMs via Step Entropy
arXiv 2025
A Comprehensive Survey in LLM(-Agent) Full Stack Safety: Data, Training and Deployment
arXiv 2025
Emu3: Next-Token Prediction is All You Need
arXiv 2024
EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
arXiv 2024
Affordance-based Robot Manipulation with Flow Matching
arXiv 2024
Diffusion Feedback Helps CLIP See Better
arXiv 2024
DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception
arXiv 2024
Evaluation Agent: Efficient and Promptable Evaluation Framework for Visual Generative Models
arXiv 2024
Generative Multimodal Models are In-Context Learners
CVPR 2024
CapsFusion: Rethinking Image-Text Data at Scale
CVPR 2024
MDCS: More Diverse Experts with Consistency Self-distillation for Long-tailed Recognition
ICCV 2023
MixPro: Data Augmentation with MaskMix and Progressive Attention Labeling for Vision Transformer
arXiv 2023
MediaPipe: A Framework for Building Perception Pipelines
arXiv 2019
Frequent co-authors
10 from 25 papers