Fan Zhang

Papers
25

Authored papers

25

SAM3-LiteText: An Anatomical Study of the SAM3 Text Encoder for Efficient Vision-Language Segmentation

arXiv 2026

2026

UDM-GRPO: Stable and Efficient Group Relative Policy Optimization for Uniform Discrete Diffusion Models

arXiv 2026

2026

Maestro: Reinforcement Learning to Orchestrate Hierarchical Model-Skill Ensembles

arXiv 2026

2026

Ebisu: Benchmarking Large Language Models in Japanese Finance

arXiv 2026

2026

Large Language Model Agent: A Survey on Methodology, Applications and Challenges

arXiv 2025

2025

GeoPix: Multi-Modal Large Language Model for Pixel-level Image Understanding in Remote Sensing

arXiv 2025

2025

Uniform Discrete Diffusion with Metric Path for Video Generation

arXiv 2025

2025

Emu3.5: Native Multimodal Models are World Learners

arXiv 2025

2025

ShotBench: Expert-Level Cinematic Understanding in Vision-Language Models

arXiv 2025

2025

GPAS: Accelerating Convergence of LLM Pretraining via Gradient-Preserving Activation Scaling

arXiv 2025

2025

Double-Checker: Enhancing Reasoning of Slow-Thinking LLMs via Self-Critical Fine-Tuning

arXiv 2025

2025

VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness

arXiv 2025

2025

Compressing Chain-of-Thought in LLMs via Step Entropy

arXiv 2025

2025

A Comprehensive Survey in LLM(-Agent) Full Stack Safety: Data, Training and Deployment

arXiv 2025

2025

Emu3: Next-Token Prediction is All You Need

arXiv 2024

2024

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

arXiv 2024

2024

Affordance-based Robot Manipulation with Flow Matching

arXiv 2024

2024

Diffusion Feedback Helps CLIP See Better

arXiv 2024

2024

DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception

arXiv 2024

2024

Evaluation Agent: Efficient and Promptable Evaluation Framework for Visual Generative Models

arXiv 2024

2024

Generative Multimodal Models are In-Context Learners

CVPR 2024 1

2023

CapsFusion: Rethinking Image-Text Data at Scale

CVPR 2024 1

2023

MDCS: More Diverse Experts with Consistency Self-distillation for Long-tailed Recognition

ICCV 2023 1

2023

MixPro: Data Augmentation with MaskMix and Progressive Attention Labeling for Vision Transformer

arXiv 2023

2023

MediaPipe: A Framework for Building Perception Pipelines

arXiv 2019

2019

Affiliations

No known affiliations.

Frequent co-authors

10

from 25 papers