Hao Chen
Papers: 97
Authored papers
BitDance: Scaling Autoregressive Generative Models with Binary Tokens
arXiv 2026
Project Imaging-X: A Survey of 1000+ Open-Access Medical Imaging Datasets for Foundation Model Development
arXiv 2026
Exploring Spatial Intelligence from a Generative Perspective
arXiv 2026
Composition-RL: Compose Your Verifiable Prompts for Reinforcement Learning of Large Language Models
arXiv 2026
MARBLE: Multi-Aspect Reward Balance for Diffusion RL
arXiv 2026
$OneMillion-Bench: How Far are Language Agents from Human Experts?
arXiv 2026
Beyond Hard Masks: Progressive Token Evolution for Diffusion Language Models
arXiv 2026
Glance and Focus Reinforcement for Pan-cancer Screening
arXiv 2026
OmniJigsaw: Enhancing Omni-Modal Reasoning via Modality-Orchestrated Reordering
arXiv 2026
Progressive Residual Warmup for Language Model Pretraining
arXiv 2026
TorchUMM: A Unified Multimodal Model Codebase for Evaluation, Analysis, and Post-training
arXiv 2026
A Survey of Graph Retrieval-Augmented Generation for Customized Large Language Models
arXiv 2025
SeedVR2: One-Step Video Restoration via Diffusion Adversarial Post-Training
arXiv 2025
Masked Autoencoders Are Effective Tokenizers for Diffusion Models
arXiv 2025
Omni-R1: Reinforcement Learning for Omnimodal Reasoning via Two-System Collaboration
arXiv 2025
Robust Latent Matters: Boosting Image Generation with Sampling Error
arXiv 2025
Unleashing Hour-Scale Video Training for Long Video-Language Understanding
arXiv 2025
DICEPTION: A Generalist Diffusion Model for Visual Perceptual Tasks
arXiv 2025
SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories
arXiv 2025
Active-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO
arXiv 2025
UniBiomed: A Universal Foundation Model for Grounded Biomedical Image Interpretation
arXiv 2025
RynnVLA-002: A Unified Vision-Language-Action and World Model
arXiv 2025
The Image as Its Own Reward: Reinforcement Learning with Adversarial Reward for Image Generation
arXiv 2025
Time Is a Feature: Exploiting Temporal Dynamics in Diffusion Language Models
arXiv 2025
UniGame: Turning a Unified Multimodal Model Into Its Own Adversary
arXiv 2025
UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning
arXiv 2025
WorldVLA: Towards Autoregressive Action World Model
arXiv 2025
Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations
arXiv 2025
Beyond Correctness: Harmonizing Process and Outcome Rewards through RL Training
arXiv 2025
GPAS: Accelerating Convergence of LLM Pretraining via Gradient-Preserving Activation Scaling
arXiv 2025
TrustJudge: Inconsistencies of LLM-as-a-Judge and How to Alleviate Them
arXiv 2025
MedBookVQA: A Systematic and Comprehensive Medical Benchmark Derived from Open-Access Book
arXiv 2025
Double-Checker: Enhancing Reasoning of Slow-Thinking LLMs via Self-Critical Fine-Tuning
arXiv 2025
Tinker: Diffusion's Gift to 3D--Multi-View Consistent Editing From Sparse Inputs without Per-Scene Optimization
arXiv 2025
Revisiting End-to-End Learning with Slide-level Supervision in Computational Pathology
arXiv 2025
Image Tokenizer Needs Post-Training
arXiv 2025
POMATO: Marrying Pointmap Matching with Temporal Motion for Dynamic 3D Reconstruction
ICCV 2025
PIP-KAG: Mitigating Knowledge Conflicts in Knowledge-Augmented Generation via Parametric Pruning
arXiv 2025
ConceptCLIP: Towards Trustworthy Medical AI via Concept-Enhanced Contrastive Language-Image Pre-training
arXiv 2025
DIDS: Domain Impact-aware Data Sampling for Large Language Model Training
arXiv 2025
Adversarial Flow Models
arXiv 2025
Probing Scientific General Intelligence of LLMs with Scientist-Aligned Workflows
arXiv 2025
Open-CD: A Comprehensive Toolbox for Change Detection
arXiv 2024
Large-Scale 3D Medical Image Pre-training with Geometric Context Priors
arXiv 2024
MambaMIL: Enhancing Long Sequence Modeling with Sequence Reordering in Computational Pathology
arXiv 2024
AgentReview: Exploring Peer Review Dynamics with LLM Agents
arXiv 2024
MedIAnomaly: A comparative study of anomaly detection in medical images
arXiv 2024
ControlVAR: Exploring Controllable Visual Autoregressive Modeling
arXiv 2024
Towards A Generalizable Pathology Foundation Model via Unified Knowledge Distillation
arXiv 2024
What Matters When Repurposing Diffusion Models for General Dense Perception Tasks?
arXiv 2024
LARP: Tokenizing Videos with a Learned Autoregressive Generative Prior
Metric3Dv2: A Versatile Monocular Geometric Foundation Model for Zero-shot Metric Depth and Surface Normal Estimation
TableGPT2: A Large Multimodal Model with Tabular Data Integration
arXiv 2024
Modeling the Label Distributions for Weakly-Supervised Semantic Segmentation
arXiv 2024
FreeCustom: Tuning-Free Customized Image Generation for Multi-Concept Composition
CVPR 2024
Touchstone Benchmark: Are We on the Right Way for Evaluating AI Algorithms for Medical Segmentation?
arXiv 2024
GSCo: Towards Generalizable AI in Medicine via Generalist-Specialist Collaboration
arXiv 2024
Prototypical Information Bottlenecking and Disentangling for Multimodal Cancer Survival Prediction
arXiv 2024
GeoBench: Benchmarking and Analyzing Monocular Geometry Estimation Models
arXiv 2024
DiverGen: Improving Instance Segmentation by Learning Wider Data Distribution with More Diverse Generative Data
CVPR 2024
Unleashing the Potential of the Diffusion Model in Few-shot Semantic Segmentation
arXiv 2024
FreeCompose: Generic Zero-Shot Image Composition with Diffusion Prior
arXiv 2024
A Multimodal Knowledge-enhanced Whole-slide Pathology Foundation Model
arXiv 2024
Aligning Medical Images with General Knowledge from Large Language Models
arXiv 2024
GPT as Psychologist? Preliminary Evaluations for GPT-4V on Visual Affective Computing
arXiv 2024
WiseAD: Knowledge Augmented End-to-End Autonomous Driving with Vision-Language Model
arXiv 2024
R²-Bench: Benchmarking the Robustness of Referring Perception Models under Perturbations
arXiv 2024
Knowledge-to-SQL: Enhancing SQL Generation with Data Expert LLM
arXiv 2024
GameGen-X: Interactive Open-world Game Video Generation
arXiv 2024
VoCo: A Simple-yet-Effective Volume Contrastive Learning Framework for 3D Medical Image Analysis
CVPR 2024
RSPrompter: Learning to Prompt for Remote Sensing Instance Segmentation based on Visual Foundation Model
arXiv 2023
A Survey on Evaluation of Large Language Models
arXiv 2023
CTVIS: Consistent Training for Online Video Instance Segmentation
ICCV 2023
Time Travelling Pixels: Bitemporal Features Integration with Foundation Model for Remote Sensing Image Change Detection
arXiv 2023
Multimodal Optimal Transport-based Co-Attention Transformer with Global Structure Consistency for Survival Prediction
ICCV 2023
PromptMRG: Diagnosis-Driven Prompts for Medical Report Generation
arXiv 2023
HNeRV: A Hybrid Neural Representation for Videos
CVPR 2023
DoNet: Deep De-overlapping Network for Cytology Instance Segmentation
CVPR 2023
Matcher: Segment Anything with One Shot Using All-Purpose Feature Matching
arXiv 2023
PandaLM: An Automatic Evaluation Benchmark for LLM Instruction Tuning Optimization
arXiv 2023
Zero-Shot Video Editing Using Off-The-Shelf Image Diffusion Models
arXiv 2023
Advancing Transformer Architecture in Long-Context Large Language Models: A Comprehensive Survey
arXiv 2023
Diffusion Models for Imperceptible and Transferable Adversarial Attack
arXiv 2023
AutoStory: Generating Diverse Storytelling Images with Minimal Human Effort
arXiv 2023
SegPrompt: Boosting Open-world Segmentation via Category-level Prompt Learning
ICCV 2023
Object-aware Inversion and Reassembly for Image Editing
arXiv 2023
Cross-Modal Translation and Alignment for Survival Analysis
ICCV 2023
LoRAPrune: Structured Pruning Meets Low-Rank Parameter-Efficient Fine-Tuning
arXiv 2023
Multi-view Self-supervised Disentanglement for General Image Denoising
ICCV 2023
Train Once, Get a Family: State-Adaptive Balances for Offline-to-Online Reinforcement Learning
Understanding and Mitigating the Label Noise in Pre-training on Downstream Tasks
arXiv 2023
Better Zero-Shot Reasoning with Role-Play Prompting
arXiv 2023
PromptBench: A Unified Library for Evaluation of Large Language Models
arXiv 2023
USB: A Unified Semi-supervised Learning Benchmark for Classification
arXiv 2022
ABCNet: Real-time Scene Text Spotting with Adaptive Bezier-Curve Network
Practical No-box Adversarial Attacks against DNNs
FCOS: Fully Convolutional One-Stage Object Detection