Conghui He
Papers: 91
Authored papers
DataFlex: A Unified Framework for Data-Centric Dynamic Training of Large Language Models
arXiv 2026
NanoResearch: Co-Evolving Skills, Memory, and Policy for Personalized Research Automation
arXiv 2026
MinerU-Diffusion: Rethinking Document OCR as Inverse Rendering via Diffusion Decoding
arXiv 2026
Can LLMs Clean Up Your Mess? A Survey of Application-Ready Data Preparation with LLMs
arXiv 2026
InternAgent-1.5: A Unified Agentic Framework for Long-Horizon Autonomous Scientific Discovery
arXiv 2026
PaperFit: Vision-in-the-Loop Typesetting Optimization for Scientific Documents
arXiv 2026
ChartVerse: Scaling Chart Reasoning via Reliable Programmatic Synthesis from Scratch
arXiv 2026
Project Imaging-X: A Survey of 1000+ Open-Access Medical Imaging Datasets for Foundation Model Development
arXiv 2026
CiteVQA: Benchmarking Evidence Attribution for Trustworthy Document Intelligence
arXiv 2026
Programming with Data: Test-Driven Data Engineering for Self-Improving LLMs from Raw Corpora
arXiv 2026
The Trinity of Consistency as a Defining Principle for General World Models
arXiv 2026
MoDora: Tree-Based Semi-Structured Document Analysis System
arXiv 2026
MMFineReason: Closing the Multimodal Reasoning Gap via Open Data-Centric Methods
arXiv 2026
Tracing the Roots: A Multi-Agent Framework for Uncovering Data Lineage in Post-Training LLMs
arXiv 2026
Scientific Image Synthesis: Benchmarking, Methodologies, and Downstream Utility
arXiv 2026
PAGER: Bridging the Semantic-Execution Gap in Point-Precise Geometric GUI Control
arXiv 2026
MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing
arXiv 2025
A Survey of Scientific Large Language Models: From Data Foundations to Agent Frontiers
arXiv 2025
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
arXiv 2025
OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding?
CVPR 2025
Lumina-Image 2.0: A Unified and Efficient Image Generative Framework
ICCV 2025
Large Language Models Meet Symbolic Provers for Logical Reasoning Evaluation
arXiv 2025
DataFlow: An LLM-Driven Framework for Unified Data Preparation and Workflow Automation in the Era of Data-Centric AI
arXiv 2025
Multi-Step Visual Reasoning with Visual Tokens Scaling and Verification
arXiv 2025
FUSION: Fully Integration of Vision-Language Representations for Deep Cross-Modal Understanding
arXiv 2025
CipherBank: Exploring the Boundary of LLM Reasoning Capabilities through Cryptography Challenges
arXiv 2025
ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual Reasoning
arXiv 2025
InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency
arXiv 2025
TRivia: Self-supervised Fine-tuning of Vision-Language Models for Table Recognition
arXiv 2025
OpenDataArena: A Fair and Open Arena for Benchmarking Post-Training Dataset Value
arXiv 2025
OmniAID: Decoupling Semantic and Artifacts for Universal AI-Generated Image Detection in the Wild
arXiv 2025
Envision: Benchmarking Unified Understanding & Generation for Causal World Process Insights
arXiv 2025
Echo-4o: Harnessing the Power of GPT-4o Synthetic Images for Improved Image Generation
arXiv 2025
The Devil behind the mask: An emergent safety vulnerability of Diffusion LLMs
arXiv 2025
Efficient Multi-modal Large Language Models via Progressive Consistency Distillation
arXiv 2025
VRBench: A Benchmark for Multi-Step Reasoning in Long Narrative Videos
ICCV 2025
ScaleDiff: Scaling Difficult Problems for Advanced Mathematical Reasoning
arXiv 2025
Data Whisperer: Efficient Data Selection for Task-Specific LLM Fine-Tuning via Few-Shot In-Context Learning
arXiv 2025
REST: Stress Testing Large Reasoning Models by Asking Multiple Problems at Once
arXiv 2025
Can One Domain Help Others? A Data-Centric Study on Multi-Domain Reasoning via Reinforcement Learning
arXiv 2025
Scaling Code-Assisted Chain-of-Thoughts and Instructions for Model Reasoning
arXiv 2025
Spot the Fake: Large Multimodal Model-Based Synthetic Image Detection with Artifact Explanation
arXiv 2025
LEGION: Learning to Ground and Explain for Synthetic Image Detection
ICCV 2025
SAS-Bench: A Fine-Grained Benchmark for Evaluating Short Answer Scoring with Large Language Models
arXiv 2025
WanJuanSiLu: A High-Quality Open-Source Webtext Dataset for Low-Resource Languages
arXiv 2025
MathFusion: Enhancing Mathematic Problem-solving of LLM through Instruction Fusion
arXiv 2025
A Strategic Coordination Framework of Small LLMs Matches Large LLMs in Data Synthesis
arXiv 2025
MetaLadder: Ascending Mathematical Solution Quality via Analogical-Problem Reasoning Transfer
arXiv 2025
Hallucination at a Glance: Controlled Visual Edits and Fine-Grained Multimodal Learning
arXiv 2025
DOCR-Inspector: Fine-Grained and Automated Evaluation of Document Parsing with VLM
arXiv 2025
From Uniform to Heterogeneous: Tailoring Policy Optimization to Every Token's Nature
arXiv 2025
Shifting AI Efficiency From Model-Centric to Data-Centric Compression
arXiv 2025
Meta-rater: A Multi-dimensional Data Selection Method for Pre-training Language Models
arXiv 2025
Stop Looking for Important Tokens in Multimodal Language Models: Duplication Matters More
arXiv 2025
GPT-ImgEval: A Comprehensive Benchmark for Diagnosing GPT4o in Image Generation
arXiv 2025
BenchMAX: A Comprehensive Multilingual Evaluation Suite for Large Language Models
arXiv 2025
GGBench: A Geometric Generative Reasoning Benchmark for Unified Multimodal Models
arXiv 2025
LEMMA: Learning from Errors for MatheMatical Advancement in LLMs
arXiv 2025
PM4Bench: A Parallel Multilingual Multi-Modal Multi-task Benchmark for Large Vision Language Model
arXiv 2025
VADE: Variance-Aware Dynamic Sampling via Online Sample-Level Difficulty Estimation for Multimodal RL
arXiv 2025
GTR-CoT: Graph Traversal as Visual Chain of Thought for Molecular Structure Recognition
arXiv 2025
RealGen: Photorealistic Text-to-Image Generation via Detector-Guided Rewards
arXiv 2025
DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception
arXiv 2024
OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations
CVPR 2025
LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training
arXiv 2024
UrBench: A Comprehensive Benchmark for Evaluating Large Multimodal Models in Multi-View Urban Scenarios
arXiv 2024
MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models
arXiv 2024
Multi-Agent Collaborative Data Selection for Efficient LLM Pretraining
arXiv 2024
GeoX: Geometric Problem Solving Through Unified Formalized Vision-Language Pre-training
arXiv 2024
LongWanjuan: Towards Systematic Measurement for Long Text Quality
arXiv 2024
BEATS: Optimizing LLM Mathematical Capabilities with BackVerify and Adaptive Disambiguate based Efficient Tree Search
arXiv 2024
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models
arXiv 2024
Image Over Text: Transforming Formula Recognition Evaluation with Character Detection Matching
CVPR 2025
PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction
arXiv 2024
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
arXiv 2024
OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation
ICCV 2025
SongComposer: A Large Language Model for Lyric and Melody Generation in Song Composition
arXiv 2024
LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models
arXiv 2024
DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Language Models
arXiv 2024
SynthVLM: High-Efficiency and High-Quality Synthetic Data for Vision Language Models
arXiv 2024
Synth-Empathy: Towards High-Quality Synthetic Empathy Data
arXiv 2024
ShareGPT4V: Improving Large Multi-Modal Models with Better Captions
arXiv 2023
Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization
arXiv 2023
Parrot Captions Teach CLIP to Spot Text
arXiv 2023
MLLM-DataEngine: An Iterative Refinement Approach for MLLM
arXiv 2023
WanJuan: A Comprehensive Multimodal Dataset for Advancing English and Chinese Large Models
arXiv 2023
VIGC: Visual Instruction Generation and Correction
arXiv 2023
OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation
CVPR 2024
LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model
arXiv 2023
PersFormer: 3D Lane Detection via Perspective Transformer and the OpenLane Benchmark
Influence Selection for Active Learning
ICCV 2021