DaCheng Tao
- Papers: 102
Authored papers
PixVerve: Advancing Native UHR Image Generation to 100MP with a Large-Scale High-Quality Dataset
arXiv 2026
Understanding and Enforcing Weight Disentanglement in Task Arithmetic
arXiv 2026
UniX: Unifying Autoregression and Diffusion for Chest X-Ray Understanding and Generation
arXiv 2026
GlyphPrinter: Region-Grouped Direct Preference Optimization for Glyph-Accurate Visual Text Rendering
arXiv 2026
VTC-R1: Vision-Text Compression for Efficient Long-Context Reasoning
arXiv 2026
Language-based Trial and Error Falls Behind in the Era of Experience
arXiv 2026
FORGE: Fine-grained Multimodal Evaluation for Manufacturing Scenarios
arXiv 2026
Large Language Model Agent: A Survey on Methodology, Applications and Challenges
arXiv 2025
VisuoThink: Empowering LVLM Reasoning with Multimodal Tree Search
arXiv 2025
Retrieval-Augmented Perception: High-Resolution Image Perception Meets Visual RAG
arXiv 2025
R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization
arXiv 2025
Low-Precision Training of Large Language Models: Methods, Challenges, and Opportunities
arXiv 2025
R1-ShareVL: Incentivizing Reasoning Capability of Multimodal Large Language Models via Share-GRPO
arXiv 2025
SeRL: Self-Play Reinforcement Learning for Large Language Models with Limited Data
arXiv 2025
Unifying Multimodal Large Language Model Capabilities and Modalities via Model Merging
arXiv 2025
R1-Compress: Long Chain-of-Thought Compression via Chunk Compression and Search
arXiv 2025
Vision-Language-Action Models for Autonomous Driving: Past, Present, and Future
arXiv 2025
Ada-R1: Hybrid-CoT via Bi-Level Adaptive Reasoning Optimization
arXiv 2025
MAPO: Mixed Advantage Policy Optimization
arXiv 2025
Reasoning with Reinforced Functional Token Tuning
arXiv 2025
Merging Models on the Fly Without Retraining: A Sequential Approach to Scalable Continual Model Merging
arXiv 2025
JustLogic: A Comprehensive Benchmark for Evaluating Deductive Reasoning in Large Language Models
arXiv 2025
VeriGUI: Verifiable Long-Chain GUI Dataset
arXiv 2025
A Comprehensive Survey in LLM(-Agent) Full Stack Safety: Data, Training and Deployment
arXiv 2025
GeometryZero: Improving Geometry Solving for LLM with Group Contrastive Policy Optimization
arXiv 2025
Consistent Paths Lead to Truth: Self-Rewarding Reinforcement Learning for LLM Reasoning
arXiv 2025
Improving large language models with concept-aware fine-tuning
arXiv 2025
Safety at Scale: A Comprehensive Survey of Large Model Safety
arXiv 2025
A Survey of Safety on Large Vision-Language Models: Attacks, Defenses and Evaluations
arXiv 2025
Towards Understanding the Safety Boundaries of DeepSeek Models: Evaluation and Findings
arXiv 2025
Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities
arXiv 2024
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search
arXiv 2024
Divide, Conquer and Combine: A Training-Free Framework for High-Resolution Image Perception in Multimodal Large Language Models
arXiv 2024
Communication Learning in Multi-Agent Systems from Graph Modeling Perspective
arXiv 2024
AsymRnR: Video Diffusion Transformers Acceleration with Asymmetric Reduction and Restoration
arXiv 2024
EMOv2: Pushing 5M Vision Model Frontier
arXiv 2024
Merging Multi-Task Models via Weight-Ensembling Mixture of Experts
arXiv 2024
Representation Surgery for Multi-Task Model Merging
arXiv 2024
OOP: Object-Oriented Programming Evaluation Benchmark for Large Language Models
arXiv 2024
HarmoDT: Harmony Multi-Task Decision Transformer for Offline Reinforcement Learning
arXiv 2024
Intention Analysis Makes LLMs A Good Jailbreak Defender
arXiv 2024
Object Detectors in the Open Environment: Challenges, Solutions, and Outlook
arXiv 2024
A Survey on Knowledge Distillation of Large Language Models
arXiv 2024
HyperSIGMA: Hyperspectral Intelligence Comprehension Foundation Model
arXiv 2024
Trajectory Consistency Distillation: Improved Latent Consistency Distillation by Semi-Linear Consistency Function with Trajectory Mapping
arXiv 2024
Diffusion Model-Based Video Editing: A Survey
arXiv 2024
Aligning Large Language Models from Self-Reference AI Feedback with one General Principle
arXiv 2024
Revisiting Knowledge Distillation for Autoregressive Language Models
arXiv 2024
Deep Learning for Camera Calibration and Beyond: A Survey
arXiv 2023
AdaMerging: Adaptive Model Merging for Multi-Task Learning
arXiv 2023
Upcycling Models under Domain and Category Shift
CVPR 2023
Sensitivity-Aware Visual Parameter-Efficient Fine-Tuning
ICCV 2023
FedSpeed: Larger Local Interval, Less Communication Round, and Higher Generalization Accuracy
arXiv 2023
Merging Experts into One: Improving Computational Efficiency of Mixture of Experts
arXiv 2023
Decentralized SGD and Average-direction SAM are Asymptotically Equivalent
arXiv 2023
Concrete Subspace Learning based Interference Elimination for Multi-task Model Fusion
arXiv 2023
Revisiting Plasticity in Visual Reinforcement Learning: Data, Modules and Training Stages
arXiv 2023
ConDaFormer: Disassembled Transformer with Local Structure Enhancement for 3D Point Cloud Understanding
Good Questions Help Zero-Shot Image Reasoning
arXiv 2023
Learning to Learn from APIs: Black-Box Data-Free Meta-Learning
arXiv 2023
TriDet: Temporal Action Detection with Relative Boundary Modeling
CVPR 2023
VanillaNet: the Power of Minimalism in Deep Learning
HandRefiner: Refining Malformed Hands in Generated Images by Diffusion-based Conditional Inpainting
arXiv 2023
One for All: Towards Training One Graph Model for All Classification Tasks
arXiv 2023
Vision Transformer with Quadrangle Attention
arXiv 2023
PNT-Edge: Towards Robust Edge Detection with Noisy Labels by Learning Pixel-level Noise Transitions
arXiv 2023
Structured Cooperative Learning with Graphical Model Priors
arXiv 2023
Centroid-centered Modeling for Efficient Vision Transformer Pre-training
arXiv 2023
Revisiting Token Dropping Strategy in Efficient BERT Pretraining
arXiv 2023
Unifying Flow, Stereo and Depth Estimation
arXiv 2022
Unified Discrete Diffusion for Simultaneous Vision-Language Generation
arXiv 2022
Fine-tuning Global Model via Data-Free Knowledge Distillation for Non-IID Federated Learning
CVPR 2022
Knowledge Graph Augmented Network Towards Multiview Representation Learning for Aspect-based Sentiment Analysis
arXiv 2022
Where Does the Performance Improvement Come From? -- A Reproducibility Concern about Image-Text Retrieval
arXiv 2022
Vega-MT: The JD Explore Academy Translation System for WMT22
arXiv 2022
CLAMP: Prompt-based Contrastive Learning for Connecting Language and Animal Pose
CVPR 2023
ReAct: Temporal Action Detection with Relational Queries
arXiv 2022
Improving Sharpness-Aware Minimization with Fisher Mask for Better Generalization on Language Models
arXiv 2022
A Contrastive Cross-Channel Data Augmentation Framework for Aspect-based Sentiment Analysis
COLING 2022
SparseAdapter: An Easy Approach for Improving the Parameter-Efficiency of Adapters
arXiv 2022
PANDA: Prompt Transfer Meets Knowledge Distillation for Efficient Model Adaptation
arXiv 2022
Improving Simultaneous Machine Translation with Monolingual Data
arXiv 2022
Knowledge-Aware Federated Active Learning with Non-IID Data
ICCV 2023
On the Complementarity between Pre-Training and Random-Initialization for Resource-Rich Machine Translation
COLING 2022
Advancing Plain Vision Transformer Towards Remote Sensing Foundation Model
arXiv 2022
ViTPose++: Vision Transformer for Generic Body Pose Estimation
arXiv 2022
Generating Holistic 3D Human Motion from Speech
CVPR 2023
DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting
CVPR 2023
BatchFormer: Learning to Explore Sample Relationships for Robust Representation Learning
CVPR 2022
VSA: Learning Varied-Size Window Attention in Vision Transformers
arXiv 2022
Diff-Font: Diffusion Model for Robust One-Shot Font Generation
arXiv 2022
Fashionformer: A simple, Effective and Unified Baseline for Human Fashion Segmentation and Recognition
arXiv 2022
Bridging Cross-Lingual Gaps During Leveraging the Multilingual Sequence-to-Sequence Pretraining for Text Generation and Understanding
arXiv 2022
E2S2: Encoding-Enhanced Sequence-to-Sequence Pretraining for Language Understanding and Generation
arXiv 2022
TASA: Deceiving Question Answering Models by Twin Answer Sentences Attack
arXiv 2022
GMFlow: Learning Optical Flow via Global Matching
CVPR 2022
One-Shot Object Affordance Detection in the Wild
arXiv 2021
CPP-Net: Context-aware Polygon Proposal Network for Nucleus Segmentation
arXiv 2021
Rejuvenating Low-Frequency Words: Making the Most of Parallel Data in Non-Autoregressive Translation
ACL 2021
Neural networks behave as hash encoders: An empirical study
SnapMix: Semantically Proportional Mixing for Augmenting Fine-grained Data
arXiv 2020
ActivityNet-QA: A Dataset for Understanding Complex Web Videos via Question Answering
arXiv 2019
Affiliations
Frequent co-authors
10 from 102 papers