Di Zhang
- Papers: 42
Authored papers
δ-mem: Efficient Online Memory for Large Language Models
arXiv 2026
MinT: Managed Infrastructure for Training and Serving Millions of LLMs
arXiv 2026
Sparse-BitNet: 1.58-bit LLMs are Naturally Friendly to Semi-Structured Sparsity
arXiv 2026
Flow-GRPO: Training Flow Matching Models via Online RL
arXiv 2025
MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining
arXiv 2025
ReCamMaster: Camera-Controlled Generative Rendering from A Single Video
ICCV 2025
MiMo-VL Technical Report
arXiv 2025
GameFactory: Creating New Games with Generative Interactive Videos
ICCV 2025
Diffusion Model as a Noise-Aware Latent Reward Model for Step-Level Preference Optimization
arXiv 2025
RectifiedHR: Enable Efficient High-Resolution Image Generation via Energy Rectification
arXiv 2025
AlignRAG: Leveraging Critique Learning for Evidence-Sensitive Retrieval-Augmented Reasoning
arXiv 2025
MOOSE-Chem3: Toward Experiment-Guided Hypothesis Ranking via Simulated Experimental Feedback
arXiv 2025
Scaling Image and Video Generation via Test-Time Evolutionary Search
arXiv 2025
Mavors: Multi-granularity Video Representation for Multimodal Large Language Model
arXiv 2025
CMPhysBench: A Benchmark for Evaluating Large Language Models in Condensed Matter Physics
arXiv 2025
Error-Free Linear Attention is a Free Lunch: Exact Solution from Continuous-Time Dynamics
arXiv 2025
VMoBA: Mixture-of-Block Attention for Video Diffusion Models
arXiv 2025
Chem-R: Learning to Reason as a Chemist
arXiv 2025
HumanAesExpert: Advancing a Multi-Modality Foundation Model for Human Image Aesthetic Assessment
arXiv 2025
DiffMoE: Dynamic Token Selection for Scalable Diffusion Transformers
arXiv 2025
SketchVideo: Sketch-based Video Generation and Editing
CVPR 2025
Any2Caption: Interpreting Any Condition to Caption for Controllable Video Generation
arXiv 2025
VidCapBench: A Comprehensive Benchmark of Video Captioning for Controllable Text-to-Video Generation
arXiv 2025
Leanabell-Prover: Posttraining Scaling in Formal Reasoning
arXiv 2025
LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control
arXiv 2024
StyleMaster: Stylize Your Video with Artistic Generation and Translation
CVPR 2025
Improving Large Language Models via Fine-grained Reinforcement Learning with Minimum Editing Constraint
arXiv 2024
3DTrajMaster: Mastering 3D Trajectory for Multi-Entity Motion in Video Generation
arXiv 2024
SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints
arXiv 2024
ChemVLM: Exploring the Power of Multimodal Large Language Models in Chemistry Area
arXiv 2024
Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning
CVPR 2025
DragAnything: Motion Control for Anything using Entity Representation
arXiv 2024
Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B
arXiv 2024
Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization
arXiv 2024
VideoTetris: Towards Compositional Text-to-Video Generation
arXiv 2024
ShieldLM: Empowering LLMs as Aligned, Customizable and Explainable Safety Detectors
arXiv 2024
PlacidDreamer: Advancing Harmony in Text-to-3D Generation
arXiv 2024
Solving Token Gradient Conflict in Mixture-of-Experts for Large Vision-Language Model
arXiv 2024
Biology Instructions: A Dataset and Benchmark for Multi-Omics Sequence Understanding Capability of Large Language Models
arXiv 2024
Breaking the Stage Barrier: A Novel Single-Stage Approach to Long Context Extension for Large Language Models
arXiv 2024
Unified Language-Vision Pretraining in LLM with Dynamic Discrete Visual Tokenization
arXiv 2023
I2V-Adapter: A General Image-to-Video Adapter for Diffusion Models
arXiv 2023
Frequent co-authors
10 from 42 papers