Min Zhang
Authored papers (113)
WindowsWorld: A Process-Centric Benchmark of Autonomous GUI Agents in Professional Cross-Application Environments
arXiv 2026
MemReread: Enhancing Agentic Long-Context Reasoning via Memory-Guided Rereading
arXiv 2026
LMEB: Long-horizon Memory Embedding Benchmark
arXiv 2026
Taming Actor-Observer Asymmetry in Agents via Dialectical Alignment
arXiv 2026
Elastic Attention: Test-time Adaptive Sparsity Ratios for Efficient Transformers
arXiv 2026
Echoes as Anchors: Probabilistic Costs and Attention Refocusing in LLM Reasoning
arXiv 2026
AgentDropoutV2: Optimizing Information Flow in Multi-Agent Systems via Test-Time Rectify-or-Reject Pruning
arXiv 2026
MemoryRewardBench: Benchmarking Reward Models for Long-Term Memory Management in Large Language Models
arXiv 2026
LycheeDecode: Accelerating Long-Context LLM Inference via Hybrid-Head Sparse Decoding
arXiv 2026
Flux Attention: Context-Aware Hybrid Attention for Efficient LLMs Inference
arXiv 2026
Beyond Unimodal Shortcuts: MLLMs as Cross-Modal Reasoners for Grounded Named Entity Recognition
arXiv 2026
FAPO: Flawed-Aware Policy Optimization for Efficient and Reliable Reasoning
arXiv 2025
ComfyUI-R1: Exploring Reasoning Models for Workflow Generation
arXiv 2025
AniMaker: Automated Multi-Agent Animated Storytelling with MCTS-Driven Clip Generation
arXiv 2025
Test-time Computing: from System-1 Thinking to System-2 Thinking
arXiv 2025
VideoVista-CulturalLingo: 360° Horizons-Bridging Cultures, Languages, and Domains in Video Comprehension
arXiv 2025
Efficient Speech Language Modeling via Energy Distance in Continuous Latent Space
arXiv 2025
Semantic Role Labeling: A Systematical Survey
arXiv 2025
Taming the Titans: A Survey of Efficient LLM Inference Serving
arXiv 2025
Learning from Peers in Reasoning Models
arXiv 2025
Efficient Reasoning for LLMs through Speculative Chain-of-Thought
arXiv 2025
Reward Forcing: Efficient Streaming Video Generation with Rewarded Distribution Matching Distillation
arXiv 2025
Uni-MoE-2.0-Omni: Scaling Language-Centric Omnimodal Large Model with Advanced MoE, Training and Data
arXiv 2025
InPO: Inversion Preference Optimization with Reparametrized DDIM for Efficient Diffusion Model Alignment
CVPR 2025 1
A Unified Agentic Framework for Evaluating Conditional Image Generation
arXiv 2025
Evaluating Intelligence via Trial and Error
arXiv 2025
Towards Text-Image Interleaved Retrieval
arXiv 2025
PTQ1.61: Push the Real Limit of Extremely Low-Bit Post-Training Quantization Methods for Large Language Models
arXiv 2025
Knowledge Grafting of Large Language Models
arXiv 2025
FilmAgent: A Multi-Agent Framework for End-to-End Film Automation in Virtual 3D Spaces
arXiv 2025
Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models
arXiv 2025
UniMoE-Audio: Unified Speech and Music Generation with Dynamic-Capacity MoE
arXiv 2025
LOOM-Scope: a comprehensive and efficient LOng-cOntext Model evaluation framework
arXiv 2025
BatonVoice: An Operationalist Framework for Enhancing Controllable Speech Synthesis with Linguistic Intelligence from LLMs
arXiv 2025
FlexRAG: A Flexible and Comprehensive Framework for Retrieval-Augmented Generation
arXiv 2025
Revisiting Long-context Modeling from Context Denoising Perspective
arXiv 2025
VerIPO: Cultivating Long Reasoning in Video-LLMs via Verifier-Guided Iterative Policy Optimization
arXiv 2025
LongRM: Revealing and Unlocking the Context Boundary of Reward Modeling
arXiv 2025
KaLM-Embedding-V2: Superior Training Techniques and Data Inspire A Versatile Embedding Model
arXiv 2025
Benchmarking Post-Training Quantization in LLMs: Comprehensive Taxonomy, Unified Evaluation, and Comparative Analysis
arXiv 2025
Omni-DPO: A Dual-Perspective Paradigm for Dynamic Preference Learning of LLMs
arXiv 2025
Picking the Cream of the Crop: Visual-Centric Data Selection with Collaborative Agents
arXiv 2025
HLFormer: Enhancing Partially Relevant Video Retrieval with Hyperbolic Learning
ICCV 2025
Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts
arXiv 2024
Anim-Director: A Large Multimodal Model Powered Agent for Controllable Animation Video Generation
arXiv 2024
Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference
arXiv 2024
DISC: Plug-and-Play Decoding Intervention with Similarity of Characters for Chinese Spelling Check
arXiv 2024
ConflictBank: A Benchmark for Evaluating the Influence of Knowledge Conflicts in LLM
arXiv 2024
Rethinking Negative Instances for Generative Named Entity Recognition
arXiv 2024
A User-Centric Multi-Intent Benchmark for Evaluating Large Language Models
arXiv 2024
L-CiteEval: Do Long-Context Models Truly Leverage Context for Responding?
arXiv 2024
Parameter Competition Balancing for Model Merging
arXiv 2024
Mamba-FSCIL: Dynamic Adaptation with Selective State Space Model for Few-Shot Class-Incremental Learning
arXiv 2024
GenView: Enhancing View Quality with Pretrained Generative Model for Self-Supervised Learning
arXiv 2024
OPT-Tree: Speculative Decoding with Adaptive Draft Tree Structure
arXiv 2024
Timo: Towards Better Temporal Reasoning for Language Models
arXiv 2024
LOGO -- Long cOntext aliGnment via efficient preference Optimization
arXiv 2024
SelectIT: Selective Instruction Tuning for LLMs via Uncertainty-Aware Self-Reflection
arXiv 2024
PiTe: Pixel-Temporal Alignment for Large Video-Language Model
arXiv 2024
Cognitive Visual-Language Mapper: Advancing Multimodal Comprehension with Enhanced Visual Knowledge Alignment
arXiv 2024
In-Context Learning State Vector with Inner and Momentum Optimization
arXiv 2024
A Multimodal In-Context Tuning Approach for E-Commerce Product Description Generation
arXiv 2024
PerSRV: Personalized Sticker Retrieval with Vision-Language Model
arXiv 2024
Interpret the Internal States of Recommendation Model with Sparse Autoencoder
arXiv 2024
Multi-Level Correlation Network For Few-Shot Image Classification
arXiv 2024
AutoSurvey: Large Language Models Can Automatically Write Surveys
arXiv 2024
StreamSpeech: Simultaneous Speech-to-Speech Translation with Multi-task Learning
arXiv 2024
MemLong: Memory-Augmented Retrieval for Long Text Modeling
arXiv 2024
StepTool: A Step-grained Reinforcement Learning Framework for Tool Learning in LLMs
arXiv 2024
Dynamic Planning for LLM-based Graphical User Interface Automation
arXiv 2024
NewTerm: Benchmarking Real-Time New Terms for Large Language Models with Annual Updates
arXiv 2024
Why Not Transform Chat Large Language Models to Non-English?
arXiv 2024
mGTE: Generalized Long-Context Text Representation and Reranking Models for Multilingual Text Retrieval
arXiv 2024
FIPO: Free-form Instruction-oriented Prompt Optimization with Preference Dataset and Modular Fine-tuning Schema
arXiv 2024
Who Wrote This? The Key to Zero-Shot LLM-Generated Text Detection Is GECScore
arXiv 2024
MetaCoCo: A New Few-Shot Classification Benchmark with Spurious Correlation
arXiv 2024
HA-HI: Synergising fMRI and DTI through Hierarchical Alignments and Hierarchical Interactions for Mild Cognitive Impairment Diagnosis
arXiv 2024
Large Language Model Cascades with Mixture of Thoughts Representations for Cost-efficient Reasoning
arXiv 2023
LasUIE: Unifying Information Extraction with Latent Adaptive Structure-aware Generative Language Model
arXiv 2023
CoachLM: Automatic Instruction Revisions Improve the Data Quality in LLM Instruction Tuning
arXiv 2023
LMEye: An Interactive Perception Network for Large Language Models
arXiv 2023
A Two-Stage Adaptation of Large Language Models for Text Ranking
arXiv 2023
Troika: Multi-Path Cross-Modal Traction for Compositional Zero-Shot Learning
CVPR 2024 1
Generating Visual Spatial Description via Holistic 3D Scene Understanding
arXiv 2023
CMD: a framework for Context-aware Model self-Detoxification
arXiv 2023
Language Models are Universal Embedders
arXiv 2023
A Multi-Modal Context Reasoning Approach for Conditional Inference on Joint Textual and Visual Clues
arXiv 2023
Towards Reasoning in Large Language Models via Multi-Agent Peer Review Collaboration
arXiv 2023
Scene Graph as Pivoting: Inference-time Image-free Unsupervised Multimodal Machine Translation with Visual Scene Hallucination
arXiv 2023
Separate the Wheat from the Chaff: Model Deficiency Unlearning via Parameter-Efficient Module Operation
arXiv 2023
A Neural Divide-and-Conquer Reasoning Framework for Image Retrieval from Linguistically Complex Text
arXiv 2023
A Read-and-Select Framework for Zero-shot Entity Linking
arXiv 2023
OpenBA: An Open-sourced 15B Bilingual Asymmetric seq2seq Model Pre-trained from Scratch
arXiv 2023
NaSGEC: a Multi-Domain Chinese Grammatical Error Correction Dataset from Native Speaker Texts
arXiv 2023
A Survey of Large Language Models Attribution
arXiv 2023
Mirror: A Universal Framework for Various Information Extraction Tasks
arXiv 2023
Revisiting Sparse Retrieval for Few-shot Entity Linking
arXiv 2023
LLM-enhanced Self-training for Cross-domain Constituency Parsing
arXiv 2023
Holistic Exploration on Universal Decompositional Semantic Parsing: Architecture, Data Augmentation, and LLM Paradigm
arXiv 2023
Towards Vision Enhancing LLMs: Empowering Multimodal Knowledge Storage and Sharing in LLMs
arXiv 2023
Revisiting Token Dropping Strategy in Efficient BERT Pretraining
arXiv 2023
Robust Self-Augmentation for Named Entity Recognition with Meta Reweighting
NAACL 2022 7
Improving Temporal Generalization of Pre-trained Language Models with Lexical Semantic Change
arXiv 2022
Improving Simultaneous Machine Translation with Monolingual Data
arXiv 2022
Evaluating Interpolation and Extrapolation Performance of Neural Retrieval Models
arXiv 2022
A Survey on Non-Autoregressive Generation for Neural Machine Translation and Beyond
arXiv 2022
Semantic Role Labeling as Dependency Parsing: Exploring Latent Tree Structures Inside Arguments
COLING 2022 10
Jointly Optimizing Query Encoder and Product Quantization to Improve Retrieval Performance
arXiv 2021
Optimizing Dense Retrieval Model Training with Hard Negatives
arXiv 2021
Learning Discrete Representations via Constrained Clustering for Effective and Efficient Dense Retrieval
arXiv 2021
RepBERT: Contextualized Text Embeddings for First-Stage Retrieval
arXiv 2020
IPRE: a Dataset for Inter-Personal Relationship Extraction
arXiv 2019
Modeling Graph Structure in Transformer for Better AMR-to-Text Generation