Zhifang Sui

GroundingME: Exposing the Visual Grounding Gap in MLLMs through Multi-Dimensional Evaluation

arXiv 2025

How Far are LLMs from Being Our Digital Twins? A Benchmark for Persona-Based Behavior Chain Simulation

arXiv 2025

Chain-of-Thought Tokens are Computer Program Variables

arXiv 2025

ShieldLM: Empowering LLMs as Aligned, Customizable and Explainable Safety Detectors

arXiv 2024

Towards a Unified View of Preference Learning for Large Language Models: A Survey

arXiv 2024

Unlocking Efficiency in Large Language Model Inference: A Comprehensive Survey of Speculative Decoding

arXiv 2024

DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models

arXiv 2024

Can Large Multimodal Models Uncover Deep Semantics Behind Images?

arXiv 2024

Confidence v.s. Critique: A Decomposition of Self-Correction Capability for LLMs

arXiv 2024

Large Language Model for Science: A Study on P vs. NP

arXiv 2023

Large Language Models are not Fair Evaluators

arXiv 2023

Enhancing Continual Relation Extraction via Classifier Decomposition

arXiv 2023

ImageNetVC: Zero- and Few-Shot Visual Commonsense Evaluation on 1000 ImageNet Categories

arXiv 2023

A Survey on In-context Learning

arXiv 2022

StableMoE: Stable Routing Strategy for Mixture of Experts

ACL 2022

Speculative Decoding: Exploiting Speculative Execution for Accelerating Seq2seq Generation

arXiv 2022

Calibrating Factual Knowledge in Pretrained Language Models

arXiv 2022