Tianlong Chen

Quantum Variational Activation Functions Empower Kolmogorov-Arnold Networks

arXiv 2025

Window Token Concatenation for Efficient Visual Large Language Models

arXiv 2025

A Space-Time Transformer for Precipitation Forecasting

arXiv 2025

Unlearning Sensitive Information in Multimodal LLMs: Benchmark and Attack-Defense Evaluation

arXiv 2025

Layer-Level Self-Exposure and Patch: Affirmative Token Mitigation for Jailbreak Attack Defense

arXiv 2025

MedHallu: A Comprehensive Benchmark for Detecting Medical Hallucinations in Large Language Models

arXiv 2025

Skill-Based Mixture-of-Experts: Adaptive Routing for Heterogeneous Reasoning via Inferred Skills

arXiv 2025

GradientStabilizer: Fix the Norm, Not the Gradient

arXiv 2025

A Comprehensive Survey in LLM(-Agent) Full Stack Safety: Data, Training and Deployment

arXiv 2025

DLF: Disentangled-Language-Focused Multimodal Sentiment Analysis

arXiv 2024

TrustLLM: Trustworthiness in Large Language Models

arXiv 2024

Towards Stabilized and Efficient Diffusion Transformers through Long-Skip-Connections with Spectral Constraints

ICCV 2025

GTBench: Uncovering the Strategic Reasoning Limitations of LLMs via Game-Theoretic Evaluations

arXiv 2024

Adapt-$\infty$: Scalable Lifelong Multimodal Instruction Tuning via Dynamic Data Selection

arXiv 2024

Glider: Global and Local Instruction-Driven Expert Router

arXiv 2024

Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark

arXiv 2024

MerRec: A Large-scale Multipurpose Mercari Dataset for Consumer-to-Consumer Recommendation Systems

arXiv 2024

Composable Interventions for Language Models

arXiv 2024

Contextualization Distillation from Large Language Model for Knowledge Graph Completion

arXiv 2024

Model-GLUE: Democratized LLM Scaling for A Large Model Zoo in the Wild

arXiv 2024

H$_2$O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models

arXiv 2023

Robust Mixture-of-Expert Training for Convolutional Neural Networks

ICCV 2023

Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers

arXiv 2023

Enhancing NeRF akin to Enhancing LLMs: Generalizable NeRF Transformer with Mixture-of-View-Experts

ICCV 2023

The Emergence of Essential Sparsity in Large Pre-trained Models: The Weights that Matter

Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy

arXiv 2023

TFMQ-DM: Temporal Feature Maintenance Quantization for Diffusion Models

CVPR 2024

The Unreasonable Effectiveness of Random Pruning: Return of the Most Naive Baseline for Sparse Training

Unified Visual Transformer Compression

M$^3$ViT: Mixture-of-Experts Vision Transformer for Efficient Multi-task Learning with Model-Accelerator Co-design

arXiv 2022

Advancing Model Pruning via Bi-level Optimization

arXiv 2022

Neural Implicit Dictionary via Mixture-of-Expert Training

arXiv 2022

APP: Anytime Progressive Pruning

arXiv 2022