Li Dong

Papers
36

Authored papers

36

LLM-in-Sandbox Elicits General Agentic Intelligence

arXiv 2026

2026

A General Model for Retinal Segmentation and Quantification

arXiv 2026

2026

Multiplex Thinking: Reasoning via Token-wise Branch-and-Merge

arXiv 2026

2026

Breaking Training Bottlenecks: Effective and Stable Reinforcement Learning for Coding Models

arXiv 2026

2026

Sparse-BitNet: 1.58-bit LLMs are Naturally Friendly to Semi-Structured Sparsity

arXiv 2026

2026

SeerAttention-R: Sparse Attention Adaptation for Long Reasoning

arXiv 2025

2025

Data Efficacy for Language Model Training

arXiv 2025

2025

Latent Sketchpad: Sketching Visual Thoughts to Elicit Multimodal Reasoning in MLLMs

arXiv 2025

2025

VibeVoice Technical Report

arXiv 2025

2025

Black-Box On-Policy Distillation of Large Language Models

arXiv 2025

2025

On-Policy RL with Optimal Reward Baseline

arXiv 2025

2025

BitNet Distillation

arXiv 2025

2025

Multimodal Latent Language Modeling with Next-Token Diffusion

arXiv 2024

2024

Differential Transformer

arXiv 2024

2024

You Only Cache Once: Decoder-Decoder Architectures for Language Models

arXiv 2024

2024

Semi-Parametric Retrieval via Binary Bag-of-Tokens Index

arXiv 2024

2024

Semi-Offline Reinforcement Learning for Optimized Text Generation

arXiv 2023

2023

Kosmos-G: Generating Images in Context with Multimodal Large Language Models

arXiv 2023

2023

Kosmos-2: Grounding Multimodal Large Language Models to the World

arXiv 2023

2023

Large Language Model for Science: A Study on P vs. NP

arXiv 2023

2023

Augmenting Language Models with Long-Term Memory

2023

BioCLIP: A Vision Foundation Model for the Tree of Life

CVPR 2024 1

2023

Pre-Training to Learn in Context

arXiv 2023

2023

A Length-Extrapolatable Transformer

arXiv 2022

2022

BEiT v2: Masked Image Modeling with Vector-Quantized Visual Tokenizers

arXiv 2022

2022

Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks

arXiv 2022

2022

StableMoE: Stable Routing Strategy for Mixture of Experts

ACL 2022 5

2022

CROP: Zero-shot Cross-lingual Named Entity Recognition with Multilingual Labeled Sequence Translation

arXiv 2022

2022

Language Models as Inductive Reasoners

arXiv 2022

2022

GanLM: Encoder-Decoder Pre-training with an Auxiliary Discriminator

arXiv 2022

2022

Improving Pretrained Cross-Lingual Language Models via Self-Labeled Word Alignment

ACL 2021 5

2021

Allocating Large Vocabulary Capacity for Cross-lingual Language Model Pre-training

EMNLP 2021 11

2021

Zero-shot Cross-lingual Transfer of Neural Machine Translation with Multilingual Pretrained Encoders

EMNLP 2021 11

2021

VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts

arXiv 2021

2021

DeltaLM: Encoder-Decoder Pre-training for Language Generation and Translation by Augmenting Pretrained Multilingual Encoders

arXiv 2021

2021

Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks

ECCV 2020 8

2020

Affiliations

No known affiliations.

Frequent co-authors

10

from 36 papers