Nan Duan

Astrolabe: Steering Forward-Process Reinforcement Learning for Distilled Autoregressive Video Models

arXiv 2026

OpenSpatial: A Principled Data Engine for Empowering Spatial Intelligence

arXiv 2026

Awaking Spatial Intelligence in Unified Multimodal Understanding and Generation

arXiv 2026

SpatialEdit: Benchmarking Fine-Grained Image Spatial Editing

arXiv 2026

OmniForcing: Unleashing Real-time Joint Audio-Visual Generation

arXiv 2026

EasyVideoR1: Easier RL for Video Understanding

arXiv 2026

Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model

arXiv 2025

Step-Video-TI2V Technical Report: A State-of-the-Art Text-Driven Image-to-Video Generation Model

arXiv 2025

Rho-1: Not All Tokens Are What You Need

arXiv 2024

LVLM-Interpret: An Interpretability Tool for Large Vision-Language Models

arXiv 2024

Alchemy: Amplifying Theorem-Proving Capability through Symbolic Mutation

arXiv 2024

ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving

arXiv 2023

LongCoder: A Long-Range Pre-trained Language Model for Code Completion

arXiv 2023

AGIEval: A Human-Centric Benchmark for Evaluating Foundation Models

arXiv 2023

CMMLU: Measuring massive multitask language understanding in Chinese

arXiv 2023

CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing

arXiv 2023

LayoutNUWA: Revealing the Hidden Layout Expertise of Large Language Models

arXiv 2023

Allies: Prompting Large Language Model with Beam Search

arXiv 2023

Enhancing Chain-of-Thoughts Prompting with Iterative Bootstrapping in Large Language Models

arXiv 2023

Constructing Multilingual Code Search Dataset Using Neural Machine Translation

arXiv 2023

AnnoLLM: Making Large Language Models to Be Better Crowdsourced Annotators

arXiv 2023

Baize: An Open-Source Chat Model with Parameter-Efficient Tuning on Self-Chat Data

arXiv 2023

Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models

arXiv 2023

Low-code LLM: Graphical User Interface over Large Language Models

arXiv 2023

PPTC Benchmark: Evaluating Large Language Models for PowerPoint Task Completion

arXiv 2023

GameEval: Evaluating LLMs on Conversational Games

arXiv 2023

ORES: Open-vocabulary Responsible Visual Synthesis

arXiv 2023

ManagerTower: Aggregating the Insights of Uni-Modal Experts for Vision-Language Representation Learning

arXiv 2023

GENIUS: Sketch-based Language Model Pre-training via Extreme and Selective Masking for Text Generation and Augmentation

arXiv 2022

NUWA-Infinity: Autoregressive over Autoregressive Generation for Infinite Visual Synthesis

arXiv 2022

BridgeTower: Building Bridges Between Encoders in Vision-Language Representation Learning

arXiv 2022

Unsupervised Context Aware Sentence Representation Pretraining for Multi-lingual Dense Retrieval

arXiv 2022

ReACC: A Retrieval-Augmented Code Completion Framework

ACL 2022

LaPraDoR: Unsupervised Pretrained Dense Retriever for Zero-Shot Text Retrieval

Findings (ACL) 2022

CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation

arXiv 2021

CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval

arXiv 2021

EL-Attention: Memory Efficient Lossless Attention for Generation

arXiv 2021

Adversarial Retriever-Ranker for dense text retrieval

WhiteningBERT: An Easy Unsupervised Sentence Embedding Approach

Findings (EMNLP) 2021

AR-LSAT: Investigating Analytical Reasoning of Text

arXiv 2021

CoSQA: 20,000+ Web Queries for Code Search and Question Answering

ACL 2021