Furu Wei
Authored papers (94)
Audio-Visual Intelligence in Large Foundation Models
arXiv 2026
LLM-in-Sandbox Elicits General Agentic Intelligence
arXiv 2026
UniM: A Unified Any-to-Any Interleaved Multimodal Benchmark
arXiv 2026
Multiplex Thinking: Reasoning via Token-wise Branch-and-Merge
arXiv 2026
Breaking Training Bottlenecks: Effective and Stable Reinforcement Learning for Coding Models
arXiv 2026
Sparse-BitNet: 1.58-bit LLMs are Naturally Friendly to Semi-Structured Sparsity
arXiv 2026
VibeVoice Technical Report
arXiv 2025
Black-Box On-Policy Distillation of Large Language Models
arXiv 2025
BitNet b1.58 2B4T Technical Report
arXiv 2025
On-Policy RL with Optimal Reward Baseline
arXiv 2025
mmE5: Improving Multimodal Multilingual Embeddings via High-quality Synthetic Data
arXiv 2025
Chain-of-Reasoning: Towards Unified Mathematical Reasoning in Large Language Models via a Multi-Paradigm Perspective
arXiv 2025
Next Block Prediction: Video Generation via Semi-Auto-Regressive Modeling
arXiv 2025
Towards Thinking-Optimal Scaling of Test-Time Compute for LLM Reasoning
arXiv 2025
MoCa: Modality-aware Continual Pre-training Makes Better Bidirectional Multimodal Embeddings
arXiv 2025
BitNet Distillation
arXiv 2025
VisCodex: Unified Multimodal Code Generation via Merging Vision and Coding Models
arXiv 2025
Geometric-Mean Policy Optimization
arXiv 2025
Latent Sketchpad: Sketching Visual Thoughts to Elicit Multimodal Reasoning in MLLMs
arXiv 2025
Multimodal Latent Language Modeling with Next-Token Diffusion
arXiv 2024
Differential Transformer
arXiv 2024
You Only Cache Once: Decoder-Decoder Architectures for Language Models
arXiv 2024
Preference Optimization for Reasoning with Pseudo Feedback
arXiv 2024
MathScale: Scaling Instruction Tuning for Mathematical Reasoning
arXiv 2024
MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning
arXiv 2024
Generative Representational Instruction Tuning
arXiv 2024
xRAG: Extreme Context Compression for Retrieval-augmented Generation with One Token
arXiv 2024
MMLU-CF: A Contamination-free Multi-task Language Understanding Benchmark
arXiv 2024
Mixture of LoRA Experts
arXiv 2024
LongEmbed: Extending Embedding Models for Long Context Retrieval
arXiv 2024
WavLLM: Towards Robust and Adaptive Speech Large Language Model
arXiv 2024
Language-Specific Neurons: The Key to Multilingual Capabilities in Large Language Models
arXiv 2024
Multi-Head Mixture-of-Experts
arXiv 2024
One Language, Many Gaps: Evaluating Dialect Fairness and Robustness of Large Language Models in Reasoning Tasks
arXiv 2024
Textual Aesthetics in Large Language Models
arXiv 2024
Little Giants: Synthesizing High-Quality Embedding Data at Scale
arXiv 2024
Semi-Parametric Retrieval via Binary Bag-of-Tokens Index
arXiv 2024
ShifCon: Enhancing Non-Dominant Language Capabilities with a Shift-based Contrastive Framework
arXiv 2024
Kosmos-G: Generating Images in Context with Multimodal Large Language Models
arXiv 2023
Kosmos-2: Grounding Multimodal Large Language Models to the World
arXiv 2023
Large Language Model for Science: A Study on P vs. NP
arXiv 2023
Inference with Reference: Lossless Acceleration of Large Language Models
arXiv 2023
In-context Autoencoder for Context Compression in a Large Language Model
arXiv 2023
PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training
arXiv 2023
Pre-Training to Learn in Context
arXiv 2023
ALYMPICS: LLM Agents Meet Game Theory -- Exploring Strategic Decision-Making with AI Agents
arXiv 2023
Democratizing Reasoning Ability: Tailored Learning from Large Language Model
arXiv 2023
Low-code LLM: Graphical User Interface over Large Language Models
arXiv 2023
Augmenting Language Models with Long-Term Memory
arXiv 2023
Unleashing the Emergent Cognitive Synergy in Large Language Models: A Task-Solving Agent through Multi-Persona Self-Collaboration
arXiv 2023
Dual-Alignment Pre-training for Cross-lingual Sentence Embedding
arXiv 2023
Are More Layers Beneficial to Graph Transformers?
arXiv 2023
SCALE: Synergized Collaboration of Asymmetric Language Translation Engines
arXiv 2023
On the Off-Target Problem of Zero-Shot Multilingual Neural Machine Translation
arXiv 2023
Auto-ICL: In-Context Learning without Human Supervision
arXiv 2023
Not All Metrics Are Guilty: Improving NLG Evaluation by Diversifying References
arXiv 2023
Chain-of-Dictionary Prompting Elicits Translation in Large Language Models
arXiv 2023
XDoc: Unified Pre-training for Cross-Format Document Understanding
arXiv 2022
DiT: Self-supervised Pre-training for Document Image Transformer
arXiv 2022
BEiT v2: Masked Image Modeling with Vector-Quantized Visual Tokenizers
arXiv 2022
LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking
arXiv 2022
Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks
arXiv 2022
BEATs: Audio Pre-Training with Acoustic Tokenizers
arXiv 2022
A Length-Extrapolatable Transformer
arXiv 2022
StableMoE: Stable Routing Strategy for Mixture of Experts
ACL 2022 5
Speculative Decoding: Exploiting Speculative Execution for Accelerating Seq2seq Generation
arXiv 2022
UM4: Unified Multilingual Multiple Teacher-Student Model for Zero-Resource Neural Machine Translation
arXiv 2022
PromptBERT: Improving BERT Sentence Embeddings with Prompts
arXiv 2022
Learning Diverse Document Representations with Deep Query Interactions for Dense Retrieval
arXiv 2022
Zero-shot Cross-lingual Transfer of Prompt-based Tuning with a Unified Multilingual Prompt
arXiv 2022
CROP: Zero-shot Cross-lingual Named Entity Recognition with Multilingual Labeled Sequence Translation
arXiv 2022
Language Models as Inductive Reasoners
arXiv 2022
HLT-MT: High-resource Language-specific Training for Multilingual Neural Machine Translation
arXiv 2022
GanLM: Encoder-Decoder Pre-training with an Auxiliary Discriminator
arXiv 2022
GTrans: Grouping and Fusing Transformer Layers for Neural Machine Translation
arXiv 2022
TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models
arXiv 2021
LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding
arXiv 2021
VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts
arXiv 2021
DeltaLM: Encoder-Decoder Pre-training for Language Generation and Translation by Augmenting Pretrained Multilingual Encoders
arXiv 2021
Distilled Dual-Encoder Model for Vision-Language Understanding
arXiv 2021
Improving Pretrained Cross-Lingual Language Models via Self-Labeled Word Alignment
ACL 2021 5
UniSpeech-SAT: Universal Speech Representation Learning with Speaker Aware Pre-Training
arXiv 2021
Allocating Large Vocabulary Capacity for Cross-lingual Language Model Pre-training
EMNLP 2021 11
Improving Sequence-to-Sequence Pre-training via Sequence Span Rewriting
EMNLP 2021 11
Zero-shot Cross-lingual Transfer of Neural Machine Translation with Multilingual Pretrained Encoders
EMNLP 2021 11
UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data
arXiv 2021
Towards Making the Most of Multilingual Pretraining for Zero-Shot Neural Machine Translation
arXiv 2021
Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks
ECCV 2020 8
BERT Loses Patience: Fast and Robust Inference with Early Exit
NeurIPS 2020 12
DocBank: A Benchmark Dataset for Document Layout Analysis
COLING 2020 8
BERT-of-Theseus: Compressing BERT by Progressive Module Replacing
EMNLP 2020 11
TableBank: A Benchmark Dataset for Table Detection and Recognition
LREC 2020 5
VL-BERT: Pre-training of Generic Visual-Linguistic Representations
ICLR 2020 1
Neural Question Generation from Text: A Preliminary Study
arXiv 2017