Yikang Shen
Authored papers (17)
PaTH Attention: Position Encoding via Accumulating Householder Transformations
arXiv 2025
Synthetic Data RL: Task Definition Is All You Need
arXiv 2025
JetMoE: Reaching Llama2 Performance with 0.1M Dollars
arXiv 2024
Scattered Mixture-of-Experts Implementation
arXiv 2024
Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision
arXiv 2024
Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training
arXiv 2024
The infrastructure powering IBM's Gen AI model development
arXiv 2024
Granite Code Models: A Family of Open Foundation Models for Code Intelligence
arXiv 2024
API Pack: A Massive Multi-Programming Language Dataset for API Call Generation
arXiv 2024
Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision
ModuleFormer: Modularity Emerges from Mixture-of-Experts
arXiv 2023
SALMON: Self-Alignment with Instructable Reward Models
arXiv 2023
Gated Linear Attention Transformers with Hardware-Efficient Training
arXiv 2023
Visual Dependency Transformers: Dependency Tree Emerges from Reversed Attention
CVPR 2023
Mixture of Attention Heads: Selecting Attention Heads Per Token
arXiv 2022
StructFormer: Joint Unsupervised Induction of Dependency and Constituency Structure from Masked Language Modeling
Long Range Arena: A Benchmark for Efficient Transformers
arXiv 2020