Yikang Shen

Papers: 17

Attribution

Affiliations & profile: Semantic Scholar

Authored papers

PaTH Attention: Position Encoding via Accumulating Householder Transformations

arXiv 2025

2025

Synthetic Data RL: Task Definition Is All You Need

arXiv 2025

2025

JetMoE: Reaching Llama2 Performance with 0.1M Dollars

arXiv 2024

2024

Scattered Mixture-of-Experts Implementation

arXiv 2024

2024

Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision

arXiv 2024

2024

API Pack: A Massive Multi-Programming Language Dataset for API Call Generation

arXiv 2024

2024

Granite Code Models: A Family of Open Foundation Models for Code Intelligence

arXiv 2024

2024

Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training

arXiv 2024

2024

The infrastructure powering IBM's Gen AI model development

arXiv 2024

2024

ModuleFormer: Modularity Emerges from Mixture-of-Experts

arXiv 2023

2023

SALMON: Self-Alignment with Instructable Reward Models

arXiv 2023

2023

Visual Dependency Transformers: Dependency Tree Emerges from Reversed Attention

CVPR 2023

2023

Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision

2023

Gated Linear Attention Transformers with Hardware-Efficient Training

arXiv 2023

2023

Mixture of Attention Heads: Selecting Attention Heads Per Token

arXiv 2022

2022

StructFormer: Joint Unsupervised Induction of Dependency and Constituency Structure from Masked Language Modeling

2020

Long Range Arena: A Benchmark for Efficient Transformers

arXiv 2020

2020

Affiliations

No known affiliations.

Frequent co-authors

from 17 papers

Rameswar Panda

Chuang Gan

Zhenfang Chen

David Cox

Mayank Mishra

Shawn Tan

Yiming Yang

professor

3 shared papers

Zhen Guo

3 shared papers

Zhiqing Sun

researcher

3 shared papers

Aaron Courville

2 shared papers