Rio Yokota
5 papers
Authored papers
Rewriting Pre-Training Data Boosts LLM Performance in Math and Code
arXiv 2025
Drop-Upcycling: Training Sparse Mixture of Experts with Partial Re-initialization
arXiv 2025
Optimal Sparsity of Mixture-of-Experts Language Models for Reasoning Tasks
arXiv 2025
Variational Learning is Effective for Large Deep Networks
arXiv 2024
Pre-training Vision Transformers with Very Limited Synthesized Images
ICCV 2023
Affiliations
No known affiliations.
Frequent co-authors
10 from 5 papers