0

Kaiyan Zhang

Papers
24

Cite

Notes

Only stored in your browser.

Attribution

Affiliations & profile
Semantic Scholar
Attribution policy →
24papers

Authored papers

24

Post-Trained MoE Can Skip Half Experts via Self-Distillation

arXiv 2026

2026

SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning

arXiv 2025

2025

TTRL: Test-Time Reinforcement Learning

arXiv 2025

2025

Process Reinforcement through Implicit Rewards

arXiv 2025

2025

MedXpertQA: Benchmarking Expert-Level Medical Reasoning and Understanding

arXiv 2025

2025

A Survey of Reinforcement Learning for Large Reasoning Models

arXiv 2025

2025

SSRL: Self-Search Reinforcement Learning

arXiv 2025

2025

FlowRL: Matching Reward Distributions for LLM Reasoning

arXiv 2025

2025

Towards a Unified View of Large Language Model Post-Training

arXiv 2025

2025

SciArena: An Open Evaluation Platform for Foundation Models in Scientific Literature Tasks

arXiv 2025

2025

Video-T1: Test-Time Scaling for Video Generation

ICCV 2025

2025

Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling

arXiv 2025

2025

GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning

arXiv 2025

2025

JustRL: Scaling a 1.5B LLM with a Simple RL Recipe

arXiv 2025

2025

Efficient Diffusion Models: A Comprehensive Survey from Principles to Practices

arXiv 2024

2024

UltraMedical: Building Specialized Generalists in Biomedicine

arXiv 2024

2024

Free Process Rewards without Process Labels

arXiv 2024

2024

Large Language Models as Biomedical Hypothesis Generators: A Comprehensive Evaluation

arXiv 2024

2024

Fast and Slow Generating: An Empirical Study on Large and Small Language Models Collaborative Decoding

arXiv 2024

2024

How to Synthesize Text Data without Model Collapse?

arXiv 2024

2024

Fourier Position Embedding: Enhancing Attention's Periodic Extension for Length Generalization

arXiv 2024

2024

Intuitive Fine-Tuning: Towards Simplifying Alignment into a Single Process

arXiv 2024

2024

PaD: Program-aided Distillation Can Teach Small Models Reasoning Better than Chain-of-thought Fine-tuning

arXiv 2023

2023

CRaSh: Clustering, Removing, and Sharing Enhance Fine-tuning without Full Large Language Model

arXiv 2023

2023

Affiliations

No known affiliations.

Frequent co-authors

10

from 24 papers