Deep reinforcement learning (RL) is increasingly deployed in resource-constrained environments, yet the go-to function approximators, multilayer perceptrons (MLPs), are often parameter-inefficient because their inductive bias is poorly matched to the smooth structure of many value functions. This mismatch can also hinder sample efficiency and slow policy learning in the capacity-limited regime. Although model compression techniques exist, they operate post hoc and do not improve learning efficiency. Spline-based architectures such as Kolmogorov-Arnold Networks (KANs) have been shown to offer parameter efficiency but are widely reported to exhibit significant computational overhead, especially at scale. To address these limitations, this work introduces SPAN (SPline-based Adaptive Networks) for RL. SPAN adapts the KHRONOS framework with a learnable preprocessing layer. SPAN is evaluated across discrete (PPO) and high-dimensional continuous (SAC) control tasks, offline settings (Minari/D4RL), and a real-world datacenter HVAC control application. SPAN achieves a 30-50% improvement in sample efficiency and 1.3-9x higher success rates across benchmarks compared to MLP baselines. Despite incurring a per-step evaluation overhead of 1.2-1.8x, SPAN's superior convergence reliability yields an expected total training cost 1.3-6.3x lower than MLP baselines when accounting for convergence failures. In the HVAC application, SPAN reduces energy consumption in 9 of 12 months relative to MLP while simultaneously achieving a 1.1-3.4x reduction in thermal comfort violations across the evaluation year, demonstrating generalization to real-world engineering control. Furthermore, SPAN demonstrates superior anytime performance and robustness to hyperparameter variations, suggesting that it is a viable, high-performance alternative for learning efficient policies in resource-limited settings.
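The claim that a higher per-step cost can still give a lower expected total training cost can be illustrated with a small calculation. The sketch below is not the paper's cost model. It assumes a simple retry-until-success setting in which training runs are independent and each converges with a fixed probability, so the expected number of runs is the reciprocal of that probability. The function names and the example numbers (a 1.5x overhead, success rates of 0.2 and 0.8) are hypothetical and chosen only for illustration.

```python
def expected_total_cost(cost_per_run: float, success_prob: float) -> float:
    """Expected cost of rerunning training until one run converges.

    Assumption (not from the paper): runs are independent and each converges
    with probability success_prob, so the number of runs is geometric with
    mean 1 / success_prob.
    """
    if not 0.0 < success_prob <= 1.0:
        raise ValueError("success_prob must be in (0, 1]")
    if cost_per_run <= 0.0:
        raise ValueError("cost_per_run must be positive")
    return cost_per_run / success_prob


def cost_advantage(overhead: float, p_baseline: float, p_candidate: float) -> float:
    """Ratio of baseline to candidate expected total cost.

    overhead is the candidate's per-run cost relative to a baseline run
    normalized to cost 1.0. A value above 1.0 means the candidate is cheaper
    in expectation despite the overhead.
    """
    baseline = expected_total_cost(1.0, p_baseline)
    candidate = expected_total_cost(overhead, p_candidate)
    return baseline / candidate


if __name__ == "__main__":
    # Hypothetical inputs: 1.5x per-run overhead, 20% vs 80% convergence rate.
    ratio = cost_advantage(overhead=1.5, p_baseline=0.2, p_candidate=0.8)
    print(f"baseline / candidate expected cost: {ratio:.2f}")
```

With these hypothetical inputs the baseline needs five runs on average against 1.25 for the candidate, so the candidate's expected total cost is roughly 2.67 times lower even though each of its runs costs 1.5 times more.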
Agile Reinforcement Learning through Separable Neural Architecture and Applications
- Year
- 2026
- Hosting
- Full text hosted (CC-BY-4.0)
Attribution
- Abstract & full text
- arxiv.org/abs/2601.23225 (CC-BY-4.0)
- TL;DR
- Semantic Scholar