Enhancing Symbolic Regression and Universal Physics-Informed Neural Networks with Dimensional Analysis

Year: 2024
License: CC-BY-4.0

Full text: arxiv.org/abs/2411.15919

Abstract

In engineering and applied mathematics, accurate mathematical models are essential for predicting and understanding real-world phenomena. Symbolic regression is a useful machine-learning tool for fitting such models, but it can be computationally expensive. We present a new method for enhancing symbolic regression for differential equations via dimensional analysis, specifically the Buckingham Π theorem and Ipsen's method. Symbolic regression often suffers from high computational cost and overfitting; nondimensionalizing a dataset reduces the number of input variables, simplifies the search space, and ensures that the derived equations are physically meaningful. As a first step, we combine dimensional analysis with the PySR symbolic regression algorithm to show that dimensional analysis improves the accuracy of recovering algebraic equations. The results demonstrate that transforming data into dimensionless form significantly improves the training and test error of the symbolic expressions found. Then, as our main contribution, we perform nondimensionalization guided by Ipsen's method and incorporate the nondimensionalized equation into a pipeline that combines Universal Physics-Informed Neural Networks with symbolic regression to recover the unknown term when a differential equation is only partially known. We find that symbolic regression recovers the unknown term better after the data are nondimensionalized, under both noisy and noiseless conditions. These findings suggest that integrating dimensional analysis with symbolic regression can significantly lower computational cost and increase accuracy, providing a robust framework for the automated discovery of governing equations in complex systems when data is limited.
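To illustrate how the Buckingham Π theorem reduces the number of input variables, here is a minimal sketch. It is not the authors' code and it does not implement Ipsen's method. It uses the standard null-space formulation of the theorem: each dimensionless group is an exponent vector in the null space of the dimension matrix. The simple pendulum, the variable names, and the helper functions `null_space` and `pi_value` are assumptions chosen for illustration.

```python
from fractions import Fraction
import math


def null_space(rows):
    """Return a rational basis for the null space of a small integer matrix."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_rows, n_cols = len(m), len(m[0])
    pivots = []
    r = 0
    for c in range(n_cols):
        # Find a row at or below r with a nonzero entry in column c.
        p = next((i for i in range(r, n_rows) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        pivot = m[r][c]
        m[r] = [x / pivot for x in m[r]]
        # Eliminate column c from every other row (reduced row echelon form).
        for i in range(n_rows):
            if i != r and m[i][c] != 0:
                factor = m[i][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == n_rows:
            break
    free = [c for c in range(n_cols) if c not in pivots]
    basis = []
    for f in free:
        # Set one free variable to 1 and solve for the pivot variables.
        v = [Fraction(0)] * n_cols
        v[f] = Fraction(1)
        for row_idx, pc in enumerate(pivots):
            v[pc] = -m[row_idx][f]
        basis.append(v)
    return basis


# Hypothetical example: simple pendulum with period T, length L, mass m, gravity g.
variables = ["T", "L", "m", "g"]
dimension_matrix = [
    [0, 0, 1, 0],   # exponent of mass in each variable
    [0, 1, 0, 1],   # exponent of length
    [1, 0, 0, -2],  # exponent of time
]

# Four variables and rank three leave one dimensionless group.
pi_groups = null_space(dimension_matrix)


def pi_value(exponents, values):
    """Evaluate a dimensionless group as a product of powers."""
    return math.prod(v ** float(e) for v, e in zip(values, exponents))


# Synthetic data from the ideal pendulum law T = 2*pi*sqrt(L/g).
g = 9.81
pi_samples = []
for length in (0.5, 1.0, 2.0):
    period = 2 * math.pi * math.sqrt(length / g)
    pi_samples.append(pi_value(pi_groups[0], [period, length, 3.0, g]))

for name, exponent in zip(variables, pi_groups[0]):
    print(f"{name}^{exponent}")
print(pi_samples)
```

In this sketch the four dimensional inputs collapse to the single group T²g/L, which takes the same value for every sample. A symbolic regression tool would then only need to fit a constant instead of searching over expressions in four variables, which is the kind of search-space reduction the abstract describes.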