Decision-Path Patterns as Tree Reliability Signals: Path-based Adaptive Weighting for Random Forest Classification

Year
2026

Abstract & full text: arxiv.org/abs/2605.20716

Abstract

Random forests construct each tree with a different, randomised representation of the feature space. Their uniform voting cannot correct errors in regions where trees with incorrect representations probabilistically outnumber correct ones, even when the ensemble collectively holds enough correct information; this is a reducible error, and it is the one this paper addresses. We propose using the structural pattern of each tree's decision path as an instance-adaptive reliability signal to identify the more reliable trees and weight them differentially. At inference, a random forest reaches its prediction through the root-to-leaf path the sample traverses in each tree, so path-level reliability offers a finer granularity than tree-level weighting can access. We show that this signal reflects the actual reliability of each tree's decision, and that using it yields a statistically significant accuracy improvement over the standard random forest (RF) on 36 binary classification benchmarks (Wilcoxon p < 0.0001). We also measure class-recall regression, the typical failure mode of RF correction methods: there are zero minority-recall regressions and a single majority-recall regression at the 0.2 pp threshold, indicating bias reduction rather than a class trade-off. We further quantify the reducible error accessible to the method from the fitted RF alone; this estimate correlates strongly with per-dataset gain (Pearson r = +0.840, p < 0.0001). On the qualifying group of datasets that this estimate identifies, the method delivers a mean accuracy improvement of +0.99 pp with strict wins on every dataset (7/0/0 win/tie/loss); an optional amplification mechanism raises this further to +1.48 pp.
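
To make the contrast between uniform voting and path-based weighting concrete, the following is a minimal illustrative sketch, not the paper's method. The abstract does not specify the structural signal it uses, so the `min_margin_reliability` function here is a hypothetical stand-in (the smallest distance between the sample and any split threshold on its path), and the `Node` class, the toy forest, and the sample `x` are likewise invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    # Internal node: split on x[feature] <= threshold. Leaf: feature is None.
    feature: Optional[int] = None
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    counts: tuple = (0, 0)  # training counts (class 0, class 1), used at leaves


def trace(root, x):
    """Follow x from the root to a leaf, recording each step of the path."""
    path, node = [], root
    while node.feature is not None:
        go_left = x[node.feature] <= node.threshold
        margin = abs(x[node.feature] - node.threshold)
        path.append((node.feature, go_left, margin))
        node = node.left if go_left else node.right
    return path, node


def min_margin_reliability(path, leaf):
    """HYPOTHETICAL reliability signal (an assumption, not from the paper):
    a path whose splits were all decided by a wide margin is trusted more."""
    return min(step[2] for step in path) if path else 1.0


def predict(forest, x, weight_fn=None):
    """Binary vote over trees; weight_fn=None gives plain uniform voting."""
    scores = [0.0, 0.0]
    for root in forest:
        path, leaf = trace(root, x)
        vote = 0 if leaf.counts[0] >= leaf.counts[1] else 1
        scores[vote] += 1.0 if weight_fn is None else weight_fn(path, leaf)
    return 0 if scores[0] >= scores[1] else 1


def stump(feature, threshold, left_counts, right_counts):
    """Build a depth-1 tree for the toy example."""
    return Node(feature, threshold, Node(counts=left_counts), Node(counts=right_counts))


# Toy forest: two trees route x across a threshold by a hair and vote class 0;
# one tree routes x far from its threshold and votes class 1.
forest = [
    stump(0, 0.50, (8, 2), (1, 9)),
    stump(1, 0.30, (2, 8), (7, 3)),
    stump(2, 0.20, (9, 1), (0, 10)),
]
x = [0.48, 0.33, 0.90]

uniform = predict(forest, x)
weighted = predict(forest, x, min_margin_reliability)
print("uniform vote:", uniform, "| path-weighted vote:", weighted)
```

In this toy case the two narrow-margin trees win the uniform vote, while the per-sample weights let the single wide-margin tree prevail. That is the kind of instance-level correction that a fixed per-tree weight cannot express, since the same tree may be reliable on one path and unreliable on another.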