Can Decision Trees Teach Large Language Models? Distilling Verbalized Knowledge for Molecular Property Prediction

Molecular Property Prediction (MPP) is a fundamental problem in drug discovery that has recently attracted growing attention. Large Language Models (LLMs), known for their impressive proficiency across domains, show promise as generalist models for MPP. However, their current performance remains below the threshold needed for practical adoption. To bridge this gap, we propose TreeKD, a method that distills the knowledge of tree-based specialist models into LLMs to complement their internal knowledge and improve their predictive accuracy. For each property, we train a specialist decision tree using features derived from 40K functional groups in the input molecules. The predictive rule learned by the decision tree, which encodes its knowledge, is then verbalized and incorporated into the prompts used for training LLMs. In addition, by replacing the single decision tree with a Random Forest, we introduce a test-time scaling technique called rule-consistency, which aggregates the predictions generated from multiple prompts, each constructed with a different rule. An extensive evaluation with two LLMs, Gemma-2-2B and Granite-3.3-2B, on the TDC benchmark with 22 prediction tasks shows that our method substantially enhances the performance of LLMs, advancing the development of generalist models for MPP.
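To make the two ideas above concrete, the following is a minimal sketch of rule verbalization and rule-consistency, not the TreeKD implementation. It rests on several assumptions: the tree is a hand-built toy structure rather than a trained model, the verbalized rule is the decision path followed by the input molecule, the prompt wording is invented, aggregation is a majority vote over binary labels, and `echo_llm` is a hypothetical stand-in for a real LLM call. All names (`Leaf`, `Node`, `verbalize_rule`, `rule_consistency`, `echo_llm`) are illustrative.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Set, Tuple, Union


@dataclass
class Leaf:
    label: int  # predicted class, e.g. 1 = active, 0 = inactive


@dataclass
class Node:
    group: str      # functional group whose presence is tested at this node
    absent: Tree    # subtree followed when the group is absent
    present: Tree   # subtree followed when the group is present


Tree = Union[Leaf, Node]


def verbalize_rule(tree: Tree, groups: Set[str]) -> Tuple[str, int]:
    """Walk the decision path for one molecule and describe it in words."""
    conditions = []
    node = tree
    while isinstance(node, Node):
        has_group = node.group in groups
        verb = "contains" if has_group else "does not contain"
        conditions.append(f"{verb} the {node.group} group")
        node = node.present if has_group else node.absent
    if not conditions:  # degenerate tree consisting of a single leaf
        return f"The tree predicts class {node.label} for every molecule.", node.label
    rule = "The molecule " + " and ".join(conditions)
    return f"{rule}, so the tree predicts class {node.label}.", node.label


def rule_consistency(forest: List[Tree], groups: Set[str], smiles: str,
                     llm: Callable[[str], int]) -> int:
    """Query the LLM once per tree-derived rule and majority-vote the answers."""
    votes = []
    for tree in forest:
        rule, _ = verbalize_rule(tree, groups)
        # Hypothetical prompt template; the actual wording is an assumption.
        prompt = f"Molecule: {smiles}\nHint: {rule}\nAnswer with 0 or 1."
        votes.append(llm(prompt))
    return Counter(votes).most_common(1)[0][0]


def echo_llm(prompt: str) -> int:
    """Stand-in for an LLM: it simply repeats the class named in the hint."""
    return int(prompt.split("predicts class ")[1][0])


# Toy forest of three trees over presence/absence of functional groups.
forest: List[Tree] = [
    Node("hydroxyl", absent=Leaf(0),
         present=Node("nitro", absent=Leaf(1), present=Leaf(0))),
    Node("amine", absent=Leaf(0), present=Leaf(1)),
    Node("carboxyl", absent=Leaf(0), present=Leaf(1)),
]
molecule_groups = {"hydroxyl", "amine"}  # assumed output of a group extractor

first_rule, first_label = verbalize_rule(forest[0], molecule_groups)
prediction = rule_consistency(forest, molecule_groups, "NCCO", echo_llm)
```

In practice the trees would come from a trained Random Forest and the stub would be replaced by a fine-tuned LLM. For regression tasks, the majority vote would presumably give way to a mean or median, which is again an assumption rather than something stated above.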