Enhancing Spatial Reasoning in Large Language Models for Metal-Organic Frameworks Structure Prediction

Metal-organic frameworks (MOFs) are porous crystalline materials with broad applications such as carbon capture and drug delivery, yet accurately predicting their 3D structures remains a significant challenge. While Large Language Models (LLMs) have shown promise in generating crystal structures, their application to MOFs is hindered by the high structural complexity of MOFs, which arises from the large number of atoms per unit cell. Inspired by the success of block-wise paradigms in deep generative models for MOFs, we introduce MOF-LLM, the first LLM framework specifically adapted for block-level MOF structure prediction. To effectively harness LLMs for this 3D modular assembly task, our training paradigm integrates spatial-aware continual pre-training (CPT), structural supervised fine-tuning (SFT), and matching-driven reinforcement learning (RL). By incorporating explicit spatial priors and optimizing structural stability via Soft Adaptive Policy Optimization (SAPO), our approach substantially enhances the spatial reasoning of a Qwen-3 8B model for MOF structure prediction. Comprehensive experiments demonstrate that MOF-LLM achieves state-of-the-art performance with a match rate of 35.78% while exhibiting superior sampling efficiency of 0.04 seconds per structure.