Navigating Demand Uncertainty in Container Shipping: Deep Reinforcement Learning for Enabling Adaptive and Feasible Master Stowage Planning

Reinforcement learning (RL) has successfully solved various deterministic and stochastic planning problems. However, conventional RL struggles with complex real-world constraints, particularly when feasibility is explicit and depends on the current state or trajectory. In this work, we address stochastic sequential decision-making with state-dependent constraints through a real-world case study of the master stowage planning problem in container shipping, which aims to optimize revenue and costs under demand uncertainty and operational constraints. We propose a deep RL framework with an encoder-decoder model that integrates problem instance, solution, and uncertainty information to guide planning. We introduce differentiable projection layers that enforce convex polyhedral constraints, while Jacobian corrections offset the projections to yield unbiased policy gradient estimates. Experiments show that our model efficiently finds adaptive, feasible solutions that generalize across distribution shifts and scale to longer planning horizons, outperforming state-of-the-art baselines in constrained RL and stochastic programming. As such, our policies enable adaptive, uncertainty-aware planning that can support resilient and sustainable supply chains.
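To make the idea of a differentiable projection onto a convex polyhedron concrete, the following is a minimal NumPy sketch under stated assumptions. It is not the layer proposed in this work. It assumes the feasible set has the form {x : Ax <= b} and restores feasibility by gradient descent on the squared constraint violation, which yields a feasible point but not necessarily the Euclidean projection. The names `violation_projection` and `step_jacobian`, the step size, and the tolerance are hypothetical choices made for illustration. The second function returns the Jacobian of a single update step, the kind of quantity a change-of-variables correction to action log-probabilities would need.

```python
import numpy as np


def violation_projection(x, A, b, iters=500, tol=1e-9):
    """Move x toward {x : A x <= b} by gradient descent on
    0.5 * ||max(A x - b, 0)||^2 (illustrative, not an exact projection)."""
    x = np.asarray(x, dtype=float).copy()
    # 1 / ||A||_2^2 is a safe step size: the gradient is ||A||_2^2-Lipschitz.
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    for _ in range(iters):
        violation = np.maximum(A @ x - b, 0.0)  # zero where a constraint holds
        if violation.max() <= tol:
            break
        x = x - step * (A.T @ violation)  # gradient of the violation penalty
    return x


def step_jacobian(x, A, b):
    """Jacobian of one update step at x: I - step * A^T diag(active) A,
    where `active` marks the constraints violated at x."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    active = (A @ x > b).astype(float)
    return np.eye(A.shape[1]) - step * (A.T * active) @ A
```

For example, with the constraints x1 + x2 <= 1, x1 >= 0, x2 >= 0 and the infeasible point (0.9, 0.8), the sketch moves the point along the normal of the violated constraint until it reaches approximately (0.55, 0.45); a point that is already feasible is returned unchanged. Each step is piecewise linear in x, so automatic differentiation frameworks can backpropagate through the loop.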