EpiEvolve: Self-Evolving Agents for Streaming Pandemic Forecasting under Regime Shifts

LLM-based epidemic forecasters are usually trained and evaluated as static supervised models, whereas operational pandemic forecasting is a streaming process in which labels arrive after predictions are made and disease regimes shift over time. We study this mismatch in weekly COVID-19 hospitalization trend forecasting across five variant regimes. We introduce EpiEvolve, a self-evolving agent that wraps an LLM forecaster trained on the warm-start period and keeps its weights fixed during streaming. EpiEvolve adapts by storing forecast outcomes in a hierarchical episodic memory, reflecting on delayed labels, retrieving cases relevant to the current regime, and distilling recurring errors into strategic rules. The resulting context lets the forecaster reuse its own past predictions and outcomes in later weeks while following a chronological protocol that prevents future leakage. On the streaming dataset, EpiEvolve reaches 0.629 average accuracy, compared with 0.561 for the static backbone and 0.325 for the external CDC ensemble, and shortens the recovery lag after regime shifts from 5 weeks to 2 weeks. Ablations show that reflection, strategic memory, and regime-aware retrieval each contribute to the gains.
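
The streaming loop described above can be illustrated with a minimal Python sketch. It is not the EpiEvolve implementation: the names Episode, Memory, run_stream, and majority_forecaster, the one-week label delay, the flat (non-hierarchical) memory, and the error-count threshold for rule distillation are all assumptions made for illustration. The frozen forecaster is modeled as a plain callable that is never updated; only the memory passed to it changes.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass
class Episode:
    week: int
    regime: str
    prediction: str
    label: Optional[str] = None  # unknown until the delayed label arrives
    reflection: str = ""


@dataclass
class Memory:
    episodes: List[Episode] = field(default_factory=list)
    rules: List[str] = field(default_factory=list)

    def retrieve(self, regime: str, k: int = 3) -> List[Episode]:
        """Return up to k resolved episodes, preferring the current regime."""
        resolved = [e for e in self.episodes if e.label is not None]
        same_regime = [e for e in resolved if e.regime == regime]
        pool = same_regime or resolved  # fall back to any resolved case
        return sorted(pool, key=lambda e: e.week, reverse=True)[:k]


# Hypothetical interface: (regime, retrieved cases, rules) -> trend label.
Forecaster = Callable[[str, List[Episode], List[str]], str]


def run_stream(
    stream: Sequence[Tuple[str, str]],  # one (regime, true_label) per week
    forecaster: Forecaster,
    label_delay: int = 1,      # assumed delay, in weeks
    rule_threshold: int = 2,   # assumed number of repeats before a rule is written
) -> Memory:
    memory = Memory()
    error_counts: Counter = Counter()
    for t, (regime, _) in enumerate(stream):
        # 1. Reveal only labels that have arrived by week t, then reflect.
        for ep in memory.episodes:
            if ep.label is None and ep.week + label_delay <= t:
                ep.label = stream[ep.week][1]
                if ep.prediction != ep.label:
                    ep.reflection = f"predicted {ep.prediction}, observed {ep.label}"
                    key = (ep.regime, ep.prediction, ep.label)
                    error_counts[key] += 1
                    # 2. Distill a recurring error into a strategic rule.
                    if error_counts[key] == rule_threshold:
                        memory.rules.append(
                            f"In regime {ep.regime}, '{ep.prediction}' calls "
                            f"tend to resolve as '{ep.label}'."
                        )
        # 3. Forecast week t from resolved cases and rules only.
        cases = memory.retrieve(regime)
        prediction = forecaster(regime, cases, list(memory.rules))
        memory.episodes.append(Episode(week=t, regime=regime, prediction=prediction))
    return memory


def majority_forecaster(regime: str, cases: List[Episode], rules: List[str]) -> str:
    """Toy stand-in for the frozen LLM: majority label of the retrieved cases."""
    if not cases:
        return "stable"
    return Counter(e.label for e in cases).most_common(1)[0][0]


toy_stream = [("delta", "up")] * 3 + [("omicron", "down")] * 3
toy_memory = run_stream(toy_stream, majority_forecaster)
print([e.prediction for e in toy_memory.episodes])
```

Because labels are revealed before the forecast for week t and only for episodes with week + label_delay <= t, the forecaster never sees an outcome from the current or a future week. In this toy stream the stand-in forecaster misses the first week after the regime change and recovers once that week's label arrives; this illustrates the mechanism only and says nothing about the reported results.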