Exploring LLM Agent Designs and Interaction Modalities for Scientific Visualization

This paper examines how large language model (LLM) agents perform on scientific visualization (SciVis) tasks that require generating visualization workflows from natural-language instructions. We compare three representative agent designs (domain-specific agents with structured tool use, computer-use agents, and general-purpose coding agents) across 15 benchmark tasks. We evaluate visualization quality, efficiency, robustness, computational cost, and the impact of persistent memory. We further study interaction modalities, including code scripts, Model Context Protocol (MCP) or API calls, command-line interfaces (CLI), and graphical user interfaces (GUI). Our goal is to characterize how representative SciVis agent configurations used in practice compare with one another. The results reveal clear tradeoffs across agent designs and interaction modalities. General-purpose coding agents achieve the highest task success rates but incur greater computational cost, whereas domain-specific agents are more efficient and stable but less flexible. Computer-use agents perform well on individual operations but struggle with multi-step workflows. In both CLI- and GUI-based settings, persistent memory improves performance over repeated trials, but its effectiveness depends on the interaction mode and the quality of feedback. These findings suggest that future SciVis systems should combine structured tool use, interactive capabilities, and adaptive memory mechanisms to balance performance, robustness, and flexibility.