From Silos to Systems: Process-Oriented Hazard Analysis for AI Systems

To effectively address potential harms from Artificial Intelligence (AI) systems, it is essential to identify and mitigate system-level hazards. Current analysis approaches focus on individual components of an AI system, such as training data or models, in isolation, overlooking hazards that arise from component interactions or from how components are situated within a company's development process. To address this gap, we draw from the established field of system safety, which treats safety as an emergent property of the entire system. In this work, we translate System-Theoretic Process Analysis (STPA), a recognized system safety framework, to the analysis of AI development and operation processes. We focus on systems that rely on machine learning algorithms and conduct STPA on three case studies involving linear regression, reinforcement learning, and transformer-based generative models. Our analysis explores how STPA's control and system-theoretic perspectives apply to AI systems and whether unique AI traits (such as model opacity, capability uncertainty, and output complexity) necessitate modifications to the framework. We find that the key concepts and steps of conducting an STPA apply to AI systems but require targeted adaptations to address AI-specific challenges, which arise to differing degrees across the three case studies. We present the Process-oriented Hazard Analysis for AI Systems (PHASE) as a guideline that adapts STPA concepts for AI. Applying and interpreting STPA using the PHASE guidelines enables four key affordances for analysts responsible for managing AI system harms: 1) detection of system-level hazards, including those arising from the accumulation of disparate issues; 2) explicit acknowledgment of social factors contributing to algorithmic harms; 3) creation of traceable accountability chains between harms and those who can mitigate them; and 4) ongoing monitoring and mitigation of new hazards.