Failure Domains

Thomas Rocha · February 2026

The Seven Convergent Failure Domains

A cross-domain glossary mapping 61 industry-specific terms to seven stable failure domains. Industries that operate large-scale distributed systems routinely invent local vocabulary to describe failures they can observe but cannot fully control. The vocabulary changes. The underlying failure topology does not.

Part I: The Seven Domains

These are not seven problems. They are seven locations where independently recognized industry failures accumulate because distributed systems lack a shared coordination primitive. Each domain has its own regulatory vocabulary, its own incident history, and its own remediation industry. None of those remediation efforts resolves the underlying condition because they address symptoms at the layer where the symptom is visible, not at the layer where the failure originates.

Domain 1: Accessibility

The convergence point where assistive technology requirements, real-time communication accommodations, and regulatory compliance mandates (ADA, EAA, WCAG, Section 508) persistently fail despite individual component compliance. The structural problem is that accessibility obligations attach to the experience, not the component. Compliance is evaluated at the component level. Failure occurs at the coordination level.

Domain 2: Zero Trust Security

The convergence point where continuous authentication, microsegmentation, least-privilege enforcement, and identity verification systematically underperform their design promises. Security posture is enforced per-hop rather than per-coordination-event, creating gaps between enforcement points where authority is assumed rather than verified. Dashboards report compliance. The coordination layer operates outside what they measure.

Domain 3: AI Coordination

The convergence point where multi-agent orchestration, context sharing, tool coordination, state synchronization, and agent lifecycle management produce compounding failures as systems scale. The failures are not model failures. They are coordination failures that current architectures have no mechanism to detect, bound, or prevent.

Domain 4: Data Residency and Sovereignty

The convergence point where jurisdictional compliance requirements, cross-border data flow restrictions, localization mandates, and regulatory audit obligations cannot be reliably satisfied in distributed systems. No current mechanism enforces sovereignty rules at the moment the coordination decision is made, in real time, for the specific data involved in the specific operation.

Domain 5: Mobile Network Complexity

The convergence point where 5G network slicing, edge compute placement, handoff persistence, roaming continuity, and multi-access convergence fail to deliver the seamless coordination experience their architectures promise. The network kept the pipe open. Nobody kept track of what was flowing through it or why.

Domain 6: Efficiency Paradox

The convergence point where adding coordination infrastructure makes distributed systems more expensive, slower, and more fragile rather than more capable. Coordination overhead between agents grows super-linearly with the number of agents, at an exponent of approximately 1.724, so costs compound faster than capability. Organizations discover, after deployment at scale, that the infrastructure built to automate work costs more to operate than the manual processes it replaced.

Domain 7: Concurrency Control

The convergence point where distributed state management, race conditions, checkpoint failures, split-brain scenarios, and eventual consistency drift produce data integrity failures that scale with system complexity. The problem is not that individual components handle concurrency poorly. It is that no layer arbitrates concurrency across the full coordination context.

New terms will continue to emerge. They will map to these seven domains because there are only so many ways coordination fails when no layer owns it.

Part II: Industry Terminology Index

61 terms representing the current industry vocabulary for symptoms, remediation attempts, and failure conditions that map to the seven convergent failure domains.

Distribution: Zero Trust Security (12) · AI Coordination (11) · Concurrency Control (10) · Efficiency Paradox (9) · Data Residency and Sovereignty (7) · Mobile Network Complexity (7) · Accessibility (5)

Agent Sprawl · AI Coordination

The uncontrolled proliferation of autonomous AI agents across an enterprise without unified visibility, governance, or coordination. Salesforce's 2026 Connectivity Benchmark found the average enterprise runs 12 agents (projected to reach 20 by 2027) while only 27% of applications are connected.

Agent Washing · AI Coordination

The practice of vendors rebranding existing automation, chatbots, or workflow tools as "agentic AI" without genuine autonomous capability. Industry analysts estimate that only about 130 of the thousands of vendors claiming to offer AI agents are building genuinely agentic systems.

Agent-to-Agent Protocol Gap · AI Coordination

The absence of standardized communication protocols between AI agents from different vendors or frameworks. Google's Agent2Agent (A2A) protocol and Anthropic's Model Context Protocol (MCP), which connects agents to tools and data sources rather than to each other, represent early standardization attempts, but the lack of coordination-layer authority means agents can exchange messages without shared governance over what those messages authorize.

Algorithmic Legitimacy · Zero Trust Security

The condition where credibility is inferred from visibility and engagement metrics rather than institutional integrity or verified authority. In distributed systems, this manifests when orchestration layers grant authority based on connectivity or API access rather than verified coordination context.

Autonomous Join Velocity · AI Coordination

The rate at which autonomous AI agents join distributed coordination contexts, sessions, and workflows without requiring explicit human authorization at each join event. Autonomous Join Velocity is not primarily an authentication problem. It is a coordination boundary problem: when agents join at machine speed without a shared structure defining what joining means, what authority it confers, and what obligations it creates, the distinction between authorized and unauthorized participation becomes unenforceable by design.

Boundary Confusion · AI Coordination

The failure condition where AI agents in a multi-agent system develop overlapping, conflicting, or undefined operational boundaries. Without explicit role definitions scoped to coordination context, agents make assumptions about their responsibilities that produce structural hallucinations in complex outputs.

Capability Saturation · Efficiency Paradox

The empirically observed threshold beyond which adding more agents yields diminishing or negative returns: once a single agent already exceeds approximately 45% accuracy on a task, further agents stop improving the result. A quantitative manifestation of the Efficiency Paradox.

Cascading Failure · Concurrency Control

A chain reaction where the failure of one component triggers failures in dependent components across a distributed system. In architectures without coordination boundaries, cascading failures propagate unpredictably because no coordination layer defines failure boundaries or isolation scopes per coordination context.
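
A circuit breaker is one widely used way to give a single dependency an explicit isolation scope. The minimal Python sketch below is a generic pattern, not the coordination-layer mechanism this glossary argues is missing; `CircuitBreaker`, `CircuitOpenError`, and the default thresholds are hypothetical. After a run of consecutive failures it rejects calls outright for a cooling-off period, so callers fail fast instead of piling onto a failing component.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the breaker is open."""


class CircuitBreaker:
    """Hypothetical minimal breaker: after `max_failures` consecutive failures,
    reject calls for `reset_after` seconds so the failure stays local."""

    def __init__(self, max_failures: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("dependency isolated; failing fast")
            self.opened_at = None  # half-open: allow one trial call through
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()  # (re)open the breaker
            raise
        self.failures = 0  # a success closes the breaker
        return result
```

The injectable `clock` exists only so the behavior can be tested deterministically. The breaker isolates one call path; it has no view of the wider coordination context, which is the gap the entry describes.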

Checkpoint Failure · Concurrency Control

The inability to save and restore consistent state at defined points during distributed operations. Most agentic AI frameworks lack safe checkpoint mechanisms, meaning that if an agent needs to pause, wait for external input, or recover from failure, no reliable restoration point exists.
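
At the level of a single process, a safe checkpoint usually means an atomic write: a crash during saving must leave the previous restoration point intact. The sketch below assumes agent state fits in a JSON-serializable dictionary stored in a local file; `save_checkpoint` and `load_checkpoint` are hypothetical names. It writes to a temporary file in the same directory, flushes it to disk, and then renames it over the old checkpoint with `os.replace`.

```python
import json
import os
import tempfile
from typing import Any, Dict


def save_checkpoint(path: str, state: Dict[str, Any]) -> None:
    """Write state to a temp file, fsync it, then rename it over the old
    checkpoint, so a crash mid-write never leaves a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())  # make sure the bytes reach the disk
        os.replace(tmp_path, path)  # atomic rename on POSIX file systems
    except BaseException:
        os.unlink(tmp_path)  # discard the incomplete temp file
        raise


def load_checkpoint(path: str) -> Dict[str, Any]:
    """Restore the most recent complete checkpoint."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

This covers durability for one agent only. Restoring a consistent state across several agents additionally requires agreeing on which checkpoints belong together, which a local file write cannot provide.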

Cloud Dependency Risk · Data Residency and Sovereignty

The systemic vulnerability created by organizational reliance on a single or small number of cloud providers for critical infrastructure. The EU's DORA regulation specifically targets this risk in financial services.

Complexity Debt · Efficiency Paradox

The accumulated architectural burden from layering coordination mechanisms (service meshes, API gateways, orchestrators, middleware) atop systems that lack a native coordination primitive. Unlike technical debt, complexity debt compounds non-linearly because each remediation layer itself requires coordination.

Concentration Risk · Data Residency and Sovereignty

Regulatory and operational term for the danger of critical systems or data depending on a small number of infrastructure providers. DORA and NIS 2 regulations specifically address concentration risk in cloud and telecommunications dependencies.

Context Fragmentation · AI Coordination

The degradation of shared context when computational resources are distributed across multiple agents. Under a fixed computational budget, splitting the work leaves each agent with less capacity for tool orchestration than a single agent maintaining a unified memory stream would have.

Context Window Collision · Concurrency Control

The conflict that arises when multiple AI agents or processes attempt to operate on overlapping context windows without coordination, producing inconsistent reasoning based on divergent information states.

Continuous Verification Fatigue · Zero Trust Security

The operational and computational burden of re-authenticating and re-authorizing every transaction in a Zero Trust architecture without coordination-scoped trust caching. Systems oscillate between excessive verification and insufficient verification because no coordination primitive defines appropriate verification scope and duration.
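
One way to read "coordination-scoped trust caching" is a verification cache keyed on both the principal and the coordination context, with a time-to-live. The Python sketch below is a hypothetical illustration under that assumption: `ScopedVerificationCache`, the `verify` callback, and the TTL value are placeholders, not an existing API. Within one context a principal is verified once; a different context, an expired entry, or closing the context forces re-verification.

```python
import time
from typing import Callable, Dict, Tuple


class ScopedVerificationCache:
    """Hypothetical sketch: cache a verification result per
    (principal, coordination context) for `ttl` seconds."""

    def __init__(self, verify: Callable[[str, str], bool], ttl: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.verify = verify
        self.ttl = ttl
        self.clock = clock
        self._cache: Dict[Tuple[str, str], Tuple[bool, float]] = {}

    def is_authorized(self, principal: str, context_id: str) -> bool:
        key = (principal, context_id)
        now = self.clock()
        hit = self._cache.get(key)
        if hit is not None and now - hit[1] < self.ttl:
            return hit[0]  # still fresh within this context: skip re-verification
        result = self.verify(principal, context_id)  # the expensive check
        self._cache[key] = (result, now)
        return result

    def end_context(self, context_id: str) -> None:
        """Drop every cached result when the coordination context closes."""
        for key in [k for k in self._cache if k[1] == context_id]:
            del self._cache[key]
```

Both positive and negative results are cached here for simplicity; a real deployment might cache only grants, or use a shorter TTL for denials.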

Coordination Overhead · Efficiency Paradox

The measurable computational and temporal cost of managing communication between distributed agents or services. Research shows this overhead grows super-linearly as agents are added (an exponent of approximately 1.724), and in current architectures three to four agents represent a practical ceiling before coordination costs exceed coordination value.
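
To make the exponent concrete, the short Python sketch below assumes overhead follows a pure power law, cost ∝ n^1.724, with the constant factor dropped. That functional form is an assumption for illustration; only the exponent comes from the text, and `relative_overhead` is a hypothetical helper.

```python
EXPONENT = 1.724  # super-linear exponent quoted in the text


def relative_overhead(n_agents: int, exponent: float = EXPONENT) -> float:
    """Coordination overhead relative to a single agent, assuming cost ∝ n**exponent."""
    return n_agents ** exponent


for n in (1, 2, 4, 8):
    total = relative_overhead(n)
    # Total overhead and overhead per agent, both relative to one agent.
    print(f"{n} agents: total {total:5.1f}x, per agent {total / n:4.2f}x")
```

Under this assumption, doubling the agent count multiplies total overhead by 2^1.724 ≈ 3.3, and overhead per agent also rises (as n^0.724) rather than staying flat, which is what super-linear growth means in practice.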

Coordination Tax · Efficiency Paradox

The aggregate cost imposed on distributed systems by the absence of a native coordination primitive. Every interaction that requires synchronization, state sharing, authority verification, or conflict resolution across distributed participants pays this tax through latency, compute overhead, integration complexity, and failure surface area.

Coordination Transparency · AI Coordination

A governance mechanism, proposed in a 2026 Springer publication, that targets agent-to-agent interactions through interaction logging, live coordination monitoring, intervention hooks, and boundary conditions. It addresses monitoring rather than the underlying coordination primitive.

Cost Surprise · Efficiency Paradox

The phenomenon where enterprises discover that AI orchestration at scale costs more than the manual processes it replaced. Thousands of LLM calls per process, each with variable latency and cost, compound without per-operation cost tracking.

Cross-Border Data Flow Restrictions · Data Residency and Sovereignty

Regulatory controls limiting or conditioning the transfer of data across national or jurisdictional boundaries. The US DOJ Rule, China's CSL/DSL/PIPL, and EU data protection frameworks all impose distinct and sometimes conflicting requirements.

Data Fragmentation · Data Residency and Sovereignty

The condition where organizational data exists across disconnected systems without unified access or governance. Salesforce's 2026 benchmark found the average organization manages 957 applications with only 27% connected.

Data Localization Mandates · Data Residency and Sovereignty

Legal requirements that specific categories of data must be stored and/or processed within defined geographic boundaries. Real-time coordination decisions must enforce localization dynamically.
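
Enforcing a mandate dynamically can be pictured as a check evaluated at the moment an operation is routed, for the specific data category and target region involved. The Python sketch below is hypothetical throughout: the policy table, region names, and `authorize` function are illustrative placeholders and do not encode any real law. Unknown categories are denied by default.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet

# Hypothetical policy table: data category -> regions where processing is allowed.
# Illustrative only; real localization mandates are far more nuanced.
LOCALIZATION_POLICY: Dict[str, FrozenSet[str]] = {
    "health_record": frozenset({"eu-central", "eu-west"}),
    "payment_card": frozenset({"eu-central", "eu-west", "us-east"}),
}


@dataclass(frozen=True)
class Operation:
    data_category: str
    target_region: str


class LocalizationViolation(Exception):
    pass


def authorize(op: Operation) -> None:
    """Evaluate the mandate when the routing decision is made, for this data
    and this operation; categories missing from the table are denied."""
    allowed = LOCALIZATION_POLICY.get(op.data_category, frozenset())
    if op.target_region not in allowed:
        raise LocalizationViolation(
            f"{op.data_category} may not be processed in {op.target_region}")
```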

Deadlock · Concurrency Control

A condition where two or more distributed processes each hold resources the others need, creating a permanent standstill. In distributed systems without coordination boundaries, deadlocks become harder to detect and resolve because no coordination layer has visibility into the full dependency graph.
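
The standard detection technique is to build a wait-for graph, with an edge from each process to the processes holding resources it needs, and look for a cycle. The Python sketch below assumes the complete graph is already available in one place; assembling that global view across nodes is exactly the visibility the entry says is missing. `find_deadlock` is a hypothetical helper using depth-first search.

```python
from typing import Dict, List, Optional, Set


def find_deadlock(waits_for: Dict[str, Set[str]]) -> Optional[List[str]]:
    """Return one cycle in a wait-for graph (process -> processes it waits on),
    or None. A cycle means each process in it waits on the next: a deadlock."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, fully explored
    color: Dict[str, int] = {}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GREY
        path.append(node)
        for nxt in sorted(waits_for.get(node, ())):
            state = color.get(nxt, WHITE)
            if state == GREY:  # back edge: nxt is already on the current path
                return path[path.index(nxt):] + [nxt]
            if state == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for start in sorted(waits_for):
        if color.get(start, WHITE) == WHITE:
            cycle = visit(start)
            if cycle:
                return cycle
    return None
```

For example, `find_deadlock({"A": {"B"}, "B": {"C"}, "C": {"A"}})` returns `["A", "B", "C", "A"]`, while an acyclic graph returns `None`.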

Digital Accessibility Debt · Accessibility

The accumulated backlog of accessibility deficiencies across digital systems. In distributed systems, accessibility debt compounds because each component may individually meet standards while the coordinated experience fails to maintain accommodation state across transitions.

Distributed State Divergence · Concurrency Control

The condition where agents or services operating in parallel develop inconsistent representations of shared state. The core concurrency failure in multi-agent systems.
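
Vector clocks are a standard way to detect this divergence, though not to resolve it. Each replica keeps a counter per participant; if neither clock is less than or equal to the other in every position, the two states were updated concurrently and have diverged. The Python sketch below uses hypothetical names (`compare`, `tick`, and the agent identifiers in the example).

```python
from typing import Dict

VectorClock = Dict[str, int]


def compare(a: VectorClock, b: VectorClock) -> str:
    """Return 'equal', 'before', 'after', or 'concurrent'. 'concurrent' means
    neither side has seen all of the other's updates: the states diverged."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"


def tick(clock: VectorClock, node: str) -> VectorClock:
    """Record one local update made by `node`, returning a new clock."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated
```

Starting from `base = {"agent-a": 1, "agent-b": 1}`, `compare(tick(base, "agent-a"), tick(base, "agent-b"))` returns `"concurrent"`: each agent has made an update the other has not seen.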

Edge Compute Isolation · Mobile Network Complexity

The architectural gap where processing distributed to network edge nodes loses coordination context with centralized or peer systems. Edge deployments optimize latency but fragment state, creating islands of computation that cannot maintain coherent coordination across network transitions.

Error Propagation · Concurrency Control

The spreading of failures across distributed agent pipelines or service chains. In multi-agent systems, errors in one agent's output become corrupted inputs for downstream agents, compounding inaccuracies through the processing chain.

Eventual Consistency Drift · Concurrency Control

The temporal window during which distributed replicas hold different values, so that coordination decisions based on stale state produce incorrect outcomes. Without coordination governance, the scope and duration of that window cannot be bounded per coordination context.
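
A bounded-staleness read is one way to put a limit on that window for a given decision. The Python sketch below is an assumption-laden illustration: it treats staleness as time since the replica last synced, assumes the reader's clock is comparable to the replica's, and uses hypothetical names (`ReplicaRead`, `read_with_staleness_bound`, `StaleReadError`). A read that exceeds the bound is refused rather than silently used.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class ReplicaRead:
    value: Any
    last_synced: float  # when this replica last applied upstream updates


class StaleReadError(Exception):
    pass


def read_with_staleness_bound(read: Callable[[], ReplicaRead],
                              max_staleness: float,
                              clock: Callable[[], float] = time.time) -> Any:
    """Accept a replica's value only if it synced within `max_staleness`
    seconds; otherwise refuse, so no decision runs on state older than the bound."""
    result = read()
    age = clock() - result.last_synced
    if age > max_staleness:
        raise StaleReadError(f"replica is {age:.1f}s stale (bound {max_staleness}s)")
    return result.value
```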

FinOps for Agents · Efficiency Paradox

The emerging discipline of treating AI agent cost optimization as a first-class architectural concern. Includes heterogeneous model routing, strategic caching, request batching, and per-operation cost tracking. A remediation practice that addresses Efficiency Paradox symptoms without resolving the underlying coordination primitive absence.

Governance Illusion · Zero Trust Security

The condition where interfaces and dashboards suggest security control while algorithmic coordination unfolds beyond effective intervention. Transparency tooling becomes performative, creating documentation without practical oversight.

Handoff Persistence Failure · Mobile Network Complexity

The loss of coordination state or identity when a connection transitions between network cells, access technologies, or edge nodes. Transport-layer continuity does not guarantee coordination-layer continuity.

Identity Sprawl · Zero Trust Security

The proliferation of identity credentials, tokens, and authentication contexts across distributed systems without unified lifecycle management. Each service, agent, and integration point maintains its own identity context, creating an unauditable web of access grants that undermines Zero Trust principles.

Integration Tax · Efficiency Paradox

The recurring cost of connecting, maintaining, and synchronizing integrations between distributed systems that lack a common coordination primitive. Unlike one-time implementation costs, the integration tax compounds as systems scale and integration points multiply.

Jurisdictional Collision · Data Residency and Sovereignty

The conflict that arises when a single distributed operation spans multiple legal jurisdictions with incompatible data governance requirements. A coordination event may simultaneously be subject to GDPR, the DOJ Rule, and local data protection laws with contradictory mandates.

Last-Write-Wins Corruption · Concurrency Control

Data loss or inconsistency caused by concurrent writes where the final write overwrites previous valid state without conflict detection or resolution. A common failure mode in distributed systems that lack coordination-scoped arbitration.
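
Optimistic concurrency control with version numbers is one common alternative to last-write-wins: each write states the version it was based on, and a write based on an outdated version is rejected. The Python sketch below is a single-process illustration with hypothetical names (`VersionedStore`, `VersionConflict`); a distributed store would need the same compare-and-set performed atomically on the authoritative copy.

```python
import threading
from typing import Any, Tuple


class VersionConflict(Exception):
    pass


class VersionedStore:
    """Single-key store with optimistic concurrency: every write names the
    version it read, so a write based on stale state is rejected instead
    of silently overwriting a newer value."""

    def __init__(self, value: Any) -> None:
        self._lock = threading.Lock()
        self._value = value
        self._version = 0

    def read(self) -> Tuple[Any, int]:
        with self._lock:
            return self._value, self._version

    def write(self, value: Any, expected_version: int) -> int:
        with self._lock:  # check and update as one atomic step
            if expected_version != self._version:
                raise VersionConflict(
                    f"expected v{expected_version}, found v{self._version}")
            self._value = value
            self._version += 1
            return self._version
```

A rejected writer is expected to re-read, reconcile, and retry, which turns a silent overwrite into an explicit conflict.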

Lateral Movement · Zero Trust Security

An attacker's ability to move between systems, services, or network segments after gaining initial access. Without dynamic coordination context, segmentation policies cannot adapt to real-time distributed operations.

Microsegmentation Drift · Zero Trust Security

The gradual divergence between defined network segmentation policies and actual traffic patterns in distributed systems. Without dynamic segmentation tied to coordination context, security posture degrades silently as the operational reality outpaces policy definitions.

Model Collapse Propagation · AI Coordination

The risk that AI model degradation (from training on AI-generated data) compounds across multi-agent systems where agents consume each other's outputs. Without coordination-scoped provenance tracking, the system cannot distinguish between original and synthetic data as it flows through coordination chains.
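
Provenance tracking of the kind the entry describes can be sketched by attaching lineage metadata to every artifact that flows between agents. In the hypothetical Python sketch below (`Artifact`, `human_input`, and `agent_output` are illustrative names), each agent output inherits the union of its inputs' source identifiers and is marked synthetic, so a downstream consumer can always tell original material from model-generated material.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Artifact:
    content: str
    sources: FrozenSet[str]  # ids of the original, non-synthetic inputs
    synthetic: bool          # True if any model produced this artifact


def human_input(content: str, source_id: str) -> Artifact:
    """Wrap original material with its own source id."""
    return Artifact(content, frozenset({source_id}), synthetic=False)


def agent_output(content: str, *inputs: Artifact) -> Artifact:
    """An agent's output inherits every upstream source and is always synthetic."""
    sources = frozenset().union(*(a.sources for a in inputs))
    return Artifact(content, sources, synthetic=True)
```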

Multi-Access Edge Computing (MEC) Silos · Mobile Network Complexity

The isolation of processing capabilities deployed at network edge locations, where each MEC node operates as an independent compute island. Applications spanning multiple edge nodes lose coordination coherence because no coordination layer bridges edge-local optimization and end-to-end requirements.

Multi-Agent Hallucination · AI Coordination

Confident but fabricated outputs that emerge specifically from coordination failures between AI agents rather than individual model limitations. Individual agents may function perfectly in isolation while the coordinated output is wrong.

Multimodal Accessibility Failure · Accessibility

The inability to maintain consistent accommodation state when a coordination event spans multiple interaction modalities (voice, text, video, haptic). Each modality may individually comply with accessibility standards while transitions between modalities drop accommodation context.

Network Slicing Fragmentation · Mobile Network Complexity

The coordination failure where 5G network slices, each optimized for specific service characteristics, cannot maintain unified coordination state across slice boundaries.

Orchestration Debt · Efficiency Paradox

The technical and operational burden accumulated from deploying coordination mechanisms without an underlying coordination primitive. Each layer addresses a specific symptom while adding to total coordination overhead, creating compounding debt that makes the system progressively harder to modify, debug, or scale.

Overlay Solution Fragility · Accessibility

The inherent brittleness of accessibility solutions applied as an overlay atop applications rather than integrated into the coordination architecture. Overlay tools break when the underlying application's state changes in ways the overlay cannot track.

Overpermissioning · Zero Trust Security

The granting of excessive access rights to AI agents, services, or users beyond what is required for their specific coordination context. Without coordination-scoped least-privilege enforcement, permissions are granted broadly and persist beyond their intended context.
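
Coordination-scoped least privilege can be sketched by keying every grant on the coordination context as well as the principal, and revoking all of a context's grants when it closes. The Python sketch below is hypothetical (`ContextScopedGrants` and the principal, context, and permission strings are placeholders); it shows the data model, not an authorization framework.

```python
from dataclasses import dataclass, field
from typing import Set, Tuple


@dataclass
class ContextScopedGrants:
    """Permissions belong to (principal, context) pairs, never to a principal
    globally, and are revoked in bulk when the coordination context closes."""
    _grants: Set[Tuple[str, str, str]] = field(default_factory=set)

    def grant(self, principal: str, context_id: str, permission: str) -> None:
        self._grants.add((principal, context_id, permission))

    def allowed(self, principal: str, context_id: str, permission: str) -> bool:
        return (principal, context_id, permission) in self._grants

    def close_context(self, context_id: str) -> None:
        # Nothing granted for this context outlives it.
        self._grants = {g for g in self._grants if g[1] != context_id}
```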

Policy Fragmentation · Zero Trust Security

The condition where security policies are defined and enforced inconsistently across different layers, services, and enforcement points in a distributed system. Network, identity, data, and application policies each operate with independent logic, creating gaps where no single policy authority governs the full coordination context.

Protocol Translation Overhead · Mobile Network Complexity

The computational and latency cost of converting between different network protocols as communication traverses heterogeneous transport layers. Each translation point introduces delay and potential state loss.

Race Condition · Concurrency Control

A timing-dependent failure where the outcome of distributed operations depends on the unpredictable sequence in which concurrent processes execute. In systems without shared coordination primitives, race conditions are endemic because no coordination layer arbitrates ordering or priority among concurrent participants.

Real-Time Captioning Failure · Accessibility

The breakdown of live captioning, transcription, or sign language interpretation services during distributed communication. These failures occur not because the captioning technology is inadequate but because the coordination architecture cannot maintain synchronization between the primary communication stream and the accommodation stream.

Regulatory Fragmentation · Data Residency and Sovereignty

The proliferation of overlapping, sometimes contradictory regulatory frameworks across jurisdictions. With GDPR, DORA, NIS 2, CCPA, the DOJ Rule, China's CSL/DSL/PIPL, and dozens of national data protection laws, distributed systems face a compliance landscape that cannot be navigated through static configuration.

Retroactive Remediation Trap · Accessibility

The increasingly costly cycle of discovering and fixing accessibility failures after deployment rather than building accessibility into the coordination architecture. Each remediation addresses a specific symptom but does not resolve the underlying coordination gap.

Roaming State Loss · Mobile Network Complexity

The loss of coordination context, preferences, or state when a mobile user or device transitions between network operators or roaming agreements. Transport-layer roaming protocols maintain connectivity but do not preserve the coordination-layer state required for continuous application-level coherence.

Rule of Four · Efficiency Paradox

The empirically observed limit that constrains effective multi-agent teams to approximately three to four agents before coordination overhead exceeds the value of added reasoning. A quantitative boundary condition of the Efficiency Paradox in current architectures.

Session Continuity Loss · Mobile Network Complexity

The general condition where a logical coordination event loses coherence when the underlying network transport changes. The coordination concept exists at the application layer but is not recognized as a primitive by the network layer.

Shadow AI · Zero Trust Security

The unauthorized deployment and operation of AI agents, models, or automation tools outside the visibility and governance of organizational security frameworks. Without coordination governance, shadow AI is undetectable by design.

Single Point of Failure · Zero Trust Security

A component whose failure disables the entire system. Ironically, centralized AI orchestrators, identity providers, and governance platforms deployed to solve coordination problems frequently become the single points of failure that Zero Trust architecture was designed to prevent.

Split-Brain Scenario · Concurrency Control

A failure condition where a distributed system partitions into two or more segments that each believe they are the authoritative source of truth. Without coordination-scoped arbitration, both partitions continue processing, producing divergent state that cannot be automatically reconciled when connectivity is restored.
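
The usual safeguard is a majority quorum, the rule used by consensus protocols such as Raft: a partition may continue acting as authoritative only if it can reach a strict majority of the configured cluster. Because two disjoint partitions cannot both hold a majority, at most one side keeps processing. The Python helper below is a hypothetical minimal illustration of that rule.

```python
def may_act_as_primary(reachable_members: int, cluster_size: int) -> bool:
    """True only if this partition reaches a strict majority of the configured
    cluster (counting itself). Two disjoint partitions cannot both pass."""
    if cluster_size <= 0 or not 0 <= reachable_members <= cluster_size:
        raise ValueError("invalid membership counts")
    return reachable_members > cluster_size // 2
```

The cost is availability: in an even split, such as 2 and 2 in a four-member cluster, neither side qualifies and writes stop until connectivity returns.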

Supervisor Bottleneck · AI Coordination

The performance and reliability constraint created by centralized supervisor agents that coordinate worker agents. As the number of worker agents grows, the supervisor becomes a throughput limiter and single point of failure.

Training Misalignment · AI Coordination

The divergence that occurs when agents trained on different datasets, with different objectives, or at different points in time develop inconsistent knowledge representations. In multi-agent coordination, training misalignment produces subtle errors that only manifest during inter-agent communication.

Trust Boundary Erosion · Zero Trust Security

The gradual weakening of defined security boundaries as distributed systems evolve, integrate new services, and adapt to operational demands. Static trust boundaries defined at deployment time cannot track dynamic coordination patterns.

Verification Gap · Zero Trust Security

The interval or scope within which a distributed system cannot verify the identity, authority, or integrity of a participating entity. Without coordination-scoped verification contracts, systems oscillate between over-verification and under-verification with no mechanism to calibrate verification to coordination context.

This glossary will grow. Every quarter, the industry will coin new terms for failures it discovers in distributed coordination. Each new term will map to one of these seven domains because there are only so many ways architectures fail when no layer owns coordination. The vocabulary is unstable. The underlying failure topology is fixed.