v2.4.0

Operational vs Model Incident Triage

When a monitoring alert fires, the triage process determines whether the root cause is operational (infrastructure, configuration, dependency) or model-related (drift, degradation, adversarial input). This distinction matters because the response path, the responsible team, and the regulatory implications differ.

The PMM plan defines diagnostic procedures for common alert patterns. A simultaneous spike in latency and error rate with stable model metrics suggests an infrastructure issue. Stable infrastructure metrics with degrading accuracy suggest a model issue. Where the cause is ambiguous, both the engineering and ML teams are engaged simultaneously to avoid sequential diagnosis delays.

Model-related incidents may have compliance implications (triggering AISDP updates, risk register entries, or serious incident reporting). Operational incidents typically do not have direct compliance implications unless they cause the system to produce incorrect outputs that affect persons.

Key outputs

  • Operational vs model root cause triage framework
  • Diagnostic procedures for common alert patterns
  • Parallel team engagement for ambiguous causes
  • Compliance implication differentiation
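
The triage heuristics described above can be sketched in a few lines of Python. This is a minimal illustration, not the PMM plan's actual procedure: the names `AlertSnapshot`, `triage`, `teams_to_engage`, and `may_have_compliance_implications` are hypothetical, the boolean signals are assumed to come from whatever thresholds the monitoring stack already applies, and treating an ambiguous cause as potentially compliance-relevant is an assumption rather than something the text states.

```python
from dataclasses import dataclass
from enum import Enum


class Cause(Enum):
    OPERATIONAL = "operational"
    MODEL = "model"
    AMBIGUOUS = "ambiguous"


@dataclass
class AlertSnapshot:
    # Hypothetical boolean signals; how each is derived from raw
    # metrics (thresholds, windows) is outside this sketch.
    latency_spike: bool
    error_rate_spike: bool
    accuracy_degrading: bool


def triage(alert: AlertSnapshot) -> Cause:
    """Classify an alert using the two patterns named in the text."""
    infra_spiking = alert.latency_spike and alert.error_rate_spike
    infra_stable = not alert.latency_spike and not alert.error_rate_spike

    # Latency and error rate spike together while model metrics are stable.
    if infra_spiking and not alert.accuracy_degrading:
        return Cause.OPERATIONAL
    # Infrastructure metrics are stable while accuracy degrades.
    if infra_stable and alert.accuracy_degrading:
        return Cause.MODEL
    # Any other combination does not match a known pattern.
    return Cause.AMBIGUOUS


def teams_to_engage(cause: Cause) -> list[str]:
    """Ambiguous causes engage both teams in parallel, not in sequence."""
    if cause is Cause.OPERATIONAL:
        return ["engineering"]
    if cause is Cause.MODEL:
        return ["ml"]
    return ["engineering", "ml"]


def may_have_compliance_implications(
    cause: Cause, incorrect_outputs_affected_persons: bool
) -> bool:
    """Flag incidents that may need compliance follow-up.

    Operational incidents are flagged only when they produced incorrect
    outputs that affected persons. Flagging AMBIGUOUS causes is an
    assumption of this sketch (the model has not yet been ruled out).
    """
    if cause is Cause.OPERATIONAL:
        return incorrect_outputs_affected_persons
    return True
```

For example, `triage(AlertSnapshot(latency_spike=True, error_rate_spike=False, accuracy_degrading=True))` matches neither pattern, so it is classified as ambiguous and `teams_to_engage` returns both teams.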