Data Collection Strategy

The PMM plan's data collection strategy specifies what data is collected, from which sources, and at what frequency. The data collection layer captures inference inputs (the data the system receives), inference outputs (the system's decisions or recommendations), ground truth labels (where available, whether immediately or with delay), operational metadata (latency, error codes, resource utilisation), human oversight interactions (overrides, review times, escalations), and deployer feedback (complaints, anomaly reports, usage patterns).

Each data source has a defined collection mechanism. Inference inputs and outputs are captured asynchronously from the production pipeline, typically streamed to a message queue (Kafka, AWS Kinesis, Google Pub/Sub) to avoid adding latency to the inference path. Ground truth labels may arrive with significant delay; the collection strategy documents the expected delay for each label source and the mechanism for matching labels to the corresponding predictions.

The data collection layer must handle peak production throughput without data loss. Dropped monitoring events create blind spots in the compliance record that may coincide precisely with the stress conditions most likely to produce compliance-relevant anomalies.

Key outputs
- Per-source data collection specification (what, where, how often)
- Asynchronous collection to avoid inference latency impact
- Ground truth matching with expected delay documentation
- Peak throughput handling without data loss
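The asynchronous capture pattern described above can be sketched with a bounded in-memory queue and a background drain thread. This is a minimal illustration, not a production design: the class name, capacity, and in-memory `sink` are assumptions, and a real deployment would publish to a durable message queue (e.g. a Kafka topic) in the drain step. The key properties it demonstrates are that the inference path never blocks, and that dropped events are counted rather than silently lost.

```python
import json
import queue
import threading

class MonitoringEmitter:
    """Non-blocking capture of inference events off the hot path.

    Hypothetical sketch: events go onto a bounded queue and a background
    thread drains them; in production the drain step would publish to a
    durable message queue (Kafka, Kinesis, Pub/Sub) instead of a list.
    """

    def __init__(self, capacity=10_000):
        self._queue = queue.Queue(maxsize=capacity)
        self.dropped = 0   # dropped events are blind spots; the count must be monitored
        self.sink = []     # stand-in for the durable queue/topic
        threading.Thread(target=self._drain, daemon=True).start()

    def emit(self, event: dict) -> None:
        """Called from the inference path; never blocks the request."""
        try:
            self._queue.put_nowait(json.dumps(event))
        except queue.Full:
            self.dropped += 1  # counted, so data loss is visible in metrics

    def _drain(self) -> None:
        while True:
            self.sink.append(self._queue.get())

emitter = MonitoringEmitter()
emitter.emit({"request_id": "r-1", "inputs": {"age": 41}, "output": "approve"})
```

The `put_nowait`/`Full` pair is what keeps the inference path latency-neutral: under backpressure the emitter sheds load visibly (incrementing `dropped`) rather than stalling requests.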
Analysis Methodology

The analysis methodology defines how collected data is processed into compliance-relevant metrics. The PMM metric set mirrors the validation gate metrics established during development, adapted for production conditions where ground truth may be unavailable or delayed.

For systems where ground truth is available, the Technical SME computes the declared performance metrics directly on production data. For systems where ground truth is delayed, the methodology defines proxy metrics and leading indicators. NannyML's Confidence-Based Performance Estimation (CBPE) method estimates model accuracy from the confidence score distribution without requiring ground truth labels. Estimated performance is monitored continuously, with alerts when estimates fall below declared thresholds.

The methodology also specifies the statistical tests applied to detect drift (PSI, KS test, Jensen-Shannon divergence, Wasserstein distance), the computation frequency for each metric (hourly, daily, weekly), and the minimum sample sizes required for statistically meaningful computation. Metrics computed on insufficient sample sizes are flagged as inconclusive rather than reported as definitive.

Key outputs
- Production metric set mirroring validation gate metrics
- Ground truth delay handling (proxy metrics, CBPE estimation)
- Statistical tests and computation frequency per metric
- Minimum sample size requirements
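As a concrete sketch of one drift test combined with the minimum-sample-size rule, the following computes the Population Stability Index (PSI) and flags the result as inconclusive below a sample floor. The `MIN_SAMPLES` value, the equal-width binning, and the smoothing constant are illustrative assumptions; the PMM plan defines the actual parameters per metric.

```python
import math

MIN_SAMPLES = 100  # illustrative floor; the PMM plan sets the real value per metric

def psi(reference, production, n_bins=10, eps=1e-4):
    """Population Stability Index between a reference and a production sample.

    Returns a (status, value) pair: metrics computed on fewer than
    MIN_SAMPLES production observations are flagged as inconclusive
    rather than reported as definitive.
    """
    if len(production) < MIN_SAMPLES:
        return ("inconclusive", None)

    # Bin edges derived from the reference distribution (equal-width here;
    # quantile bins are also common in practice).
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / n_bins or 1.0

    def bin_fractions(sample):
        counts = [0] * n_bins
        for x in sample:
            idx = min(int((x - lo) / width), n_bins - 1)
            counts[max(idx, 0)] += 1  # clamp values outside the reference range
        return [max(c / len(sample), eps) for c in counts]  # smooth empty bins

    p, q = bin_fractions(reference), bin_fractions(production)
    value = sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))
    return ("ok", value)
```

With the thresholds discussed later in this document, a returned value above 0.2 would open an investigation and one between 0.1 and 0.2 would be watched; an `"inconclusive"` status is reported as such, never as a pass.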
Threshold & Trigger Framework

The threshold framework distinguishes normal variation from alert conditions. Each PMM metric has a defined tolerance band (the range of values considered normal), a warning threshold (the boundary at which investigation is warranted), and a critical threshold (the boundary at which immediate action is required).

Thresholds are derived from the system's validation performance and calibrated against the deployment context. A drift threshold of PSI > 0.2 is a common starting point for investigation; PSI between 0.1 and 0.2 warrants monitoring. Performance thresholds align with the AISDP-declared accuracy and fairness commitments: the critical threshold corresponds to the declared minimum, and the warning threshold is set above the critical threshold to provide early warning.

Thresholds are reviewed quarterly at the PMM governance meeting. Initial thresholds are set conservatively (generating more alerts) and tuned based on operational experience. Threshold tuning is documented, with the rationale for each adjustment and the AI Governance Lead's approval.

Key outputs
- Per-metric tolerance band, warning threshold, and critical threshold
- Derivation from validation performance and deployment context
- Quarterly review with documented adjustment rationale
- AI Governance Lead approval for threshold changes
Escalation Procedures

The escalation procedures define who is notified, how quickly, and what actions follow when a threshold is breached. The procedures are tiered by severity (informational, warning, critical) and specify the notification channel (dashboard, email, PagerDuty/Opsgenie), the initial responder, the escalation timeline, and the expected actions at each step.

Escalation procedures account for out-of-hours scenarios, key person unavailability (named alternates for every role), and multi-jurisdiction incidents where different authorities may need notification in different time zones. The procedures are rehearsed annually through tabletop exercises and documented in the PMM plan.

The escalation procedures cross-reference the serious incident reporting process. Where a critical alert indicates potential harm that meets the Article 3(49) serious incident definition, the escalation pathway transitions directly into the incident reporting workflow.

Key outputs
- Severity-tiered escalation with notification channels and timelines
- Out-of-hours and key-person-unavailability contingencies
- Annual rehearsal through tabletop exercises
- Cross-reference to serious incident reporting
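The severity-tiered routing with named alternates can be sketched as a lookup table plus a fallback rule. All names, channels, and response timelines here are hypothetical placeholders; the real values live in the PMM plan's escalation procedures.

```python
# Hypothetical routing table; real channels, roles, and timelines come
# from the PMM plan's escalation procedures.
ROUTES = {
    "informational": {"channel": "dashboard", "respond_within_minutes": None},
    "warning":       {"channel": "email",     "respond_within_minutes": 240},
    "critical":      {"channel": "pager",     "respond_within_minutes": 15},
}

# Placeholder rosters: every role has a named alternate.
ON_CALL    = {"technical_sme": "alice", "ai_governance_lead": "bob"}
ALTERNATES = {"technical_sme": "carol", "ai_governance_lead": "dan"}

def route_alert(severity: str, role: str, available: set) -> dict:
    """Pick channel, timeline, and responder, falling back to the named
    alternate when the primary is unavailable (key-person contingency)."""
    route = ROUTES[severity]
    primary = ON_CALL[role]
    responder = primary if primary in available else ALTERNATES[role]
    return {"severity": severity, "responder": responder, **route}

# Primary responder unavailable out of hours: the alternate is paged.
alert = route_alert("critical", "technical_sme", available={"carol", "bob"})
```

Encoding the alternate in the same table as the primary makes the key-person contingency testable in the annual tabletop exercise rather than discovered during a live incident.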
Feedback Loop Definition

The feedback loop connects PMM findings to the risk management system, the AISDP, and the development cycle. When monitoring identifies a performance degradation, a fairness drift, or a new risk that was not anticipated during development, the feedback loop ensures this information is acted upon rather than merely recorded.

PMM findings feed into the risk register: a newly identified risk from production monitoring is added to the register with its source identified as "PMM finding." Findings that affect documented AISDP claims (for example, a sustained performance degradation below the declared threshold) trigger an AISDP update. Findings that indicate a need for model retraining, additional data collection, or architecture changes enter the development backlog through the change management framework.

The quarterly PMM review meeting is the primary governance forum for the feedback loop. The AI Governance Lead reviews monitoring trends, approves corrective actions, and confirms that the feedback loop is functioning: findings are producing changes, not accumulating in reports.

Key outputs
- PMM findings integrated into risk register, AISDP, and development backlog
- Change management framework as the channel for corrective actions
- Quarterly governance review confirming feedback loop operation
- Module 12 and Module 6 AISDP documentation
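The three routing rules above (risk register, AISDP update, development backlog) can be sketched as a dispatch function. The finding fields and destination names are illustrative assumptions; the real interfaces are the risk register, the AISDP documentation set, and the change management framework.

```python
from datetime import date

def dispatch_finding(finding: dict, risk_register: list,
                     aisdp_updates: list, dev_backlog: list) -> None:
    """Route a PMM finding so it produces changes rather than
    accumulating in reports. Field and destination names are illustrative."""
    if finding.get("new_risk"):
        # New risks enter the register with their source identified.
        risk_register.append({**finding, "source": "PMM finding",
                              "logged": date.today().isoformat()})
    if finding.get("affects_aisdp_claim"):
        # e.g. sustained performance degradation below the declared threshold
        aisdp_updates.append(finding)
    if finding.get("requires_change"):
        # Retraining, additional data collection, or architecture changes
        # reach the backlog through the change management framework.
        dev_backlog.append(finding)

register, updates, backlog = [], [], []
dispatch_finding({"id": "F-7", "new_risk": True, "requires_change": True},
                 register, updates, backlog)
```

A single finding may fan out to several destinations (here both the risk register and the backlog), which is what distinguishes a functioning feedback loop from a monitoring report archive.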