Data Collection Strategy

The PMM plan's data collection strategy specifies what data is collected, from which sources, and at what frequency. The data collection layer captures inference inputs (the data the system receives), inference outputs (the system's decisions or recommendations), ground truth labels (where available, whether immediately or with delay), operational metadata (latency, error codes, resource utilisation), human oversight interactions (overrides, review times, escalations), and deployer feedback (complaints, anomaly reports, usage patterns).

Each data source has a defined collection mechanism. Inference inputs and outputs are captured asynchronously from the production pipeline, typically streamed to a message queue (Kafka, AWS Kinesis, Google Pub/Sub) to avoid adding latency to the inference path. Ground truth labels may arrive with significant delay; the collection strategy documents the expected delay for each label source and the mechanism for matching labels to the corresponding predictions.

The data collection layer must handle peak production throughput without data loss. Dropped monitoring events create blind spots in the compliance record that may coincide precisely with the stress conditions most likely to produce compliance-relevant anomalies.

Key outputs
- Per-source data collection specification (what, where, how often)
- Asynchronous collection to avoid inference latency impact
- Ground truth matching with expected delay documentation
- Peak throughput handling without data loss
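The asynchronous capture pattern described above can be sketched with a bounded in-memory queue and a background drain thread. This is a minimal illustration, not a production design: the class name, capacity, and in-memory `sink` are assumptions, and a real deployment would publish to a durable message queue (e.g. a Kafka topic) in the drain step. The key properties it demonstrates are that the inference path never blocks, and that dropped events are counted rather than silently lost.

```python
import json
import queue
import threading

class MonitoringEmitter:
    """Non-blocking capture of inference events off the hot path.

    Hypothetical sketch: events go onto a bounded queue and a background
    thread drains them; in production the drain step would publish to a
    durable message queue (Kafka, Kinesis, Pub/Sub) instead of a list.
    """

    def __init__(self, capacity=10_000):
        self._queue = queue.Queue(maxsize=capacity)
        self.dropped = 0   # dropped events are blind spots; the count must be monitored
        self.sink = []     # stand-in for the durable queue/topic
        threading.Thread(target=self._drain, daemon=True).start()

    def emit(self, event: dict) -> None:
        """Called from the inference path; never blocks the request."""
        try:
            self._queue.put_nowait(json.dumps(event))
        except queue.Full:
            self.dropped += 1  # counted, so data loss is visible in metrics

    def _drain(self) -> None:
        while True:
            self.sink.append(self._queue.get())

emitter = MonitoringEmitter()
emitter.emit({"request_id": "r-1", "inputs": {"age": 41}, "output": "approve"})
```

The `put_nowait`/`Full` pair is what keeps the inference path latency-neutral: under backpressure the emitter sheds load visibly (incrementing `dropped`) rather than stalling requests.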
Analysis Methodology

The analysis methodology defines how collected data is processed into compliance-relevant metrics. The PMM metric set mirrors the validation gate metrics established during development, adapted for production conditions where ground truth may be unavailable or delayed.

For systems where ground truth is available, the Technical SME computes the declared performance metrics directly on production data. For systems where ground truth is delayed, the methodology defines proxy metrics and leading indicators. NannyML's Confidence-Based Performance Estimation (CBPE) method estimates model accuracy from the confidence score distribution without requiring ground truth labels. Estimated performance is monitored continuously, with alerts when estimates fall below declared thresholds.

The methodology also specifies the statistical tests applied to detect drift (PSI, KS test, Jensen-Shannon divergence, Wasserstein distance), the computation frequency for each metric (hourly, daily, weekly), and the minimum sample sizes required for statistically meaningful computation. Metrics computed on insufficient sample sizes are flagged as inconclusive rather than reported as definitive.

Key outputs
- Production metric set mirroring validation gate metrics
- Ground truth delay handling (proxy metrics, CBPE estimation)
- Statistical tests and computation frequency per metric
- Minimum sample size requirements
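As a concrete sketch of one drift test combined with the minimum-sample-size rule, the following computes the Population Stability Index (PSI) and flags the result as inconclusive below a sample floor. The `MIN_SAMPLES` value, the equal-width binning, and the smoothing constant are illustrative assumptions; the PMM plan defines the actual parameters per metric.

```python
import math

MIN_SAMPLES = 100  # illustrative floor; the PMM plan sets the real value per metric

def psi(reference, production, n_bins=10, eps=1e-4):
    """Population Stability Index between a reference and a production sample.

    Returns a (status, value) pair: metrics computed on fewer than
    MIN_SAMPLES production observations are flagged as inconclusive
    rather than reported as definitive.
    """
    if len(production) < MIN_SAMPLES:
        return ("inconclusive", None)

    # Bin edges derived from the reference distribution (equal-width here;
    # quantile bins are also common in practice).
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / n_bins or 1.0

    def bin_fractions(sample):
        counts = [0] * n_bins
        for x in sample:
            idx = min(int((x - lo) / width), n_bins - 1)
            counts[max(idx, 0)] += 1  # clamp values outside the reference range
        return [max(c / len(sample), eps) for c in counts]  # smooth empty bins

    p, q = bin_fractions(reference), bin_fractions(production)
    value = sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))
    return ("ok", value)
```

With the thresholds discussed later in this document, a returned value above 0.2 would open an investigation and one between 0.1 and 0.2 would be watched; an `"inconclusive"` status is reported as such, never as a pass.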
Threshold & Trigger Framework

The threshold framework distinguishes normal variation from alert conditions. Each PMM metric has a defined tolerance band (the range of values considered normal), a warning threshold (the boundary at which investigation is warranted), and a critical threshold (the boundary at which immediate action is required).

Thresholds are derived from the system's validation performance and calibrated against the deployment context. A drift threshold of PSI > 0.2 is a common starting point for investigation; PSI between 0.1 and 0.2 warrants monitoring. Performance thresholds align with the AISDP-declared accuracy and fairness commitments: the critical threshold corresponds to the declared minimum, and the warning threshold is set above the critical threshold to provide early warning.

Thresholds are reviewed quarterly at the PMM governance meeting. Initial thresholds are set conservatively (generating more alerts) and tuned based on operational experience. Threshold tuning is documented, with the rationale for each adjustment and the AI Governance Lead's approval.

Key outputs
- Per-metric tolerance band, warning threshold, and critical threshold
- Derivation from validation performance and deployment context
- Quarterly review with documented adjustment rationale
- AI Governance Lead approval for threshold changes
Escalation Procedures

The escalation procedures define who is notified, how quickly, and what actions follow when a threshold is breached. The procedures are tiered by severity (informational, warning, critical) and specify the notification channel (dashboard, email, PagerDuty/Opsgenie), the initial responder, the escalation timeline, and the expected actions at each step.

Escalation procedures account for out-of-hours scenarios, key person unavailability (named alternates for every role), and multi-jurisdiction incidents where different authorities may need notification in different time zones. The procedures are rehearsed annually through tabletop exercises and documented in the PMM plan.

The escalation procedures cross-reference the serious incident reporting process. Where a critical alert indicates potential harm that meets the Article 3(49) serious incident definition, the escalation pathway transitions directly into the incident reporting workflow.

Key outputs
- Severity-tiered escalation with notification channels and timelines
- Out-of-hours and key-person-unavailability contingencies
- Annual rehearsal through tabletop exercises
- Cross-reference to serious incident reporting
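The severity-tiered routing with named alternates can be sketched as a lookup table plus a fallback rule. All names, channels, and response timelines here are hypothetical placeholders; the real values live in the PMM plan's escalation procedures.

```python
# Hypothetical routing table; real channels, roles, and timelines come
# from the PMM plan's escalation procedures.
ROUTES = {
    "informational": {"channel": "dashboard", "respond_within_minutes": None},
    "warning":       {"channel": "email",     "respond_within_minutes": 240},
    "critical":      {"channel": "pager",     "respond_within_minutes": 15},
}

# Placeholder rosters: every role has a named alternate.
ON_CALL    = {"technical_sme": "alice", "ai_governance_lead": "bob"}
ALTERNATES = {"technical_sme": "carol", "ai_governance_lead": "dan"}

def route_alert(severity: str, role: str, available: set) -> dict:
    """Pick channel, timeline, and responder, falling back to the named
    alternate when the primary is unavailable (key-person contingency)."""
    route = ROUTES[severity]
    primary = ON_CALL[role]
    responder = primary if primary in available else ALTERNATES[role]
    return {"severity": severity, "responder": responder, **route}

# Primary responder unavailable out of hours: the alternate is paged.
alert = route_alert("critical", "technical_sme", available={"carol", "bob"})
```

Encoding the alternate in the same table as the primary makes the key-person contingency testable in the annual tabletop exercise rather than discovered during a live incident.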
Feedback Loop Definition

The feedback loop connects PMM findings to the risk management system, the AISDP, and the development cycle. When monitoring identifies a performance degradation, a fairness drift, or a new risk that was not anticipated during development, the feedback loop ensures this information is acted upon rather than merely recorded.

PMM findings feed into the risk register: a newly identified risk from production monitoring is added to the register with its source identified as "PMM finding." Findings that affect documented AISDP claims (for example, a sustained performance degradation below the declared threshold) trigger an AISDP update. Findings that indicate a need for model retraining, additional data collection, or architecture changes enter the development backlog through the change management framework.

The quarterly PMM review meeting is the primary governance forum for the feedback loop. The AI Governance Lead reviews monitoring trends, approves corrective actions, and confirms that the feedback loop is functioning: findings are producing changes, not accumulating in reports.

Key outputs
- PMM findings integrated into risk register, AISDP, and development backlog
- Change management framework as the channel for corrective actions
- Quarterly governance review confirming feedback loop operation
- Module 12 and Module 6 AISDP documentation
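The three routing rules above (risk register, AISDP update, development backlog) can be sketched as a dispatch function. The finding fields and destination names are illustrative assumptions; the real interfaces are the risk register, the AISDP documentation set, and the change management framework.

```python
from datetime import date

def dispatch_finding(finding: dict, risk_register: list,
                     aisdp_updates: list, dev_backlog: list) -> None:
    """Route a PMM finding so it produces changes rather than
    accumulating in reports. Field and destination names are illustrative."""
    if finding.get("new_risk"):
        # New risks enter the register with their source identified.
        risk_register.append({**finding, "source": "PMM finding",
                              "logged": date.today().isoformat()})
    if finding.get("affects_aisdp_claim"):
        # e.g. sustained performance degradation below the declared threshold
        aisdp_updates.append(finding)
    if finding.get("requires_change"):
        # Retraining, additional data collection, or architecture changes
        # reach the backlog through the change management framework.
        dev_backlog.append(finding)

register, updates, backlog = [], [], []
dispatch_finding({"id": "F-7", "new_risk": True, "requires_change": True},
                 register, updates, backlog)
```

A single finding may fan out to several destinations (here both the risk register and the backlog), which is what distinguishes a functioning feedback loop from a monitoring report archive.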