v2.4.0

Per-Component & Aggregate Monitoring

Composite systems (combining multiple models, modalities, or pipeline stages) require monitoring at both the component and aggregate levels. Degradation in one component can be masked by stability in another when only aggregate output is monitored. In a medical imaging system, for example, a vision component that becomes less accurate while the text generation component continues producing fluent summaries would go undetected by end-to-end quality monitoring alone. The Technical SME computes performance metrics at both the component level and the system level, and discrepancies between component and aggregate metrics generate alerts. For example, if a component’s accuracy has degraded by 5% but aggregate accuracy has degraded by only 1% (because another component partially compensates), the component-level degradation should still be investigated. The PMM plan specifies thresholds at both levels. Component-level thresholds may be tighter than aggregate thresholds, reflecting the principle that catching problems at the component level is cheaper and faster than waiting for them to manifest in aggregate output.

Key outputs

  • Component-level and aggregate-level metric computation
  • Discrepancy alerting between component and aggregate
  • Component thresholds potentially tighter than aggregate
  • Module 12 AISDP documentation
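The masking scenario above can be sketched as a threshold check with a tighter component limit than aggregate limit. This is a minimal illustration, not the documented alerting implementation; the component names, baseline values, and limits are assumptions chosen to mirror the 5%-vs-1% example.

```python
# Illustrative sketch: flag component-level degradation that aggregate
# metrics would mask. Names, baselines, and limits are assumed values.
from dataclasses import dataclass

@dataclass
class MetricSnapshot:
    name: str
    baseline: float   # accuracy at deployment
    current: float    # accuracy in the current monitoring window

    @property
    def degradation(self) -> float:
        return self.baseline - self.current

def check_thresholds(components, aggregate,
                     component_limit=0.02, aggregate_limit=0.05):
    """Component thresholds are tighter than the aggregate threshold,
    so a masked component-level drop still raises an alert."""
    alerts = []
    for c in components:
        if c.degradation > component_limit:
            alerts.append(f"component '{c.name}' degraded "
                          f"{c.degradation:.1%} (limit {component_limit:.0%})")
    if aggregate.degradation > aggregate_limit:
        alerts.append(f"aggregate degraded {aggregate.degradation:.1%}")
    return alerts

# The example from the text: one component drops 5% while the
# aggregate drops only 1% because another component compensates.
components = [
    MetricSnapshot("vision", baseline=0.90, current=0.85),     # -5%
    MetricSnapshot("text_gen", baseline=0.88, current=0.88),
]
aggregate = MetricSnapshot("system", baseline=0.92, current=0.91)  # -1%
print(check_thresholds(components, aggregate))
```

With these numbers, only the component-level check fires, which is exactly the discrepancy the aggregate-only view would miss.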

Intermediate Representation Monitoring

Between pipeline stages, data takes on intermediate representations (feature vectors, embedding spaces, intermediate predictions). Monitoring the distributions of these intermediate representations detects problems that neither component-level nor aggregate-level monitoring catches. If a feature engineering component silently changes its output distribution (due to a data source change, for example), the downstream model may continue producing outputs within the expected range (compensating for small input changes) but with degraded accuracy for specific subgroups. The intermediate representation distribution shift is detectable even when aggregate metrics remain within threshold. Where intermediate outputs are not directly interpretable, proxy measures (distribution statistics, anomaly scores, consistency checks between parallel paths) provide detection capability. The Technical SME establishes baseline distributions for intermediate representations at deployment and monitors for shifts on the same schedule as input and output drift monitoring.

Key outputs

  • Intermediate representation distribution monitoring
  • Silent upstream change detection
  • Proxy measures for non-interpretable intermediates
  • Baseline establishment at deployment with ongoing tracking
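One way to implement baseline-versus-live comparison for a single intermediate feature is a distribution statistic such as the Population Stability Index (PSI). The section does not prescribe a specific statistic, so this is a hedged sketch: the PSI choice, the bin count, and the synthetic Gaussian data are all illustrative assumptions.

```python
# Illustrative sketch (not a prescribed method): compare the live
# distribution of one intermediate feature against a baseline captured
# at deployment, using the Population Stability Index (PSI).
import math
import random

def psi(baseline, current, bins=10):
    """PSI between two 1-D samples, over bins fixed by the baseline range."""
    lo, hi = min(baseline), max(baseline)
    def bin_fractions(sample):
        counts = [0] * bins
        for x in sample:
            idx = int((x - lo) / (hi - lo) * bins)
            counts[min(max(idx, 0), bins - 1)] += 1  # clamp out-of-range values
        # small floor so empty bins do not blow up the log term
        return [max(c / len(sample), 1e-6) for c in counts]
    b, c = bin_fractions(baseline), bin_fractions(current)
    return sum((ci - bi) * math.log(ci / bi) for bi, ci in zip(b, c))

random.seed(0)
# Baseline distribution of one intermediate feature, captured at deployment.
baseline = [random.gauss(0.0, 1.0) for _ in range(5000)]
stable   = [random.gauss(0.0, 1.0) for _ in range(5000)]
shifted  = [random.gauss(0.5, 1.0) for _ in range(5000)]  # silent upstream change

# A common rule of thumb treats PSI > 0.2 as a meaningful shift.
print(f"stable:  {psi(baseline, stable):.3f}")
print(f"shifted: {psi(baseline, shifted):.3f}")
```

The same check runs per feature (or per embedding-dimension summary statistic) on the schedule already used for input and output drift monitoring.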

Cross-Modal Consistency Checks

For systems processing multiple input modalities (text and image, or structured data and free text), the outputs based on each modality should be consistent. If the text modality suggests one conclusion and the image modality suggests another, the system logs the conflict and the Technical SME monitors its resolution. A persistently high inconsistency rate may indicate that one modality’s model has drifted or that the fusion mechanism is not functioning as designed. The Technical SME tracks the cross-modal inconsistency rate as a PMM metric, with thresholds based on the expected disagreement rate observed during validation. The fusion logic itself (whether a weighted ensemble, a learned fusion layer, or a rule-based aggregation) is also monitored: changes in the relative contribution of each modality to the final output, even when individual modality performance is stable, can indicate drift in the fusion logic or a shift in input patterns that changes the effective weighting.

Key outputs

  • Cross-modal inconsistency rate tracking
  • Fusion logic contribution monitoring
  • Thresholds based on validation-stage disagreement rates
  • Module 12 AISDP documentation
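The inconsistency-rate metric above can be sketched as a comparison against the validation-stage disagreement rate plus an allowed margin. This is a hedged illustration: the labels, the 4% validation rate, and the margin are assumptions, not values from any specific PMM plan.

```python
# Illustrative sketch of cross-modal inconsistency tracking. Labels,
# the validation-stage disagreement rate, and the alert margin are
# assumed values for demonstration only.

def check_cross_modal(text_preds, image_preds, validation_rate, margin=0.05):
    """Log conflicting cases and alert when the live inconsistency rate
    exceeds the validation-stage disagreement rate plus a margin."""
    conflicts = [k for k, (t, i) in enumerate(zip(text_preds, image_preds))
                 if t != i]
    rate = len(conflicts) / len(text_preds)
    return rate, rate > validation_rate + margin, conflicts

# 20 paired per-modality conclusions; validation observed 4% disagreement.
text  = ["benign", "benign", "malignant", "benign", "benign"] * 4
image = list(text)
image[2], image[7], image[12] = "benign", "benign", "benign"  # 3 conflicts

rate, alert, conflicts = check_cross_modal(text, image, validation_rate=0.04)
print(f"rate={rate:.2f} alert={alert} conflicts={conflicts}")
```

Here a 15% live inconsistency rate exceeds the 4% validation rate plus the 5% margin, so the window is flagged and the logged conflict indices feed the Technical SME’s resolution review.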