v2.4.0

Input Drift

The Technical SME compares the distribution of incoming data against the training data distribution using statistical measures. Population Stability Index (PSI), Kolmogorov-Smirnov test statistics, Jensen-Shannon divergence, and Wasserstein distance each capture different aspects of distributional change. Each input feature is monitored individually, and the Technical SME computes composite drift scores. Defined thresholds guide the response: PSI below 0.1 is stable, 0.1 to 0.2 warrants investigation, and above 0.2 requires immediate attention. The thresholds are calibrated during initial deployment based on each feature’s natural variability, and tuned through operational experience. Input drift monitoring detects situations where the system is receiving data it was not designed for. A recruitment screening system trained on applications from software engineers that begins receiving applications from financial analysts has experienced input drift that may invalidate the model’s predictions, even if the model’s behaviour appears superficially normal.

Key outputs

  • Per-feature drift monitoring with PSI, KS, JS divergence, Wasserstein distance
  • Composite drift scores across features
  • Three-tier thresholds (stable, investigate, immediate attention)
  • Detection of out-of-distribution deployment contexts
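The PSI calculation and three-tier thresholds above can be sketched as follows. This is an illustrative implementation, not the Technical SME's actual tooling; the bin count and the floor used to avoid division by zero are assumptions.

```python
import numpy as np

def psi(expected, actual, bins=10):
    """Population Stability Index of `actual` against the reference
    (training) sample `expected`, using bins fit on the reference."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    e_counts, _ = np.histogram(expected, bins=edges)
    a_counts, _ = np.histogram(actual, bins=edges)
    # Convert counts to proportions, with a small floor to avoid log(0).
    e_pct = np.clip(e_counts / e_counts.sum(), 1e-6, None)
    a_pct = np.clip(a_counts / a_counts.sum(), 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

def drift_tier(psi_value):
    """Map a PSI value onto the three-tier response thresholds."""
    if psi_value < 0.1:
        return "stable"
    if psi_value <= 0.2:
        return "investigate"
    return "immediate attention"
```

In practice the same tiering would be applied per feature, alongside the KS, Jensen-Shannon, and Wasserstein measures, with thresholds recalibrated as operational experience accumulates.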

Concept Drift

Where ground truth labels become available (even with delay), the Technical SME monitors the relationship between input features and outcomes for changes. Concept drift occurs when the underlying relationship between inputs and outputs changes, meaning the model’s learned patterns no longer reflect reality. Concept drift is often more consequential than input drift because it means the model is fundamentally wrong, not merely operating on unfamiliar data. A credit scoring model trained during an economic expansion may experience concept drift during a recession, as the relationship between income levels and default probability changes. The model’s predictions remain confident but are miscalibrated for the new economic reality. Detection approaches include monitoring the model’s residual error distribution over time (increasing residuals suggest concept drift), comparing the model’s feature importance ranking against a baseline (a change in which features are most predictive suggests a concept shift), and applying drift detection methods on the input-output joint distribution. Where concept drift is detected, model retraining or recalibration is typically required.

Key outputs

  • Input-output relationship monitoring for concept drift
  • Residual error distribution tracking
  • Feature importance stability analysis
  • Model retraining or recalibration trigger
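One of the detection approaches above, tracking the residual error distribution over time, can be sketched as a simple threshold rule. The windowed comparison and the choice of k = 3 standard errors are illustrative assumptions, not a prescribed method.

```python
import numpy as np

def residual_drift_alert(baseline_residuals, recent_residuals, k=3.0):
    """Flag possible concept drift when the recent window's mean absolute
    residual exceeds the baseline mean by more than k standard errors.
    (Illustrative rule; k and the window size are tuning assumptions.)"""
    base = np.abs(baseline_residuals)
    recent = np.abs(recent_residuals)
    # Standard error of the mean for a window of this size,
    # estimated from the baseline residual spread.
    se = base.std(ddof=1) / np.sqrt(len(recent))
    return bool(recent.mean() > base.mean() + k * se)
```

An alert from this rule would trigger the retraining or recalibration decision described above, ideally corroborated by the feature importance stability check.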

Feature Drift

The Technical SME monitors individual feature distributions for shifts that may not be captured by aggregate drift measures. A single feature shifting significantly while others remain stable can cause localised performance degradation that aggregate metrics miss. Feature drift often has identifiable root causes: an upstream data source changes its encoding or scale, a data pipeline introduces a transformation error, a categorical feature acquires a new value that the model was not trained on, or a seasonal pattern affects a specific feature. Identifying the specific drifted feature (rather than observing aggregate drift) accelerates root cause analysis and remediation. Feature-level drift monitoring generates per-feature drift scores on the computation schedule defined in the PMM plan. Features are ranked by drift magnitude, and the top-N drifted features are flagged for investigation. The monitoring should also track feature availability: a feature that becomes missing for a significant proportion of inputs (due to an upstream data source failure) may cause the model to use a default or imputed value, silently degrading performance.

Key outputs

  • Per-feature drift scores on scheduled computation
  • Root cause acceleration through feature-level identification
  • Feature availability monitoring for missing data detection
  • Top-N drifted feature flagging for investigation
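A minimal sketch of the per-feature scoring, top-N flagging, and availability tracking described above. It uses the two-sample KS statistic (one of the measures named earlier) as the drift score; the function name, the dict-of-arrays input format, and the NaN-as-missing convention are assumptions for illustration.

```python
import numpy as np
from scipy.stats import ks_2samp

def feature_drift_report(reference, live, top_n=2):
    """Per-feature KS drift scores plus missing-value rates.
    `reference` and `live` map feature name -> 1-D array (NaN = missing)."""
    report = []
    for name, ref in reference.items():
        cur = live[name]
        missing_rate = float(np.isnan(cur).mean())
        # Drop missing values before comparing distributions.
        stat = ks_2samp(ref[~np.isnan(ref)], cur[~np.isnan(cur)]).statistic
        report.append({"feature": name, "ks": float(stat),
                       "missing_rate": missing_rate})
    # Rank by drift magnitude and flag the top-N for investigation.
    report.sort(key=lambda r: r["ks"], reverse=True)
    for i, row in enumerate(report):
        row["flagged"] = i < top_n
    return report
```

In a deployed pipeline this report would be generated on the computation schedule defined in the PMM plan, with a separate alert when any feature's missing rate exceeds an agreed tolerance.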