v2.4.0

Training-Serving Consistency — Feature Stores & Single Computation Spec

Training-serving skew is a pernicious failure mode in which the features used during production inference are computed differently from those used during training. This often occurs because the training feature pipeline and the serving feature pipeline are maintained by different teams, use different code paths, or run on different infrastructure. A model trained on features computed with one normalisation scheme but served features computed with a slightly different scheme will silently produce degraded predictions.

Feature stores are the standard mitigation. Feast, Tecton, and Hopsworks all centralise feature definitions so that each feature has a single computation specification used for both training and serving. The store also versions feature values, making the exact features that trained a given model version retrievable for audit purposes. Feast is open-source and integrates with most cloud and on-premises data infrastructure; Tecton and Hopsworks are commercial offerings with additional real-time computation and monitoring capabilities.
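The single-computation-specification pattern can be illustrated without any particular feature store. In this minimal sketch (the feature name, version string, and degenerate-input convention are illustrative assumptions, not prescribed by Feast or the other products), one version-controlled function is the only place the feature is ever computed, and both the training and serving paths call it:

```python
SPEC_VERSION = "1.3.0"  # bumped whenever the transformation logic changes

def compute_debt_to_income(debt: float, income: float) -> float:
    """Single source of truth for the hypothetical debt_to_income feature."""
    if income <= 0:
        return 0.0  # documented convention for degenerate inputs
    return round(debt / income, 6)

def training_features(row: dict) -> dict:
    """Offline path: called by the batch training pipeline."""
    return {"debt_to_income": compute_debt_to_income(row["debt"], row["income"]),
            "spec_version": SPEC_VERSION}

def serving_features(request: dict) -> dict:
    """Online path: called at inference time -- same function, same version."""
    return {"debt_to_income": compute_debt_to_income(request["debt"], request["income"]),
            "spec_version": SPEC_VERSION}
```

Because both paths delegate to the same function, a change to the normalisation scheme is impossible to apply to one path without the other, and the recorded `spec_version` ties each feature value back to the logic that produced it.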

The single computation specification is a compliance requirement as well as an engineering best practice. If the features entering the model at inference time differ from those it was trained on, the validation results documented in AISDP Module 5 are no longer representative of the system’s actual behaviour. Training-serving consistency is therefore a prerequisite for the validity of the conformity assessment.

Key outputs

  • Feature store deployment (Feast, Tecton, or Hopsworks)
  • Single computation specification per feature, version-controlled
  • Parity verification tests confirming training and serving feature equivalence
  • Module 3 and Module 5 documentation of the consistency mechanism
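The parity verification tests listed above can be sketched as a simple cross-path check: recompute features through both the offline and online code paths for a sample of inputs and assert element-wise equivalence within a tight tolerance. The two normalisation functions here are stand-ins for the real pipelines, and the tolerance is an assumption:

```python
import math

def offline_normalise(value: float, mean: float = 50.0, std: float = 10.0) -> float:
    """Stand-in for the training pipeline's feature computation."""
    return (value - mean) / std

def online_normalise(value: float, mean: float = 50.0, std: float = 10.0) -> float:
    """Stand-in for the serving pipeline's feature computation."""
    return (value - mean) / std  # must remain identical to the offline path

def verify_parity(sample: list, tol: float = 1e-9) -> list:
    """Return the inputs whose offline and online feature values diverge.

    An empty list means training-serving parity holds for the sample.
    """
    return [v for v in sample
            if not math.isclose(offline_normalise(v), online_normalise(v),
                                rel_tol=0.0, abs_tol=tol)]
```

Running such a check in CI, against a representative sample of real records, turns the consistency requirement into an automatically enforced invariant rather than a convention.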

Feature Distribution Monitoring vs Training Baseline

Computed feature values in production must be monitored for distributional shift against the training baseline. Drift in individual features can cause localised performance degradation that aggregate metrics may miss entirely. A feature whose distribution shifts may push the model into a regime it was not trained for, with potentially subgroup-specific consequences.

Evidently AI provides automated feature distribution monitoring, computing drift metrics such as Population Stability Index (PSI), Kolmogorov-Smirnov tests, and Jensen-Shannon divergence on a per-feature basis. Thresholds are configured for each metric, and breaches trigger alerts that feed into the post-market monitoring framework. This monitoring should run continuously on production data.
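As a concrete illustration of one of these metrics, the Population Stability Index compares the binned proportions of a production window against the training baseline. This is a generic sketch, not Evidently's implementation, and the commonly cited alert threshold of 0.2 is an assumption to be set per feature:

```python
import math

def psi(baseline: list, production: list, edges: list) -> float:
    """Population Stability Index over shared bin edges; higher means more drift."""
    def proportions(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            counts[sum(v > e for e in edges)] += 1  # bin index for this value
        total = len(values)
        # small floor avoids log(0) for empty bins
        return [max(c / total, 1e-6) for c in counts]

    p = proportions(baseline)
    q = proportions(production)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))
```

The bin edges should be derived once from the training baseline and frozen, so that the production distribution is always measured against the same reference grid.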

The distinction between feature-level drift and aggregate output drift is significant. A system may continue to produce acceptable aggregate accuracy while individual features shift in ways that degrade performance for specific subgroups. Feature distribution monitoring catches these shifts early, before they manifest as fairness violations in the post-processing layer. The monitoring results are documented in AISDP Module 12 as part of the ongoing post-market surveillance record.

Key outputs

  • Per-feature drift monitoring configuration (metrics, thresholds, alerting)
  • Baseline feature distributions from the training dataset
  • Integration with post-market monitoring dashboards
  • Module 12 evidence of continuous feature drift surveillance

Feature Registry — Proxy Variable Flags & Justifications

Every feature used by the system must be defined in a central feature registry. The registry records each feature’s name, source, transformation logic, data type, expected distribution, business justification for inclusion, and an assessment of its proxy variable risk. New features cannot be added to the production system without registry approval.

Proxy variable risk is a particular concern for high-risk systems. A feature that correlates strongly with a protected characteristic (such as postcode correlating with ethnicity) may introduce indirect discrimination even when the protected characteristic itself is excluded from the model’s inputs. The feature engineering layer computes each feature’s correlation with protected characteristics and records the result in the registry.

Features exceeding a defined correlation threshold are reviewed by the Technical SME and the AI Governance Lead. Where such a feature is retained, the registry must include a documented justification explaining why the feature’s predictive value outweighs the proxy risk. This justification must be specific and evidence-based, not a generic assertion that the feature improves accuracy. The feature registry feeds into AISDP Module 4 (data governance) and Module 6 (risk management) and supports the post-training fairness evaluation described above.

Key outputs

  • Central feature registry with all required metadata fields
  • Proxy variable correlation analysis per feature
  • Documented justifications for retained high-correlation features
  • AI Governance Lead sign-off on proxy variable decisions

Intent Drift Control — Upstream Normalisation Change Detection

The feature engineering layer is vulnerable to a specific class of intent drift: upstream normalisation changes. When a source system alters the way it computes or formats a field, the raw data may still pass schema validation at the ingestion layer, but the features derived from that data may shift in meaning. A field that previously represented a percentage as a decimal (0.00–1.00) but is reinterpreted as a whole number (0–100) would produce wildly different feature values without triggering a schema error.

Detection of these changes requires monitoring the distribution of computed features against the training baseline, combined with transformation versioning. Feature transformation logic is version-controlled alongside model code, and each model version is explicitly linked to the specific transformation version that produced its training features. If the feature values in production diverge from the expected distribution for the active transformation version, the system flags the anomaly.
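A minimal version of this detection can pin each transformation version to an expected value range and flag batches that leave it, which catches exactly the decimal-vs-whole-number reinterpretation described above. The spec keys, range, and tolerance here are illustrative assumptions:

```python
# Hypothetical registry of pinned transformation versions and their
# expected output ranges, linked to the model version that trained on them.
TRANSFORM_SPECS = {
    "pct_utilisation@v2": {"min": 0.0, "max": 1.0},  # decimal percentage
}

def check_feature(values: list, spec_key: str, tolerance: float = 0.01) -> dict:
    """Flag the batch if more than `tolerance` of values leave the pinned range."""
    spec = TRANSFORM_SPECS[spec_key]
    out_of_range = [v for v in values if not spec["min"] <= v <= spec["max"]]
    rate = len(out_of_range) / len(values)
    return {"spec": spec_key, "violation_rate": rate, "anomaly": rate > tolerance}
```

A batch of values such as 10, 50, 90 against the `pct_utilisation@v2` spec would be flagged immediately, even though the values are perfectly valid under the source's new whole-number convention and would pass schema validation.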

When a normalisation change is detected, the affected feature pipeline is investigated. The resolution may involve updating the transformation logic to accommodate the upstream change, reverting to the previous source format through coordination with the data provider, or retraining the model on features computed under the new normalisation. The investigation and resolution are documented in the AISDP, contributing to Module 4 and Module 12 evidence.

Key outputs

  • Transformation versioning linked to model versions
  • Feature distribution anomaly detection at the engineering layer
  • Investigation and resolution procedures for normalisation changes
  • AISDP Module 4 and Module 12 evidence records