v2.4.0

The post-processing layer transforms the model’s raw output into the decision that affects individuals, so errors in post-processing can negate even a well-performing model. Threshold application, score calibration, business rule application, and output formatting therefore each require unit tests confirming correctness, edge case handling, and consistency with the documented behaviour.

Threshold tests should verify the threshold value itself (confirming it matches the version-controlled configuration), the behaviour at exactly the threshold (boundary case), and the logging of the decision. If a rule rejects applicants below a threshold of 0.65, the test should verify what happens at 0.65, at 0.6499, and at 0.6501. Calibration tests should verify that the calibrated scores produce the expected probability distribution on a reference dataset.
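A minimal pytest-style sketch of these checks. The `decide` helper, the strict "below 0.65 rejects" rule, and the toy calibration data are assumptions for illustration, not an API defined by this documentation:

```python
# Assumed rule for illustration: reject strictly below the threshold,
# so a score exactly at 0.65 is accepted. Confirm the real boundary
# behaviour against your documented specification.
THRESHOLD = 0.65  # must match the version-controlled configuration


def decide(score: float, threshold: float = THRESHOLD) -> str:
    """Reject applicants whose score falls below the threshold."""
    return "reject" if score < threshold else "accept"


def test_threshold_matches_config():
    # Pin the value so a silent config drift fails the suite.
    assert THRESHOLD == 0.65


def test_boundary_cases():
    assert decide(0.6499) == "reject"   # just below the threshold
    assert decide(0.65) == "accept"     # exactly at the boundary
    assert decide(0.6501) == "accept"   # just above the threshold


def test_calibration_on_reference_data():
    # Toy reference set: mean calibrated score should approximate
    # the observed positive rate if calibration holds.
    calibrated = [0.2, 0.4, 0.6, 0.8]
    observed = [0, 0, 1, 1]
    mean_score = sum(calibrated) / len(calibrated)
    positive_rate = sum(observed) / len(observed)
    assert abs(mean_score - positive_rate) < 0.05
```

Pinning the threshold value in a dedicated test is a deliberate choice: it turns an otherwise silent configuration change into an explicit, reviewable test failure.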

Business rule tests should verify that each rule produces the documented effect, that rules are applied in the correct sequence, and that the override logging captures the required information when a rule modifies the model’s raw output. If a fairness calibration adjusts thresholds per subgroup, the test should verify that the adjusted thresholds produce the expected selection rate ratios on a reference dataset. Edge cases, such as a score that falls exactly on a subgroup-specific threshold, deserve dedicated test cases.
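The per-subgroup checks can be sketched the same way. The subgroup names, threshold values, reference scores, and the 0.8 ratio target below are illustrative assumptions, not values prescribed by this page:

```python
# Assumed per-subgroup thresholds for illustration.
SUBGROUP_THRESHOLDS = {"group_a": 0.65, "group_b": 0.60}


def select(score: float, group: str) -> bool:
    """Select a candidate whose score meets the subgroup threshold."""
    return score >= SUBGROUP_THRESHOLDS[group]


def selection_rate(scores: list[float], group: str) -> float:
    return sum(select(s, group) for s in scores) / len(scores)


def test_score_exactly_on_subgroup_threshold():
    # Dedicated boundary case: exactly on the threshold is selected
    # under the (assumed) inclusive rule above.
    assert select(0.60, "group_b")
    assert not select(0.5999, "group_b")


def test_selection_rate_ratio_on_reference_data():
    # Toy reference scores; real tests would use a versioned dataset.
    reference_a = [0.5, 0.6, 0.7, 0.8]
    reference_b = [0.5, 0.6, 0.7, 0.8]
    ratio = (selection_rate(reference_b, "group_b")
             / selection_rate(reference_a, "group_a"))
    assert ratio >= 0.8  # e.g. a documented four-fifths target
```

Running the ratio check against a fixed reference dataset makes the test deterministic, so a change in adjusted thresholds that shifts the selection rate ratio fails the suite rather than surfacing later in production monitoring.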

Key outputs

  • Threshold boundary tests at and around each configured threshold
  • Calibration validation on a reference dataset
  • Business rule sequence and effect verification tests
  • Fairness calibration tests verifying per-subgroup selection rate ratios