Error Rate Tracking by Type

v2.4.0

Errors are classified by type, each with different compliance implications:

Input validation failures: the system correctly rejected malformed input.
Inference failures: the model failed to produce an output.
Post-processing failures: the output was produced but could not be delivered.
Timeout failures: the inference took too long and was abandoned.

A rising input validation failure rate may indicate a change in the data source or upstream pipeline. A rising inference failure rate may indicate a model or infrastructure problem. Each category has its own alert threshold defined in the PMM plan. The error taxonomy is documented, and the Technical SME ensures that every error is classified into exactly one category, with no errors falling through to an uncategorised bucket.

Error rate tracking should also distinguish between errors that are visible to deployers (failed API responses, error messages) and errors that are silently handled (fallback values, default responses). Silent errors are more dangerous because the deployer has no signal that the system’s output may be unreliable.

Key outputs

Four-category error taxonomy (validation, inference, post-processing, timeout)
Per-category alert thresholds in PMM plan
Silent error detection (fallback values, defaults)
Error taxonomy documentation
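
The classification and threshold checks described in this section can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the error codes, the `classify` helper, the `ErrorEvent` fields, and all threshold values are hypothetical, and real thresholds would come from the PMM plan. The sketch rejects unknown error codes so that nothing lands in an uncategorised bucket, and it records whether each error was visible to the deployer or handled silently.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class ErrorType(Enum):
    """The four categories of the error taxonomy."""
    INPUT_VALIDATION = "input_validation"
    INFERENCE = "inference"
    POST_PROCESSING = "post_processing"
    TIMEOUT = "timeout"


@dataclass(frozen=True)
class ErrorEvent:
    error_type: ErrorType  # exactly one category per error
    silent: bool           # True if handled via a fallback value or default response


# Hypothetical raw error codes; a real system would map its own codes here.
CODE_TO_TYPE = {
    "E_SCHEMA": ErrorType.INPUT_VALIDATION,
    "E_MODEL": ErrorType.INFERENCE,
    "E_DELIVERY": ErrorType.POST_PROCESSING,
    "E_TIMEOUT": ErrorType.TIMEOUT,
}

# Hypothetical thresholds (errors per request); real values live in the PMM plan.
ALERT_THRESHOLDS = {
    ErrorType.INPUT_VALIDATION: 0.05,
    ErrorType.INFERENCE: 0.01,
    ErrorType.POST_PROCESSING: 0.01,
    ErrorType.TIMEOUT: 0.02,
}


def classify(code):
    """Map a raw code to one category; fail loudly rather than leave it uncategorised."""
    try:
        return CODE_TO_TYPE[code]
    except KeyError:
        raise ValueError(f"unclassified error code: {code!r}") from None


def error_rates(events, total_requests):
    """Return the per-category error rate as a fraction of all requests."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    counts = Counter(event.error_type for event in events)
    return {t: counts[t] / total_requests for t in ErrorType}


def breached_categories(rates, thresholds=ALERT_THRESHOLDS):
    """List the categories whose rate exceeds their alert threshold."""
    return [t for t in ErrorType if rates[t] > thresholds[t]]


def silent_share(events):
    """Fraction of errors that gave the deployer no visible signal."""
    if not events:
        return 0.0
    return sum(event.silent for event in events) / len(events)


# Usage: four errors observed across 100 requests.
events = [
    ErrorEvent(classify("E_SCHEMA"), silent=False),
    ErrorEvent(classify("E_TIMEOUT"), silent=True),
    ErrorEvent(classify("E_TIMEOUT"), silent=True),
    ErrorEvent(classify("E_TIMEOUT"), silent=False),
]
rates = error_rates(events, total_requests=100)
print(breached_categories(rates))  # only the timeout category exceeds its threshold
print(silent_share(events))        # half of the errors were silently handled
```

Raising on an unknown code is one way to enforce the "exactly one category" rule at the point of classification. A production system might instead route unknown codes to a review queue, provided they are never counted in a catch-all bucket.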