Inference Timeout Enforcement

v2.4.0

Inference timeout enforcement sets a maximum execution time per request, terminating any request that exceeds it. This prevents a single adversarially crafted input, designed to trigger worst-case computational complexity, from consuming resources indefinitely. The timeout should be set above the p99 latency for legitimate requests and below the threshold where a single request materially impacts other users.
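As an illustration of per-request enforcement, the following is a minimal sketch, assuming the model is served from an asyncio-based Python service. The function name run_with_timeout, the infer coroutine, and the response fields are hypothetical and do not come from the reference implementation.

```python
import asyncio


async def run_with_timeout(infer, payload, timeout_s):
    """Await the hypothetical coroutine infer(payload), cancelling it after timeout_s seconds."""
    try:
        # wait_for cancels the wrapped coroutine once the deadline passes.
        result = await asyncio.wait_for(infer(payload), timeout=timeout_s)
        return {"status": 200, "result": result}
    except asyncio.TimeoutError:
        # Illustrative structured error body; the field names are assumptions.
        return {"status": 504, "error": "inference_timeout", "timeout_s": timeout_s}
```

Cancellation in asyncio only takes effect at an await point. If inference runs as blocking, CPU-bound work in a thread or a separate process, the worker would need its own termination mechanism for the timeout to actually release resources.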

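The reference nginx configuration uses proxy_read_timeout 30s for the inference endpoint. Note that proxy_read_timeout bounds the gap between two successive reads from the upstream server, not the total response time, so an endpoint that streams its output also needs a deadline enforced in the inference service itself. The appropriate timeout value depends on the model’s architecture and typical inference latency. For a model with p99 latency of 500ms, a timeout of 5 seconds provides generous headroom (ten times p99); for a large language model with p99 latency of 10 seconds, the timeout must be correspondingly higher, and the 30-second reference value leaves only a threefold margin.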
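The relationship between p99 latency and the timeout can be sketched as a small calculation. This is an illustrative heuristic, not a prescribed formula: the helper names, the tenfold headroom (taken from the 500ms to 5 seconds example), and the 30-second impact ceiling are all assumptions to be replaced with measured values.

```python
def p99(latencies_s):
    """Nearest-rank 99th percentile of a non-empty list of latencies in seconds."""
    ordered = sorted(latencies_s)
    # Integer form of ceil(0.99 * n), giving the 1-based nearest rank.
    rank = (99 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def choose_timeout(latencies_s, headroom=10.0, impact_ceiling_s=30.0):
    """Return headroom * p99, capped at the assumed per-request impact ceiling."""
    observed = p99(latencies_s)
    if observed >= impact_ceiling_s:
        # No value lies both above p99 and below the ceiling.
        raise ValueError("p99 latency is at or above the impact ceiling")
    return min(observed * headroom, impact_ceiling_s)
```

With these assumed defaults, a model whose p99 is 0.5 seconds gets a 5-second timeout, while one whose p99 is 10 seconds is capped at the 30-second ceiling. When p99 reaches the ceiling, no valid timeout exists and the sketch raises an error rather than returning a value.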

Timed-out requests receive a structured error response (HTTP 504 or equivalent), and the timeout event is logged with the request metadata. A high rate of timeouts may indicate either an adversarial denial-of-service attempt or a genuine performance degradation; the monitoring layer should alert on timeout rate anomalies. The timeout configuration is documented in Module 9 and tested as part of the denial-of-service testing described above.
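Timeout rate alerting can be sketched with a sliding window over recent request outcomes. The class name TimeoutRateMonitor, the window size, and the fixed 5% threshold below are hypothetical. A production monitoring layer would more likely compare the rate against a learned baseline than against a constant.

```python
from collections import deque


class TimeoutRateMonitor:
    """Tracks the fraction of timed-out requests over the most recent `window` requests."""

    def __init__(self, window=1000, alert_rate=0.05, min_samples=100):
        self.outcomes = deque(maxlen=window)  # True means the request timed out
        self.alert_rate = alert_rate          # assumed fixed threshold
        self.min_samples = min_samples        # avoid alerting on very little data

    def record(self, timed_out):
        self.outcomes.append(bool(timed_out))

    def rate(self):
        if not self.outcomes:
            return 0.0
        return sum(self.outcomes) / len(self.outcomes)

    def should_alert(self):
        enough_data = len(self.outcomes) >= self.min_samples
        return enough_data and self.rate() > self.alert_rate
```

The request handler would call record once per completed request and check should_alert on a schedule. The min_samples guard keeps a single timeout after a restart from triggering an alert.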

Key outputs

Inference timeout enforcement above p99, below impact threshold
Structured error responses for timed-out requests
Timeout rate monitoring and alerting
Module 9 AISDP documentation