Data Collection Layer

The data collection layer captures inference inputs, outputs, and metadata from the production system. It operates asynchronously to avoid adding latency to the inference path. Common patterns include streaming inference events to a message queue (Kafka, AWS Kinesis, Google Pub/Sub) from which the monitoring pipeline consumes.

The collection layer must handle the production system’s peak throughput without data loss. Dropped monitoring events create blind spots in the compliance record. The layer should also be independent of the AI system it monitors; a monitoring system that fails when the AI system fails provides no information at the moment it is most needed.

For systems processing personal data, the data collection layer must comply with the same data governance requirements (Module 4) as the inference system itself. Monitoring data that includes personal data requires the same retention policies, access controls, and processing justification.

Key outputs
- Asynchronous streaming to message queue (Kafka, Kinesis, Pub/Sub)
- Peak throughput handling without data loss
- Independence from the monitored system
- Data governance compliance for monitoring data
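
The asynchronous, non-blocking capture described in this section can be sketched in a few lines. The following is a minimal illustration, not a reference implementation: the class name `AsyncEventCollector`, the `sink` callable, and the `max_buffer` parameter are all hypothetical, and the sink stands in for whatever message-queue producer a real deployment would use. The sketch assumes an in-process bounded buffer drained by a background thread, so the inference path never waits on the monitoring pipeline, and it counts dropped events and sink failures so that any blind spot is at least measurable.

```python
import queue
import threading
import time
from typing import Any, Callable, Dict

Event = Dict[str, Any]


class AsyncEventCollector:
    """Hypothetical collector: buffers inference events in memory and
    forwards them to a sink on a background thread."""

    def __init__(self, sink: Callable[[Event], None], max_buffer: int = 10_000):
        # `sink` is an assumption: in practice it would wrap a
        # message-queue producer; here it is any callable.
        self._sink = sink
        self._queue: "queue.Queue[Event]" = queue.Queue(maxsize=max_buffer)
        self._lock = threading.Lock()
        self.dropped = 0       # events rejected because the buffer was full
        self.sink_errors = 0   # events the sink failed to accept
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def emit(self, inputs: Any, outputs: Any, metadata: Event) -> bool:
        """Called from the inference path. Never blocks and never raises
        because of a full buffer or a failing sink."""
        event = {
            "ts": time.time(),
            "inputs": inputs,
            "outputs": outputs,
            "metadata": metadata,
        }
        try:
            self._queue.put_nowait(event)
            return True
        except queue.Full:
            # Record the loss instead of stalling inference.
            with self._lock:
                self.dropped += 1
            return False

    def _drain(self) -> None:
        while True:
            event = self._queue.get()
            try:
                self._sink(event)
            except Exception:
                # A sink failure must not kill the worker thread.
                with self._lock:
                    self.sink_errors += 1
            finally:
                self._queue.task_done()

    def flush(self) -> None:
        """Block until every buffered event has been handed to the sink."""
        self._queue.join()


if __name__ == "__main__":
    received = []
    collector = AsyncEventCollector(received.append, max_buffer=100)
    collector.emit({"feature": 1.0}, {"score": 0.87}, {"model_version": "example"})
    collector.flush()
    print(len(received), collector.dropped, collector.sink_errors)
```

Dropping on a full buffer is a design choice made here to protect inference latency; it conflicts with the no-data-loss requirement unless the buffer is sized for peak throughput, so the `dropped` and `sink_errors` counters would themselves need to be monitored and alerted on. The governance requirement still applies to whatever the sink stores: if `inputs` contains personal data, the downstream topic needs the same retention and access controls as the inference system.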