LLM / Generative AI Tooling Summary

Five monitoring domains for LLM/generative AI systems:
- Hallucination detection: NLI entailment scoring, citation verification, consistency checking; RAGAS/TruLens for RAG systems
- Safety monitoring: Lakera Guard for prompt injection/PII/toxicity, NeMo Guardrails for topic boundaries, Llama Guard for safety classification
- Prompt/response distribution monitoring: BERTopic embedding clustering, output characteristic tracking
- Human evaluation programme: Argilla/Label Studio/Prodigy; 100–500 outputs reviewed weekly against a structured rubric
- Annotation quality: inter-annotator agreement measurement

See for detailed treatment.

Key outputs
- Five LLM monitoring domains
- RAGAS, TruLens, Lakera Guard, NeMo Guardrails tooling
- Human evaluation programme with structured rubric
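The consistency-checking idea above can be sketched without any of the named tools: sample several answers to the same prompt and measure how much they agree, since low agreement between independent samples is a cheap hallucination signal. This is a minimal illustration only — the Jaccard token-overlap scorer and the example answers are hypothetical stand-ins, not the RAGAS or TruLens implementation.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Token-set overlap between two responses (1.0 = identical sets)."""
    return len(a & b) / len(a | b) if a | b else 1.0


def consistency_score(responses: list[str]) -> float:
    """Mean pairwise token overlap across independently sampled answers.

    A low score suggests the model is not anchored to a stable fact,
    which is the intuition behind consistency-based hallucination checks.
    """
    token_sets = [set(r.lower().split()) for r in responses]
    pairs = list(combinations(token_sets, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical sampled answers to the same factual question.
stable = [
    "paris is the capital of france",
    "the capital of france is paris",
    "paris is france's capital",
]
unstable = [
    "the battery lasts 12 hours",
    "battery life is about 6 hours",
    "it runs for roughly 24 hours on a charge",
]

assert consistency_score(stable) > consistency_score(unstable)
```

Production systems would replace token overlap with an NLI entailment model or embedding similarity, but the monitoring loop — sample, compare, alert on low agreement — is the same.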
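For the annotation-quality domain, inter-annotator agreement is typically reported as Cohen's kappa, which corrects raw agreement for chance. A self-contained sketch, assuming two annotators label the same items with a small rubric; the annotator data is illustrative:

```python
from collections import Counter


def cohens_kappa(a: list[str], b: list[str]) -> float:
    """Cohen's kappa between two annotators over the same items.

    kappa = (p_observed - p_expected) / (1 - p_expected), where
    p_expected is the agreement two annotators would reach by chance
    given their individual label distributions.
    """
    assert len(a) == len(b) and a
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    counts_a, counts_b = Counter(a), Counter(b)
    labels = set(counts_a) | set(counts_b)
    expected = sum(counts_a[l] * counts_b[l] for l in labels) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical rubric labels from two annotators on six outputs.
ann1 = ["ok", "ok", "unsafe", "ok", "unsafe", "ok"]
ann2 = ["ok", "unsafe", "unsafe", "ok", "unsafe", "ok"]

kappa = cohens_kappa(ann1, ann2)  # 5/6 raw agreement, 0.5 expected
```

Tracking kappa over the weekly 100–500 reviewed outputs flags rubric drift or annotator disagreement before it corrupts the evaluation signal.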