Shrikant Wagh

Latest release
Evaluating RAG and Agentic AI Systems - Failure Taxonomy & Contracts
Your test suite is green. Your CI pipeline passed. And your agentic AI system just leaked customer data in production. This is the crisis no one warned you about - unfolding right now across every industry deploying RAG and agentic AI systems without the tools to truly test them. A fintech agent leaks customer records through a manipulated tool description. An enterprise RAG pipeline silently cross-contaminates tenant data without raising a single exception.
A model update quietly shifts agent behavior in ways no test ever caught. These aren't software bugs. They're a new category of failure - and conventional testing was never built to catch them. Evaluating RAG and Agentic AI Systems - Failure Taxonomy & Contracts is the definitive answer to that gap. Written by Shrikant Wagh - a veteran of over three decades in software quality, co-founder of a patented testing tools company, and IIT Madras alumnus - this framework gives engineering teams the language, architecture, and working code to test agentic AI with mission-critical rigor.
Not through informal spot-checking. Through deterministic, CI-gateable, production-grade contracts. At the heart of the book is the Eleven Contract Taxonomy: behavioral invariants covering every critical failure surface - Knowledge, Retrieval, Generation, Agent and Tool, Skill, Protocol, Security, Operational, Multi-Agent, Multi-Modal, and Fine-Tuning. These contracts give you testable, automatable assertions for catching failure before it reaches your users.
When your system is non-deterministic, contracts need muscle. The MITM Testing Pattern delivers it - using fake retrievers, fake LLMs, in-process MCP clients, and in-memory tracers to inject precise control at every agent boundary. Write deterministic tests for probabilistic systems, isolate every layer, and assert correctness - without expensive live model calls. On top of this sits a complete production evaluation stack: golden datasets, LLM-as-Judge pipelines, Recall@K, MRR, and NDCG@K metrics, regression quality gates, drift detection, and a full GitHub Actions CI pipeline - each chapter backed by real Python code and exercises.
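As a rough illustration of the retrieval metrics named above, here is a minimal sketch of Recall@K, MRR, and NDCG@K using their standard textbook definitions. It is not taken from the book: the function names are hypothetical, and relevance is assumed to be binary (a document is either relevant or not).

```python
import math
from typing import Iterable, Sequence, Set, Tuple


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant document, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(queries: Iterable[Tuple[Sequence[str], Set[str]]]) -> float:
    """Average reciprocal rank over (ranked results, relevant set) pairs."""
    scores = [reciprocal_rank(ranked, relevant) for ranked, relevant in queries]
    return sum(scores) / len(scores) if scores else 0.0


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Binary-relevance NDCG@K: discounted gain divided by the ideal gain."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc_id in enumerate(ranked[:k], start=1)
        if doc_id in relevant
    )
    # Ideal ranking places every relevant document at the top positions.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0


# Hypothetical example data: one query with two relevant documents.
ranked_ids = ["d3", "d1", "d7", "d2"]
relevant_ids = {"d1", "d2"}
recall = recall_at_k(ranked_ids, relevant_ids, k=3)
mrr = mean_reciprocal_rank([(ranked_ids, relevant_ids), (["a"], {"a"})])
ndcg = ndcg_at_k(ranked_ids, relevant_ids, k=3)
```

Because these functions are pure and deterministic, thresholds on their outputs can serve as pass/fail checks in a CI job, in the spirit of the regression quality gates described above.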
The final chapters address the organization: a five-level maturity model, sprint-by-sprint roadmap, and Investment Decision Framework for building a sustainable testing program at scale. This is not a book about theory. It was born from real failures - MCP rug pull exploits, retrieval authorization bypass, silent hallucination, citation fabrication, multi-agent cascade failure. Each has a named contract and a test that catches it.
Not "did it pass the tests?" - but "do we have the right tests?" The systems are in production. The failures are real. Now there is a framework built to catch them. Build the contracts. Gate the pipeline. Ship with confidence.
Books by Shrikant Wagh

12,99 €
