Evaluating Gen AI Applications - A Safety And Validation Engineering Guide
Available in your Decitre or Furet du Nord customer account as soon as your order is confirmed. The ePub format is:
- Compatible with reading on My Vivlio (smartphone, tablet, computer)
- Compatible with reading on Vivlio e-readers
- For e-readers other than Vivlio, you must use the Adobe Digital Editions software. Not compatible with reading on Kindle, Remarkable, and Sony e-readers
- Format: ePub
- ISBN: 8235886612
- EAN: 9798235886612
- Publication date: 27/05/2026
- Digital protection: none
- Additional information: epub
- Publisher: Ioakim Ioakim
Summary
Evaluating Gen AI Applications is a practical Safety and Validation Engineering guide for teams that need to make Gen AI systems measurable, observable, secure, and production-ready. It shows how to move beyond ad-hoc prompt testing and build evaluation systems that produce evidence: test results, traces, rubrics, release gates, human-review records, red-team findings, and audit-ready governance artefacts.
The book focuses on evaluating the full application, not just the model response. A RAG system must be checked for retrieval quality, source freshness, citation validity, grounding, and hallucination risk. An agentic system must be evaluated through its plan, tool calls, permissions, handoffs, cost, and final outcome. A multimodal system must be validated across image, audio, video, OCR, JSON extraction, and cross-modal reasoning. A regulated enterprise system must produce evidence that risk, quality, and governance controls are actually working.
Through the running Meridian Insurance scenario, the book shows how production Gen AI failures actually happen: stale retrieval, unsupported claims, prompt drift, weak observability, unsafe tool use, inconsistent human review, adversarial manipulation, and missing audit evidence. Each chapter turns those risks into practical engineering controls.
Inside the book, you will learn how to:
- Build a Safety and Validation Engineering approach for Gen AI applications.
- Design evaluation pipelines that produce decisions, explanations, and reusable evidence records.
- Evaluate RAG systems for retrieval relevance, source freshness, citation validity, faithfulness, and hallucination risk.
- Test agentic workflows using tool-call traces, permission boundaries, step order, escalation rules, and cost per task.
- Validate multimodal AI systems involving images, video, audio, OCR, structured extraction, and cross-modal outputs.
- Create evaluation datasets from incidents, production traces, adversarial examples, edge cases, expert review, and holdout sets.
- Use observability to monitor drift, regressions, latency, cost, model changes, and production behaviour.
- Build structured human-in-the-loop evaluation using rubrics, calibration, adjudication, and reviewer agreement.
- Apply red teaming to prompt injection, jailbreaks, prompt leakage, data exfiltration, and tool misuse.
- Turn evaluation results into governance evidence for audits, executive oversight, compliance, and release decisions.
This book is written for AI platform teams, software engineers, ML engineers, architects, product owners, risk teams, governance leaders, and technology executives responsible for shipping Gen AI systems that must be trusted in production.
The future of Gen AI will not be won by teams that merely generate impressive outputs. It will be won by teams that can prove their systems are accurate, grounded, observable, secure, cost-aware, and operating within clearly defined boundaries. Evaluating Gen AI Applications gives you the practical Safety and Validation Engineering framework to build that proof.




