Workshop: Validating Generative AI Models

October 28, 5:30–7:30 PM • Downtown Toronto
GenAI is racing into production, and your validation program has to keep pace. Join us after work in downtown Toronto for a practical, hands-on session on validating GenAI systems in ways you can defend to stakeholders, auditors, and regulators, including teams preparing for OSFI Guideline E-23 compliance.
Who should attend
- Model Risk & Validation, Compliance, Internal Audit
- Data Science & ML Engineering, ML Ops
- Product Owners and Risk Champions for GenAI initiatives
What you’ll learn (and practice)
- Scope the system, not just the model: Map prompts, RAG pipelines, tools, and guardrails; define boundaries and dependencies.
- Design defensible tests: Build acceptance criteria for hallucination, harmful content, bias, robustness, privacy leakage, and IP risk.
- Evaluation methods that work: Pairwise judging, rubric-based LLM-as-judge, golden sets, and human review—and when to use each.
- RAG & retrieval validation: Groundedness, citation quality, retrieval recall/precision, and corpus hygiene checks.
- Prompt & config change control: Versioning, test harnesses, regression suites, and rollback criteria.
- Monitoring in production: Metrics for degradation, drift, jailbreaks, and safety incidents; alert→ticket→remediation loops.
- Evidence & documentation: Model/system cards, validation reports, and audit-ready artifacts that align with policy and controls.
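As a small taste of the retrieval-validation topic above, here is a minimal sketch of scoring retrieval precision and recall against a golden set. All function names, sample data, and the acceptance threshold are illustrative assumptions, not workshop materials:

```python
# Minimal sketch: retrieval precision/recall against a golden set.
# All names, sample data, and thresholds are illustrative assumptions.

def retrieval_metrics(retrieved_ids, relevant_ids):
    """Precision and recall for one query's retrieved document IDs."""
    retrieved, relevant = set(retrieved_ids), set(relevant_ids)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Golden set: query -> IDs a human validator marked as relevant.
golden = {
    "q1": {"doc_a", "doc_b"},
    "q2": {"doc_c"},
}
# What the RAG retriever actually returned for each query.
retrieved = {
    "q1": ["doc_a", "doc_x"],
    "q2": ["doc_c", "doc_d"],
}

results = {q: retrieval_metrics(retrieved[q], golden[q]) for q in golden}
# Acceptance criterion (illustrative): mean recall must clear a floor,
# so a retriever or corpus change that drops recall fails the suite.
mean_recall = sum(r for _, r in results.values()) / len(results)
assert mean_recall >= 0.7, f"retrieval recall regression: {mean_recall:.2f}"
```

Checks like this slot naturally into a regression suite: run them on every prompt, retriever, or corpus change, and treat a failed threshold as a rollback trigger.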
Format (interactive + practical)
Short lightning talks followed by guided mini-exercises with realistic case studies and editable templates. You’ll leave with assets you can adapt immediately.
Agenda (2 hours)
- 5:30 PM — Arrival & networking (light bites & drinks)
- 5:45 PM — Welcome & objectives
- 5:55 PM — Scoping GenAI Systems (components, risks, and control points)
- 6:10 PM — Mini-Exercise: Define acceptance criteria for a GenAI use case
- 6:30 PM — Evaluation Techniques (LLM-as-judge, golden sets, human review)
- 6:45 PM — Mini-Exercise: Build a small evaluation plan & test harness outline
- 7:05 PM — Monitoring & Change Control (from pre-prod tests to on-call playbooks)
- 7:20 PM — Debrief & immediate next steps
- 7:30 PM — Close
What you’ll take away
- A GenAI Validation Plan template (scope, risks, test strategy, acceptance criteria)
- Evaluation workbook: examples for groundedness, harmful content, bias, and robustness
- A lightweight test harness checklist for prompts, RAG, and guardrails
- A 60–90 day playbook to operationalize validation and monitoring
Logistics
- Date & Time: October 28, 5:30–7:30 PM
- Location: Downtown Toronto (venue details provided upon registration)
- Capacity: Limited to keep the session highly interactive
- Bring: Laptop recommended (we’ll share templates); optional: a GenAI use case from your org
Why attend
- Actionable, not academic: Concrete artifacts, controls, and workflows.
- Cross-functional by design: Risk, data, and product perspectives in one room.
- Audit-ready outcomes: Evidence you can stand behind.