PyData London 2024

The evolving conversation: How continuous testing keeps your LLM on track.
06-15, 11:15–11:55 (Europe/London), Warwick

LLM systems are powerful, but keeping them reliable and effective in production can be challenging. In this talk, we will explore continuous testing, one of the critical components of LLM safety. We will discuss how to monitor for unintended behaviors and low-quality responses, identify evolving user patterns, and help LLMs adapt and improve over time.


Systems based on LLMs have immense business potential, from chatbots to automating complex agent workflows. However, their unpredictable nature and sensitivity to behavior shifts raise concerns about safety and reliability in production environments. While thorough pre-deployment testing is crucial, it can't catch everything.

There are several strategies to enhance LLM safety and reliability. One is adding safeguards within the LLM's response mechanism. Another is continuous testing and monitoring to identify and address ongoing issues.

This talk dives deep into the complementary roles of monitoring and continuous testing:
- Monitoring: We'll explain how tracking quantitative measures like sentiment, toxicity, length, and trigger words in model inputs and outputs can give a dynamic overview of system performance and help alert on issues.
- Continuous testing: We'll discuss how one can complement monitoring with tests for complex behaviors, such as compliance with specific policies and the emergence of new topics.
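
As a rough illustration of the monitoring idea above, the sketch below computes two of the simpler measures (response length and trigger words) over a batch of model outputs and raises alerts when they cross a threshold. It is a minimal sketch and not the tooling shown in the talk: the trigger words, the thresholds, and the names `describe_batch` and `check_batch` are all assumptions made up for this example. Sentiment and toxicity are left out because they would need a scoring model.

```python
import re
from dataclasses import dataclass

# Hypothetical trigger words and thresholds, chosen only for illustration.
TRIGGER_WORDS = {"refund", "lawsuit", "password"}
MAX_AVG_LENGTH = 400       # average response length, in characters
MAX_TRIGGER_SHARE = 0.2    # share of responses containing a trigger word


@dataclass
class BatchStats:
    avg_length: float
    trigger_share: float


def describe_batch(responses):
    """Compute simple quantitative measures for one batch of model outputs."""
    if not responses:
        raise ValueError("batch is empty")
    total_length = sum(len(r) for r in responses)
    flagged = 0
    for r in responses:
        # Lowercase and split into words so matching ignores case and punctuation.
        words = set(re.findall(r"[a-z']+", r.lower()))
        if words & TRIGGER_WORDS:
            flagged += 1
    return BatchStats(
        avg_length=total_length / len(responses),
        trigger_share=flagged / len(responses),
    )


def check_batch(stats):
    """Return the names of the measures that exceed their thresholds."""
    alerts = []
    if stats.avg_length > MAX_AVG_LENGTH:
        alerts.append("avg_length")
    if stats.trigger_share > MAX_TRIGGER_SHARE:
        alerts.append("trigger_share")
    return alerts


if __name__ == "__main__":
    batch = ["Please share your password.", "Happy to help!"]
    stats = describe_batch(batch)
    print(stats, check_batch(stats))
```

Running the same checks on every new batch of traffic, rather than once before deployment, is what turns these measures into a monitoring signal. A test for a more complex behavior, such as policy compliance, could follow the same pattern with a different scoring function.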

We will show practical examples and implementation strategies using open-source tools. This talk is helpful for anyone working with LLMs, including developers, engineers, and product managers, who wants to use these systems responsibly and effectively.


Prior Knowledge Expected

No previous knowledge expected

Emeli Dral is a Co-founder and CTO at Evidently AI, a startup developing open-source tools to evaluate, test, and monitor the performance of machine learning models.

Earlier, she co-founded an industrial AI startup and served as the Chief Data Scientist at Yandex Data Factory. She led over 50 applied ML projects for various industries, from banking to manufacturing. Emeli is a data science lecturer at Harbour.Space University, and a co-author of the Machine Learning and Data Analysis curriculum at Coursera with over 100,000 students.