PyData London 2024

Sofie Van Landeghem

I am a machine learning and NLP engineer who firmly believes in the power of data to transform decision making in industry. I have a Master's degree in Computer Science (software engineering), a PhD in Sciences (Bioinformatics), and more than 16 years of experience in Natural Language Processing and Machine Learning, including in the pharmaceutical and food industries. Since 2019, I have been a core maintainer of spaCy, a popular open-source NLP library created by Explosion. I also work as a consultant through my company OxyKodit. In my code and projects, I am passionate about quality assurance and testing, introducing proper levels of abstraction, and ensuring code robustness and modularity.

Sessions

06-16, 11:00, 40min
How to uncover and avoid structural biases in evaluating your Machine Learning/NLP projects
Sofie Van Landeghem

This talk will highlight common pitfalls that occur when evaluating Machine Learning (ML) and Natural Language Processing (NLP) approaches. It will provide comprehensive advice on how to set up a solid evaluation procedure in general, and dive into a few specific use cases to demonstrate how artificial bias can unknowingly creep in. It will tell the story hidden behind the performance numbers and get the audience into the right critical mindset to run unbiased evaluations and data analyses for their own projects.

With AI technology booming, the barrier to entry for using ML/NLP in applications keeps falling, thanks to novel open-source libraries, pretrained LLM/transformer models, and convenient API access for all. It has never been easier to integrate ML or NLP models into a commercial product or research application. As a consequence, meaningful evaluation of these techniques on specific use cases and domains has become all the more pressing, for developers and users of these AI tools alike.

Salisbury