PyData London 2024

Sultan Al Awar

Sultan is an experienced data scientist with a proven record of delivering business solutions and data products through the application of AI, predictive modeling, and advanced analytics.

He works closely with technical and non-technical stakeholders to transform data into meaningful business insights that translate into commercial advantage.

Sultan is also an ML Subject Matter Expert (SME) at Amazon Web Services and a technical author at Towards Data Science (TDS), skilled in machine learning, data engineering, natural language processing, deep learning, and statistics.

He has a master's degree in Business Analytics from University College London.

Beyond his professional pursuits, Sultan has interests in traveling, hiking, and Tag Rugby.

Sessions

06-14
15:30
90min
From Classic to Cutting Edge Text Classification: Generating Customers Insights with Topic Modelling and HuggingFace SetFit Method
Sultan Al Awar

Stop data skimming and dive deep into your customer voices! Are you working with a large volume of unstructured reviews and want to understand what customers are commenting on? This hands-on tutorial equips you with powerful text analysis techniques to unlock hidden insights and inform data-driven decisions. Whether you're an experienced data scientist or analyst or just starting out, this session will guide you through two text classification approaches:

1) Classic Topic Modelling: Uncover recurring themes and trends within customer comments using a generative probabilistic modelling approach such as LDA (Latent Dirichlet Allocation).

2) SetFit Few-Shot Learning: Fine-tune a HuggingFace (HF) sentence transformers model with minimal data to automatically categorise and label reviews, offering deeper insights into key strengths as well as opportunities for improvement.
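To make the first approach concrete, here is a minimal, standard-library-only sketch of LDA fitted with a collapsed Gibbs sampler on a handful of made-up review snippets. It is not the tutorial's notebook code: the function name `fit_lda`, the hyperparameter values, and the toy reviews are all assumptions chosen for illustration, and a real project would more likely rely on an established library implementation.

```python
import random
from collections import Counter


def fit_lda(docs, num_topics=2, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Toy collapsed Gibbs sampler for LDA over tokenised documents.

    Hypothetical helper for illustration; hyperparameters are arbitrary.
    """
    rng = random.Random(seed)
    vocab_size = len({word for doc in docs for word in doc})

    doc_topic = [[0] * num_topics for _ in docs]         # tokens per (doc, topic)
    topic_word = [Counter() for _ in range(num_topics)]  # tokens per (topic, word)
    topic_total = [0] * num_topics                       # tokens per topic
    assignments = []

    # Give every token a random initial topic.
    for d, doc in enumerate(docs):
        doc_assign = []
        for word in doc:
            k = rng.randrange(num_topics)
            doc_assign.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word] += 1
            topic_total[k] += 1
        assignments.append(doc_assign)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][word] -= 1
                topic_total[k] -= 1
                # Unnormalised probability of each topic given all other tokens.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][word] + beta)
                    / (topic_total[t] + beta * vocab_size)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][word] += 1
                topic_total[k] += 1

    # Smoothed per-document topic proportions.
    theta = [
        [(count + alpha) / (len(doc) + alpha * num_topics) for count in counts]
        for doc, counts in zip(docs, doc_topic)
    ]
    # Up to three most frequent words per topic.
    top_words = [[w for w, c in tw.most_common(3) if c > 0] for tw in topic_word]
    return theta, top_words


# Made-up, already tokenised review snippets (assumption, not tutorial data).
reviews = [
    "delivery late delivery slow courier".split(),
    "courier late parcel delivery".split(),
    "app crash login app bug".split(),
    "login bug app update crash".split(),
]

theta, top_words = fit_lda(reviews)
for topic_id, words in enumerate(top_words):
    print(f"topic {topic_id}: {words}")
```

Each row of `theta` is one review's mixture over topics, and `top_words` gives a rough label for each topic. On real reviews, the pre-processing step (tokenisation, stop-word removal, and so on) usually matters as much as the sampler itself.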

By the end of the tutorial, you will have hands-on experience from working through a Google Colab notebook provided beforehand, which will enable you to apply what you have learned and achieve the following outcomes:
- Apply topic modelling with the necessary text pre-processing and feature engineering techniques to discover underlying topics in a collection of text
- Fine-tune an HF transformer on a small labelled dataset using the SetFit few-shot learning method
- Evaluate the performance of the fine-tuned transformers model
- Use the fine-tuned model to generate classification themes on unlabelled data
- Develop a baseline evaluation mechanism to monitor the model in production
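
As a rough illustration of the last outcome, the sketch below compares a classifier's macro-averaged F1 on a small labelled sample against a majority-class baseline and flags the model for review if it does not beat that baseline by a margin. The theme labels, the predictions, and the 0.10 margin are hypothetical values, not figures from the tutorial.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def majority_baseline(y_true):
    """Predict the most frequent true label for every example."""
    most_common = Counter(y_true).most_common(1)[0][0]
    return [most_common] * len(y_true)


# Hypothetical labelled sample and model predictions.
y_true = ["delivery", "delivery", "app", "pricing", "app", "delivery"]
y_pred = ["delivery", "app", "app", "pricing", "app", "delivery"]

model_score = macro_f1(y_true, y_pred)
baseline_score = macro_f1(y_true, majority_baseline(y_true))

# Assumed rule: flag the model if it is not at least 0.10 above the baseline.
needs_review = model_score < baseline_score + 0.10
print(f"model={model_score:.3f} baseline={baseline_score:.3f} review={needs_review}")
```

Re-running the same check on freshly labelled samples at regular intervals gives a simple signal of whether the fine-tuned model is still outperforming a trivial predictor.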

Please follow these steps to prepare for the tutorial:

1) Set up Google Colab.

2) Download the data and notebooks folders from this repository: https://rb.gy/ovru2m.

This will allow you to run the notebooks and follow along with the tutorial using Google Colab!

Ready to transform your understanding of multi text classification on customer data? Join me and unleash its power!

Minories