PyData London 2024

Mastering Data Flow: Empower Your Projects with Prefect's Pipeline Magic
06-14, 09:00–10:30 (Europe/London), Warwick

Embark on a transformative journey into the realm of data engineering with our 90-minute workshop dedicated to Prefect 2. In this hands-on session, participants will learn the ins and outs of building robust data pipelines using the latest features and enhancements of Prefect 2. From data ingestion to advanced analytics, attendees will gain practical experience and insights to elevate their data engineering skills.


Join us for an engaging workshop where we'll dive deep into the world of data engineering with Prefect 2.19. Throughout the session, participants will explore the following key topics:

  • Overview of Prefect and its core features
  • Understanding the Prefect ecosystem and its integration with popular data science tools
  • Setting up a Prefect environment: Installation, configuration, and project setup (a minimal verification flow is sketched below)
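
To make the setup step concrete, here is a minimal sketch of a first flow you might run to verify an installation. It assumes only that Prefect 2 has been installed with pip; the flow name and greeting are illustrative.

```python
# Minimal sketch: verify a Prefect 2 installation.
# Assumes only that Prefect 2 is installed, e.g. pip install "prefect>=2,<3".
from prefect import flow


@flow(log_prints=True)
def hello_prefect(name: str = "PyData London"):
    # log_prints=True routes print() output through Prefect's logger
    print(f"Hello, {name}! Prefect is ready.")


if __name__ == "__main__":
    hello_prefect()
```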

Building Data Pipelines:
- Data ingestion: Fetching data from various sources including RSS feeds, APIs, and databases
- Data transformation and manipulation using Prefect tasks and flows (see the sketch after this list)
- Data storage and persistence: Storing processed data in databases such as MongoDB or PostgreSQL
- Integrating machine learning models for advanced data processing and analysis within Prefect workflows
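
To give a flavour of this section, here is a minimal sketch of an ingest-and-transform flow in Prefect 2. The feed URL and JSON field names are illustrative placeholders, and `requests` is assumed to be installed alongside Prefect; the workshop materials contain the full versions.

```python
# Minimal sketch of a Prefect 2 ingestion pipeline.
# The URL and JSON field names are illustrative placeholders;
# assumes `requests` is installed alongside Prefect 2.
import requests
from prefect import flow, task


@task
def fetch_json(url: str) -> dict:
    """Ingestion: pull raw data from an HTTP API."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.json()


@task
def extract_titles(payload: dict) -> list[str]:
    """Transformation: keep only the fields we care about."""
    return [item["title"] for item in payload.get("items", [])]


@flow(log_prints=True)
def ingestion_pipeline(url: str = "https://example.com/feed.json"):
    payload = fetch_json(url)
    titles = extract_titles(payload)
    # A storage task (e.g. writing to MongoDB or PostgreSQL) would slot in here.
    print(f"Fetched {len(titles)} titles from {url}")
    return titles


if __name__ == "__main__":
    ingestion_pipeline()
```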

Advanced Techniques and Best Practices:
- Implementing error handling and retry strategies for fault tolerance and reliability (sketched below)
- Sending real-time alerts and notifications based on pipeline results, using Prefect's notification features
- Exploring Prefect's advanced features such as parallel execution, versioning, and dependency management
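
For instance, retries and parallel task execution can be combined in a few lines. The sketch below uses Prefect 2's task-level retries together with .submit(); the deliberately flaky task stands in for an unreliable upstream service, and all names are illustrative.

```python
# Sketch: fault tolerance via task retries, plus concurrent execution.
# The flaky task below simulates an unreliable upstream service.
import random

from prefect import flow, task


@task(retries=3, retry_delay_seconds=5)
def flaky_fetch(source: str) -> int:
    """Prefect re-runs this task up to 3 times if it raises."""
    if random.random() < 0.3:
        raise RuntimeError(f"transient error while fetching {source}")
    return len(source)


@flow(log_prints=True)
def resilient_flow(sources: tuple[str, ...] = ("feed-a", "feed-b", "feed-c")):
    # .submit() hands each task to the flow's task runner, so the fetches
    # can run concurrently (the ConcurrentTaskRunner is Prefect 2's default).
    futures = [flaky_fetch.submit(source) for source in sources]
    results = [future.result() for future in futures]
    print(f"Collected {len(results)} results")
    return results


if __name__ == "__main__":
    resilient_flow()
```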

Workshop Materials and Requirements:
Participants will have access to workshop materials, including code examples, instructions, and sample datasets, which will be provided in advance via GitHub. To ensure seamless participation, attendees are required to have Docker installed on their machines: we'll be running some services locally through Docker and utilising free cloud services for certain components.

By the end of the workshop, attendees will have gained a comprehensive understanding of Prefect 2 and its capabilities, empowering them to design, execute, and optimise data pipelines efficiently in real-world scenarios.

We invite you to join us on this exciting journey of mastering data flows with Prefect!

To prepare for the workshop, check out the GitHub repo here: https://github.com/Cadarn/PyData-Prefect-Workshop/tree/main. We will update it nearer to the tutorial, but there are some preparatory steps that will make things smoother on the day.
To participate fully in this workshop you will need the following:
1. A Python environment (I will be using Python 3.12, but everything should work with a recent version of Python)
2. A local install of Docker; we will use Docker to run some additional services locally
- Pull a copy of the PostgreSQL v13 image
- Pull a copy of the python:3.11-slim image
3. Create an account on Upstash, https://upstash.com/ - you will only need the free tier
- You will need to create a free Kafka cluster
4. Create an account on MongoDB Atlas, https://www.mongodb.com/cloud/atlas/register - we will be using the free tier again (a quick connectivity check is sketched below)
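
Once the Atlas account exists, a quick sanity check like the one below confirms you can reach your cluster before the day. It assumes pymongo is installed and that a MONGODB_URI environment variable holds your Atlas connection string; both the variable name and the setup are illustrative, not part of the official materials.

```python
# Quick connectivity check for MongoDB Atlas (illustrative sketch).
# Assumes `pymongo` is installed and that the MONGODB_URI environment
# variable holds the connection string from your Atlas dashboard.
import os

from pymongo import MongoClient

client = MongoClient(os.environ["MONGODB_URI"])
client.admin.command("ping")  # raises if the cluster is unreachable
print("Successfully connected to MongoDB Atlas")
```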


Prior Knowledge Expected

No previous knowledge expected

I spent 10 years as an astrophysics researcher analysing high-energy data from space telescopes in the search for new objects in the universe and a better understanding of what we already knew to be out there. In 2015 I transitioned to data science, joining a smart-cities startup called HAL24K. Over the next 8 years, I built data science solutions that enabled city governments and suppliers to derive actionable intelligence from their data, making cities more efficient, better informed, and better at using their resources. During that time I built and led a team of 10 data scientists and helped the company spin out four new companies. In 2022, I joined ComplyAdvantage as a Senior Data Scientist working to combat financial crime and fraud.

I have been an active member of the PyData community since 2015 and founded PyData Southampton in 2023. I am also a long-time supporter of DataKind UK in their mission to bring pro-bono data science support to charities and NGOs in the third sector.