PyData London 2024

What a serverless database means for users
06-15, 15:45–16:25 (Europe/London), Minories

This talk aims to compare the performance of ArcticDB to the most popular Dataframe file formats for raw reads and writes, and then demonstrate the simplicity with which more complex data modification and access patterns can be achieved using ArcticDB without sacrificing performance.


With the explosion in popularity of the Python data science ecosystem, the long-term persistence of Dataframes, the data structure at the heart of many data science applications, has become more important than ever. A variety of traditional, server-side databases exist to address this problem, but they come with all of the traditional complexity of maintaining the infrastructure and redundancy needed to keep the data available and retrieval fast.

At the other end of the spectrum, there are a variety of raw file formats for storing Dataframes, such as HDF5 and Parquet. Storing your data in files like this, whether on local disk or over a protocol such as S3, pushes the availability concerns, and at least some of the performance concerns, down to the storage layer, which modern local and cloud storage systems are well equipped to handle.

Somewhere in the middle sits ArcticDB, the client-side Dataframe database developed at Man Group, and released publicly last year. ArcticDB shares the maintenance simplicity of a file-format-based storage approach, while adding a whole host of capabilities on top. This talk aims to compare the performance of ArcticDB to the most popular Dataframe file formats for raw reads and writes, and then demonstrate the simplicity with which more complex data modification and access patterns can be achieved using ArcticDB without sacrificing performance. Common operations such as appending one Dataframe to another, or replacing a subset of existing rows, require users of file-based solutions to intimately understand how their data is partitioned. These operations become trivial once this detail is abstracted away.
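
To make the partitioning point concrete, here is a minimal sketch of what "replacing a subset of existing rows" can look like when the caller manages the partitions. It is an illustration only, not code from the talk: the one-partition-per-calendar-month layout, the in-memory dict standing in for files, and the helper names `split_by_month` and `replace_rows` are all assumptions made for this example.

```python
import pandas as pd


def split_by_month(df):
    """Emulate one file per calendar month: {'2024-01': frame, ...} (hypothetical layout)."""
    return {key: part for key, part in df.groupby(df.index.strftime("%Y-%m"))}


def replace_rows(partitions, new_rows):
    """Overwrite every stored row that falls inside the time range covered by new_rows."""
    start, end = new_rows.index.min(), new_rows.index.max()
    new_parts = split_by_month(new_rows)
    for key in sorted(set(partitions) | set(new_parts)):
        pieces = []
        if key in partitions:
            old = partitions[key]
            # Keep only the stored rows outside the replaced time range.
            pieces.append(old[(old.index < start) | (old.index > end)])
        if key in new_parts:
            pieces.append(new_parts[key])
        partitions[key] = pd.concat(pieces).sort_index()
    return partitions


# Five daily rows that straddle a month boundary, so they land in two partitions.
index = pd.date_range("2024-01-29", periods=5, freq="D")
prices = pd.DataFrame({"price": [1.0, 2.0, 3.0, 4.0, 5.0]}, index=index)
partitions = split_by_month(prices)

# A two-row correction that touches both partitions.
fix_index = pd.to_datetime(["2024-01-31", "2024-02-01"])
correction = pd.DataFrame({"price": [30.0, 40.0]}, index=fix_index)
partitions = replace_rows(partitions, correction)

# Reassemble the partitions in key order to inspect the result.
full = pd.concat([partitions[key] for key in sorted(partitions)])
```

Even in this toy version, the caller has to know the partition key, find every partition the correction overlaps, and rewrite each one. That bookkeeping is the detail the description says disappears once a database layer owns the partitioning.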


Prior Knowledge Expected

No previous knowledge expected

Alex Owens has been working in a combination of Python and C++ for the past 7 years. For the last 2 and a half of those, he has been a senior engineer and, more recently, tech lead on the new open-source Dataframe database, ArcticDB, which is backed by long-time Python enthusiasts Man Group and Bloomberg.