PyData London 2024

Backtesting and error metrics for modern time series forecasting
06-16, 16:30–17:10 (Europe/London), Minories

Evaluating time series forecasting models for modern use cases has become incredibly challenging. This is because modern forecasting problems often involve a large number of related time series, frequently arranged hierarchically, with a diverse set of characteristics such as intermittency, non-normality, and non-stationarity. In this talk we'll discuss the tips, tricks, and pitfalls in creating model evaluation strategies and error metrics to overcome these challenges.


Forecasting is the process of making predictions about the future based on past data. In the most traditional scenario, we have a time series and want to predict its future values. Knowing how to correctly evaluate a forecasting model is critical for: 1) selecting models, features, and hyperparameters, 2) monitoring models in production, and 3) assessing the feasibility of a forecasting problem. In many modern use cases, from forecasting retail sales to energy demand, data is being gathered at higher frequencies and at more granular levels. The resulting datasets often have the following properties, which make model evaluation more challenging:

  • a large number of related time series;
  • time series are arranged hierarchically, for example, by geographical region or by product taxonomy;
  • some time series are count-like with many zero values;
  • the scale of time series can vary by orders of magnitude;
  • different time series have different combinations of seasonality, trend, and outliers.

Deep learning models and traditional machine learning models, such as gradient boosted trees, have become increasingly popular for addressing these kinds of problems. Whilst it has become easier to use these models for forecasting, selecting the right error metrics and model evaluation strategy for time series data remains a challenge.

Traditional cross-validation techniques cannot be used with time series data; instead, we use backtesting, where the time ordering of the data is preserved when creating train and test folds. There are many different ways to perform backtesting. Do we keep the size of the training set fixed, or should it expand over time? Do we refit the model for every fold? How do we summarise the errors across all of our folds? We will discuss how to select the right backtesting strategy depending on the use case.
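As a minimal sketch of the expanding versus fixed-size (sliding) training window choice, the snippet below uses scikit-learn's TimeSeriesSplit on a toy series; the series, fold count, and window sizes are illustrative assumptions, not values from the talk.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Toy series of 24 observations (illustrative data only).
y = np.arange(24)

# Expanding window: the training set grows with every fold.
expanding = TimeSeriesSplit(n_splits=4, test_size=3)

# Sliding window: cap the training set at a fixed size with max_train_size.
sliding = TimeSeriesSplit(n_splits=4, test_size=3, max_train_size=12)

for name, splitter in [("expanding", expanding), ("sliding", sliding)]:
    print(name)
    for train_idx, test_idx in splitter.split(y):
        print(f"  train {train_idx[0]:>2}-{train_idx[-1]:>2} | "
              f"test {test_idx[0]:>2}-{test_idx[-1]:>2}")
```

In both cases the test fold always comes after the training fold in time, which is the property that ordinary shuffled cross-validation fails to preserve.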

The need to compare and aggregate errors across time series of different scales has resulted in a confusingly vast number of proposed error metrics. Each metric has its own pros and cons; there is no single metric suitable for all use cases. The mere presence of a zero value can break the metric most commonly requested by business stakeholders: the percentage error. We will discuss a general framework for thinking about error metrics and how to pick an appropriate one for a given dataset.
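To illustrate how a zero actual breaks the percentage error, here is a small sketch with made-up numbers; the scaled error (MASE-style), which divides by the in-sample naive forecast error, is shown as one commonly used scale-free alternative and is an assumption of this sketch rather than the talk's specific recommendation.

```python
import numpy as np

# Made-up intermittent, count-like actuals and forecasts (illustrative only).
y_true = np.array([0.0, 2.0, 0.0, 5.0, 3.0])
y_pred = np.array([1.0, 2.0, 1.0, 4.0, 3.0])

# MAPE divides by the actuals, so a single zero actual makes it infinite.
with np.errstate(divide="ignore", invalid="ignore"):
    mape = np.mean(np.abs((y_true - y_pred) / y_true))
print(f"MAPE: {mape}")  # inf

# A MASE-style metric scales the forecast error by the in-sample one-step
# naive error, so it stays finite on zeros and is comparable across series
# of very different scales.
y_train = np.array([0.0, 1.0, 0.0, 3.0, 2.0, 0.0, 4.0])  # hypothetical training history
naive_mae = np.mean(np.abs(np.diff(y_train)))
mase = np.mean(np.abs(y_true - y_pred)) / naive_mae
print(f"MASE: {mase}")  # 0.3
```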

Which error metrics should we select when we have many time series with different characteristics? How can we speed up backtesting? What if 90% of my time series are zero? My forecasts need to be made at several different levels of aggregation - which error metric should I use? In this talk we will discuss all of these topics and more.

Slides: https://github.com/KishManani/PyDataLondon2024


Prior Knowledge Expected

No previous knowledge expected

Kishan is a machine learning and data science lead, course instructor, and open source software contributor. He has contributed to well-known Python packages including statsmodels, Feature-engine, and sktime. He has 10+ years of experience applying machine learning and statistics in finance, e-commerce, and healthcare research. He leads data science teams to deliver data and machine learning products end-to-end.

Kishan attained a PhD in Physics from Imperial College London, where he applied large-scale time series analysis and modelling to cardiac arrhythmias; during this time he taught and supervised undergraduate and master's students.

LinkedIn: https://www.linkedin.com/in/kishanmanani/

Medium: https://medium.com/@kish.manani

Twitter: https://twitter.com/KishManani

GitHub: https://github.com/KishManani