Technical blog from our Data Science Team

Can we completely automate forecasting for industry?

Author: Guest Author(s)

Guest blog by Nick Finch, CTO at DataPA Ltd

We’re delighted to bring you a technical blog from another of the speakers who would have spoken at DataTech20 back in March.

To do any sort of business planning, you need forecasting. Our reason for existing is to help our customers drive success from their data, so around two years ago we began to research adding automated forecasting to our cloud analytics platform.

The challenges of forecasting

This did, however, present a huge challenge. Take the typical lifecycle of a forecasting project: an iterative process of refining a model, generating a forecast and evaluating the results to understand how to refine the model further. This cycle is repeated until you are happy with the results.

[Figure: circular diagram showing automated and analyst tasks]

Evaluating the results and refining the model are usually manual tasks. They need an analyst with a good understanding of the model and the business or process that generates the data. That’s a pretty specific skillset.

That’s fine if you are creating one forecast for one organisation. We, however, want to scale our business rapidly, taking on lots of different customers. It’s not practical for us to recruit large numbers of highly skilled analysts in a short space of time. So, we need to automate as much of the process as we can.

The differences in the solutions that are available

We looked at a number of solutions. An obvious first choice, as we already use AWS, was the Amazon AI platform SageMaker, which includes DeepAR. DeepAR is an ML forecasting model based on a recurrent neural network with long short-term memory (LSTM) cells, which allow it to model trends and seasonality. Importantly for us, the platform offers automatic model tuning: you run your data through the model numerous times, and it automatically selects hyperparameters to improve the forecast.
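
The idea behind automatic model tuning can be sketched in a few lines. This is not SageMaker's API and not DeepAR: as a stand-in it uses simple exponential smoothing with a single hyperparameter (the smoothing factor alpha) and a plain grid search, keeping whichever value gives the lowest error on the most recent part of the series. The function names and the toy series are assumptions made for illustration only, and managed tuning services typically use smarter search strategies than a grid.

```python
def one_step_forecasts(series, alpha):
    """Simple exponential smoothing: forecast each point from the points before it."""
    level = series[0]
    forecasts = []
    for actual in series[1:]:
        forecasts.append(level)  # forecast made before seeing `actual`
        level = alpha * actual + (1 - alpha) * level
    return forecasts


def mean_absolute_error(actuals, forecasts):
    return sum(abs(a - f) for a, f in zip(actuals, forecasts)) / len(actuals)


def tune_alpha(series, holdout, candidates):
    """Return the (alpha, error) pair with the lowest error on the last `holdout` points."""
    best = None
    for alpha in candidates:
        forecasts = one_step_forecasts(series, alpha)
        error = mean_absolute_error(series[-holdout:], forecasts[-holdout:])
        if best is None or error < best[1]:
            best = (alpha, error)
    return best


# Toy series, invented for illustration only (not customer data)
toy_sales = [10, 12, 11, 13, 15, 14, 16, 18, 17, 19, 21, 20]
candidates = [0.1, 0.3, 0.5, 0.7, 0.9]
best_alpha, best_error = tune_alpha(toy_sales, holdout=4, candidates=candidates)
print(f"best alpha: {best_alpha}, holdout MAE: {best_error:.2f}")
```

The loop is the same refine, forecast, evaluate cycle an analyst would follow by hand, with the "refine" step reduced to trying the next candidate value.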

We also looked at Facebook Prophet. Unlike DeepAR, it’s a statistical model, similar to a generalised additive model (GAM). It copes particularly well with trends, seasonality and holidays, each of which is an additive component of the model. It’s also open source and available in Python or R, so it was easy for us to integrate into our platform.
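
To make "additive components" concrete, the sketch below builds a forecast as trend plus seasonality plus holiday effect. It is not Prophet's implementation or API: Prophet fits its components from your data, whereas here the shapes and coefficients (a linear trend, one sine wave, a fixed holiday uplift) are made up purely to show how the pieces add together.

```python
import math


def trend(day, base=100.0, growth_per_day=0.05):
    """Linear trend: a base level plus steady growth."""
    return base + growth_per_day * day


def yearly_seasonality(day, amplitude=20.0, period=365.25):
    """A single sine wave standing in for the yearly seasonal pattern."""
    return amplitude * math.sin(2 * math.pi * day / period)


def holiday_effect(day, holiday_days, uplift=50.0):
    """A fixed uplift on days flagged as holidays, zero otherwise."""
    return uplift if day in holiday_days else 0.0


def additive_forecast(day, holiday_days):
    """The forecast is simply the sum of the three components."""
    return trend(day) + yearly_seasonality(day) + holiday_effect(day, holiday_days)


holidays = {358, 359}  # hypothetical holiday day numbers, for illustration
print(additive_forecast(0, holidays))    # trend only: the sine term is zero and it is not a holiday
print(additive_forecast(358, holidays))  # trend + seasonality + holiday uplift
```

Because the components are simply summed, each one can be inspected on its own, which is a large part of why this style of model is easy to configure and explain.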

Finally, we considered Amazon Forecast. It is a fully managed service on AWS: you upload your data and it uses automated machine learning (AutoML) to choose and refine an appropriate model. It offers a growing number of algorithms, including ARIMA, DeepAR+ (an enhancement of DeepAR), Prophet, ETS and NPTS.

Using a data set that is noisy, with important random values

We began the project using real-world data from one of our existing customers, a jewellery retailer. We chose them primarily because they were keen to take part, but also because their data was particularly challenging to forecast. The nature of their sales means the data is noisy: the outliers are often high-value sales that occur randomly yet account for a large proportion of the value, so they cannot be discounted. The sales are also highly seasonal, with the five weeks leading up to Christmas accounting for as much as 50% of the sales.

[Figure: area chart showing one year of sales by day]

We trained the models on three years of data, from 2015 to 2017, then tested the resulting forecasts against the real-world results for 2018. For comparison we had the manual forecasts the company had used at the time, but unfortunately these were only detailed enough for a fairly rudimentary measure: the total difference over 12 months.
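
The train and test split described above is chronological: the model only ever sees data from before the period it is asked to forecast. A minimal sketch of that split follows, assuming daily (date, value) pairs. The function name is ours and the values are placeholders, not the retailer's sales.

```python
from datetime import date, timedelta


def split_by_year(daily_sales, last_training_year):
    """Split (date, value) pairs into training and test sets by calendar year."""
    train = [(d, v) for d, v in daily_sales if d.year <= last_training_year]
    test = [(d, v) for d, v in daily_sales if d.year > last_training_year]
    return train, test


# Placeholder series: one value per day from 2015 to 2018 (values are not real sales)
start = date(2015, 1, 1)
days = (date(2018, 12, 31) - start).days + 1
daily_sales = [(start + timedelta(days=i), float(i % 7)) for i in range(days)]

train, test = split_by_year(daily_sales, last_training_year=2017)
print(len(train), "training days,", len(test), "test days")
```

Shuffling the rows before splitting, as is common for other machine learning tasks, would leak future information into training and flatter the results.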

[Figure: bar graph showing forecast versus actual over one year]

On total error over 12 months, all the methods we tested significantly outperformed the manual forecast. Prophet generated the best results, with a total error of under 300k (out of annual sales of over 100 million), compared with an error of over 15 million for the manual forecast. However, on more meaningful measures, root mean squared error (RMSE) and mean absolute error (MAE), which unlike a yearly total do not let over- and under-forecasts cancel each other out, DeepAR gave us the most accurate results.
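
A small worked example shows why total error is a rudimentary measure next to RMSE and MAE. The numbers below are invented for illustration and have nothing to do with the customer's data: forecast A is badly wrong in every period but its errors cancel out over the whole span, while forecast B is close in every period but always slightly high.

```python
import math


def total_error(actuals, forecasts):
    """Difference between the summed forecast and the summed actuals."""
    return sum(forecasts) - sum(actuals)


def mae(actuals, forecasts):
    """Mean absolute error: the average size of the per-period miss."""
    return sum(abs(f - a) for a, f in zip(actuals, forecasts)) / len(actuals)


def rmse(actuals, forecasts):
    """Root mean squared error: like MAE, but penalises large misses more heavily."""
    return math.sqrt(sum((f - a) ** 2 for a, f in zip(actuals, forecasts)) / len(actuals))


# Made-up numbers, for illustration only
actuals = [100, 200, 100, 200]
forecast_a = [200, 100, 200, 100]  # wrong by 100 every period, but the errors cancel
forecast_b = [110, 210, 110, 210]  # wrong by only 10 every period, always too high

for name, forecast in [("A", forecast_a), ("B", forecast_b)]:
    print(name, total_error(actuals, forecast), mae(actuals, forecast), rmse(actuals, forecast))
# A 0 100.0 100.0
# B 40 10.0 10.0
```

On total error alone, forecast A looks perfect and B looks worse. MAE and RMSE reverse that ranking, which matches how useful each forecast would be for day-to-day planning.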

The level of automation and accuracy of our forecasting increases as we test different algorithms

In the nine months since we conducted this research, we’ve integrated automated forecasting into our platform. We initially chose to use Prophet, even though DeepAR was arguably more accurate. Our reasoning was that Prophet was significantly easier to configure and less resource-hungry to train. Given the relatively small increase in accuracy DeepAR gave us, the ease with which we could scale the solution with Prophet won the day. In more recent months, with the added complication of forecasting during the Covid-19 crisis, we’ve begun using a combination of Prophet and LightGBM, with some success.

It has been fascinating to see how quickly this field has developed in the last year. The proliferation of available models, if anything, has gathered pace. We’ve continued to test different algorithms and techniques, and the level of automation and accuracy of our forecasting increases almost daily. We are already much closer to the goal of completely automating forecasting for industry than we thought possible 12 months ago. Who knows where we’ll be in another 12 months!

Your data contains the key to your business’ success. Find out more about inmydata and get help unlocking success from your data.
