A Look at Data on Fruit and Veg Prices

Author: Roman Popat, Data Scientist, The Data Lab.

Key Points

  • Ever wondered when the best time in the season is to buy certain products?
  • We examined a dataset from DEFRA on wholesale prices of fruit and veg.
  • We applied a technique called ‘seasonal trend decomposition’.
  • When buying strawberries (and anything in the table below), pay attention to the time of year.

Is the price of certain vegetables and fruits sensitive to the time of year? You might expect so, given growing seasons and fluctuations in supply. How would we decide in a systematic way which products to look out for when buying ‘in season’ veg? Here is something that might help: the Department for Environment, Food and Rural Affairs (DEFRA) published a dataset on the wholesale prices of various homegrown fruit and vegetables. These prices are sampled from a selection of markets around the UK. Here is an example of the price of a single product, carrots, over time. In this image the red and white stripes represent the changing years.

There is a fair bit of variation in the price of carrots from 2004 until 2015, with some big peaks and troughs. Is there a good and a bad time of year to buy this product (purely with regard to price)? One way to measure this is a method called ‘STL’, or seasonal-trend decomposition via loess [1]. If you want the full details, see the paper in the footnotes; it is a great read. This method decomposes the time series into three components: seasonal variation, trend and the remainder. It does so in such a way that summing the three components reproduces the original data.
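To make the "components sum back to the original" property concrete, here is a minimal Python sketch. It is not STL itself: it swaps in a much simpler classical additive decomposition (a centred moving-average trend plus an average effect per month), which shares the same additive structure. The price series is synthetic, all function names are made up for illustration, and the sketch assumes at least two full years of monthly data.

```python
import math

PERIOD = 12  # months per seasonal cycle


def centred_moving_average(series, period=PERIOD):
    """Trend estimate: 2 x period centred moving average (edges left as None)."""
    half = period // 2
    trend = [None] * len(series)
    for i in range(half, len(series) - half):
        window = series[i - half:i + half + 1]
        # Half weight on the two end points so the window spans exactly one period.
        total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        trend[i] = total / period
    return trend


def decompose(series, period=PERIOD):
    """Split series into (seasonal, trend, remainder) lists that sum back to it."""
    trend = centred_moving_average(series, period)
    # Collect the detrended values for each position in the cycle.
    buckets = [[] for _ in range(period)]
    for i, (y, t) in enumerate(zip(series, trend)):
        if t is not None:
            buckets[i % period].append(y - t)
    effects = [sum(b) / len(b) for b in buckets]
    mean_effect = sum(effects) / period
    effects = [e - mean_effect for e in effects]  # centre seasonal effects on zero
    seasonal = [effects[i % period] for i in range(len(series))]
    remainder = [
        None if t is None else y - s - t
        for y, s, t in zip(series, seasonal, trend)
    ]
    return seasonal, trend, remainder


# Synthetic monthly "price" series: upward trend plus a mid-year peak (made-up numbers).
prices = [
    0.30 + 0.002 * m + 0.05 * math.sin(2 * math.pi * ((m % 12) - 3) / 12)
    for m in range(48)
]
seasonal, trend, remainder = decompose(prices)

# Wherever the trend is defined, the three components add back up to the data.
rebuilt = [
    s + t + r
    for s, t, r in zip(seasonal, trend, remainder)
    if t is not None
]
```

STL replaces the moving average and the per-month averages with loess smoothers, which is what lets the seasonal pattern itself change from year to year; the fixed monthly effect in this sketch cannot do that.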

Now we can clearly see that there is some seasonal pattern, which itself varies from year to year. It seems that carrots are cheapest at the beginning and end of a year (red and white stripes) and that the price peaks towards the middle of the year, in the summer. This pattern was especially pronounced in the summer of 2012, when carrots reached their peak price of £0.66 per kg (remember this is the wholesale price) and the seasonal variation was at its largest. There is also a fairly pronounced positive trend in the price from 2004 on the left to 2015 on the right. This takes a little bit of a dip in 2014 and then begins to recover again in 2015. Finally, the fourth panel, ‘remainder’, shows us what is left once the seasonal variation and trend are accounted for. We could use this to identify any erratic price variation. If the seasonal variation and trend are very predictable, then the variation in this final panel should be small.

A significant benefit of this method is that we can now measure the relative size of the seasonal and trend components of these prices. This can be done for the whole DEFRA library of fruit, vegetables and flowers, which allows us to see and compare at a glance the relative extent of seasonal and trend variation. This in turn tells us for which products it is especially important to pay attention to the time of year.
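As a rough illustration of that measurement, the sketch below follows the definition given in footnote 2: each component's variance divided by the sum of the variances of all three components. The component values and product names are made up for illustration (they are not DEFRA figures), and the function name is hypothetical. Population variance is used here; since all three components have the same length, using sample variance instead would give the same ratios.

```python
from statistics import pvariance


def variance_shares(seasonal, trend, remainder):
    """Return each component's share of the summed component variances."""
    variances = {
        "seasonal": pvariance(seasonal),
        "trend": pvariance(trend),
        "remainder": pvariance(remainder),
    }
    total = sum(variances.values())
    return {name: v / total for name, v in variances.items()}


# Made-up components for two hypothetical products (not DEFRA values).
products = {
    "product_a": ([0.2, -0.2, 0.2, -0.2], [1.0, 1.0, 1.1, 1.1], [0.01, -0.01, -0.01, 0.01]),
    "product_b": ([0.02, -0.02, 0.02, -0.02], [1.0, 1.2, 1.4, 1.6], [0.01, -0.01, -0.01, 0.01]),
}
shares = {name: variance_shares(*parts) for name, parts in products.items()}

# Rank products by seasonal share, largest first.
ranked = sorted(shares, key=lambda name: shares[name]["seasonal"], reverse=True)
for name in ranked:
    s = shares[name]
    print(f"{name}: seasonal={s['seasonal']:.4f} trend={s['trend']:.4f}")
```

Because the three shares always sum to one, a product cannot score highly on both the seasonal and the trend share at once.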

3    Runner Beans    0.5051    0.2903

Now we are getting closer to the answer: in the top left of the plot are products that have a high seasonal variation, so we can save a lot of money by buying them only at particular times of year. I have ranked these in order of the proportionally highest seasonal variation. What is also revealed is that a large number of products have a relatively high trend over the years compared to the seasonal variation (bottom right of the plot). The way the calculations are done, it would be impossible to have both [2]. There is a bit of a disclaimer here: I have not made any corrections for inflation, so we would expect prices to go up over an 11-year period. I would take these interpretations lightly; this exercise was just to imagine what a tool for deciding when to buy certain produce might look like.

The code for the data cleaning, calculations and plots is all on GitHub. Feel free to email us for more info.

See you next time.

1. I used the R function stl(), but a full description of the method can be found in the following reference: R. B. Cleveland, W. S. Cleveland, J. E. McRae, and I. Terpenning (1990) STL: A Seasonal-Trend Decomposition Procedure Based on Loess. Journal of Official Statistics, 6, 3–73.

2. The numbers are proportional variances. For example, the seasonal proportion is calculated as the variance of the seasonal component divided by the sum of the variances of all three components.
