
Rob Hyndman is a Professor of Statistics at Monash University, Australia. He is Editor-in-Chief of the International Journal of Forecasting and author of over 100 research papers in statistical science. He also maintains an active consulting practice, assisting hundreds of companies and organizations. His recent consulting work has involved forecasting electricity demand, tourism demand, the Australian government health budget and case volume at a US call centre.

Batch forecasting in R

01.08.2013

I sometimes get asked about forecasting many time series automatically. Here is a recent email, for example:

I have looked but cannot find any info on generating forecasts on multiple data sets in sequence. I have been using analysis services for sql server to generate fitted time series but it is too much of a black box (or I don’t know enough to tweak/manage the inputs). In short, what package should I research that will allow me to load data, generate a forecast (presumably best fit), export the forecast then repeat for a few thousand items. I have read that R does not like ‘loops’ but not sure if the current cpu power offsets that or not. Any guidance would be greatly appreciated. Thank you!!

My response

Loops are fine in R. They are frowned upon because people use them inappropriately when there are often much more efficient vectorized versions available. But for this task, a loop is the only approach.

Reading data and exporting forecasts is standard R and does not require any additional packages. To generate the forecasts, use the forecast package: either the ets() function or the auto.arima() function, depending on what type of data you are modelling. If it’s high frequency data (frequency greater than 24) then you would need the tbats() function, but that is very slow.
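As a minimal sketch of the choice between the two, here is how either model class is fitted to a single series and passed to forecast() — using the built-in AirPassengers data purely for illustration; any monthly series would do:

```r
library(forecast)

# Fit an ETS (exponential smoothing) model and an ARIMA model to one series
fit_ets   <- ets(AirPassengers)        # automatic ETS model selection by AIC
fit_arima <- auto.arima(AirPassengers) # automatic ARIMA order selection

# Either fitted model can be passed to forecast()
fc <- forecast(fit_ets, h = 12)        # forecasts for the next 12 months
```

Calling forecast() directly on the raw series, as in the batch code below, skips the explicit model-fitting step and defaults to ETS.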

Some sample code

In the following example, there are many columns of monthly data in a csv file, with the first column containing the month of observation (beginning with April 1982). Forecasts have been generated by applying forecast() directly to each time series. That will select an ETS model using the AIC, estimate the parameters, and generate forecasts. Although it returns prediction intervals, in the following code I’ve simply extracted the point forecasts (named mean in the returned forecast object because they are usually the mean of the forecast distribution).

library(forecast)
 
# Read the data: the first column holds the month of observation, so drop it
retail <- read.csv("http://robjhyndman.com/data/ausretail.csv", header=FALSE)
retail <- ts(retail[,-1], frequency=12, start=1982+3/12)
 
ns <- ncol(retail)   # number of series
h <- 24              # forecast horizon (months)
fcast <- matrix(NA, nrow=h, ncol=ns)
for(i in 1:ns)
  fcast[,i] <- forecast(retail[,i], h=h)$mean
 
write(t(fcast), file="retailfcasts.csv", sep=",", ncol=ncol(fcast))

Note that the transpose of the fcast matrix is used in write() because the file is written row-by-row rather than column-by-column.
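If you would rather avoid the transpose, a sketch of an equivalent export (assuming fcast as defined above) uses write.csv(), which handles the matrix’s column orientation directly:

```r
# Equivalent output without the transpose: write.csv() writes the
# matrix column-by-column, one forecast series per column
write.csv(fcast, file = "retailfcasts.csv", row.names = FALSE)
```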

This code does not actually do what the questioner asked, as I am writing all forecasts at once rather than exporting them at each iteration. The latter is much less efficient.

If ns is large, this could probably be more efficiently coded using the parallel package.
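One possible sketch, assuming retail and h are defined as above, replaces the loop with mclapply() from the parallel package (mclapply() uses forking, so it falls back to serial execution on Windows; parLapply() with a cluster is the cross-platform alternative):

```r
library(forecast)
library(parallel)

# Forecast each column in parallel, then bind the point forecasts together
fc_list <- mclapply(seq_len(ncol(retail)),
                    function(i) forecast(retail[, i], h = h)$mean,
                    mc.cores = max(1, detectCores() - 1))
fcast <- do.call(cbind, fc_list)
```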

Published at DZone with permission of Rob J Hyndman, author and DZone MVB.
