Arthur Charpentier, ENSAE, PhD in Mathematics (KU Leuven), Fellow of the French Institute of Actuaries, professor at UQàM in Actuarial Science. Former professor-assistant at ENSAE ParisTech, associate professor at Ecole Polytechnique and professor assistant in economics at Université de Rennes 1. Arthur is a DZone MVB and is not an employee of DZone and has posted 158 posts at DZone.

The Statistics of Easter

04.01.2013

This morning, there was an interesting post entitled “why does Easter move around so much?” online on http://economist.com/blogs/economist-explains/…

In my time series classes, I keep saying that series can sometimes exhibit seasonality, but that the seasonal effect can be quite irregular. This is the case for river levels, where snowmelt can have a huge impact, and it is irregular. Similarly, chocolate sales (even monthly, or quarterly) depend on Easter. Because Easter can fall either in March or in April, the seasonal pattern is not as regular as that of flower sales, for instance (Valentine's Day always being on February 14th, as far as I remember). If we look at the word eggs on http://google.com/trends/q=eggs…, we do observe a cycle related to Easter.

The title of the article published by http://economist.com/blogs/economist-explains/… claims that there is a lot of variability in the date of Easter. Let us check! The short answer to the question “When is Easter?” is the following: Easter Sunday is the first Sunday after the first full moon after the vernal equinox. For more details, see e.g. http://ortelius.de/east. The algorithm used to compute the date of Easter is online, on http://smart.net/~mmontes/…. In the listing below, every division is an integer division (the remainder is discarded) and % is the modulo operator.

> century = year/100
> G = year % 19
> K = (century - 17)/25
> I = (century - century/4 - (century - K)/3 + 19*G + 15) % 30
> I = I - (I/28)*(1 - (I/28)*(29/(I + 1))*((21 - G)/11))
> J = (year + year/4 + I + 2 - century + century/4) % 7
> L = I - J
> EasterMonth = 3 + (L + 40)/44
> EasterDay = L + 28 - 31*(EasterMonth/4)
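The listing above is not valid R as written, since in R `/` is ordinary division and the modulo operator is `%%`. A minimal sketch of a direct translation could look like the following. The names `idiv` and `easter_date` are mine (hypothetical helpers, not part of any package), and I assume that the divisions are meant to truncate toward zero and that the year is a Gregorian one.

```r
# Truncating integer division (assumption: remainders are simply discarded).
idiv <- function(a, b) trunc(a / b)

# Hypothetical helper: date of Easter Sunday for a vector of Gregorian years,
# following the listing above step by step.
easter_date <- function(year) {
  century <- idiv(year, 100)
  G <- year %% 19
  K <- idiv(century - 17, 25)
  I <- (century - idiv(century, 4) - idiv(century - K, 3) + 19 * G + 15) %% 30
  I <- I - idiv(I, 28) * (1 - idiv(I, 28) * idiv(29, I + 1) * idiv(21 - G, 11))
  J <- (year + idiv(year, 4) + I + 2 - century + idiv(century, 4)) %% 7
  L <- I - J
  month <- 3 + idiv(L + 40, 44)
  day <- L + 28 - 31 * idiv(month, 4)
  as.Date(ISOdate(year, month, day))
}

easter_date(2013)   # 2013-03-31
```

The function is vectorized, so `easter_date(2000:2024)` returns all the dates at once.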

Actually, this algorithm can be found in some R packages. Here we use the dates of Easter from AD 1000 to AD 3000,

> library(timeDate)
> E=Easter(1000:3000)
> D=as.Date(E)
> table(months(D))/2001

    april     march 
0.7651174 0.2348826

(April comes before March in alphabetical order.) If we look at the distribution of the date, counted in days since March 1st, it is the following,

> J=as.numeric(D-as.Date(paste("01/03/",1000:3000,sep=""),"%d/%m/%Y"))
> hist(J,breaks=seq(20,55),col="light green")

And if we look at the autocorrelation function, we can observe that indeed, after 19 years, there is a strong correlation (that could be seen in the algorithm given previously),

> acf(J)
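One way to read the 19-year effect off the algorithm, at least as I understand it: the term G = year % 19 repeats every 19 years, and as long as the century terms do not move, the quantity I depends on G only. The sketch below checks this for the years 1900 to 2099 (a range chosen here for illustration, over which the century terms happen to cancel out); it is not a statement about the whole AD 1000 to AD 3000 sample.

```r
# Years 1900-2099: all quantities are non-negative, so %/% acts as
# truncating integer division.
year <- 1900:2099
century <- year %/% 100
G <- year %% 19                      # repeats with period 19
K <- (century - 17) %/% 25
I <- (century - century %/% 4 - (century - K) %/% 3 + 19 * G + 15) %% 30
I <- I - (I %/% 28) * (1 - (I %/% 28) * (29 %/% (I + 1)) * ((21 - G) %/% 11))

# Compare I in each year with I nineteen years later.
n <- length(year)
same_I <- all(I[1:(n - 19)] == I[20:n])
same_I   # TRUE
```

Since L = I - J with J between 0 and 6, two Easters that are 19 years apart in this range can differ by at most 6 days, which is consistent with a strong autocorrelation at lag 19.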

But in order to get a better understanding of the dynamics, we can also look at transition matrices. Define

> Q=quantile(J,seq(0,1,by=.25))
> Q[1]=Q[1]-1
> C=cut(J,Q)
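The reason for lowering the first break by one is that `cut` builds intervals that are open on the left, so the smallest observation would otherwise fall outside every interval. A small sketch on made-up numbers (the vector `x` is purely illustrative) shows the issue, together with the `include.lowest` argument of `cut`, which is an alternative way of handling it.

```r
x <- c(21, 25, 33, 40, 47, 55)              # made-up day offsets
q <- quantile(x, seq(0, 1, by = 0.25))

na_plain <- sum(is.na(cut(x, q)))           # the minimum is left out
q_shift <- q
q_shift[1] <- q_shift[1] - 1                # same trick as above
na_shift <- sum(is.na(cut(x, q_shift)))
na_lowest <- sum(is.na(cut(x, q, include.lowest = TRUE)))

c(na_plain, na_shift, na_lowest)            # 1 0 0
```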

Then, the one year transition matrix is (in %)

> k=1; n=length(C)
> B=data.frame(X1=(C[1:(n-k)]),X2=(C[(k+1):n]))
> (T=table(B$X1,B$X2))

          (20,31] (31,39] (39,46] (46,55]
  (20,31]       0       0     265     277
  (31,39]     316       0      13     182
  (39,46]     224     264       0       0
  (46,55]       1     247     211       0
> P=T/apply(T,1,sum)
> round(P*1000)/10

          (20,31] (31,39] (39,46] (46,55]
  (20,31]     0.0     0.0    48.9    51.1
  (31,39]    61.8     0.0     2.5    35.6
  (39,46]    45.9    54.1     0.0     0.0
  (46,55]     0.2    53.8    46.0     0.0

That is, if Easter was early in the year (say in March, in the first quartile), then, the year after, it will be late in the year (about 49% of the time in the third quartile, and about 51% of the time in the fourth one).
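The same computation can be wrapped in a small function to look at transitions over k years instead of one. This is only a sketch: `transition_pct` is a hypothetical helper name, and the toy sequence below stands in for the factor C built above.

```r
# Hypothetical helper: k-step transition matrix of a factor, in row percentages.
transition_pct <- function(states, k = 1) {
  n <- length(states)
  counts <- table(from = states[1:(n - k)], to = states[(k + 1):n])
  round(100 * prop.table(counts, margin = 1), 1)
}

# Toy sequence of states, for illustration only.
x <- factor(c("a", "b", "a", "b", "b", "a"), levels = c("a", "b"))
P1 <- transition_pct(x, k = 1)
P1
```

With the objects defined earlier, `transition_pct(C, k = 19)` would give the 19-year transitions.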

Published at DZone with permission of Arthur Charpentier, author and DZone MVB. (source)
