{"title": "Modelling Seasonality and Trends in Daily Rainfall Data", "book": "Advances in Neural Information Processing Systems", "page_first": 985, "page_last": 991, "abstract": null, "full_text": "Modelling Seasonality and Trends in Daily \n\nRainfall Data \n\nPeter M Williams \n\nSchool of Cognitive and Computing Sciences \n\nUniversity of Sussex \n\nFalmer, Brighton BN1 9QH, UK. \nemail: peterw@cogs.susx.ac.uk \n\nAbstract \n\nThis paper presents a new approach to the problem of modelling daily \nrainfall using neural networks. We first model the conditional distribu(cid:173)\ntions of rainfall amounts, in such a way that the model itself determines \nthe order of the process, and the time-dependent shape and scale of the \nconditional distributions. After integrating over particular weather pat(cid:173)\nterns, we are able to extract seasonal variations and long-term trends. \n\n1 \n\nIntroduction \n\nAnalysis of rainfall data is important for many agricultural, ecological and engineering \nactivities. Design of irrigation and drainage systems, for instance, needs to take account \nnot only of mean expected rainfall, but also of rainfall volatility. In agricultural planning, \nchanges in the annual cycle, e.g. advances in the onset of winter rain, are significant in \ndetermining the optimum time for planting crops. Estimates of crop yields also depend \non the distribution of rainfall during the growing season, as well as on the overall amount. \nSuch problems require the extrapolation of longer term trends as well as the provision of \nshort or medium term forecasts. \n\n2 Occurrence and amount processes \n\nModels of daily precipitation commonly distinguish between the occurrence process, i.e. \nwhether or not it rains, and the amount process, i.e. how much it rains, if it does. The \noccurrence process is often modelled as a two-state Markov chain of first or higher order. \nIn discussion of [12], Katz traces this approach back to Quetelet in 1852. 
A first order chain has been considered adequate for some weather stations, but second or higher order models may be required for others, or at different times of year. Non-stationary Markov chains have been used by a number of investigators, and several approaches have been taken to the problem of seasonal variation, e.g. using Fourier series to model daily variation of parameters [16, 12, 15]. \n\nThe amount of rain X on a given day, assuming it rains, normally has a roughly exponential distribution. Smaller amounts of rain are generally more likely than larger amounts. Several models have been used for the amount process. Katz & Parlange [9], for example, assume that the n-th root of X has a normal distribution, where n is a positive integer empirically chosen to minimise the skewness of the resulting historical distribution. But use has more commonly been made of a gamma distribution [7, 8, 12] or a mixture of two exponentials [16, 15]. \n\n3 Stochastic model \n\nThe present approach is to deal with the occurrence and amount processes jointly, by assuming that the distribution of the amount of rain on a given day is a mixture of a discrete and a continuous component. The discrete component relates to rainfall occurrence and the continuous component relates to rainfall amount on rainy days. \n\nWe use a gamma distribution for the continuous component.1 This has density proportional to x^{ν-1} e^{-x}, to within an adjustable scaling of the x-axis. The shape parameter ν > 0 controls the ratio of standard deviation to mean. It also determines the location of the mode, which is strictly positive if ν > 1. For certain patterns of past precipitation, larger amounts may be more likely on the following day than smaller amounts. 
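The shape properties just described can be checked numerically. The following is an illustrative sketch (not code from the paper; the function name is my own): for a gamma density proportional to x^{ν-1} e^{-x/θ}, the standard-deviation-to-mean ratio is 1/sqrt(ν), independent of the scale θ, and the mode is (ν-1)θ when ν > 1.

```python
import math

# Sketch of the gamma shape properties described above (illustrative only):
# mean = v * theta, sd = sqrt(v) * theta, mode = (v - 1) * theta for v > 1.
def gamma_shape_stats(v, theta):
    mean = v * theta
    sd = math.sqrt(v) * theta
    mode = (v - 1.0) * theta if v > 1.0 else 0.0
    return mean, sd, mode

mean, sd, mode = gamma_shape_stats(v=4.0, theta=2.5)
ratio = sd / mean  # equals 1/sqrt(v) = 0.5, whatever the scale theta
```

Note that the ratio depends on ν alone, which is why ν is called the shape parameter while θ only rescales the x-axis.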
Specifically, the distribution of the amount X of rain on a given day is modelled by the three-parameter family \n\n    P(X > x) = 1,              if x < 0 \n    P(X > x) = α Γ(ν, x/θ),    if x ≥ 0    (1) \n\nwhere 0 ≤ α ≤ 1 and ν, θ > 0, and \n\n    Γ(ν, z) = Γ(ν)^{-1} ∫_z^∞ y^{ν-1} e^{-y} dy \n\nis the (normalised) incomplete gamma function. For α < 1, there is a discontinuity at x = 0 corresponding to the discrete component. Putting x = 0, it is seen that α = P(X > 0) is the probability of rain on the day in question. The mean daily rainfall amount is ανθ and the variance is αν{1 + ν(1 - α)}θ². \n\n4 Modelling time dependency \n\nThe parameters α, ν, θ determining the conditional distribution for a given day are understood to depend on the preceding pattern of precipitation, the time of year, etc. To model this dependency we use a neural network with inputs corresponding to the conditioning events, and three outputs corresponding to the distributional parameters.2 Referring to the activations of the three output units as z^α, z^ν and z^θ, we relate these to the distributional parameters by \n\n    α = 1 / (1 + exp(-z^α)),    ν = exp(z^ν),    θ = exp(z^θ)    (2) \n\nin order to ensure an unconstrained parametrization, with 0 < α < 1 and ν, θ > 0 for any real values of z^α, z^ν, z^θ. \n\n1 It would be straightforward to use a mixture of gammas, or exponentials, with time-dependent mixture components. A single gamma was chosen for simplicity to illustrate the approach. \n\n2 A similar approach to modelling conditional distributions, by having the network output distributional parameters, is used, for example, by Ghahramani & Jordan [6], Nix & Weigend [10], Bishop & Legleye [3], Williams [14], Baldi & Chauvin [2]. \n\nOn the input side, we first need to make additional assumptions about the statistical properties of the process. 
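The parameter mapping (2) and the moments quoted for (1) can be sketched numerically as follows (illustrative code, not from the paper; `params` and `moments` are my own names):

```python
import math

# Map the three network outputs to the parameters of the mixed
# discrete/gamma distribution in (1), via the links in (2).
def params(z_alpha, z_nu, z_theta):
    alpha = 1.0 / (1.0 + math.exp(-z_alpha))   # rain probability, 0 < alpha < 1
    nu = math.exp(z_nu)                        # gamma shape, nu > 0
    theta = math.exp(z_theta)                  # gamma scale, theta > 0
    return alpha, nu, theta

# Moments of the mixed distribution, as stated in Section 3:
# mean = alpha*nu*theta, variance = alpha*nu*{1 + nu*(1 - alpha)}*theta^2.
def moments(alpha, nu, theta):
    mean = alpha * nu * theta
    var = alpha * nu * (1.0 + nu * (1.0 - alpha)) * theta ** 2
    return mean, var
```

As a consistency check, the stated variance agrees with E(X²) - E(X)², since E(X²) = α ν(ν+1)θ² for this mixture.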
Specifically, it is assumed that the present is stochastically independent of the distant past, in the sense that \n\n    P(X_t > x | X_{t-1}, ..., X_0) = P(X_t > x | X_{t-1}, ..., X_{t-T})    (t > T)    (3) \n\nfor a sufficiently large number of days T. In fact the stronger assumption will be made that \n\n    P(X_t > x | X_{t-1}, ..., X_0) = P(X_t > x | R_{t-1}, ..., R_{t-T})    (t > T)    (4) \n\nwhere R_t = (X_t > 0) is the event of rain on day t. This assumes that today's rainfall amount depends stochastically only on the occurrence or non-occurrence of rain in the recent past, and not on the actual amounts. Such a simplification is in line with previous approaches [8, 16, 12]. For the present study T was taken to be 10. \n\nTo assist in modelling seasonal variations, cyclic variables sin τ and cos τ were also provided as inputs, where τ = 2πt/D and D = 365.2422 is the length of the tropical year. This corresponds to using Fourier series to model seasonality [16, 12], but with the number of harmonics adaptively determined by the model.3 To allow for non-periodic non-stationarity, the current value of t was also provided as input. \n\n5 Model fitting \n\nSuppose we are given a sequence of daily rainfall data of length N. Equation (4) implies that the likelihood of the full data sequence (x_{N-1}, ..., x_0) factorises as \n\n    p(x_{N-1}, ..., x_0; w) = p(x_{T-1}, ..., x_0) ∏_{t=T}^{N-1} p(x_t | r_{t-1}, ..., r_{t-T}; w)    (5) \n\nwhere the likelihood p(x_{T-1}, ..., x_0) of the initial sequence is not modelled and can be considered as a constant (compare [14]). Our interest is in the likelihood (5) of the actual sequence of observations, which is understood to depend on the variable weights w of the neural network. Note that p(x_t | r_{t-1}, ..., r_{t-T}; w) is computed by means of the neural network outputs z_t^α, z_t^ν, z_t^θ, using weights w and the inputs corresponding to time t. The log likelihood of the data can therefore be written, to within a constant, as \n\n    log p(x_{N-1}, ..., x_0; w) = ∑_{t=T}^{N-1} log p(x_t | r_{t-1}, ..., r_{t-T}; w) \n\nor, more simply, \n\n    L(w) = ∑_{t=T}^{N-1} L_t(w)    (6) \n\nwhere from (1) \n\n    L_t(w) = log(1 - α_t),    if x_t = 0 \n    L_t(w) = log α_t + (ν_t - 1) log x_t - ν_t log θ_t - log Γ(ν_t) - x_t/θ_t,    if x_t > 0    (7) \n\nwhere dependence of α_t, ν_t, θ_t on w, and also on the data, is implicit. \n\nTo fit the model, it is useful to know the gradient ∇L(w). This can be computed using backpropagation if we know the partial derivatives of L(w) with respect to the network outputs. In view of (6) we can concentrate on a single observation and then perform a summation. Omitting subscript references to t for simplicity, and recalling the links between network outputs and distributional parameters given by (2), we have \n\n    ∂L/∂z^α = -α if x = 0;    1 - α if x > 0 \n    ∂L/∂z^ν = 0 if x = 0;    ν{log(x/θ) - ψ(ν)} if x > 0    (8) \n    ∂L/∂z^θ = 0 if x = 0;    x/θ - ν if x > 0 \n\nwhere \n\n    ψ(ν) = (d/dν) log Γ(ν) = Γ'(ν)/Γ(ν) \n\nis the digamma function of ν. Efficient algorithms for computing log Γ(ν) in (7) and ψ(ν) in (8) can be found in Press et al. [11] and Amos [1]. \n\n3 Note that both sin nτ and cos nτ can be expressed as non-linear functions of sin τ and cos τ, which can be approximated by the network. \n\n6 Regularization \n\nSince neural networks are universal approximators, some form of regularization is needed. As in all statistical modelling, it is important to strike the right balance between jumping to conclusions (overfitting) and refusing to learn from experience (underfitting). For this purpose, each model was fitted using the techniques of [13], which automatically adapt the complexity of the model to the information content of the data, though other comparable techniques might be used. The natural interpretation of the regularizer is as a Bayesian prior. The Bayesian analysis is completed by integration over weight space. 
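The likelihood term (7) and the derivatives (8) are straightforward to code. The sketch below (my own illustrative implementation, with the digamma function approximated by a central difference of log Γ rather than the routines of [11, 1]) can be checked by numerical differentiation of (7):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loglik_term(x, z_alpha, z_nu, z_theta):
    # Eq (7), with parameters recovered from network outputs via eq (2).
    alpha, nu, theta = sigmoid(z_alpha), math.exp(z_nu), math.exp(z_theta)
    if x == 0:
        return math.log(1.0 - alpha)
    return (math.log(alpha) + (nu - 1.0) * math.log(x)
            - nu * math.log(theta) - math.lgamma(nu) - x / theta)

def grad_term(x, z_alpha, z_nu, z_theta):
    # Eq (8): partial derivatives of (7) w.r.t. the network outputs.
    alpha, nu, theta = sigmoid(z_alpha), math.exp(z_nu), math.exp(z_theta)
    if x == 0:
        return (-alpha, 0.0, 0.0)
    h = 1e-6
    psi = (math.lgamma(nu + h) - math.lgamma(nu - h)) / (2.0 * h)  # digamma
    return (1.0 - alpha,
            nu * (math.log(x / theta) - psi),
            x / theta - nu)
```

In a full implementation these per-observation derivatives would be fed into backpropagation to obtain ∇L(w), summing over t as in (6).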
In the present case, this was achieved by fitting several models and taking a suitable mixture as the solution. On account of the large datasets used, however, the results are not particularly sensitive to this aspect of the modelling process. \n\n7 Results for conditional distributions \n\nThe process was applied to daily rainfall data from 5 stations in south east England and 5 stations in central Italy.4 The data covered approximately 40 years, providing some 15,000 observations for each station. A simple fully connected network was used, with a single layer of 13 input units, 20 hidden units and 3 output units corresponding to the 3 parameters of the conditional distribution shown in (2). As a consequence of the pruning features of the regularizer, the models described here used an average of roughly 65 of the 343 available parameters. \n\nTo illustrate the general nature of the results, Figure 1 shows an example from the analysis of an early part of the Falmer series. It is worth observing the succession of 16 rainy days from day 39 to day 54. The lefthand figure shows that the conditional probability of rain increases rapidly at first, and then levels out after about 5-7 days.5 Similar behaviour is observed for successive dry days, for example between days 13 and 23. This suggests that the choice of 10 time lags was sufficient. Previous studies have used mainly first or second order Markov chains [16, 12]. Figure 1 confirms that conditional dependence \n\n4 The English stations were at Cromptons, Falmer, Kemsing, Petworth, Rotherfield; the Italian stations were at Monte Oliveto, Pisticci, Pomarico, Siena, Taverno d'Arbia. \n\n5 In view of the number of lags used as inputs, the conditional probability would necessarily be constant after 10 days apart from seasonal effects. In fact this is the last quarter of 1951 and the incidence of rain is increasing here at that time of year. 
\n\n\f0.8 \n\n0.6 \n\n0.4 \n\n0.2 \n\n0 \n\n0 \n\n10 \n\n20 \n\nI \n\nii \nII \nII \nII \nII \nI II \ni \nI \n\nI \nII \n30 \n\nFALMER: conditional mean \n\n20 \n\n15 \n\n10 \n\n5 \n\nModelling Seasonality and Trends in Daily Rainfall Data \n\n989 \n\nF ALMER: conditional probability \n\n40 \n\n50 \n\n60 \n\n70 \n\n10 \n\n20 \n\n30 \n\n40 \n\n50 \n\n60 \n\n70 \n\nFigure 1: Results for the 10 weeks from 18 September to 27 November, 1951. The lefthand \nfigure shows the conditional probability of rain for each day, with days on which rain \noccurred indicated by vertical lines. The righthand figure shows the conditional expected \namount of rain in millimeters for the same period, together with the actual amount recorded. \n\ndecays rapidly at this station, at this time of year, but also indicates that it can persist for up \nto at least 5 days (compare [5,4]). \n\n8 Seasonality and trends \n\nConditional probabilities and expectations displayed in Figure 1 show considerable noise \nsince they are realisations of random variables depending on the rainfall pattern for the last \n10 days. For the purpose of analysing seasonal effects and longer term trends, it is more \nindicative to integrate out the noise resulting from individual weather patterns as follows. \nLet Rt denote the event (Xt > 0) and let Rt denote the complementary event (Xt = 0). \nThe expected value of Xt can then be expressed as \n\nE(Xt ) = L E(Xt I A t - 1, . .. ,At-T) P(At- 1, . .. ,At-T) \n\n(9) \n\nwhere each event At stands for either R t or R t , and summation is over the 2T possible \ncombinations. Equation (9) takes the full modelled jOint distribution over the variables \nX N -1, .. . ,X 0 and extracts the marginal distribution for X t . This should be distinguished \nfrom an unconditional distribution which might be estimated by pooling the data over all \n40 years. E(Xt ) relates to a specific day t. 
Note that (9) also holds if X_t is replaced by any integrable function of X_t, in particular by the indicator function of the event (X_t > 0), in which case (9) expresses the probability of rain on that day. \n\nExamining (9), we see that the conditional expectations in the first term on the right are known from the model, which supplies a conditional distribution not only for the sequence of events which actually occurred, but for any possible sequence over the previous T days. It therefore only remains to calculate the probabilities P(A_{t-1}, ..., A_{t-T}) of T-day sequences preceding a given day t. Note that these are again time-dependent marginal probabilities, which can be calculated recursively from \n\n    P(A_t, ..., A_{t-T+1}) = P(A_t | A_{t-1}, ..., A_{t-T+1}, R_{t-T}) P(A_{t-1}, ..., A_{t-T+1}, R_{t-T}) + P(A_t | A_{t-1}, ..., A_{t-T+1}, R̄_{t-T}) P(A_{t-1}, ..., A_{t-T+1}, R̄_{t-T}) \n\nprovided we assume a prior distribution over the 2^T initial sequences (A_{T-1}, ..., A_0) as a base for the recursion. The conditional probabilities on the right are given by the model, \n\n[Figure 2 appears here: two panels, \"POMARICO: probability of rain\" (left) and \"POMARICO: mean and standard deviation\" (right), plotted over 1955-1985.] \n\nFigure 2: Integrated results for Pomarico from 1955-1985. The lefthand figure shows the daily probability of rain, indicating seasonal variation from a summer minimum to a winter maximum. 
The righthand figure shows the daily mean (above) and standard deviation (below) of rainfall amount in millimeters. \n\nas before, and the unconditional probabilities are given by the recursion. It turns out that results are insensitive to the choice of initial distribution after about 50 iterations, verifying that the occurrence process, as modelled here, is in fact ergodic. \n\n9 Integrated results \n\nResults for the integrated distribution at one of the Italian stations are shown in Figure 2. By integrating out the random shocks we are left with a smooth representation of time dependency alone. The annual cycles are clear. Trends are also evident over the 30 year period. The mean rainfall amount is decreasing significantly, although the probability of rain on a given day of the year remains much the same. Rain is occurring no less frequently, but it is occurring in smaller amounts. Note also that the winter rainfall (the upper envelope of the mean) is decreasing more rapidly than the summer rainfall (the lower envelope of the mean), so that the difference between the two is narrowing. \n\n10 Conclusions \n\nThis paper provides a new example of time series modelling using neural networks. The use of a mixture of a discrete distribution and a gamma distribution emphasises the general principle that the \"error function\" for a neural network depends on the particular statistical model used for the target data. The use of cyclic variables sin τ and cos τ as inputs shows how the problem of selecting the number of harmonics required for a Fourier series analysis of seasonality can be solved adaptively. Long term trends can also be modelled by the use of a linear time variable, although both this and the last feature require the presence of a suitable regularizer to avoid overfitting. Lastly, we have seen how a suitable form of integration can be used to extract the underlying cycles and trends from noisy data. 
These techniques can be adapted to the analysis of time series drawn from other domains. \n\nAcknowledgement \n\nI am indebted to Professor Helen Rendell of the School of Chemistry, Physics and Environmental Sciences, University of Sussex, for kindly supplying the rainfall data and for valuable discussions. \n\nReferences \n\n[1] D. E. Amos. A portable Fortran subroutine for derivatives of the psi function. ACM Transactions on Mathematical Software, 9:494-502, 1983. \n\n[2] P. Baldi and Y. Chauvin. Hybrid modeling, HMM/NN architectures, and protein applications. Neural Computation, 8:1541-1565, 1996. \n\n[3] C. M. Bishop and C. Legleye. Estimating conditional probability densities for periodic variables. In G. Tesauro, D. Touretzky, and T. Leen, editors, Advances in Neural Information Processing Systems 7, pages 641-648. The MIT Press, 1995. \n\n[4] E. H. Chin. Modelling daily precipitation occurrence process with Markov chain. Water Resources Research, 13:949-956, 1977. \n\n[5] P. Gates and H. Tong. On Markov chain modelling to some weather data. Journal of Applied Meteorology, 15:1145-1151, 1976. \n\n[6] Z. Ghahramani and M. I. Jordan. Supervised learning from incomplete data via an EM approach. In Jack D. Cowan, Gerald Tesauro, and Joshua Alspector, editors, Advances in Neural Information Processing Systems 6, pages 120-127. Morgan Kaufmann, 1994. \n\n[7] N. T. Ison, A. M. Feyerherm, and L. D. Bark. Wet period precipitation and the gamma distribution. Journal of Applied Meteorology, 10:658-665, 1971. \n\n[8] R. W. Katz. Precipitation as a chain-dependent process. Journal of Applied Meteorology, 16:671-676, 1977. \n\n[9] R. W. Katz and M. B. Parlange. Effects of an index of atmospheric circulation on stochastic properties of precipitation. Water Resources Research, 29:2335-2344, 1993. \n\n[10] D. A. Nix and A. S. Weigend. Learning local error bars for nonlinear regression. In Gerald Tesauro, David S. Touretzky, and Todd K. Leen, editors, Advances in Neural Information Processing Systems 7, pages 489-496. MIT Press, 1995. \n\n[11] W. H. Press, B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling. Numerical Recipes in C. Cambridge University Press, 2nd edition, 1992. \n\n[12] R. D. Stern and R. Coe. A model fitting analysis of daily rainfall data, with discussion. Journal of the Royal Statistical Society A, 147(Part 1):1-34, 1984. \n\n[13] P. M. Williams. Bayesian regularization and pruning using a Laplace prior. Neural Computation, 7:117-143, 1995. \n\n[14] P. M. Williams. Using neural networks to model conditional multivariate densities. Neural Computation, 8:843-854, 1996. \n\n[15] D. A. Woolhiser. Modelling daily precipitation: progress and problems. In Andrew T. Walden and Peter Guttorp, editors, Statistics in the Environmental and Earth Sciences, chapter 5, pages 71-89. Edward Arnold, 1992. \n\n[16] D. A. Woolhiser and G. G. S. Pegram. Maximum likelihood estimation of Fourier coefficients to describe seasonal variation of parameters in stochastic daily precipitation models. Journal of Applied Meteorology, 18:34-42, 1979. \n", "award": [], "sourceid": 1429, "authors": [{"given_name": "Peter", "family_name": "Williams", "institution": null}]}