How ICI-RAFT Works
Why regional frequency analysis? How is regional quality measured?
The principle of regional frequency analysis can be summarized as “trading space for time”: data from spatially separate sites are pooled so that the estimation error introduced by pooling is smaller than the error removed by the larger sample size. Hosking and Wallis’ RFA-LM method preserves each at-site mean, normalizes all at-site data before regionalization, and conducts the entire analysis on the normalized data. Only at the very end is the regional ‘growth curve’ (the normalized quantile function) multiplied by each at-site mean to produce at-site quantile estimates.
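The normalize, pool, and rescale flow described above can be illustrated with a minimal Python sketch. The function names and the dictionary layout are hypothetical and are not part of ICI-RAFT. As a loudly stated simplification, the growth curve here is an empirical quantile of the pooled normalized data, whereas RFA-LM fits a distribution using regional L-moments.

```python
from statistics import mean


def normalize_sites(sites):
    """Divide each site's record by its own mean; return the means and scaled data.

    Assumes every site mean is nonzero.
    """
    means = {name: mean(values) for name, values in sites.items()}
    scaled = {
        name: [v / means[name] for v in values] for name, values in sites.items()
    }
    return means, scaled


def empirical_quantile(sorted_values, p):
    """Linearly interpolated quantile at non-exceedance probability p (0 <= p <= 1)."""
    pos = p * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def at_site_quantiles(sites, p):
    """Estimate the p-quantile at every site from one pooled growth curve."""
    means, scaled = normalize_sites(sites)
    # Pool the normalized records from all sites into one regional sample.
    pooled = sorted(v for values in scaled.values() for v in values)
    # Stand-in for the fitted regional growth curve (assumption, see lead-in).
    growth = empirical_quantile(pooled, p)
    # Rescale by each at-site mean only at the very end.
    return {name: means[name] * growth for name in sites}
```

Because every site shares one growth curve, the ratio between two sites' quantile estimates equals the ratio of their means at every probability level, which is the practical meaning of preserving the at-site mean.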
The RFA-LM method was designed with hydrological variables in mind, specifically the estimation of precipitation and streamflow quantiles. However, the method is potentially applicable to quantile estimation for many physical parameters: the data records of precipitation gauges, streamflow monitors, thermometers, anemometers, and so on. If a physical parameter can be measured across space and time, RFA-LM may provide a responsible framework in which space can be traded for time.
The key to keeping the increase in ‘space’ error smaller than the decrease in ‘time’ error is to group gauges or at-site data records, referred to hereafter as sites, that appear to share a common underlying frequency distribution. If a region is perfectly homogeneous, observed differences between sites reflect only randomness. In other words, we extend the above assumption: instead of treating each site as N Monte Carlo iterations drawn from its own unique distribution, we model a region with pooled sample size N* as N* Monte Carlo iterations drawn from one shared distribution. The degree to which a putative region fits this assumption is a quantifiable measure of homogeneity. Hosking and Wallis evaluated several such measures and developed a metric called the heterogeneity statistic, or H. The statistic is constructed so that an H of less than 2 indicates acceptable homogeneity, although Hosking and Wallis showed that even mildly heterogeneous regions can still achieve an overall reduction in error. A second statistic, the discordancy D, measures how much each site’s first three L-moment ratios differ from the regional average; sites with D above a threshold are flagged as discordant.
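To make the discordancy idea concrete, here is a minimal sketch in plain Python, assuming the usual Hosking and Wallis form of D: a scaled quadratic distance of each site's L-CV, L-skewness, and L-kurtosis from the unweighted regional average. The function names are hypothetical and this is not ICI-RAFT's own code. It assumes at least four observations per site and enough sites for the 3x3 matrix to be invertible.

```python
def l_moment_ratios(values):
    """Return (L-CV, L-skewness, L-kurtosis) from unbiased sample PWMs."""
    x = sorted(values)
    n = len(x)  # assumed n >= 4
    b0 = b1 = b2 = b3 = 0.0
    for j, v in enumerate(x):  # j is the 0-based rank of v
        b0 += v
        b1 += v * j / (n - 1)
        b2 += v * j * (j - 1) / ((n - 1) * (n - 2))
        b3 += v * j * (j - 1) * (j - 2) / ((n - 1) * (n - 2) * (n - 3))
    b0, b1, b2, b3 = b0 / n, b1 / n, b2 / n, b3 / n
    l1 = b0
    l2 = 2 * b1 - b0
    l3 = 6 * b2 - 6 * b1 + b0
    l4 = 20 * b3 - 30 * b2 + 12 * b1 - b0
    return (l2 / l1, l3 / l2, l4 / l2)


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def discordancy(site_records):
    """Return one D value per site, in the order the records were given."""
    u = [l_moment_ratios(rec) for rec in site_records]
    n_sites = len(u)
    # Unweighted regional average of the three L-moment ratios.
    u_bar = [sum(ui[k] for ui in u) / n_sites for k in range(3)]
    dev = [[ui[k] - u_bar[k] for k in range(3)] for ui in u]
    # Sum-of-squares and cross-products matrix of the deviations.
    a = [[sum(d[r] * d[c] for d in dev) for c in range(3)] for r in range(3)]
    return [
        n_sites / 3 * sum(di * si for di, si in zip(d, solve(a, d))) for d in dev
    ]
```

With this definition the D values of a region always average to 1, so D is best read as a relative measure: a site stands out only in comparison with the other sites in the same putative region.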
How are regions constructed?
It might seem easy to construct regions: simply find the sites with similar probability distribution functions and group them. However, single-site statistics are unreliable, which is the motivation behind regionalization in the first place, and multiple sites may appear identical through random variation while having different underlying distributions. Better results are achieved by grouping sites using at-site characteristics such as location (latitude/longitude), elevation, mean annual precipitation, monthly totals, and so on. ICI-RAFT has a built-in clustering algorithm, described below, that can organize sites into regions based on any number of these statistics multiplied by analyst-selected