An update on multivariate return periods in hydrology

Many hydrological studies are devoted to the identification of events that are expected to occur on average within a certain time span. While this topic is well established in the univariate case, recent advances focus on a multivariate characterization of events based on copulas. Following a previous study, we show how the definition of the survival Kendall return period fits into the set of multivariate return periods. Moreover, we present a preliminary investigation of the ability of the multivariate return period definitions to select maximal events from a time series. Starting from a rich simulated data set, we show how similar the resulting selections of events are. The study indicates, and theory supports, that the strength of correlation in the sample influences the differences between the selections of maximal events.


1 Introduction
Studies of extremes in hydrological multivariate time series often aim at estimating the size of events to be expected in a period of 10, 50 or 100 years. This information is relevant for the construction of many hydrological structures such as dams and dykes. As most of these natural events are characterized by several variables (e.g. peak discharge, volume, duration, ...) and several locations, it is important to understand their dependence structure and which constellations result in an extreme event. Copulas allow the dependence between the variables to be modelled flexibly and combined with different marginal distribution functions to build a probabilistic multivariate model. The natural ordering in univariate time series does not extend to the multivariate case, which calls for different tools to identify multivariate extremes.
In a previous study (Gräler et al., 2013), the practical impact of different bivariate return period definitions was studied based on a simulated data set. Meanwhile, an additional approach, the survival Kendall return period (SKRP), has been developed (Salvadori et al., 2013). Using the same data as before, the SKRP is calculated and related to the previously studied return periods (AND, OR and Kendall return period). Currently, multivariate maxima are often selected based on a single driving variable (e.g. peak discharge), and the associated variables (e.g. volume and duration) are studied in a multivariate setting. However, this does not a priori reflect the joint extreme characteristic that is the actual focus of such a study. Different notions of maximality can be defined following the above return period definitions. These make it possible to calculate the empirical joint extremeness and to select the maxima of multivariate time series.

In this paper, we will only briefly quote the key concepts. The interested reader is referred to the predecessor of this paper, Gräler et al. (2013), for further details. The following section recalls the definitions of the different multivariate return periods under study and puts them into relation. In Section 3, the different maxima selection regimes are presented and their effect is studied. Section 4 provides a discussion and conclusions.
2 Multivariate return periods
Copulas are the driving tool underlying the multivariate return period definitions. Copulas are multivariate distribution functions defined on the unit hypercube. Based on Sklar's Theorem (Sklar, 1959), they combine marginal distribution functions into a joint distribution function and solely determine the entire dependence structure. For a detailed introduction, see e.g. the book by Nelsen (2006).
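For reference, Sklar's theorem for continuous margins can be written as follows. This is the standard textbook statement (see Nelsen, 2006); the symbols $F$, $F_i$ and $u_i$ are notation chosen here for illustration.

```latex
% Sklar's theorem for d continuous margins F_1, ..., F_d with joint distribution F
F(x_1, \dots, x_d) = C\bigl(F_1(x_1), \dots, F_d(x_d)\bigr),
\qquad
C(u_1, \dots, u_d) = F\bigl(F_1^{-1}(u_1), \dots, F_d^{-1}(u_d)\bigr),
\quad u_i \in [0, 1].
```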
Going from univariate to multivariate extremes is not immediate. One major constraint is the lack of a natural ordering for problems of dimension $d \geq 2$. Typical definitions of the multivariate joint return periods include the OR case corresponding to $P(X_1 > x_1 \vee \dots \vee X_d > x_d)$ and the AND case defined for $P(X_1 > x_1 \wedge \dots \wedge X_d > x_d)$. The Kendall return period (KRP) introduced by Salvadori et al. (2011) is an approach that shares an important property with the univariate return periods: the critical layer separating safe from dangerous events is unique for every design return period. This is not the case for the OR and AND approaches, where different regions of safe and dangerous events exist for the same return period. The basis of the Kendall return period, the Kendall distribution function, is the distribution function of the copula's mass below its level curves. Salvadori et al. (2013) present the survival Kendall return period (SKRP) to overcome limitations of the KRP described in Salvadori et al. (2011). The drawback of the latter is its unboundedness: the critical layer splits the region into safe and dangerous events in such a way that one of the margins might tend to infinity (even though with very small probability). This limitation is overcome by the SKRP, as the critical layer is bounded, as for the OR return period, but every point on the critical layer exhibits the same return period, as in the Kendall scenario. In a way, the SKRP combines the best of both worlds. Its mathematical definition reads
$$T_{\breve{K}} = \frac{\mu}{1 - \breve{K}(t)},$$
with $\mu$ the mean interarrival time of the events, $t$ the critical probability level and $\breve{K}$ the survival Kendall distribution function given by
$$\breve{K}(t) = P\bigl(\hat{C}(\bar{F}_1(X_1), \dots, \bar{F}_d(X_d)) \geq t\bigr),$$
and $\hat{C}$ the survival copula and $\bar{F}_i$ the marginal survival distribution functions. See Salvadori et al. (2013) for the full details.
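For the bivariate case used in this study, the four return periods can be placed side by side. The block below is a summary sketch in commonly used notation rather than a quotation from the cited papers: $u = F_X(x)$ and $v = F_Y(y)$ are the marginal probabilities, $\mu$ is assumed to denote the mean interarrival time of events, and the subscripts on $T$ are labels chosen here.

```latex
% Bivariate return periods; \mu = mean interarrival time (assumed notation)
T_{\mathrm{OR}}   = \frac{\mu}{1 - C(u, v)}, \qquad
T_{\mathrm{AND}}  = \frac{\mu}{1 - u - v + C(u, v)}, \\
T_{\mathrm{KRP}}  = \frac{\mu}{1 - K_C(t)}, \qquad
K_C(t) = P\bigl(C(U, V) \le t\bigr), \\
T_{\mathrm{SKRP}} = \frac{\mu}{1 - \breve{K}(t)}, \qquad
\breve{K}(t) = P\bigl(\hat{C}(1 - U, 1 - V) \ge t\bigr),
\quad \hat{C}(1 - u, 1 - v) = 1 - u - v + C(u, v).
```

Because $C(u, v) \le \min(u, v)$, the OR return period of an event never exceeds its AND return period.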
In order to extend our previous study, we use the same data (simulated using the COSMO4SUB model (Grimaldi et al., 2012), compare Section 4 in Gräler et al. (2013)) and adopt the same parametrization as in Gräler et al. (2013) to also calculate the SKRP for the bivariate approach. Higher dimensional approaches are out of the scope of this follow-up paper. The peak discharges $Q_p$ are said to follow a Weibull distribution while the associated volumes follow an exponential distribution. Recall that the selection of the annual maxima was based on the peak discharges; the volumes are the ones corresponding to the same event and, as such, not necessarily the largest ones in the year.
In the previous section and study, the annual maxima were selected based on the maximum peak discharge, and the volumes were only the corresponding, but not necessarily maximal, ones. An alternative approach can be based either on the empirical copula or on the adoption of multivariate distributions. For these, the same multivariate return period (MRP) definitions can be applied as quoted above and the largest values per year can be selected. In the following, we pursue this avenue and investigate the differences between these approaches, where the copula $C$ might be the empirical copula or an appropriate family.
To study the impact of the aforementioned definitions, we use a second run of 500 simulated years of 5.625 minute resolution discharge data that were aggregated to separate rainfall events. This differs from the previous data set, where only annual maxima were used. This second data set contains 12466 events. In order to reduce the effect of autocorrelation within this simulation, we only consider a random subset of 50 % of the data (autocorrelation plots indicate an uncorrelated time series, not shown here). We do not fit any parametric family and solely use the empirical definitions of the above equations.
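The empirical selection described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the study: the function names `joint_levels` and `annual_maxima` are hypothetical, the "levels" are plain empirical frequencies (empirical copula, empirical joint survival function and their Kendall-type distribution functions), and the demo data are synthetic stand-ins.

```python
import numpy as np


def joint_levels(x, y):
    """Empirical joint extremeness of each event under four notions.

    Larger values mean "more extreme" in every returned array.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # below[i, j]: event j does not exceed event i in either margin
    below = (x[None, :] <= x[:, None]) & (y[None, :] <= y[:, None])
    # above[i, j]: event j strictly exceeds event i in both margins
    above = (x[None, :] > x[:, None]) & (y[None, :] > y[:, None])
    c = below.mean(axis=1)  # empirical copula evaluated at each event
    s = above.mean(axis=1)  # empirical joint survival function
    k = (c[None, :] <= c[:, None]).mean(axis=1)   # empirical Kendall function at c
    sk = (s[None, :] >= s[:, None]).mean(axis=1)  # empirical survival Kendall function at s
    return {"OR": c, "AND": 1.0 - s, "KRP": k, "SKRP": sk}


def annual_maxima(levels, years):
    """Index of the most extreme event per year for each notion."""
    years = np.asarray(years)
    picks = {}
    for name, lev in levels.items():
        chosen = []
        for yr in np.unique(years):
            idx = np.flatnonzero(years == yr)
            chosen.append(idx[np.argmax(lev[idx])])
        picks[name] = np.array(chosen)
    return picks


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in data (NOT the simulated discharge series of the study)
    years = np.repeat(np.arange(20), 12)
    peak = rng.weibull(1.5, size=years.size)
    volume = peak + rng.exponential(0.5, size=years.size)
    picks = annual_maxima(joint_levels(peak, volume), years)
    for name, idx in picks.items():
        share = np.mean(idx == picks["OR"])
        print(name, "selects the same event as OR in", share, "of the years")
```

In this purely empirical sketch, the Kendall-type levels are monotone transforms of the OR and AND levels, so they rank events within a year in the same order and differ only in the level assigned to the selected event. The pairwise comparison uses memory quadratic in the number of events, which is acceptable for a few thousand events.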

In our simulated data set, the largest event in a year is often the same for all four definitions. This is not too surprising, considering that there are on average fewer than 25 rainfall events in each year. What differs is how extreme the event is under each of the four notions. The left plot of Figure 2 illustrates the relationship between the annual maximum values of the different definitions for the studied data set. Identifying the single rainfall events and inspecting the marginal distributions reveals visually identical histograms for peak discharge as well as for volume across the four bivariate and the respective univariate maxima selections.
An overlay shows only very small variations in discharge values and volumes. Larger marginal values tend to coincide even more closely.
As this data set exhibits a very strong correlation, we draw a sample from a Gumbel copula with a moderate Kendall's tau of 0.6, assign it to the same temporal structure as our previous data set and repeat the above analysis. The right plot of Figure 2 is based on the copula sample and shows a larger variation in the annual maximum values for the four approaches. As in the left plot, the OR and KRP as well as the AND and SKRP approaches seem to be much more alike than the other pairwise combinations. The selected margins show a little more variability, but the histograms remain hardly distinguishable. All the variation appears in the center of the distribution.
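Drawing such a sample can be sketched as follows. The paper does not state how the sample was generated; the block below assumes the Marshall-Olkin frailty construction with a positive stable variable drawn via Kanter's representation, and uses the relation $\tau = 1 - 1/\theta$ for the Gumbel parameter, so $\tau = 0.6$ corresponds to $\theta = 2.5$. The function names are hypothetical.

```python
import numpy as np


def sample_gumbel_copula(n, tau, seed=None):
    """Draw n pairs (u, v) from a bivariate Gumbel copula with Kendall's tau in [0, 1)."""
    rng = np.random.default_rng(seed)
    theta = 1.0 / (1.0 - tau)  # Gumbel parameter, since tau = 1 - 1/theta
    alpha = 1.0 / theta        # index of the positive stable frailty
    # Positive stable frailty with Laplace transform exp(-t**alpha) (Kanter's representation)
    ang = rng.uniform(0.0, np.pi, size=n)
    w = rng.exponential(1.0, size=n)
    frailty = (
        np.sin(alpha * ang) / np.sin(ang) ** (1.0 / alpha)
        * (np.sin((1.0 - alpha) * ang) / w) ** ((1.0 - alpha) / alpha)
    )
    # Marshall-Olkin step: apply the Gumbel generator to scaled exponentials
    e = rng.exponential(1.0, size=(n, 2))
    uv = np.exp(-((e / frailty[:, None]) ** alpha))
    return uv[:, 0], uv[:, 1]


def kendall_tau(x, y):
    """Sample Kendall's tau by direct pairwise comparison (quadratic in n, no tie correction)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    sx = np.sign(x[:, None] - x[None, :])
    sy = np.sign(y[:, None] - y[None, :])
    n = len(x)
    return float((sx * sy).sum() / (n * (n - 1)))


if __name__ == "__main__":
    u, v = sample_gumbel_copula(2000, tau=0.6, seed=1)
    print("sample Kendall's tau:", kendall_tau(u, v))
```

The resulting pairs live on the unit square; to mimic the procedure in the text, they would still have to be mapped to the event dates of the original series, which is not shown here.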
4 Discussion and conclusion
The SKRP yields the most reliable separation into safe (sub-critical) and dangerous (super-critical) events. Nevertheless, the selection of a single design event, as often required by subsequent studies, remains an open question.
Here, we selected the most probable bivariate event, but any event along the critical layer separates the sub- and super-critical regions.

The differences between the four definitions of maximality were minor in the simulated rainfall time series, but this is also due to the very strong correlation. This strong dependence causes the copula to be close to the upper Fréchet-Hoeffding bound. For the Gumbel copula sample, the temporal structure was not changed; only the dependence structure was altered to investigate its effect. A less dry study area with many more rainfall events per year will further influence the selection. However, the large extreme values appear to be extreme under each of the definitions.
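The limiting case of perfect positive dependence illustrates why strong correlation reduces the differences. The derivation below is a sketch added for illustration: it assumes the comonotone copula $M(u, v) = \min(u, v)$, for which $U = V$ almost surely, and uses $\mu$ for the mean interarrival time and $\breve{K}$ for the survival Kendall distribution function (assumed notation).

```latex
% Comonotone limit: U = V almost surely, so every event lies on the diagonal u = v
C(u, u) = u, \qquad \hat{C}(1 - u, 1 - u) = 1 - u, \\
K_C(t) = P(U \le t) = t, \qquad \breve{K}(t) = P(1 - U \ge t) = 1 - t, \\
\Rightarrow \quad
T_{\mathrm{OR}} = T_{\mathrm{AND}} = T_{\mathrm{KRP}} = T_{\mathrm{SKRP}} = \frac{\mu}{1 - u}.
```

In this limit, all four definitions assign the same return period to every event and therefore select the same maxima; the weaker the dependence, the further the definitions can drift apart.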
Here we use the raw definitions of multivariate return periods, but an alternative would be to investigate derived measures. Requena et al. (2013) developed the routed return period, where the water levels in a dam are used to characterize the return period of bivariate rainfall events. This idea could also be used in the same manner as presented in this paper to initially select the largest events from the original time series.
The investigated data set features a very long time series. Shorter time series might be more sensitive to changes of the maximum selection regime applied, as a few events might have a strong influence on the selection of the marginal distributions. The influence of these outer properties needs to be investigated further. An avenue of future research is to consider the joint extremeness for the selection of extremes to be fed into a peaks-over-threshold approach.