Small-scale (flash) flood early warning in the light of operational requirements: opportunities and limits with regard to user demands, driving data, and hydrologic modeling techniques

In recent years, the Free State of Saxony (Eastern Germany) was repeatedly hit by both extensive riverine flooding and flash flood events, the latter emerging foremost from convective heavy rainfall. Especially after a series of small-scale yet disastrous events in 2010, the preconditions, drivers, and methods for deriving flash-flood-related early warning products are investigated. The aim is to clarify the feasibility and the limits of envisaged early warning procedures for small catchments hit by flashy heavy-rain events. Early warning about potentially flash-flood-prone situations (i.e., with a lead time suited to the reaction-time needs of the stakeholders involved in flood risk management) needs to take into account not only hydrological but also meteorological and communication issues. Therefore, we propose a threefold methodology to identify potential benefits and limitations in a real-world warning/reaction context. First, the user demands (with respect to desired/required warning products, preparation times, etc.) are investigated. Second, focusing on small catchments of some hundred square kilometers, two quantitative precipitation forecasts (QPFs) are verified. Third, considering the user needs as well as the input parameter uncertainty (i.e., foremost emerging from an uncertain QPF), a feasible yet robust hydrological modeling approach is proposed on the basis of pilot studies employing deterministic, data-driven, and simple scoring methods.

The survey results were evaluated using descriptive statistics and subgroup analyses by means of contingency tables. Given answers were thus investigated in a user-group-specific manner, i.e., more than one variable is considered at a time (multivariate approach). A question to address was whether specific user groups answered differently or not. Such an effect can be induced by strongly differing sizes of sub-samples or indicate truly diverse response behavior. The literature suggests χ²-based dependency measures to clarify such questions (Sachs, 1999). For the present study, Cramér's V and χ²-based p-values were used.

Two QPF products are verified: output of the numerical weather prediction (NWP) model COSMO-DE and the Quantile-QPF (QF) for the 16 forecasting regions (cf. Figure 1), issued by DWD's Regional Service Center in Leipzig. The two QPFs are compared against a Quantitative Precipitation Estimate (QPE) emerging from rain gauge data, which was spatially interpolated (Ordinary Kriging) to derive areal precipitation estimates. Additionally, weather radar data (DWD's RADOLAN-RW product; Sacher et al., 2011) was employed as another QPE reference. A comprehensive overview of the herein considered QPFs and QPEs is given in Table 1.

Six-hour precipitation sums were chosen to accommodate the coarsest temporal resolution of the investigated products, given by the Quantile-QPF. The QF features areal rainfall totals (for the 16 forecasting regions) for 0.9, 0.5, and 0.1 exceedance probability and two consecutive 6-hour plus two further 12-hour intervals. The QF is updated twice a day (at 06:00 and 18:00 UTC) and therefore is a rather general QPF product. However, a main task of the herein presented verification was to evaluate the quality of this product against highly resolved NWP output (i.e., COSMO-DE).

Three hydrological modeling approaches are investigated for the pilot areas (cf. Figure 1): first, a semi-distributed deterministic model (DeHM); second, a data-driven neural-network model (DaHM); and third, a simple classification model based on the scoring of flood-relevant parameters (ScoHM).
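The subgroup dependence check described in the survey-analysis paragraph above (Cramér's V together with a χ²-based p-value) can be sketched as follows; the contingency table of user groups versus answer categories is a hypothetical example, not survey data from the study:

```python
import numpy as np
from scipy.stats import chi2_contingency

def cramers_v(table):
    """Cramér's V and chi-squared p-value for an r x c contingency table
    of answer counts (rows: user groups, columns: answer categories)."""
    table = np.asarray(table, dtype=float)
    chi2, p, dof, expected = chi2_contingency(table)
    n = table.sum()
    r, c = table.shape
    v = np.sqrt(chi2 / (n * (min(r, c) - 1)))  # V in [0, 1]
    return v, p

# hypothetical example: two user groups, two answer categories
v, p = cramers_v([[50, 5], [5, 50]])  # strongly diverging response behavior
```

Values of V near 0 indicate that apparent differences between user groups are compatible with sampling noise; values near 1 indicate truly diverse response behavior.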

Subsequently, the modeling concepts and their application (with regard to calibration, data assimilation, etc.) are briefly described. Only snow-free conditions were considered for model development and application.

In addition to that, and depending on the considered lead time in the forecasting case, inputs that bear a rainfall forecast were included: e.g., for forecasting Q_t+6, the input P_t+6 is added; for Q_t+12, P_t+6;t+12, respectively, whereas the P_t+x values portray specific QPF lead times.
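The input layout sketched above (auto-correlative discharge signals, recent rainfall, plus QPF blocks matching the lead time) could be assembled as in the following sketch; the lag depths and the horizon split are assumptions for illustration, not the paper's exact configuration:

```python
import numpy as np

def dahm_input(q_obs, p_obs, t, lead_h, p_fcst):
    """Illustrative DaHM-style input vector for forecasting Q at t+lead_h (hours).
    q_obs, p_obs : hourly observed discharge and areal rainfall series
    p_fcst       : dict mapping forecast horizon (h) to the 6-h QPF sum covering it
    The two-step lag depth is an assumption, not the paper's exact setting."""
    x = [q_obs[t], q_obs[t - 1],       # auto-correlative Q_{t-x} signals
         p_obs[t], p_obs[t - 1]]       # recent observed areal rainfall
    for h in range(6, lead_h + 1, 6):  # P_t+6, P_t+6;t+12, ... per lead time
        x.append(p_fcst[h])
    return np.array(x, dtype=float)
```

For a 6-h lead time only P_t+6 enters; for 12 h the vector grows by the additional P_t+6;t+12 block, matching the description above.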

The Levenberg-Marquardt algorithm was applied for network training, whilst allowing the number of hidden neurons to range from 3 to 13. Event-wise masked hydrograph data and hourly areal rainfall were used for training. Fifteen training runs were evaluated for each specific hidden-neuron configuration, and the best network was selected.

The scoring is carried out according to Table 2; baseline sub-scores and the SPI sub-score are mapped linearly, according to the range of each respective morphological feature. For the remaining dynamic susceptibility sub-scores, frequency analyses were applied to deliver specific percentiles that are in turn connected to specific sub-score values, e.g., P-sums within the 75th–90th percentile range of the data result in a sub-score of 1, etc.
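The percentile-based mapping of the dynamic susceptibility sub-scores could look like the following sketch; only the 75th–90th-percentile band (sub-score 1) is stated in the text, the higher bands are assumed for illustration:

```python
import numpy as np

def dynamic_subscore(value, sample,
                     bands=((75, 90, 1), (90, 95, 2), (95, 100, 3))):
    """Map a current value (e.g. an antecedent P-sum) onto a sub-score via
    percentile bands of a historical sample. Only the 75th-90th -> 1 band
    is from the text; the two higher bands are illustrative assumptions."""
    score = 0  # values below the 75th percentile contribute nothing
    for lo, hi, s in bands:
        p_lo, p_hi = np.percentile(sample, [lo, hi])
        if p_lo <= value < p_hi or (hi == 100 and value >= p_lo):
            score = s
    return score
```

The baseline and SPI sub-scores would instead be rescaled linearly over the range of the respective morphological feature, as described above.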

The method requires only one effective parameter, namely the recession constant of the incorporated linear reservoir, which was manually calibrated to a global value of 8 h.
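The single linear reservoir with its recession constant of 8 h can be sketched as a simple explicit water-balance update (the discretization and units are assumptions for illustration):

```python
def linear_reservoir(p_eff, k=8.0, dt=1.0, s0=0.0):
    """Single linear reservoir: storage s [mm], outflow q = s / k [mm/h];
    k = 8 h is the globally calibrated recession constant from the text.
    p_eff is a sequence of effective hourly rainfall depths [mm]."""
    s, out = s0, []
    for p in p_eff:
        s += p * dt   # add effective rainfall to storage
        q = s / k     # linear-reservoir outflow
        s -= q * dt   # drain storage
        out.append(q)
    return out
```

With no further input, the simulated outflow recedes by the factor (1 − dt/k) per time step, which is what makes the recession constant the method's single effective parameter.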
In contrast to the DeHM and DaHM models, the ScoHM approach does not rely on observed flow data, neither in the sense of directly including auto-correlative signals, as applies for the data-driven DaHM model (in form of the Q_t−x inputs), nor indirectly via data assimilation/state updating, as applies for the deterministic DeHM model. Therefore, the ScoHM approach might offer a robustly transferable methodology towards prediction in small, ungauged basins.

Results for thresholds higher than 10 mm/6 h should be evaluated with caution, due to limited data sample sizes (i.e., less than 10 events in the investigated period). Due to QF-product-related conventions (areal precipitation sums < 4.5 mm/6 h are set to zero), the exceedance frequencies of the QF remain constant for thresholds < 4.5 mm. Threshold exceedances drawn from the QF's 50th and 10th percentiles are generally more frequent than the observed ones (i.e., from rain gauge data), whereas the 90th percentile underestimates observed frequencies.

Second, for a more in-depth view of the regarded QPFs, the contingency-based measures POD and FAR were evaluated (Figures 3 and 4). Again, due to the product-specific convention of the QF, the results for
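POD (probability of detection) and FAR (false alarm ratio) are standard contingency-table measures for threshold exceedance; a minimal sketch with hypothetical forecast/observation pairs:

```python
def pod_far(fcst, obs, thr):
    """POD = hits / (hits + misses), FAR = false alarms / (hits + false alarms)
    for exceedance of threshold `thr` in paired forecast/observation series."""
    hits = misses = false_alarms = 0
    for f, o in zip(fcst, obs):
        fe, oe = f > thr, o > thr
        hits += fe and oe                 # forecast and observed exceedance
        misses += (not fe) and oe         # observed but not forecast
        false_alarms += fe and (not oe)   # forecast but not observed
    pod = hits / (hits + misses) if hits + misses else float("nan")
    far = false_alarms / (hits + false_alarms) if hits + false_alarms else float("nan")
    return pod, far
```

A perfect forecast gives POD = 1 and FAR = 0; the figures referenced above plot these measures over the investigated thresholds.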

The three presented models (DeHM, DaHM, ScoHM) were applied for the three aforementioned pilot areas (cf. Figure 1). The herein investigated QPEs (gridded rain gauge data and RADOLAN data) and QPFs (COSMO-DE and QF; cf. Sections 2.2 and 3.2) were used as meteorological drivers (for the current state of work, ScoHM was charged with rain gauge data only). Validation for the DeHM and DaHM models is straightforward, since modeled hydrographs are simply compared against observed ones. Model evaluation is more delicate for the ScoHM results, since the ScoHM output (i.e., dimensionless scores) only qualitatively correlates with observed flow values. Therefore, a quantile-mapping procedure (Piani et al., 2009) was applied to relate thresholds of Q with corresponding total-score values.

For the smallest sub-catchment, Niederoderwitz (29 km²), DeHM performs best; for the three larger

It can be seen from Figure 5 that QPE data deliver the highest predictive skill; incorporating RADOLAN and rain gauge data as precipitation inputs leads to similar skill. Predictive skill under QPF data (QF and COSMO-DE) is generally lower. For different QPFs as drivers, resulting skills do not differ greatly. Apparently, the observed differences in QPF quality (cf. Section 3.2) do not systematically impact hydrological model skill.
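The quantile-mapping step used above to relate ScoHM total scores to discharge thresholds can be sketched as a simple empirical probability match; this is a sketch in the spirit of Piani et al. (2009) with hypothetical samples, not the authors' exact implementation:

```python
import numpy as np

def map_score_to_q(scores, q_obs, score_value):
    """Empirical quantile mapping: find the non-exceedance probability of
    `score_value` within the ScoHM score sample and return the observed
    discharge with the same non-exceedance probability."""
    scores = np.sort(np.asarray(scores, dtype=float))
    q_obs = np.sort(np.asarray(q_obs, dtype=float))
    prob = np.searchsorted(scores, score_value, side="right") / len(scores)
    return float(np.quantile(q_obs, min(prob, 1.0)))
```

In this way a dimensionless total score can be translated into a discharge (or alarm-level) threshold, making the ScoHM output comparable to the hydrograph-based models.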

Finally, it is important to state that validation was carried out on the basis of hourly values; a more general evaluation, e.g., comparing only the highest values within a specific temporal window (e.g., 6 hours), would yield considerably higher skill scores.
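The coarser comparison mentioned above, i.e., evaluating only the highest value per temporal window instead of every hourly value, amounts to a block-maximum reduction of both series before scoring:

```python
import numpy as np

def block_max(series, window=6):
    """Reduce an hourly series to block maxima over non-overlapping windows
    (e.g. 6 h); a trailing incomplete window is dropped."""
    x = np.asarray(series, dtype=float)
    n = len(x) // window * window
    return x[:n].reshape(-1, window).max(axis=1)
```

Applying any skill score to `block_max(sim)` versus `block_max(obs)` forgives timing errors within the window, which is why such an evaluation yields considerably higher scores than the hourly one.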

Conclusions and Outlook
In this study, user demands, driving data, and hydrologic modeling techniques were evaluated within a real-world application context in order to illustrate a way towards a flash flood early warning strategy for (sub-)mesoscale catchments in Saxony. First, the results suggest that the majority of potential users of flood warnings would be satisfied with forecasting lead times of up to 24 hours and that users are foremost interested in predicted peak water/alarm levels (rather than peak timing). Second, on the basis of the meteorological verification results, highly resolved NWP data seem to offer the best predictive skill, compared to more general, areally integrated products. Third, differences in the quality of meteorological driving data do not greatly influence hydrological model skill. Fourth, a clear statement on the superiority of one hydrological model over another cannot be made.

In fact, if simple classification models were sufficient to satisfy warning needs (e.g., providing the information whether or not a specific threshold is likely to be exceeded in the next forecasting interval), the results show that such a modeling approach (i.e., ScoHM) performs with favorable skill, compared to more sophisticated modeling techniques, and without introducing cumbersome parameter estimation problems and limited (DeHM) or even nonexistent (DaHM) regional transferability. However, overall forecasting skill always decreases with increasing randomness of driving events and conditions, i.e., the more rare/focused/intense the flood-causing processes and/or the longer the lead time, the smaller the chance of correct detection/warning.

Further research is currently carried out regarding the statewide implementation and comparative evaluation of the herein considered hydrological modeling approaches. Meteorological verification will be carried out for smaller spatio-temporal scales. ScoHM will be validated for QPF inputs. Generally, the set of QPFs will be extended to DWD's 21-member ensemble product, COSMO-DE-EPS, thus allowing a comprehensive probabilistic verification and validation. Another goal is evaluating model-specific extrapolation skill to propose a feasible regionalization methodology for deriving threshold-based warnings for ungauged basins.