Modelling streambank erosion potential using maximum entropy in a central Appalachian watershed

We used maximum entropy to model streambank erosion potential (SEP) in a central Appalachian watershed to help prioritize sites for management. Model development included measuring erosion rates, application of a quantitative approach to locate Target Eroding Areas (TEAs), and creation of maps of boundary conditions. We successfully constructed a probability distribution of TEAs using the program Maxent. All model evaluation procedures indicated that the model was an excellent predictor, and that the major environmental variables controlling these processes were streambank slope, soil characteristics, bank position, and underlying geology. A classification scheme with low, moderate, and high levels of SEP derived from logistic model output was able to differentiate sites with low erosion potential from sites with moderate and high erosion potential. A major application of this type of modelling framework is to address uncertainty in stream restoration planning, ultimately helping to bridge the gap between restoration science and practice.


INTRODUCTION
A growing number of scientists agree that where restoration is done, and the scale at which projects are implemented, are critical for effective restoration (Wohl et al., 2005). Predictive models and assessment tools that are currently used in restoration planning vary greatly in parameter selection and precision, which has implications for the scale and applications for which they are relevant (Merritt et al., 2003). While several of these models have been useful for watershed management (Rosgen, 2001; Simon et al., 2003), a niche remains for process models that provide a balance between high resolution prediction and broad-scale applicability. Objective decision support tools that incorporate geographic information systems (GIS) and probability modelling could increase the efficiency of restoration site selection and facilitate development of watershed-scale restoration plans (Wohl et al., 2005) by elucidating relative streambank erosion potential (SEP) within the context of the watershed. Recent improvements in, and availability of, remote sensing data (Goetz, 2006), used in conjunction with Bayesian reasoning, have the potential to improve the resolution and precision of SEP prediction over large spatial extents (Regmi et al., 2010).
We applied maximum entropy, a general purpose, machine learning method that enables prediction from incomplete information (Phillips et al., 2006), to estimate the spatial distribution of target eroding areas (TEAs) undergoing excessive streambank erosion. We then used this probability distribution to create a classification scheme for streambank erosion potential (SEP). We believe this approach has great potential for enhancing watershed management by helping identify sites with the greatest restoration potential, which is critical for long-term success (Wohl et al., 2005).

STUDY AREA AND METHODS
A model of SEP was constructed for a portion of the Cacapon River watershed within the larger Potomac River basin. The watershed drains about 2320 km² within Hardy, Hampshire, and Morgan counties, West Virginia. Climate is considered humid continental, characterized by hot summers, cold winters, and average annual precipitation near 900 mm. Streams in the study area typically flow through wide, slightly entrenched, shallow channels (Pitchford, 2012). We predicted SEP for 113 km of 1st-3rd order streams with median daily flow rates of 2-31 m³/s in the mainstem and 0.04-1.2 m³/s in a representative tributary during the study period (available from the USGS at www.waterdata.usgs.gov). Elevation within the study area ranged from 210 to 423 m and the underlying geology consisted of alluvium (47%), shale (29%), sandstone (17%), and limestone (6%) (West Virginia Geological and Economic Survey, 2011).
Erosion rates were monitored at a total of 151 sites distributed among 30 stream reaches using erosion pin and streambank profile surveys (Hupp et al., 2009) during 2010-2011. Streambank migration rates were quantified using repeated measurements of 122 cm long, 0.95 cm diameter reinforcing rods to calculate an average migration rate for each site in m/year (Hupp et al., 2009). In addition, repeated streambank profile surveys were conducted by measuring the horizontal distance from a level survey rod to the face of the streambank at 15 cm vertical increments to calculate a rate of change in sediment storage for each site. To determine which survey locations represented TEAs, we used cluster and outlier analysis (Anselin Local Moran's I) within the program ESRI® ArcMap™ 10.0 to locate significant clustering of high erosion rates. Sites that had statistically significant clustering of high erosion rates (Z > 1.65; α < 0.1) using either survey method were considered to be TEAs.
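As a rough illustration of the cluster and outlier analysis described above, the following minimal Python sketch computes the local Moran's I statistic for a set of sites. It is not the ArcMap implementation: the fixed distance band, the row-standardised binary weights, and the function name `local_morans_i` are assumptions made for illustration, and the significance test (the Z-score threshold) is not reproduced here.

```python
from math import hypot


def local_morans_i(coords, values, band):
    """Return the local Moran's I for each site.

    coords: list of (x, y) site positions.
    values: erosion rate measured at each site.
    band:   hypothetical neighbourhood distance (same units as coords).
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    m2 = sum(d * d for d in dev) / n  # population variance of the rates
    result = []
    for i in range(n):
        # Neighbours are all other sites within the distance band.
        nbrs = [
            j for j in range(n)
            if j != i
            and hypot(coords[i][0] - coords[j][0],
                      coords[i][1] - coords[j][1]) <= band
        ]
        if not nbrs or m2 == 0:
            result.append(0.0)  # no neighbours or no variation: no signal
            continue
        # Row-standardised spatial lag: mean deviation of the neighbours.
        lag = sum(dev[j] for j in nbrs) / len(nbrs)
        result.append(dev[i] * lag / m2)
    return result
```

A positive value at a site whose own rate is above the mean suggests a cluster of high erosion rates; deciding whether that cluster is statistically significant would additionally require a permutation test or an analytic variance, which this sketch omits.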
Airborne Light Detection and Ranging (LiDAR) was flown over the study area in April 2010 by the West Virginia University Natural Resource Analysis Center (WVUNRAC). Data were captured at an altitude of 1676 m and a speed of 135 knots using an Optech Inc. (Ontario, Canada) ALTM3100 with a vertical accuracy of 15 cm. These data were post-processed to create models for bare ground and vegetation within the study area. LiDAR and other available data were used to create environmental layers to represent features associated with streambank erosion (Table 1). The computer program Maxent, version 3.3.2, was used to model SEP by estimating the unknown distribution ($\pi$) over the set of pixels ($X$) in the study area. Maxent assigned a probability of occurrence to each point ($x$), which is approximated by solving for the entropy of $\hat{\pi}$ using the equation:
$$H(\hat{\pi}) = -\sum_{x \in X} \hat{\pi}(x) \ln \hat{\pi}(x)$$
where ln is the natural logarithm, and $\hat{\pi}(x)$ is a positive value representing the probability of occurrence for the target phenomenon that sums to one over the pixels in the study extent.
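To make the entropy measure concrete, the short Python sketch below computes the entropy of a discrete probability distribution over a handful of pixels as the negative sum of each probability times its natural logarithm. The function name `entropy` and the example values are illustrative assumptions; this is not Maxent's optimisation routine, only the quantity it maximises subject to constraints from the environmental layers.

```python
from math import log


def entropy(probs):
    """Entropy of a discrete distribution (natural-log units)."""
    if any(p < 0 for p in probs):
        raise ValueError("probabilities must be non-negative")
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to one")
    # Terms with p == 0 contribute nothing (limit of p * ln p is 0).
    return -sum(p * log(p) for p in probs if p > 0)


# Hypothetical four-pixel study extent.
uniform = [0.25, 0.25, 0.25, 0.25]   # no information: maximum entropy
peaked = [0.70, 0.10, 0.10, 0.10]    # probability concentrated on one pixel
```

With no constraints the uniform distribution has the largest entropy, ln(4) for four pixels; any distribution that concentrates probability on fewer pixels has lower entropy.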
Thirty replicate bootstrap runs were conducted using 25% of the training sites that represent TEAs. Evaluation of model performance included a threshold-dependent, one-tailed binomial test on model omission and predicted area to determine if the maximum entropy distribution was predicting better than random. A threshold-independent, area under curve (AUC) analysis was also used, where a value of <0.5 indicates the model predicts no better than random, 0.5-0.7 indicates fair predictive capacity, 0.7-0.9 indicates good predictive capacity, and values >0.9 are indicative of an excellent model (Phillips et al., 2006). Maxent also generated estimates of each environmental variable's contribution to the predicted distribution of TEAs, which we used to assess the influence of each variable on the prediction.
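The AUC analysis can be sketched in a few lines of Python. The example below computes AUC as the probability that a randomly chosen presence site receives a higher model score than a randomly chosen background pixel (ties count as one half), and then maps the value to the interpretation bands given above. The function names and the handling of values that fall exactly on a band boundary are assumptions for illustration; Maxent's own calculation is not reproduced here.

```python
def auc(presence_scores, background_scores):
    """Rank-based AUC: P(presence score > background score), ties = 0.5."""
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))


def describe_auc(value):
    """Map an AUC value to the qualitative bands used in the text.

    Boundary handling (e.g. exactly 0.7 or 0.9) is an assumption.
    """
    if value < 0.5:
        return "no better than random"
    if value < 0.7:
        return "fair"
    if value <= 0.9:
        return "good"
    return "excellent"
```

For instance, `describe_auc(0.994)` returns `"excellent"`, matching the band in which the reported training AUC falls.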
A final map was created from the logistic model output to represent three levels of SEP (i.e. low, moderate, and high). We conducted a one-way Analysis of Variance (ANOVA) to test the strength of our classification scheme, using normally distributed migration rates (m/year) as the dependent variable. A significant ANOVA was followed with a Tukey's Honest Significant Difference (HSD) post hoc test to compare migration rates between low, moderate, and high SEP. Significance for all tests was set at the α = 0.05 level.
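For readers unfamiliar with the test, the following minimal Python sketch shows how the F statistic of a one-way ANOVA is formed from between-group and within-group variation. The function name and the example groups are hypothetical and are not the study data; the P value and the Tukey HSD comparisons would normally come from a statistics package and are not computed here.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of values."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Variation of observations around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)
    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical values for three classes (not the study data).
example_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
```

A large F indicates that differences among class means are large relative to the scatter within classes, which is what the classification scheme needs to show to be useful.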

RESULTS
Migration rates ranged from -0.11 to 0.95 m/year with an average migration rate of 0.24 m/year (SE = 0.02). Net change in sediment storage ranged from a net loss of 3.04 m²/year to a net gain of 0.80 m²/year with an average net loss of 0.39 m²/year (SE = 0.06). A total of 29 TEAs were identified from the 151 monitored locations. Twenty-five TEAs were detected based on migration rates of 0.53-0.93 m/year, and six TEAs were detected based on net sediment losses of 1.2-2.29 m²/year.
Nine of the original 14 environmental variables contributed unique information to the model and were used to create the final model of SEP. The average training AUC value for 30 model runs was 0.994 (SE = 0.0004), which indicated that the model had excellent predictive capacity. The binomial omission test was significant (P < 0.01, one-tailed) for all data partitions at all selected threshold values, indicating that the model predicted much better than random. The average logistic threshold for the minimum training presence (MTP) for all model runs was 0.209 (SE = 0.02). All logistic threshold values greater than the MTP were considered to have moderate or high SEP, which included 3.1% of the study extent. The most important environmental variables in the model were slope (32.7%), soil type (29.2%), bank stress index (20.6%), and underlying geology (8.7%) (Table 2). The logistic probability of a TEA increased with increasing slope up to approximately 25° and then declined as slope increased until an asymptote was reached just above a logistic probability of 50% (Fig. 1(a)). Potomac soils were associated with the highest probability, followed closely by Fluvaquents and Philo-2 soils, compared to other soil types (Table 3; Fig. 1(b)). Areas along the outside of meander bends had the highest probability of being a TEA, followed by the inside of meander bends, with other levels of bank stress having similar probabilities (Fig. 1(c)). With regard to underlying geology, areas comprised of alluvium had the highest probability of being a TEA, with areas containing sandstone, limestone, and shale units exhibiting respective decreases in probability (Fig. 1(d)).
Our classification scheme built from logistic model output shows that 96.9% (8.5 km²) of the study extent was below the MTP, and therefore had low SEP (Fig. 2). Areas with moderate and high SEP made up 2.7% (0.24 km²) and 0.3% (0.03 km²) of the study extent, respectively. An ANOVA revealed that our classification scheme was a reliable predictor of streambank migration rate (F1,149 = 33.2; P < 0.001), as sites with low SEP, which averaged 0.22 m/year (SE = 0.02), were different from sites with moderate SEP, which averaged 0.41 m/year (SE = 0.03) (P < 0.001), and from sites with high SEP, which averaged 0.45 m/year (SE = 0.04) (P < 0.001).
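The three-level classification can be illustrated with a small Python sketch that bins a logistic output value using the MTP as the lower cutoff. The MTP value of 0.209 is taken from the results above; the cutoff separating moderate from high SEP is not reported in this text, so `high_cutoff` is a hypothetical parameter that the user must supply, and the function name is likewise an assumption.

```python
def classify_sep(logistic_value, mtp, high_cutoff):
    """Assign a logistic output value to low, moderate, or high SEP.

    mtp:         minimum training presence threshold (0.209 in the text).
    high_cutoff: hypothetical boundary between moderate and high SEP;
                 the value used in the study is not given here.
    """
    if not mtp < high_cutoff:
        raise ValueError("high_cutoff must exceed mtp")
    if logistic_value <= mtp:
        return "low"          # at or below the MTP
    if logistic_value <= high_cutoff:
        return "moderate"     # above the MTP, up to the upper cutoff
    return "high"


# Example with the reported MTP and an assumed upper cutoff of 0.5.
labels = [classify_sep(v, 0.209, 0.5) for v in (0.05, 0.30, 0.80)]
```

Applying the same function to every pixel of the logistic output raster would yield a three-class map; adding more cutoffs, or skipping the binning altogether, would give finer or continuous rankings.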

CONCLUSIONS AND PERSPECTIVES
All model validation procedures indicated that our model performed well and our classification scheme was useful for predicting SEP. Thus, we believe this approach could be applied in other watersheds to enhance management by providing a high resolution prediction over large spatial extents. The most important predictor was slope, where bank slopes of approximately 25° had the highest probability of being a TEA. Steeper slopes are common in the watershed, but are typically composed of shale and thus have greatly reduced erosion rates compared to alluvial reaches. Soil type was also important, as Fluvaquents and Philo soils, which contain as much as 85% sand in deeper horizons (i.e. 1-2 m), and Potomac soils, which contain as much as 100% sand (USDA, 2011), were associated with high SEP. Deeper soil horizons are often exposed in incised channels where soils with high sand content are very susceptible to fluvial erosion (Micheli & Kirchner, 2002; Simon et al., 2008; Pitchford, 2012). The outside of meander bends had higher SEP compared to other levels of bank stress. This was not surprising as these areas are exposed to the highest amount of shear stress (Bloom, 1998). With regard to underlying geology, alluvium had the highest SEP. This was also not surprising as alluvium is previously eroded material (Bloom, 1998). Overall, the influence of boundary conditions was in agreement with other studies that have shown streambank slope, soil characteristics, bank position, and underlying geology are important predictors of SEP (Simon et al., 2003).
Our classification scheme was effective for differentiating sites with low SEP from sites with moderate and high SEP, but could not distinguish between moderate and high levels of SEP. Although variability in streambank migration in these classes overlapped, a larger sample size among sites with high SEP would improve the ability to detect a difference. Overall, our results show that this approach has utility for gauging relative stability at the watershed (50-500 km²), segment (100-10 000 m²), and reach (10-1000 m²) scale and could help prevent unnecessary construction in areas that are relatively stable, yet may appear to be degrading. Such areas can become a liability following restoration activity that results in reductions in flood plain roughness, which can cause bank failure (Smith & Prestegaard, 2005; Pitchford, 2012).
We created only three levels of SEP from our model, but we could have easily created more categories of SEP, or generated a continuous prediction to enhance relative comparisons within the watershed. This could be very insightful for prioritizing sites for management and could help avoid attempts to stabilize streambanks with low probability of success. For example, the model can help to differentiate between areas with clay soils positioned along straight reaches (lower SEP), which have a higher probability of restoration success, and areas with sandy soils on the outside of meander bends (high SEP). Although an area with high SEP may erode at higher rates, it may be too dynamic to attempt streambank stabilization. Other applications for the model include greater understanding of conditions associated with stable sites within the watershed, which could help to inform restoration design similar to the reference reach approach used in Natural Channel Design (NCD) (Rosgen, 1998).
Overall, we believe that this maximum entropy model of SEP is a great example of an assessment tool that could enhance watershed planning by helping to prioritize sites for management, assess the relative importance of boundary conditions, and identify characteristics associated with stable sites within the watershed. This type of process model is critical for bridging the gap between restoration science and practice, which will ultimately improve the success of watershed management initiatives worldwide.

Fig. 2 Logistic output from a maximum entropy model of streambank erosion potential (SEP) in a portion of the Cacapon River Watershed, West Virginia.

Table 1
Environmental variables used to model streambank erosion potential (SEP) in the Cacapon River Watershed, West Virginia.

Table 2
Average percent contribution and permutation importance values for each predictor variable in a maximum entropy model of streambank erosion potential (SEP). Permutation importance values for each variable are determined by randomly permuting the variable values among the training points and quantifying the ensuing decrease in training AUC.

Table 3
Percent sand, bulk density, and soil erodibility for highly erodible soil types in the Cacapon River Watershed. Percent sand and bulk density are estimated ranges from 0-150 cm soil depth. Soil erodibility is an average value over 0-150 cm.