Satellite and gauge rainfall merging using geographically weighted regression

A residual-based rainfall merging scheme using geographically weighted regression (GWR) is proposed. The method can simultaneously blend several satellite rainfall datasets with gauge measurements and can describe the non-stationary influences of geographical and terrain factors on the spatial distribution of rainfall. Using this method, an experimental study on merging daily rainfall from the Climate Prediction Center Morphing dataset (CMORPH) with gauge measurements was conducted for the Ganjiang River basin in Southeast China. We investigated the capability of the merging scheme for daily rainfall estimation under different gauge densities. The results show that, when the gauge network is sparse, the merging scheme is markedly superior to interpolation using gauge data alone.


INTRODUCTION
Satellites usually provide near-global coverage for remote rainfall monitoring, and they are especially valuable for regions that lack adequate surface-based measurements. In addition, satellite rainfall datasets are usually free of charge and their availability is not limited by administrative factors. Owing to these advantages, significant progress has been made in satellite rainfall estimation in recent years. However, satellite rainfall estimates are produced at rather coarse spatial resolutions (0.04° × 0.04° to 0.25° × 0.25°). Moreover, satellite rainfall is usually much less accurate than gauge measurements. As a result, the full use of satellite rainfall in hydrological and water resources management applications has been hindered.
To overcome this dilemma and make rational use of satellite rainfall information, great efforts have recently been dedicated to merging satellite and gauge rainfall data. By blending the spatially continuous but coarse satellite rainfall with discrete but accurate gauge measurements, a new rainfall product with finer resolution can be generated. Because merging can partly offset the measurement errors of the two rainfall estimates, the quality of the combined rainfall may be improved to some degree. At present, various rainfall merging schemes have been developed for experimental or operational use, such as conditional merging (Sinclair and Pegram, 2005), Bayesian merging (Todini, 2001) and statistical objective analysis (Pereira Filho, 2004).
Although various schemes have been developed, rainfall merging remains a complex and important issue. The results of rainfall merging are influenced by the merging scheme, the quality of the satellite rainfall data, the density of raingauges and other factors. Motivated by this, the objective of this paper is to develop a residual-based method for merging satellite and raingauge rainfall using geographically weighted regression (GWR). Theoretically, this method can simultaneously blend several satellite rainfall datasets with gauge measurements and can describe the non-stationary influences of geographical and terrain factors on the spatial distribution of rainfall. Using the proposed method, an experimental study on merging rainfall from CMORPH (Joyce et al., 2004) with gauge measurements was conducted for the Ganjiang River basin in Southeast China. The capability of the merging scheme for constructing daily rainfall fields under different gauge densities is investigated and discussed. The accuracy gain achieved by rainfall merging relative to traditional interpolation using only raingauge measurements is analysed.

Study area
The Ganjiang River basin is located between 113°30′E−116°40′E and 24°29′N−29°11′N in Southeast China. With a drainage area of 83 374 km², it is a major sub-catchment of Poyang Lake, the largest freshwater lake in China (Fig. 1). The study area is one of the typical rainstorm regions in China. Mean annual precipitation in this region is about 1580 mm.

Dataset
The study area has a dense raingauge network consisting of 325 stations (Fig. 1). The gauges are well distributed spatially and their density is about one per 256 km². All rainfall measurements have passed strict quality checks and control. Using these observations, point-wise daily rainfall series were obtained for the period 2003-2009. Satellite rainfall from CMORPH for the same period was also collected. The spatial and temporal resolutions of CMORPH are 0.25° × 0.25° and half-hourly, respectively. The daily rainfall series is obtained by accumulating the rainfall of the 48 half-hour episodes within a day.
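The daily accumulation can be sketched as follows. This is a minimal illustration, assuming the half-hourly values are already rainfall depths in mm per half hour and that the series starts at the beginning of a day; the function name `daily_totals` and the handling of incomplete days are assumptions, not details taken from the study.

```python
def daily_totals(half_hourly, steps_per_day=48):
    """Sum consecutive blocks of half-hourly depths (mm) into daily totals.

    Assumes the series starts at the beginning of a day. A trailing
    incomplete day is dropped rather than reported as a partial total.
    """
    n_days = len(half_hourly) // steps_per_day
    return [
        sum(half_hourly[d * steps_per_day:(d + 1) * steps_per_day])
        for d in range(n_days)
    ]


# Two synthetic days: 0.5 mm in every half hour, then a dry day ending in a 3 mm burst.
series = [0.5] * 48 + [0.0] * 47 + [3.0]
print(daily_totals(series))  # [24.0, 3.0]
```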

METHODOLOGY
GWR background
GWR is a type of regression model with spatially varying coefficients (Fotheringham et al., 2003). It allows the relationships between the variables in the regression model to be non-stationary. By calculating local statistics, spatial relationships can be identified and utilized for prediction. GWR also disaggregates spatial patterns in the model residuals and reduces their spatial autocorrelation. The basic formula of GWR is expressed as:
$$Y_i = \beta_{i0}(u_i, v_i) + \sum_{k=1}^{p} \beta_{ik}(u_i, v_i)\, X_{ik} + \varepsilon_i, \qquad i = 1, 2, \ldots, n \qquad (1)$$
where $Y_i$ and $X_{ik}$ are, respectively, the dependent variable and the k-th independent variable at location i; $u_i$ and $v_i$ are the coordinates; $\beta_{i0}(u_i, v_i)$ is the intercept; $\beta_{ik}(u_i, v_i)$ is the local regression parameter for $X_{ik}$; $\varepsilon_i$ is the residual; p is the number of independent variables; and n is the number of observations.
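Equation (1) can be rewritten in matrix form:
$$Y = (X \otimes \beta)\,\mathbf{1} + \varepsilon \qquad (2)$$
where $\otimes$ denotes logical multiplication, in which each element of X is multiplied by the corresponding element of β; $\mathbf{1}$ is a (p + 1) × 1 vector of ones; $\varepsilon$ is the error vector; and X and β are two matrices consisting of the independent variables and the local regression coefficients, respectively:
$$X = \begin{bmatrix} 1 & X_{11} & \cdots & X_{1p} \\ 1 & X_{21} & \cdots & X_{2p} \\ \vdots & \vdots & & \vdots \\ 1 & X_{n1} & \cdots & X_{np} \end{bmatrix} \qquad (3)$$
$$\beta = \begin{bmatrix} \beta_{10}(u_1, v_1) & \beta_{11}(u_1, v_1) & \cdots & \beta_{1p}(u_1, v_1) \\ \beta_{20}(u_2, v_2) & \beta_{21}(u_2, v_2) & \cdots & \beta_{2p}(u_2, v_2) \\ \vdots & \vdots & & \vdots \\ \beta_{n0}(u_n, v_n) & \beta_{n1}(u_n, v_n) & \cdots & \beta_{np}(u_n, v_n) \end{bmatrix} \qquad (4)$$
The number of unknown parameters in equation (2) is n × (p + 1), which exceeds the number of observations. To solve this equation, GWR estimates the coefficients using locally weighted least-squares regression:
$$\hat{\beta}_i = \left(X^{\mathrm{T}} W_i X\right)^{-1} X^{\mathrm{T}} W_i Y \qquad (5)$$
where $\hat{\beta}_i$ is the coefficient vector for location i and $W_i$ is the spatial weight matrix:
$$W_i = \mathrm{diag}(w_{i1}, w_{i2}, \ldots, w_{in}) \qquad (6)$$
The estimate calculated with $\hat{\beta}_i$ for the observation at location i is:
$$\hat{Y}_i = X_i \hat{\beta}_i \qquad (7)$$
where $X_i$ is the i-th row of X.
GWR assumes that observations close together have more influence on each other than observations further apart. Hence, a distance-decay kernel function is employed for the spatial weight matrix. When the distance between observations is greater than the kernel bandwidth, the weight rapidly approaches zero. Kernel functions fall into two groups, those with fixed and those with adaptive bandwidths. The former use a bandwidth that is held constant over space, whereas the latter adapt the bandwidth distance to the density of the data: bandwidths are smaller where data are dense and larger where data are sparse.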
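The locally weighted least-squares estimate for a single regression point can be sketched as follows. This is a minimal illustration with NumPy, not the implementation used in the study; the function name `gwr_local_fit` and the synthetic data are assumptions, and the spatial weights are simply passed in.

```python
import numpy as np


def gwr_local_fit(X, y, weights):
    """Weighted least-squares coefficients for one regression point.

    X       : (n, p) array of independent variables (no intercept column)
    y       : (n,) array of the dependent variable
    weights : (n,) spatial weights of the n observations for this point
    Returns the (p + 1,) local coefficient vector, intercept first.
    """
    Xd = np.column_stack([np.ones(len(y)), X])   # prepend the intercept column
    XtW = Xd.T * weights                         # X^T W_i without building diag(W_i)
    return np.linalg.solve(XtW @ Xd, XtW @ y)    # solves (X^T W_i X) beta = X^T W_i y


# Synthetic check: y = 2 + 3x holds exactly, so any positive weights recover (2, 3).
x = np.array([[0.0], [1.0], [2.0], [3.0]])
y = 2.0 + 3.0 * x[:, 0]
w = np.array([1.0, 0.8, 0.5, 0.1])
beta = gwr_local_fit(x, y, w)
print(np.round(beta, 6))
```

Solving the normal equations with `np.linalg.solve` avoids forming the matrix inverse explicitly. Repeating the fit with a different weight vector for every location yields the spatially varying coefficients.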
In this study, the adaptive kernel bandwidth was used because sample densities varied spatially. The weights follow an exponential distance-decay function (equation (8)), in which $w_{ij}$ is the weight of observation j for observation i, $d_{ij}$ is the distance between observations i and j, and $d_{ik}$ is the distance between observation i and its k-th nearest neighbour.
The key step in calibrating GWR is determining the optimal bandwidth distance (i.e. $d_{ik}$). In this paper, it was determined automatically using the corrected Akaike information criterion (AICC) (Fotheringham et al., 2003).
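An adaptive kernel of this kind can be sketched as follows. The exact decay function of the study is not reproduced here: the form exp(-d_ij / d_ik) is an assumption chosen for illustration, as are the function name `adaptive_weights` and the example coordinates.

```python
import math


def adaptive_weights(point, observations, k):
    """Spatial weights of all observations for one regression point.

    The bandwidth is the distance from `point` to its k-th nearest
    observation, so it shrinks where gauges are dense and grows where
    they are sparse. The decay form exp(-d / bandwidth) is an assumption.
    An observation located exactly at `point` counts as a neighbour.
    """
    dists = [math.dist(point, obs) for obs in observations]
    bandwidth = sorted(dists)[k - 1]   # distance to the k-th nearest neighbour
    if bandwidth == 0.0:
        raise ValueError("k-th nearest neighbour coincides with the point; increase k")
    return [math.exp(-d / bandwidth) for d in dists]


# Hypothetical gauge coordinates (km); distances from the first gauge are 0, 5, 10, 50.
gauges = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0), (30.0, 40.0)]
weights = adaptive_weights(gauges[0], gauges, k=2)
print([round(w, 4) for w in weights])
```

With k = 2 the bandwidth is 5 km, so the gauge 5 km away receives weight exp(-1) and the gauge 50 km away is almost ignored.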
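Bandwidth selection by AICc can be sketched as a search over candidate neighbour counts. The AICc expression below is the one commonly given for GWR, 2n ln(σ̂) + n ln(2π) + n(n + tr(S))/(n − 2 − tr(S)), with S the hat matrix; the decay form exp(-d/bandwidth), the candidate list and the synthetic data are assumptions, not the settings of the study.

```python
import numpy as np


def gwr_aicc(coords, X, y, k):
    """AICc of a GWR fit whose adaptive bandwidth is the k-th nearest-neighbour distance."""
    n = len(y)
    Xd = np.column_stack([np.ones(n), X])
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    bandwidth = np.sort(dist, axis=1)[:, k]      # column 0 is the point itself
    resid = np.empty(n)
    trace = 0.0
    for i in range(n):
        w = np.exp(-dist[i] / bandwidth[i])      # assumed decay form
        XtW = Xd.T * w
        hat_row = Xd[i] @ np.linalg.solve(XtW @ Xd, XtW)   # i-th row of the hat matrix S
        resid[i] = y[i] - hat_row @ y
        trace += hat_row[i]
    sigma = np.sqrt(resid @ resid / n)           # estimated error standard deviation
    return 2 * n * np.log(sigma) + n * np.log(2 * np.pi) + n * (n + trace) / (n - 2 - trace)


# Synthetic data: the slope drifts from west to east, plus a little noise.
rng = np.random.default_rng(0)
coords = rng.uniform(0, 100, size=(40, 2))
X = rng.normal(size=(40, 1))
y = 1.0 + (0.5 + coords[:, 0] / 100) * X[:, 0] + rng.normal(scale=0.1, size=40)

candidates = [5, 10, 20, 39]
scores = {k: gwr_aicc(coords, X, y, k) for k in candidates}
best_k = min(scores, key=scores.get)             # smallest AICc wins
print(best_k, {k: round(float(s), 2) for k, s in scores.items()})
```

A practical calibration would usually search k with a golden-section or similar routine rather than a fixed list.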

GWR based merging
A residual-based analysis is proposed for merging satellite and raingauge rainfall. It estimates a preliminary rainfall field, known as the background field, from satellite rainfall, and estimates the residual field from the residuals at observed points while considering the influences of related variables. The merged field is then given by the combination of the predicted error field and the background field.
Taking $P^B$ and $P^O$ as the background field and the observed field, respectively, their relationship to the true field $P^T$ is expressed as:
$$P^B = P^T + e^B \qquad (9)$$
$$P^O = P^T + e^O \qquad (10)$$
where $e^B$ and $e^O$ are the background and observation errors. The expectations of $e^B$ and $e^O$ are denoted by $\mu_B$ and $\mu_O$, and their variances by $\sigma_B^2$ and $\sigma_O^2$, respectively. Under the assumption that $\mu_O$ equals zero and $\sigma_B^2$ is much larger than $\sigma_O^2$, the following approximation can be derived:
$$e^B \approx P^B - P^O \qquad (11)$$
Equation (11) implies that the residual field can be approximated by the difference between the background and observation fields. However, because $P^O$ is known only at a limited number of locations, $e^B$ must be estimated at the locations without gauge observations. Assuming that background errors are generally correlated in space, this can be done through local interpolation using nearby values at observed locations.
Within this residual-based framework, this paper proposes a merging scheme based on GWR. The method has three main steps. First, to construct the background field using GWR, we describe the relationship between the background rainfall and the satellite estimates at any place using local regression:
$$P_i^B = b_{i0} + \sum_{k} b_{ik}\, P_{ik}^S \qquad (12)$$
where $P_{ik}^S$ is the estimate from the k-th kind of satellite rainfall at location i, $b_{i0}$ is the intercept and $b_{ik}$ is the local regression parameter. Both $b_{i0}$ and $b_{ik}$ are likely to be non-stationary in space.
Secondly, $e^B$ is calculated, also using GWR, at the locations without observations. We assume that the relationship between $e^B$ and geographic factors, namely the coordinates u and v and the elevation z, can be described by a locally non-stationary regression equation (equation (13)). Thirdly, the merged field $P^M$ is obtained by combining the background field and the estimated residual (equation (14)). After like terms are combined, equation (14) can be rewritten as equation (15), which is the general form of the rainfall merging scheme based on GWR. Because it is a regression model, the number of satellite rainfall products that can enter the merging method is theoretically unlimited. Thus, the proposed method is capable of simultaneously blending multiple kinds of satellite rainfall data with gauge measurements. At the same time, equation (15) describes the non-stationary influences of geographical and terrain factors on the spatial distribution of rainfall. Although the gauge rainfall observations do not appear directly in the regression model, their effect on the merged results is reflected indirectly through the spatially varying regression coefficients derived with equation (5).
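The three steps can be sketched for a single target cell as follows. This is an illustration under stated assumptions, not the authors' implementation: the background error is taken as background minus observation, a fixed bandwidth with the decay form exp(-d/bandwidth) replaces the adaptive kernel for brevity, and the names `gwr_predict` and `merge_at` and all data are hypothetical.

```python
import numpy as np


def gwr_predict(coords, X, y, target_coord, target_x, bandwidth):
    """Fit a local weighted regression centred on `target_coord` and predict there."""
    d = np.linalg.norm(coords - target_coord, axis=1)
    w = np.exp(-d / bandwidth)                   # assumed decay form, fixed bandwidth
    Xd = np.column_stack([np.ones(len(y)), X])
    XtW = Xd.T * w
    beta = np.linalg.solve(XtW @ Xd, XtW @ y)
    return beta[0] + beta[1:] @ np.atleast_1d(target_x)


def merge_at(target_uvz, target_sat, gauge_uvz, gauge_obs, gauge_sat, bandwidth):
    """Merged rainfall at one target cell from gauge and satellite rainfall."""
    uv, target_uv = gauge_uvz[:, :2], target_uvz[:2]
    # Step 1: background from a local regression of gauge rainfall on satellite rainfall.
    background = gwr_predict(uv, gauge_sat, gauge_obs, target_uv, target_sat, bandwidth)
    background_at_gauges = np.array([
        gwr_predict(uv, gauge_sat, gauge_obs, uv[i], gauge_sat[i], bandwidth)
        for i in range(len(gauge_obs))
    ])
    # Step 2: background error at the gauges, regressed locally on u, v and elevation z.
    error_at_gauges = background_at_gauges - gauge_obs
    error = gwr_predict(uv, gauge_uvz, error_at_gauges, target_uv, target_uvz, bandwidth)
    # Step 3: remove the estimated error from the background.
    return background - error


def truth(uvz):
    """Synthetic 'true' rainfall (mm) depending on easting and elevation."""
    return 5.0 + 0.02 * uvz[..., 0] + 0.003 * uvz[..., 2]


rng = np.random.default_rng(1)
gauge_uvz = np.column_stack([rng.uniform(0, 100, 30),    # u (km)
                             rng.uniform(0, 100, 30),    # v (km)
                             rng.uniform(0, 1000, 30)])  # z (m)
gauge_obs = truth(gauge_uvz)
gauge_sat = 0.5 * gauge_obs          # satellite underestimates rainfall by half
target = np.array([50.0, 50.0, 400.0])
merged = merge_at(target, 0.5 * truth(target), gauge_uvz, gauge_obs, gauge_sat, bandwidth=30.0)
print(float(merged), float(truth(target)))
```

In this idealized case the satellite bias is purely multiplicative, so step 1 already removes it and the merged value matches the synthetic truth. The sketch does not guard against negative merged rainfall, which a practical implementation would need to handle.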

Performance assessment
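After the coefficients in equation (15) are optimized using AICC, the merged rainfall at any location within the study area can be estimated. We divided the rainfall data from the 325 gauges in the Ganjiang River basin into two parts. One part was selected as the calibration data for the GWR merging model and the remainder was used for validation. The merged rainfall was then compared with the validation data, and two performance indices, the mean absolute error (MAE) and the spatial correlation coefficient (CC), were calculated. For one day, the two indices are calculated as follows:
$$\mathrm{MAE} = \frac{1}{n_v} \sum_{i=1}^{n_v} \left| P_i^M - P_i^O \right| \qquad (16)$$
$$\mathrm{CC} = \frac{\sum_{i=1}^{n_v} \left(P_i^M - \bar{P}^M\right)\left(P_i^O - \bar{P}^O\right)}{\sqrt{\sum_{i=1}^{n_v} \left(P_i^M - \bar{P}^M\right)^2 \sum_{i=1}^{n_v} \left(P_i^O - \bar{P}^O\right)^2}} \qquad (17)$$
where $n_v$ is the number of observations used for validation, $P_i^M$ and $P_i^O$ are the merged and observed rainfall at validation location i, and $\bar{P}^M$ and $\bar{P}^O$ are their average values over the validation locations.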
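The two daily indices can be computed as in the sketch below, which uses the standard definitions of the mean absolute error and the Pearson correlation over the validation gauges. The function names and the example values are illustrative and are not taken from the study.

```python
import math


def mae(estimated, observed):
    """Mean absolute error between estimated and observed rainfall (mm)."""
    return sum(abs(e - o) for e, o in zip(estimated, observed)) / len(observed)


def spatial_cc(estimated, observed):
    """Pearson correlation across validation locations for one day."""
    mean_e = sum(estimated) / len(estimated)
    mean_o = sum(observed) / len(observed)
    cov = sum((e - mean_e) * (o - mean_o) for e, o in zip(estimated, observed))
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov / math.sqrt(var_e * var_o)


# Hypothetical daily rainfall (mm) at four validation gauges.
estimated = [1.0, 2.0, 3.0, 4.0]
observed = [2.0, 4.0, 6.0, 8.0]
print(mae(estimated, observed), spatial_cc(estimated, observed))  # 2.5 1.0
```

The example shows why both indices are needed: the estimate reproduces the spatial pattern perfectly (CC = 1) while still underestimating every gauge (MAE = 2.5 mm). On a completely dry day the variances are zero and CC is undefined, so such days need separate handling.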
To explore the accuracy gained by merging surface measurements with CMORPH rainfall relative to traditional interpolation using only gauge measurements, two kinds of daily rainfall fields were generated. The first was generated by the GWR-based merging scheme using both CMORPH and gauge rainfall as data sources, whereas the second was generated by GWR interpolation using only the same gauge rainfall (see equation (18)). Here, we use GWR-M and GWR-I to denote the two rainfall field construction methods, respectively. MAE and CC were then calculated for GWR-M and GWR-I for each set of calibration gauges.
To evaluate the performance improvement gained by GWR-M relative to GWR-I, we further calculate the relative differences in MAE and CC between the two methods, denoted RMAE and RCC, where MAE_M and MAE_I are the MAE for GWR-M and GWR-I, respectively, and similarly for CC_M and CC_I. When RMAE and RCC are positive, the error magnitude of the rainfall field produced by the merging scheme is lower than that of the interpolated field and its spatial structure is better reproduced.
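One way to compute such relative gains is sketched below. The exact definitions used in the study are not reproduced here; the sketch assumes percentage changes relative to GWR-I, signed so that positive values favour GWR-M, and the input values are hypothetical rather than taken from Table 1.

```python
def relative_gain(mae_m, mae_i, cc_m, cc_i):
    """Relative improvement (%) of GWR-M over GWR-I.

    Assumed definitions: RMAE = (MAE_I - MAE_M) / MAE_I * 100 and
    RCC = (CC_M - CC_I) / CC_I * 100, so positive values mean the
    merged field has a smaller error and a higher spatial correlation.
    """
    r_mae = (mae_i - mae_m) / mae_i * 100.0
    r_cc = (cc_m - cc_i) / cc_i * 100.0
    return r_mae, r_cc


# Hypothetical indices: merging lowers MAE from 4.0 to 3.0 mm and raises CC from 0.5 to 0.75.
r_mae, r_cc = relative_gain(mae_m=3.0, mae_i=4.0, cc_m=0.75, cc_i=0.5)
print(r_mae, r_cc)  # 25.0 50.0
```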

RESULTS AND DISCUSSION
For the Ganjiang River basin, an experimental study on merging CMORPH and gauge rainfall was conducted. Using GWR-M and GWR-I, two sets of daily rainfall fields were generated, both at a spatial resolution of 1 km × 1 km. To investigate the accuracy gain of GWR-M relative to GWR-I under different gauge densities, we gradually changed the number of raingauges used for model calibration and calculated the accuracy indices using the validation data.
Table 1 shows the results for GWR-M and GWR-I. In Table 1, MAE and CC for the two kinds of daily rainfall fields are mean values over the 2557 days from 2003 to 2009; the raingauge relative density (denoted Rd) is the number of calibration raingauges divided by the total of 325 gauges over the study area. For example, an Rd of 2/3 means that daily rainfall data from 2/3 of the 325 gauges were selected for calibration while the other 1/3 were used for validation. Table 1 shows that both accuracy indices improve for GWR-M and GWR-I as Rd increases. This is easy to understand, because the effective information that surface measurements provide for the analysed rainfall fields is approximately proportional to the raingauge density. However, the rates of change of MAE and CC with Rd are not uniform. When Rd is less than 1/5, MAE and CC are rather sensitive to increases in Rd. When Rd exceeds 1/5, the opposite holds, and MAE and CC respond only weakly to further increases in Rd.
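The gauge-thinning experiment can be sketched as a split of the gauge list by a chosen calibration fraction. How the calibration gauges were selected in the study is not stated here, so simple random sampling is assumed; the function name `split_gauges`, the seed and the integer gauge identifiers are hypothetical.

```python
import random


def split_gauges(gauge_ids, calibration_fraction, seed=0):
    """Randomly split gauges into calibration and validation subsets.

    `calibration_fraction` is the share of gauges used for calibration.
    Random selection is an assumption; a spatially stratified selection
    would be a reasonable alternative.
    """
    rng = random.Random(seed)
    n_cal = round(len(gauge_ids) * calibration_fraction)
    calibration = set(rng.sample(gauge_ids, n_cal))
    validation = [g for g in gauge_ids if g not in calibration]
    return sorted(calibration), validation


gauge_ids = list(range(325))                     # the 325 gauges of the study area
for fraction in (2 / 3, 1 / 5):
    cal, val = split_gauges(gauge_ids, fraction)
    print(round(fraction, 2), len(cal), len(val))
```

With 325 gauges, a calibration fraction of 2/3 leaves 217 gauges for calibration and 108 for validation, and a fraction of 1/5 leaves 65 and 260.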

Fig. 1 Sketch map of the location of the study area and the distribution of the raingauges.

Table 1 Accuracy indices for daily rainfall fields generated by GWR-M and GWR-I during 2003-2009 in the Ganjiang River basin.