This vignette will demonstrate the main functionality of
remotePARTS
by working through a real remote sensing data
set. The example follows the more detailed development of the methods in
Ives et al. (2021, Remote Sensing of Environment, https://doi.org/10.1016/j.rse.2021.112678).
First, install remotePARTS
:
or
Then, load the package.
This vignette will use a few tidyverse
packages for
visualizing the data:
remotePARTS
ships with a data object. This data set
contains NDVI values derived from NASA’s MODIS satellite for the US
State of Alaska. The object is ndvi_AK10000
and, due to
package size limitations, this data set is a random sampling of 10,000
pixels from the full Alaska data set. This is just for the sake of this
vignette: remotePARTS
can easily handle the full map (and
much larger maps).
For this vignette, we will also create a smaller 3000 pixel subsample
ndvi_AK3000
for demonstrative purposes:
The ndvi_AK10000
object is a data.frame
with 37 columns: lng
and lat
are longitude and
latitude, respectively; AR_coef
and CLS_coef
are pre-calculated coefficient estimates of the temporal ndvi trends,
from pixel-level time series analyses (AR REML and conditional least
squares, respectively). These coefficient estimates have been
standardized by the mean ndvi value for the pixel over that time period;
land
is a factor representing land cover classes; and the
remaining 32 columns – of the form ndviYYYY
– contain the
NDVI values from 1982 to 2013. These data already have rare land classes
(those that occur in less than 2% of pixels) removed and contain no
missing values. remotePARTS
does not fully
support missing data, so it is recommended that missing observations are
removed in pre-processing.
str(ndvi_AK10000)
## 'data.frame': 10000 obs. of 37 variables:
## $ lng : num -151 -145 -144 -163 -145 ...
## $ lat : num 63.3 64.7 62.7 67.7 65.8 ...
## $ AR_coef : num 0.01071 0.003586 0.002488 0.000373 0.002505 ...
## $ CLS_coef: num 0.00529 0.001537 0.002276 0.000476 0.001514 ...
## $ land : Factor w/ 10 levels "Evergr needle",..: 6 7 7 6 7 7 6 6 7 7 ...
## $ ndvi1982: num 1.88 7.68 8.46 6.85 10.25 ...
## $ ndvi1983: num 1.74 8.14 7.99 6.97 10.35 ...
## $ ndvi1984: num 1.96 8.47 8.04 6.67 10.29 ...
## $ ndvi1985: num 1.74 7.72 7.96 6.9 10.02 ...
## $ ndvi1986: num 1.92 8.31 8.64 6.43 10.39 ...
## $ ndvi1987: num 2.04 8.89 8.4 7 10.47 ...
## $ ndvi1988: num 2.13 8.5 8.32 7.12 10.05 ...
## $ ndvi1989: num 1.93 8.99 8.79 6.24 10.16 ...
## $ ndvi1990: num 1.73 8.66 8.45 7.46 10.31 ...
## $ ndvi1991: num 1.99 8.74 8.34 6.86 10.28 ...
## $ ndvi1992: num 1.74 7.99 7.94 6.53 9.77 ...
## $ ndvi1993: num 1.98 8.62 7.93 6.97 10.35 ...
## $ ndvi1994: num 2.04 8.59 8.26 7.15 10.25 ...
## $ ndvi1995: num 1.9 8.1 8.55 7.08 10.18 ...
## $ ndvi1996: num 2.09 8.61 8.23 7.08 10.33 ...
## $ ndvi1997: num 1.81 8.77 8.99 7.24 10.23 ...
## $ ndvi1998: num 1.95 7.9 8.57 7.45 9.99 ...
## $ ndvi1999: num 1.75 8.14 8.61 6.75 10.04 ...
## $ ndvi2000: num 1.71 7.5 8.37 6.7 9.92 ...
## $ ndvi2001: num 1.57 7.76 8.29 6.66 9.69 ...
## $ ndvi2002: num 1.72 7.83 8.62 7.09 9.76 ...
## $ ndvi2003: num 1.95 7.94 8.35 6.87 9.76 ...
## $ ndvi2004: num 2.2 8.37 8.83 7.3 11.74 ...
## $ ndvi2005: num 2.18 8.45 8.63 7.18 11.62 ...
## $ ndvi2006: num 2.06 8.32 8.36 6.74 11.59 ...
## $ ndvi2007: num 2.39 8.53 8.63 7.47 11.91 ...
## $ ndvi2008: num 2.36 8.6 8.41 7 11.8 ...
## $ ndvi2009: num 2.35 10.02 9.27 6.59 11.62 ...
## $ ndvi2010: num 2.95 11.09 9.26 7.09 11.46 ...
## $ ndvi2011: num 2.5 8.9 8.94 7.05 12.15 ...
## $ ndvi2012: num 2.71 8.71 8.39 6.21 10.97 ...
## $ ndvi2013: num 2.54 9 8.95 6.93 10.38 ...
For this demonstration, we are interested in asking the following questions:
Is NDVI in Alaska increasing over time?
Are Alaska’s NDVI time trends associated with land cover classes? and
Do Alaska’s NDVI time trends differ with latitude?
The figure below shows a temporal cross-section of these data for 1982, 1998, and 2013.
tibble(ndvi_AK10000) |>
pivot_longer(cols = c("ndvi1982", "ndvi1998", "ndvi2013")) |>
ggplot(aes(x = lng, y = lat, col = value, fill = value)) +
geom_tile() +
facet_wrap(~ gsub("ndvi", "", name), ncol = 3) +
scale_color_viridis_c(option = "magma") +
scale_fill_viridis_c(option = "magma") +
labs(x = "Longitude", y = "Latitude", col = "ndvi", fill = "ndvi")
The following figure shows how Alaska’s three primary land cover classes are distributed.
ndvi_AK10000 |>
ggplot(aes(x = lng, y = lat, col = land, fill = land)) +
geom_tile() +
scale_color_viridis_d(direction = -1, end = .9) +
scale_fill_viridis_d(direction = -1, end = .9) +
labs(y = "Latitude", x = "Longitude", col = "Land cover",
fill = "Land cover")
Use help("ndvi_AK10000")
to see documentation for these
data.
When using remotePARTS
, the data are assumed to follow
the general stochastic process of the form
\[y(t) = X(t)\beta + \varepsilon(t)\] where
\(y(t)\) is a response variable of interest at time \(t\)
\(\beta\) is a vector of coefficients for the predictor variables \(X(t)\) on \(y(t)\)
\(\varepsilon(t)\) is an temporal autoregressive process: \(\varepsilon(t) = \rho \varepsilon(t - 1) + \delta(t)\)
at any time step, \(\delta(t)\) is spatially autocorrelated according to covariance matrix \(\Sigma\): \(\delta(t) \sim N(0, \Sigma)\)
there is no temporal dependence in \(\delta(t)\) (i.e., \(\delta(t_i)\) is independent of \(\delta(t_j)\) except when \(t_i\) = \(t_j\))
The first step in a typical remotePARTS
workflow is to
obtain pixel-level estimates of time-series coefficients. In our
example, we are interested in estimating the time trends in NDVI for
each pixel \(i\), represented by \(\beta_1\) in the regression model
\[y_i(t) = \beta_0 + \beta_1 t + \varepsilon_i(t)\]
where the random errors \(\varepsilon_i(t)\) follow an AR(1) process:
\[\varepsilon_i(t) = b\varepsilon_i(t - 1) + \delta_i(t)\] \[\delta_i(t) \sim N(0 , \sigma)\]
We will use the function fitAR_map()
to estimate \(\beta_1\), which fits pixel-level AR(1)
models to a data frame of pixels and estimates coefficients using
restricted maximum likelihood (REML). To do so, we must extract only our
NDVI columns as the matrix Y
. We’ll do this by matching all
column names containing “ndvi” and slicing the data.frame:
We also need a 2-column coordinate matrix coords
:
# coords <- as.matrix(ndvi_AK3000[, c("lng", "lat")])
coords <- ndvi_AK3000 |> select("lng", "lat") |> as.matrix()
Y
and coords
are then passed to
fitAR_map()
with default settings:
Coefficient estimates can be obtained from ARfit
with
coefficients()
. The first column is the estimate of \(\beta_0\), \(\hat{\beta_0}\), and the second is \(\hat{\beta_1}\).
coefficients(ARfit) |> head()
## (Intercept) t
## [1,] 1.703446 0.021923615
## [2,] 7.980618 0.030464243
## [3,] 8.146784 0.021130056
## [4,] 6.883592 0.002582853
## [5,] 10.081968 0.026466877
## [6,] 7.628119 -0.010395964
These time-series analyses calculate the time trend in the raw NDVI
data. In most situations it makes sense to ask if there are time trends
in the relative NDVI values, that is, changes in NDVI relative to the
mean value of NDVI in a pixel. Scaling the trend in NDVI relative to the
mean gives assessments of the proportional change in NDVI.
These trends in proportional NDVI are calculated be dividing \(\hat{\beta_1}\) by the mean. The values of
the trend coefficients are contained in ARfit$coefficients
,
and since the coefficients for the trend are in the column of the
coefficients matrix named t
, the scaling is performed
as
These scaled values of the time trend are then stored in the
ndvi_AK3000
data frame.
Below is an image of the estimated coefficients (pre-calculated and
scaled) for the full ndvi_AK10000
. From this, it appears
that northern latitudes may be greening faster than more southern
latitudes.
ndvi_AK10000 %>%
filter(AR_coef <= 0.06) |>
ggplot(aes(x = lng, y = lat, col = AR_coef, fill = AR_coef)) +
geom_tile() +
scale_color_viridis_c(option = "magma") +
scale_fill_viridis_c(option = "magma") +
guides(fill = "none") +
labs(
y = "Latitude", x = "Longitude", col = expression(beta[1]),
fill = expression(beta[1])
)
fitAR_map
and its conditional least-squares counterpart,
fitCLS_map
, are wrappers for the functions
fitAR
and fitCLS
which conduct individual time
series analysis. If the user wants, these can be applied on a
pixel-by-pixel basis to the data to allow greater flexibility. Both AR
REML and CLS methods account for temporal autocorrelation in the time
series. See function documentation for more details (i.e.,
help("fitAR_map")
).
Now that we’ve reduced the temporal component of each time series to a single value (i.e., estimates of \(\beta_1\)) while accounting for temporal autocorrelation, we can focus on the spatial component of our problem.
The first step is to calculate the distances among pixels as a
distance matrix D
. Here, we’ll calculate relative distances
with distm_scaled()
from our coordinate matrix.
distm_scaled()
scales distances across the spatial
domain so that the greatest distance between two pixels is 1. Because
distm_scaled()
uses geosphere::distm()
, it
treats coordinates as degrees longitude and latitude on a sphere and
calculates distances accordingly.
Next, we will estimate the expected correlation between the random
errors of our spatial response variable (estimates of \(\beta_1\)) based on their distances. To do
so, we need a spatial covariance function. In this example, we will use
an exponential covariance: \(V =
\exp\big(\frac{-D}{r}\big)\) where \(r\) is a parameter that dictates the range
of spatial autocorrelation. The function covar_exp()
implements exponential covariance.
The below figure is a semivariogram of covar_exp
for two
values of r
.
ggplot() +
xlim(0, 1) +
geom_function(
fun = covar_exp, args = list(r = 0.05), aes(col = "0.05"), linewidth = 1
) +
geom_function(
fun = covar_exp, args = list(r = 0.1), aes(col = "0.1"), linewidth = 1
) +
geom_function(
fun = covar_exp, args = list(r = 0.2), aes(col = "0.2"), linewidth = 1
) +
labs(x = "distance", y = "covar_exp(distance, r)", col = "r") +
theme_bw() +
theme(
legend.position = "inside", legend.position.inside = c(1, 1),
legend.justification = c(1, 1), legend.background = element_blank()
) +
scale_color_viridis_d(end = 0.95)
If we know the range parameter \(r\), we can calculate the covariance
V
from distances D
with
covar_exp()
:
V
represents the correlation among points if all
variation is accounted for. However, it is safest to assume that there
is some additional source of unexplained and unmeasured variance (i.e.,
a nugget \(\eta\)).
Here is a semivariogram, with and without a nugget component for
covar_exp
:
ggplot() +
xlim(0, 1) +
geom_function(
fun = function(x, n)(1-n)*covar_exp(x, 0.2),
args = list(n = 0), aes(col = "0"), linewidth = 1
) +
geom_function(
fun = function(x, n)(1-n)*covar_exp(x, 0.2),
args = list(n = 0.3), aes(col = "0.3"), linewidth = 1
) +
geom_function(
fun = function(x, n)(1-n)*covar_exp(x, 0.2),
args = list(n = 0.5), aes(col = "0.5"), linewidth = 1
) +
labs(x = "distance", y = "covar_exp(distance, r = 0.2)", col = expression(eta)) +
theme_bw() +
theme(
legend.position = "inside", legend.position.inside = c(1, 1),
legend.justification = c(1, 1), legend.background = element_blank()
) +
scale_color_viridis_d(end = 0.95)
We assume that the covariance structure among pixels is given by \(\Sigma = \eta I + (1-\eta)V\) where \(I\) is the identity matrix:
See help("covar_exp")
for a description of the all
covariance functions provided by remotePARTS
and for more
information regarding their use.
To test spatial hypotheses with remotePARTS
, we use a
generalized least-squares regression model (GLS):
\[\theta = X\alpha_{gls} + \gamma \] where
\(\theta\) is a vector of response values
\(\alpha_{gls}\) is a vector of the effects of predictor variables \(X\) on \(\theta\)
the error term \(\gamma\) is spatially autocorrelated according to \(\Sigma_\gamma\): \(\gamma \sim N(0, \Sigma_\gamma)\)
\(\theta\) will usually be a regression parameter. For example, if we’re interested in understanding trends in NDVI over time, we would use the pixel-level regression coefficient for the effect of time on NDVI (i.e., \(\theta = \hat\beta\)).
If the parameters that govern the spatial autocorrelation \(\Sigma_\gamma\) are known, a GLS can be fit
with fitGLS()
. Here, we will fit the GLS by providing (i) a
model formula, (ii) a data source, (iii) our V
matrix,
which was pre-calculated with a spatial parameter \(r = 0.1\) and (iv) a nugget of \(\eta = 0.2\).
The specific task in this examples is to estimate the effect of land
cover class on our time trend. Because land
is a factor,
we’ll also specify a no-intercept model to make interpretation
clearer.
The GLS fitting process requires an inversion of V
. This
means that even with only the 3000-pixel subset, it will take a few
minutes to finish the computations on most computers (but see the
ncores
and parallel
arguments).
fitGLS
adds the nugget to V
internally. If
we wanted to do this ourselves, we could pass the covariance matrix
Sigma
which already contains a nugget component and then
set the nugget
argument of fitGLS
to 0:
The estimates for our land class effects can be extracted with
coefficients()
.
coefficients(GLS.0)
## landShrubland landSavanna landGrassland
## 0.0008646565 0.0002554862 0.0011294576
The full model fit object looks like this:
GLS.0
##
## Call:
## fitGLS(formula = AR_coef ~ 0 + land, data = ndvi_AK3000, V = V,
## nugget = nugget)
##
## t-tests:
## Est t.stat pval.t
## landShrubland 0.0008647 0.5065 0.6126
## landSavanna 0.0002555 0.1493 0.8813
## landGrassland 0.0011295 0.6648 0.5062
##
## F-test:
## model df_F SSE MSE logLik Fstat pval_F
## 1 AR_coef ~ 0 + land 2 0.1128 3.764e-05 12808 4.461 0.01163
## 2 AR_coef ~ 1 2997 0.1131 3.775e-05 12802 NA NA
Although none of the three land cover classes show a statistically significant time trend in (proportional) NDVI, the F-test is significant. This means that including the model including land cover explains the data better than than the intercept-only model: overall, land cover explains significant variation in temporal trends of NDVI.
In practice, we rarely know the values of the parameters that govern spatial autocorrelation (e.g., the range and nugget) in advance. Therefore, these parameters will need to be estimated for most data.
The spatial parameters of a covariance function (e.g.,
covar_exp
) can be estimated from residuals of pixel-level
time-series models (see Ives et al. RSE, 2021). Although we conducted
the time-series analyses as though each pixel was independent (with
fitAR_map()
), they are, in fact, dependent. Specifically,
the correlation of the residuals from the pixel-level analyses is
roughly proportional to the spatial autocorrelation of the residuals of
the spatial model, \(\gamma\), if all
of the variation in \(\gamma\) is due
to the spatiotemporal variation produced by \(\varepsilon_i(t)\). Therefore, we can
estimate the range parameter for V as \(V_{ij} \approx \text{cor}\big(\varepsilon_i(t),
\varepsilon_j(t)\big)\)
The function fitCor()
performs the estimation of spatial
parameters. We will pass to this function (i) the time-series residuals
for our map, extracted from the time-series analysis output object with
residuals()
, (ii) the coordinate matrix
coords
, (iii) the covariance function
covar_exp()
, and (iv) a list specifying that we should
start the search for the optimal range parameter at 0.1. For this
example, we will also specify fit.n = 3000
, which ensures
that all 3000 pixels are used to estimate spatial parameters.
corfit <- fitCor(
resids = residuals(ARfit), coords = coords, covar_FUN = "covar_exp",
start = list(range = 0.1), fit.n = 3000
)
(range.opt = corfit$spcor)
## range
## 0.1083729
By default, fitCor()
uses distm_scaled()
to
calculate distances from the coordinate matrix, but any function that
returns a distance matrix can be specified with the
distm_FUN
argument. It is important to scale the parameter
values appropriately, accounting for your distances. For example, if we
instead use distm_km()
to calculate distance in km instead
of relative distances, we would need to scale our starting range
parameter by the maximum distance in km of our map:
max.dist <- max(distm_km(coords))
corfit.km <- fitCor(
resids = residuals(ARfit), coords = coords, covar_FUN = "covar_exp",
start = list(range = max.dist*0.1), distm_FUN = "distm_km", fit.n = 3000
)
Depending on the covariance function used, not all parameters will
need scaling. For example, covar_exppow()
is an
exponential-power covariance function and takes a range and shape
parameter, but only the range parameter should scale with distance. See
?fitCor()
for more details.
After we’ve obtained our range parameter estimate, we can use it to
re-calculate the V
matrix:
Similar to finding the optimal spatial parameters, the nugget can be
estimated by selecting a nugget that maximizes the likelihood of the GLS
given the data. fitGLS()
will find this maximum-likelihood
nugget when nugget = NA
is specified. This type of
optimization requires fitting multiple GLS models, which means it will
be much slower than our call to fitGLS()
with a known
nugget.
In addition to our original arguments, we’ll also explicitly set
no.F = FALSE
so that F-tests are calculated. For the
F-tests, the default reduced model is the intercept-only model, although
it is also possible to specify alternative models as a formula in the
formula0
option.
GLS.opt <- fitGLS(
formula = AR_coef ~ 0 + land, data = ndvi_AK3000, V = V.opt,
nugget = NA, no.F = FALSE
)
(nug.opt = GLS.opt$nugget)
## [1] 0.1342314
coefficients(GLS.opt)
## landShrubland landSavanna landGrassland
## 0.0009201230 0.0003899361 0.0011467324
Let’s compare our GLS from earlier with this one with optimized parameters:
(mod_compare <- tribble(
~mod, ~range, ~nugget, ~logLik, ~MSE,
"GLS.0", r, GLS.0$nugget, GLS.0$logLik, GLS.0$MSE,
"GLS.opt", range.opt, GLS.opt$nugget, GLS.opt$logLik, GLS.opt$MSE
))
## # A tibble: 2 × 5
## mod range nugget logLik MSE
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 GLS.0 0.1 0.2 12808. 0.0000376
## 2 GLS.opt 0.108 0.134 12809. 0.0000500
In this example, logLik
for GLS.opt
is not
functionally different than logLik
for GLS.0
.
This indicates that using the values of range
= 0.1 and
nugget
= 0.2 gives a similar likelihood than the optimal
model when range
value is calculated from
covar_exp()
, GLS.opt
. The key difference
between these models is that the MSE is higher for GLS.opt
,
indicative of the increased uncertainty.
It is also possible to simultaneously estimate spatial parameters and
the nugget without using the time-series residuals. This is done by
finding the set of parameters (e.g., range
and
nugget
) that maximizes the likelihood of a GLS given the
data. This task is computationally slower than optimizing
nugget
alone with fitGLS()
and therefore will
take some time to run.
fitopt <- fitGLS_opt(
formula = AR_coef ~ 0 + land, data = ndvi_AK3000,
coords = ndvi_AK3000[, c("lng", "lat")],
covar_FUN = "covar_exp",
start = c(range = .1, nugget = .2),
method = "BFGS", # use BFGS algorightm (see ?stats::optim())
control = list(reltol = 1e-5) # lower the convergence tolerance (see ?stats::optim())
)
fitopt$opt$par
# range nugget
# 0.02497874 0.17914929
fitopt$GLS$logLik
# [1] 12824.77
fitopt$GLS$MSE
# [1] 2.475972e-05
Because fitGLS_opt()
does not require time series
residuals, it is possible to use fitGLS_opt()
for
statistical problems involving only spatial variables. In other words,
rather than \(\theta\) being limited to
spatially distributed trend coefficients, it can also be any spatial
variable.
When time-series residuals are available, we recommend estimating
spatial parameters with fitCor()
and fitGLS()
,
rather than fitGLS_opt()
. The former typically has better
statistical performance than the latter. See ?fitGLS_opt()
for more information about this function and its use.
Ultimately, the purpose of the tools provided by
remotePARTS
is map-level hypotheses testing. In this
example, we will test 3 hypotheses using 3 different GLS models.
If we want to test the hypothesis that “there was a trend in Alaska
NDVI
from 1982-2013”, we can fit an intercept-only GLS model:
(GLS.int <- fitGLS(
AR_coef ~ 1, data = ndvi_AK3000, V = V.opt, nugget = nug.opt, no.F = TRUE
))
##
## Call:
## fitGLS(formula = AR_coef ~ 1, data = ndvi_AK3000, V = V.opt,
## nugget = nug.opt, no.F = TRUE)
##
## t-tests:
## Est t.stat pval.t
## (Intercept) 0.000972 0.4564 0.6481
##
## Model statistics:
## SSE MSE logLik
## 1 0.1503 5.011e-05 12806
We can see from the t-test that the intercept is not statistically different from zero: there is no map-level temporal trend in NDVI across the entire data set.
If we want to test the hypothesis that “trends in Alaskan NDVI differ
by land- cover class”, we can use GLS.opt
from earlier:
GLS.opt <- fitGLS(
formula = AR_coef ~ 0 + land, data = ndvi_AK3000, V = V.opt,
nugget = NA, no.F = FALSE
)
GLS.opt
##
## Call:
## fitGLS(formula = AR_coef ~ 0 + land, data = ndvi_AK3000, V = V.opt,
## nugget = NA, no.F = FALSE)
##
## t-tests:
## Est t.stat pval.t
## landShrubland 0.0009201 0.4306 0.6668
## landSavanna 0.0003899 0.1822 0.8554
## landGrassland 0.0011467 0.5385 0.5903
##
## F-test:
## model df_F SSE MSE logLik Fstat pval_F
## 1 AR_coef ~ 0 + land 2 0.1499 5.003e-05 12809 3.263 0.03841
## 2 AR_coef ~ 1 2997 0.1503 5.014e-05 12805 NA NA
The t-tests show that the trend in NDVI, for all land cover classes, was not statistically different from zero: NDVI did not exhibit a trend in any land cover class. The F-test (ANOVA table), however, provides evidence that land class collectively explain some variation in NDVI trends.
Finally, to test the hypothesis that “temporal trends in NDVI differ with latitude”, we can regress the AR coefficient on latitude:
(GLS.lat <- fitGLS(
AR_coef ~ 1 + lat, data = ndvi_AK3000, V = V.opt, nugget = nug.opt,
no.F = FALSE
))
##
## Call:
## fitGLS(formula = AR_coef ~ 1 + lat, data = ndvi_AK3000, V = V.opt,
## nugget = nug.opt, no.F = FALSE)
##
## t-tests:
## Est t.stat pval.t
## (Intercept) -8.617e-04 -0.04362 0.9652
## lat 2.986e-05 0.09337 0.9256
##
## F-test:
## model df_F SSE MSE logLik Fstat pval_F
## 1 AR_coef ~ 1 + lat 1 0.1503 5.012e-05 12806 0.008719 0.9256
## 2 AR_coef ~ 1 2998 0.1503 5.012e-05 12806 NA NA
The t-tests show that temporal trends in NDVI did not differ with latitude. The p-value from the F-test is equivalent to that of the t-test p-value for effect of latitude because they are testing the same hypothesis.
We can see from these hypothesis tests that, at least among the 3000-pixel sub-sample of Alaska, the answer to all three questions that we posed is no: there is no statistical evidence for a greening trend in Alaska, nor strong differences among land cover classes or latitude.
Until now, we have limited our analyses to the 3000-pixel subset of
Alaska, ndvi_AK3000
. Calls to fitGLS()
involve
inverting V
, and the computational complexity scales with
\(N^3\) where \(N\) is the number of pixels in the map. We
have used the data set ndvi_AK3000
up to now because the
computation time for the analyses is reasonable. Even with only 3000
pixels, we must handle distance and covariance matrices that each
contain \(3,000 \times 3,000 =
9,000,000\) elements. This is approaching the upper size limit
for obtaining relatively fast results on most personal consumer
hardware.
By contrast, the covariance matrix for the larger
ndvi_AK10000
data set would have \(10,000 \times 10,000 = 100,000,000\)
elements which creates a computationally infeasible problem for a
typical computer.
For these reasons, fitGLS_paritition
may be the most
useful function in the remotePARTS
package. This function
can perform the GLS analysis on the full ndvi_AK10000
data
set. In fact, ndvi_AK10000
is quite small in comparison to
many remote sensing data sets, any of which could be analyzed with
fitGLS_partition()
.
fitGLS_parition()
conducts GLS analyses on partitions of
the data and then combines the results from the partitions to give
overall statistical results. Specifically, this process (1) breaks the
data into smaller and more manageable pieces (partitions), (2) conducts
GLS on each partition, (3) calculates cross-partition statistics from
pairs of partitions, and (4) summarizes the results with statistical
tests that account for correlations among partitions. We will use the
full ndvi_AK10000
data set to demonstrate
fitGLS_parition()
.
We have already performed the time-series analyses on the full data
set. These are in the AR_coef
column of
ndvi_AK10000
.
Step (1) is to divide pixels into partitions, which is done with the
function sample_parition()
. Passing the number of pixels in
our map and the argument partsize = 1500
will result in a
partition matrix with 1,500 rows and 6 columns. Columns of the resulting
partition matrix pm
each contain a random sample of 1,500
pixels. Each of these 6 samples (partitions) are non-overlapping,
containing no repeats. Setting npart = NA
will
automatically give the maximum number of partitions possible (i.e.,
6).
pm <- sample_partitions(npix = nrow(ndvi_AK10000), partsize = 1500, npart = NA)
dim(pm)
## [1] 1500 6
Once we have our partition matrix, fitGLS_parititon()
performs steps (2)-(4) of the analyses. The input is similar to
fitGLS
. For this example, we specify (i) a formula, (ii)
the data as a data.frame (AK10000
), (iii) the partition
matrix (pm
), (iv) the covariance function
(covar_FUN
), (v) a list of spatial parameters
(covar.pars
) including our optimized range parameter, and a
nugget
. If nugget
is specified, this value is
used for the calculations, while if nugget
=
NA
(the default) it is estimated for each partition
separately.
Although the compute time is much faster than if we needed to invert
the full covariance matrix V
, this example still takes a
long time to fit (but scales roughly linearly with number of
partitions). Therefore, we have saved the output of this code as an R
object partGLS.ndviAK
so that you can look at its output
without having to execute the function.
The model was fit with this code:
partGLS_ndviAK <- fitGLS_partition(
formula = AR_coef ~ 0 + land, data = AK10000,
partmat = pm, covar_FUN = "covar_exp",
covar.pars = list(range = range.opt),
nugget = nug.opt, ncores = 4
)
and can be loaded with
Here are the t-tests, that show that land cover class does not significantly affect NDVI trend:
partGLS_ndviAK$overall$t.test
## $p.t
## Est SE t.stat pval.t
## landShrubland 0.0013094359 0.001899589 0.6893260 0.4906359
## landSavanna 0.0004379666 0.001902064 0.2302586 0.8178960
## landGrassland 0.0016210956 0.001894683 0.8556027 0.3922404
##
## $covar_coef
## [,1] [,2] [,3]
## [1,] 3.623978e-06 3.569499e-06 3.561642e-06
## [2,] 3.566697e-06 3.633439e-06 3.537596e-06
## [3,] 3.562415e-06 3.540957e-06 3.605232e-06
It is highly recommended that users read the full documentation
(help("fitGLS_parition")
).
Here is the p-value for the chisqr test of the partitioned GLS
This again, indicates that the model which includes land cover
classes better explains variation in NDVI trends than the intercept-only
model. The p-value is much lower than the p-value from the F-test
conducted by fitGLS()
. This is likely due to outliers in
the data, which should be removed before conducting any real analysis.
One simple way to filter potential trend outliers would be to remove any
pixels whose time-series coefficient standard errors are unusually large
(e.g., SE > 4*mean(SE)
).
Our partitioned GLS has returned the same conclusions as our previous GLS analysis. No map-level trend in NDVI was found within any of the land cover classes.
This was only one of our three hypotheses tested earlier. The
remaining two can easily be tested with fitGLS_partition()
,
as they were with fitGLS
, by changing the formula argument
accordingly.