Another important goal is to detect stochastic dependence between points in a point pattern.
The terms inhibited and clustered are analogous, respectively, to “negatively correlated” and “positively correlated”. They do not imply any particular kind of stochastic dependence and they do not explain how the pattern was generated.
Dependence between points is sometimes called “interaction”, but this term is dangerous because it suggests a particular mechanism for the dependence.
The (Ripley) \(K\)-function assumes the point process has constant intensity \(\lambda\). It is defined so that, for a typical random point in the point process, the number of other random points lying closer than a distance \(r\) has expected value \(\lambda \, K(r)\).
For a completely random (homogeneous Poisson) process, \(K(r) = \pi r^2\). An inhibited process will usually have \(K(r) < \pi r^2\), while a clustered process will have \(K(r) > \pi r^2\), for appropriate values of \(r\).
An estimate of the \(K\)-function can be computed for a point pattern dataset X by typing K <- Kest(X).
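For instance, a quick sanity check on a simulated completely random pattern (an illustrative sketch; the pattern Xsim here is synthetic, not a dataset from the text):
library(spatstat)
set.seed(1)
Xsim <- rpoispp(100)   # homogeneous Poisson process, intensity 100, unit square
K <- Kest(Xsim)        # empirical K-function with several edge corrections
plot(K)                # estimates should stay close to the theoretical pi*r^2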
The pair correlation function \(g(r)\) can be defined as \(g(r) = K^\prime(r)/(2\pi r)\) where \(K^\prime(r)\) is the derivative of the \(K\) function. The pair correlation function can be interpreted as the probability that two points in the point process will be separated by a distance equal to \(r\), normalised by the corresponding probability for a completely random (Poisson) process.
For a completely random (homogeneous Poisson) process, \(g(r) = 1\). An inhibited process will usually have \(g(r) < 1\), while a clustered process will have \(g(r) > 1\), for appropriate values of \(r\).
An estimate of the pair correlation function can be computed for a point pattern dataset X by typing g <- pcf(X).
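And correspondingly for the same simulated pattern Xsim from the sketch above:
g <- pcf(Xsim)   # kernel estimate of the pair correlation function
plot(g)          # should fluctuate around the Poisson value g(r) = 1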
As an example, consider the redwood seedlings data, which appear clustered:
plot(redwood)
A cluster process is generated in two stages: first, a pattern of unobserved “parent” points is generated; then each parent gives rise to a random cluster of “offspring” points, and the offspring together form the observed pattern. In a Thomas cluster process, the parents form a homogeneous Poisson process with intensity \(\kappa\), each parent has a Poisson(\(\mu\)) number of offspring, and the offspring are displaced from their parent by independent isotropic Gaussian displacements with standard deviation \(\sigma\).
Here are simulated realisations of a Thomas process:
plot(rThomas(kappa=10, sigma=0.2, mu=5, nsim=12),
main="", main.panel="")
Maximum likelihood fitting of cluster processes is difficult because the likelihood is quite complicated. However, the \(K\)-function of such cluster processes is known analytically, so the model can be fitted by the method of moments (matching the model’s theoretical \(K\)-function to the empirical \(K\)-function of the data). This is performed by the spatstat function kppm.
fitT <- kppm(redwood ~ 1, "Thomas")
fitT
## Stationary cluster point process model
## Fitted to point pattern dataset 'redwood'
## Fitted by minimum contrast
## Summary statistic: K-function
##
## Uniform intensity: 62
##
## Cluster model: Thomas process
## Fitted cluster parameters:
## kappa scale
## 23.54856848 0.04705148
## Mean cluster size: 2.632856 points
plot(simulate(fitT, nsim=12))
## Generating 12 simulations... 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12.
## Done.
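As a quick check of the minimum contrast fit, the fitted theoretical \(K\)-function can be compared with the empirical one (a sketch assuming the plot method for kppm objects, where what = "statistic" selects the summary function used in the fit):
plot(fitT, what = "statistic")   # fitted vs empirical K-function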
kppm(redwood ~ x+y, "Thomas")
## Inhomogeneous cluster point process model
## Fitted to point pattern dataset 'redwood'
## Fitted by minimum contrast
## Summary statistic: inhomogeneous K-function
##
## Log intensity: ~x + y
##
## Fitted trend coefficients:
## (Intercept) x y
## 3.95144951 0.29770284 -0.04607577
##
## Cluster model: Thomas process
## Fitted cluster parameters:
## kappa scale
## 22.97487360 0.04624891
## Mean cluster size: [pixel image]
A Cox process is formed in two steps: first, a random intensity function \(\Lambda(u)\) is generated; then, given the realisation of \(\Lambda\), a Poisson process with intensity function \(\Lambda\) is generated. In a log-Gaussian Cox process, the random function \(\Lambda(u)\) is such that \(\log \Lambda(u)\) is a Gaussian random function.
These models can be fitted by the same technique:
kppm(redwood ~ x+y, "LGCP")
## Inhomogeneous Cox point process model
## Fitted to point pattern dataset 'redwood'
## Fitted by minimum contrast
## Summary statistic: inhomogeneous K-function
##
## Log intensity: ~x + y
##
## Fitted trend coefficients:
## (Intercept) x y
## 3.95144951 0.29770284 -0.04607577
##
## Cox model: log-Gaussian Cox process
## Covariance model: exponential
## Fitted covariance parameters:
## var scale
## 1.09373417 0.09795447
## Fitted mean of log of random intensity: [pixel image]
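A fitted log-Gaussian Cox model can be simulated in the same way as the Thomas model. A minimal sketch using a stationary fit (depending on the spatstat version, simulating LGCP models may require additional packages):
fitL <- kppm(redwood ~ 1, "LGCP")   # stationary log-Gaussian Cox process
plot(simulate(fitL, nsim = 4))      # realisations should mimic the clustering in redwood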
Example of a Poisson model for the bei data:
bei_model <- ppm(bei ~ grad, data = bei.extra)
coef(summary(bei_model))
## Estimate S.E. CI95.lo CI95.hi Ztest Zval
## (Intercept) -5.391053 0.03001787 -5.449887 -5.332219 *** -179.5948
## grad 5.026710 0.24534296 4.545847 5.507573 *** 20.4885
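To see what this model says about the intensity, the fitted intensity surface can be plotted (a quick sketch using predict.ppm):
plot(predict(bei_model))   # fitted intensity as a pixel image, driven by grad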
Are the points clustered more than expected from the log-linear dependence on grad alone? It looks like it from the pair correlation function:
pcf_bei_model <- pcfinhom(bei, lambda = bei_model)
plot(pcf_bei_model)
And also from the \(K\)-function:
Kbei_model <- Kinhom(bei, lambda = bei_model, correction = "border")
plot(Kbei_model)
Often the centred \(L\)-function \(L(r) - r = \sqrt{K(r)/\pi} - r\) is used, since it is identically zero for a Poisson process:
plot(Kbei_model, sqrt(./pi)-r ~ r)
Monte Carlo significance envelopes for a pointwise test:
e <- envelope(bei_model, fun = Linhom, correction = "border", nsim = 99,
savefuns = TRUE, savepatterns = TRUE)
## Generating 99 simulated realisations of fitted Poisson model ...
## 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
## 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80,
## 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99.
##
## Done.
plot(e, .-r~r, ylim = c(-5, 20))
Above, 99 simulations from the fitted Poisson model are generated, and then Linhom() is calculated for each of them and for the data, using a leave-one-out kernel density estimate of lambda. It may be more appropriate to use the intensity model (log-linear dependence on grad), but this is much slower. The command (not executed, and without saving intermediate data) is:
envelope(bei_model, fun = Linhom, correction = "border", nsim = 99, lambda = bei_model)
It may be tempting to use the faster:
lam <- predict(bei_model, at = "points")
envelope(bei_model, fun = Linhom, correction = "border", nsim = 99, lambda = lam)
But this is wrong: it violates the symmetry principle that data and simulations must be treated identically, since here lambda is estimated once from the data and reused for all the simulated patterns. There is still another worry about the composite hypothesis, where we have used the data to estimate the parameters of the model, but that is yet another story…
Adding global = TRUE and setting ginterval gives us simple fixed-width global envelopes:
e_global <- envelope(e, global = TRUE, ginterval = c(0,125))
plot(e_global, .-r~r)
More sophisticated global envelopes are available in the R package GET (see the references therein for details).
library(GET)
These envelopes are calculated from a global ranking of the data curve within the set of curves computed from the simulations and the data. Each curve receives a global rank based on the most extreme pointwise rank it achieves. Typically many curves share the same rank, so ties need to be broken, which can be done in various ways; because of the ties, many simulations are recommended (e.g. 1999). The package GET can reuse envelope objects from spatstat if the intermediate calculations (simulations and function values) have been saved:
bei_test <- global_envelope_test(e)
plot(bei_test)
The test gives both a p-value and a graphical interpretation: significance envelopes at the \(\alpha\)-level. The test is significant at the \(\alpha\)-level if and only if the data curve exits these envelopes somewhere. Values leading to rejection of the null hypothesis (at the default \(\alpha = 0.05\) level) are marked in red.
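The numerical p-value can also be extracted from the test object (assuming the GET convention of storing it as an attribute):
attr(bei_test, "p")   # Monte Carlo p-value of the global envelope test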
An example from the GET package vignettes, testing a simple hypothesis (complete spatial randomness for the spruces data):
X <- unmark(spruces)
plot(X, main = "")
nsim <- 1999 # Number of simulations
env <- envelope(X, fun = "Lest", nsim = nsim,
savefuns = TRUE, # save the functions
correction = "translate", # edge correction for L
transform = expression(.-r), # centering
simulate = expression(runifpoint(ex = X)), # Simulate CSR
verbose = FALSE)
res <- global_envelope_test(env, type = "erl")
plot(res)
The test can also be restricted to a chosen range of distances by cropping the curves before testing:
cset <- crop_curves(env, r_min = 1, r_max = 7)
res <- global_envelope_test(cset, type = "erl")
plot(res)