Dependence between points

Another important goal is to detect stochastic dependence between points in a point pattern.

The terms inhibited and clustered are analogous, respectively, to “negatively correlated” and “positively correlated”. They do not imply any particular kind of stochastic dependence and they do not explain how the pattern was generated.

Dependence between points is sometimes called “interaction”, but this term is dangerous because it suggests a particular mechanism for the dependence.

Exploratory tools

K-function

The (Ripley) \(K\)-function assumes the point process has constant intensity \(\lambda\). It is defined so that, for a typical random point in the point process, the number of other random points lying closer than a distance \(r\) has expected value \(\lambda \, K(r)\).

For a completely random (homogeneous Poisson) process, \(K(r) = \pi r^2\). An inhibited process will usually have \(K(r) < \pi r^2\), while a clustered process will have \(K(r) > \pi r^2\), for appropriate values of \(r\).

An estimate of the \(K\) function can be computed for a point pattern dataset X by typing K <- Kest(X).
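For example, applying Kest to the clustered redwood pattern used later in these notes (spatstat assumed to be loaded):

```r
library(spatstat)   # provides Kest() and the redwood data

K <- Kest(redwood)  # estimates under several edge corrections
plot(K)             # compare with the theoretical Poisson curve pi * r^2
```

For these data the estimates lie above \(\pi r^2\) at small distances, consistent with clustering.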

Pair correlation function

The pair correlation function \(g(r)\) can be defined as \(g(r) = K^\prime(r)/(2\pi r)\) where \(K^\prime(r)\) is the derivative of the \(K\) function. The pair correlation function can be interpreted as the probability that two points in the point process will be separated by a distance equal to \(r\), normalised by the corresponding probability for a completely random (Poisson) process.

For a completely random (homogeneous Poisson) process, \(g(r) = 1\). An inhibited process will usually have \(g(r) < 1\), while a clustered process will have \(g(r) > 1\), for appropriate values of \(r\).

An estimate of the pair correlation function can be computed for a point pattern dataset X by typing g <- pcf(X).
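Similarly, for the redwood data (spatstat assumed to be loaded):

```r
library(spatstat)

g <- pcf(redwood)
plot(g)   # values above 1 at small distances indicate clustering
```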

Explicit models for clustered data

The redwood data, a classic example of a clustered point pattern:

plot(redwood)

Cluster processes

A cluster process is generated in three steps:

  1. a point pattern of “parent” points \(X\) is generated;
  2. around each parent point \(x_i\), a finite pattern of “offspring” points \(y_{i1}, \ldots, y_{in_i}\) is generated;
  3. the offspring of all parents are collected together into a single point pattern \(Y\).

In a Thomas cluster process,

  1. the parents are a homogeneous Poisson process with intensity \(\kappa\);
  2. each parent has a Poisson number (with mean \(\mu\)) of offspring, which are displaced from the parent by independent Gaussian vectors with standard deviation \(\sigma\).
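The three-step construction can be sketched directly in base R. This is a minimal illustration on the unit square, with arbitrary parameter values and no treatment of edge effects (offspring may fall outside the window):

```r
# Minimal base-R sketch of a Thomas process on the unit square
set.seed(42)
kappa <- 10   # parent intensity (unit square: expected number of parents)
mu    <- 5    # mean number of offspring per parent
sigma <- 0.02 # standard deviation of Gaussian displacements

n_parents <- rpois(1, kappa)                    # step 1: Poisson parents
px <- runif(n_parents); py <- runif(n_parents)
n_off <- rpois(n_parents, mu)                   # step 2: offspring counts
ox <- rnorm(sum(n_off), mean = rep(px, n_off), sd = sigma)
oy <- rnorm(sum(n_off), mean = rep(py, n_off), sd = sigma)
# step 3: only the offspring form the observed pattern
plot(ox, oy, asp = 1, xlim = c(0, 1), ylim = c(0, 1))
```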

Here are simulated realisations of a Thomas process:

plot(rThomas(kappa=10, sigma=0.2, mu=5, nsim=12),
     main="", main.panel="")

Maximum likelihood fitting of cluster processes is difficult because the likelihood is quite complicated. However, the \(K\)-function of such cluster processes is known analytically, so the model can be fitted by the method of moments (also called minimum contrast): the model’s theoretical \(K\)-function is matched to the empirical \(K\)-function of the data. This is performed by the spatstat function kppm.

fitT <- kppm(redwood ~ 1, "Thomas")
fitT
## Stationary cluster point process model
## Fitted to point pattern dataset 'redwood'
## Fitted by minimum contrast
##  Summary statistic: K-function
## 
## Uniform intensity:   62
## 
## Cluster model: Thomas process
## Fitted cluster parameters:
##       kappa       scale 
## 23.54856848  0.04705148 
## Mean cluster size:  2.632856 points
plot(simulate(fitT, nsim=12))
## Generating 12 simulations... 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11,  12.
## Done.

kppm(redwood ~ x+y, "Thomas")
## Inhomogeneous cluster point process model
## Fitted to point pattern dataset 'redwood'
## Fitted by minimum contrast
##  Summary statistic: inhomogeneous K-function
## 
## Log intensity:  ~x + y
## 
## Fitted trend coefficients:
## (Intercept)           x           y 
##  3.95144951  0.29770284 -0.04607577 
## 
## Cluster model: Thomas process
## Fitted cluster parameters:
##       kappa       scale 
## 22.97487360  0.04624891 
## Mean cluster size:  [pixel image]

Cox processes

A Cox process is formed in two steps:

  1. a random function \(\Lambda(u)\) is generated;
  2. given the realisation of the random function, a Poisson point process is generated with intensity function \(\Lambda(u)\).

In a log-Gaussian Cox process, the random function \(\Lambda(u)\) is such that \(\log \Lambda(u)\) is a Gaussian random function.
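The two-step construction can be sketched in base R with a deliberately simple random intensity: here a log-linear surface with random coefficients stands in for a full Gaussian random function, and the parameter values are arbitrary:

```r
# Minimal base-R sketch of a Cox process on the unit square:
# step 1 generates a random intensity function, step 2 generates a
# Poisson process with that intensity by thinning a dominating
# homogeneous Poisson process (rejection sampling)
set.seed(7)
a <- rnorm(1, mean = 4, sd = 0.5)   # random log-intensity coefficients
b <- rnorm(1, mean = 0, sd = 1)
Lambda <- function(x, y) exp(a + b * x)

lam_max <- max(Lambda(c(0, 1), 0))  # Lambda is monotone in x on [0, 1]
n <- rpois(1, lam_max)              # dominating homogeneous process
x <- runif(n); y <- runif(n)
keep <- runif(n) < Lambda(x, y) / lam_max   # thinning
plot(x[keep], y[keep], asp = 1, xlim = c(0, 1), ylim = c(0, 1))
```

The thinning in step 2 is exact because \(\Lambda(u)\) is bounded by lam_max on the window.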

These models can be fitted by the same technique:

kppm(redwood ~ x+y, "LGCP")
## Inhomogeneous Cox point process model
## Fitted to point pattern dataset 'redwood'
## Fitted by minimum contrast
##  Summary statistic: inhomogeneous K-function
## 
## Log intensity:  ~x + y
## 
## Fitted trend coefficients:
## (Intercept)           x           y 
##  3.95144951  0.29770284 -0.04607577 
## 
## Cox model: log-Gaussian Cox process
##  Covariance model: exponential
## Fitted covariance parameters:
##        var      scale 
## 1.09373417 0.09795447 
## Fitted mean of log of random intensity: [pixel image]

Checking for dependence between points

An example of a Poisson model for the bei data:

bei_model <- ppm(bei ~ grad, data = bei.extra)
coef(summary(bei_model))
##              Estimate       S.E.   CI95.lo   CI95.hi Ztest      Zval
## (Intercept) -5.391053 0.03001787 -5.449887 -5.332219   *** -179.5948
## grad         5.026710 0.24534296  4.545847  5.507573   ***   20.4885

Do the points cluster more than can be expected from the log-linear dependence on grad alone?

It looks like it from the pair correlation function:

pcf_bei_model <- pcfinhom(bei, lambda = bei_model)
plot(pcf_bei_model)

And also from the \(K\)-function:

Kbei_model <- Kinhom(bei, lambda = bei_model, correction = "border")
plot(Kbei_model)

Often the centred \(L\)-function, \(L(r) = \sqrt{K(r)/\pi}\), is used instead; for a Poisson process \(L(r) - r = 0\):

plot(Kbei_model, sqrt(./pi)-r ~ r)

Pointwise significance envelopes

Monte Carlo significance envelopes for a pointwise test:

e <- envelope(bei_model, fun = Linhom, correction = "border", nsim = 99,
              savefuns = TRUE, savepatterns = TRUE)
## Generating 99 simulated realisations of fitted Poisson model  ...
## 1, 2, 3, ..., 99.
## 
## Done.
plot(e, .-r~r, ylim = c(-5, 20))

Above, 99 simulations from the fitted Poisson model are generated, and Linhom() is then calculated for each simulation and for the data, with the intensity lambda estimated by leave-one-out kernel density estimation. It may be more appropriate to use the fitted intensity model (log-linear dependence on grad), but this is much slower. The command (not executed here, and without saving intermediate results) is:

envelope(bei_model, fun = Linhom, correction = "border", nsim = 99, lambda = bei_model)

It may be tempting to use the faster:

lam <- predict(bei_model, at = "points")
envelope(bei_model, fun = Linhom, correction = "border", nsim = 99, lambda = lam)

But this is wrong: it violates the symmetry principle, which requires the data and the simulations to be treated identically (here lambda would be estimated from the data but held fixed for the simulated patterns). There is still another worry about the composite hypothesis, since we have used the data to estimate the parameters of the model, but that is yet another story…

Simple global envelopes

Adding global = TRUE and setting ginterval gives simple global fixed-width envelopes:

e_global <- envelope(e, global = TRUE, ginterval = c(0,125))
plot(e_global, .-r~r)

Global rank envelopes

More sophisticated global envelopes are available in the R package GET (see references therein for details).

library(GET)

These envelopes are calculated from a global ranking of the data curve among the curves calculated from the simulations and the data. Each curve receives a global rank based on the most extreme pointwise rank it achieves. Since many curves typically share the same rank, ties need to be broken, which can be done in various ways; because of the ties, a large number of simulations (e.g. 1999) is recommended. The package GET can reuse envelope objects from spatstat if the intermediate calculations (simulations and function values) have been saved:

bei_test <- global_envelope_test(e)
plot(bei_test)

The test gives both a p-value and a graphical interpretation: significance envelopes at the \(\alpha\)-level. The data curve exits these envelopes if and only if the test is significant at level \(\alpha\). Values leading to rejection of the null hypothesis (at the default \(\alpha=0.05\) level) are marked in red.

An example from the package vignettes, testing a simple hypothesis (complete spatial randomness for the spruces data):

X <- unmark(spruces)
plot(X, main = "")

nsim <- 1999 # Number of simulations
env <- envelope(X, fun = "Lest", nsim = nsim,
                savefuns = TRUE, # save the functions
                correction = "translate", # edge correction for L
                transform = expression(.-r), # centering
                simulate = expression(runifpoint(ex = X)), # Simulate CSR
                verbose = FALSE)
res <- global_envelope_test(env, type = "erl")
plot(res)

The curves can also be cropped to a restricted interval of distances before testing:

cset <- crop_curves(env, r_min = 1, r_max = 7)
res <- global_envelope_test(cset, type = "erl")
plot(res)