Introduction

This tutorial is for the 2021 OpenGeoHub summer school. It is written with very brief explanations of the content. Please watch the recorded tutorial presentation for detailed explanations of the content.

To follow along you need spatstat installed along with its dependencies. You can install the development version like this (but the CRAN version is fine for this tutorial):

options(repos = c(
    spatstat = 'https://spatstat.r-universe.dev',
    CRAN = 'https://cloud.r-project.org'))
install.packages("spatstat", dependencies = TRUE)

Now load spatstat and the needed sub-packages:

library(spatstat)
## Loading required package: spatstat.data
## Loading required package: spatstat.geom
## spatstat.geom 2.2-2.002
## Loading required package: spatstat.core
## Loading required package: nlme
## Loading required package: rpart
## spatstat.core 2.3-0.003
## Loading required package: spatstat.linnet
## spatstat.linnet 2.3-0
## 
## spatstat 2.2-0.002       (nickname: 'That's not important right now') 
## For an introduction to spatstat, type 'beginner'

Point pattern data

Planar coordinates (x,y) are assumed almost everywhere in spatstat, so geographic (lng,lat) coordinates must be projected. You can use sf and https://projectionwizard.org for this (in rare cases sp and maptools may be useful).

As a first example consider the built-in Chorley-Ribble cancer data:

plot(chorley, cols = c("red", "blue"))

It is stored as planar point pattern ppp with many S3 methods available for plotting, summarising, subsetting, etc.

class(chorley)
## [1] "ppp"
methods(class = "ppp")
##   [1] [                     [<-                   affine               
##   [4] anyDuplicated         as.data.frame         as.im                
##   [7] as.layered            as.owin               as.ppp               
##  [10] auc                   berman.test           boundingbox          
##  [13] boundingcentre        boundingcircle        boundingradius       
##  [16] by                    cdf.test              circumradius         
##  [19] closepairs            closing               connected            
##  [22] coords                coords<-              crossdist            
##  [25] crosspairs            cut                   density              
##  [28] densityAdaptiveKernel densityfun            densityHeat          
##  [31] densityVoronoi        dilation              distfun              
##  [34] distmap               domain                duplicated           
##  [37] edit                  envelope              erosion              
##  [40] fardist               flipxy                Frame<-              
##  [43] has.close             head                  identify             
##  [46] intensity             is.connected          is.empty             
##  [49] is.marked             is.multitype          kppm                 
##  [52] lurking               markformat            marks                
##  [55] marks<-               multiplicity          nnclean              
##  [58] nncross               nndensity             nndist               
##  [61] nnfun                 nnwhich               nobjects             
##  [64] npoints               opening               pairdist             
##  [67] pcf                   periodify             pixellate            
##  [70] plot                  ppm                   print                
##  [73] quadrat.test          quadratcount          quantess             
##  [76] rebound               relevel               relrisk              
##  [79] rescale               rhohat                roc                  
##  [82] rotate                round                 rounding             
##  [85] rshift                scalardilate          scanmeasure          
##  [88] sdr                   segregation.test      sharpen              
##  [91] shift                 Smooth                Smoothfun            
##  [94] split                 split<-               subset               
##  [97] summary               superimpose           tail                 
## [100] text                  unique                uniquemap            
## [103] unitname              unitname<-            unmark               
## [106] unstack               Window                Window<-             
## see '?methods' for accessing help and source code
summary(chorley)
## Marked planar point pattern:  1036 points
## Average intensity 3.287268 points per square km
## 
## *Pattern contains duplicated points*
## 
## Coordinates are given to 1 decimal place
## i.e. rounded to the nearest multiple of 0.1 km
## 
## Multitype:
##        frequency proportion intensity
## larynx        58 0.05598456 0.1840363
## lung         978 0.94401540 3.1032320
## 
## Window: polygonal boundary
## single connected closed polygon with 131 vertices
## enclosing rectangle: [343.45, 366.45] x [410.41, 431.79] km
##                      (23 x 21.38 km)
## Window area = 315.155 square km
## Unit of length: 1 km
## Fraction of frame area: 0.641

The spatial region where the points were observed (observation window, owin) is part of the ppp object. There are many polygonal operations available to manipulate these (handled by polyclip package). E.g:

W <- Window(chorley)
Wplus <- dilation(W, 2)
Wminus <- erosion(W, 2)
plot(Wplus, main = "Dilation and erosion")
plot(W, add = TRUE)
plot(Wminus, add = TRUE)

Case study: Murchison gold deposits

  • Objective: Determine gold prospectivity from predictors which are easier to observe than gold deposits directly.
murchison
## List of spatial objects
## 
## gold:
## Planar point pattern: 255 points
## window: rectangle = [352782.9, 682589.6] x [6699742, 7101484] metres
## 
## faults:
## planar line segment pattern: 3252 line segments
## window: rectangle = [352782.9, 682589.6] x [6699742, 7101484] metres
## 
## greenstone:
## window: polygonal boundary
## enclosing rectangle: [352782.9, 681699.6] x [6706467, 7100804] metres
plot(murchison, equal.scales = TRUE, main = "")

Rescaling to km:

gold <- rescale(murchison$gold, 1000, "km")
faults <- rescale(murchison$faults, 1000, "km")
gs <- rescale(murchison$greenstone, 1000, "km")

Distance to fault lines (resulting raster image is called an im in spatstat):

dfaults <- distmap(faults, eps = 1) # 1km resolution
plot(dfaults)
plot(faults, add = TRUE, col = "white")
plot(gold, add = TRUE, cols = "gold")

Relationship between gold intensity and fault line distance

Dividing space according to distance (tessellation according to quantiles):

qt <- quantess(Window(gold), dfaults, 4)
plot(qt, las = 1) # horizontal labels on axis (colourmap)

Deposits in each band around fault lines:

gold4 <- quadratcount(gold, tess = qt)
gold4
## tile
## [9e-05,6.7]  (6.7,18.1] (18.1,42.8]  (42.8,130] 
##         230          25           0           0
barplot(intensity(gold4))

gold8 <- quadratcount(gold, tess = quantess(Window(gold), dfaults, 8))
gold8
## tile
## [9e-05,2.79]   (2.79,6.7]   (6.7,11.8]  (11.8,18.1]  (18.1,27.9]  (27.9,42.8] 
##          160           70           16            9            0            0 
##  (42.8,62.9]   (62.9,130] 
##            0            0
barplot(intensity(gold8))

Kernel smoothed version (estimate of true functional relationship between covariate and intensity with pointwise CI):

rh <- rhohat(gold, dfaults)
plot(rh, legendpos = "topright", xlim = c(0,30))

Predictions from this relationship compared to kernel smoothed intensity:

pred_list <- solist(
  rhohat = predict(rh),
  kernel = density(gold, sigma = 5),
  adaptive = adaptive.density(gold, method = "kernel")
)
plot(pred_list, equal.ribbon = TRUE)

ROC curve

Another view is to compare fault distances for observed gold deposits with fault distances for uniformly random points.

First we need to extract values of the distmap at gold locations (we can use [ to extract a spatial subset):

dgold <- dfaults[gold]
head(dgold)
## [1] 2.1222522 0.9270924 1.0846818 3.0811428 3.2944522 4.6082032

More precise values are obtained by using distfun():

dfaults_fun <- distfun(faults)
dfaults_fun
## Distance function for line segment pattern
## planar line segment pattern: 3252 line segments
## window: rectangle = [352.7829, 682.5896] x [6699.742, 7101.484] km
dgold <- dfaults_fun(gold)
head(dgold)
## [1] 1.905984 1.045784 1.510148 3.191979 3.537323 4.442322

Comparing histograms:

rand <- runifpoint(ex = gold)
drand <- dfaults_fun(rand)
hist(drand, col = rgb(.8,0,0,.1), ylim = c(0,150))
hist_dgold <- hist(dgold, plot = FALSE)
plot(hist_dgold, col = rgb(0,0,.8,.1), add = TRUE)

In terms of the ROC:

plot(roc(gold, dfaults_fun, high = FALSE))

Interpretation: If you explore the 10% of the region that is closest to fault lines you find ~60% of the gold deposits. At 20% it is ~80%.

Poisson models

Log-linear models for the intensity follow the syntax of lm/glm:

(model_dist <- ppm(gold ~ dfaults_fun, eps=1))
## Nonstationary Poisson process
## 
## Log intensity:  ~dfaults_fun
## 
## Fitted trend coefficients:
## (Intercept) dfaults_fun 
##  -4.2995065  -0.2686372 
## 
##               Estimate       S.E.   CI95.lo    CI95.hi Ztest      Zval
## (Intercept) -4.2995065 0.08636136 -4.468772 -4.1302413   *** -49.78507
## dfaults_fun -0.2686372 0.02059161 -0.308996 -0.2282784   *** -13.04595

The fitted model is of the form \[ \lambda(u) = \exp( \alpha + \beta d(u) ). \] That is, the estimated intensity of gold deposits in the immediate vicinity of a geological fault is about \(\exp(\hat{\alpha}) = 0.0135753\) and decreases by a factor of \(\exp(\hat{\beta}) = 0.7644205\) for every additional kilometre away from a fault.

Use effectfun() to see the effect of the distance covariate on the intensity:

ef_dist <- effectfun(model_dist, "dfaults_fun", se.fit = TRUE)
plot(ef_dist)

Use simulate() to generate realizations from the model:

s_dist <- simulate(model_dist, nsim = 8)
## Generating 8 simulated patterns ...1, 2, 3, 4, 5, 6, 7,  8.
s_dist[[9]] <- gold
names(s_dist)[9] <- "gold"
plot(s_dist)

Models with two covariates:

(model_add <- ppm(gold ~ dfaults_fun + gs, eps=1))
## Nonstationary Poisson process
## 
## Log intensity:  ~dfaults_fun + gs
## 
## Fitted trend coefficients:
## (Intercept) dfaults_fun      gsTRUE 
##  -6.5926759  -0.1113602   2.8573081 
## 
##               Estimate       S.E.    CI95.lo     CI95.hi Ztest       Zval
## (Intercept) -6.5926759 0.21460153 -7.0132871 -6.17206459   *** -30.720545
## dfaults_fun -0.1113602 0.01870201 -0.1480155 -0.07470495   ***  -5.954451
## gsTRUE       2.8573081 0.20316989  2.4591024  3.25551372   ***  14.063639
(model_int <- ppm(gold ~ dfaults_fun * gs, eps=1))
## Nonstationary Poisson process
## 
## Log intensity:  ~dfaults_fun * gs
## 
## Fitted trend coefficients:
##        (Intercept)        dfaults_fun             gsTRUE dfaults_fun:gsTRUE 
##         -6.0868735         -0.1993346          2.2033422          0.1504900 
## 
##                      Estimate       S.E.     CI95.lo    CI95.hi Ztest
## (Intercept)        -6.0868735 0.24487048 -6.56681086 -5.6069362   ***
## dfaults_fun        -0.1993346 0.03852808 -0.27484821 -0.1238209   ***
## gsTRUE              2.2033422 0.26101768  1.69175696  2.7149274   ***
## dfaults_fun:gsTRUE  0.1504900 0.04492641  0.06243581  0.2385441   ***
##                          Zval
## (Intercept)        -24.857523
## dfaults_fun         -5.173747
## gsTRUE               8.441352
## dfaults_fun:gsTRUE   3.349699

Formal test between nested models:

anova(model_add, model_int, test = "Chi")
## Analysis of Deviance Table
## 
## Model 1: ~dfaults_fun + gs    Poisson
## Model 2: ~dfaults_fun * gs    Poisson
##   Npar Df Deviance  Pr(>Chi)    
## 1    3                          
## 2    4  1    13.15 0.0002874 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Simulations from interaction model:

s_int <- simulate(model_int, nsim = 8)
## Generating 8 simulated patterns ...1, 2, 3, 4, 5, 6, 7,  8.
s_int[[9]] <- gold
names(s_int)[9] <- "gold"
plot(s_int)

Model diagnostics

The classical tools from GLMs have extensions to point process models:

lev <- leverage(model_int)
inf <- influence(model_int)
dfb <- dfbetas(model_int)

High leverage of areas within greenstone far from faults. I.e. adding and deleting points in this area has a large effect on the fitted model.

plot(lev, showcut = FALSE)
plot(faults, add = TRUE, col = "white")
plot(gs, add = TRUE, border = "green", )

Most influential data points are in greenstone far from fault lines:

plot(inf)
plot(faults, add = TRUE, col = "grey")
plot(gs, add = TRUE, border = "green", )

The sign of the effect on each parameter can be found by studying dfbetas():

plot(dfb)

Other models and simulation

Log-Gaussian Cox Process

This uses the RandomFields package internally, so it must be installed.

model_clust <- kppm(gold ~ dfaults*gs, clusters = "LGCP", eps = 2, method = "clik2")

Notice how the standard errors now are larger than the Poisson case:

coef(summary(model_clust))
## Warning in psib.kppm(object): The model is not a cluster process
##                  Estimate       S.E.     CI95.lo    CI95.hi Ztest       Zval
## (Intercept)    -6.0869644 0.28180025 -6.63928275 -5.5346461   *** -21.600280
## dfaults        -0.1987300 0.04043559 -0.27798231 -0.1194777   ***  -4.914730
## gsTRUE          2.1863779 0.31663988  1.56577516  2.8069807   ***   6.904935
## dfaults:gsTRUE  0.1564274 0.05321685  0.05212434  0.2607306    **   2.939435
coef(summary(model_int))
##                      Estimate       S.E.     CI95.lo    CI95.hi Ztest
## (Intercept)        -6.0868735 0.24487048 -6.56681086 -5.6069362   ***
## dfaults_fun        -0.1993346 0.03852808 -0.27484821 -0.1238209   ***
## gsTRUE              2.2033422 0.26101768  1.69175696  2.7149274   ***
## dfaults_fun:gsTRUE  0.1504900 0.04492641  0.06243581  0.2385441   ***
##                          Zval
## (Intercept)        -24.857523
## dfaults_fun         -5.173747
## gsTRUE               8.441352
## dfaults_fun:gsTRUE   3.349699

Simulations from the model:

s_clust <- simulate(model_clust, nsim = 8)
## Generating 8 simulations... 1, 2, 3, 4, 5, 6, 7,  8.
## Done.
s_clust[[9]] <- gold
names(s_clust)[9] <- "gold"
plot(s_clust)

Gibbs models for patterns with inhibition

  • Synthetic example:
X <- rStraussHard(beta = 200, gamma = 0.4, R = 0.06, H = 0.03, nsim = 9)
plot(X)

  • Model fitting using so-called pseudo-likelihood:
(model_gibbs <- ppm(X[[1]] ~ 1, interaction = StraussHard(r = 0.06, hc = 0.03)))
## Stationary Strauss - hard core process
## 
## First order term:  beta = 248.8166
## 
## Interaction distance:    0.06
## Hard core distance:  0.03
## Fitted interaction parameter gamma:   0.3527016
## 
## Relevant coefficients:
## Interaction 
##   -1.042133 
## 
## For standard errors, type coef(summary(x))

Ripley’s K-function is the normalized expected number of further points within distance r given that you are at a point of the process:

plot(Kest(X[[1]]))

The pair-correlation function is the derivative and has the advantage of not being cumulative:

plot(pcf(X[[1]]))