This tutorial is for the 2021 OpenGeoHub summer school. It is written with very brief explanations of the content. Please watch the recorded tutorial presentation for detailed explanations of the content.
To follow along you need spatstat
installed along with its dependencies. You can install the development version like this (but the CRAN version is fine for this tutorial):
options(repos = c(
spatstat = 'https://spatstat.r-universe.dev',
CRAN = 'https://cloud.r-project.org'))
install.packages("spatstat", dependencies = TRUE)
Now load spatstat and the needed sub-packages:
library(spatstat)
## Loading required package: spatstat.data
## Loading required package: spatstat.geom
## spatstat.geom 2.2-2.002
## Loading required package: spatstat.core
## Loading required package: nlme
## Loading required package: rpart
## spatstat.core 2.3-0.003
## Loading required package: spatstat.linnet
## spatstat.linnet 2.3-0
##
## spatstat 2.2-0.002 (nickname: 'That's not important right now')
## For an introduction to spatstat, type 'beginner'
Planar coordinates (x,y) are assumed almost everywhere in spatstat
, so geographic (lng,lat) coordinates must be projected. You can use sf
and https://projectionwizard.org for this (in rare cases sp
and maptools
may be useful).
As a first example consider the built-in Chorley-Ribble cancer data:
plot(chorley, cols = c("red", "blue"))
It is stored as planar point pattern ppp
with many S3 methods available for plotting, summarising, subsetting, etc.
class(chorley)
## [1] "ppp"
methods(class = "ppp")
## [1] [ [<- affine
## [4] anyDuplicated as.data.frame as.im
## [7] as.layered as.owin as.ppp
## [10] auc berman.test boundingbox
## [13] boundingcentre boundingcircle boundingradius
## [16] by cdf.test circumradius
## [19] closepairs closing connected
## [22] coords coords<- crossdist
## [25] crosspairs cut density
## [28] densityAdaptiveKernel densityfun densityHeat
## [31] densityVoronoi dilation distfun
## [34] distmap domain duplicated
## [37] edit envelope erosion
## [40] fardist flipxy Frame<-
## [43] has.close head identify
## [46] intensity is.connected is.empty
## [49] is.marked is.multitype kppm
## [52] lurking markformat marks
## [55] marks<- multiplicity nnclean
## [58] nncross nndensity nndist
## [61] nnfun nnwhich nobjects
## [64] npoints opening pairdist
## [67] pcf periodify pixellate
## [70] plot ppm print
## [73] quadrat.test quadratcount quantess
## [76] rebound relevel relrisk
## [79] rescale rhohat roc
## [82] rotate round rounding
## [85] rshift scalardilate scanmeasure
## [88] sdr segregation.test sharpen
## [91] shift Smooth Smoothfun
## [94] split split<- subset
## [97] summary superimpose tail
## [100] text unique uniquemap
## [103] unitname unitname<- unmark
## [106] unstack Window Window<-
## see '?methods' for accessing help and source code
summary(chorley)
## Marked planar point pattern: 1036 points
## Average intensity 3.287268 points per square km
##
## *Pattern contains duplicated points*
##
## Coordinates are given to 1 decimal place
## i.e. rounded to the nearest multiple of 0.1 km
##
## Multitype:
## frequency proportion intensity
## larynx 58 0.05598456 0.1840363
## lung 978 0.94401540 3.1032320
##
## Window: polygonal boundary
## single connected closed polygon with 131 vertices
## enclosing rectangle: [343.45, 366.45] x [410.41, 431.79] km
## (23 x 21.38 km)
## Window area = 315.155 square km
## Unit of length: 1 km
## Fraction of frame area: 0.641
The spatial region where the points were observed (observation window, owin
) is part of the ppp
object. There are many polygonal operations available to manipulate these (handled by polyclip
package). E.g:
W <- Window(chorley)
Wplus <- dilation(W, 2)
Wminus <- erosion(W, 2)
plot(Wplus, main = "Dilation and erosion")
plot(W, add = TRUE)
plot(Wminus, add = TRUE)
murchison
## List of spatial objects
##
## gold:
## Planar point pattern: 255 points
## window: rectangle = [352782.9, 682589.6] x [6699742, 7101484] metres
##
## faults:
## planar line segment pattern: 3252 line segments
## window: rectangle = [352782.9, 682589.6] x [6699742, 7101484] metres
##
## greenstone:
## window: polygonal boundary
## enclosing rectangle: [352782.9, 681699.6] x [6706467, 7100804] metres
plot(murchison, equal.scales = TRUE, main = "")
Rescaling to km:
gold <- rescale(murchison$gold, 1000, "km")
faults <- rescale(murchison$faults, 1000, "km")
gs <- rescale(murchison$greenstone, 1000, "km")
Distance to fault lines (resulting raster image is called an im
in spatstat
):
dfaults <- distmap(faults, eps = 1) # 1km resolution
plot(dfaults)
plot(faults, add = TRUE, col = "white")
plot(gold, add = TRUE, cols = "gold")
Dividing space according to distance (tessellation according to quantiles):
qt <- quantess(Window(gold), dfaults, 4)
plot(qt, las = 1) # horizontal labels on axis (colourmap)
Deposits in each band around fault lines:
gold4 <- quadratcount(gold, tess = qt)
gold4
## tile
## [9e-05,6.7] (6.7,18.1] (18.1,42.8] (42.8,130]
## 230 25 0 0
barplot(intensity(gold4))
gold8 <- quadratcount(gold, tess = quantess(Window(gold), dfaults, 8))
gold8
## tile
## [9e-05,2.79] (2.79,6.7] (6.7,11.8] (11.8,18.1] (18.1,27.9] (27.9,42.8]
## 160 70 16 9 0 0
## (42.8,62.9] (62.9,130]
## 0 0
barplot(intensity(gold8))
Kernel smoothed version (estimate of true functional relationship between covariate and intensity with pointwise CI):
rh <- rhohat(gold, dfaults)
plot(rh, legendpos = "topright", xlim = c(0,30))
Predictions from this relationship compared to kernel smoothed intensity:
pred_list <- solist(
rhohat = predict(rh),
kernel = density(gold, sigma = 5),
adaptive = adaptive.density(gold, method = "kernel")
)
plot(pred_list, equal.ribbon = TRUE)
Another view is to compare fault distances for observed gold deposits with fault distances for uniformly random points.
First we need to extract values of the distmap at gold locations (we can use [
to extract a spatial subset):
dgold <- dfaults[gold]
head(dgold)
## [1] 2.1222522 0.9270924 1.0846818 3.0811428 3.2944522 4.6082032
More precise values are obtained by using distfun()
:
dfaults_fun <- distfun(faults)
dfaults_fun
## Distance function for line segment pattern
## planar line segment pattern: 3252 line segments
## window: rectangle = [352.7829, 682.5896] x [6699.742, 7101.484] km
dgold <- dfaults_fun(gold)
head(dgold)
## [1] 1.905984 1.045784 1.510148 3.191979 3.537323 4.442322
Comparing histograms:
rand <- runifpoint(ex = gold)
drand <- dfaults_fun(rand)
hist(drand, col = rgb(.8,0,0,.1), ylim = c(0,150))
hist_dgold <- hist(dgold, plot = FALSE)
plot(hist_dgold, col = rgb(0,0,.8,.1), add = TRUE)
In terms of the ROC:
plot(roc(gold, dfaults_fun, high = FALSE))
Interpretation: If you explore the 10% of the region that is closest to fault lines you find ~60% of the gold deposits. At 20% it is ~80%.
Log-linear models for the intensity follow the syntax of lm
/glm
:
(model_dist <- ppm(gold ~ dfaults_fun, eps=1))
## Nonstationary Poisson process
##
## Log intensity: ~dfaults_fun
##
## Fitted trend coefficients:
## (Intercept) dfaults_fun
## -4.2995065 -0.2686372
##
## Estimate S.E. CI95.lo CI95.hi Ztest Zval
## (Intercept) -4.2995065 0.08636136 -4.468772 -4.1302413 *** -49.78507
## dfaults_fun -0.2686372 0.02059161 -0.308996 -0.2282784 *** -13.04595
The fitted model is of the form \[ \lambda(u) = \exp( \alpha + \beta d(u) ). \] That is, the estimated intensity of gold deposits in the immediate vicinity of a geological fault is about \(\exp(\hat{\alpha}) = 0.0135753\) and decreases by a factor of \(\exp(\hat{\beta}) = 0.7644205\) for every additional kilometre away from a fault.
Use effectfun()
to see the effect of the distance covariate on the intensity:
ef_dist <- effectfun(model_dist, "dfaults_fun", se.fit = TRUE)
plot(ef_dist)
Use simulate()
to generate realizations from the model:
s_dist <- simulate(model_dist, nsim = 8)
## Generating 8 simulated patterns ...1, 2, 3, 4, 5, 6, 7, 8.
s_dist[[9]] <- gold
names(s_dist)[9] <- "gold"
plot(s_dist)
Models with two covariates:
(model_add <- ppm(gold ~ dfaults_fun + gs, eps=1))
## Nonstationary Poisson process
##
## Log intensity: ~dfaults_fun + gs
##
## Fitted trend coefficients:
## (Intercept) dfaults_fun gsTRUE
## -6.5926759 -0.1113602 2.8573081
##
## Estimate S.E. CI95.lo CI95.hi Ztest Zval
## (Intercept) -6.5926759 0.21460153 -7.0132871 -6.17206459 *** -30.720545
## dfaults_fun -0.1113602 0.01870201 -0.1480155 -0.07470495 *** -5.954451
## gsTRUE 2.8573081 0.20316989 2.4591024 3.25551372 *** 14.063639
(model_int <- ppm(gold ~ dfaults_fun * gs, eps=1))
## Nonstationary Poisson process
##
## Log intensity: ~dfaults_fun * gs
##
## Fitted trend coefficients:
## (Intercept) dfaults_fun gsTRUE dfaults_fun:gsTRUE
## -6.0868735 -0.1993346 2.2033422 0.1504900
##
## Estimate S.E. CI95.lo CI95.hi Ztest
## (Intercept) -6.0868735 0.24487048 -6.56681086 -5.6069362 ***
## dfaults_fun -0.1993346 0.03852808 -0.27484821 -0.1238209 ***
## gsTRUE 2.2033422 0.26101768 1.69175696 2.7149274 ***
## dfaults_fun:gsTRUE 0.1504900 0.04492641 0.06243581 0.2385441 ***
## Zval
## (Intercept) -24.857523
## dfaults_fun -5.173747
## gsTRUE 8.441352
## dfaults_fun:gsTRUE 3.349699
Formal test between nested models:
anova(model_add, model_int, test = "Chi")
## Analysis of Deviance Table
##
## Model 1: ~dfaults_fun + gs Poisson
## Model 2: ~dfaults_fun * gs Poisson
## Npar Df Deviance Pr(>Chi)
## 1 3
## 2 4 1 13.15 0.0002874 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Simulations from interaction model:
s_int <- simulate(model_int, nsim = 8)
## Generating 8 simulated patterns ...1, 2, 3, 4, 5, 6, 7, 8.
s_int[[9]] <- gold
names(s_int)[9] <- "gold"
plot(s_int)
The classical tools from GLMs have extensions to point process models:
lev <- leverage(model_int)
inf <- influence(model_int)
dfb <- dfbetas(model_int)
High leverage of areas within greenstone far from faults. I.e. adding and deleting points in this area has a large effect on the fitted model.
plot(lev, showcut = FALSE)
plot(faults, add = TRUE, col = "white")
plot(gs, add = TRUE, border = "green", )
Most influential data points are in greenstone far from fault lines:
plot(inf)
plot(faults, add = TRUE, col = "grey")
plot(gs, add = TRUE, border = "green", )
The sign of the effect on each parameter can be found by studying dfbetas()
:
plot(dfb)
This uses the RandomFields
package internally, so it must be installed.
model_clust <- kppm(gold ~ dfaults*gs, clusters = "LGCP", eps = 2, method = "clik2")
Notice how the standard errors now are larger than the Poisson case:
coef(summary(model_clust))
## Warning in psib.kppm(object): The model is not a cluster process
## Estimate S.E. CI95.lo CI95.hi Ztest Zval
## (Intercept) -6.0869644 0.28180025 -6.63928275 -5.5346461 *** -21.600280
## dfaults -0.1987300 0.04043559 -0.27798231 -0.1194777 *** -4.914730
## gsTRUE 2.1863779 0.31663988 1.56577516 2.8069807 *** 6.904935
## dfaults:gsTRUE 0.1564274 0.05321685 0.05212434 0.2607306 ** 2.939435
coef(summary(model_int))
## Estimate S.E. CI95.lo CI95.hi Ztest
## (Intercept) -6.0868735 0.24487048 -6.56681086 -5.6069362 ***
## dfaults_fun -0.1993346 0.03852808 -0.27484821 -0.1238209 ***
## gsTRUE 2.2033422 0.26101768 1.69175696 2.7149274 ***
## dfaults_fun:gsTRUE 0.1504900 0.04492641 0.06243581 0.2385441 ***
## Zval
## (Intercept) -24.857523
## dfaults_fun -5.173747
## gsTRUE 8.441352
## dfaults_fun:gsTRUE 3.349699
Simulations from the model:
s_clust <- simulate(model_clust, nsim = 8)
## Generating 8 simulations... 1, 2, 3, 4, 5, 6, 7, 8.
## Done.
s_clust[[9]] <- gold
names(s_clust)[9] <- "gold"
plot(s_clust)
X <- rStraussHard(beta = 200, gamma = 0.4, R = 0.06, H = 0.03, nsim = 9)
plot(X)
(model_gibbs <- ppm(X[[1]] ~ 1, interaction = StraussHard(r = 0.06, hc = 0.03)))
## Stationary Strauss - hard core process
##
## First order term: beta = 248.8166
##
## Interaction distance: 0.06
## Hard core distance: 0.03
## Fitted interaction parameter gamma: 0.3527016
##
## Relevant coefficients:
## Interaction
## -1.042133
##
## For standard errors, type coef(summary(x))
Ripley’s K-function is the normalized expected number of further points within distance r given that you are at a point of the process:
plot(Kest(X[[1]]))
The pair-correlation function is the derivative and has the advantage of not being cumulative:
plot(pcf(X[[1]]))
Default simulation under homogeneous Poisson:
env_gibbs_wrong <- envelope(X[[1]], Kest)
## Generating 99 simulations of CSR ...
## 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
## 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80,
## 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99.
##
## Done.
plot(env_gibbs_wrong, .-pi*r^2 ~ r)
Simulation from the model:
env_gibbs_right <- envelope(model_gibbs, Kest)
## Generating 99 simulated realisations of fitted Gibbs model ...
## 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
## 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80,
## 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99.
##
## Done.
plot(env_gibbs_right, .-pi*r^2 ~ r)
Street crimes in Chicago:
chicago
## Point pattern on linear network
## 116 points
## Multitype, with possible types:
## assault burglary cartheft damage robbery theft trespass
## Linear network with 338 vertices and 503 lines
## Enclosing window: rectangle = [0.3894, 1281.9863] x [153.1035, 1276.5602] feet
plot(chicago)
Intensity of crimes:
lam <- density(chicago, sigma = 100)
plot(lam)
lam2 <- densityVoronoi(chicago, nrep = 50, f = .2)
plot(lam2)
Dirichlet tesselation of gold pattern:
gold_tess <- dirichlet(gold)
plot(gold_tess)
Dirichlet tessellation on network for Chicago crimes:
XX <- chicago
Window(XX) <- grow.rectangle(Window(chicago), 1) ## Needed due to a minor bug
chicago_tess <- lineardirichlet(XX)
plot(chicago_tess)