We can formulate a parametric model for the intensity and fit it to
the point pattern data, using the spatstat
function
ppm
(point process model).
In its simplest form, ppm
fits a Poisson point
process model to the point pattern data.
The homogeneous Poisson process with intensity \(\lambda > 0\) in two-dimensional space is characterised by the following properties:
Here are some realisations of the homogeneous Poisson process with intensity 100 (points per unit area):
plot(rpoispp(100, nsim=12), main="", main.panel="")
The inhomogeneous Poisson process with intensity function \(\lambda(u)\) is characterised by the following properties:
Here are some realisations of the inhomogeneous Poisson process with intensity function \(\lambda((x,y)) = 100 x\) on the unit square \([0,1]^2\) (why is this important?):
lam <- function(x,y) { 100 * x}
plot(rpoispp(lam, nsim=12), main="", main.panel="")
ppm
can fit a Poisson point process model to
the point pattern data by maximum likelihood.
A Poisson point process is completely specified by its intensity function. So the procedure for formulating a Poisson model is simply to write a mathematical expression for the intensity function.
In ppm
the intensity is assumed to be a
loglinear function of the parameters.
That is, \[\log\lambda(u) = \beta_1 Z_1(u) +
\ldots + \beta_p Z_p(u)\] where \(\beta_1, \ldots, \beta_p\) are parameters
to be estimated, and \(Z_1, \ldots,
Z_p\) are spatial covariates.
To fit this model to a point pattern dataset X
, we
type
ppm(X ~ Z1 + Z2 + .. Zp)
where Z1, Z2, ..., Zp
are pixel images or functions.
Important notes:
The model is expressed in terms of the log of the intensity.
The covariates \(Z_1(u), \ldots, Z_p(u)\) (called the “canonical covariates”) can be anything; they are not necessarily the same as the original variables that we were given; they could be transformations and combinations of the original variables.
The Poisson process with intensity function \(\lambda_\theta(u)\), controlled by a parameter vector \(\theta\), has log-likelihood \[ \log L(\theta) = \sum_{i=1}^n \log \lambda_\theta(x_i) - \int_W \lambda_\theta(u) \, {\rm d} u. \] The value of \(\theta\) which maximises \(\log L(\theta)\) is taken as the parameter estimate \(\hat\theta\).
From \(\hat\theta\) we can compute the fitted intensity \(\hat\lambda(u) = \lambda_{\hat\theta}(u)\) and hence we can generate simulated realisations.
Using the likelihood we are able to compute confidence intervals, perform analysis of deviance, conduct hypothesis tests, etc.
Example: Murchison gold data
Using the Murchison data from before,
X <- murchison$gold
L <- murchison$faults
X <- rescale(X, 1000, "km")
L <- rescale(L, 1000, "km")
D <- distfun(L)
fit <- ppm(X ~ D)
The formula implies that the model is \[\log\lambda(u) = \beta_0 + \beta_1 D(u)\] where \(D(u)\) is the distance covariate (distance from location \(u\) to nearest geological fault) and \(\beta_0, \beta_1\) are the regression coefficients. In other words, the model says that the intensity of gold deposits is an exponentially decreasing function of distance from the nearest fault.
The result of ppm
is a fitted model object of class
"ppm"
. There are many methods for this class:
fit
## Nonstationary Poisson process
##
## Log intensity: ~D
##
## Fitted trend coefficients:
## (Intercept) D
## -4.3412775 -0.2607664
##
## Estimate S.E. CI95.lo CI95.hi Ztest Zval
## (Intercept) -4.3412775 0.08556260 -4.5089771 -4.1735779 *** -50.73802
## D -0.2607664 0.02018789 -0.3003339 -0.2211988 *** -12.91697
coef(fit)
## (Intercept) D
## -4.3412775 -0.2607664
confint(fit)
## 2.5 % 97.5 %
## (Intercept) -4.5089771 -4.1735779
## D -0.3003339 -0.2211988
anova(fit, test="Chi")
## Analysis of Deviance Table
## Terms added sequentially (first to last)
##
## Df Deviance Npar Pr(>Chi)
## NULL 1
## D 1 590.92 2 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(predict(fit))
plot(simulate(fit, drop=TRUE))
plot(L, add=TRUE, col=3)
To visualise the intensity of the model as a function of one of the
covariates, we can use the command effectfun
:
plot(effectfun(fit, "D"), xlim=c(0, 20))
In a multi-type point pattern the points have marks which are categorical values:
mucosa
## Marked planar point pattern: 965 points
## Multitype, with levels = ECL, other
## window: rectangle = [0, 1] x [0, 0.81] units
plot(mucosa, cols=c(2,3))
We can fit a Poisson model in which the intensity depends on the type
of point, using the variable name marks
in the model
formula.
model0 <- ppm(mucosa ~ marks)
model0
## Stationary multitype Poisson process
##
## Possible marks: 'ECL' and 'other'
##
## Log intensity: ~marks
##
## Intensities:
## beta_ECL beta_other
## 109.8765 1081.4815
##
## Estimate S.E. CI95.lo CI95.hi Ztest Zval
## (Intercept) 4.699357 0.1059998 4.491602 4.907113 *** 44.33365
## marksother 2.286730 0.1112542 2.068675 2.504784 *** 20.55409
coef(model0)
## (Intercept) marksother
## 4.699357 2.286730
plot(predict(model0), equal.ribbon=TRUE)
In the formula, the marks
variable is a categorical
variable. The effect of the model formula mucosa ~ marks
is
to estimate a different intensity for each level, that is, a different
intensity for each type of point. The model formula
mucosa ~ marks
is equivalent to saying that the intensity
of the points of type \(i\) is \[
\lambda_i(u) = \alpha_i
\] for each \(i = 1, 2, \ldots\)
where \(\alpha_1, \alpha_2, \ldots\)
are the different constant intensities to be estimated. The actual
printed output will depend on the convention for handling
“contrasts” in linear models.
The marks
variable can be combined with other
explanatory variables (names x
and y
refer to
the cartesian coordinates):
model1 <- ppm(mucosa ~ marks + y)
model1
## Nonstationary multitype Poisson process
##
## Possible marks: 'ECL' and 'other'
##
## Log intensity: ~marks + y
##
## Fitted trend coefficients:
## (Intercept) marksother y
## 5.131273 2.286730 -1.156055
##
## Estimate S.E. CI95.lo CI95.hi Ztest Zval
## (Intercept) 5.131273 0.1164479 4.903039 5.3595066 *** 44.064966
## marksother 2.286730 0.1112542 2.068675 2.5047840 *** 20.554089
## y -1.156055 0.1406821 -1.431787 -0.8803236 *** -8.217504
coef(model1)
## (Intercept) marksother y
## 5.131273 2.286730 -1.156055
plot(predict(model1))
The model formula ~marks + y
states that \[
\log \lambda_i((x,y)) = \gamma_i + \beta y
\] where \(\gamma_1, \gamma_2,
\ldots\) and \(\beta\) are
parameters. That is, the dependence on the \(y\) coordinate has the same “slope”
coefficient \(\beta\) for each type of
point, but different types of points have different abundance
overall.
plot(effectfun(model1, "y", marks="other"), log(.y) ~ .x, ylim=c(4,8), col=2, main="")
plot(effectfun(model1, "y", marks="ECL"), add=TRUE, col=3, log(.y) ~ .x)
legend("bottomleft", lwd=c(1,1), col=c(2,3), legend=c("other", "ECL"))
model2 <- ppm(mucosa ~ marks * y)
model2
## Nonstationary multitype Poisson process
##
## Possible marks: 'ECL' and 'other'
##
## Log intensity: ~marks * y
##
## Fitted trend coefficients:
## (Intercept) marksother y marksother:y
## 5.884603 1.452251 -3.862202 2.938790
##
## Estimate S.E. CI95.lo CI95.hi Ztest Zval
## (Intercept) 5.884603 0.1635842 5.563984 6.205222 *** 35.972936
## marksother 1.452251 0.1749458 1.109364 1.795139 *** 8.301149
## y -3.862202 0.5616953 -4.963104 -2.761299 *** -6.875973
## marksother:y 2.938790 0.5804890 1.801053 4.076528 *** 5.062611
coef(model2)
## (Intercept) marksother y marksother:y
## 5.884603 1.452251 -3.862202 2.938790
plot(predict(model2))
The model formula ~marks * y
states that \[
\log \lambda_i((x,y)) = \gamma_i + \beta_i y
\] where \(\gamma_1, \gamma_2,
\ldots\) and \(\beta_1,\beta_2,
\ldots\) are parameters. The intensity may depend on the \(y\) coordinate in a completely different
way for different types of points.
plot(effectfun(model2, "y", marks="other"), log(.y) ~ .x, col=2, ylim=c(2,8), main="")
plot(effectfun(model2, "y", marks="ECL"), add=TRUE, col=3, log(.y) ~ .x)
legend("bottomleft", lwd=c(1,1), col=c(2,3), legend=c("other", "ECL"))
When we have fitted a point process model to a multi-type point pattern, we can compute ratios of the intensities of different types. This is automated in relrisk.ppm:
plot(relrisk(model2, casecontrol=FALSE))
One way to test for segregation is to compare two models, with the null model stating that there is no segregation:
nullmodel <- ppm(mucosa ~ marks + polynom(x, y, 2))
altmodel <- ppm(mucosa ~ marks * polynom(x, y, 2))
anova(nullmodel, altmodel, test="Chi")
## Analysis of Deviance Table
##
## Model 1: ~marks + (x + y + I(x^2) + I(x * y) + I(y^2)) Poisson
## Model 2: ~marks * (x + y + I(x^2) + I(x * y) + I(y^2)) Poisson
## Npar Df Deviance Pr(>Chi)
## 1 7
## 2 12 5 44.834 1.568e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1