We can formulate a parametric model for the intensity and fit it to
the point pattern data, using the spatstat
function
ppm
(point process model).
In its simplest form, ppm
fits a Poisson point
process model to the point pattern data.
The homogeneous Poisson process with intensity \(\lambda > 0\) in two-dimensional space is characterised by the following properties:
Here are some realisations of the homogeneous Poisson process with intensity 100 (points per unit area):
plot(rpoispp(100, nsim=12), main="", main.panel="")
The inhomogeneous Poisson process with intensity function \(\lambda(u)\) is characterised by the following properties:
Here are some realisations of the inhomogeneous Poisson process with intensity function \(\lambda((x,y)) = 100 x\) on the unit square \([0,1]^2\) (why is this important?):
lam <- function(x,y) { 100 * x}
plot(rpoispp(lam, nsim=12), main="", main.panel="")
ppm
can fit a Poisson point process model to
the point pattern data by maximum likelihood.
A Poisson point process is completely specified by its intensity function. So the procedure for formulating a Poisson model is simply to write a mathematical expression for the intensity function.
In ppm
the intensity is assumed to be a
loglinear function of the parameters.
That is, \[\log\lambda(u) = \beta_1 Z_1(u) +
\ldots + \beta_p Z_p(u)\] where \(\beta_1, \ldots, \beta_p\) are parameters
to be estimated, and \(Z_1, \ldots,
Z_p\) are spatial covariates.
To fit this model to a point pattern dataset X
, we
type
ppm(X ~ Z1 + Z2 + .. Zp)
where Z1, Z2, ..., Zp
are pixel images or functions.
Important notes:
The model is expressed in terms of the log of the intensity.
The covariates \(Z_1(u), \ldots, Z_p(u)\) (called the “canonical covariates”) can be anything; they are not necessarily the same as the original variables that we were given; they could be transformations and combinations of the original variables.
The Poisson process with intensity function \(\lambda_\theta(u)\), controlled by a parameter vector \(\theta\), has log-likelihood \[ \log L(\theta) = \sum_{i=1}^n \log \lambda_\theta(x_i) - \int_W \lambda_\theta(u) \, {\rm d} u. \] The value of \(\theta\) which maximises \(\log L(\theta)\) is taken as the parameter estimate \(\hat\theta\).
From \(\hat\theta\) we can compute the fitted intensity \(\hat\lambda(u) = \lambda_{\hat\theta}(u)\) and hence we can generate simulated realisations.
Using the likelihood we are able to compute confidence intervals, perform analysis of deviance, conduct hypothesis tests, etc.
Example: Murchison gold data
Using the Murchison data from before,
X <- murchison$gold
L <- murchison$faults
X <- rescale(X, 1000, "km")
L <- rescale(L, 1000, "km")
D <- distfun(L)
fit <- ppm(X ~ D)
The formula implies that the model is \[\log\lambda(u) = \beta_0 + \beta_1 D(u)\] where \(D(u)\) is the distance covariate (distance from location \(u\) to nearest geological fault) and \(\beta_0, \beta_1\) are the regression coefficients. In other words, the model says that the intensity of gold deposits is an exponentially decreasing function of distance from the nearest fault.
The result of ppm
is a fitted model object of class
"ppm"
. There are many methods for this class:
fit
## Nonstationary Poisson process
##
## Log intensity: ~D
##
## Fitted trend coefficients:
## (Intercept) D
## -4.3412775 -0.2607664
##
## Estimate S.E. CI95.lo CI95.hi Ztest Zval
## (Intercept) -4.3412775 0.08556260 -4.5089771 -4.1735779 *** -50.73802
## D -0.2607664 0.02018789 -0.3003339 -0.2211988 *** -12.91697
coef(fit)
## (Intercept) D
## -4.3412775 -0.2607664
confint(fit)
## 2.5 % 97.5 %
## (Intercept) -4.5089771 -4.1735779
## D -0.3003339 -0.2211988
anova(fit, test="Chi")
## Analysis of Deviance Table
## Terms added sequentially (first to last)
##
## Df Deviance Npar Pr(>Chi)
## NULL 1
## D 1 590.92 2 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(predict(fit))
plot(simulate(fit, drop=TRUE))
plot(L, add=TRUE, col=3)