Parametric modelling of intensity

We can formulate a parametric model for the intensity and fit it to the point pattern data, using the spatstat function ppm (point process model).

In its simplest form, ppm fits a Poisson point process model to the point pattern data.

Poisson point process

The homogeneous Poisson process with intensity \(\lambda > 0\) in two-dimensional space is characterised by the following properties:

  • for any region \(B\), the random number \(n(X \cap B)\) of points falling in \(B\) follows a Poisson distribution;
  • for any region \(B\), the expected number of points falling in \(B\) is \(E[n(X \cap B)] = \lambda \, \mbox{area}(B)\);
  • for any region \(B\), given that \(n(X \cap B) = n\), the \(n\) points are independent and uniformly distributed inside \(B\);
  • for any disjoint regions \(B_1,\ldots, B_m\), the numbers \(n(X \cap B_1), \ldots, n(X \cap B_m)\) of points falling in each region are independent random variables.

Here are some realisations of the homogeneous Poisson process with intensity 100 (points per unit area):

plot(rpoispp(100, nsim=12), main="", main.panel="")

The inhomogeneous Poisson process with intensity function \(\lambda(u)\) is characterised by the following properties:

  • for any region \(B\), the random number \(n(X \cap B)\) of points falling in \(B\) follows a Poisson distribution;
  • for any region \(B\), the expected number of points falling in \(B\) is \[ E[n(X \cap B)] = \int_B \lambda(u) \, {\rm d}u; \]
  • for any region \(B\), given that \(n(X \cap B) = n\), the \(n\) points are independent and identically distributed inside \(B\) with probability density \(f(u)= \lambda(u)/\Lambda\), where \(\Lambda = \int_B \lambda(u) \, {\rm d}u\);
  • for any disjoint regions \(B_1,\ldots, B_m\), the numbers \(n(X \cap B_1), \ldots, n(X \cap B_m)\) of points falling in each region are independent random variables.

Here are some realisations of the inhomogeneous Poisson process with intensity function \(\lambda((x,y)) = 100 x\) on the unit square \([0,1]^2\) (why is this important?):

lam <- function(x,y) { 100 * x}
plot(rpoispp(lam, nsim=12), main="", main.panel="")

Loglinear model for intensity

ppm can fit a Poisson point process model to the point pattern data by maximum likelihood.

A Poisson point process is completely specified by its intensity function. So the procedure for formulating a Poisson model is simply to write a mathematical expression for the intensity function.

In ppm the intensity is assumed to be a loglinear function of the parameters. That is, \[\log\lambda(u) = \beta_1 Z_1(u) + \ldots + \beta_p Z_p(u)\] where \(\beta_1, \ldots, \beta_p\) are parameters to be estimated, and \(Z_1, \ldots, Z_p\) are spatial covariates.

To fit this model to a point pattern dataset X, we type

ppm(X ~ Z1 + Z2 + .. Zp)

where Z1, Z2, ..., Zp are pixel images or functions.

Important notes:

  1. The model is expressed in terms of the log of the intensity.

  2. The covariates \(Z_1(u), \ldots, Z_p(u)\) (called the “canonical covariates”) can be anything; they are not necessarily the same as the original variables that we were given; they could be transformations and combinations of the original variables.

Fit by maximum likelihood

The Poisson process with intensity function \(\lambda_\theta(u)\), controlled by a parameter vector \(\theta\), has log-likelihood \[ \log L(\theta) = \sum_{i=1}^n \log \lambda_\theta(x_i) - \int_W \lambda_\theta(u) \, {\rm d} u. \] The value of \(\theta\) which maximises \(\log L(\theta)\) is taken as the parameter estimate \(\hat\theta\).

From \(\hat\theta\) we can compute the fitted intensity \(\hat\lambda(u) = \lambda_{\hat\theta}(u)\) and hence we can generate simulated realisations.

Using the likelihood we are able to compute confidence intervals, perform analysis of deviance, conduct hypothesis tests, etc.

Example: Murchison gold data

Using the Murchison data from before,

X <- murchison$gold
L <- murchison$faults
X <- rescale(X, 1000, "km")
L <- rescale(L, 1000, "km")
D <- distfun(L)
fit <- ppm(X ~ D)

The formula implies that the model is \[\log\lambda(u) = \beta_0 + \beta_1 D(u)\] where \(D(u)\) is the distance covariate (distance from location \(u\) to nearest geological fault) and \(\beta_0, \beta_1\) are the regression coefficients. In other words, the model says that the intensity of gold deposits is an exponentially decreasing function of distance from the nearest fault.

The result of ppm is a fitted model object of class "ppm". There are many methods for this class:

fit
## Nonstationary Poisson process
## 
## Log intensity:  ~D
## 
## Fitted trend coefficients:
## (Intercept)           D 
##  -4.3412775  -0.2607664 
## 
##               Estimate       S.E.    CI95.lo    CI95.hi Ztest      Zval
## (Intercept) -4.3412775 0.08556260 -4.5089771 -4.1735779   *** -50.73802
## D           -0.2607664 0.02018789 -0.3003339 -0.2211988   *** -12.91697
coef(fit)
## (Intercept)           D 
##  -4.3412775  -0.2607664
confint(fit)
##                  2.5 %     97.5 %
## (Intercept) -4.5089771 -4.1735779
## D           -0.3003339 -0.2211988
anova(fit, test="Chi")
## Analysis of Deviance Table
## Terms added sequentially (first to last)
## 
##      Df Deviance Npar  Pr(>Chi)    
## NULL                1              
## D     1   590.92    2 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(predict(fit))

plot(simulate(fit, drop=TRUE))
plot(L, add=TRUE, col=3)