Parametric modelling of intensity

We can formulate a parametric model for the intensity and fit it to the point pattern data, using the spatstat function ppm (point process model).

In its simplest form, ppm fits a Poisson point process model to the point pattern data.

Poisson point process

The homogeneous Poisson process with intensity \(\lambda > 0\) in two-dimensional space is characterised by the following properties:

  • for any region \(B\), the random number \(n(X \cap B)\) of points falling in \(B\) follows a Poisson distribution;
  • for any region \(B\), the expected number of points falling in \(B\) is \(E[n(X \cap B)] = \lambda \, \mbox{area}(B)\);
  • for any region \(B\), given that \(n(X \cap B) = n\), the \(n\) points are independent and uniformly distributed inside \(B\);
  • for any disjoint regions \(B_1,\ldots, B_m\), the numbers \(n(X \cap B_1), \ldots, n(X \cap B_m)\) of points falling in each region are independent random variables.

Here are some realisations of the homogeneous Poisson process with intensity 100 (points per unit area):

plot(rpoispp(100, nsim=12), main="", main.panel="")

The inhomogeneous Poisson process with intensity function \(\lambda(u)\) is characterised by the following properties:

  • for any region \(B\), the random number \(n(X \cap B)\) of points falling in \(B\) follows a Poisson distribution;
  • for any region \(B\), the expected number of points falling in \(B\) is \[ E[n(X \cap B)] = \int_B \lambda(u) \, {\rm d}u; \]
  • for any region \(B\), given that \(n(X \cap B) = n\), the \(n\) points are independent and identically distributed inside \(B\) with probability density \(f(u)= \lambda(u)/\Lambda\), where \(\Lambda = \int_B \lambda(u) \, {\rm d}u\);
  • for any disjoint regions \(B_1,\ldots, B_m\), the numbers \(n(X \cap B_1), \ldots, n(X \cap B_m)\) of points falling in each region are independent random variables.

Here are some realisations of the inhomogeneous Poisson process with intensity function \(\lambda((x,y)) = 100 x\) on the unit square \([0,1]^2\) (why is this important?):

lam <- function(x,y) { 100 * x}
plot(rpoispp(lam, nsim=12), main="", main.panel="")

Loglinear model for intensity

ppm can fit a Poisson point process model to the point pattern data by maximum likelihood.

A Poisson point process is completely specified by its intensity function. So the procedure for formulating a Poisson model is simply to write a mathematical expression for the intensity function.

In ppm the intensity is assumed to be a loglinear function of the parameters. That is, \[\log\lambda(u) = \beta_1 Z_1(u) + \ldots + \beta_p Z_p(u)\] where \(\beta_1, \ldots, \beta_p\) are parameters to be estimated, and \(Z_1, \ldots, Z_p\) are spatial covariates.

To fit this model to a point pattern dataset X, we type

ppm(X ~ Z1 + Z2 + .. Zp)

where Z1, Z2, ..., Zp are pixel images or functions.

Important notes:

  1. The model is expressed in terms of the log of the intensity.

  2. The covariates \(Z_1(u), \ldots, Z_p(u)\) (called the “canonical covariates”) can be anything; they are not necessarily the same as the original variables that we were given; they could be transformations and combinations of the original variables.

Fit by maximum likelihood

The Poisson process with intensity function \(\lambda_\theta(u)\), controlled by a parameter vector \(\theta\), has log-likelihood \[ \log L(\theta) = \sum_{i=1}^n \log \lambda_\theta(x_i) - \int_W \lambda_\theta(u) \, {\rm d} u. \] The value of \(\theta\) which maximises \(\log L(\theta)\) is taken as the parameter estimate \(\hat\theta\).

From \(\hat\theta\) we can compute the fitted intensity \(\hat\lambda(u) = \lambda_{\hat\theta}(u)\) and hence we can generate simulated realisations.

Using the likelihood we are able to compute confidence intervals, perform analysis of deviance, conduct hypothesis tests, etc.

Example: Murchison gold data

Using the Murchison data from before,

X <- murchison$gold
L <- murchison$faults
X <- rescale(X, 1000, "km")
L <- rescale(L, 1000, "km")
D <- distfun(L)
fit <- ppm(X ~ D)

The formula implies that the model is \[\log\lambda(u) = \beta_0 + \beta_1 D(u)\] where \(D(u)\) is the distance covariate (distance from location \(u\) to nearest geological fault) and \(\beta_0, \beta_1\) are the regression coefficients. In other words, the model says that the intensity of gold deposits is an exponentially decreasing function of distance from the nearest fault.

The result of ppm is a fitted model object of class "ppm". There are many methods for this class:

fit
## Nonstationary Poisson process
## 
## Log intensity:  ~D
## 
## Fitted trend coefficients:
## (Intercept)           D 
##  -4.3412775  -0.2607664 
## 
##               Estimate       S.E.    CI95.lo    CI95.hi Ztest      Zval
## (Intercept) -4.3412775 0.08556260 -4.5089771 -4.1735779   *** -50.73802
## D           -0.2607664 0.02018789 -0.3003339 -0.2211988   *** -12.91697
coef(fit)
## (Intercept)           D 
##  -4.3412775  -0.2607664
confint(fit)
##                  2.5 %     97.5 %
## (Intercept) -4.5089771 -4.1735779
## D           -0.3003339 -0.2211988
anova(fit, test="Chi")
## Analysis of Deviance Table
## Terms added sequentially (first to last)
## 
##      Df Deviance Npar  Pr(>Chi)    
## NULL                1              
## D     1   590.92    2 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(predict(fit))

plot(simulate(fit, drop=TRUE))
plot(L, add=TRUE, col=3)

To visualise the intensity of the model as a function of one of the covariates, we can use the command effectfun:

plot(effectfun(fit, "D"), xlim=c(0, 20))

Intensity depending on marks

In a multi-type point pattern the points have marks which are categorical values:

mucosa
## Marked planar point pattern: 965 points
## Multitype, with levels = ECL, other 
## window: rectangle = [0, 1] x [0, 0.81] units
plot(mucosa, cols=c(2,3))

We can fit a Poisson model in which the intensity depends on the type of point, using the variable name marks in the model formula.

model0 <- ppm(mucosa ~ marks)
model0
## Stationary multitype Poisson process
## 
## Possible marks: 'ECL' and 'other'
## 
## Log intensity:  ~marks
## 
## Intensities:
##   beta_ECL beta_other 
##   109.8765  1081.4815 
## 
##             Estimate      S.E.  CI95.lo  CI95.hi Ztest     Zval
## (Intercept) 4.699357 0.1059998 4.491602 4.907113   *** 44.33365
## marksother  2.286730 0.1112542 2.068675 2.504784   *** 20.55409
coef(model0)
## (Intercept)  marksother 
##    4.699357    2.286730
plot(predict(model0), equal.ribbon=TRUE)

In the formula, the marks variable is a categorical variable. The effect of the model formula mucosa ~ marks is to estimate a different intensity for each level, that is, a different intensity for each type of point. The model formula mucosa ~ marks is equivalent to saying that the intensity of the points of type \(i\) is \[ \lambda_i(u) = \alpha_i \] for each \(i = 1, 2, \ldots\) where \(\alpha_1, \alpha_2, \ldots\) are the different constant intensities to be estimated. The actual printed output will depend on the convention for handling “contrasts” in linear models.

The marks variable can be combined with other explanatory variables (names x and y refer to the cartesian coordinates):

model1 <- ppm(mucosa ~ marks + y)
model1
## Nonstationary multitype Poisson process
## 
## Possible marks: 'ECL' and 'other'
## 
## Log intensity:  ~marks + y
## 
## Fitted trend coefficients:
## (Intercept)  marksother           y 
##    5.131273    2.286730   -1.156055 
## 
##              Estimate      S.E.   CI95.lo    CI95.hi Ztest      Zval
## (Intercept)  5.131273 0.1164479  4.903039  5.3595066   *** 44.064966
## marksother   2.286730 0.1112542  2.068675  2.5047840   *** 20.554089
## y           -1.156055 0.1406821 -1.431787 -0.8803236   *** -8.217504
coef(model1)
## (Intercept)  marksother           y 
##    5.131273    2.286730   -1.156055
plot(predict(model1))

The model formula ~marks + y states that \[ \log \lambda_i((x,y)) = \gamma_i + \beta y \] where \(\gamma_1, \gamma_2, \ldots\) and \(\beta\) are parameters. That is, the dependence on the \(y\) coordinate has the same “slope” coefficient \(\beta\) for each type of point, but different types of points have different abundance overall.

plot(effectfun(model1, "y", marks="other"), log(.y) ~ .x, ylim=c(4,8), col=2, main="")
plot(effectfun(model1, "y", marks="ECL"), add=TRUE, col=3, log(.y) ~ .x)
legend("bottomleft", lwd=c(1,1), col=c(2,3), legend=c("other", "ECL"))

model2 <- ppm(mucosa ~ marks * y)
model2
## Nonstationary multitype Poisson process
## 
## Possible marks: 'ECL' and 'other'
## 
## Log intensity:  ~marks * y
## 
## Fitted trend coefficients:
##  (Intercept)   marksother            y marksother:y 
##     5.884603     1.452251    -3.862202     2.938790 
## 
##               Estimate      S.E.   CI95.lo   CI95.hi Ztest      Zval
## (Intercept)   5.884603 0.1635842  5.563984  6.205222   *** 35.972936
## marksother    1.452251 0.1749458  1.109364  1.795139   ***  8.301149
## y            -3.862202 0.5616953 -4.963104 -2.761299   *** -6.875973
## marksother:y  2.938790 0.5804890  1.801053  4.076528   ***  5.062611
coef(model2)
##  (Intercept)   marksother            y marksother:y 
##     5.884603     1.452251    -3.862202     2.938790
plot(predict(model2))

The model formula ~marks * y states that \[ \log \lambda_i((x,y)) = \gamma_i + \beta_i y \] where \(\gamma_1, \gamma_2, \ldots\) and \(\beta_1,\beta_2, \ldots\) are parameters. The intensity may depend on the \(y\) coordinate in a completely different way for different types of points.

plot(effectfun(model2, "y", marks="other"), log(.y) ~ .x, col=2, ylim=c(2,8), main="")
plot(effectfun(model2, "y", marks="ECL"), add=TRUE, col=3, log(.y) ~ .x)
legend("bottomleft", lwd=c(1,1), col=c(2,3), legend=c("other", "ECL"))

Parametric estimation of spatially-varying probability

When we have fitted a point process model to a multi-type point pattern, we can compute ratios of the intensities of different types. This is automated in relrisk.ppm:

plot(relrisk(model2, casecontrol=FALSE))

Parametric test for segregation

One way to test for segregation is to compare two models, with the null model stating that there is no segregation:

nullmodel <- ppm(mucosa ~ marks + polynom(x, y, 2))
altmodel <- ppm(mucosa ~ marks * polynom(x, y, 2))
anova(nullmodel, altmodel, test="Chi")
## Analysis of Deviance Table
## 
## Model 1: ~marks + (x + y + I(x^2) + I(x * y) + I(y^2))    Poisson
## Model 2: ~marks * (x + y + I(x^2) + I(x * y) + I(y^2))    Poisson
##   Npar Df Deviance  Pr(>Chi)    
## 1    7                          
## 2   12  5   44.834 1.568e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1