R Tutorial : Further testing (Spatial Statistics in R)
---
One problem with the quadrat test is that you have to choose a set of sub-regions:
Too few, and there aren't enough counts for the statistical test to have much power.
Too many, and each region contains only a few points, so you lose power that way too.
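Here is a minimal sketch of that trade-off, assuming the spatstat package; X is a hypothetical point pattern simulated for illustration, not the course data:

library(spatstat)

# Simulate a hypothetical completely spatially random pattern in the unit square
X <- rpoispp(lambda = 100)

# The nx and ny arguments set the grid of sub-regions - the choice is up to you
quadrat.test(X, nx = 2, ny = 2)     # few regions: few counts for the test
quadrat.test(X, nx = 10, ny = 10)   # many regions: few points per region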
There are alternative tests that don't depend on arbitrary subdivisions; instead, they rely on estimating properties of the spatial point process from the events themselves.
One such property is the nearest-neighbour distribution. Look at each event in your pattern and find the distance to the nearest other event.
Do this for every event, and plot a histogram to give an estimate of the probability density function of the nearest neighbour distribution.
So here, A's nearest neighbour is B, B's nearest neighbour is A, and C's nearest neighbour is B. That gives us three distances.
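A minimal sketch of this step, again with spatstat and a hypothetical simulated pattern X:

library(spatstat)
X <- rpoispp(lambda = 100)   # hypothetical CSR pattern for illustration

# Distance from each event to its nearest neighbouring event
d <- nndist(X)

# The histogram estimates the nearest-neighbour probability density
hist(d, xlab = "distance to nearest neighbour")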
The corresponding cumulative distribution function, the probability of finding a nearest neighbour within a distance d, is named "G". For a completely spatially random process, the theoretical form of G can be worked out exactly: for a process with lambda events per unit area, G(d) = 1 - exp(-lambda * pi * d^2).
It looks like this. spatstat provides the Gest() function to estimate G, given a ppp object and an optional distance vector. The lines in this plot include two corrections for edge effects: events near the edge of the window have less area in which their nearest neighbour could lie, and this biases the estimator. So Gest() has a "correction" argument for choosing among edge-correction algorithms.
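A minimal sketch of calling Gest() on the same hypothetical pattern; "rs" (reduced sample) and "km" (Kaplan-Meier) are two of the edge corrections spatstat offers:

library(spatstat)
X <- rpoispp(lambda = 100)   # hypothetical CSR pattern for illustration

# Estimate G with reduced-sample and Kaplan-Meier edge corrections
G <- Gest(X, correction = c("rs", "km"))

# The plot shows the corrected estimates alongside the theoretical CSR curve
plot(G)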
Another useful summary function for spatial point processes is "Ripley's reduced second-moment measure" - better known as the "K" function.
K is the expected number of events found within a given distance of an event, scaled by the intensity. To estimate it for some given distance d, visit each event in turn and count the number of other events within a circle of radius d.
Take the average. That's K(d).
Do that for a number of values of d and you can plot the function. For a completely spatially random process this gives K(d) = pi * d^2, the area of a circle of radius d. But how much variation should you expect in an estimate of K if you want to use it as a test for complete spatial randomness?
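A minimal sketch of estimating K with spatstat's Kest(), again on a hypothetical simulated pattern:

library(spatstat)
X <- rpoispp(lambda = 100)   # hypothetical CSR pattern for illustration

# Estimate K; under complete spatial randomness K(d) = pi * d^2
K <- Kest(X, correction = "border")
plot(K)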
Well, first create 99 completely spatially random point patterns in your window and compute the 99 estimates of K. You can plot them:
Now compute K for the data and plot it over the simulation estimates.
Is it bigger than the simulations at any point? That's an indication of clustering at that scale. If the data at that point outranks all 99 simulations, you can reject the null hypothesis at a p-value of 0.01. Using 99 simulations just makes the division easy, since the p-value is 1 divided by (99 + 1).
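A minimal sketch of this test; spatstat's envelope() function automates the simulate-and-compare steps described above (X is still the hypothetical simulated pattern):

library(spatstat)
X <- rpoispp(lambda = 100)   # hypothetical CSR pattern for illustration

# Simulate 99 CSR patterns in the same window, estimate K for each, and
# compare the data's K with the pointwise extremes of the simulations
E <- envelope(X, Kest, nsim = 99)
plot(E)

envelope() takes care of simulating, summarising, and ranking the curves, which is handy once you want more than 99 simulations.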
This sort of test, where you generate replicates under your hypothesis, compute test statistics or functions, and compare them with the statistic from the data, is a type of Monte-Carlo test. It's commonly used in spatial statistics because working out the theoretical distribution of these statistics can be tricky.
In the next few exercises you'll explore the nearest neighbour distribution and do some Monte-Carlo testing.
#DataCamp #RTutorial #SpatialStatisticsinR