Skip to content
Snippets Groups Projects
Commit 311c6185 authored by lea.douchet_ird.fr's avatar lea.douchet_ird.fr
Browse files

Local Moran poisson dist

parent 74607a99
No related branches found
No related tags found
No related merge requests found
Pipeline #1065 passed
......@@ -4,7 +4,7 @@ bibliography: references.bib
# Basic statistics for spatial analysis
This section aims at providing some basic statistical tools to study the spatial distribution of epidemiological data.
This section aims at providing some basic statistical tools to study the spatial distribution of epidemiological data. If you wish to go further into these analysis and their limitations you can consult the tutorial "[Spatial Epidemiology](https://mkram01.github.io/EPI563-SpatialEPI/index.html)" from M. Kramer from which the statistical analysis of his section were adapted.
## Import and visualize epidemiological data
......@@ -222,20 +222,56 @@ For each district $i$, the Moran's statistics is :
$$I_i = \frac{(Y_i-\bar{Y})}{\sum_{i=1}^N(Y_i-\bar{Y})^2}\sum_{j=1}^Nw_{ij}(Y_j - \bar{Y}) \text{ with } I = \sum_{i=1}^NI_i/N$$
:::
The `localmoran()`function from the package `spdep` treats the variable of interest as if it was normally distributed. In some cases, this assumption could be reasonable for incidence rate, especially when the areal units of analysis have sufficiently large population count suggesting that the values have similar level of variances.
The `localmoran()`function from the package `spdep` treats the variable of interest as if it was normally distributed. In some cases, this assumption could be reasonable for incidence rate, especially when the areal units of analysis have sufficiently large population count suggesting that the values have similar level of variances. Unfortunately, the local moran's test has not been implemented for poisson distribution (population not large enough in some districts) in `spdep` package. However Bivand **et al.** [@bivand2008applied] provided some code to manual perform the analysis using poisson distribution and was further implemented in the course "[Spatial Epidemiology](https://mkram01.github.io/EPI563-SpatialEPI/index.html)" .
```{r LocalMoransI, eval = TRUE, echo = TRUE, nm = TRUE, fig.width=8, class.output="code-out", warning=FALSE, message=FALSE}
lm_test <- localmoran(x = district$incidence,
listw = q_listw)
# Step 1 - Create the standardized deviation of observed from expected
sd_lm <- (district$cases - district$expected) / sqrt(district$expected)
# Step 2 - Create a spatially lagged version of standardized deviation of neighbors
wsd_lm <- lag.listw(q_listw, sd_lm)
# Step 3 - the local Moran's I is the product of step 1 and step 2
district$I_lm <- sd_lm * wsd_lm
# Step 4 - setup parameters for simulation of the null distribution
# Specify number of simulations to run
nsim <- 499
# Specify dimensions of result based on number of regions
N <- length(district$expected)
# Create a matrix of zeros to hold results, with a row for each county, and a column for each simulation
sims <- matrix(0, ncol = nsim, nrow = N)
# Step 5 - Start a for-loop to iterate over simulation columns
for(i in 1:nsim){
y <- rpois(N, lambda = district$expected) # generate a random event count, given expected
sd_lmi <- (y - district$expected) / sqrt(district$expected) # standardized local measure
wsd_lmi <- lag.listw(q_listw, sd_lmi) # standardized spatially lagged measure
sims[, i] <- sd_lmi * wsd_lmi # this is the I(i) statistic under this iteration of null
}
summary(lm_test)
hist(sims[1,])
# Step 6 - For each county, test where the observed value ranks with respect to the null simulations
xrank <- apply(cbind(district$I_lm, sims), 1, function(x) rank(x)[1])
# Step 7 - Calculate the difference between observed rank and total possible (nsim)
diff <- nsim - xrank
diff <- ifelse(diff > 0, diff, 0)
# Step 8 - Assuming a uniform distribution of ranks, calculate p-value for observed
# given the null distribution generate from simulations
district$pval_lm <- punif((diff + 1) / (nsim + 1))
```
`localmoran()`returns for each district fives outputs : the local moran indice $I_i$, the is the expected value of the Moran's I statistic under the null hypothesis $E_i$, the variance of each local moran's I statistics Var.Ii, the standardized deviation of the local Moran's I statistic Z.Ii and Pr(Z\>0) is the p-value.
For each district, we obtain a p-value based on permutations process
A conventional way of plotting these results is to classify the district into 5 classes based on local Moran's I outputs. Classification is performed based on a comparison of the scaled incidence in the district compared to the scaled weighted averaged incidence of it neighboring districts (computed with `lag.listw()`) :
A conventional way of plotting these results is to classify the districts into 5 classes based on local Moran's I outputs. The classification of cluster that are significantly autocorrelated to their neighbors is performed based on a comparison of the scaled incidence in the district compared to the scaled weighted averaged incidence of it neighboring districts (computed with `lag.listw()`) :
- Districts that have higher-than-average rates in both index regions and their neighbors and showing statistically significant positive values for the local $I_i$ statistic are defined as __High-High__ (hotspot of the disease)
......@@ -248,8 +284,6 @@ A conventional way of plotting these results is to classify the district into 5
- Districts with non-significant values for the $I_i$ statistic are defined as __Non-significant__.
In this example, we adjusted for multiple testing using bonferroni correction, i.e. the p-value are multiplied by the number of tests.
```{r LocalMoransI_plt, eval = TRUE, echo = TRUE, nm = TRUE, fig.width=8, class.output="code-out", warning=FALSE, message=FALSE}
# create lagged local raw_rate - in other words the average of the queen neighbors value
......@@ -257,8 +291,8 @@ In this example, we adjusted for multiple testing using bonferroni correction, i
district$lag_std <- scale(lag.listw(q_listw, var = district$incidence))
district$incidence_std <- scale(district$incidence)
# adjust for pvalues
district$lm_pv <- p.adjust(lm_test[,5], "bonferroni")
# extract pvalues
# district$lm_pv <- lm_test[,5]
# Classify local moran's outputs
district$lm_class <- NA
......@@ -266,7 +300,7 @@ district$lm_class[district$incidence_std >=0 & district$lag_std >=0] <- 'High-Hi
district$lm_class[district$incidence_std <=0 & district$lag_std <=0] <- 'Low-Low'
district$lm_class[district$incidence_std <=0 & district$lag_std >=0] <- 'Low-High'
district$lm_class[district$incidence_std >=0 & district$lag_std <=0] <- 'High-Low'
district$lm_class[district$lm_pv >= 0.5] <- 'Non-significant'
district$lm_class[district$pval_lm >= 0.05] <- 'Non-significant'
district$lm_class <- factor(district$lm_class, levels=c("High-High", "Low-Low", "High-Low", "Low-High", "Non-significant") )
......@@ -277,7 +311,7 @@ mf_map(x = district,
cex = 2,
col_na = "white",
#val_order = c("High-High", "Low-Low", "High-Low", "Low-High", "Non-significant") ,
pal = c("#6D0026" , "#7FABD3" , "white") , # "blue", "#FF755F",
pal = c("#6D0026" , "blue", "white") , # "#FF755F","#7FABD3" ,
leg_title = "Clusters")
mf_layout(title = "Cluster using Local moran'I statistic")
......@@ -286,6 +320,8 @@ mf_layout(title = "Cluster using Local moran'I statistic")
```
### Spatial scan statistics
While Moran's indice focuses on testing for autocorrelation between neighboring polygons (under the null assumption of spatial independance), the spatial scan statistic aims at identifying an abnormal higher risk in a given region compared to the risk outside of this region (under the null assumption of homogeneous distribution). The conception of a cluster is therefore different between the two methods.
......@@ -410,3 +446,8 @@ In this example, the expected number of cases was defined using the population c
In addition, this cluster analysis was performed solely using the spatial scan but you should keep in mind that this method of cluster detection can be implemented for spatio-temporal data as well where the cluster definition is an abnormal number of cases in a delimited spatial area and during a given period of time. The windows of observation are therefore defined for a different center, radius and period of time. You should look at the function `scan_ep_poisson()` function in the package `scanstatistic` [@scanstatistics] for this analysis.
:::
This diff is collapsed.
public/07-basic_statistics_files/figure-html/LocalMoransI-1.png

15.7 KiB

public/07-basic_statistics_files/figure-html/LocalMoransI_plt-1.png

42.4 KiB | W: | H:

public/07-basic_statistics_files/figure-html/LocalMoransI_plt-1.png

43.1 KiB | W: | H:

public/07-basic_statistics_files/figure-html/LocalMoransI_plt-1.png
public/07-basic_statistics_files/figure-html/LocalMoransI_plt-1.png
public/07-basic_statistics_files/figure-html/LocalMoransI_plt-1.png
public/07-basic_statistics_files/figure-html/LocalMoransI_plt-1.png
  • 2-up
  • Swipe
  • Onion skin
public/07-basic_statistics_files/figure-html/MoransI-1.png

16.5 KiB | W: | H:

public/07-basic_statistics_files/figure-html/MoransI-1.png

16.5 KiB | W: | H:

public/07-basic_statistics_files/figure-html/MoransI-1.png
public/07-basic_statistics_files/figure-html/MoransI-1.png
public/07-basic_statistics_files/figure-html/MoransI-1.png
public/07-basic_statistics_files/figure-html/MoransI-1.png
  • 2-up
  • Swipe
  • Onion skin
public/07-basic_statistics_files/figure-html/inc_visualization-1.png

349 KiB | W: | H:

public/07-basic_statistics_files/figure-html/inc_visualization-1.png

52.4 KiB | W: | H:

public/07-basic_statistics_files/figure-html/inc_visualization-1.png
public/07-basic_statistics_files/figure-html/inc_visualization-1.png
public/07-basic_statistics_files/figure-html/inc_visualization-1.png
public/07-basic_statistics_files/figure-html/inc_visualization-1.png
  • 2-up
  • Swipe
  • Onion skin
public/07-basic_statistics_files/figure-html/kd_test-1.png

15.3 KiB | W: | H:

public/07-basic_statistics_files/figure-html/kd_test-1.png

15.4 KiB | W: | H:

public/07-basic_statistics_files/figure-html/kd_test-1.png
public/07-basic_statistics_files/figure-html/kd_test-1.png
public/07-basic_statistics_files/figure-html/kd_test-1.png
public/07-basic_statistics_files/figure-html/kd_test-1.png
  • 2-up
  • Swipe
  • Onion skin
This diff is collapsed.
......@@ -443,4 +443,12 @@ Library},
year = {2017},
date = {2017},
url = {https://CRAN.R-project.org/package=rnaturalearth}
}
@book{bivand2008applied,
title={Applied spatial data analysis with R},
author={Bivand, Roger S and Pebesma, Edzer J and G{\'o}mez-Rubio, Virgilio and Pebesma, Edzer Jan},
volume={747248717},
year={2008},
publisher={Springer}
}
\ No newline at end of file
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment