Local Moran poisson dist

311c6185 · lea.douchet_ird.fr · 74607a99 · 311c6185 · 311c6185 · 311c6185
Commit 311c6185 authored 2 years ago by lea.douchet_ird.fr
--- a/07-basic_statistics.qmd
+++ b/07-basic_statistics.qmd
@@ -4,7 +4,7 @@ bibliography: references.bib

 # Basic statistics for spatial analysis

-This section aims at providing some basic statistical tools to study the spatial distribution of epidemiological data.
+This section aims at providing some basic statistical tools to study the spatial distribution of epidemiological data. If you wish to go further into these analysis and their limitations you can consult the tutorial "[Spatial Epidemiology](https://mkram01.github.io/EPI563-SpatialEPI/index.html)" from M. Kramer from which the statistical analysis of his section were adapted. 

 ## Import and visualize epidemiological data

@@ -222,20 +222,56 @@ For each district $i$, the Moran's statistics is :
 $$I_i = \frac{(Y_i-\bar{Y})}{\sum_{i=1}^N(Y_i-\bar{Y})^2}\sum_{j=1}^Nw_{ij}(Y_j - \bar{Y}) \text{ with }  I = \sum_{i=1}^NI_i/N$$
 :::

-The `localmoran()`function from the package `spdep` treats the variable of interest as if it was normally distributed. In some cases, this assumption could be reasonable for incidence rate, especially when the areal units of analysis have sufficiently large population count suggesting that the values have similar level of variances.
+The `localmoran()`function from the package `spdep` treats the variable of interest as if it was normally distributed. In some cases, this assumption could be reasonable for incidence rate, especially when the areal units of analysis have sufficiently large population count suggesting that the values have similar level of variances. Unfortunately, the local moran's test has not been implemented for poisson distribution (population not large enough in some districts) in `spdep` package. However Bivand **et al.** [@bivand2008applied] provided some code to manual perform the analysis using poisson distribution and was further implemented in the course "[Spatial Epidemiology](https://mkram01.github.io/EPI563-SpatialEPI/index.html)" .
+
+

 ```{r LocalMoransI, eval = TRUE, echo = TRUE, nm = TRUE, fig.width=8, class.output="code-out", warning=FALSE, message=FALSE}

-lm_test <- localmoran(x = district$incidence,
-                  listw = q_listw) 
+# Step 1 - Create the standardized deviation of observed from expected
+sd_lm <- (district$cases - district$expected) / sqrt(district$expected)
+
+# Step 2 - Create a spatially lagged version of standardized deviation of neighbors
+wsd_lm <- lag.listw(q_listw, sd_lm)
+
+# Step 3 - the local Moran's I is the product of step 1 and step 2
+district$I_lm <- sd_lm * wsd_lm
+
+# Step 4 - setup parameters for simulation of the null distribution
+
+# Specify number of simulations to run
+nsim <- 499
+
+# Specify dimensions of result based on number of regions
+N <- length(district$expected)
+
+# Create a matrix of zeros to hold results, with a row for each county, and a column for each simulation
+sims <- matrix(0, ncol = nsim, nrow = N)
+
+# Step 5 - Start a for-loop to iterate over simulation columns
+for(i in 1:nsim){
+  y <- rpois(N, lambda = district$expected) # generate a random event count, given expected
+  sd_lmi <- (y - district$expected) / sqrt(district$expected) # standardized local measure
+  wsd_lmi <- lag.listw(q_listw, sd_lmi) # standardized spatially lagged measure
+  sims[, i] <- sd_lmi * wsd_lmi # this is the I(i) statistic under this iteration of null
+}

-summary(lm_test)
+hist(sims[1,])
+# Step 6 - For each county, test where the observed value ranks with respect to the null simulations
+xrank <- apply(cbind(district$I_lm, sims), 1, function(x) rank(x)[1])

+# Step 7 - Calculate the difference between observed rank and total possible (nsim)
+diff <- nsim - xrank
+diff <- ifelse(diff > 0, diff, 0)
+
+# Step 8 - Assuming a uniform distribution of ranks, calculate p-value for observed
+# given the null distribution generate from simulations
+district$pval_lm <- punif((diff + 1) / (nsim + 1))
 ```

-`localmoran()`returns for each district fives outputs : the local moran indice $I_i$, the is the expected value of the Moran's I statistic under the null hypothesis $E_i$, the variance of each local moran's I statistics Var.Ii, the standardized deviation of the local Moran's I statistic Z.Ii and Pr(Z\>0) is the p-value.
+For each district, we obtain a p-value based on permutations process

-A conventional way of plotting these results is to classify the district into 5 classes based on local Moran's I outputs. Classification is performed based on a comparison of the scaled incidence in the district compared to the scaled weighted averaged incidence of it neighboring districts (computed with `lag.listw()`) :
+A conventional way of plotting these results is to classify the districts into 5 classes based on local Moran's I outputs. The classification of cluster that are significantly autocorrelated to their neighbors is performed based on a comparison of the scaled incidence in the district compared to the scaled weighted averaged incidence of it neighboring districts (computed with `lag.listw()`) :

 -    Districts that have higher-than-average rates in both index regions and their neighbors and showing statistically significant positive values for the local $I_i$ statistic are defined as __High-High__ (hotspot of the disease)

@@ -248,8 +284,6 @@ A conventional way of plotting these results is to classify the district into 5
 -   Districts with non-significant values for the $I_i$ statistic are defined as __Non-significant__.


-In this example, we adjusted for multiple testing using bonferroni correction, i.e. the p-value are multiplied by the number of tests. 
-
 ```{r LocalMoransI_plt, eval = TRUE, echo = TRUE, nm = TRUE, fig.width=8, class.output="code-out", warning=FALSE, message=FALSE}

 # create lagged local raw_rate - in other words the average of the queen neighbors value
@@ -257,8 +291,8 @@ In this example, we adjusted for multiple testing using bonferroni correction, i
 district$lag_std   <- scale(lag.listw(q_listw, var = district$incidence))
 district$incidence_std <- scale(district$incidence)

-# adjust for pvalues
-district$lm_pv <- p.adjust(lm_test[,5], "bonferroni")
+# extract pvalues
+# district$lm_pv <- lm_test[,5]

 # Classify local moran's outputs
 district$lm_class <- NA
@@ -266,7 +300,7 @@ district$lm_class[district$incidence_std >=0 & district$lag_std >=0] <- 'High-Hi
 district$lm_class[district$incidence_std <=0 & district$lag_std <=0] <- 'Low-Low'
 district$lm_class[district$incidence_std <=0 & district$lag_std >=0] <- 'Low-High'
 district$lm_class[district$incidence_std >=0 & district$lag_std <=0] <- 'High-Low'
-district$lm_class[district$lm_pv >= 0.5] <- 'Non-significant'
+district$lm_class[district$pval_lm >= 0.05] <- 'Non-significant'

 district$lm_class <- factor(district$lm_class, levels=c("High-High", "Low-Low", "High-Low",  "Low-High", "Non-significant") )

@@ -277,7 +311,7 @@ mf_map(x = district,
       cex = 2,
       col_na = "white",
       #val_order = c("High-High", "Low-Low", "High-Low",  "Low-High", "Non-significant") ,
-       pal = c("#6D0026" , "#7FABD3" , "white") , # "blue", "#FF755F",
+       pal = c("#6D0026" , "blue",  "white") , # "#FF755F","#7FABD3" ,
       leg_title = "Clusters")

 mf_layout(title = "Cluster using Local moran'I statistic")
@@ -286,6 +320,8 @@ mf_layout(title = "Cluster using Local moran'I statistic")

 ```

+
+
 ### Spatial scan statistics

 While Moran's indice focuses on testing for autocorrelation between neighboring polygons (under the null assumption of spatial independance), the spatial scan statistic aims at identifying an abnormal higher risk in a given region compared to the risk outside of this region (under the null assumption of homogeneous distribution). The conception of a cluster is therefore different between the two methods.
@@ -410,3 +446,8 @@ In this example, the expected number of cases was defined using the population c

 In addition, this cluster analysis was performed solely using the spatial scan but you should keep in mind that this method of cluster detection can be implemented for spatio-temporal data as well where the cluster definition is an abnormal number of cases in a delimited spatial area and during a given period of time. The windows of observation are therefore defined for a different center, radius and period of time. You should look at the function `scan_ep_poisson()` function in the package `scanstatistic` [@scanstatistics] for this analysis.
 :::
+
+
+
+
+
--- a/public/07-basic_statistics.html
+++ b/public/07-basic_statistics.html
--- a/public/07-basic_statistics_files/figure-html/LocalMoransI-1.png
+++ b/public/07-basic_statistics_files/figure-html/LocalMoransI-1.png
--- a/public/07-basic_statistics_files/figure-html/LocalMoransI_plt-1.png
+++ b/public/07-basic_statistics_files/figure-html/LocalMoransI_plt-1.png
--- a/public/07-basic_statistics_files/figure-html/MoransI-1.png
+++ b/public/07-basic_statistics_files/figure-html/MoransI-1.png
--- a/public/07-basic_statistics_files/figure-html/inc_visualization-1.png
+++ b/public/07-basic_statistics_files/figure-html/inc_visualization-1.png
--- a/public/07-basic_statistics_files/figure-html/kd_test-1.png
+++ b/public/07-basic_statistics_files/figure-html/kd_test-1.png
--- a/public/search.json
+++ b/public/search.json
--- a/references.bib
+++ b/references.bib
@@ -443,4 +443,12 @@ Library},
 	year = {2017},
 	date = {2017},
 	url = {https://CRAN.R-project.org/package=rnaturalearth}
+}
+
+@book{bivand2008applied,
+  title={Applied spatial data analysis with R},
+  author={Bivand, Roger S and Pebesma, Edzer J and G{\'o}mez-Rubio, Virgilio and Pebesma, Edzer Jan},
+  volume={747248717},
+  year={2008},
+  publisher={Springer}
 }
\ No newline at end of file