tes hypothesis of kulldorff

e849094c · lea.douchet_ird.fr · 07cd4eca · e849094c · e849094c · 07cd4eca
Commit e849094c authored 2 years ago by lea.douchet_ird.fr
--- a/07-basic_statistics.qmd
+++ b/07-basic_statistics.qmd
@@ -43,7 +43,7 @@ mf_map(x = cases, lwd = .5, col = "#990000", pch = 20, add = TRUE)

 ```

-In epidemiology, the true meaning of point is very questionable. If it usually gives the location of an observation, we cannot precisely tell if this observation represents an event of interest (e.g., illness, death, ...) or a person at risk (e.g., a participant that may or may not experience the disease). Considering a ratio of event compared to a population at risk is often more informative than just considering cases. Administrative divisions of countries appear as great areal units for cases aggregation since they make available data on population count and structures. In this study, we will use the district as the areal unit of the study.
+In epidemiology, the true meaning of point is very questionable. If it usually gives the location of an observation, we cannot precisely tell if this observation represents an event of interest (e.g., illness, death, ...) or a person at risk (e.g., a participant that may or may not experience the disease). If you can consider that the population at risk is uniformly distributed in small area (a city for example), this is likely not the case at a country scale. Considering a ratio of event compared to a population at risk is often more informative than just considering cases. Administrative divisions of countries appear as great areal units for cases aggregation since they make available data on population count and structures. In this study, we will use the district as the areal unit of the study.

 ```{r district_aggregate, eval = TRUE, echo = TRUE, nm = TRUE, fig.width=8, class.output="code-out", warning=FALSE, message=FALSE}
 # Aggregate cases over districts
@@ -213,7 +213,7 @@ plot(m_test)

 The Moran's statistics is here $I =$ `r signif(m_test$t0, 2)`. When comparing its value to the H0 distribution (built under `r m_test$R` simulations), the probability of observing such a I value under the null hypothesis, i.e. the distribution of cases is spatially independent, is $p_{value} =$ `r signif(( 1+ (sum((-abs(as.numeric(m_test$t0-mean(m_test$t))))>as.numeric(m_test$t-mean(m_test$t)))) + (sum(abs(as.numeric(m_test$t0-mean(m_test$t)))<as.numeric(m_test$t-mean(m_test$t)))) )/(m_test$R+1), 2)`. We therefore reject H0 with error risk of $\alpha = 5\%$. The distribution of cases is therefore autocorrelated across districts in Cambodia.

-#### Moran's I local test
+#### The Local Moran's I LISA test

 The global Moran's test provides us a global statistical value informing whether autocorrelation occurs over the territory but does not inform on where does these correlations occurs, i.e., what is the locations of the clusters. To identify such cluster, we can decompose the Moran's I statistic to extract local information of the level of correlation of each district and its neighbors. This is called the Local Moran's I LISA statistic. Because the Local Moran's I LISA statistic test each district for autocorrelation independently, concern is raised about multiple testing limitations that increase the Type I error ($\alpha$) of the statistical tests. The use of local test should therefore be study in light of explore and describes clusters once the global test detected autocorrelation.

@@ -228,7 +228,6 @@ $$I_i = \frac{(Y_i-\bar{Y})}{\sum_{i=1}^N(Y_i-\bar{Y})^2}\sum_{j=1}^Nw_{ij}(Y_j
 The `localmoran()`function from the package `spdep` treats the variable of interest as if it was normally distributed. In some cases, this assumption could be reasonable for incidence rate, especially when the areal units of analysis have sufficiently large population count suggesting that the values have similar level of variances. Unfortunately, the local Moran’s test has not been implemented for Poisson distribution (population not large enough in some districts) in `spdep` package. However, Bivand **et al.** [@bivand2008applied] provided some code to manual perform the analysis using Poisson distribution and was further implemented in the course "[Spatial Epidemiology](https://mkram01.github.io/EPI563-SpatialEPI/index.html)”.


-
 ```{r LocalMoransI, eval = TRUE, echo = TRUE, nm = TRUE, fig.width=8, class.output="code-out", warning=FALSE, message=FALSE}

 # Step 1 - Create the standardized deviation of observed from expected
@@ -259,7 +258,6 @@ for(i in 1:nsim){
  sims[, i] <- sd_lmi * wsd_lmi # this is the I(i) statistic under this iteration of null
 }

-hist(sims[1,])
 # Step 6 - For each county, test where the observed value ranks with respect to the null simulations
 xrank <- apply(cbind(district$I_lm, sims), 1, function(x) rank(x)[1])

@@ -319,8 +317,6 @@ mf_map(x = district,

 mf_layout(title = "Cluster using Local Moran's I statistic")

-
-
 ```


@@ -329,16 +325,35 @@ mf_layout(title = "Cluster using Local Moran's I statistic")

 While Moran's indices focus on testing for autocorrelation between neighboring polygons (under the null assumption of spatial independence), the spatial scan statistic aims at identifying an abnormal higher risk in a given region compared to the risk outside of this region (under the null assumption of homogeneous distribution). The conception of a cluster is therefore different between the two methods.

-The function `kulldorff` from the package `SpatialEpi` [@SpatialEpi] is a simple tool to implement spatial-only scan statistics. Briefly, the kulldorff scan statistics scan the area for clusters using several steps:
+The function `kulldorff` from the package `SpatialEpi` [@SpatialEpi] is a simple tool to implement spatial-only scan statistics. 
+
+::: callout-note
+##### Kulldorf test
+
+Under the kulldorff test, the statistics hypotheses are:
+
+-   **H0**: the risk is constant over the area, i.e., there is a spatial homogeneity of the incidence.
+
+-   **H1**: a particular window have higher incidence than the rest of the area , i.e., there is a spatial heterogeneity of incidence.
+
+:::
+
+
+Briefly, the kulldorff scan statistics scan the area for clusters using several steps:

 1.  It create a circular window of observation by defining a single location and an associated radius of the windows varying from 0 to a large number that depends on population distribution (largest radius could include 50% of the population).

 2.  It aggregates the count of events and the population at risk (or an expected count of events) inside and outside the window of observation.

-3.  Finally, it computes the likelihood ratio to test whether the risk is equal inside versus outside the windows (H0) or greater inside the observed window
+3.  Finally, it computes the likelihood ratio and test whether the risk is equal inside versus outside the windows (H0) or greater inside the observed window (H1). The H0 distribution is estimated by simulating the distribution of counts under the null hypothesis (homogeneous risk).

 4.  These 3 steps are repeated for each location and each possible windows-radii.

+
+While we test the significance of a large number of observation windows, one can raise concern about multiple testing and Type I error. This approach however suggest that we are not interest in a set of signifiant cluster but only in a most-likely cluster. This **a priori** restriction eliminate concern for multpile comparison since the test is simplified to a statistically significance of one single most-likely cluster.
+
+Because we tested all-possible locations and window-radius, we can also choose to look at secondary clusters. In this case, you should keep in mind that increasing the number of secondary cluster you select, increases the risk for Type I error.
+
 ```{r spatialEpi, eval = TRUE, echo = TRUE, nm = TRUE, class.output="code-out", warning=FALSE, message=FALSE}

 library("SpatialEpi")
@@ -370,7 +385,8 @@ kd_Wfever <- kulldorff(district_xy,

 ```

-All outputs are saved into an R object, here called `kd_Wfever`. Unfortunately, the package did not develop any summary and visualization of the results but we can explore the output object.
+The function plot the histogram of the distribution of log-likelihood ratio simulated under the null hypothesis that is estimated based on Monte Carlo simulations. The observed value of the most significant cluster identified from all possible scans is compared to the distribution to determine significance. All outputs are saved into an R object, here called `kd_Wfever`. Unfortunately, the package did not develop any summary and visualization of the results but we can explore the output object.
+

 ```{r kd_outputs, eval = TRUE, echo = TRUE, nm = TRUE, fig.width=6, class.output="code-out", warning=FALSE, message=FALSE}
 names(kd_Wfever)
@@ -447,7 +463,7 @@ mf_layout(title = "Cluster using kulldorf scan statistic")

 In this example, the expected number of cases was defined using the population count but note that standardization over other variables as age could also be implemented with the `strata` parameter in the `kulldorff()` function.

-In addition, this cluster analysis was performed solely using the spatial scan but you should keep in mind that this method of cluster detection can be implemented for spatio-temporal data as well where the cluster definition is an abnormal number of cases in a delimited spatial area and during a given period of time. The windows of observation are therefore defined for a different center, radius and period of time. You should look at the function `scan_ep_poisson()` function in the package `scanstatistic` [@scanstatistics] for this analysis.
+In addition, this cluster analysis was performed solely using the spatial scan but you should keep in mind that this method of cluster detection can be implemented for spatio-temporal data as well where the cluster definition is an abnormal number of cases in a delimited spatial area and during a given period of time. The windows of observation are therefore defined for a different center, radius and time-period. You should look at the function `scan_ep_poisson()` function in the package `scanstatistic` [@scanstatistics] for this analysis.
 :::



--- a/public/07-basic_statistics.html
+++ b/public/07-basic_statistics.html
--- a/public/07-basic_statistics_files/figure-html/LocalMoransI-1.png
+++ b/public/07-basic_statistics_files/figure-html/LocalMoransI-1.png
--- a/public/07-basic_statistics_files/figure-html/LocalMoransI_plt-1.png
+++ b/public/07-basic_statistics_files/figure-html/LocalMoransI_plt-1.png
--- a/public/07-basic_statistics_files/figure-html/MoransI-1.png
+++ b/public/07-basic_statistics_files/figure-html/MoransI-1.png
--- a/public/07-basic_statistics_files/figure-html/incidence_visualization-1.png
+++ b/public/07-basic_statistics_files/figure-html/incidence_visualization-1.png
--- a/public/07-basic_statistics_files/figure-html/kd_test-1.png
+++ b/public/07-basic_statistics_files/figure-html/kd_test-1.png
--- a/public/search.json
+++ b/public/search.json