Text corrections

Commit 1ed3f92d authored 2 years ago by lea.douchet_ird.fr
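
For context, the three-step procedure described in the paragraph this commit adds to `07-basic_statistics.qmd` (observed statistic, Poisson null distribution, comparison) can be sketched in base R as follows. This is only a minimal illustration under stated assumptions: the toy counts, the chain-shaped neighbourhood matrix `w` and the helper `local_i()` are hypothetical and are not the tutorial's own code, which relies on `spdep`.

```r
set.seed(42)

# Hypothetical toy data: 5 areal units arranged in a line
expected <- c(10, 12, 8, 15, 9)  # expected case counts under homogeneous risk
cases    <- c(25, 22, 7, 12, 6)  # observed case counts

# Row-standardised contiguity weights: each unit neighbours the adjacent ones
adj <- matrix(0, 5, 5)
for (i in 1:4) {
  adj[i, i + 1] <- 1
  adj[i + 1, i] <- 1
}
w <- adj / rowSums(adj)

# Local statistic: standardised deviation of a unit times the
# weighted average deviation of its neighbours (assumed simplified form)
local_i <- function(y, e, w) {
  z <- (y - e) / sqrt(e)
  as.vector(z * (w %*% z))
}

# 1) I statistic for the observed data
obs <- local_i(cases, expected, w)

# 2) Null distribution: counts drawn from a Poisson distribution
#    with mean equal to the expected count of each unit
nsim <- 499
sims <- replicate(nsim, local_i(rpois(length(expected), expected), expected, w))

# 3) One-sided pseudo p-value: share of simulated values at least
#    as large as the observed one, for each unit
pval <- (rowSums(sims >= obs) + 1) / (nsim + 1)
```

Small values of `pval` flag units whose local statistic is larger than expected under spatial independence; a two-sided or multiple-testing-adjusted variant would require further choices that are not covered here.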
--- a/07-basic_statistics.qmd
+++ b/07-basic_statistics.qmd
@@ -43,7 +43,7 @@ mf_map(x = cases, lwd = .5, col = "#990000", pch = 20, add = TRUE)

 ```

-In epidemiology, the true meaning of point is very questionable. If it usually gives the location of an observation, we cannot precisely tell if this observation represents an event of interest (e.g., illness, death, ...) or a person at risk (e.g., a participant that may or may not experience the disease). If you can consider that the population at risk is uniformly distributed in small area (a city for example), this is likely not the case at a country scale. Considering a ratio of event compared to a population at risk is often more informative than just considering cases. Administrative divisions of countries appear as great areal units for cases aggregation since they make available data on population count and structures. In this study, we will use the district as the areal unit of the study.
+In epidemiology, the meaning of a point is often ambiguous. Although it usually gives the location of an observation, we cannot always tell whether this observation represents an event of interest (e.g., illness, death, ...) or a person at risk (e.g., a participant who may or may not experience the disease). While the population at risk can be considered uniformly distributed over a small area (within a city, for example), this is unlikely to hold at the country scale. Considering the ratio of events to the population at risk is therefore often more informative than considering cases alone. Administrative divisions are convenient areal units for aggregating cases because data on population counts and structure are available for them. In this study, we will use the district as the areal unit.

 ```{r district_aggregate, eval = TRUE, echo = TRUE, nm = TRUE, fig.width=8, class.output="code-out", warning=FALSE, message=FALSE}
 # Aggregate cases over districts
@@ -225,7 +225,7 @@ For each district $i$, the Local Moran's I statistics is:
 $$I_i = \frac{(Y_i-\bar{Y})}{\sum_{i=1}^N(Y_i-\bar{Y})^2}\sum_{j=1}^Nw_{ij}(Y_j - \bar{Y}) \text{ with }  I = \sum_{i=1}^NI_i/N$$
 :::

-The `localmoran()`function from the package `spdep` treats the variable of interest as if it was normally distributed. In some cases, this assumption could be reasonable for incidence rate, especially when the areal units of analysis have sufficiently large population count suggesting that the values have similar level of variances. Unfortunately, the local Moran’s test has not been implemented for Poisson distribution (population not large enough in some districts) in `spdep` package. However, Bivand **et al.** [@bivand2008applied] provided some code to manual perform the analysis using Poisson distribution and was further implemented in the course "[Spatial Epidemiology](https://mkram01.github.io/EPI563-SpatialEPI/index.html)”.
+The `localmoran()` function from the `spdep` package treats the variable of interest as if it were normally distributed. In some cases, this assumption can be reasonable for incidence rates, especially when the areal units of analysis have sufficiently large population counts, so that the values have similar variances. Unfortunately, the local Moran’s test has not been implemented for the Poisson distribution (needed when the population is not large enough in some districts) in the `spdep` package. However, Bivand **et al.** [@bivand2008applied] provided code to perform the analysis manually using the Poisson distribution, and this code was further implemented in the course "[Spatial Epidemiology](https://mkram01.github.io/EPI563-SpatialEPI/index.html)".


 ```{r LocalMoransI, eval = TRUE, echo = TRUE, nm = TRUE, fig.width=8, class.output="code-out", warning=FALSE, message=FALSE}
@@ -270,7 +270,7 @@ diff <- ifelse(diff > 0, diff, 0)
 district$pval_lm <- punif((diff + 1) / (nsim + 1))
 ```

-For each district, we obtain a p-value based on the comparison of the observed value and permutations process that draw the distribution under the null hypothesis (i.e. the distribution of cases is spatially independent).
+Briefly, the process consists of 1) computing the I statistic for the observed data, 2) estimating the null distribution of the I statistic by random sampling from a Poisson distribution, and 3) comparing the observed I statistic with the null distribution to determine the probability of observing such a value if the number of cases were spatially independent. For each district, we obtain a p-value based on the comparison of the observed value with the null distribution.

 A conventional way of plotting these results is to classify the districts into 5 classes based on local Moran's I output. The classification of cluster that are significantly autocorrelated to their neighbors is performed based on a comparison of the scaled incidence in the district compared to the scaled weighted averaged incidence of it neighboring districts (computed with `lag.listw()`):


--- a/public/07-basic_statistics.html
+++ b/public/07-basic_statistics.html
@@ -2,7 +2,7 @@
 <html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en"><head>

 <meta charset="utf-8">
-<meta name="generator" content="quarto-1.1.251">
+<meta name="generator" content="quarto-1.1.189">

 <meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes">

@@ -237,7 +237,7 @@ div.csl-indent {
  <li><a href="#test-for-spatial-autocorrelation-morans-i-test" id="toc-test-for-spatial-autocorrelation-morans-i-test" class="nav-link" data-scroll-target="#test-for-spatial-autocorrelation-morans-i-test"><span class="toc-section-number">6.2.2</span>  Test for spatial autocorrelation (Moran’s I test)</a>
  <ul class="collapse">
  <li><a href="#the-global-morans-i-test" id="toc-the-global-morans-i-test" class="nav-link" data-scroll-target="#the-global-morans-i-test"><span class="toc-section-number">6.2.2.1</span>  The global Moran’s I test</a></li>
-  <li><a href="#morans-i-local-test" id="toc-morans-i-local-test" class="nav-link" data-scroll-target="#morans-i-local-test"><span class="toc-section-number">6.2.2.2</span>  Moran’s I local test</a></li>
+  <li><a href="#the-local-morans-i-lisa-test" id="toc-the-local-morans-i-lisa-test" class="nav-link" data-scroll-target="#the-local-morans-i-lisa-test"><span class="toc-section-number">6.2.2.2</span>  The Local Moran’s I LISA test</a></li>
  </ul></li>
  <li><a href="#spatial-scan-statistics" id="toc-spatial-scan-statistics" class="nav-link" data-scroll-target="#spatial-scan-statistics"><span class="toc-section-number">6.2.3</span>  Spatial scan statistics</a></li>
  </ul></li>
@@ -263,7 +263,7 @@ div.csl-indent {

 </header>

-<p>This section aims at providing some basic statistical tools to study the spatial distribution of epidemiological data. If you wish to go further into spatial statistics applied to epidemiology and their limitations you can consult the tutorial “<a href="https://mkram01.github.io/EPI563-SpatialEPI/index.html">Spatial Epidemiology</a>” from M. Kramer from which the statistical analysis of this section was adapted. We will use</p>
+<p>This section aims at providing some basic statistical tools to study the spatial distribution of epidemiological data. If you wish to go further into spatial statistics applied to epidemiology and their limitations you can consult the tutorial “<a href="https://mkram01.github.io/EPI563-SpatialEPI/index.html">Spatial Epidemiology</a>” from M. Kramer from which the statistical analysis of this section was adapted.</p>
 <section id="import-and-visualize-epidemiological-data" class="level2" data-number="6.1">
 <h2 data-number="6.1" class="anchored" data-anchor-id="import-and-visualize-epidemiological-data"><span class="header-section-number">6.1</span> Import and visualize epidemiological data</h2>
 <p>In this section, we load data that reference the cases of an imaginary disease, the W fever, throughout Cambodia. Each point corresponds to the geo-localization of a case.</p>
@@ -310,7 +310,7 @@ Projected CRS: WGS 84 / UTM zone 48N
 <p><img src="07-basic_statistics_files/figure-html/cases_visualization-1.png" class="img-fluid" width="768"></p>
 </div>
 </div>
-<p>In epidemiology, the true meaning of point is very questionable. If it usually gives the location of an observation, we cannot precisely tell if this observation represents an event of interest (e.g., illness, death, …) or a person at risk (e.g., a participant that may or may not experience the disease). Considering a ratio of event compared to a population at risk is often more informative than just considering cases. Administrative divisions of countries appear as great areal units for cases aggregation since they make available data on population count and structures. In this study, we will use the district as the areal unit of the study.</p>
+<p>In epidemiology, the meaning of a point is often ambiguous. Although it usually gives the location of an observation, we cannot always tell whether this observation represents an event of interest (e.g., illness, death, …) or a person at risk (e.g., a participant who may or may not experience the disease). While the population at risk can be considered uniformly distributed over a small area (within a city, for example), this is unlikely to hold at the country scale. Considering the ratio of events to the population at risk is therefore often more informative than considering cases alone. Administrative divisions are convenient areal units for aggregating cases because data on population counts and structure are available for them. In this study, we will use the district as the areal unit.</p>
 <div class="cell" data-nm="true">
 <div class="sourceCode cell-code" id="cb5"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Aggregate cases over districts</span></span>
 <span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a>district<span class="sc">$</span>cases <span class="ot">&lt;-</span> <span class="fu">lengths</span>(<span class="fu">st_intersects</span>(district, cases))</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
@@ -385,7 +385,7 @@ To go further …
 </div>
 <div class="callout-body-container callout-body">
 <p>In this example, we standardized the cases distribution for population count. This simple standardization assumes that the risk of contracting the disease is similar for each person. However, assumption does not hold for all diseases and for all observed events since confounding effects can create nuisance into the interpretations (e.g., the number of childhood illness and death outcomes in a district are usually related to the age pyramid) and you should keep in mind that other standardization can be performed based on variables known to have an effect but that you don’t want to analyze (e.g., sex ratio, occupations, age pyramid).</p>
-<p>In addition, one can wonder what does an <span class="math inline">\(SIR \~ 1\)</span> means, i.e., what is the threshold to decide whether the SIR is greater, lower or equivalent to 1. The significant of the SIR can be tested globally (to determine whether or not the incidence is homogeneously distributed) and locally in each district (to determine Which district have an SIR different than 1). We won’t perform these analyses in this tutorial but you can look at the function <code>?achisq.test()</code> (from <code>Dcluster</code> package <span class="citation" data-cites="DCluster">(<a href="references.html#ref-DCluster" role="doc-biblioref">Gómez-Rubio et al. 2015</a>)</span>) and <code>?probmap()</code> (from <code>spdep</code> package <span class="citation" data-cites="spdep">(<a href="references.html#ref-spdep" role="doc-biblioref">R. Bivand et al. 2015</a>)</span>) to compute these statistics.</p>
+<p>In addition, one can wonder what an SIR ~ 1 means, i.e., what threshold decides whether the SIR is greater than, lower than or equivalent to 1. The significance of the SIR can be tested globally (to determine whether or not the incidence is homogeneously distributed) and locally in each district (to determine which districts have an SIR different from 1). We won’t perform these analyses in this tutorial but you can look at the function <code>?achisq.test()</code> (from the <code>DCluster</code> package <span class="citation" data-cites="DCluster">(<a href="references.html#ref-DCluster" role="doc-biblioref">Gómez-Rubio et al. 2015</a>)</span>) and <code>?probmap()</code> (from the <code>spdep</code> package <span class="citation" data-cites="spdep">(<a href="references.html#ref-spdep" role="doc-biblioref">R. Bivand et al. 2015</a>)</span>) to compute these statistics.</p>
 </div>
 </div>
 </section>
@@ -410,7 +410,7 @@ Statistic tests and distributions
 </div>
 </div>
 <div class="callout-body-container callout-body">
-<p>In statistics, problems are usually expressed by defining two hypotheses: the null hypothesis (H0), i.e., an <em>a priori</em> hypothesis of the studied phenomenon (e.g., the situation is a random) and the alternative hypothesis (HA), e.g., the situation is not random. The main principle is to measure how likely the observed situation belong to the ensemble of situation that are possible under the H0 hypothesis.</p>
+<p>In statistics, problems are usually expressed by defining two hypotheses: the null hypothesis (H0), i.e., an <em>a priori</em> hypothesis about the studied phenomenon (e.g., the situation is random), and the alternative hypothesis (H1), e.g., the situation is not random. The main principle is to measure how likely it is that the observed situation belongs to the set of situations that are possible under H0.</p>
 <p>In mathematics, a probability distribution is a mathematical expression that represents what we would expect due to random chance. The choice of the probability distribution relies on the type of data you use (continuous, count, binary). In general, three distribution a used while studying disease rates, the Binomial, the Poisson and the Poisson-gamma mixture (also known as negative binomial) distributions.</p>
 <p>Many the statistical tests assume by default that data are normally distributed. It implies that your variable is continuous and that all data could easily be represented by two parameters, the mean and the variance, i.e., each value have the same level of certainty. If many measure can be assessed under the normality assumption, this is usually not the case in epidemiology with strictly positives rates and count values that 1) does not fit the normal distribution and 2) does not provide with the same degree of certainty since variances likely differ between district due to different population size, i.e., some district have very sparse data (with high variance) while other have adequate data (with lower variance).</p>
 <div class="cell" data-nm="true">
@@ -494,9 +494,9 @@ Moran’s I test
 </div>
 <p>The Moran’s statistics is here <span class="math inline">\(I =\)</span> 0.16. When comparing its value to the H0 distribution (built under 499 simulations), the probability of observing such a I value under the null hypothesis, i.e.&nbsp;the distribution of cases is spatially independent, is <span class="math inline">\(p_{value} =\)</span> 0.01. We therefore reject H0 with error risk of <span class="math inline">\(\alpha = 5\%\)</span>. The distribution of cases is therefore autocorrelated across districts in Cambodia.</p>
 </section>
-<section id="morans-i-local-test" class="level4" data-number="6.2.2.2">
-<h4 data-number="6.2.2.2" class="anchored" data-anchor-id="morans-i-local-test"><span class="header-section-number">6.2.2.2</span> Moran’s I local test</h4>
-<p>The global Moran’s test provides us a global statistical value informing whether autocorrelation occurs over the territory but does not inform on where does these correlations occurs, i.e., what is the locations of the clusters. To identify such cluster, we can decompose the Moran’s I statistic to extract local information of the level of correlation of each district and its neighbors. This is called the Local Moran’s I LISA statistic. Because the Local Moran’s I LISA statistic test each district for autocorrelation independently, concern is raised about multiple testing limitations that increase the Type I error (<span class="math inline">\(\alpha\)</span>) of the statistical tests. The use of local test should therefore be study in light of explore and describes clusters once the global test detected autocorrelation.</p>
+<section id="the-local-morans-i-lisa-test" class="level4" data-number="6.2.2.2">
+<h4 data-number="6.2.2.2" class="anchored" data-anchor-id="the-local-morans-i-lisa-test"><span class="header-section-number">6.2.2.2</span> The Local Moran’s I LISA test</h4>
+<p>The global Moran’s test provides a single statistic indicating whether autocorrelation occurs over the territory, but it does not indicate where these correlations occur, i.e., where the clusters are located. To identify such clusters, we can decompose the Moran’s I statistic to extract local information on the level of correlation between each district and its neighbors. This is called the Local Moran’s I LISA statistic. Because the Local Moran’s I LISA statistic tests each district for autocorrelation independently, it raises multiple-testing concerns: the Type I error (<span class="math inline">\(\alpha\)</span>) of the statistical tests increases. Local tests should therefore be used to explore and describe clusters once the global test has detected autocorrelation.</p>
 <div class="callout-note callout callout-style-default callout-captioned">
 <div class="callout-header d-flex align-content-center">
 <div class="callout-icon-container">
@@ -511,7 +511,7 @@ Statistical test
 <p><span class="math display">\[I_i = \frac{(Y_i-\bar{Y})}{\sum_{i=1}^N(Y_i-\bar{Y})^2}\sum_{j=1}^Nw_{ij}(Y_j - \bar{Y}) \text{ with }  I = \sum_{i=1}^NI_i/N\]</span></p>
 </div>
 </div>
-<p>The <code>localmoran()</code>function from the package <code>spdep</code> treats the variable of interest as if it was normally distributed. In some cases, this assumption could be reasonable for incidence rate, especially when the areal units of analysis have sufficiently large population count suggesting that the values have similar level of variances. Unfortunately, the local Moran’s test has not been implemented for Poisson distribution (population not large enough in some districts) in <code>spdep</code> package. However, Bivand <strong>et al.</strong> <span class="citation" data-cites="bivand2008applied">(<a href="references.html#ref-bivand2008applied" role="doc-biblioref">R. S. Bivand et al. 2008</a>)</span> provided some code to manual perform the analysis using Poisson distribution and was further implemented in the course “<a href="https://mkram01.github.io/EPI563-SpatialEPI/index.html">Spatial Epidemiology</a>”.</p>
+<p>The <code>localmoran()</code> function from the <code>spdep</code> package treats the variable of interest as if it were normally distributed. In some cases, this assumption can be reasonable for incidence rates, especially when the areal units of analysis have sufficiently large population counts, so that the values have similar variances. Unfortunately, the local Moran’s test has not been implemented for the Poisson distribution (needed when the population is not large enough in some districts) in the <code>spdep</code> package. However, Bivand <strong>et al.</strong> <span class="citation" data-cites="bivand2008applied">(<a href="references.html#ref-bivand2008applied" role="doc-biblioref">R. S. Bivand et al. 2008</a>)</span> provided code to perform the analysis manually using the Poisson distribution, and this code was further implemented in the course “<a href="https://mkram01.github.io/EPI563-SpatialEPI/index.html">Spatial Epidemiology</a>”.</p>
 <div class="cell" data-nm="true">
 <div class="sourceCode cell-code" id="cb12"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb12-1"><a href="#cb12-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Step 1 - Create the standardized deviation of observed from expected</span></span>
 <span id="cb12-2"><a href="#cb12-2" aria-hidden="true" tabindex="-1"></a>sd_lm <span class="ot">&lt;-</span> (district<span class="sc">$</span>cases <span class="sc">-</span> district<span class="sc">$</span>expected) <span class="sc">/</span> <span class="fu">sqrt</span>(district<span class="sc">$</span>expected)</span>
@@ -552,7 +552,7 @@ Statistical test
 <span id="cb12-37"><a href="#cb12-37" aria-hidden="true" tabindex="-1"></a><span class="co"># given the null distribution generate from simulations</span></span>
 <span id="cb12-38"><a href="#cb12-38" aria-hidden="true" tabindex="-1"></a>district<span class="sc">$</span>pval_lm <span class="ot">&lt;-</span> <span class="fu">punif</span>((diff <span class="sc">+</span> <span class="dv">1</span>) <span class="sc">/</span> (nsim <span class="sc">+</span> <span class="dv">1</span>))</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
 </div>
-<p>For each district, we obtain a p-value based on permutations process</p>
+<p>Briefly, the process consists of 1) computing the I statistic for the observed data, 2) estimating the null distribution of the I statistic by random sampling from a Poisson distribution, and 3) comparing the observed I statistic with the null distribution to determine the probability of observing such a value if the number of cases were spatially independent. For each district, we obtain a p-value based on the comparison of the observed value with the null distribution.</p>
 <p>A conventional way of plotting these results is to classify the districts into 5 classes based on local Moran’s I output. The classification of cluster that are significantly autocorrelated to their neighbors is performed based on a comparison of the scaled incidence in the district compared to the scaled weighted averaged incidence of it neighboring districts (computed with <code>lag.listw()</code>):</p>
 <ul>
 <li><p>Districts that have higher-than-average rates in both index regions and their neighbors and showing statistically significant positive values for the local <span class="math inline">\(I_i\)</span> statistic are defined as <strong>High-High</strong> (hotspot of the disease)</p></li>
@@ -706,7 +706,7 @@ Kulldorf test
 <span id="cb30-7"><a href="#cb30-7" aria-hidden="true" tabindex="-1"></a><span class="fu">print</span>(df_secondary_clusters)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
 <div class="cell-output cell-output-stdout">
 <pre class="code-out"><code>       SMR number.of.cases expected.cases p.value
-1 3.767698              16       4.246625    0.01</code></pre>
+1 3.767698              16       4.246625   0.014</code></pre>
 </div>
 </div>
 <p>We only have one secondary cluster composed of one district.</p>
@@ -911,4 +911,4 @@ window.document.addEventListener("DOMContentLoaded", function (event) {


 <script src="site_libs/quarto-html/zenscroll-min.js"></script>
-</body></html>
+</body></html>
\ No newline at end of file
--- a/public/07-basic_statistics_files/figure-html/LocalMoransI-1.png
+++ b/public/07-basic_statistics_files/figure-html/LocalMoransI-1.png
--- a/public/07-basic_statistics_files/figure-html/LocalMoransI_plt-1.png
+++ b/public/07-basic_statistics_files/figure-html/LocalMoransI_plt-1.png
--- a/public/07-basic_statistics_files/figure-html/MoransI-1.png
+++ b/public/07-basic_statistics_files/figure-html/MoransI-1.png
--- a/public/07-basic_statistics_files/figure-html/cases_visualization-1.png
+++ b/public/07-basic_statistics_files/figure-html/cases_visualization-1.png
--- a/public/07-basic_statistics_files/figure-html/distribution-1.png
+++ b/public/07-basic_statistics_files/figure-html/distribution-1.png
--- a/public/07-basic_statistics_files/figure-html/inc_visualization-1.png
+++ b/public/07-basic_statistics_files/figure-html/inc_visualization-1.png
--- a/public/07-basic_statistics_files/figure-html/incidence_visualization-1.png
+++ b/public/07-basic_statistics_files/figure-html/incidence_visualization-1.png
--- a/public/07-basic_statistics_files/figure-html/kd_test-1.png
+++ b/public/07-basic_statistics_files/figure-html/kd_test-1.png
--- a/public/07-basic_statistics_files/figure-html/plt_clusters-1.png
+++ b/public/07-basic_statistics_files/figure-html/plt_clusters-1.png
--- a/public/search.json
+++ b/public/search.json
--- a/public/site_libs/bootstrap/bootstrap.min.css
+++ b/public/site_libs/bootstrap/bootstrap.min.css