diff --git a/05-mapping_with_r_files/figure-html/choro-1.png b/05-mapping_with_r_files/figure-html/choro-1.png new file mode 100644 index 0000000000000000000000000000000000000000..d6f9589633fe8e688e659e7030888bf9dc3cfc29 Binary files /dev/null and b/05-mapping_with_r_files/figure-html/choro-1.png differ diff --git a/05-mapping_with_r_files/figure-html/choro-incidence-1.png b/05-mapping_with_r_files/figure-html/choro-incidence-1.png new file mode 100644 index 0000000000000000000000000000000000000000..b6d42fe94baa2b3848f3e77a9542d17fd96e1aa8 Binary files /dev/null and b/05-mapping_with_r_files/figure-html/choro-incidence-1.png differ diff --git a/05-mapping_with_r_files/figure-html/choroprop-1.png b/05-mapping_with_r_files/figure-html/choroprop-1.png new file mode 100644 index 0000000000000000000000000000000000000000..093057fe056f115218b3cc404470540aa4cd94e1 Binary files /dev/null and b/05-mapping_with_r_files/figure-html/choroprop-1.png differ diff --git a/05-mapping_with_r_files/figure-html/choropt-1.png b/05-mapping_with_r_files/figure-html/choropt-1.png new file mode 100644 index 0000000000000000000000000000000000000000..967acd51dd40c54ce44828595ff150ae84c3d127 Binary files /dev/null and b/05-mapping_with_r_files/figure-html/choropt-1.png differ diff --git a/05-mapping_with_r_files/figure-html/credit-1.png b/05-mapping_with_r_files/figure-html/credit-1.png new file mode 100644 index 0000000000000000000000000000000000000000..6a00d3178a76b6909417f02ed07153573addc39f Binary files /dev/null and b/05-mapping_with_r_files/figure-html/credit-1.png differ diff --git a/05-mapping_with_r_files/figure-html/discr3-1.png b/05-mapping_with_r_files/figure-html/discr3-1.png new file mode 100644 index 0000000000000000000000000000000000000000..d52857eb0e8bd94a2822a3f6946c8742ba995c25 Binary files /dev/null and b/05-mapping_with_r_files/figure-html/discr3-1.png differ diff --git a/05-mapping_with_r_files/figure-html/inset-1.png b/05-mapping_with_r_files/figure-html/inset-1.png new 
file mode 100644 index 0000000000000000000000000000000000000000..99d5781343635421989a8e0a73c75c441e337272 Binary files /dev/null and b/05-mapping_with_r_files/figure-html/inset-1.png differ diff --git a/05-mapping_with_r_files/figure-html/labs-1.png b/05-mapping_with_r_files/figure-html/labs-1.png new file mode 100644 index 0000000000000000000000000000000000000000..1b3d34779d6bac3e70e47c7ca9180294391e9946 Binary files /dev/null and b/05-mapping_with_r_files/figure-html/labs-1.png differ diff --git a/05-mapping_with_r_files/figure-html/layout1-1.png b/05-mapping_with_r_files/figure-html/layout1-1.png new file mode 100644 index 0000000000000000000000000000000000000000..c135719e3a0dc0c9258b06f2cdcea01ae631f4c4 Binary files /dev/null and b/05-mapping_with_r_files/figure-html/layout1-1.png differ diff --git a/05-mapping_with_r_files/figure-html/linemap-1.png b/05-mapping_with_r_files/figure-html/linemap-1.png new file mode 100644 index 0000000000000000000000000000000000000000..c5e23ee71a3555f236081c5f0083fcafe1c958e7 Binary files /dev/null and b/05-mapping_with_r_files/figure-html/linemap-1.png differ diff --git a/05-mapping_with_r_files/figure-html/logo-1.png b/05-mapping_with_r_files/figure-html/logo-1.png new file mode 100644 index 0000000000000000000000000000000000000000..37113f21ae9671a277dcb3be94cc898d6f43e38a Binary files /dev/null and b/05-mapping_with_r_files/figure-html/logo-1.png differ diff --git a/05-mapping_with_r_files/figure-html/mf_base-1.png b/05-mapping_with_r_files/figure-html/mf_base-1.png new file mode 100644 index 0000000000000000000000000000000000000000..fef6eab7773bad072e802d4e43728bed0893db4b Binary files /dev/null and b/05-mapping_with_r_files/figure-html/mf_base-1.png differ diff --git a/05-mapping_with_r_files/figure-html/mfrow0-1.png b/05-mapping_with_r_files/figure-html/mfrow0-1.png new file mode 100644 index 0000000000000000000000000000000000000000..b1b55e2e7b3db4767b134e5ccb5662877ffff40f Binary files /dev/null and 
b/05-mapping_with_r_files/figure-html/mfrow0-1.png differ diff --git a/05-mapping_with_r_files/figure-html/north-1.png b/05-mapping_with_r_files/figure-html/north-1.png new file mode 100644 index 0000000000000000000000000000000000000000..933e2cc6e4b3c29ac341da32a0275432e6ba04db Binary files /dev/null and b/05-mapping_with_r_files/figure-html/north-1.png differ diff --git a/05-mapping_with_r_files/figure-html/pal1-1.png b/05-mapping_with_r_files/figure-html/pal1-1.png new file mode 100644 index 0000000000000000000000000000000000000000..aa19e4835543cfb43e12d1f4b4d83e383bbeae16 Binary files /dev/null and b/05-mapping_with_r_files/figure-html/pal1-1.png differ diff --git a/05-mapping_with_r_files/figure-html/pal2-1.png b/05-mapping_with_r_files/figure-html/pal2-1.png new file mode 100644 index 0000000000000000000000000000000000000000..bb2f3b8f14c2d3815efa5a9eb764ce848066e058 Binary files /dev/null and b/05-mapping_with_r_files/figure-html/pal2-1.png differ diff --git a/05-mapping_with_r_files/figure-html/proportional_symbols-1.png b/05-mapping_with_r_files/figure-html/proportional_symbols-1.png new file mode 100644 index 0000000000000000000000000000000000000000..09112ef709a7ae0fac9c471a7aac14c2591cfa25 Binary files /dev/null and b/05-mapping_with_r_files/figure-html/proportional_symbols-1.png differ diff --git a/05-mapping_with_r_files/figure-html/proportional_symbols_comp-1.png b/05-mapping_with_r_files/figure-html/proportional_symbols_comp-1.png new file mode 100644 index 0000000000000000000000000000000000000000..eae7df90bb1067fa8c210df048c56346b34b6a17 Binary files /dev/null and b/05-mapping_with_r_files/figure-html/proportional_symbols_comp-1.png differ diff --git a/05-mapping_with_r_files/figure-html/scale-1.png b/05-mapping_with_r_files/figure-html/scale-1.png new file mode 100644 index 0000000000000000000000000000000000000000..f6900d8988ac36ccd8c944693e05475e54ed2d8c Binary files /dev/null and b/05-mapping_with_r_files/figure-html/scale-1.png differ diff --git 
a/05-mapping_with_r_files/figure-html/shadow-1.png b/05-mapping_with_r_files/figure-html/shadow-1.png new file mode 100644 index 0000000000000000000000000000000000000000..3cad917713635398bcb9c8105ba148344e109754 Binary files /dev/null and b/05-mapping_with_r_files/figure-html/shadow-1.png differ diff --git a/05-mapping_with_r_files/figure-html/them1-1.png b/05-mapping_with_r_files/figure-html/them1-1.png new file mode 100644 index 0000000000000000000000000000000000000000..a1e327b1e9a3dc0dc1742c6ec179a3100b5b2ee3 Binary files /dev/null and b/05-mapping_with_r_files/figure-html/them1-1.png differ diff --git a/05-mapping_with_r_files/figure-html/theme2-1.png b/05-mapping_with_r_files/figure-html/theme2-1.png new file mode 100644 index 0000000000000000000000000000000000000000..9c8cf80fbc7aab67c1e814fefc941938d6731941 Binary files /dev/null and b/05-mapping_with_r_files/figure-html/theme2-1.png differ diff --git a/05-mapping_with_r_files/figure-html/typo_order-1.png b/05-mapping_with_r_files/figure-html/typo_order-1.png new file mode 100644 index 0000000000000000000000000000000000000000..73a28523813cc15bae43237b6e9fd82aad053f01 Binary files /dev/null and b/05-mapping_with_r_files/figure-html/typo_order-1.png differ diff --git a/05-mapping_with_r_files/figure-html/typo_point-1.png b/05-mapping_with_r_files/figure-html/typo_point-1.png new file mode 100644 index 0000000000000000000000000000000000000000..95cbf4c3641540cb765737627bf919bc9db3c61f Binary files /dev/null and b/05-mapping_with_r_files/figure-html/typo_point-1.png differ diff --git a/05-mapping_with_r_files/figure-html/typo_simple-1.png b/05-mapping_with_r_files/figure-html/typo_simple-1.png new file mode 100644 index 0000000000000000000000000000000000000000..2c1a4379c1f661e0762683b840a8c787630c641a Binary files /dev/null and b/05-mapping_with_r_files/figure-html/typo_simple-1.png differ diff --git a/05-mapping_with_r_files/figure-html/typoprop-1.png b/05-mapping_with_r_files/figure-html/typoprop-1.png new file 
mode 100644 index 0000000000000000000000000000000000000000..70b32aea754339e91f8e55f0c51315359291072f Binary files /dev/null and b/05-mapping_with_r_files/figure-html/typoprop-1.png differ diff --git a/05-mapping_with_r_files/figure-html/unnamed-chunk-10-1.png b/05-mapping_with_r_files/figure-html/unnamed-chunk-10-1.png new file mode 100644 index 0000000000000000000000000000000000000000..036296ec662c70d6f2ddf7d98fd1cc8140ba5b01 Binary files /dev/null and b/05-mapping_with_r_files/figure-html/unnamed-chunk-10-1.png differ diff --git a/05-mapping_with_r_files/figure-html/unnamed-chunk-12-1.png b/05-mapping_with_r_files/figure-html/unnamed-chunk-12-1.png new file mode 100644 index 0000000000000000000000000000000000000000..4ca682737a9dd129a3dca15e6a25af937cb5821b Binary files /dev/null and b/05-mapping_with_r_files/figure-html/unnamed-chunk-12-1.png differ diff --git a/05-mapping_with_r_files/figure-html/unnamed-chunk-14-1.png b/05-mapping_with_r_files/figure-html/unnamed-chunk-14-1.png new file mode 100644 index 0000000000000000000000000000000000000000..376a90e948eba1a067ab4bf819da28546c5b9dae Binary files /dev/null and b/05-mapping_with_r_files/figure-html/unnamed-chunk-14-1.png differ diff --git a/05-mapping_with_r_files/figure-html/unnamed-chunk-16-1.png b/05-mapping_with_r_files/figure-html/unnamed-chunk-16-1.png new file mode 100644 index 0000000000000000000000000000000000000000..b8acb38d5ba27ef5a9e2966669448551425ad720 Binary files /dev/null and b/05-mapping_with_r_files/figure-html/unnamed-chunk-16-1.png differ diff --git a/05-mapping_with_r_files/figure-html/unnamed-chunk-18-1.png b/05-mapping_with_r_files/figure-html/unnamed-chunk-18-1.png new file mode 100644 index 0000000000000000000000000000000000000000..e5f1c02e616b2c23a46d307f6288d834dc18ed9a Binary files /dev/null and b/05-mapping_with_r_files/figure-html/unnamed-chunk-18-1.png differ diff --git a/05-mapping_with_r_files/figure-html/unnamed-chunk-22-1.png 
b/05-mapping_with_r_files/figure-html/unnamed-chunk-22-1.png new file mode 100644 index 0000000000000000000000000000000000000000..63650f92d31a64ff020ee1382b7f10e9affc4c54 Binary files /dev/null and b/05-mapping_with_r_files/figure-html/unnamed-chunk-22-1.png differ diff --git a/05-mapping_with_r_files/figure-html/unnamed-chunk-22-2.png b/05-mapping_with_r_files/figure-html/unnamed-chunk-22-2.png new file mode 100644 index 0000000000000000000000000000000000000000..987ed721d941aff7590a1a40bf117304695d798f Binary files /dev/null and b/05-mapping_with_r_files/figure-html/unnamed-chunk-22-2.png differ diff --git a/05-mapping_with_r_files/figure-html/unnamed-chunk-4-1.png b/05-mapping_with_r_files/figure-html/unnamed-chunk-4-1.png new file mode 100644 index 0000000000000000000000000000000000000000..fafa28b0ce08eb4798f4c6245289c63a401f6bbf Binary files /dev/null and b/05-mapping_with_r_files/figure-html/unnamed-chunk-4-1.png differ diff --git a/05-mapping_with_r_files/figure-html/unnamed-chunk-8-1.png b/05-mapping_with_r_files/figure-html/unnamed-chunk-8-1.png new file mode 100644 index 0000000000000000000000000000000000000000..2c1261dbff8f7f185af888892be345ff8ba7e4ba Binary files /dev/null and b/05-mapping_with_r_files/figure-html/unnamed-chunk-8-1.png differ diff --git a/07-basic_statistics.qmd b/07-basic_statistics.qmd index 176a1fbd82987bfa53b1fff5e4b8ac80b0b14b5f..a515b24f227e430c6a930023720bfbad9cc4a2fa 100644 --- a/07-basic_statistics.qmd +++ b/07-basic_statistics.qmd @@ -4,11 +4,11 @@ bibliography: references.bib # Basic statistics for spatial analysis -This section aims at providing some basic statistical tools to study the spatial distribution of the cases. +This section aims at providing some basic statistical tools to study the spatial distribution of epidemiological data. ## Import and visualize epidemiological data -In this section, we load data that reference the cases of an imaginary disease throughout Cambodia. 
+In this section, we load data that reference the cases of an imaginary disease throughout Cambodia. Each point corresponds to the geolocation of a case. ```{r load_cases, eval = TRUE, echo = TRUE, nm = TRUE, fig.width=8, class.output="code-out", warning=FALSE, message=FALSE} library(sf) @@ -24,10 +24,6 @@ district = st_read("data_cambodia/cambodia.gpkg", layer = "district", quiet = TR cases = st_read("data_cambodia/cambodia.gpkg", layer = "cases", quiet = TRUE) cases = subset(cases, Disease == "W fever") -# Aggregate cases over districts -district$cases <- lengths(st_intersects(district, cases)) - - ``` The first step of any statistical analysis always consists on visualizing the data to check they were correctly loaded and to observe general pattern of the cases. @@ -46,11 +42,83 @@ mf_map(x = cases, lwd = .5, col = "#990000", pch = 20, add = TRUE) ``` -## Basics statistics +In epidemiology, the meaning of a point is often ambiguous. Although it usually gives the location of an observation, it is not always clear whether this observation represents an event of interest (e.g. illness, death, ...) or a person at risk (e.g. a participant who may or may not experience the disease). Considering the ratio of events to the population at risk is often more informative than considering cases alone. Administrative divisions of a country are convenient areal units for case aggregation since population counts and structures are usually available for them. In this study, we will use the district as the areal unit. + +```{r district_aggregate, eval = TRUE, echo = TRUE, nm = TRUE, fig.width=8, class.output="code-out", warning=FALSE, message=FALSE} +# Aggregate cases over districts +district$cases <- lengths(st_intersects(district, cases)) + +``` + +The incidence ($\frac{cases}{population}$) is commonly used to represent the distribution of cases relative to population size, but other indicators exist.
For example, the standardized incidence ratio (SIR) compares the observed number of cases with the expected number and is expressed as $SIR_i = \frac{Y_i}{E_i}$, where $Y_i$ is the observed number of cases and $E_i$ the expected number of cases in district $i$. In this study, we compute the expected number of cases in each district by assuming infections are homogeneously distributed across Cambodia, i.e. the incidence is the same in each district. + +```{r indicators, eval = TRUE, echo = TRUE, nm = TRUE, fig.width=8, fig.height=4, class.output="code-out", warning=FALSE, message=FALSE} + +# Compute incidence in each district (per 100 000 population) +district$incidence = district$cases/district$T_POP * 100000 + +# Compute the global risk +rate = sum(district$cases)/sum(district$T_POP) + +# Compute expected number of cases +district$expected = district$T_POP * rate + +# Compute SIR +district$SIR = district$cases / district$expected +``` + +```{r inc_visualization, eval = TRUE, echo = TRUE, nm = TRUE, fig.width=8, fig.height=4, class.output="code-out", warning=FALSE, message=FALSE} +par(mfrow = c(1, 3)) +# Plot number of cases using proportional symbol +mf_map(x = district) +mf_map( + x = district, + var = "cases", + val_max = 50, + type = "prop", + col = "#990000", + leg_title = "Cases") +mf_layout(title = "Number of cases of W Fever") + +# Plot incidence +mf_map(x = district, + var = "incidence", + type = "choro", + pal = "Reds 3", + leg_title = "Incidence \n(per 100 000)") +mf_layout(title = "Incidence of W Fever") + +# Plot SIRs + +# create breaks and associated color palette +break_SIR = c(0, exp(mf_get_breaks(log(district$SIR), nbreaks = 8, breaks = "pretty"))) +col_pal = c("#273871", "#3267AD", "#6496C8", "#9BBFDD", "#CDE3F0", "#FFCEBC", "#FF967E", "#F64D41", "#B90E36") + +mf_map(x = district, + var = "SIR", + type = "choro", + breaks = break_SIR, + pal = col_pal, + cex = 2, + leg_title = "SIR") +mf_layout(title = "Standardized Incidence Ratio of W Fever") +``` + +These maps
illustrate the spatial heterogeneity of the cases. The incidence shows how the disease burden varies from one district to another, while the SIR highlights districts that have: + +- higher risk than average (SIR \> 1) when standardized for population -The problem is usually expressed by defining two hypothesis : the null hypothesis (H0), i.e. an a priori hypothesis of the studied phenomenon (e.g. the situation is a random) and the alternative hypothesis (HA), e.g. the situation is not random. The main principle is to measure how likely the observed situation belong to the ensemble of situation that are possible under the H0 hypothesis. +- lower risk than average (SIR \< 1) when standardized for population -The statistical analysis performed relies on the type of data. +- average risk (SIR \~ 1) when standardized for population + +In this example, we standardized the distribution of cases for population count. This simple standardization assumes that the risk of contracting the disease is similar for each person. However, this assumption does not hold for all diseases or for all observed events (e.g. the number of childhood illnesses and deaths in a district is usually related to the age pyramid), and you should keep in mind that other standardizations can be performed based on variables that are known to have an effect but that you do not want to analyze (e.g. sex ratio, occupations, age pyramid). + +## Cluster analysis + +Since W fever seems to have a heterogeneous distribution across Cambodia, it would be interesting to study where excesses of cases appear, i.e. to identify clusters of the disease. + +In statistics, problems are usually expressed by defining two hypotheses: the null hypothesis (H0), i.e. an a priori hypothesis about the studied phenomenon (e.g. the situation is random) and the alternative hypothesis (HA), e.g. the situation is not random.
The main principle is to measure how likely it is that the observed situation belongs to the set of situations that are possible under the H0 hypothesis. ### Spatial autocorrelation (Moran's I test) @@ -58,13 +126,10 @@ A popular test for spatial autocorrelation is the Moran's test. Moran's I test tells us whether nearby units tend to exhibit similar rates. It ranges from -1 to +1, whith a value of -1 denoting that units with low rates are located near other units with high rates, while a Moran's I value of +1 indicates a concentration of spatial units exhibiting similar rates. -We will compute the Moran's statistics using `spdep` and `Dcluster` packages. This package provides a collection of functions to analyze spatial correlations of polygons and works with sp objects. `Dcluster` package provides a set of functions for the detection of spatial clusters of disease using count data. +We will compute Moran's statistic using the `spdep` and `DCluster` packages. The `spdep` package provides a collection of functions to analyze spatial correlations of polygons and works with sp objects. The `DCluster` package provides a set of functions for the detection of spatial clusters of disease using count data.
```{r MoransI, eval = TRUE, echo = TRUE, nm = TRUE, fig.width=8, class.output="code-out", warning=FALSE, message=FALSE} -# Compte incidence in each district (per 100 000 population) -district$incidence <- district$cases/district$T_POP * 100000 - # Plot the incidence histogramm hist(log(district$incidence)) diff --git a/img/dist_filter_1.png b/img/dist_filter_1.png index e11d04497a5dd79cb75c5f49140e6855f14397a7..72efd1a6323904b0bad518a240ff8d58315d1ac5 100644 Binary files a/img/dist_filter_1.png and b/img/dist_filter_1.png differ diff --git a/public/07-basic_statistics.html b/public/07-basic_statistics.html index 57e51874ddce1a37dd6aa36fbb56196df35d4921..310c59914b74b3300399429f23a412135b7a1fc0 100644 --- a/public/07-basic_statistics.html +++ b/public/07-basic_statistics.html @@ -124,6 +124,7 @@ code span.wa { color: #60a0b0; font-weight: bold; font-style: italic; } /* Warni } }</script> + <script src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-chtml-full.js" type="text/javascript"></script> <link rel="stylesheet" href="styles.css"> </head> @@ -214,16 +215,16 @@ code span.wa { color: #60a0b0; font-weight: bold; font-style: italic; } /* Warni <h2 id="toc-title">Table of contents</h2> <ul> - <li><a href="#load-and-visualize-data" id="toc-load-and-visualize-data" class="nav-link active" data-scroll-target="#load-and-visualize-data"><span class="toc-section-number">7.1</span> Load and visualize data</a></li> + <li><a href="#import-and-visualize-epidemiological-data" id="toc-import-and-visualize-epidemiological-data" class="nav-link active" data-scroll-target="#import-and-visualize-epidemiological-data"><span class="toc-section-number">7.1</span> Import and visualize epidemiological data</a></li> <li><a href="#basics-statistics" id="toc-basics-statistics" class="nav-link" data-scroll-target="#basics-statistics"><span class="toc-section-number">7.2</span> Basics statistics</a> <ul class="collapse"> - <li><a href="#autocorrelation" id="toc-autocorrelation" 
class="nav-link" data-scroll-target="#autocorrelation"><span class="toc-section-number">7.2.1</span> Autocorrelation</a></li> - <li><a href="#morans-test" id="toc-morans-test" class="nav-link" data-scroll-target="#morans-test"><span class="toc-section-number">7.2.2</span> Moran’s test</a></li> + <li><a href="#spatial-autocorrelation-morans-i-test" id="toc-spatial-autocorrelation-morans-i-test" class="nav-link" data-scroll-target="#spatial-autocorrelation-morans-i-test"><span class="toc-section-number">7.2.1</span> Spatial autocorrelation (Moran’s I test)</a></li> </ul></li> <li><a href="#cluster-analysis" id="toc-cluster-analysis" class="nav-link" data-scroll-target="#cluster-analysis"><span class="toc-section-number">7.3</span> Cluster analysis</a> <ul class="collapse"> <li><a href="#population-based-clusters-kulldorf-statistic" id="toc-population-based-clusters-kulldorf-statistic" class="nav-link" data-scroll-target="#population-based-clusters-kulldorf-statistic"><span class="toc-section-number">7.3.1</span> Population-based clusters (kulldorf statistic)</a></li> <li><a href="#expectation-based-cluster" id="toc-expectation-based-cluster" class="nav-link" data-scroll-target="#expectation-based-cluster"><span class="toc-section-number">7.3.2</span> Expectation-based cluster</a></li> + <li><a href="#to-go-further" id="toc-to-go-further" class="nav-link" data-scroll-target="#to-go-further"><span class="toc-section-number">7.3.3</span> To go further …</a></li> </ul></li> </ul> </nav> @@ -247,9 +248,10 @@ code span.wa { color: #60a0b0; font-weight: bold; font-style: italic; } /* Warni </header> -<section id="load-and-visualize-data" class="level2" data-number="7.1"> -<h2 data-number="7.1" class="anchored" data-anchor-id="load-and-visualize-data"><span class="header-section-number">7.1</span> Load and visualize data</h2> -<p>In this section, we load data that reference the cases of an imaginary disease throughout Cambodia.</p> +<p>This section aims at providing some 
basic statistical tools to study the spatial distribution of epidemiological data.</p> +<section id="import-and-visualize-epidemiological-data" class="level2" data-number="7.1"> +<h2 data-number="7.1" class="anchored" data-anchor-id="import-and-visualize-epidemiological-data"><span class="header-section-number">7.1</span> Import and visualize epidemiological data</h2> +<p>In this section, we load data that reference the cases of an imaginary disease throughout Cambodia. Each point corresponds to the geolocation of a case.</p> <div class="cell" data-nm="true"> <div class="sourceCode cell-code" id="cb1"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="fu">library</span>(sf)</span> <span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a></span> @@ -261,7 +263,8 @@ code span.wa { color: #60a0b0; font-weight: bold; font-style: italic; } /* Warni <span id="cb1-8"><a href="#cb1-8" aria-hidden="true" tabindex="-1"></a>district <span class="ot">=</span> <span class="fu">st_read</span>(<span class="st">"data_cambodia/cambodia.gpkg"</span>, <span class="at">layer =</span> <span class="st">"district"</span>, <span class="at">quiet =</span> <span class="cn">TRUE</span>)</span> <span id="cb1-9"><a href="#cb1-9" aria-hidden="true" tabindex="-1"></a></span> <span id="cb1-10"><a href="#cb1-10" aria-hidden="true" tabindex="-1"></a><span class="co"># Import locations of cases from an imaginary disease</span></span> -<span id="cb1-11"><a href="#cb1-11" aria-hidden="true" tabindex="-1"></a>cases <span class="ot">=</span> <span class="fu">st_read</span>(<span class="st">"data_cambodia/cambodia.gpkg"</span>, <span class="at">layer =</span> <span class="st">"cases"</span>, <span class="at">quiet =</span> <span class="cn">TRUE</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> +<span id="cb1-11"><a 
href="#cb1-11" aria-hidden="true" tabindex="-1"></a>cases <span class="ot">=</span> <span class="fu">st_read</span>(<span class="st">"data_cambodia/cambodia.gpkg"</span>, <span class="at">layer =</span> <span class="st">"cases"</span>, <span class="at">quiet =</span> <span class="cn">TRUE</span>)</span> +<span id="cb1-12"><a href="#cb1-12" aria-hidden="true" tabindex="-1"></a>cases <span class="ot">=</span> <span class="fu">subset</span>(cases, Disease <span class="sc">==</span> <span class="st">"W fever"</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> </div> <p>The first step of any statistical analysis always consists on visualizing the data to check they were correctly loaded and to observe general pattern of the cases.</p> <div class="cell" data-nm="true"> @@ -286,19 +289,93 @@ Projected CRS: WGS 84 / UTM zone 48N <span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a></span> <span id="cb4-4"><a href="#cb4-4" aria-hidden="true" tabindex="-1"></a><span class="fu">mf_map</span>(<span class="at">x =</span> district, <span class="at">border =</span> <span class="st">"white"</span>)</span> <span id="cb4-5"><a href="#cb4-5" aria-hidden="true" tabindex="-1"></a><span class="fu">mf_map</span>(<span class="at">x =</span> country,<span class="at">lwd =</span> <span class="dv">2</span>, <span class="at">col =</span> <span class="cn">NA</span>, <span class="at">add =</span> <span class="cn">TRUE</span>)</span> -<span id="cb4-6"><a href="#cb4-6" aria-hidden="true" tabindex="-1"></a><span class="fu">mf_map</span>(<span class="at">x =</span> <span class="fu">subset</span>(cases, Disease <span class="sc">==</span> <span class="st">"W fever"</span>), <span class="at">lwd =</span> .<span class="dv">5</span>, <span class="at">col =</span> <span class="st">"#990000"</span>, <span class="at">pch =</span> <span class="dv">20</span>, <span class="at">add =</span> <span 
class="cn">TRUE</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> +<span id="cb4-6"><a href="#cb4-6" aria-hidden="true" tabindex="-1"></a><span class="fu">mf_map</span>(<span class="at">x =</span> cases, <span class="at">lwd =</span> .<span class="dv">5</span>, <span class="at">col =</span> <span class="st">"#990000"</span>, <span class="at">pch =</span> <span class="dv">20</span>, <span class="at">add =</span> <span class="cn">TRUE</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> <div class="cell-output-display"> <p><img src="07-basic_statistics_files/figure-html/cases_visualization-1.png" class="img-fluid" width="768"></p> </div> </div> +<p>In epidemiology, the meaning of a point is often ambiguous. Although it usually gives the location of an observation, it is not always clear whether this observation represents an event of interest (e.g. illness, death, …) or a person at risk (e.g. a participant who may or may not experience the disease). Considering the ratio of events to the population at risk is often more informative than considering cases alone. Administrative divisions of a country are convenient areal units for case aggregation since population counts and structures are usually available for them. 
In this study, we will use the district as the areal unit.</p> +<div class="cell" data-nm="true"> +<div class="sourceCode cell-code" id="cb5"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Aggregate cases over districts</span></span> +<span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a>district<span class="sc">$</span>cases <span class="ot"><-</span> <span class="fu">lengths</span>(<span class="fu">st_intersects</span>(district, cases))</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> +</div> +<p>The incidence (<span class="math inline">\(\frac{cases}{population}\)</span>) is commonly used to represent the distribution of cases relative to population size, but other indicators exist. For example, the standardized incidence ratio (SIR) compares the observed number of cases with the expected number and is expressed as <span class="math inline">\(SIR_i = \frac{Y_i}{E_i}\)</span>, where <span class="math inline">\(Y_i\)</span> is the observed number of cases and <span class="math inline">\(E_i\)</span> the expected number of cases in district <span class="math inline">\(i\)</span>. In this study, we compute the expected number of cases in each district by assuming infections are homogeneously distributed across Cambodia, i.e. 
the incidence is the same in each district.</p> +<div class="cell" data-nm="true"> +<div class="sourceCode cell-code" id="cb6"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Compute incidence in each district (per 100 000 population)</span></span> +<span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a>district<span class="sc">$</span>incidence <span class="ot">=</span> district<span class="sc">$</span>cases<span class="sc">/</span>district<span class="sc">$</span>T_POP <span class="sc">*</span> <span class="dv">100000</span></span> +<span id="cb6-3"><a href="#cb6-3" aria-hidden="true" tabindex="-1"></a></span> +<span id="cb6-4"><a href="#cb6-4" aria-hidden="true" tabindex="-1"></a><span class="co"># Compute the global risk</span></span> +<span id="cb6-5"><a href="#cb6-5" aria-hidden="true" tabindex="-1"></a>rate <span class="ot">=</span> <span class="fu">sum</span>(district<span class="sc">$</span>cases)<span class="sc">/</span><span class="fu">sum</span>(district<span class="sc">$</span>T_POP)</span> +<span id="cb6-6"><a href="#cb6-6" aria-hidden="true" tabindex="-1"></a></span> +<span id="cb6-7"><a href="#cb6-7" aria-hidden="true" tabindex="-1"></a><span class="co"># Compute expected number of cases </span></span> +<span id="cb6-8"><a href="#cb6-8" aria-hidden="true" tabindex="-1"></a>district<span class="sc">$</span>expected <span class="ot">=</span> district<span class="sc">$</span>T_POP <span class="sc">*</span> rate</span> +<span id="cb6-9"><a href="#cb6-9" aria-hidden="true" tabindex="-1"></a></span> +<span id="cb6-10"><a href="#cb6-10" aria-hidden="true" tabindex="-1"></a><span class="co"># Compute SIR</span></span> +<span id="cb6-11"><a href="#cb6-11" aria-hidden="true" tabindex="-1"></a>district<span class="sc">$</span>SIR <span class="ot">=</span> district<span class="sc">$</span>cases <span class="sc">/</span> district<span 
class="sc">$</span>expected</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> +</div> +<div class="cell" data-nm="true"> +<div class="sourceCode cell-code" id="cb7"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a><span class="fu">par</span>(<span class="at">mfrow =</span> <span class="fu">c</span>(<span class="dv">1</span>, <span class="dv">3</span>))</span> +<span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a><span class="co"># Plot number of cases using proportional symbol </span></span> +<span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"></a><span class="fu">mf_map</span>(<span class="at">x =</span> district) </span> +<span id="cb7-4"><a href="#cb7-4" aria-hidden="true" tabindex="-1"></a><span class="fu">mf_map</span>(</span> +<span id="cb7-5"><a href="#cb7-5" aria-hidden="true" tabindex="-1"></a> <span class="at">x =</span> district, </span> +<span id="cb7-6"><a href="#cb7-6" aria-hidden="true" tabindex="-1"></a> <span class="at">var =</span> <span class="st">"cases"</span>,</span> +<span id="cb7-7"><a href="#cb7-7" aria-hidden="true" tabindex="-1"></a> <span class="at">val_max =</span> <span class="dv">50</span>,</span> +<span id="cb7-8"><a href="#cb7-8" aria-hidden="true" tabindex="-1"></a> <span class="at">type =</span> <span class="st">"prop"</span>,</span> +<span id="cb7-9"><a href="#cb7-9" aria-hidden="true" tabindex="-1"></a> <span class="at">col =</span> <span class="st">"#990000"</span>, </span> +<span id="cb7-10"><a href="#cb7-10" aria-hidden="true" tabindex="-1"></a> <span class="at">leg_title =</span> <span class="st">"Cases"</span>)</span> +<span id="cb7-11"><a href="#cb7-11" aria-hidden="true" tabindex="-1"></a><span class="fu">mf_layout</span>(<span class="at">title =</span> <span class="st">"Number of cases of W Fever"</span>)</span> +<span id="cb7-12"><a 
href="#cb7-12" aria-hidden="true" tabindex="-1"></a></span> +<span id="cb7-13"><a href="#cb7-13" aria-hidden="true" tabindex="-1"></a><span class="co"># Plot incidence </span></span> +<span id="cb7-14"><a href="#cb7-14" aria-hidden="true" tabindex="-1"></a><span class="fu">mf_map</span>(<span class="at">x =</span> district,</span> +<span id="cb7-15"><a href="#cb7-15" aria-hidden="true" tabindex="-1"></a> <span class="at">var =</span> <span class="st">"incidence"</span>,</span> +<span id="cb7-16"><a href="#cb7-16" aria-hidden="true" tabindex="-1"></a> <span class="at">type =</span> <span class="st">"choro"</span>,</span> +<span id="cb7-17"><a href="#cb7-17" aria-hidden="true" tabindex="-1"></a> <span class="at">pal =</span> <span class="st">"Reds 3"</span>,</span> +<span id="cb7-18"><a href="#cb7-18" aria-hidden="true" tabindex="-1"></a> <span class="at">leg_title =</span> <span class="st">"Incidence </span><span class="sc">\n</span><span class="st">(per 100 000)"</span>)</span> +<span id="cb7-19"><a href="#cb7-19" aria-hidden="true" tabindex="-1"></a><span class="fu">mf_layout</span>(<span class="at">title =</span> <span class="st">"Incidence of W Fever"</span>)</span> +<span id="cb7-20"><a href="#cb7-20" aria-hidden="true" tabindex="-1"></a></span> +<span id="cb7-21"><a href="#cb7-21" aria-hidden="true" tabindex="-1"></a><span class="co"># Plot SIRs</span></span> +<span id="cb7-22"><a href="#cb7-22" aria-hidden="true" tabindex="-1"></a></span> +<span id="cb7-23"><a href="#cb7-23" aria-hidden="true" tabindex="-1"></a><span class="co"># create breaks and associated color palette</span></span> +<span id="cb7-24"><a href="#cb7-24" aria-hidden="true" tabindex="-1"></a>break_SIR <span class="ot">=</span> <span class="fu">c</span>(<span class="dv">0</span>, <span class="fu">exp</span>(<span class="fu">mf_get_breaks</span>(<span class="fu">log</span>(district<span class="sc">$</span>SIR), <span class="at">nbreaks =</span> <span class="dv">8</span>, <span class="at">breaks 
=</span> <span class="st">"pretty"</span>)))</span> +<span id="cb7-25"><a href="#cb7-25" aria-hidden="true" tabindex="-1"></a>col_pal <span class="ot">=</span> <span class="fu">c</span>(<span class="st">"#273871"</span>, <span class="st">"#3267AD"</span>, <span class="st">"#6496C8"</span>, <span class="st">"#9BBFDD"</span>, <span class="st">"#CDE3F0"</span>, <span class="st">"#FFCEBC"</span>, <span class="st">"#FF967E"</span>, <span class="st">"#F64D41"</span>, <span class="st">"#B90E36"</span>)</span> +<span id="cb7-26"><a href="#cb7-26" aria-hidden="true" tabindex="-1"></a></span> +<span id="cb7-27"><a href="#cb7-27" aria-hidden="true" tabindex="-1"></a><span class="fu">mf_map</span>(<span class="at">x =</span> district,</span> +<span id="cb7-28"><a href="#cb7-28" aria-hidden="true" tabindex="-1"></a> <span class="at">var =</span> <span class="st">"SIR"</span>,</span> +<span id="cb7-29"><a href="#cb7-29" aria-hidden="true" tabindex="-1"></a> <span class="at">type =</span> <span class="st">"choro"</span>,</span> +<span id="cb7-30"><a href="#cb7-30" aria-hidden="true" tabindex="-1"></a> <span class="at">breaks =</span> break_SIR, </span> +<span id="cb7-31"><a href="#cb7-31" aria-hidden="true" tabindex="-1"></a> <span class="at">pal =</span> col_pal, </span> +<span id="cb7-32"><a href="#cb7-32" aria-hidden="true" tabindex="-1"></a> <span class="at">cex =</span> <span class="dv">2</span>,</span> +<span id="cb7-33"><a href="#cb7-33" aria-hidden="true" tabindex="-1"></a> <span class="at">leg_title =</span> <span class="st">"SIR"</span>)</span> +<span id="cb7-34"><a href="#cb7-34" aria-hidden="true" tabindex="-1"></a><span class="fu">mf_layout</span>(<span class="at">title =</span> <span class="st">"Standardized Incidence Ratio of W Fever in Cambodia"</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> +<div class="cell-output-display"> +<p><img 
src="07-basic_statistics_files/figure-html/inc_visualization-1.png" class="img-fluid" width="768"></p> +</div> +</div> +<p>These maps illustrate the spatial heterogeneity of the cases. The incidence shows how the disease varies from one district to another, while the SIR highlights districts that have:</p> +<ul> +<li><p>higher risk than average (SIR > 1) when standardized for population</p></li> +<li><p>lower risk than average (SIR < 1) when standardized for population</p></li> +<li><p>average risk (SIR ~ 1) when standardized for population</p></li> +</ul> +<p>In this example, we standardized the case distribution by population count. This simple standardization assumes that the risk of contracting the disease is the same for every person. However, this assumption does not hold for all diseases or all observed events (e.g. the numbers of childhood illnesses and deaths usually depend on the age pyramid). Keep in mind that other standardizations can be performed, based on variables that are known to have an effect but that you do not want to analyze (e.g. sex ratio, occupations, age pyramid).</p> </section> <section id="basics-statistics" class="level2" data-number="7.2"> <h2 data-number="7.2" class="anchored" data-anchor-id="basics-statistics"><span class="header-section-number">7.2</span> Basics statistics</h2> -<section id="autocorrelation" class="level3" data-number="7.2.1"> -<h3 data-number="7.2.1" class="anchored" data-anchor-id="autocorrelation"><span class="header-section-number">7.2.1</span> Autocorrelation</h3> -</section> -<section id="morans-test" class="level3" data-number="7.2.2"> -<h3 data-number="7.2.2" class="anchored" data-anchor-id="morans-test"><span class="header-section-number">7.2.2</span> Moran’s test</h3> +<p>The problem is usually expressed by defining two hypotheses: the null hypothesis (H0), i.e. an a priori hypothesis about the studied phenomenon (e.g. the situation is random), and the alternative hypothesis (HA), e.g.
the situation is not random. The main principle is to measure how likely it is that the observed situation belongs to the set of situations that are possible under H0.</p> +<p>The appropriate statistical analysis depends on the type of data.</p> +<section id="spatial-autocorrelation-morans-i-test" class="level3" data-number="7.2.1"> +<h3 data-number="7.2.1" class="anchored" data-anchor-id="spatial-autocorrelation-morans-i-test"><span class="header-section-number">7.2.1</span> Spatial autocorrelation (Moran’s I test)</h3> +<p>A popular test for spatial autocorrelation is Moran’s I test.</p> +<p>Moran’s I test tells us whether nearby units tend to exhibit similar rates. The statistic typically ranges from -1 to +1: a value close to -1 denotes that units with low rates are located near units with high rates, a value close to 0 suggests no spatial autocorrelation, and a value close to +1 indicates a concentration of spatial units exhibiting similar rates.</p> +<p>We will compute Moran’s I statistic using the <code>spdep</code> and <code>DCluster</code> packages. The <code>spdep</code> package provides a collection of functions to analyze spatial correlations of polygons and works with sp objects. 
The <code>DCluster</code> package provides a set of functions for the detection of spatial clusters of disease using count data.</p> +<div class="cell" data-nm="true"> +<div class="sourceCode cell-code" id="cb8"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Plot the histogram of the log-transformed incidence</span></span> +<span id="cb8-2"><a href="#cb8-2" aria-hidden="true" tabindex="-1"></a><span class="fu">hist</span>(<span class="fu">log</span>(district<span class="sc">$</span>incidence))</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> +<div class="cell-output-display"> +<p><img src="07-basic_statistics_files/figure-html/MoransI-1.png" class="img-fluid" width="768"></p> +</div> +</div> </section> </section> <section id="cluster-analysis" class="level2" data-number="7.3"> @@ -306,10 +383,14 @@ Projected CRS: WGS 84 / UTM zone 48N <p>In epidemiology, the definition of a cluster</p> <section id="population-based-clusters-kulldorf-statistic" class="level3" data-number="7.3.1"> <h3 data-number="7.3.1" class="anchored" data-anchor-id="population-based-clusters-kulldorf-statistic"><span class="header-section-number">7.3.1</span> Population-based clusters (kulldorf statistic)</h3> +<p>Kulldorff’s spatial scan statistic identifies the most likely disease clusters by maximizing the likelihood that disease cases are located within a set of concentric circles that are moved across the study area.</p> </section> <section id="expectation-based-cluster" class="level3" data-number="7.3.2"> <h3 data-number="7.3.2" class="anchored" data-anchor-id="expectation-based-cluster"><span class="header-section-number">7.3.2</span> Expectation-based cluster</h3> <p>In many case, population is not specific enough to</p> +</section> +<section id="to-go-further" class="level3" data-number="7.3.3"> +<h3 data-number="7.3.3" class="anchored" 
data-anchor-id="to-go-further"><span class="header-section-number">7.3.3</span> To go further …</h3> </section> diff --git a/public/07-basic_statistics_files/figure-html/MoransI-1.png b/public/07-basic_statistics_files/figure-html/MoransI-1.png new file mode 100644 index 0000000000000000000000000000000000000000..62fff8800aa2837506b3e31f741313f70f382df6 Binary files /dev/null and b/public/07-basic_statistics_files/figure-html/MoransI-1.png differ diff --git a/public/07-basic_statistics_files/figure-html/inc_visualization-1.png b/public/07-basic_statistics_files/figure-html/inc_visualization-1.png new file mode 100644 index 0000000000000000000000000000000000000000..573c23b0e3e6d18f1b59f99d3f20f6d3aeab929c Binary files /dev/null and b/public/07-basic_statistics_files/figure-html/inc_visualization-1.png differ diff --git a/public/07-basic_statistics_files/figure-html/incidence_visualization-1.png b/public/07-basic_statistics_files/figure-html/incidence_visualization-1.png new file mode 100644 index 0000000000000000000000000000000000000000..573c23b0e3e6d18f1b59f99d3f20f6d3aeab929c Binary files /dev/null and b/public/07-basic_statistics_files/figure-html/incidence_visualization-1.png differ diff --git a/public/search.json b/public/search.json index df8e5fdcbc02deccebe747699d7afd5ca2599764..0dcb34e30017ada10ce172eefec1e54c53533dd1 100644 --- a/public/search.json +++ b/public/search.json @@ -221,20 +221,27 @@ "href": "07-basic_statistics.html#cluster-analysis", "title": "7 Basic statistics for spatial analysis", "section": "7.3 Cluster analysis", - "text": "7.3 Cluster analysis\nIn epidemiology, the definition of a cluster\n\n7.3.1 Population-based clusters (kulldorf statistic)\n\n\n7.3.2 Expectation-based cluster\nIn many case, population is not specific enough to" + "text": "7.3 Cluster analysis\nIn epidemiology, the definition of a cluster\n\n7.3.1 Population-based clusters (kulldorf statistic)\nKulldorff’s spatial scan statistic identifies the most likely disease 
clusters by maximizing the likelihood that disease cases are located within a set of concentric circles that are moved across the study area.\n\n\n7.3.2 Expectation-based cluster\nIn many case, population is not specific enough to\n\n\n7.3.3 To go further …" }, { "objectID": "07-basic_statistics.html", "href": "07-basic_statistics.html", "title": "7 Basic statistics for spatial analysis", "section": "", - "text": "In this section, we load data that reference the cases of an imaginary disease throughout Cambodia.\n\nlibrary(sf)\n\n#Import Cambodia country border\ncountry = st_read(\"data_cambodia/cambodia.gpkg\", layer = \"country\", quiet = TRUE)\n#Import provincial administrative border of Cambodia\neducation = st_read(\"data_cambodia/cambodia.gpkg\", layer = \"education\", quiet = TRUE)\n#Import district administrative border of Cambodia\ndistrict = st_read(\"data_cambodia/cambodia.gpkg\", layer = \"district\", quiet = TRUE)\n\n# Import locations of cases from an imaginary disease\ncases = st_read(\"data_cambodia/cambodia.gpkg\", layer = \"cases\", quiet = TRUE)\n\nThe first step of any statistical analysis always consists on visualizing the data to check they were correctly loaded and to observe general pattern of the cases.\n\n# View the cases object\nhead(cases)\n\nSimple feature collection with 6 features and 2 fields\nGeometry type: MULTIPOINT\nDimension: XY\nBounding box: xmin: 255891 ymin: 1179092 xmax: 506647.4 ymax: 1467441\nProjected CRS: WGS 84 / UTM zone 48N\n id Disease geom\n1 0 W fever MULTIPOINT ((280036.2 12841...\n2 1 W fever MULTIPOINT ((451859.5 11790...\n3 2 W fever MULTIPOINT ((255891 1467441))\n4 5 W fever MULTIPOINT ((506647.4 12322...\n5 6 W fever MULTIPOINT ((440668 1197958))\n6 7 W fever MULTIPOINT ((481594.5 12714...\n\n# Map the cases\nlibrary(mapsf)\n\nmf_map(x = district, border = \"white\")\nmf_map(x = country,lwd = 2, col = NA, add = TRUE)\nmf_map(x = subset(cases, Disease == \"W fever\"), lwd = .5, col = \"#990000\", pch = 20, add = 
TRUE)" + "text": "This section aims to provide some basic statistical tools to study the spatial distribution of epidemiological data." }, { "objectID": "07-basic_statistics.html#basics-statistics", "href": "07-basic_statistics.html#basics-statistics", "title": "7 Basic statistics for spatial analysis", "section": "7.2 Basics statistics", - "text": "7.2 Basics statistics\n\n7.2.1 Autocorrelation\n\n\n7.2.2 Moran’s test" + "text": "7.2 Basics statistics\nThe problem is usually expressed by defining two hypotheses: the null hypothesis (H0), i.e. an a priori hypothesis about the studied phenomenon (e.g. the situation is random), and the alternative hypothesis (HA), e.g. the situation is not random. The main principle is to measure how likely it is that the observed situation belongs to the set of situations that are possible under H0.\nThe appropriate statistical analysis depends on the type of data.\n\n7.2.1 Spatial autocorrelation (Moran’s I test)\nA popular test for spatial autocorrelation is Moran’s I test.\nMoran’s I test tells us whether nearby units tend to exhibit similar rates. The statistic typically ranges from -1 to +1: a value close to -1 denotes that units with low rates are located near units with high rates, a value close to 0 suggests no spatial autocorrelation, and a value close to +1 indicates a concentration of spatial units exhibiting similar rates.\nWe will compute Moran’s I statistic using the spdep and DCluster packages. The spdep package provides a collection of functions to analyze spatial correlations of polygons and works with sp objects. 
The DCluster package provides a set of functions for the detection of spatial clusters of disease using count data.\n\n# Plot the histogram of the log-transformed incidence\nhist(log(district$incidence))" + }, + { + "objectID": "07-basic_statistics.html#import-and-visualize-epidemiological-data", + "href": "07-basic_statistics.html#import-and-visualize-epidemiological-data", + "title": "7 Basic statistics for spatial analysis", + "section": "7.1 Import and visualize epidemiological data", + "text": "7.1 Import and visualize epidemiological data\nIn this section, we load data that record the cases of an imaginary disease throughout Cambodia. Each point corresponds to the geolocation of a case.\n\nlibrary(sf)\n\n#Import Cambodia country border\ncountry = st_read(\"data_cambodia/cambodia.gpkg\", layer = \"country\", quiet = TRUE)\n#Import provincial administrative border of Cambodia\neducation = st_read(\"data_cambodia/cambodia.gpkg\", layer = \"education\", quiet = TRUE)\n#Import district administrative border of Cambodia\ndistrict = st_read(\"data_cambodia/cambodia.gpkg\", layer = \"district\", quiet = TRUE)\n\n# Import locations of cases from an imaginary disease\ncases = st_read(\"data_cambodia/cambodia.gpkg\", layer = \"cases\", quiet = TRUE)\ncases = subset(cases, Disease == \"W fever\")\n\nThe first step of any statistical analysis always consists of visualizing the data to check that they were correctly loaded and to observe the general pattern of the cases.\n\n# View the cases object\nhead(cases)\n\nSimple feature collection with 6 features and 2 fields\nGeometry type: MULTIPOINT\nDimension: XY\nBounding box: xmin: 255891 ymin: 1179092 xmax: 506647.4 ymax: 1467441\nProjected CRS: WGS 84 / UTM zone 48N\n id Disease geom\n1 0 W fever MULTIPOINT ((280036.2 12841...\n2 1 W fever MULTIPOINT ((451859.5 11790...\n3 2 W fever MULTIPOINT ((255891 1467441))\n4 5 W fever MULTIPOINT ((506647.4 12322...\n5 6 W fever MULTIPOINT ((440668 1197958))\n6 7 W fever MULTIPOINT ((481594.5 12714...\n\n# Map 
the cases\nlibrary(mapsf)\n\nmf_map(x = district, border = \"white\")\nmf_map(x = country,lwd = 2, col = NA, add = TRUE)\nmf_map(x = cases, lwd = .5, col = \"#990000\", pch = 20, add = TRUE)\n\n\n\n\nIn epidemiology, the meaning of a point is often ambiguous. Although it usually gives the location of an observation, it is not always clear whether this observation represents an event of interest (e.g. illness, death, …) or a person at risk (e.g. a participant who may or may not experience the disease). Considering the ratio of events to the population at risk is often more informative than considering cases alone. Administrative divisions of countries are convenient areal units for case aggregation because data on population counts and structure are available for them. In this study, we will use the district as the areal unit.\n\n# Aggregate cases over districts\ndistrict$cases <- lengths(st_intersects(district, cases))\n\nThe incidence (\\(\\frac{cases}{population}\\)) is commonly used to represent the distribution of cases relative to population size, but other indicators exist. For example, the standardized incidence ratio (SIR) is the ratio of the observed to the expected number of cases and is expressed as \\(SIR = \\frac{Y_i}{E_i}\\) with \\(Y_i\\), the observed number of cases and \\(E_i\\), the expected number of cases. In this study, we computed the expected number of cases in each district by assuming infections are homogeneously distributed across Cambodia, i.e. 
the incidence is the same in each district.\n\n# Compute incidence in each district (per 100 000 population)\ndistrict$incidence = district$cases/district$T_POP * 100000\n\n# Compute the global risk\nrate = sum(district$cases)/sum(district$T_POP)\n\n# Compute expected number of cases \ndistrict$expected = district$T_POP * rate\n\n# Compute SIR\ndistrict$SIR = district$cases / district$expected\n\n\npar(mfrow = c(1, 3))\n# Plot number of cases using proportional symbol \nmf_map(x = district) \nmf_map(\n x = district, \n var = \"cases\",\n val_max = 50,\n type = \"prop\",\n col = \"#990000\", \n leg_title = \"Cases\")\nmf_layout(title = \"Number of cases of W Fever\")\n\n# Plot incidence \nmf_map(x = district,\n var = \"incidence\",\n type = \"choro\",\n pal = \"Reds 3\",\n leg_title = \"Incidence \\n(per 100 000)\")\nmf_layout(title = \"Incidence of W Fever\")\n\n# Plot SIRs\n\n# create breaks and associated color palette\nbreak_SIR = c(0, exp(mf_get_breaks(log(district$SIR), nbreaks = 8, breaks = \"pretty\")))\ncol_pal = c(\"#273871\", \"#3267AD\", \"#6496C8\", \"#9BBFDD\", \"#CDE3F0\", \"#FFCEBC\", \"#FF967E\", \"#F64D41\", \"#B90E36\")\n\nmf_map(x = district,\n var = \"SIR\",\n type = \"choro\",\n breaks = break_SIR, \n pal = col_pal, \n cex = 2,\n leg_title = \"SIR\")\nmf_layout(title = \"Standardized Incidence Ratio of W Fever in Cambodia\")\n\n\n\n\nThese maps illustrate the spatial heterogeneity of the cases. The incidence shows how the disease varies from one district to another, while the SIR highlights districts that have:\n\nhigher risk than average (SIR > 1) when standardized for population\nlower risk than average (SIR < 1) when standardized for population\naverage risk (SIR ~ 1) when standardized for population\n\nIn this example, we standardized the case distribution by population count. This simple standardization assumes that the risk of contracting the disease is the same for every person. 
However, this assumption does not hold for all diseases or all observed events (e.g. the numbers of childhood illnesses and deaths usually depend on the age pyramid). Keep in mind that other standardizations can be performed, based on variables that are known to have an effect but that you do not want to analyze (e.g. sex ratio, occupations, age pyramid)." } ] \ No newline at end of file