Visualisation of epidemiological data

cbd3abf1 · lea.douchet_ird.fr · bb1e60f9
Commit cbd3abf1 authored 2 years ago by lea.douchet_ird.fr
--- a/05-mapping_with_r_files/figure-html/them1-1.png
+++ b/05-mapping_with_r_files/figure-html/them1-1.png
--- a/05-mapping_with_r_files/figure-html/theme2-1.png
+++ b/05-mapping_with_r_files/figure-html/theme2-1.png
--- a/05-mapping_with_r_files/figure-html/typo_order-1.png
+++ b/05-mapping_with_r_files/figure-html/typo_order-1.png
--- a/05-mapping_with_r_files/figure-html/typo_point-1.png
+++ b/05-mapping_with_r_files/figure-html/typo_point-1.png
--- a/05-mapping_with_r_files/figure-html/typo_simple-1.png
+++ b/05-mapping_with_r_files/figure-html/typo_simple-1.png
--- a/05-mapping_with_r_files/figure-html/typoprop-1.png
+++ b/05-mapping_with_r_files/figure-html/typoprop-1.png
--- a/05-mapping_with_r_files/figure-html/unnamed-chunk-10-1.png
+++ b/05-mapping_with_r_files/figure-html/unnamed-chunk-10-1.png
--- a/05-mapping_with_r_files/figure-html/unnamed-chunk-12-1.png
+++ b/05-mapping_with_r_files/figure-html/unnamed-chunk-12-1.png
--- a/05-mapping_with_r_files/figure-html/unnamed-chunk-14-1.png
+++ b/05-mapping_with_r_files/figure-html/unnamed-chunk-14-1.png
--- a/05-mapping_with_r_files/figure-html/unnamed-chunk-16-1.png
+++ b/05-mapping_with_r_files/figure-html/unnamed-chunk-16-1.png
--- a/05-mapping_with_r_files/figure-html/unnamed-chunk-18-1.png
+++ b/05-mapping_with_r_files/figure-html/unnamed-chunk-18-1.png
--- a/05-mapping_with_r_files/figure-html/unnamed-chunk-22-1.png
+++ b/05-mapping_with_r_files/figure-html/unnamed-chunk-22-1.png
--- a/05-mapping_with_r_files/figure-html/unnamed-chunk-22-2.png
+++ b/05-mapping_with_r_files/figure-html/unnamed-chunk-22-2.png
--- a/05-mapping_with_r_files/figure-html/unnamed-chunk-4-1.png
+++ b/05-mapping_with_r_files/figure-html/unnamed-chunk-4-1.png
--- a/05-mapping_with_r_files/figure-html/unnamed-chunk-8-1.png
+++ b/05-mapping_with_r_files/figure-html/unnamed-chunk-8-1.png
--- a/07-basic_statistics.qmd
+++ b/07-basic_statistics.qmd
@@ -4,11 +4,11 @@ bibliography: references.bib

 # Basic statistics for spatial analysis

-This section aims at providing some basic statistical tools to study the spatial distribution of the cases.
+This section aims to provide basic statistical tools for studying the spatial distribution of epidemiological data.

 ## Import and visualize epidemiological data

-In this section, we load data that reference the cases of an imaginary disease throughout Cambodia.
+In this section, we load data that record the cases of an imaginary disease throughout Cambodia. Each point corresponds to the geolocation of a case.

 ```{r load_cases, eval = TRUE, echo = TRUE, nm = TRUE, fig.width=8, class.output="code-out", warning=FALSE, message=FALSE}
 library(sf)
@@ -24,10 +24,6 @@ district = st_read("data_cambodia/cambodia.gpkg", layer = "district", quiet = TR
 cases = st_read("data_cambodia/cambodia.gpkg", layer = "cases", quiet = TRUE)
 cases = subset(cases, Disease == "W fever")

-# Aggregate cases over districts
-district$cases <- lengths(st_intersects(district, cases))
-
-
 ```

 The first step of any statistical analysis always consists of visualizing the data, both to check that they were correctly loaded and to observe the general pattern of the cases.
@@ -46,11 +42,83 @@ mf_map(x = cases, lwd = .5, col = "#990000", pch = 20, add = TRUE)

 ```

-## Basics statistics
+In epidemiology, the meaning of a point is often ambiguous. While it usually gives the location of an observation, it is not always clear whether this observation represents an event of interest (e.g. illness, death, ...) or a person at risk (e.g. a participant who may or may not develop the disease). Relating the number of events to the population at risk is often more informative than counting cases alone. Administrative divisions are convenient areal units for aggregating cases because population counts and structures are usually available for them. In this study, we use districts as areal units.
+
+```{r district_aggregate, eval = TRUE, echo = TRUE, nm = TRUE, fig.width=8, class.output="code-out", warning=FALSE, message=FALSE}
+# Aggregate cases over districts
+district$cases <- lengths(st_intersects(district, cases))
+
+```
+
+The incidence ($\frac{\text{cases}}{\text{population}}$, here expressed per 100 000 inhabitants) is commonly used to describe the distribution of cases relative to population size, but other indicators exist. For example, the standardized incidence ratio (SIR) compares the observed and expected numbers of cases in each district $i$ and is expressed as $SIR_i = \frac{Y_i}{E_i}$, with $Y_i$ the observed number of cases and $E_i$ the expected number of cases. In this study, we compute the expected number of cases in each district by assuming that the risk of infection is the same everywhere in Cambodia, i.e. that every district has the same incidence: $E_i = P_i \times \frac{\sum_j Y_j}{\sum_j P_j}$, with $P_i$ the population of district $i$.
+
+```{r indicators, eval = TRUE, echo = TRUE, nm = TRUE, fig.width=8, fig.height=4, class.output="code-out", warning=FALSE, message=FALSE}
+
+# Compute incidence in each district (per 100 000 population)
+district$incidence = district$cases/district$T_POP * 100000
+
+# Compute the global risk
+rate = sum(district$cases)/sum(district$T_POP)
+
+# Compute expected number of cases 
+district$expected = district$T_POP * rate
+
+# Compute SIR
+district$SIR = district$cases / district$expected
+```
+
+```{r inc_visualization, eval = TRUE, echo = TRUE, nm = TRUE, fig.width=8, fig.height=4, class.output="code-out", warning=FALSE, message=FALSE}
+par(mfrow = c(1, 3))
+# Plot number of cases using proportional symbols
+mf_map(x = district) 
+mf_map(
+  x = district, 
+  var = "cases",
+  val_max = 50,
+  type = "prop",
+  col = "#990000", 
+  leg_title = "Cases")
+mf_layout(title = "Number of cases of W Fever")
+
+# Plot incidence 
+mf_map(x = district,
+       var = "incidence",
+       type = "choro",
+       pal = "Reds 3",
+       leg_title = "Incidence \n(per 100 000)")
+mf_layout(title = "Incidence of W Fever")
+
+# Plot SIRs
+
+# create breaks and associated color palette
+break_SIR = c(0, exp(mf_get_breaks(log(district$SIR), nbreaks = 8, breaks = "pretty")))
+col_pal = c("#273871", "#3267AD", "#6496C8", "#9BBFDD", "#CDE3F0", "#FFCEBC", "#FF967E", "#F64D41", "#B90E36")
+
+mf_map(x = district,
+       var = "SIR",
+       type = "choro",
+       breaks = break_SIR, 
+       pal = col_pal, 
+       cex = 2,
+       leg_title = "SIR")
+mf_layout(title = "Standardized Incidence Ratio of W Fever")
+```
+
+These maps illustrate the spatial heterogeneity of the cases. The incidence shows how the disease varies from one district to another, while the SIR highlights districts that have:
+
+-   higher risk than average (SIR \> 1) when standardized for population

-The problem is usually expressed by defining two hypothesis : the null hypothesis (H0), i.e. an a priori hypothesis of the studied phenomenon (e.g. the situation is a random) and the alternative hypothesis (HA), e.g. the situation is not random. The main principle is to measure how likely the observed situation belong to the ensemble of situation that are possible under the H0 hypothesis.
+-   lower risk than average (SIR \< 1) when standardized for population

-The statistical analysis performed relies on the type of data.
+-   average risk (SIR \~ 1) when standardized for population
+
+In this example, we standardized the case distribution by population count only. This simple standardization assumes that every person has the same risk of contracting the disease. However, this assumption does not hold for all diseases or for all observed events (e.g. the numbers of childhood illnesses and deaths in a district usually depend on its age pyramid). Keep in mind that other standardizations can be performed on variables known to affect the outcome but that you do not want to analyze (e.g. sex ratio, occupations, age pyramid).
+
+## Cluster analysis
+
+Since W fever seems to be heterogeneously distributed across Cambodia, it is worth studying where excesses of cases occur, i.e. identifying clusters of the disease.
+
+In statistics, such problems are usually expressed by defining two hypotheses: the null hypothesis (H0), i.e. an a priori hypothesis about the studied phenomenon (e.g. the situation is random), and the alternative hypothesis (HA), e.g. the situation is not random. The main principle is to measure how likely it is that the observed situation belongs to the set of situations that are possible under H0.

 ### Spatial autocorrelation (Moran's I test)

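The SIRs in the `indicators` chunk use a single overall rate for everybody. The age-based standardization mentioned in the SIR discussion can be sketched with indirect standardization: a reference rate is computed for each age stratum over all districts, and $E_i$ becomes the sum, over strata, of the district's stratum population times that stratum's rate. This is a minimal sketch on hypothetical toy data (three districts `A`, `B`, `C` and two age classes), not the Cambodia dataset; the column names `age`, `pop` and `cases` are illustrative assumptions.

```r
# Hypothetical toy data (NOT the Cambodia dataset): 3 districts x 2 age strata
strata <- data.frame(
  district = rep(c("A", "B", "C"), each = 2),
  age      = rep(c("child", "adult"), times = 3),
  pop      = c(4000, 6000, 1000, 9000, 5000, 5000),
  cases    = c(12, 6, 4, 10, 15, 3),
  stringsAsFactors = FALSE
)

# Reference rate of each age stratum, pooled over all districts
ref_rate <- tapply(strata$cases, strata$age, sum) /
  tapply(strata$pop, strata$age, sum)

# Expected cases in each district x stratum cell under the reference rates
strata$expected <- strata$pop * as.vector(ref_rate[strata$age])

# Observed (Y_i) and expected (E_i) cases per district, then SIR_i = Y_i / E_i
observed <- tapply(strata$cases, strata$district, sum)
expected <- tapply(strata$expected, strata$district, sum)
sir_age  <- observed / expected

# For comparison: SIR standardized on population count only (one overall rate)
pop_district <- tapply(strata$pop, strata$district, sum)
sir_crude    <- observed / (pop_district * sum(strata$cases) / sum(strata$pop))

round(rbind(crude = sir_crude, age_adjusted = sir_age), 2)
```

With these made-up numbers, district B, whose population is mostly adults (the lower-risk stratum here), has a crude SIR of 0.84 but an age-adjusted SIR of about 1.20: ignoring the age structure would hide its excess of cases. Because the reference rates are computed from the same data, the total number of expected cases equals the total number of observed cases.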
@@ -58,13 +126,10 @@ A popular test for spatial autocorrelation is the Moran's test.

 Moran's I test tells us whether nearby units tend to exhibit similar rates. The statistic generally ranges from -1 to +1, with a value of -1 denoting that units with low rates are located near other units with high rates, a value of +1 indicating a concentration of spatial units exhibiting similar rates, and a value close to 0 suggesting no spatial autocorrelation.

-We will compute the Moran's statistics using `spdep` and `Dcluster` packages. This package provides a collection of functions to analyze spatial correlations of polygons and works with sp objects. `Dcluster` package provides a set of functions for the detection of spatial clusters of disease using count data.
+We will compute the Moran's I statistic using the `spdep` and `DCluster` packages. The `spdep` package provides a collection of functions to analyze spatial autocorrelation between polygons and works with `sp` objects (and, in recent versions, `sf` objects). The `DCluster` package provides a set of functions for detecting spatial clusters of disease using count data.

 ```{r MoransI, eval = TRUE, echo = TRUE, nm = TRUE, fig.width=8, class.output="code-out", warning=FALSE, message=FALSE}

-# Compte incidence in each district (per 100 000 population)
-district$incidence <- district$cases/district$T_POP * 100000
-
 # Plot the incidence histogram
 hist(log(district$incidence))


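To make the definition of Moran's I concrete, the statistic can be written out directly in base R for a weights matrix $W$: $I = \frac{n}{S_0} \frac{\sum_i \sum_j w_{ij}(x_i - \bar{x})(x_j - \bar{x})}{\sum_i (x_i - \bar{x})^2}$, with $S_0 = \sum_i \sum_j w_{ij}$. The helper `morans_i()` and the four-unit example are hypothetical illustrations, not part of `spdep` or `DCluster`; in practice, `spdep::moran.test()` computes this statistic from a spatial weights list (`listw`).

```r
# Moran's I written out from its definition (illustration only):
#   I = (n / S0) * sum_ij w_ij (x_i - xbar)(x_j - xbar) / sum_i (x_i - xbar)^2
# x: values for n areal units; W: n x n spatial weights matrix with zero diagonal
morans_i <- function(x, W) {
  n  <- length(x)
  z  <- x - mean(x)          # deviations from the mean
  S0 <- sum(W)               # sum of all weights
  (n / S0) * sum(W * outer(z, z)) / sum(z^2)
}

# Hypothetical example: 4 units in a row (1-2-3-4), binary contiguity weights
W <- matrix(0, nrow = 4, ncol = 4)
for (i in 1:3) {
  W[i, i + 1] <- 1
  W[i + 1, i] <- 1
}

i_gradient    <- morans_i(c(1, 2, 3, 4), W)  # similar values side by side
i_alternating <- morans_i(c(1, 4, 1, 4), W)  # low values next to high values
c(gradient = i_gradient, alternating = i_alternating)
```

Placing similar values side by side gives a positive value (1/3 here), while alternating low and high values gives -1. Under the null hypothesis of no autocorrelation, the expected value of $I$ is $-1/(n-1)$ rather than exactly 0, which matters when there are few units, as here (-1/3).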
--- a/img/dist_filter_1.png
+++ b/img/dist_filter_1.png
--- a/public/07-basic_statistics.html
+++ b/public/07-basic_statistics.html
--- a/public/07-basic_statistics_files/figure-html/MoransI-1.png
+++ b/public/07-basic_statistics_files/figure-html/MoransI-1.png
--- a/public/07-basic_statistics_files/figure-html/inc_visualization-1.png
+++ b/public/07-basic_statistics_files/figure-html/inc_visualization-1.png
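The null-hypothesis principle described in the *Cluster analysis* section (how likely is the observed situation among those possible under H0?) can be illustrated with a Monte Carlo permutation test. Shuffling the values across units generates situations compatible with an H0 of random allocation, and the p-value is the share of shuffled statistics at least as large as the observed one. This is a minimal sketch on a hypothetical 5 x 5 grid with simulated values; `morans_i()` is a hypothetical helper, and `spdep` offers `moran.mc()` for the same kind of test on real data.

```r
# Monte Carlo (permutation) test of Moran's I, illustrating the H0 logic.
# H0 (assumed here): values are randomly allocated to the spatial units, so
# shuffling them generates situations that are possible under H0.
set.seed(1)

morans_i <- function(x, W) {
  z <- x - mean(x)
  (length(x) / sum(W)) * sum(W * outer(z, z)) / sum(z^2)
}

# Hypothetical 5 x 5 grid; rook neighbours are cells at Manhattan distance 1
coords <- expand.grid(row = 1:5, col = 1:5)
W <- 1 * (as.matrix(dist(coords, method = "manhattan")) == 1)

# Hypothetical values with an east-west gradient plus random noise
x <- coords$col + rnorm(nrow(coords), sd = 0.5)

obs_i <- morans_i(x, W)                          # observed statistic
sim_i <- replicate(999, morans_i(sample(x), W))  # statistics under H0

# One-sided Monte Carlo p-value for positive autocorrelation
p_value <- (sum(sim_i >= obs_i) + 1) / (length(sim_i) + 1)
c(observed_I = obs_i, p_value = p_value)
```

Counting the observed statistic among the simulations (the `+ 1` terms) keeps the p-value strictly above 0, so with 999 permutations the smallest attainable value is 0.001.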