diff --git a/07-basic_statistics.qmd b/07-basic_statistics.qmd
new file mode 100644
index 0000000000000000000000000000000000000000..176a1fbd82987bfa53b1fff5e4b8ac80b0b14b5f
--- /dev/null
+++ b/07-basic_statistics.qmd
@@ -0,0 +1,88 @@
+---
+bibliography: references.bib
+---
+
+# Basic statistics for spatial analysis
+
+This section provides some basic statistical tools to study the spatial distribution of disease cases.
+
+## Import and visualize epidemiological data
+
+In this section, we load data that record the cases of an imaginary disease throughout Cambodia.
+
+```{r load_cases, eval = TRUE, echo = TRUE, nm = TRUE, fig.width=8, class.output="code-out", warning=FALSE, message=FALSE}
+library(sf)
+
+# Import Cambodia country border
+country = st_read("data_cambodia/cambodia.gpkg", layer = "country", quiet = TRUE)
+# Import provincial administrative borders of Cambodia (stored in the "education" layer)
+education = st_read("data_cambodia/cambodia.gpkg", layer = "education", quiet = TRUE)
+# Import district administrative borders of Cambodia
+district = st_read("data_cambodia/cambodia.gpkg", layer = "district", quiet = TRUE)
+
+# Import locations of cases of an imaginary disease
+cases = st_read("data_cambodia/cambodia.gpkg", layer = "cases", quiet = TRUE)
+cases = subset(cases, Disease == "W fever")
+
+# Aggregate cases over districts (number of case points falling in each district)
+district$cases <- lengths(st_intersects(district, cases))
+
+
+```
+
+The first step of any statistical analysis is to visualize the data, both to check that they were loaded correctly and to observe the general pattern of the cases.
+
+```{r cases_visualization, eval = TRUE, echo = TRUE, nm = TRUE, fig.width=8, class.output="code-out", warning=FALSE, message=FALSE}
+
+# View the cases object
+head(cases)
+
+# Map the cases
+library(mapsf)
+
+mf_map(x = district, border = "white")
+mf_map(x = country, lwd = 2, col = NA, add = TRUE)
+mf_map(x = cases, lwd = .5, col = "#990000", pch = 20, add = TRUE)
+
+```
+
+## Basic statistics
+
+The problem is usually expressed by defining two hypotheses: the null hypothesis (H0), i.e. an a priori hypothesis about the studied phenomenon (e.g. the situation is random), and the alternative hypothesis (HA), e.g. the situation is not random. The main principle is to measure how likely it is that the observed situation belongs to the set of situations that are possible under H0.
+
+The statistical analysis to perform depends on the type of data.
+
+### Spatial autocorrelation (Moran's I test)
+
+A popular test for spatial autocorrelation is Moran's I test.
+
+Moran's I statistic tells us whether nearby units tend to exhibit similar rates. It usually ranges from -1 to +1: a value close to -1 denotes that units with low rates are located near units with high rates, a value close to +1 indicates a concentration of spatial units exhibiting similar rates, and a value close to 0 suggests no spatial autocorrelation.
+
+We will compute Moran's I statistic using the `spdep` and `DCluster` packages. The `spdep` package provides a collection of functions to analyze spatial correlations of polygons and works with `sp` objects. The `DCluster` package provides a set of functions for the detection of spatial clusters of disease using count data.
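+
+As a reminder, Moran's I is commonly defined as shown below. The notation is introduced here for illustration only and is not used elsewhere in this chapter: $n$ is the number of spatial units (here, districts), $x_i$ is the rate observed in unit $i$, $\bar{x}$ is the mean rate over all units, and $w_{ij}$ is the spatial weight between units $i$ and $j$ (for example, 1 if the two units are neighbours and 0 otherwise).
+
+$$
+I = \frac{n}{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}} \cdot \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}\,(x_i - \bar{x})(x_j - \bar{x})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}
+$$
+
+Under the null hypothesis of no spatial autocorrelation, the expected value of $I$ is $-1/(n-1)$, which is close to 0 when the number of units is large.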
+
+```{r MoransI, eval = TRUE, echo = TRUE, nm = TRUE, fig.width=8, class.output="code-out", warning=FALSE, message=FALSE}
+
+# Compute incidence in each district (per 100 000 population)
+district$incidence <- district$cases/district$T_POP * 100000
+
+# Plot the histogram of the log-transformed incidence
+hist(log(district$incidence))
+
+
+
+
+```
+
+## Cluster analysis
+
+In epidemiology, the definition of a cluster
+
+### Population-based clusters (Kulldorff statistic)
+
+Kulldorff's spatial scan statistic identifies the most likely disease clusters by maximizing the likelihood that disease cases are located within a set of concentric circles that are moved across the study area.
+
+### Expectation-based cluster
+
+In many cases, population is not specific enough to
+
+### To go further ...