Commit 715cc68f authored by Lea
---
bibliography: references.bib
---
# Basic statistics for spatial analysis
This section presents basic statistical tools for studying the spatial distribution of cases.
## Import and visualize epidemiological data
In this section, we load data describing the cases of an imaginary disease throughout Cambodia.
```{r load_cases, eval = TRUE, echo = TRUE, nm = TRUE, fig.width=8, class.output="code-out", warning=FALSE, message=FALSE}
library(sf)
# Import Cambodia country border
country = st_read("data_cambodia/cambodia.gpkg", layer = "country", quiet = TRUE)
# Import provincial administrative border of Cambodia
education = st_read("data_cambodia/cambodia.gpkg", layer = "education", quiet = TRUE)
# Import district administrative border of Cambodia
district = st_read("data_cambodia/cambodia.gpkg", layer = "district", quiet = TRUE)
# Import locations of cases from an imaginary disease
cases = st_read("data_cambodia/cambodia.gpkg", layer = "cases", quiet = TRUE)
cases = subset(cases, Disease == "W fever")
# Aggregate cases over districts
district$cases <- lengths(st_intersects(district, cases))
```
The first step of any statistical analysis is always to visualize the data, both to check that they were loaded correctly and to observe the general pattern of the cases.
```{r cases_visualization, eval = TRUE, echo = TRUE, nm = TRUE, fig.width=8, class.output="code-out", warning=FALSE, message=FALSE}
# View the cases object
head(cases)
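# Map the cases
library(mapsf)
# District polygons as the background layer
mf_map(x = district, border = "white")
# Country border on top, without fill
mf_map(x = country, lwd = 2, col = NA, add = TRUE)
# Case locations as small dark-red points
mf_map(x = cases, lwd = .5, col = "#990000", pch = 20, add = TRUE)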
```
## Basic statistics
The problem is usually framed by defining two hypotheses: the null hypothesis (H0), i.e. an a priori hypothesis about the studied phenomenon (e.g. the cases are randomly distributed in space), and the alternative hypothesis (HA), e.g. the distribution of cases is not random. The main principle is to measure how likely it is that the observed situation belongs to the set of situations that are possible under H0.
The choice of statistical test depends on the type of data.
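
One common way to quantify "how likely" the observation is under H0, given here as an illustration rather than as the only option, is a Monte Carlo test: the data are randomly reshuffled (or regenerated) $M$ times under H0, a test statistic $T$ is recomputed on each simulated dataset, and the observed value $T_{\text{obs}}$ is ranked among the simulated values $T_1, \dots, T_M$. For a one-sided test where large values of $T$ are evidence against H0, the p-value is usually taken as:

$$
p = \frac{1 + \sum_{k=1}^{M} \mathbf{1}\left(T_k \geq T_{\text{obs}}\right)}{M + 1}
$$

where $\mathbf{1}(\cdot)$ equals 1 when its argument is true and 0 otherwise. For example, with $M = 999$ simulations of which 9 are at least as large as $T_{\text{obs}}$, $p = 10/1000 = 0.01$.
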
### Spatial autocorrelation (Moran's I test)
A popular test for spatial autocorrelation is Moran's test.
Moran's I tells us whether nearby units tend to exhibit similar rates. It usually ranges from -1 to +1: a value close to -1 indicates that units with low rates are located near units with high rates (negative autocorrelation), a value close to +1 indicates that units with similar rates are concentrated together (positive autocorrelation), and a value close to 0 suggests no spatial autocorrelation.
We will compute Moran's I statistic using the `spdep` and `DCluster` packages. `spdep` provides a collection of functions to analyze spatial autocorrelation between polygons; it works with `sp` objects and, in recent versions, with `sf` objects. `DCluster` provides a set of functions for the detection of spatial clusters of disease using count data.
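
For reference, a standard formulation of Moran's I for $n$ spatial units with values $x_i$ (here, the incidence in district $i$), mean $\bar{x}$, and spatial weights $w_{ij}$ (for instance $w_{ij} = 1$ if districts $i$ and $j$ share a border and 0 otherwise, with $w_{ii} = 0$) is:

$$
I = \frac{n}{S_0} \, \frac{\sum_{i}\sum_{j} w_{ij}\,(x_i - \bar{x})(x_j - \bar{x})}{\sum_{i} (x_i - \bar{x})^2}, \qquad S_0 = \sum_{i}\sum_{j} w_{ij}
$$

Under the null hypothesis of no spatial autocorrelation, its expected value is $E[I] = -1/(n-1)$, which is close to 0 when the number of districts is large.
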
```{r MoransI, eval = TRUE, echo = TRUE, nm = TRUE, fig.width=8, class.output="code-out", warning=FALSE, message=FALSE}
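# Compute incidence in each district (cases per 100,000 inhabitants)
district$incidence <- district$cases / district$T_POP * 100000
# Plot the histogram of log-incidence
# (districts with zero cases give log(0) = -Inf, which hist() leaves out)
hist(log(district$incidence),
     main = "Incidence of W fever by district",
     xlab = "log(incidence per 100,000)")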
```
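
The chunk above prepares the incidence but does not run the test itself. To illustrate how the statistic is computed, here is a minimal sketch in base R on a hypothetical 3 x 3 grid of cells rather than on the Cambodian districts. It assumes binary rook contiguity (cells sharing an edge are neighbours) and assesses significance with a permutation test, where shuffling values across cells simulates H0. The helpers `rook_weights()`, `morans_i()` and `moran_mc()` are illustrative names, not functions from `spdep` or `DCluster`.

```r
# Minimal sketch (toy data, not the Cambodia districts): Moran's I in base R
# on a 3 x 3 grid with binary rook contiguity (cells sharing an edge).

# Binary rook-contiguity weight matrix for an nr x nc grid (cells numbered row by row)
rook_weights <- function(nr, nc) {
  n <- nr * nc
  W <- matrix(0, n, n)
  for (ri in seq_len(nr)) {
    for (ci in seq_len(nc)) {
      i <- (ri - 1) * nc + ci
      if (ri > 1)  W[i, i - nc] <- 1  # neighbour above
      if (ri < nr) W[i, i + nc] <- 1  # neighbour below
      if (ci > 1)  W[i, i - 1]  <- 1  # neighbour on the left
      if (ci < nc) W[i, i + 1]  <- 1  # neighbour on the right
    }
  }
  W
}

# Moran's I = (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2, with z = x - mean(x)
morans_i <- function(x, W) {
  z <- x - mean(x)
  (length(x) / sum(W)) * sum(W * outer(z, z)) / sum(z^2)
}

# Permutation test: shuffling values across cells simulates H0 (no spatial structure);
# one-sided Monte Carlo p-value for positive autocorrelation
moran_mc <- function(x, W, nsim = 999) {
  obs  <- morans_i(x, W)
  sims <- replicate(nsim, morans_i(sample(x), W))
  list(statistic = obs, p.value = (1 + sum(sims >= obs)) / (nsim + 1))
}

W <- rook_weights(3, 3)
# Hypothetical values, with high values grouped in the top-left corner
clustered <- c(10, 9, 1,
                9, 8, 1,
                1, 1, 0)
set.seed(1)
res <- moran_mc(clustered, W)
res$statistic  # about 0.40: positive spatial autocorrelation
res$p.value

# A checkerboard pattern gives perfect negative autocorrelation
morans_i(c(1, 0, 1, 0, 1, 0, 1, 0, 1), W)  # -1
```

On the real district layer, `spdep` provides this directly: roughly, `nb <- poly2nb(district)`, `lw <- nb2listw(nb, style = "B", zero.policy = TRUE)`, then `moran.test(district$incidence, lw, zero.policy = TRUE)` or `moran.mc(district$incidence, lw, nsim = 999, zero.policy = TRUE)`. Note that `nb2listw()` defaults to row-standardized weights (`style = "W"`), which gives a different value of I than the binary weights used in the sketch.
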
## Cluster analysis
In epidemiology, the definition of a cluster
### Population-based clusters (Kulldorff statistic)
Kulldorff's spatial scan statistic identifies the most likely disease clusters by moving a set of circular windows of increasing radius across the study area and selecting the window that maximizes the likelihood that cases are concentrated inside it rather than outside it.
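
A minimal sketch of the idea, assuming the Poisson version of the scan statistic, circular windows built from area centroids, and a maximum window size of 50% of the total population (the value often used as a default). For a window $z$ containing $c_z$ observed cases out of $C$ cases in total, with $E_z = C \, P_z / P$ cases expected if risk were uniform ($P_z$ the population inside $z$, $P$ the total population), the log-likelihood ratio is:

$$
\mathrm{LLR}(z) = c_z \log\frac{c_z}{E_z} + (C - c_z)\log\frac{C - c_z}{C - E_z} \quad \text{if } c_z > E_z, \quad 0 \text{ otherwise}
$$

The most likely cluster is the window with the largest LLR, and its significance is assessed by Monte Carlo simulation. The functions `poisson_llr()`, `kulldorff_scan()` and `scan_pvalue()` and the six areas below are hypothetical and only illustrate the computation.

```r
# Minimal sketch (hypothetical toy data) of Kulldorff's circular scan statistic
# under a Poisson model, written in base R for illustration only.

# Log-likelihood ratio of a window with cz observed and Ez expected cases,
# out of C cases in total; only windows with an excess of cases are scored.
poisson_llr <- function(cz, Ez, C) {
  if (cz <= Ez) return(0)
  outside <- if (cz < C) (C - cz) * log((C - cz) / (C - Ez)) else 0
  cz * log(cz / Ez) + outside
}

# Circles centred on each area grow by adding the nearest areas one at a time,
# up to max_pop_share of the total population; the window with the largest LLR wins.
kulldorff_scan <- function(x, y, cases, pop, max_pop_share = 0.5) {
  C <- sum(cases)
  P <- sum(pop)
  best <- list(llr = 0, members = integer(0))
  for (i in seq_along(x)) {
    ord <- order(sqrt((x - x[i])^2 + (y - y[i])^2))  # areas by distance to centre i
    for (k in seq_along(ord)) {
      members <- ord[seq_len(k)]
      if (sum(pop[members]) > max_pop_share * P) break
      Ez  <- C * sum(pop[members]) / P  # expected cases if risk is uniform (H0)
      llr <- poisson_llr(sum(cases[members]), Ez, C)
      if (llr > best$llr) best <- list(llr = llr, members = sort(members))
    }
  }
  best
}

# Monte Carlo p-value: redistribute the C cases at random in proportion to population
scan_pvalue <- function(x, y, cases, pop, nsim = 99) {
  obs  <- kulldorff_scan(x, y, cases, pop)$llr
  sims <- replicate(nsim, {
    sim_cases <- as.vector(rmultinom(1, size = sum(cases), prob = pop / sum(pop)))
    kulldorff_scan(x, y, sim_cases, pop)$llr
  })
  (1 + sum(sims >= obs)) / (nsim + 1)
}

# Six hypothetical areas of equal population; cases concentrated in areas 1 to 3
x     <- c(0, 1, 2, 10, 11, 12)
y     <- rep(0, 6)
pop   <- rep(1000, 6)
cases <- c(20, 18, 22, 2, 3, 1)

best <- kulldorff_scan(x, y, cases, pop)
best$members  # 1 2 3
best$llr      # about 25.6
set.seed(1)
scan_pvalue(x, y, cases, pop)
```

The search is exhaustive over centres and radii, which is fine for a few hundred districts but grows quickly with the number of areas; in practice, the scan statistic is usually computed with dedicated software such as SaTScan.
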
### Expectation-based cluster
In many cases, population is not specific enough to
### To go further ...