{
"objectID": "index.html",
"href": "index.html",
"title": "Mapping and spatial analyses in R for One Health studies",
"text": "This section aims to provide some basic statistical tools for studying the spatial distribution of epidemiological data."
},
{
"objectID": "07-basic_statistics.html#import-and-visualize-epidemiological-data",
"href": "07-basic_statistics.html#import-and-visualize-epidemiological-data",
"title": "6 Basic statistics for spatial analysis",
"section": "6.1 Import and visualize epidemiological data",
"text": "6.1 Import and visualize epidemiological data\nIn this section, we load data that reference the cases of an imaginary disease, the W fever, throughout Cambodia. Each point corresponds to the geo-localization of a case.\n\nlibrary(dplyr)\nlibrary(sf)\n\n#Import Cambodia country border\ncountry <- st_read(\"data_cambodia/cambodia.gpkg\", layer = \"country\", quiet = TRUE)\n#Import provincial administrative border of Cambodia\neducation <- st_read(\"data_cambodia/cambodia.gpkg\", layer = \"education\", quiet = TRUE)\n#Import district administrative border of Cambodia\ndistrict <- st_read(\"data_cambodia/cambodia.gpkg\", layer = \"district\", quiet = TRUE)\n\n# Import locations of cases from an imaginary disease\ncases <- st_read(\"data_cambodia/cambodia.gpkg\", layer = \"cases\", quiet = TRUE)\ncases <- subset(cases, Disease == \"W fever\")\n\nThe first step of any statistical analysis always consists of visualizing the data to check that they were correctly loaded and to observe the general pattern of the cases.\n\n# View the cases object\nhead(cases)\n\nSimple feature collection with 6 features and 2 fields\nGeometry type: MULTIPOINT\nDimension: XY\nBounding box: xmin: 255891 ymin: 1179092 xmax: 506647.4 ymax: 1467441\nProjected CRS: WGS 84 / UTM zone 48N\n id Disease geom\n1 0 W fever MULTIPOINT ((280036.2 12841...\n2 1 W fever MULTIPOINT ((451859.5 11790...\n3 2 W fever MULTIPOINT ((255891 1467441))\n4 5 W fever MULTIPOINT ((506647.4 12322...\n5 6 W fever MULTIPOINT ((440668 1197958))\n6 7 W fever MULTIPOINT ((481594.5 12714...\n\n# Map the cases\nlibrary(mapsf)\n\nmf_map(x = district, border = \"white\")\nmf_map(x = country, lwd = 2, col = NA, add = TRUE)\nmf_map(x = cases, lwd = .5, col = \"#990000\", pch = 20, add = TRUE)\nmf_layout(title = \"W Fever infections in Cambodia\")\n\n\n\n\nIn epidemiology, the exact meaning of a point is questionable. 
While it usually gives the location of an observation, it does not tell us whether this observation represents an event of interest (e.g., illness, death, …) or a person at risk (e.g., a participant that may or may not experience the disease). While the population at risk may be considered uniformly distributed within a small area (a city, for example), this is likely not the case at a country scale. Considering a ratio of events to the population at risk is often more informative than considering counts of cases alone. Administrative divisions of countries are convenient areal units for case aggregation since data on population counts and structures are available for them. In this study, we will use the district as the areal unit.\n\n# Aggregate cases over districts\ndistrict$cases <- lengths(st_intersects(district, cases))\n\n# Plot number of cases using proportional symbol \nmf_map(x = district) \nmf_map(\n x = district, \n var = \"cases\",\n val_max = 50,\n type = \"prop\",\n col = \"#990000\", \n leg_title = \"Cases\")\nmf_layout(title = \"Number of cases of W Fever\")\n\n\n\n\nThe incidence (\\(\\frac{cases}{population}\\)) expressed per 100,000 population is commonly used to represent the case distribution relative to population density, but other indicators exist. For example, the standardized incidence ratio (SIR) represents the deviation between the observed and expected number of cases and is expressed as \\(SIR = \\frac{Y_i}{E_i}\\) with \\(Y_i\\), the observed number of cases and \\(E_i\\), the expected number of cases. In this study, we computed the expected number of cases in each district by assuming infections are homogeneously distributed across Cambodia, i.e., the incidence is the same in each district. 
The SIR therefore represents the deviation of incidence compared to the average incidence across Cambodia.\n\n# Compute incidence in each district (per 100 000 population)\ndistrict$incidence <- district$cases/district$T_POP * 100000\n\n# Compute the global risk\nrate <- sum(district$cases)/sum(district$T_POP)\n\n# Compute expected number of cases \ndistrict$expected <- district$T_POP * rate\n\n# Compute SIR\ndistrict$SIR <- district$cases / district$expected\n\n\npar(mfrow = c(1, 2))\n\n# Plot incidence \nmf_map(x = district)\nmf_map(x = district,\n var = c(\"T_POP\", \"incidence\"),\n type = \"prop_choro\",\n pal = \"Reds\",\n inches = .1,\n breaks = exp(mf_get_breaks(log(district$incidence+1), breaks = \"pretty\"))-1,\n leg_title = c(\"Population\", \"Incidence \\n(per 100 000)\"))\nmf_layout(title = \"Incidence of W Fever\")\n\n# Plot SIRs\n# create breaks and associated color palette\nbreak_SIR <- c(0,exp(mf_get_breaks(log(district$SIR), nbreaks = 8, breaks = \"pretty\")))\ncol_pal <- c(\"#273871\", \"#3267AD\", \"#6496C8\", \"#9BBFDD\", \"#CDE3F0\", \"#FFCEBC\", \"#FF967E\", \"#F64D41\", \"#B90E36\")\nmf_map(x = district)\nmf_map(x = district,\n var = c(\"T_POP\", \"SIR\"),\n type = \"prop_choro\",\n breaks = break_SIR,\n pal = col_pal,\n inches = .1,\n #cex = 2,\n leg_title = c(\"Population\", \"SIR\"))\nmf_layout(title = \"Standardized Incidence Ratio of W Fever\")\n\n\n\n\nThese maps illustrate the spatial heterogeneity of the cases. The incidence shows how the disease varies from one district to another while the SIR highlights districts that have:\n\nhigher risk than average (SIR > 1) when standardized for population\nlower risk than average (SIR < 1) when standardized for population\naverage risk (SIR ~ 1) when standardized for population\n\n\n\n\n\n\n\nTo go further …\n\n\n\nIn this example, we standardized the cases distribution for population count. This simple standardization assumes that the risk of contracting the disease is similar for each person. 
However, this assumption does not hold for all diseases and all observed events, since confounding effects can bias the interpretation (e.g., the number of childhood illness and death outcomes in a district is usually related to the age pyramid). A confounding factor is a variable that influences both the dependent variable and the independent variable, causing a spurious association. Keep in mind that other standardizations can be performed based on these confounding factors, i.e., variables known to have an effect but that you don’t want to analyze (e.g., sex ratio, occupations, age pyramid).\n\n\n\n\n\nIn addition, one can wonder what an SIR ~ 1 means, i.e., what the threshold is for deciding whether the SIR is greater than, lower than or equivalent to 1. The significance of the SIR can be tested globally (to determine whether or not the incidence is homogeneously distributed) and locally in each district (to determine which districts have an SIR different from 1). We won’t perform these analyses in this tutorial, but you can look at the functions ?achisq.test() (from the DCluster package (Gómez-Rubio et al. 2015)) and ?probmap() (from the spdep package (R. Bivand et al. 2015)) to compute these statistics."
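The global homogeneity test mentioned above can be sketched with base R alone: under H0 (homogeneous incidence), the observed counts follow a multinomial distribution with probabilities proportional to the expected counts, which is exactly what a chi-square goodness-of-fit test checks. The counts below are small made-up values, not the Cambodia data; DCluster::achisq.test offers a bootstrap-based equivalent.

```r
# Sketch of a global homogeneity test with base R (toy counts, hypothetical)
district_df <- data.frame(
  cases    = c(12, 4, 30, 7, 2),   # hypothetical observed counts Y_i
  expected = c(10, 6, 20, 12, 7)   # hypothetical expected counts E_i
)

# Under H0, cases are multinomial with probabilities expected / sum(expected)
test <- chisq.test(
  x = district_df$cases,
  p = district_df$expected / sum(district_df$expected)
)

test$p.value  # a small p-value rejects spatial homogeneity of incidence
```

With these toy counts the test rejects homogeneity at the 5% level; on real data the bootstrap version in DCluster is preferable for sparse counts.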
},
{
"objectID": "07-basic_statistics.html#cluster-analysis",
"href": "07-basic_statistics.html#cluster-analysis",
"title": "6 Basic statistics for spatial analysis",
"section": "6.2 Cluster analysis",
"text": "6.2 Cluster analysis\n\n6.2.1 General introduction\nWhy study clusters in epidemiology? Cluster analysis helps identify unusual patterns that occur during a given period of time. The ultimate goal of such analysis is to explain the observation of these patterns. In epidemiology, we can distinguish two types of processes that can explain heterogeneity in the case distribution:\n\nThe 1st order effects are the spatial variations of the case distribution caused by underlying properties of the environment or the population structure itself. In such processes, individuals get infected independently from the rest of the population. Such processes include infection through an at-risk environment, for example, air pollution, contaminated water or soil, and UV exposure. This effect assumes that the observed pattern is caused by a difference in risk intensity.\nThe 2nd order effects describe processes of spread, contagion and diffusion of diseases caused by interactions between individuals. This includes the transmission of infectious diseases by proximity, but also the transmission of non-infectious diseases, for example, through the diffusion of social norms within networks. This effect assumes that the observed pattern is caused by correlations or co-variations.\n\n\n\n\n\nNo statistical method can distinguish between these competing processes since their outcomes result in similar patterns of points. Cluster analysis helps describe the magnitude and the location of patterns but can in no way answer the question of why such patterns occur. It is therefore a step that helps detect clusters for description and surveillance purposes and raise hypotheses on the underlying process that will lead to further investigations.\nKnowledge about the disease and its transmission process should guide the choice of study methods. 
In this brief tutorial, we present two methods of cluster detection: the Moran’s I test, which tests for spatial independence (likely related to 2nd order effects), and the scan statistics, which test for homogeneous distribution (likely related to 1st order effects). It is up to the epidemiologist to select the tools that best serve the question under study.\n\n\n\n\n\n\nStatistic tests and distributions\n\n\n\nIn statistics, problems are usually expressed by defining two hypotheses: the null hypothesis (H0), i.e., an a priori hypothesis of the studied phenomenon (e.g., the situation is random) and the alternative hypothesis (H1), e.g., the situation is not random. The main principle is to measure how likely the observed situation is to belong to the ensemble of situations that are possible under the H0 hypothesis.\nIn mathematics, a probability distribution is a mathematical expression that represents what we would expect due to random chance. The choice of the probability distribution relies on the type of data you use (continuous, count, binary). In general, three distributions are used when studying disease rates: the Binomial, the Poisson and the Poisson-gamma mixture (also known as negative binomial) distributions.\nMany statistical tests assume by default that data are normally distributed. This implies that your variable is continuous and that all data can easily be represented by two parameters, the mean and the variance, i.e., each value has the same level of certainty. 
While many measures can be assessed under the normality assumption, this is usually not the case in epidemiology, with strictly positive rates and count values that 1) do not fit the normal distribution and 2) do not come with the same degree of certainty, since variances likely differ between districts due to different population sizes, i.e., some districts have very sparse data (with high variance) while others have adequate data (with lower variance).\n\n# dataset statistics\nm_cases <- mean(district$incidence)\nsd_cases <- sd(district$incidence)\n\nhist(district$incidence, probability = TRUE, ylim = c(0, 0.4), xlim = c(-5, 16), xlab = \"Number of cases\", ylab = \"Probability\", main = \"Histogram of observed incidence compared\\nto Normal and Poisson distributions\")\n\ncurve(dnorm(x, m_cases, sd_cases),col = \"blue\", lwd = 1, add = TRUE)\n\npoints(0:max(district$incidence), dpois(0:max(district$incidence),m_cases),\n type = 'b', pch = 20, col = \"red\", ylim = c(0, 0.6), lty = 2)\n\nlegend(\"topright\", legend = c(\"Normal distribution\", \"Poisson distribution\", \"Observed distribution\"), col = c(\"blue\", \"red\", \"black\"),pch = c(NA, 20, NA), lty = c(1, 2, 1))\n\n\n\n\nIn this tutorial, we use the Poisson distribution in our statistical tests.\n\n\n\n\n6.2.2 Test for spatial autocorrelation (Moran’s I test)\n\n6.2.2.1 The global Moran’s I test\nA popular test for spatial autocorrelation is the Moran’s test. This test tells us whether nearby units tend to exhibit similar incidences. It ranges from -1 to +1. 
A value of -1 denotes that units with low rates are located near other units with high rates, while a value of +1 indicates a concentration of spatial units exhibiting similar rates.\n\n\n\n\n\n\nMoran’s I test\n\n\n\nThe Moran’s statistic is:\n\\[I = \\frac{N}{\\sum_{i=1}^N\\sum_{j=1}^Nw_{ij}}\\frac{\\sum_{i=1}^N\\sum_{j=1}^Nw_{ij}(Y_i-\\bar{Y})(Y_j - \\bar{Y})}{\\sum_{i=1}^N(Y_i-\\bar{Y})^2}\\] with:\n\n\\(N\\): the number of polygons,\n\\(w_{ij}\\): a matrix of spatial weights with zeroes on the diagonal (i.e., \\(w_{ii}=0\\)). For example, if polygons are neighbors, the weight takes the value \\(1\\), otherwise it takes the value \\(0\\).\n\\(Y_i\\): the variable of interest,\n\\(\\bar{Y}\\): the mean value of \\(Y\\).\n\nUnder the Moran’s test, the statistical hypotheses are:\n\nH0: the distribution of cases is spatially independent, i.e., \\(I=0\\).\nH1: the distribution of cases is spatially autocorrelated, i.e., \\(I\\ne0\\).\n\n\n\nWe will compute the Moran’s statistic using the spdep (R. Bivand et al. 2015) and DCluster (Gómez-Rubio et al. 2015) packages. The spdep package provides a collection of functions to analyze spatial correlations of polygons and works with sp objects. In this example, we use poly2nb() and nb2listw(). These functions respectively detect the neighboring polygons and assign weights corresponding to \\(1/\\#\\ of\\ neighbors\\). 
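To make the formula concrete, here is a minimal by-hand computation of Moran's I on a toy example of four regions arranged in a chain (toy values and binary weights, unrelated to the Cambodia data):

```r
# Moran's I computed term by term, following the formula above.
# Four regions in a chain (1-2, 2-3, 3-4) with binary weights, w_ii = 0.
y <- c(10, 9, 2, 1)                      # variable of interest Y_i
w <- matrix(c(0, 1, 0, 0,
              1, 0, 1, 0,
              0, 1, 0, 1,
              0, 0, 1, 0), nrow = 4)     # symmetric binary neighbor weights
N  <- length(y)
yc <- y - mean(y)                        # centered values Y_i - Ybar
I  <- (N / sum(w)) * sum(w * outer(yc, yc)) / sum(yc^2)
I  # positive: high values sit next to high, low next to low
```

With these toy values I is about 0.39, reflecting the positive autocorrelation built into the chain; the functions below compute the same quantity (plus a significance test) on the real district data.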
The DCluster package provides a set of functions for the detection of spatial clusters of disease using count data.\n\n#install.packages(\"spdep\")\n#install.packages(\"DCluster\")\nlibrary(spdep) # Functions for creating spatial weight, spatial analysis\nlibrary(DCluster) # Package with functions for spatial cluster analysis\n\nset.seed(345) # remove random sampling for reproducibility\n\nqueen_nb <- poly2nb(district) # Neighbors according to queen case\nq_listw <- nb2listw(queen_nb, style = 'W') # row-standardized weights\n\n# Moran's I test\nm_test <- moranI.test(cases ~ offset(log(expected)), \n data = district,\n model = 'poisson',\n R = 499,\n listw = q_listw,\n n = length(district$cases), # number of regions\n S0 = Szero(q_listw)) # Global sum of weights\nprint(m_test)\n\nMoran's I test of spatial autocorrelation \n\n Type of boots.: parametric \n Model used when sampling: Poisson \n Number of simulations: 499 \n Statistic: 0.1566449 \n p-value : 0.006 \n\nplot(m_test)\n\n\n\n\nThe Moran’s statistic is here \\(I =\\) 0.16. When comparing its value to the H0 distribution (built under 499 simulations), the probability of observing such an I value under the null hypothesis, i.e. the distribution of cases is spatially independent, is \\(p_{value} =\\) 0.006. We therefore reject H0 with an error risk of \\(\\alpha = 5\\%\\). The distribution of cases is therefore autocorrelated across districts in Cambodia.\n\n\n6.2.2.2 The Local Moran’s I LISA test\nThe global Moran’s test provides a single statistical value indicating whether autocorrelation occurs over the territory, but it does not tell us where these correlations occur, i.e., where the clusters are located. To identify such clusters, we can decompose the Moran’s I statistic to extract local information on the level of correlation between each district and its neighbors. This is called the Local Moran’s I LISA statistic. 
Because the Local Moran’s I LISA statistic tests each district for autocorrelation independently, concerns arise about multiple testing, which increases the Type I error (\\(\\alpha\\)) of the statistical tests. Local tests should therefore be used to explore and describe clusters once the global test has detected autocorrelation.\n\n\n\n\n\n\nStatistical test\n\n\n\nFor each district \\(i\\), the Local Moran’s I statistic is:\n\\[I_i = \\frac{(Y_i-\\bar{Y})}{\\sum_{i=1}^N(Y_i-\\bar{Y})^2}\\sum_{j=1}^Nw_{ij}(Y_j - \\bar{Y}) \\text{ with } I = \\sum_{i=1}^NI_i/N\\]\n\n\nThe localmoran() function from the spdep package treats the variable of interest as if it were normally distributed. In some cases, this assumption could be reasonable for incidence rates, especially when the areal units of analysis have sufficiently large population counts, suggesting that the values have similar levels of variance. Unfortunately, the local Moran’s test has not been implemented for the Poisson distribution (the population is not large enough in some districts) in the spdep package. However, Bivand et al. (R. S. Bivand et al. 
2008) provided some code to manually perform the analysis using the Poisson distribution, and this code was further implemented in the course “Spatial Epidemiology”.\n\n# Step 1 - Create the standardized deviation of observed from expected\nsd_lm <- (district$cases - district$expected) / sqrt(district$expected)\n\n# Step 2 - Create a spatially lagged version of standardized deviation of neighbors\nwsd_lm <- lag.listw(q_listw, sd_lm)\n\n# Step 3 - the local Moran's I is the product of step 1 and step 2\ndistrict$I_lm <- sd_lm * wsd_lm\n\n# Step 4 - setup parameters for simulation of the null distribution\n\n# Specify number of simulations to run\nnsim <- 499\n\n# Specify dimensions of result based on number of regions\nN <- length(district$expected)\n\n# Create a matrix of zeros to hold results, with a row for each county, and a column for each simulation\nsims <- matrix(0, ncol = nsim, nrow = N)\n\n# Step 5 - Start a for-loop to iterate over simulation columns\nfor(i in 1:nsim){\n y <- rpois(N, lambda = district$expected) # generate a random event count, given expected\n sd_lmi <- (y - district$expected) / sqrt(district$expected) # standardized local measure\n wsd_lmi <- lag.listw(q_listw, sd_lmi) # standardized spatially lagged measure\n sims[, i] <- sd_lmi * wsd_lmi # this is the I(i) statistic under this iteration of null\n}\n\n# Step 6 - For each county, test where the observed value ranks with respect to the null simulations\nxrank <- apply(cbind(district$I_lm, sims), 1, function(x) rank(x)[1])\n\n# Step 7 - Calculate the difference between observed rank and total possible (nsim)\ndiff <- nsim - xrank\ndiff <- ifelse(diff > 0, diff, 0)\n\n# Step 8 - Assuming a uniform distribution of ranks, calculate p-value for observed\n# given the null distribution generated from simulations\ndistrict$pval_lm <- punif((diff + 1) / (nsim + 1))\n\nBriefly, the process consists of 1) computing the I statistic for the observed data, 2) estimating the null distribution of the I 
statistic by random sampling from a Poisson distribution and 3) comparing the observed I statistic with the null distribution to determine the probability of observing such a value if the number of cases were spatially independent. For each district, we obtain a p-value based on the comparison of the observed value and the null distribution.\nA conventional way of plotting these results is to classify the districts into 5 classes based on the local Moran’s I output. Clusters that are significantly autocorrelated with their neighbors are classified by comparing the scaled incidence in the district with the scaled weighted average incidence of its neighboring districts (computed with lag.listw()):\n\nDistricts that have higher-than-average rates in both index regions and their neighbors, and that show statistically significant positive values for the local \\(I_i\\) statistic, are defined as High-High (hotspot of the disease)\nDistricts that have lower-than-average rates in both index regions and their neighbors, and that show statistically significant positive values for the local \\(I_i\\) statistic, are defined as Low-Low (cold spot of the disease).\nDistricts that have higher-than-average rates in the index regions and lower-than-average rates in their neighbors, and that show statistically significant negative values for the local \\(I_i\\) statistic, are defined as High-Low (outlier with high incidence in an area with low incidence).\nDistricts that have lower-than-average rates in the index regions and higher-than-average rates in their neighbors, and that show statistically significant negative values for the local \\(I_i\\) statistic, are defined as Low-High (outlier of low incidence in an area with high incidence).\nDistricts with non-significant values for the \\(I_i\\) statistic are defined as Non-significant.\n\n\n# create lagged local raw_rate - in other words the average of the queen neighbors value\n# values are scaled (centered 
and reduced) to be compared to average\ndistrict$lag_std <- scale(lag.listw(q_listw, var = district$incidence))\ndistrict$incidence_std <- scale(district$incidence)\n\n# extract pvalues\n# district$lm_pv <- lm_test[,5]\n\n# Classify local moran's outputs\ndistrict$lm_class <- NA\ndistrict$lm_class[district$incidence_std >=0 & district$lag_std >=0] <- 'High-High'\ndistrict$lm_class[district$incidence_std <=0 & district$lag_std <=0] <- 'Low-Low'\ndistrict$lm_class[district$incidence_std <=0 & district$lag_std >=0] <- 'Low-High'\ndistrict$lm_class[district$incidence_std >=0 & district$lag_std <=0] <- 'High-Low'\ndistrict$lm_class[district$pval_lm >= 0.05] <- 'Non-significant'\n\ndistrict$lm_class <- factor(district$lm_class, levels=c(\"High-High\", \"Low-Low\", \"High-Low\", \"Low-High\", \"Non-significant\") )\n\n# create map\nmf_map(x = district,\n var = \"lm_class\",\n type = \"typo\",\n cex = 2,\n col_na = \"white\",\n #val_order = c(\"High-High\", \"Low-Low\", \"High-Low\", \"Low-High\", \"Non-significant\") ,\n pal = c(\"#6D0026\" , \"blue\", \"white\") , # \"#FF755F\",\"#7FABD3\" ,\n leg_title = \"Clusters\")\n\nmf_layout(title = \"Cluster using Local Moran's I statistic\")\n\n\n\n\n\n\n\n6.2.3 Spatial scan statistics\nWhile Moran’s indices focus on testing for autocorrelation between neighboring polygons (under the null assumption of spatial independence), the spatial scan statistic aims to identify an abnormally high risk in a given region compared to the risk outside of this region (under the null assumption of homogeneous distribution). 
The conception of a cluster is therefore different between the two methods.\nThe function kulldorff from the package SpatialEpi (Kim and Wakefield 2010) is a simple tool to implement spatial-only scan statistics.\n\n\n\n\n\n\nKulldorff test\n\n\n\nUnder the kulldorff test, the statistical hypotheses are:\n\nH0: the risk is constant over the area, i.e., there is spatial homogeneity of the incidence.\nH1: a particular window has a higher incidence than the rest of the area, i.e., there is spatial heterogeneity of incidence.\n\n\n\nBriefly, the Kulldorff scan statistic scans the area for clusters in several steps:\n\nIt creates a circular window of observation by defining a single location and an associated radius varying from 0 to a large number that depends on the population distribution (the largest radius can include up to 50% of the population).\nIt aggregates the count of events and the population at risk (or an expected count of events) inside and outside the window of observation.\nFinally, it computes the likelihood ratio and tests whether the risk is equal inside versus outside the window (H0) or greater inside the observed window (H1). The H0 distribution is estimated by simulating the distribution of counts under the null hypothesis (homogeneous risk).\nThese 3 steps are repeated for each location and each possible window radius.\n\nSince we test the significance of a large number of observation windows, one can raise concerns about multiple testing and Type I error. This approach, however, assumes that we are not interested in a set of significant clusters but only in the most likely cluster. This a priori restriction eliminates concerns about multiple comparisons since the test reduces to the statistical significance of a single most likely cluster.\nBecause we tested all possible locations and window radii, we can also choose to look at secondary clusters. 
In this case, you should keep in mind that increasing the number of secondary clusters you select increases the risk of Type I error.\n\n#install.packages(\"SpatialEpi\")\nlibrary(\"SpatialEpi\")\n\nR spatial objects are not supported by the kulldorff() function. It instead uses a matrix of xy coordinates representing the centroids of the districts. A given district is included in the observed circular window if its centroid falls inside the circle.\n\ndistrict_xy <- st_centroid(district) %>% \n st_coordinates()\n\nhead(district_xy)\n\n X Y\n1 330823.3 1464560\n2 749758.3 1541787\n3 468384.0 1277007\n4 494548.2 1215261\n5 459644.2 1194615\n6 360528.3 1516339\n\n\nWe can then call the kulldorff function (you are strongly encouraged to read ?kulldorff to call the function properly). The alpha.level threshold filters the secondary clusters that will be retained. The most likely cluster is saved regardless of its significance.\n\nkd_Wfever <- kulldorff(district_xy, \n cases = district$cases,\n population = district$T_POP,\n expected.cases = district$expected,\n pop.upper.bound = 0.5, # include maximum 50% of the population in a window\n n.simulations = 499,\n alpha.level = 0.2)\n\n\n\n\nThe function plots the histogram of the log-likelihood ratios simulated under the null hypothesis, estimated by Monte Carlo simulation. The observed value of the most significant cluster identified from all possible scans is compared to this distribution to determine significance. All outputs are saved into an R object, here called kd_Wfever. 
Unfortunately, the package does not provide summary or visualization functions for the results, but we can explore the output object.\n\nnames(kd_Wfever)\n\n[1] \"most.likely.cluster\" \"secondary.clusters\" \"type\" \n[4] \"log.lkhd\" \"simulated.log.lkhd\" \n\n\nFirst, we can focus on the most likely cluster and explore its characteristics.\n\n# We can see which districts (r number) belong to this cluster\nkd_Wfever$most.likely.cluster$location.IDs.included\n\n [1] 48 93 66 180 133 29 194 118 50 144 31 141 3 117 22 43 142\n\n# standardized incidence ratio\nkd_Wfever$most.likely.cluster$SMR\n\n[1] 2.303106\n\n# number of observed and expected cases in this cluster\nkd_Wfever$most.likely.cluster$number.of.cases\n\n[1] 122\n\nkd_Wfever$most.likely.cluster$expected.cases\n\n[1] 52.97195\n\n\n17 districts belong to the cluster and its number of cases is 2.3 times higher than the expected number of cases.\nSimilarly, we can study the secondary clusters. Results are saved in a list.\n\n# We can see which districts (r number) belong to this cluster\nlength(kd_Wfever$secondary.clusters)\n\n[1] 1\n\n# retrieve data for all secondary clusters into a table\ndf_secondary_clusters <- data.frame(SMR = sapply(kd_Wfever$secondary.clusters, '[[', 5), \n number.of.cases = sapply(kd_Wfever$secondary.clusters, '[[', 3),\n expected.cases = sapply(kd_Wfever$secondary.clusters, '[[', 4),\n p.value = sapply(kd_Wfever$secondary.clusters, '[[', 8))\n\nprint(df_secondary_clusters)\n\n SMR number.of.cases expected.cases p.value\n1 3.767698 16 4.246625 0.008\n\n\nWe only have one secondary cluster composed of one district.\n\n# create empty column to store cluster information\ndistrict$k_cluster <- NA\n\n# save cluster information from kulldorff outputs\ndistrict$k_cluster[kd_Wfever$most.likely.cluster$location.IDs.included] <- 'Most likely cluster'\n\nfor(i in 1:length(kd_Wfever$secondary.clusters)){\ndistrict$k_cluster[kd_Wfever$secondary.clusters[[i]]$location.IDs.included] <- paste(\n 
'Secondary cluster', i, sep = '')\n}\n\n#district$k_cluster[is.na(district$k_cluster)] <- \"No cluster\"\n\n\n# create map\nmf_map(x = district,\n var = \"k_cluster\",\n type = \"typo\",\n cex = 2,\n col_na = \"white\",\n pal = mf_get_pal(palette = \"Reds\", n = 3)[1:2],\n leg_title = \"Clusters\")\n\nmf_layout(title = \"Cluster using Kulldorff scan statistic\")\n\n\n\n\n\n\n\n\n\n\nTo go further …\n\n\n\nIn this example, the expected number of cases was defined using the population count, but note that standardization over other variables, such as age, could also be implemented with the strata parameter of the kulldorff() function.\nIn addition, this cluster analysis was performed solely using the spatial scan, but you should keep in mind that this method of cluster detection can be implemented for spatio-temporal data as well, where a cluster is defined as an abnormal number of cases in a delimited spatial area during a given period of time. The windows of observation are then defined by a center, a radius and a time period. You should take a look at the scan_ep_poisson() function in the scanstatistics package (Allévius 2018) for this analysis."
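Step 3 of the scan procedure described above can be made concrete with a few lines of base R: for one candidate window, the Poisson log-likelihood ratio compares the observed counts inside and outside the window with their expectations. The helper function and the toy counts below are illustrative sketches, not part of the SpatialEpi API.

```r
# Poisson log-likelihood ratio for a single candidate window (sketch).
# c_in / e_in: observed and expected cases inside the window;
# c_tot / e_tot: totals over the whole study area (toy values below).
scan_llr <- function(c_in, e_in, c_tot, e_tot) {
  c_out <- c_tot - c_in
  e_out <- e_tot - e_in
  # H1 requires a higher-than-expected risk inside the window
  if (c_in / e_in <= c_out / e_out) return(0)
  c_in * log(c_in / e_in) + c_out * log(c_out / e_out)
}

llr_hot  <- scan_llr(c_in = 122, e_in = 53, c_tot = 300, e_tot = 300)  # elevated window
llr_flat <- scan_llr(c_in = 10,  e_in = 20, c_tot = 300, e_tot = 300)  # no excess risk
```

kulldorff() computes this ratio for every candidate window and keeps the window maximizing it as the most likely cluster, with significance assessed against Monte Carlo replicates.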
},
{
"objectID": "01-introduction.html",
"href": "01-introduction.html",
"title": "1 Introduction",
"section": "",
"text": "Note\n\n\n\nThe installation part is based on the book “An Introduction to R” written by Alex Douglas, Deon Roos, Francesca Mancini, Ana Couto & David Lusseau\n\n\n\n\n\n\nFor Windows users select the ‘Download R for Windows’ link and then click on the ‘base’ link and finally the download link ‘Download R 4.2.1 for Windows’. This will begin the download of the ‘.exe’ installation file. When the download has completed double click on the R executable file and follow the on-screen instructions. Full installation instructions can be found at the CRAN website.\n\n\n\nFor Mac users select the ‘Download R for (Mac) OS X’ link. The binary can be downloaded by selecting the ‘R-4.2.1.pkg’. Once downloaded, double click on the file icon and follow the on-screen instructions to guide you through the necessary steps. See the ‘R for Mac OS X FAQ’ for further information on installation.\n\n\n\nFor Linux users, the installation method will depend on which flavour of Linux you are using. There are reasonably comprehensive instructions here for Debian, Redhat, Suse and Ubuntu. In most cases you can just use your OS package manager to install R from the official repository. On Ubuntu fire up a shell (Terminal) and use (you will need root permission to do this):\n\nsudo apt update\nsudo apt install r-base r-base-dev\n\nwhich will install base R and also the development version of base R (you only need this if you want to compile R packages from source but it doesn’t hurt to have it).\nIf you receive an error after running the code above you may need to add a ‘sources.list’ entry to your /etc/apt/sources.list file. 
To do this open the terminal and enter this:\n\nsudo apt install -y --no-install-recommends software-properties-common dirmngr\n# Add keys\nwget -qO- https://cloud.r-project.org/bin/linux/ubuntu/marutter_pubkey.asc | sudo tee -a /etc/apt/trusted.gpg.d/cran_ubuntu_key.asc\n\nsudo add-apt-repository \"deb https://cloud.r-project.org/bin/linux/ubuntu $(lsb_release -cs)-cran40/\"\n\nOnce you have done this then re-run the apt commands above and you should be good to go.\nInstall the following packages to allow for future spatial data analysis:\n\nsudo apt install -y libgdal-dev libproj-dev libgeos-dev libudunits2-dev libv8-dev libnode-dev libcairo2-dev libnetcdf-dev\n\n\n\n\n\nWhilst it’s eminently possible to just use the base installation of R (many people do), we will be using a popular Integrated Development Environment (IDE) called RStudio. RStudio can be thought of as an add-on to R which provides a more user-friendly interface, incorporating the R Console, a script editor and other useful functionality (like R Markdown and GitHub integration). You can find more information about RStudio here.\nRStudio is freely available for Windows, Mac and Linux operating systems and can be downloaded from the RStudio site. You should select the ‘RStudio Desktop’ version. Note: you must install R before you install RStudio.\n\n\nFor Windows and Mac users you should be presented with the appropriate link for downloading. Click on this link and once downloaded run the installer and follow the instructions. If you don’t see the link then scroll down to the ‘All Installers’ section and choose the link manually.\n\n\n\nFor Linux users scroll down to the ‘All Installers’ section and choose the appropriate link to download the binary for your Linux operating system. 
RStudio for Ubuntu (and Debian) is available as a *.deb package.\nTo install the *.deb file navigate to where you downloaded the file and then enter the following command with root permission\n\nsudo apt install ./rstudio-2022.07.2-576-amd64.deb\n\nYou can then start RStudio from the Console by simply typing\n\nrstudio\n\nor you can create a shortcut on you Desktop for easy startup.\n\n\n\n\n\nThe R help is very useful for the use of functions.\n\n?plot #displays the help page for the plot function\nhelp(\"*\") #for unconventional characters\n\nCalling the help opens a page (the exact behavior depends on the operating system) with information and usage examples about the documented function(s) or operators.\n\n\n\nThe basic syntax is:\n\nafunction <- function(arg1, arg2){\n arg1 + arg2\n}\nafunction(10, 5)\n\n[1] 15"
},
{
"objectID": "01-introduction.html#spatial-in-r-history-and-evolutions",
"href": "01-introduction.html#spatial-in-r-history-and-evolutions",
"title": "1 Introduction",
"section": "1.2 Spatial in R : History and evolutions",
"text": "1.2 Spatial in R : History and evolutions\nHistorically, 4 packages make it possible to import, manipulate and transform spatial data:\n\nThe package rgdal (Bivand, Keitt, and Rowlingson 2022) which is an interface between R and the GDAL (GDAL/OGR contributors, n.d.) and PROJ (PROJ contributors 2021) libraries allow you to import and export spatial data (shapefiles for example) and also to manage cartographic projections\n\nThe package sp (E. J. Pebesma and Bivand 2005) provides class and methods for vector spatial data in R. It allows displaying background maps, inspectiong an attribute table etc.\n\nThe package rgeos (Bivand and Rundel 2021) gives access to the GEOS spatial operations library and therefore makes classic GIS operations available: calculation of surfaces or perimeters, calculation of distances, spatial aggregations, buffer zones, intersections, etc.\n\nThe package raster (Hijmans 2022a) is dedicated to the import, manipulation and modeling of raster data.\n\nToday, the main developments concerning vector data have moved away from the old 3 (sp, rgdal, rgeos) to rely mainly on the package sf ((E. Pebesma 2018a), (E. Pebesma 2018b)). In this manual we will rely exclusively on this package to manipulate vector data.\nThe packages stars (E. Pebesma 2021) and terra (Hijmans 2022b) come to replace the package raster for processing raster data. We have chosen to use the package here terra for its proximity to the raster."
},
{
"objectID": "01-introduction.html#the-package-sf",
"href": "01-introduction.html#the-package-sf",
"title": "1 Introduction",
"section": "1.3 The package sf",
"text": "1.3 The package sf\n The package sf was released in late 2016 by Edzer Pebesma (also author of sp). Its goal is to combine the feature of sp, rgeos and rgdal in a single, more ergonomic package. This package offers simple objects (following the simple feature standard) which are easier to manipulate. Particular attention has been paid to the compatibility of the package with the pipe syntax and the operators of the tidyverse.\nsf directly uses the GDAL, GEOS and PROJ libraries.\n\n\n\n\n\nFrom r-spatial.org\n\n\n\n\n\n\nWebsite of package sf : Simple Features for R\n\n\n\nMany of the spatial data available on the internet are in shapefile format, which can be opened in the following way\n\nlibrary(sf)\n\nLinking to GEOS 3.10.2, GDAL 3.4.3, PROJ 8.2.1; sf_use_s2() is TRUE\n\ndistrict <- st_read(\"data_cambodia/district.shp\")\n\nReading layer `district' from data source \n `/home/lucas/Documents/ForgeIRD/rspatial-for-onehealth/data_cambodia/district.shp' \n using driver `ESRI Shapefile'\nSimple feature collection with 197 features and 10 fields\nGeometry type: MULTIPOLYGON\nDimension: XY\nBounding box: xmin: 211534.7 ymin: 1149105 xmax: 784612.1 ymax: 1625495\nProjected CRS: WGS 84 / UTM zone 48N\n\n\n\n\n\n\n\n\nShapefile format limitations\n\n\n\nFor the multiple limitations of this format (multi-file, limited number of records…) we advise you to prefer another format such as the geopackage *.gpkg. 
All the good reasons not to use the shapefile are here.\n\n\nA geopackage is a database, to load a layer, you must know its name\n\nst_layers(\"data_cambodia/cambodia.gpkg\")\n\nDriver: GPKG \nAvailable layers:\n layer_name geometry_type features fields crs_name\n1 country Multi Polygon 1 10 WGS 84 / UTM zone 48N\n2 district Multi Polygon 197 10 WGS 84 / UTM zone 48N\n3 education Multi Polygon 25 19 WGS 84 / UTM zone 48N\n4 hospital Point 956 13 WGS 84 / UTM zone 48N\n5 cases Multi Point 972 2 WGS 84 / UTM zone 48N\n6 road Multi Line String 6 9 WGS 84 / UTM zone 48N\n\n\n\nroad <- st_read(\"data_cambodia/cambodia.gpkg\", layer = \"road\")\n\nReading layer `road' from data source \n `/home/lucas/Documents/ForgeIRD/rspatial-for-onehealth/data_cambodia/cambodia.gpkg' \n using driver `GPKG'\nSimple feature collection with 6 features and 9 fields\nGeometry type: MULTILINESTRING\nDimension: XY\nBounding box: xmin: 212377 ymin: 1152214 xmax: 784654.7 ymax: 1625281\nProjected CRS: WGS 84 / UTM zone 48N\n\n\n\n1.3.1 Format of spatial objects sf\n\n\n\n\n\nObjectssf are objects in data.frame which one of the columns contains geometries. This column is the class of sfc (simple feature column) and each individual of the column is a sfg (simple feature geometry). This format is very practical insofa as the data and the geometries are intrinsically linked in the same object.\n\n\n\n\n\n\nThumbnail describing the simple feature format: Simple Features for R\n\n\n\n\n\n\n\n\n\nTip\n\n\n\nA benchmark of vector processing libraries is available here."
},
{
"objectID": "01-introduction.html#package-mapsf",
"href": "01-introduction.html#package-mapsf",
"title": "1 Introduction",
"section": "1.4 Package mapsf",
"text": "1.4 Package mapsf\nThe free R software spatial ecosystem is rich, dynamic and mature and several packages allow to import, process and represent spatial data. The package mapsf (Giraud 2022) relies on this ecosystem to integrate the creation of quality thematic maps into processing chains with R.\nOther packages can be used to make thematic maps. The package ggplot2 (Wickham 2016), in association with the package ggspatial (Dunnington 2021), allows for example to display spatial objects and to make simple thematic maps. The package tmap (Tennekes 2018) is dedicated to the creation of thematic maps, it uses a syntax close to that of ggplot2 (sequence of instructions combined with the ‘+’ sign). Documentation and tutorials for using these two packages are readily available on the web.\nHere, we will mainly use the package mapsf whose functionalities are quite complete and the handling rather simple. In addition, the package is relatively light.\n\nmapsf allows you to create most of the types of map usually used in statistical cartography (choropleth maps, typologies, proportional or graduated symbols, etc.). For each type of map, several parameters are used to customize the cartographic representation. These parameters are the same as those found in the usual GIS or cartography software (for example, the choice of discretizations and color palettes, the modification of the size of the symbols or the customization of the legends). 
Associated with the data representation functions, other functions are dedicated to cartographic dressing (themes or graphic charters, legends, scales, orientation arrows, title, credits, annotations, etc.), the creation of insets or the export of maps.\nmapsf is the successor of cartography (Giraud and Lambert 2016); it offers the same main functionalities while being lighter and more ergonomic.\nTo use this package, several sources can be consulted:\n\nThe package documentation accessible on the internet or directly in R (?mapsf),\nA cheat sheet,\n\n\n\n\n\n\n\nThe vignettes associated with the package show sample scripts,\nThe R Geomatics blog which provides resources and examples related to the package and more generally to the R spatial ecosystem."
},
{
"objectID": "01-introduction.html#the-package-terra",
"href": "01-introduction.html#the-package-terra",
"title": "1 Introduction",
"section": "1.5 The package terra",
"text": "1.5 The package terra\n The package terra was release in early 2020 by Robert J. Hijmans (also author of raster). Its objective is to propose methods of treatment and analysis of raster data. This package is very similar to the package raster; but it has more features, it’s easier to use, and it’s faster.\n\n\n\n\n\n\nWebsite of package terra : Spatial Data Science with R and “terra”\n\n\n\n\n\n\n\n\n\nTip\n\n\n\nA benchmark of raster processing libraries is available here.\n\n\n\n\n\n\nBivand, Roger, Tim Keitt, and Barry Rowlingson. 2022. “Rgdal: Bindings for the ’Geospatial’ Data Abstraction Library.” https://CRAN.R-project.org/package=rgdal.\n\n\nBivand, Roger, and Colin Rundel. 2021. “Rgeos: Interface to Geometry Engine - Open Source (’GEOS’).” https://CRAN.R-project.org/package=rgeos.\n\n\nDunnington, Dewey. 2021. “Ggspatial: Spatial Data Framework for Ggplot2.” https://CRAN.R-project.org/package=ggspatial.\n\n\nGDAL/OGR contributors. n.d. GDAL/OGR Geospatial Data Abstraction Software Library. Open Source Geospatial Foundation. https://gdal.org.\n\n\nGiraud, Timothée. 2022. “Mapsf: Thematic Cartography.” https://CRAN.R-project.org/package=mapsf.\n\n\nGiraud, Timothée, and Nicolas Lambert. 2016. “Cartography: Create and Integrate Maps in Your r Workflow” 1. https://doi.org/10.21105/joss.00054.\n\n\nHijmans, Robert J. 2022a. “Raster: Geographic Data Analysis and Modeling.” https://CRAN.R-project.org/package=raster.\n\n\n———. 2022b. “Terra: Spatial Data Analysis.” https://CRAN.R-project.org/package=terra.\n\n\nPebesma, Edzer. 2018a. “Simple Features for r: Standardized Support for Spatial Vector Data” 10. https://doi.org/10.32614/RJ-2018-009.\n\n\n———. 2018b. “Simple Features for R: Standardized Support for Spatial Vector Data.” The R Journal 10 (1): 439. https://doi.org/10.32614/rj-2018-009.\n\n\n———. 2021. “Stars: Spatiotemporal Arrays, Raster and Vector Data Cubes.” https://CRAN.R-project.org/package=stars.\n\n\nPebesma, Edzer J., and Roger S. Bivand. 
2005. “Classes and Methods for Spatial Data in r” 5. https://CRAN.R-project.org/doc/Rnews/.\n\n\nPROJ contributors. 2021. PROJ Coordinate Transformation Software Library. Open Source Geospatial Foundation. https://proj.org/.\n\n\nTennekes, Martijn. 2018. “Tmap: Thematic Maps in r” 84. https://doi.org/10.18637/jss.v084.i06.\n\n\nWickham, Hadley. 2016. “Ggplot2: Elegant Graphics for Data Analysis.” https://ggplot2.tidyverse.org."
},
{
"objectID": "02-data_acquisition.html",
"href": "02-data_acquisition.html",
"title": "2 Data Acquisition",
"section": "",
"text": "Note\n\n\n\nSeveral data sets are referenced by the ESoR (Environnement, Societies and Health Risk) research group here\n\n\n\n\nSince the appearance of the sf package, which has greatly contributed to the popularization of spatial data manipulation with R, many packages for making geographic data (geometries and/or attributes) available have been developed. Most of them are API packages that allow to query data made available on the Web, directly with R. This chapter presents a non-exhaustive list of them.\n\n rnaturalearth (South 2017): retrieves Natural Earth map data.\n\n gadmr (Guevarra 2021): retrieves data from the GADM (national and sub-national administrative divisions of all countries in the world).\n\n rgeoboundaries (Dicko 2021) : R client for the geoBoundaries API, providing political administrative boundaries of countries.\n cshapes (Weidmann, Schvitz, and Girardin 2021): makes available national boundaries, from 1886 to present.\n\n osmextract (Gilardi and Lovelace 2021): allows importing OpenStreetMap data.\n osmdata (Padgham et al. 2017b): to download and use OpenStreetMap data.\n\n maptiles (Giraud 2021) : This package downloads, composes and displays tiles from a large number of providers (OpenStreetMap, Stamen, Esri, CARTO or Thunderforest).\n\n geonames (Rowlingson 2019) : allows you to query the geonames DB, which provides locations in particular.\n wbstats (wbstats2020?) 
and WDI (R-WDI?): provide access to World Bank data and statistics.\n\n sen2r (R-sen2r?): allows automatic download and preprocessing of Sentinel-2 satellite data.\n\n MODIStsp (MODIStsp2016?): find, download and process MODIS images.\n\n geodata (R-geodata?): provides access to data on climate, elevation, soil, species occurrence and administrative boundaries.\n\n elevatr (R-elevatr?): provides access to elevation data made available by Amazon Web Services Terrain Tiles, the Open Topography Global Datasets API and the USGS Elevation Point Query Service.\n rgee (R-rgee?): allows use of the Google Earth Engine API, a public data catalog and computational infrastructure for satellite images.\n\n nasapower (nasapower2018?): NASA client API (global energy resource forecasting, meteorology, surface solar energy, and climatology).\n geoknife (geoknife2015?): allows processing (online) of large raster data from the Geo Data Portal of the U.S. Geological Survey.\n\n wopr (R-wopr?): provides API access to the WorldPop Open Population Repository database.\n\n rdhs (rdhs2019?): Demographic and Health Survey (DHS) client API and data management.\n\n\n\nSeveral open data portals are particularly useful in the region, including Humanitarian Data Exchange and Open Development.\n\n\n\nCambodia administrative boundaries, divided into 4 levels:\n\nlevel 0: country,\nlevel 1: province / khaet and capital / reach thani,\nlevel 2: municipality / district,\nlevel 3: commune / khum quarter / sangkat.\n\nThe contour maps (shapefiles) are made available from the Humanitarian Data Exchange (HDX), provided by OCHA (United Nations Office for the Coordination of Humanitarian Affairs). These maps were originally produced by the Department of Geography of the Ministry of Land Management, Urbanization and Construction in 2008 and unofficially updated in 2014 according to sub-decrees on administrative modifications. 
They were provided by WFP - VAM unit Cambodia.\nYou can download these administrative boundaries, as zip folders, here:\n\nkhm_admbnda_adm0_gov_20181004.zip\nkhm_admbnda_adm1_gov_20181004.zip\nkhm_admbnda_adm2_gov_20181004.zip\nkhm_admbnda_adm3_gov_20181004.zip\n\nPopulation data:\nPopulation data is available at these different levels from the Humanitarian Data Exchange (HDX) repository. It comes from the Commune database (CDB), provided by the Cambodia Ministry of Planning.\nhttps://data.humdata.org/dataset/cambodia-population-statistics\nHealth Facility data:\nThe Humanitarian Data Exchange (HDX) repository provides a dataset on the location of health facilities (Referral Hospitals, Health Centers, Health Posts). These maps were originally produced by the Cambodia Ministry of Health (MoH).\nhttps://data.humdata.org/dataset/cambodia-health\nTransportation data:\nThe roads network is available from Humanitarian Data Exchange (HDX) repository. These maps were originally produced by the Cambodia Department of Geography of the Ministry of Land Management, Urbanization and Construction. They include: National road primary and secondary, Provincial road primary, Provincial and rural roads, Foot path, Cart track, Bridge line.\nhttps://data.humdata.org/dataset/cambodia-roads\nHydrology data:\nThe hydrological network is available from Humanitarian Data Exchange (HDX) repository. These maps were originally produced by the Cambodia Department of Geography of the Ministry of Land Management, Urbanization and Construction. They include: rivers (“Non-Perenial/Intermittent/Fluctuating” and “Perennial/Permanent”), lakes\nhttps://data.humdata.org/dataset/cambodia-water-courses-0\nDigital Elevation Model (DEM):\nThe SRTM (Shuttle Radar Topography Mission) is a free DEM provided by NASA and NGA (formerly NIMA). Space Shuttle Endeavour (STS-99) collected these altimetry data during an 11-day mission in February 2000 at an altitude of 233 km using radar interferometry. 
The SRTM covers nearly 80% of the land area from 56° South latitude to 60° North latitude. Spatial resolution is approximately 30 meters on the line of the Equator.\nThe SRTM data can be downloaded here: http://srtm.csi.cgiar.org"
},
{
"objectID": "02-data_acquisition.html#openstreetmap",
"href": "02-data_acquisition.html#openstreetmap",
"title": "2 Data Acquisition",
"text": "2.2 OpenStreetMap\n\n\n\nOpenStreetMap (OSM) is a participatory mapping project that aims to built a free geographic database on a global scale. OpenStreetMap lets you view, edit and use geographic data around the world.\nTerms of use\n\nOpenStreetMap is open data : you are free to use it for ant purpose as long as you credit OpenStreetMap and its contributers. If you modify or rely data in any way, you may distribute the result only under the same license. (…)\n\nContributors\n\n(…) Our contributors incloude enthusiastic mapmakers, GIS professional, engineers running OSM servers, humanitarians mapping disaster-stricken areas and many mmore.(…)\n\n\n2.2.1 Display and interactive map\nThe two main packages that allow to display as interactive map based on OSM are leaflet (Cheng, Karambelkar, and Xie 2022) and mapview (Appelhans et al. 2022).\n\n2.2.1.1 leaflet\n leaflet uses the javascript library Leaflet (Agafonkin 2015) to create interactive maps.\n\nlibrary(sf)\nlibrary(leaflet)\n\ndistrict <- st_read(\"data_cambodia/cambodia.gpkg\", layer = \"district\", quiet = TRUE)\nhospital <- st_read(\"data_cambodia/cambodia.gpkg\", layer = \"hospital\", quiet = TRUE)\n\n\nbanan <- district[district$ADM2_PCODE == \"KH0201\", ] #Select one district (Banan district: KH0201)\nhealth_banan <- hospital[hospital$DCODE == \"201\", ] #Select Health centers in Banan\n\nbanan <- st_transform(banan, 4326) #Transform coordinate system to WGS84\nhealth_banan <- st_transform(health_banan, 4326)\n\nbanan_map <- leaflet(banan) %>% #Create interactive map\n addTiles() %>%\n addPolygons() %>%\n addMarkers(data = health_banan)\nbanan_map\n\n\n\n\n\n\n\n\n\n\n\nWebsite of leaflet\nLeaflet for R\n\n\n\n\n\n2.2.1.2 mapview\n mapview relies on leaflet to create interactive maps, its use is easier and its documentation is a bit dense.\n\nlibrary(mapview)\nmapview(banan) + mapview(health_banan)\n\n\n\n\n\n\n\n\n\n\n\n\nWebsite of mapview\nmapview\n\n\n\n\n\n\n2.2.2 Import basemaps\nThe 
package maptiles (Giraud 2021) allows downlaoding and displaying raster basemaps.\nThe function get_tiles() allow you to download OSM background maps and the function plot_tiles() allows to display them.\nRenders are better if the input data used the same coordinate system as the tiles (EPSG:3857).\n\nlibrary(sf)\nlibrary(maptiles)\ndistrict <- st_read(\"data_cambodia/cambodia.gpkg\", layer = \"district\", quiet = TRUE)\ndistrict <- st_transform(district, 3857)\nosm_tiles <- get_tiles(x = district, zoom = 10, crop = TRUE)\nplot_tiles(osm_tiles)\nplot(st_geometry(district), border = \"grey20\", lwd = .7, add = TRUE)\nmtext(side = 1, line = -2, text = get_credit(\"OpenStreetMap\"), col=\"tomato\")\n\n\n\n\n\n\n2.2.3 Import OSM data\n\n2.2.3.1 osmdata\n The package osmdata (Padgham et al. 2017a) allows extracting vector data from OSM using the Overpass turbo API.\n\nlibrary(sf)\nlibrary(osmdata)\nlibrary(sf)\n\ncountry <- st_read(\"data_cambodia/cambodia.gpkg\", layer = \"country\", quiet = TRUE)\next <- opq(bbox = st_bbox(st_transform(country, 4326))) #Define the bounding box\nquery <- add_osm_feature(opq = ext, key = 'amenity', value = \"hospital\") #Health Center Extraction\nhospital <- osmdata_sf(query)\nhospital <- unique_osmdata(hospital) #Result reduction (points composing polygon are detected)\n\nThe result contains a point layer and a polygon layer. The polygon layer contains polygons that represent hospitals. 
To obtain a coherent point layer we can use the centroids of the polygons.\n\n\nSpherical geometry (s2) switched off\n\n\n\nhospital_point <- hospital$osm_points\nhospital_poly <- hospital$osm_polygons #Extracting centroids of polygons\nhospital_poly_centroid <- st_centroid(hospital_poly)\n\ncambodia_point <- intersect(names(hospital_point), names(hospital_poly_centroid)) #Identify fields in Cambodia boundary\nhospitals <- rbind(hospital_point[, cambodia_point], hospital_poly_centroid[, cambodia_point]) #Gather the 2 objects\n\nResult display\n\nlibrary(mapview)\nmapview(country) + mapview(hospitals)\n\n\n\n\n\n\n\n\n\n\n\n\nWebsite of osmdata\nosmdata\n\n\n\n\n\n2.2.3.2 osmextract\n The package osmextract (Gilardi and Lovelace 2021) allows to extract data from an OSM database directly. This package make it possible to work on very large volumes of data.\n\n\n\n\n\n\nWebsite of osmextract\nosmextract\n\n\n\nFor administrative boundaries, check here the administrative levels by country:\n\nlibrary(osmextract)\nlibrary(mapsf)\nprovince <- oe_get(\n place = \"Cambodia\",\n download_directory = \"data_cambodia/\",\n layer = \"multipolygons\",\n extra_tags = c(\"wikidata\", \"ISO3166-2\", \"wikipedia\", \"name:en\"),\n vectortranslate_options = c(\n \"-t_srs\", \"EPSG:32648\",\n \"-nlt\", \"PROMOTE_TO_MULTI\",\n \"-where\", \"type = 'boundary' AND boundary = 'administrative' AND admin_level = '4'\"\n ))\n\n0...10...20...30...40...50...60...70...80...90...100 - done.\nReading layer `multipolygons' from data source \n `/home/lucas/Documents/ForgeIRD/rspatial-for-onehealth/data_cambodia/geofabrik_cambodia-latest.gpkg' \n using driver `GPKG'\nSimple feature collection with 25 features and 29 fields\nGeometry type: MULTIPOLYGON\nDimension: XY\nBounding box: xmin: 211418.1 ymin: 1047956 xmax: 784614.9 ymax: 1625621\nProjected CRS: WGS 84 / UTM zone 48N\n\nmf_map(x = province)\n\n\n\n\n\nroads <- oe_get(\n place = \"Cambodia\",\n download_directory = \"data_cambodia/\",\n 
layer = \"lines\",\n extra_tags = c(\"access\", \"service\", \"maxspeed\"),\n vectortranslate_options = c(\n \"-t_srs\", \"EPSG:32648\",\n \"-nlt\", \"PROMOTE_TO_MULTI\",\n \"-where\", \"\n highway IS NOT NULL\n AND\n highway NOT IN (\n 'abandonded', 'bus_guideway', 'byway', 'construction', 'corridor', 'elevator',\n 'fixme', 'escalator', 'gallop', 'historic', 'no', 'planned', 'platform',\n 'proposed', 'cycleway', 'pedestrian', 'bridleway', 'footway',\n 'steps', 'path', 'raceway', 'road', 'service', 'track'\n )\n \"\n),\n boundary = subset(province, name_en == \"Phnom Penh\"),\n boundary_type = \"clipsrc\"\n)\n\n0...10...20...30...40...50...60...70...80...90...100 - done.\nReading layer `lines' from data source \n `/home/lucas/Documents/ForgeIRD/rspatial-for-onehealth/data_cambodia/geofabrik_cambodia-latest.gpkg' \n using driver `GPKG'\nSimple feature collection with 18794 features and 12 fields\nGeometry type: MULTILINESTRING\nDimension: XY\nBounding box: xmin: 469524.2 ymin: 1263268 xmax: 503494.3 ymax: 1296780\nProjected CRS: WGS 84 / UTM zone 48N\n\nmf_map(x = roads)"
},
{
"objectID": "02-data_acquisition.html#import-from-lat-long-file",
"href": "02-data_acquisition.html#import-from-lat-long-file",
"title": "2 Data Acquisition",
"section": "2.3 Import from lat / long file",
"text": "2.3 Import from lat / long file\nThe function st_as_sf() makes it possible to transform a data.frame container of geographic coordinates into an object sf. Here we use the data.frame places2 created in the previous point.\n\nlibrary(sf)\nplace_sf <- st_as_sf(read.csv(\"data_cambodia/adress.csv\"), coords = c(\"long\", \"lat\"), crs = 4326)\nplace_sf\n\nSimple feature collection with 2 features and 1 field\nGeometry type: POINT\nDimension: XY\nBounding box: xmin: 104.8443 ymin: 11.54366 xmax: 104.9047 ymax: 11.55349\nGeodetic CRS: WGS 84\n address\n1 Phnom Penh International Airport, Phnom Penh, Cambodia\n2 Khmer Soviet Friendship Hospital, Phnom Penh, Cambodia\n geometry\n1 POINT (104.8443 11.55349)\n2 POINT (104.9047 11.54366)\n\n\n\n\n\nTo create a sf POINT type object with only one pair of coordinate (WGS84, longitude=0.5, latitude = 45.5) :\n\nlibrary(sf)\ntest_point <- st_as_sf(data.frame(x = 0.5, y = 45.5), coords = c(\"x\", \"y\"), crs = 4326)\ntest_point\n\nSimple feature collection with 1 feature and 0 fields\nGeometry type: POINT\nDimension: XY\nBounding box: xmin: 0.5 ymin: 45.5 xmax: 0.5 ymax: 45.5\nGeodetic CRS: WGS 84\n geometry\n1 POINT (0.5 45.5)\n\n\nWe can display this object sf on an OpenStreetMap basesmap with the package maptiles maptiles (Giraud 2021).\n\nlibrary(maptiles)\nosm <- get_tiles(x = place_sf, zoom = 12)\nplot_tiles(osm)\nplot(st_geometry(place_sf), pch = 2, cex = 2, col = \"red\", add = TRUE)"
},
{
"objectID": "02-data_acquisition.html#geocoding",
"href": "02-data_acquisition.html#geocoding",
"title": "2 Data Acquisition",
"section": "2.4 Geocoding",
"text": "2.4 Geocoding\nServeral pakages alow you to geocode addresses. The package tidygeocoder (Cambon et al. 2021) allow the use of a large number of online geocoding sevices. The package banR (Gombin and Chevalier 2022), which is based on the National Address Base, is the particularly suitable for geocoding addresses in France.\n\n2.4.1 tidygeocoder\n\nlibrary(tidygeocoder)\ntest_adresses <- data.frame(\n address = c(\"Phnom Penh International Airport, Phnom Penh, Cambodia\",\n \"Khmer Soviet Friendship Hospital, Phnom Penh, Cambodia\"))\nplaces1 <- geocode(test_adresses, address)\nplaces1\n\n# A tibble: 2 × 3\n address lat long\n <chr> <dbl> <dbl>\n1 Phnom Penh International Airport, Phnom Penh, Cambodia 11.6 105.\n2 Khmer Soviet Friendship Hospital, Phnom Penh, Cambodia 11.5 105.\n\n\n\n\n\n\n\n\nWebsite by tidygeocoder :\ntidygeocoder\n\n\n\n\n\n2.4.2 banR (Base Adresse Nationale)\n\n# remotes::install_github(\"joelgombin/banR\")\nlibrary(banR)\nmes_adresses <- data.frame(\n address = c(\"19 rue Michel Bakounine, 29600 Morlaix, France\",\n \"2 Allee Emile Pouget, 920128 Boulogne-Billancourt\")\n)\nplaces2 <- geocode_tbl(tbl = mes_adresses, adresse = address)\nplaces2\n\n# A tibble: 2 × 18\n address latit…¹ longi…² resul…³ resul…⁴ resul…⁵ resul…⁶ resul…⁷ resul…⁸\n <chr> <dbl> <dbl> <chr> <dbl> <chr> <chr> <chr> <chr> \n1 19 rue Michel… 48.6 -3.82 19 Rue… 0.81 housen… 29151_… 19 Rue Mi…\n2 2 Allee Emile… 48.8 2.24 2 Allé… 0.83 housen… 92012_… 2 Allée …\n# … with 9 more variables: result_street <chr>, result_postcode <chr>,\n# result_city <chr>, result_context <chr>, result_citycode <chr>,\n# result_oldcitycode <chr>, result_oldcity <chr>, result_district <chr>,\n# result_status <chr>, and abbreviated variable names ¹latitude, ²longitude,\n# ³result_label, ⁴result_score, ⁵result_type, ⁶result_id,\n# ⁷result_housenumber, ⁸result_name\n\n\n\n\n\n\n\n\nWebsite of banR :\nAn R client for the BAN API"
},
{
"objectID": "02-data_acquisition.html#digitization",
"href": "02-data_acquisition.html#digitization",
"title": "2 Data Acquisition",
"text": "2.5 Digitization\nThe package mapedit (Appelhans, Russell, and Busetto 2020) allows you to digitize base map directly in R. Although it can be practical in some cases, in package cannot replace the functionalities of a GIS for important digitization tasks.\n\n\n\nGif taken from mapedit website\n\n\n\n\n\n\nAgafonkin, Vladimir. 2015. “Leaflet Javascript Libary.”\n\n\nAppelhans, Tim, Florian Detsch, Christoph Reudenbach, and Stefan Woellauer. 2022. “Mapview: Interactive Viewing of Spatial Data in r.” https://CRAN.R-project.org/package=mapview.\n\n\nAppelhans, Tim, Kenton Russell, and Lorenzo Busetto. 2020. “Mapedit: Interactive Editing of Spatial Data in r.” https://CRAN.R-project.org/package=mapedit.\n\n\nCambon, Jesse, Diego Hernangómez, Christopher Belanger, and Daniel Possenriede. 2021. “Tidygeocoder: An r Package for Geocoding” 6: 3544. https://doi.org/10.21105/joss.03544.\n\n\nCheng, Joe, Bhaskar Karambelkar, and Yihui Xie. 2022. “Leaflet: Create Interactive Web Maps with the JavaScript ’Leaflet’ Library.” https://CRAN.R-project.org/package=leaflet.\n\n\nDicko, Ahmadou. 2021. R Client for the geoBoundaries API, Providing Country Political Administrative Boundaries. https://dickoa.gitlab.io/rgeoboundaries/index.html.\n\n\nGilardi, Andrea, and Robin Lovelace. 2021. “Osmextract: Download and Import Open Street Map Data Extracts.” https://CRAN.R-project.org/package=osmextract.\n\n\nGiraud, Timothée. 2021. “Maptiles: Download and Display Map Tiles.” https://CRAN.R-project.org/package=maptiles.\n\n\nGombin, Joel, and Paul-Antoine Chevalier. 2022. “banR: R Client for the BAN API.”\n\n\nGuevarra, Ernest. 2021. Gadmr: An r Interface to the GADM Map Repository. https://github.com/SpatialWorks/gadmr.\n\n\nPadgham, Mark, Bob Rudis, Robin Lovelace, and Maëlle Salmon. 2017a. “Osmdata” 2. https://doi.org/10.21105/joss.00305.\n\n\n———. 2017b. “Osmdata.” The Journal of Open Source Software 2 (14). https://doi.org/10.21105/joss.00305.\n\n\nRowlingson, Barry. 2019. 
Geonames: Interface to the \"Geonames\" Spatial Query Web Service. https://CRAN.R-project.org/package=geonames.\n\n\nSouth, Andy. 2017. “Rnaturalearth: World Map Data from Natural Earth.” https://CRAN.R-project.org/package=rnaturalearth.\n\n\nWeidmann, Nils B., Guy Schvitz, and Luc Girardin. 2021. Cshapes: The CShapes 2.0 Dataset and Utilities. https://CRAN.R-project.org/package=cshapes."
},
{
"objectID": "03-vector_data.html",
"href": "03-vector_data.html",
"title": "3 Using vector data",
"section": "",
"text": "The st_read() and st_write() function are used to import and export many types of files. The following lines import the administrative data in district level layer located in the district.shp shapefile file.\n\nlibrary(sf)\n\nLinking to GEOS 3.10.2, GDAL 3.4.3, PROJ 8.2.1; sf_use_s2() is TRUE\n\ndistrict <- st_read(\"data_cambodia/district.shp\")\n\nReading layer `district' from data source \n `/home/lucas/Documents/ForgeIRD/rspatial-for-onehealth/data_cambodia/district.shp' \n using driver `ESRI Shapefile'\nSimple feature collection with 197 features and 10 fields\nGeometry type: MULTIPOLYGON\nDimension: XY\nBounding box: xmin: 211534.7 ymin: 1149105 xmax: 784612.1 ymax: 1625495\nProjected CRS: WGS 84 / UTM zone 48N\n\n\n\n\n\n\n\n\nShapefile format limitations\n\n\n\nFor the multiple limitations of this format (multi-file, limited number of records…) we advise you to prefer another format such as the geopackage *.gpkg. All the good reasons not to use the shapefile are here.\n\n\nA geopackage is a database, to load a layer, you must know its name\n\nst_layers(\"data_cambodia/cambodia.gpkg\")\n\nDriver: GPKG \nAvailable layers:\n layer_name geometry_type features fields crs_name\n1 country Multi Polygon 1 10 WGS 84 / UTM zone 48N\n2 district Multi Polygon 197 10 WGS 84 / UTM zone 48N\n3 education Multi Polygon 25 19 WGS 84 / UTM zone 48N\n4 hospital Point 956 13 WGS 84 / UTM zone 48N\n5 cases Multi Point 972 2 WGS 84 / UTM zone 48N\n6 road Multi Line String 6 9 WGS 84 / UTM zone 48N\n\n\n\nroad <- st_read(\"data_cambodia/cambodia.gpkg\", layer = \"road\")\n\nReading layer `road' from data source \n `/home/lucas/Documents/ForgeIRD/rspatial-for-onehealth/data_cambodia/cambodia.gpkg' \n using driver `GPKG'\nSimple feature collection with 6 features and 9 fields\nGeometry type: MULTILINESTRING\nDimension: XY\nBounding box: xmin: 212377 ymin: 1152214 xmax: 784654.7 ymax: 1625281\nProjected CRS: WGS 84 / UTM zone 48N\n\n\n\nlibrary(sf)\n\ndistrict = 
st_read(\"data_cambodia/cambodia.gpkg\", layer = \"district\") #import district data\n\nReading layer `district' from data source \n `/home/lucas/Documents/ForgeIRD/rspatial-for-onehealth/data_cambodia/cambodia.gpkg' \n using driver `GPKG'\nSimple feature collection with 197 features and 10 fields\nGeometry type: MULTIPOLYGON\nDimension: XY\nBounding box: xmin: 211534.7 ymin: 1149105 xmax: 784612.1 ymax: 1625495\nProjected CRS: WGS 84 / UTM zone 48N\n\n\nThe following lines export the district object to a data folder in geopackage and shapefile format.\n\nst_write(obj = district, dsn = \"data_cambodia/district.gpkg\", delete_layer = TRUE)\nst_write(obj = district, \"data_cambodia/district.shp\", layer_options = \"ENCODING=UTF-8\", delete_layer = TRUE)"
},
{
"objectID": "03-vector_data.html#display",
"href": "03-vector_data.html#display",
"title": "3 Using vector data",
"section": "3.2 Display",
"text": "3.2 Display\nPreview the variables via the functions head() and plot().\n\nhead(district)\n\nSimple feature collection with 6 features and 10 fields\nGeometry type: MULTIPOLYGON\nDimension: XY\nBounding box: xmin: 300266.9 ymin: 1180566 xmax: 767313.9 ymax: 1563861\nProjected CRS: WGS 84 / UTM zone 48N\n ADM2_EN ADM2_PCODE ADM1_EN ADM1_PCODE Male Female T_POP Area.Km2.\n1 Aek Phnum KH0205 Battambang KH02 41500 43916 85416 1067.8638\n2 Andoung Meas KH1601 Ratanak Kiri KH16 7336 7372 14708 837.7064\n3 Angk Snuol KH0808 Kandal KH08 45436 47141 92577 183.9050\n4 Angkor Borei KH2101 Takeo KH21 26306 27168 53474 301.0502\n5 Angkor Chey KH0701 Kampot KH07 42448 44865 87313 316.7576\n6 Angkor Chum KH1701 Siemreap KH17 34269 34576 68845 478.6988\n Status DENs geom\n1 <4500km2 79.98773 MULTIPOLYGON (((306568.1 14...\n2 <4500km2 17.55747 MULTIPOLYGON (((751459.2 15...\n3 <4500km2 503.39580 MULTIPOLYGON (((471954.3 12...\n4 <4500km2 177.62485 MULTIPOLYGON (((490048.2 12...\n5 <4500km2 275.64610 MULTIPOLYGON (((462702.2 12...\n6 <4500km2 143.81696 MULTIPOLYGON (((363642.5 15...\n\nplot(district)\n\n\n\n\nTo display the geometry only:\n\nplot(st_geometry(district))"
},
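As a complement to the display section above, a minimal sketch (assuming the same district layer from cambodia.gpkg) showing how to map a single attribute rather than one small map per field:

```r
library(sf)

district <- st_read("data_cambodia/cambodia.gpkg", layer = "district", quiet = TRUE)

# plot() on an sf object draws one small map per attribute (up to 9 by default);
# subsetting to a single column maps only that variable, with a legend
plot(district["T_POP"], main = "Total population by district")
```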
{
"objectID": "03-vector_data.html#coordinate-systems",
"href": "03-vector_data.html#coordinate-systems",
"title": "3 Using vector data",
"section": "3.3 Coordinate systems",
"text": "3.3 Coordinate systems\n\n3.3.1 Look up the coordinate system of an object\nThe function st_crs() makes it possible to consult the coordinate system used by an sf object.\n\nst_crs(district)\n\nCoordinate Reference System:\n User input: WGS 84 / UTM zone 48N \n wkt:\nPROJCRS[\"WGS 84 / UTM zone 48N\",\n BASEGEOGCRS[\"WGS 84\",\n ENSEMBLE[\"World Geodetic System 1984 ensemble\",\n MEMBER[\"World Geodetic System 1984 (Transit)\"],\n MEMBER[\"World Geodetic System 1984 (G730)\"],\n MEMBER[\"World Geodetic System 1984 (G873)\"],\n MEMBER[\"World Geodetic System 1984 (G1150)\"],\n MEMBER[\"World Geodetic System 1984 (G1674)\"],\n MEMBER[\"World Geodetic System 1984 (G1762)\"],\n MEMBER[\"World Geodetic System 1984 (G2139)\"],\n ELLIPSOID[\"WGS 84\",6378137,298.257223563,\n LENGTHUNIT[\"metre\",1]],\n ENSEMBLEACCURACY[2.0]],\n PRIMEM[\"Greenwich\",0,\n ANGLEUNIT[\"degree\",0.0174532925199433]],\n ID[\"EPSG\",4326]],\n CONVERSION[\"UTM zone 48N\",\n METHOD[\"Transverse Mercator\",\n ID[\"EPSG\",9807]],\n PARAMETER[\"Latitude of natural origin\",0,\n ANGLEUNIT[\"degree\",0.0174532925199433],\n ID[\"EPSG\",8801]],\n PARAMETER[\"Longitude of natural origin\",105,\n ANGLEUNIT[\"degree\",0.0174532925199433],\n ID[\"EPSG\",8802]],\n PARAMETER[\"Scale factor at natural origin\",0.9996,\n SCALEUNIT[\"unity\",1],\n ID[\"EPSG\",8805]],\n PARAMETER[\"False easting\",500000,\n LENGTHUNIT[\"metre\",1],\n ID[\"EPSG\",8806]],\n PARAMETER[\"False northing\",0,\n LENGTHUNIT[\"metre\",1],\n ID[\"EPSG\",8807]]],\n CS[Cartesian,2],\n AXIS[\"(E)\",east,\n ORDER[1],\n LENGTHUNIT[\"metre\",1]],\n AXIS[\"(N)\",north,\n ORDER[2],\n LENGTHUNIT[\"metre\",1]],\n USAGE[\n SCOPE[\"Engineering survey, topographic mapping.\"],\n AREA[\"Between 102°E and 108°E, northern hemisphere between equator and 84°N, onshore and offshore. Cambodia. China. Indonesia. Laos. Malaysia - West Malaysia. Mongolia. Russian Federation. Singapore. Thailand. 
Vietnam.\"],\n BBOX[0,102,84,108]],\n ID[\"EPSG\",32648]]\n\n\n\n\n3.3.2 Changing the coordinate system of an object\nThe function st_transform() allows you to change the coordinate system of an sf object, i.e. to re-project it.\n\nplot(st_geometry(district))\ntitle(\"WGS 84 / UTM zone 48N\")\n\n\n\ndist_reproj <- st_transform(district, \"epsg:4326\")\nplot(st_geometry(dist_reproj))\ntitle(\"WGS84\")\n\n\n\n\nThe Spatial Reference site provides references for a large number of coordinate systems."
},
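Since spatial operations generally require matching projections, a minimal sketch (assuming the cambodia.gpkg layers used throughout this material) of checking two layers' CRS before combining them:

```r
library(sf)

district <- st_read("data_cambodia/cambodia.gpkg", layer = "district", quiet = TRUE)
cases    <- st_read("data_cambodia/cambodia.gpkg", layer = "cases", quiet = TRUE)

# st_crs() objects can be compared directly; re-project only when they differ
if (st_crs(cases) != st_crs(district)) {
  cases <- st_transform(cases, st_crs(district))
}
```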
{
"objectID": "03-vector_data.html#selection-by-attributes",
"href": "03-vector_data.html#selection-by-attributes",
"title": "3 Using vector data",
"section": "3.4 Selection by attributes",
"text": "3.4 Selection by attributes\nsf objects are data.frames, so you can select their rows and columns in the same way as for a data.frame.\n\n# row selection\ndistrict[1:2, ]\n\nSimple feature collection with 2 features and 10 fields\nGeometry type: MULTIPOLYGON\nDimension: XY\nBounding box: xmin: 300266.9 ymin: 1449408 xmax: 767313.9 ymax: 1563861\nProjected CRS: WGS 84 / UTM zone 48N\n ADM2_EN ADM2_PCODE ADM1_EN ADM1_PCODE Male Female T_POP Area.Km2.\n1 Aek Phnum KH0205 Battambang KH02 41500 43916 85416 1067.8638\n2 Andoung Meas KH1601 Ratanak Kiri KH16 7336 7372 14708 837.7064\n Status DENs geom\n1 <4500km2 79.98773 MULTIPOLYGON (((306568.1 14...\n2 <4500km2 17.55747 MULTIPOLYGON (((751459.2 15...\n\ndistrict[district$ADM1_EN == \"Phnom Penh\", ]\n\nSimple feature collection with 12 features and 10 fields\nGeometry type: MULTIPOLYGON\nDimension: XY\nBounding box: xmin: 468677.5 ymin: 1262590 xmax: 505351.9 ymax: 1297419\nProjected CRS: WGS 84 / UTM zone 48N\nFirst 10 features:\n ADM2_EN ADM2_PCODE ADM1_EN ADM1_PCODE Male Female T_POP\n29 Chamkar Mon KH1201 Phnom Penh KH12 52278 54478 106756\n31 Chbar Ampov KH1212 Phnom Penh KH12 64816 68243 133059\n43 Chraoy Chongvar KH1210 Phnom Penh KH12 30920 31087 62007\n48 Dangkao KH1205 Phnom Penh KH12 46999 48525 95524\n50 Doun Penh KH1202 Phnom Penh KH12 33844 36471 70315\n93 Mean Chey KH1206 Phnom Penh KH12 68381 70366 138747\n117 Praek Pnov KH1211 Phnom Penh KH12 27566 27698 55264\n118 Prampir Meakkakra KH1203 Phnom Penh KH12 31091 33687 64778\n133 Pur SenChey KH1209 Phnom Penh KH12 95050 109297 204347\n141 Russey Keo KH1207 Phnom Penh KH12 67357 68419 135776\n Area.Km2. 
Status DENs geom\n29 11.049600 <4500km2 9661.5265 MULTIPOLYGON (((494709.4 12...\n31 86.780498 <4500km2 1533.2823 MULTIPOLYGON (((498855.3 12...\n43 85.609156 <4500km2 724.3034 MULTIPOLYGON (((491161.3 12...\n48 113.774833 <4500km2 839.5881 MULTIPOLYGON (((489191.1 12...\n50 7.734808 <4500km2 9090.7234 MULTIPOLYGON (((492447.1 12...\n93 28.998026 <4500km2 4784.7051 MULTIPOLYGON (((491068.2 12...\n117 115.384300 <4500km2 478.9560 MULTIPOLYGON (((481483.3 12...\n118 2.224892 <4500km2 29115.1253 MULTIPOLYGON (((491067.6 12...\n133 148.357984 <4500km2 1377.3913 MULTIPOLYGON (((479078.8 12...\n141 23.381517 <4500km2 5806.9800 MULTIPOLYGON (((490264.8 12...\n\n# column selection\ndistrict[district$ADM1_EN == \"Phnom Penh\", 1:4] \n\nSimple feature collection with 12 features and 4 fields\nGeometry type: MULTIPOLYGON\nDimension: XY\nBounding box: xmin: 468677.5 ymin: 1262590 xmax: 505351.9 ymax: 1297419\nProjected CRS: WGS 84 / UTM zone 48N\nFirst 10 features:\n ADM2_EN ADM2_PCODE ADM1_EN ADM1_PCODE\n29 Chamkar Mon KH1201 Phnom Penh KH12\n31 Chbar Ampov KH1212 Phnom Penh KH12\n43 Chraoy Chongvar KH1210 Phnom Penh KH12\n48 Dangkao KH1205 Phnom Penh KH12\n50 Doun Penh KH1202 Phnom Penh KH12\n93 Mean Chey KH1206 Phnom Penh KH12\n117 Praek Pnov KH1211 Phnom Penh KH12\n118 Prampir Meakkakra KH1203 Phnom Penh KH12\n133 Pur SenChey KH1209 Phnom Penh KH12\n141 Russey Keo KH1207 Phnom Penh KH12\n geom\n29 MULTIPOLYGON (((494709.4 12...\n31 MULTIPOLYGON (((498855.3 12...\n43 MULTIPOLYGON (((491161.3 12...\n48 MULTIPOLYGON (((489191.1 12...\n50 MULTIPOLYGON (((492447.1 12...\n93 MULTIPOLYGON (((491068.2 12...\n117 MULTIPOLYGON (((481483.3 12...\n118 MULTIPOLYGON (((491067.6 12...\n133 MULTIPOLYGON (((479078.8 12...\n141 MULTIPOLYGON (((490264.8 12..."
},
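As a complement to the base-R selection shown above, a minimal sketch (assuming the same district layer) of the equivalent dplyr syntax, which also works on sf objects because the geometry column is "sticky":

```r
library(sf)
library(dplyr)

district <- st_read("data_cambodia/cambodia.gpkg", layer = "district", quiet = TRUE)

# filter() selects rows, select() selects columns;
# the geom column is kept automatically by sf
phnom_penh <- district %>%
  filter(ADM1_EN == "Phnom Penh") %>%
  select(ADM2_EN, ADM2_PCODE, T_POP)
```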
{
"objectID": "03-vector_data.html#spatial-selection",
"href": "03-vector_data.html#spatial-selection",
"title": "3 Using vector data",
"section": "3.5 Spatial selection",
"text": "3.5 Spatial selection\n\n3.5.1 Intersections\nSelection of the roads that intersect Dangkao district\n\nroad <- st_read(\"data_cambodia/cambodia.gpkg\", layer = \"road\", quiet = TRUE) %>% st_cast(\"LINESTRING\")\ndangkao <- district[district$ADM2_EN == \"Dangkao\", ]\ninter <- st_intersects(x = road, y = dangkao, sparse = FALSE)\nhead(inter)\n\n [,1]\n[1,] FALSE\n[2,] FALSE\n[3,] FALSE\n[4,] FALSE\n[5,] FALSE\n[6,] FALSE\n\ndim(inter)\n\n[1] 108285 1\n\n\nThe inter object is a matrix which indicates for each element of the road object (108285 line segments, after casting the 6 multi-part roads to LINESTRING) whether it intersects each element of the dangkao object (1 element). The dimension of the matrix is therefore 108285 rows * 1 column. Note the use of the parameter sparse = FALSE here. It is then possible to create a column from this object:\n\nroad$intersect_dangkao <- inter\nplot(st_geometry(dangkao), col = \"lightblue\")\nplot(st_geometry(road), add = TRUE)\nplot(st_geometry(road[road$intersect_dangkao, ]),\n col = \"tomato\", lwd = 1.5, add = TRUE)\n\n\n\n\n\n3.5.1.1 Difference between sparse = TRUE and sparse = FALSE\n\n\n\n\n\n\nsparse = TRUE\n\n\ninter <- st_intersects(x = grid, y = pt, sparse = TRUE)\ninter\n\nSparse geometry binary predicate list of length 4, where the predicate\nwas `intersects'\n 1: (empty)\n 2: 6, 7\n 3: 1, 4\n 4: 2, 3, 5, 8\n\n\n\nsparse = FALSE\n\n\ninter <- st_intersects(x = grid, y = pt, sparse = FALSE)\nrownames(inter) <- grid$id\ncolnames(inter) <- pt$id\ninter\n\n a b c d e f g h\n1 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE\n2 FALSE FALSE FALSE FALSE FALSE TRUE TRUE FALSE\n3 TRUE FALSE FALSE TRUE FALSE FALSE FALSE FALSE\n4 FALSE TRUE TRUE FALSE TRUE FALSE FALSE TRUE\n\n\n\n\n\n3.5.2 Contains / Within\nSelection of the roads contained in the municipality of Dangkao. 
The function st_within() works like the function st_intersects()\n\nroad$within_dangkao <- st_within(road, dangkao, sparse = FALSE)\nplot(st_geometry(dangkao), col = \"lightblue\")\nplot(st_geometry(road), add = TRUE)\nplot(st_geometry(road[road$within_dangkao, ]), col = \"tomato\",\n lwd = 2, add = TRUE)"
},
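When you only need the matching features and not the logical matrix itself, a minimal sketch (assuming the same road and district layers) using sf's st_filter(), which wraps the predicate functions shown above:

```r
library(sf)

district <- st_read("data_cambodia/cambodia.gpkg", layer = "district", quiet = TRUE)
road     <- st_read("data_cambodia/cambodia.gpkg", layer = "road", quiet = TRUE)
dangkao  <- district[district$ADM2_EN == "Dangkao", ]

# st_filter() directly returns the features of x that satisfy the predicate;
# swap st_intersects for st_within to mimic the Contains / Within example
road_dangkao <- st_filter(road, dangkao, .predicate = st_intersects)
```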
{
"objectID": "03-vector_data.html#operation-of-geometries",
"href": "03-vector_data.html#operation-of-geometries",
"title": "3 Using vector data",
"section": "3.6 Operation of geometries",
"text": "3.6 Operation of geometries\n\n3.6.1 Extract centroids\n\ndist_c <- st_centroid(district)\nplot(st_geometry(district))\nplot(st_geometry(dist_c), add = TRUE, cex = 1.2, col = \"red\", pch = 20)\n\n\n\n\n\n\n3.6.2 Aggregate polygons\n\ncambodia_dist <- st_union(district) \nplot(st_geometry(district), col = \"lightblue\")\nplot(st_geometry(cambodia_dist), add = TRUE, lwd = 2, border = \"red\")\n\n\n\n\n\n\n3.6.3 Aggregate polygons based on a variable\n\ndist_union <- aggregate(x = district[,c(\"T_POP\")],\n by = list(STATUT = district$Status),\n FUN = \"sum\")\nplot(dist_union)\n\n\n\n\n\n\n3.6.4 Create a buffer zone\n\ndangkao_buffer <- st_buffer(x = dangkao, dist = 1000)\nplot(st_geometry(dangkao_buffer), col = \"#E8DAEF\", lwd=2, border = \"#6C3483\")\nplot(st_geometry(dangkao), add = TRUE, lwd = 2)\n\n\n\n\n\n\n3.6.5 Making an intersection\nBy using the function st_intersection() we will cut one layer by another.\n\nlibrary(magrittr)\n# creation of a buffer zone around the centroid of the municipality of Dangkao district\n# using the pipe\nzone <- st_geometry(dangkao) %>%\n st_centroid() %>%\n st_buffer(30000)\nplot(st_geometry(district))\nplot(zone, border = \"#F06292\", lwd = 2, add = TRUE)\n\n\n\ndist_z <- st_intersection(x = district, y = zone)\nplot(st_geometry(district))\nplot(st_geometry(dist_z), col=\"#AF7AC5\", border=\"#F9E79F\", add=T)\n\n\n\nplot(st_geometry(dist_z))\n\n\n\n\n\n\n3.6.6 Create regular grid\nThe function st_make_grid() allows you to create a regular grid. The function produces an sfc object; you must then use the function st_sf() to transform the sfc object into an sf object. 
During this transformation we add here a column of unique identifiers.\n\ngrid <- st_make_grid(x = district, cellsize = 10000)\ngrid <- st_sf(ID = 1:length(grid), geom = grid)\n\nplot(st_geometry(grid), col = \"grey\", border = \"white\")\nplot(st_geometry(district), border = \"grey50\", add = TRUE)\n\n\n\n\n\n\n3.6.7 Counting points in a polygon (in a grid tile)\n\n# selection of grid tiles that intersect the district\n\ninter <- st_intersects(grid, cambodia_dist, sparse = FALSE)\ngrid <- grid[inter, ]\n\ncase_cambodia <- st_read(\"data_cambodia/cambodia.gpkg\", layer = \"cases\" , quiet = TRUE)\nplot(st_geometry(grid), col = \"grey\", border = \"white\")\nplot(st_geometry(case_cambodia), pch = 20, col = \"red\", add = TRUE, cex = 0.8)\n\n\n\ninter <- st_intersects(grid, case_cambodia, sparse = TRUE)\nlength(inter)\n\n[1] 1964\n\n\nHere we use the argument sparse = TRUE. The inter object is a list of the same length as the grid, and each item in the list contains the indices of the cases that intersect that grid tile.\nFor example, the 35th grid tile intersects with six cases: 97, 138, 189, 522, 624 and 696\n\ninter[35]\n\n[[1]]\n[1] 97 138 189 522 624 696\n\nplot(st_geometry(grid[35, ]))\nplot(st_geometry(case_cambodia), add = T)\nplot(st_geometry(case_cambodia[c(97, 138, 189, 522, 624, 696), ]), \n col = \"red\", pch = 19, add = TRUE)\n\n\n\n\nTo count the number of cases, simply take the length of each element of the list.\n\ngrid$nb_case <- sapply(X = inter, FUN = length) # create 'nb_case' column to store the number of cases in each grid tile \nplot(grid[\"nb_case\"])\n\n\n\n\n\n\n3.6.8 Aggregate point values into polygons\nIn this example we import a csv file that contains data from a population grid. 
Once imported, we transform this data.frame into an sf object.\nThe objective is to aggregate the values of these points (the population contained in the “DENs” field) into the districts.\n\npp_pop_raw <- read.csv(\"data_cambodia/pp_pop_dens.csv\") # import file\npp_pop_raw$id <- 1:nrow(pp_pop_raw) # adding a unique identifier\npp_pop <- st_as_sf(pp_pop_raw, coords = c(\"X\", \"Y\"), crs = 32648) # Transform into object sf\npp_pop <- st_transform(pp_pop, st_crs(district)) # Transform projection\ninter <- st_intersection(pp_pop, district) # Intersection\ninter\n\nSimple feature collection with 1295 features and 12 fields\nGeometry type: POINT\nDimension: XY\nBounding box: xmin: 469177.5 ymin: 1263090 xmax: 505177.5 ymax: 1297090\nProjected CRS: WGS 84 / UTM zone 48N\nFirst 10 features:\n DENs id ADM2_EN ADM2_PCODE ADM1_EN ADM1_PCODE Male Female T_POP\n149 NA 149 Angk Snuol KH0808 Kandal KH08 45436 47141 92577\n150 NA 150 Angk Snuol KH0808 Kandal KH08 45436 47141 92577\n151 NA 151 Angk Snuol KH0808 Kandal KH08 45436 47141 92577\n186 NA 186 Angk Snuol KH0808 Kandal KH08 45436 47141 92577\n187 NA 187 Angk Snuol KH0808 Kandal KH08 45436 47141 92577\n188 NA 188 Angk Snuol KH0808 Kandal KH08 45436 47141 92577\n223 NA 223 Angk Snuol KH0808 Kandal KH08 45436 47141 92577\n224 NA 224 Angk Snuol KH0808 Kandal KH08 45436 47141 92577\n225 NA 225 Angk Snuol KH0808 Kandal KH08 45436 47141 92577\n226 3.400075 226 Angk Snuol KH0808 Kandal KH08 45436 47141 92577\n Area.Km2. 
Status DENs.1 geometry\n149 183.905 <4500km2 503.3958 POINT (469177.5 1267090)\n150 183.905 <4500km2 503.3958 POINT (470177.5 1267090)\n151 183.905 <4500km2 503.3958 POINT (471177.5 1267090)\n186 183.905 <4500km2 503.3958 POINT (469177.5 1268090)\n187 183.905 <4500km2 503.3958 POINT (470177.5 1268090)\n188 183.905 <4500km2 503.3958 POINT (471177.5 1268090)\n223 183.905 <4500km2 503.3958 POINT (469177.5 1269090)\n224 183.905 <4500km2 503.3958 POINT (470177.5 1269090)\n225 183.905 <4500km2 503.3958 POINT (471177.5 1269090)\n226 183.905 <4500km2 503.3958 POINT (472177.5 1269090)\n\n\nBy using the function st_intersection() we add to each point of the grid all the information on the municipality in which it is located.\nWe can then use the function aggregate() to aggregate the population by municipalities.\n\nresultat <- aggregate(x = list(pop_from_grid = inter$DENs), \n by = list(ADM2_EN = inter$ADM2_EN), \n FUN = \"sum\")\nhead(resultat)\n\n ADM2_EN pop_from_grid\n1 Angk Snuol NA\n2 Chamkar Mon 10492.7159\n3 Chbar Ampov 1593.9593\n4 Chraoy Chongvar 1434.1785\n5 Dangkao 942.3595\n6 Doun Penh 10781.8026\n\n\nWe can then create a new object with this result.\n\ndist_result <- merge(district, resultat, by = \"ADM2_EN\", all.x = TRUE)\ndist_result\n\nSimple feature collection with 197 features and 11 fields\nGeometry type: MULTIPOLYGON\nDimension: XY\nBounding box: xmin: 211534.7 ymin: 1149105 xmax: 784612.1 ymax: 1625495\nProjected CRS: WGS 84 / UTM zone 48N\nFirst 10 features:\n ADM2_EN ADM2_PCODE ADM1_EN ADM1_PCODE Male Female T_POP\n1 Aek Phnum KH0205 Battambang KH02 41500 43916 85416\n2 Andoung Meas KH1601 Ratanak Kiri KH16 7336 7372 14708\n3 Angk Snuol KH0808 Kandal KH08 45436 47141 92577\n4 Angkor Borei KH2101 Takeo KH21 26306 27168 53474\n5 Angkor Chey KH0701 Kampot KH07 42448 44865 87313\n6 Angkor Chum KH1701 Siemreap KH17 34269 34576 68845\n7 Angkor Thum KH1702 Siemreap KH17 13802 14392 28194\n8 Anlong Veaeng KH2201 Oddar Meanchey KH22 24122 23288 47410\n9 Aoral 
KH0504 Kampong Speu KH05 19874 19956 39830\n10 Ba Phnum KH1401 Prey Veng KH14 46562 49852 96414\n Area.Km2. Status DENs pop_from_grid geometry\n1 1067.8638 <4500km2 79.98773 NA MULTIPOLYGON (((306568.1 14...\n2 837.7064 <4500km2 17.55747 NA MULTIPOLYGON (((751459.2 15...\n3 183.9050 <4500km2 503.39580 NA MULTIPOLYGON (((471954.3 12...\n4 301.0502 <4500km2 177.62485 NA MULTIPOLYGON (((490048.2 12...\n5 316.7576 <4500km2 275.64610 NA MULTIPOLYGON (((462702.2 12...\n6 478.6988 <4500km2 143.81696 NA MULTIPOLYGON (((363642.5 15...\n7 357.8890 <4500km2 78.77862 NA MULTIPOLYGON (((376584.4 15...\n8 1533.5702 <4500km2 30.91479 NA MULTIPOLYGON (((404936.4 15...\n9 2381.7084 <4500km2 16.72329 NA MULTIPOLYGON (((414000.6 13...\n10 342.3439 <4500km2 281.62910 NA MULTIPOLYGON (((545045.4 12..."
},
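As an alternative sketch for the point-in-polygon workflows above (assuming the cases and district layers from cambodia.gpkg), st_join() can attach polygon attributes to each point in one step, after which an ordinary dplyr count gives cases per district:

```r
library(sf)
library(dplyr)

district <- st_read("data_cambodia/cambodia.gpkg", layer = "district", quiet = TRUE)
cases    <- st_read("data_cambodia/cambodia.gpkg", layer = "cases", quiet = TRUE)

# st_join() performs a spatial left join (st_intersects by default):
# each point receives the attributes of the district containing it
cases_by_district <- st_join(cases, district) %>%
  st_drop_geometry() %>%              # drop geometry before tabulating
  count(ADM2_EN, name = "nb_case")    # number of cases per district
```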
{
"objectID": "03-vector_data.html#measurements",
"href": "03-vector_data.html#measurements",
"title": "3 Using vector data",
"section": "3.7 Measurements",
"text": "3.7 Measurements\n\n3.7.1 Create a distance matrix\nIf the dataset’s projection system is specified, the distances are expressed in the projection measurement unit (most often in meters).\n\nmat <- st_distance(x = dist_c, y = dist_c)\nmat[1:5,1:5]\n\nUnits: [m]\n [,1] [,2] [,3] [,4] [,5]\n[1,] 0.0 425993.7 232592.12 298254.12 299106.92\n[2,] 425993.7 0.0 386367.88 414428.82 452431.87\n[3,] 232592.1 386367.9 0.00 67060.05 82853.88\n[4,] 298254.1 414428.8 67060.05 0.00 40553.15\n[5,] 299106.9 452431.9 82853.88 40553.15 0.00\n\n\n\n\n3.7.2 Calculate routes\nThe package osrm (R-osrm?) acts as an interface between R and the OSRM routing engine (luxen-vetter-2011?). This package allows you to calculate time and distance matrices, road routes and isochrones. The package uses the OSRM demo server by default. In case of intensive use it is strongly recommended to use your own instance of OSRM (with Docker).\n\n3.7.2.1 Calculate a route\nThe function osrmRoute() allows you to calculate routes.\n\nlibrary(sf)\nlibrary(osrm)\nlibrary(maptiles)\ndistrict <- st_read(\"data_cambodia/cambodia.gpkg\",layer = \"district\", quiet = TRUE)\ndistrict <- st_transform(district, 32648)\n\nodongk <- district[district$ADM2_PCODE == \"KH0505\", ] # Itinerary between Odongk district and Toul Kouk\ntakmau <- district[district$ADM2_PCODE == \"KH0811\",]\nroute <- osrmRoute(src = odongk, \n dst = takmau, \n returnclass = \"sf\")\nosm <- get_tiles(route, crop = TRUE)\nplot_tiles(osm)\nplot(st_geometry(route), col = \"#b23a5f\", lwd = 6, add = T)\nplot(st_geometry(route), col = \"#eee0e5\", lwd = 1, add = T)\n\n\n\n\n\n\n3.7.2.2 Calculation of a time matrix\nThe function osrmTable() makes it possible to calculate matrices of distances or times by road.\nIn this example we calculate a time matrix, on foot, between 2 addresses and the health centers in Phnom Penh.\n\nlibrary(sf)\nlibrary(tidygeocoder)\nhospital <- st_read(\"data_cambodia/cambodia.gpkg\",layer= \"hospital\", quiet = TRUE)\n\nhospital_pp <- hospital[hospital$PCODE 
== \"12\", ] # Selection of health centers in Phnom Penh\n\nadresses <- data.frame(adr = c(\"Royal Palace Park, Phnom Penh Phnom, Cambodia\",\n \"Wat Phnom Daun Penh, Phnom Penh, Cambodia\")) # Geocoding of 2 addresses in Phnom Penh\n\nplaces <- tidygeocoder::geocode(.tbl = adresses,address = adr)\nplaces\n\n# A tibble: 2 × 3\n adr lat long\n <chr> <dbl> <dbl>\n1 Royal Palace Park, Phnom Penh Phnom, Cambodia 11.6 105.\n2 Wat Phnom Daun Penh, Phnom Penh, Cambodia 11.6 105.\n\n# Calculation of the distance matrix between the 2 addresses and the health center in Phnom Penh\n\ncal_mat <- osrmTable(src = places[,c(1,3,2)], \n dst = hospital_pp, \n osrm.profile = \"foot\")\n\ncal_mat$durations[1:2, 1:5]\n\n 684 685 686 687 691\nRoyal Palace Park, Phnom Penh Phnom, Cambodia 55.9 71.6 64.4 40.2 76.7\nWat Phnom Daun Penh, Phnom Penh, Cambodia 60.1 80.4 40.1 32.8 53.1\n\n# Which address has better accessibility to health center in Phnom Penh?\n\nboxplot(t(cal_mat$durations[,]), cex.axis = 0.7)"
},
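Road distances are not always needed; as a complement to the distance matrix above, a minimal sketch (assuming the district and hospital layers from cambodia.gpkg) of finding the straight-line distance from each district centroid to its nearest hospital with sf alone:

```r
library(sf)

district <- st_read("data_cambodia/cambodia.gpkg", layer = "district", quiet = TRUE)
hospital <- st_read("data_cambodia/cambodia.gpkg", layer = "hospital", quiet = TRUE)

# index of the nearest hospital for each district centroid
dist_c  <- st_centroid(district)
nearest <- st_nearest_feature(dist_c, hospital)

# by_element = TRUE returns one distance per pair instead of a full matrix;
# units are meters, given the UTM zone 48N projection
dist_c$dist_hospital <- st_distance(dist_c, hospital[nearest, ], by_element = TRUE)
```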
{
"objectID": "04-raster_data.html",
"href": "04-raster_data.html",
"title": "4 Using raster data",
"section": "",
"text": "This chapter is largely inspired by two presentations, Madelin (2021) and Nowosad (2021), carried out as part of the SIGR2021 thematic school."
},
{
"objectID": "04-raster_data.html#format-of-objects-spatraster",
"href": "04-raster_data.html#format-of-objects-spatraster",
"title": "4 Using raster data",
"section": "4.1 Format of objects SpatRaster",
"text": "4.1 Format of objects SpatRaster\nThe package terra (Hijmans 2022) allows you to handle vector and raster data. To manipulate this spatial data, terra stores it in objects of type SpatVector and SpatRaster. In this chapter, we focus on the manipulation of raster data (SpatRaster) with the functions offered by this package.\nA SpatRaster object stores raster data in one or more layers (variables). This object also stores a number of fundamental parameters that describe it (number of columns, rows, spatial extent, coordinate reference system, etc.).\n\n\n\nSource : (Racine 2016)"
},
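To make the SpatRaster parameters above concrete, a minimal sketch (using only terra, no external data) of building a small raster from scratch and inspecting those parameters:

```r
library(terra)

# a SpatRaster can also be created from scratch:
# 10 x 10 cells over an arbitrary extent, in UTM zone 48N
r <- rast(nrows = 10, ncols = 10,
          xmin = 0, xmax = 10000, ymin = 0, ymax = 10000,
          crs = "EPSG:32648")

values(r) <- 1:ncell(r)  # fill the cells with sequential values

r  # printing shows dimensions, resolution, extent and CRS
```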
{
"objectID": "04-raster_data.html#importing-and-exporting-data",
"href": "04-raster_data.html#importing-and-exporting-data",
"title": "4 Using raster data",
"section": "4.2 Importing and exporting data",
"text": "4.2 Importing and exporting data\nThe package terra allows importing and exporting raster files. It is based on the GDAL library which makes it possible to read and process a very large number of geographic image formats.\n\nlibrary(terra)\n\nThe function rast() allows you to create and/or import raster data. The following lines import the raster file elevation.tif (Tagged Image File Format) into an object of type SpatRaster (default).\n\nelevation <- rast(\"data_cambodia/elevation.tif\") \nelevation\n\nclass : SpatRaster \ndimensions : 5235, 6458, 1 (nrow, ncol, nlyr)\nresolution : 0.0008333394, 0.0008332568 (x, y)\nextent : 102.2935, 107.6752, 10.33984, 14.70194 (xmin, xmax, ymin, ymax)\ncoord. ref. : lon/lat WGS 84 (EPSG:4326) \nsource : elevation.tif \nname : elevation \n\n\nModify the name of the stored variable (Altitude).\n\nnames(elevation) <- \"Altitude\" \n\nThe function writeRaster() allows you to save a SpatRaster object on your machine, in the format of your choice.\n\nwriteRaster(x = elevation, filename = \"data_cambodia/new_elevation.tif\")"
},
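One practical detail when re-running the export above: writeRaster() stops with an error if the target file already exists. A minimal sketch (assuming the same elevation.tif file):

```r
library(terra)

elevation <- rast("data_cambodia/elevation.tif")

# overwrite = TRUE replaces an existing file instead of raising an error;
# the output format is inferred from the file extension (here GeoTIFF)
writeRaster(x = elevation,
            filename = "data_cambodia/new_elevation.tif",
            overwrite = TRUE)
```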
{
"objectID": "04-raster_data.html#displaying-a-spatraster-object",
"href": "04-raster_data.html#displaying-a-spatraster-object",
"title": "4 Using raster data",
"section": "4.3 Displaying a SpatRaster object",
"text": "4.3 Displaying a SpatRaster object\nThe function plot() is used to display a SpatRaster object.\n\nplot(elevation)\n\n\n\n\n\n\n\n\nA raster always contains numerical data, but it can be both quantitative data and numerically coded qualitative (categorical) data (ex: type of land cover).\nSpecify the type of data stored with the argument type (type = \"continuous\" by default) to display them correctly.\nImport and display of a raster containing categorical data: Phnom Penh Land Cover 2019 (land cover types) with a resolution of 1.5 meters:\n\nlulc_2019 <- rast(\"data_cambodia/lulc_2019.tif\") #Import Phnom Penh landcover 2019, landcover types\n\nThe landcover data was produced from a SPOT7 satellite image with 1.5 meter spatial resolution. An extraction centered on the municipality of Phnom Penh was then carried out.\n\nplot(lulc_2019, type = \"classes\")\n\n\n\n\n\n\n\n\nTo display the actual labels of the land cover types, you can proceed as follows.\n\nclass_name <- c(\n \"Roads\",\n \"Built-up areas\",\n \"Water Bodies and rivers\",\n \"Wetlands\",\n \"Dry bare area\",\n \"Bare crop fields\",\n \"Low vegetation areas\",\n \"High vegetation areas\",\n \"Forested areas\")\n\nclass_color <- c(\"#070401\", \"#c84639\", \"#1398eb\",\"#8bc2c2\",\n \"#dc7b34\", \"#a6bd5f\",\"#e8e8e8\", \"#4fb040\", \"#35741f\")\nplot(lulc_2019,\n type = \"classes\",\n levels = class_name,\n col = class_color,\n plg = list(cex = 0.7),\n mar = c(3.1, 3.1, 2.1, 10) #The margins are (bottom, left, top, right) respectively\n )"
},
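Instead of passing labels at each plot() call, terra can store a category table directly in the raster. A minimal sketch, assuming the lulc_2019.tif file used in this material and assuming its classes are coded 1 to 9 (check with unique(values(lulc_2019)) first):

```r
library(terra)

lulc_2019 <- rast("data_cambodia/lulc_2019.tif")

# attach a category table: first column = cell values, second = labels;
# plot() then uses the labels automatically
levels(lulc_2019) <- data.frame(
  id    = 1:9,  # assumed class codes, to be verified against the data
  cover = c("Roads", "Built-up areas", "Water Bodies and rivers",
            "Wetlands", "Dry bare area", "Bare crop fields",
            "Low vegetation areas", "High vegetation areas",
            "Forested areas"))

plot(lulc_2019)
```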
{
"objectID": "04-raster_data.html#change-to-the-study-area",
"href": "04-raster_data.html#change-to-the-study-area",
"title": "4 Using raster data",
"section": "4.4 Change to the study area",
"text": "4.4 Change to the study area\n\n4.4.1 (Re)projections\nTo modify the projection system of a raster, use the function project(). It is then necessary to indicate the method for estimating the new cell values.\n\n\n\nSource : Centre Canadien de Télédétection\n\n\nFour interpolation methods are available:\n\nnear : nearest neighbor, fast and default method for qualitative data;\n\nbilinear : bilinear interpolation. Default method for quantitative data;\n\ncubic : cubic interpolation;\n\ncubicspline : cubic spline interpolation.\n\n\n# Re-project data \n\nelevation_utm = project(x = elevation, y = \"EPSG:32648\", method = \"bilinear\") #from WGS84(EPSG:4326) to UTM zone48N(EPSG:32648) \nlulc_2019_utm = project(x = lulc_2019, y = \"EPSG:32648\", method = \"near\") #keep original projection: UTM zone48N(EPSG:32648)\n\n\n\n\n\n\n\n\n\n\n\n\n4.4.2 Crop\nClipping a raster to the extent of another SpatVector or SpatRaster object is achievable with the function crop().\n\n\n\n\n\n\n\n\n\n\n\nSource : (Racine 2016)\n\n\n\nImport vector data (administrative divisions) using the function vect(). 
This data will be stored in a SpatVector object.\n\ndistrict <- vect(\"data_cambodia/cambodia.gpkg\", layer=\"district\")\n\nExtraction of the district boundaries of Thma Bang district (ADM2_PCODE : KH0907).\n\nthma_bang <- subset(district, district$ADM2_PCODE == \"KH0907\") \n\nUse the function crop(); both data layers must be in the same projection.\n\ncrop_thma_bang <- crop(elevation_utm, thma_bang)\n\nplot(crop_thma_bang)\nplot(thma_bang, add=TRUE)\n\n\n\n\n\n\n\n\n\n\n4.4.3 Mask\nTo display only the values of a raster contained in a polygon, use the function mask().\n\n\n\nSource : (Racine 2016)\n\n\nCreate a mask of the crop_thma_bang raster on the municipal limits (polygon) of Thma Bang district.\n\nmask_thma_bang <- mask(crop_thma_bang, thma_bang)\n\nplot(mask_thma_bang)\nplot(thma_bang, add = TRUE)\n\n\n\n\n\n\n\n\n\n\n4.4.4 Aggregation and disaggregation\nResampling a raster to a different resolution is done in two steps.\n\n\n\n\n\n\n1\n\n\n\n\n\n\n\n2\n\n\n\n\n\n\n\n3\n\n\n\n\n\n\nSource : (Racine 2016)\n\n\n\nDisplay the resolution of a raster with the function res().\n\nres(elevation_utm) #check cell size\n\n[1] 91.19475 91.19475\n\n\nCreate a grid with the same extent, then decrease the spatial resolution (larger cells).\n\nelevation_LowerGrid <- elevation_utm\n# elevation_HigherGrid <- elevation_utm\n\nres(elevation_LowerGrid) <- 1000 #cells size = 1000 meter\n# res(elevation_HigherGrid) <- 10 #cells size = 10 meter\n\nelevation_LowerGrid\n\nclass : SpatRaster \ndimensions : 484, 589, 1 (nrow, ncol, nlyr)\nresolution : 1000, 1000 (x, y)\nextent : 203586.3, 792586.3, 1142954, 1626954 (xmin, xmax, ymin, ymax)\ncoord. ref. : WGS 84 / UTM zone 48N (EPSG:32648) \n\n\nThe function resample() allows you to resample the starting values at the new spatial resolution. Several resampling methods are available (cf. 
section 5.4.1).\n\nelevation_LowerGrid <- resample(elevation_utm, \n elevation_LowerGrid, \n method = \"bilinear\") \n\nplot(elevation_LowerGrid, \n main=\"Cell size = 1000m\\nBilinear resampling method\")\n\n\n\n\n\n\n\n\n\n\n4.4.5 Raster fusion\nMerge multiple SpatRaster objects into one with merge() or mosaic().\n\n\n\nSource : https://desktop.arcgis.com/fr/arcmap/10.3/manage-data/raster-and-images/what-is-a-mosaic.htm\n\n\nAfter cutting the elevation raster by the municipal boundary of Thma Bang district (cf. section 5.4.2), we do the same thing for the neighboring municipality of Phnum Kravanh district.\n\nphnum_kravanh <- subset(district, district$ADM2_PCODE == \"KH1504\") # Extraction of the municipal boundaries of Phnum Kravanh district\n\ncrop_phnum_kravanh <- crop(elevation_utm, phnum_kravanh) #clipping the elevation raster according to district boundaries\n\nThe crop_thma_bang and crop_phnum_kravanh elevation rasters overlap spatially:\n\n\n\n\n\n\n\n\n\nThe difference between the functions merge() and mosaic() relates to the values of the overlapping cells. The function mosaic() calculates their average value, while merge() keeps the value of the first SpatRaster object passed to the function.\n\n#in this example, merge() and mosaic() give the same result\nmerge_raster <- merge(crop_thma_bang, crop_phnum_kravanh) \nmosaic_raster <- mosaic(crop_thma_bang, crop_phnum_kravanh)\n\nplot(merge_raster)\n\n\n\n\n\n\n\n# plot(mosaic_raster)\n\n\n\n4.4.6 Segregate\nDecompose a raster by value (or modality) into different raster layers with the function segregate().\n\nlulc_2019_class <- segregate(lulc_2019, keep=TRUE, other=NA) #creating a raster layer by modality\nplot(lulc_2019_class)"
},
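The two-step res<- + resample() workflow above can also be done in a single call with terra's aggregate(). A minimal sketch, assuming the elevation raster re-projected to UTM zone 48N as earlier in this material:

```r
library(terra)

elevation     <- rast("data_cambodia/elevation.tif")
elevation_utm <- project(elevation, "EPSG:32648", method = "bilinear")

# aggregate() merges blocks of fact x fact cells into one, here using their mean;
# fact = 11 turns the ~91 m cells into ~1000 m cells in one step
elevation_agg <- aggregate(elevation_utm, fact = 11, fun = mean)

res(elevation_agg)  # check the new cell size
```

The reverse operation, disagg(), splits each cell into fact x fact smaller cells.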
{
"objectID": "04-raster_data.html#map-algebra",
"href": "04-raster_data.html#map-algebra",
"title": "4 Using raster data",
"section": "4.5 Map Algebra",
"text": "4.5 Map Algebra\nMap algebra is classified into four groups of operations (Tomlin 1990):\n\nLocal : operation by cell, on one or more layers;\n\nFocal : neighborhood operation (surrounding cells);\n\nZonal : to summarize the matrix values for certain zones, usually irregular;\nGlobal : to summarize the matrix values of one or more matrices.\n\n\n\n\nSource : (Li 2009)\n\n\n\n4.5.1 Local operations\n\n\n\nSource : (Mennis 2015)\n\n\n\n4.5.1.1 Value replacement\n\nelevation_utm[elevation_utm[[1]]== -9999] <- NA #replaces -9999 values with NA\n\nelevation_utm[elevation_utm < 1500] <- NA #Replace values < 1500 with NA\n\n\nelevation_utm[is.na(elevation_utm)] <- 0 #replace NA values with 0\n\n\n\n4.5.1.2 Operation on each cell\n\nelevation_1000 <- elevation_utm + 1000 # Adding 1000 to the value of each cell\n\nelevation_median <- elevation_utm - global(elevation_utm, median)[[1]] # Subtract the median elevation from each cell's value\n\n\n\n\n\n\n\n\n\n\n\n\n4.5.1.3 Reclassification\nReclassifying raster values can be used to discretize quantitative data as well as to regroup qualitative categories.\n\nreclassif <- matrix(c(1, 2, 1, \n 2, 4, 2,\n 4, 6, 3,\n 6, 9, 4), \n ncol = 3, byrow = TRUE)\n\nValues between 1 and 2 will be replaced by the value 1.\nValues between 2 and 4 will be replaced by the value 2.\nValues between 4 and 6 will be replaced by the value 3. 
Values between 6 and 9 will be replaced by the value 4.\n…\n\nreclassif\n\n [,1] [,2] [,3]\n[1,] 1 2 1\n[2,] 2 4 2\n[3,] 4 6 3\n[4,] 6 9 4\n\n\nThe function classify() allows you to perform the reclassification.\n\nlulc_2019_reclass <- classify(lulc_2019, rcl = reclassif)\nplot(lulc_2019_reclass, type =\"classes\")\n\n\n\n\nDisplay with the official titles and colors of the different categories.\n\nplot(lulc_2019_reclass, \n type =\"classes\", \n levels=c(\"Urban areas\",\n \"Water body\",\n \"Bare areas\",\n \"Vegetation areas\"),\n col=c(\"#E6004D\",\n \"#00BFFF\",\n \"#D3D3D3\", \n \"#32CD32\"),\n mar=c(3, 1.5, 1, 11))\n\n\n\n\n\n\n\n\n\n\n4.5.1.4 Operation on several layers (ex: NDVI)\nIt is possible to calculate the value of a cell from its values stored in different layers of a SpatRaster object.\nPerhaps the most common example is the calculation of the Normalized Difference Vegetation Index (NDVI). For each cell, a value is calculated from two raster layers of a multispectral satellite image.\n\n# Import a multispectral satellite image\nsentinel2a <- rast(\"data_cambodia/Sentinel2A.tif\")\n\nThis multispectral satellite image (10m resolution), dated 25/02/2020, was produced by the Sentinel-2 satellite and was retrieved from the Copernicus Open Access Hub platform. 
An extraction of the Red and near infrared spectral bands, centered on the city of Phnom Penh, was then carried out.\n\nplot(sentinel2a)\n\n\n\n\n\n\n\n\nTo lighten the code, we assign the two matrix layers to separate SpatRaster objects.\n\nB04_Red <- sentinel2a[[1]] #spectral band Red\n\nB08_NIR <-sentinel2a[[2]] #spectral band near infrared\n\nFrom these two raster objects, we can calculate the normalized difference vegetation index:\n\\[{NDVI}=\\frac{\\mathrm{NIR} - \\mathrm{Red}} {\\mathrm{NIR} + \\mathrm{Red}}\\]\n\nraster_NDVI <- (B08_NIR - B04_Red ) / (B08_NIR + B04_Red )\n\nplot(raster_NDVI)\n\n\n\n\n\n\n\n\nThe higher the values (close to 1), the denser the vegetation.\n\n\n\n4.5.2 Focal operations\n\n\n\nSource : (Mennis 2015)\n\n\nFocal analysis considers a cell plus its direct neighbors as a contiguous and symmetrical set (neighborhood operations). Most often, the value of the output cell is the result of a block of 3 x 3 (odd number) input cells.\nThe first step is to build a matrix that determines the block of cells that will be considered around each cell.\n\n# 5 x 5 matrix, where each cell has the same weight\nmon_focal <- matrix(1, nrow = 5, ncol = 5)\nmon_focal\n\n [,1] [,2] [,3] [,4] [,5]\n[1,] 1 1 1 1 1\n[2,] 1 1 1 1 1\n[3,] 1 1 1 1 1\n[4,] 1 1 1 1 1\n[5,] 1 1 1 1 1\n\n\nThe function focal() then allows you to perform the desired analysis. For example: calculating the average of the values of all contiguous cells, for each cell in the raster.\n\nelevation_LowerGrid_mean <- focal(elevation_LowerGrid, \n w = mon_focal, \n fun = mean)\n\n\n\n\n\n\n\n\n\n\n\n4.5.2.1 Focal operations for elevation rasters\nThe function terrain() allows you to perform focal analyses specific to elevation rasters. 
Six operations are available:\n\nslope = calculation of the slope or degree of inclination of the surface;\n\naspect = calculation of the slope orientation;\n\nroughness = calculation of the variability or irregularity of the elevation;\n\nTPI = calculation of the index of topographic positions;\n\nTRI = calculation of the elevation variability index;\n\nflowdir = calculation of the water flow direction.\n\nExample with the calculation of slopes (slope).\n\n#slope calculation\nslope <- terrain(elevation_utm, \"slope\", \n neighbors = 8, #8 (or 4) cells around taken into account\n unit = \"degrees\") #Output unit\n\nplot(slope) #Inclination of the slopes, in degrees\n\n\n\n\n\n\n\n\n\n\n\n4.5.3 Global operations\n\n\n\nSource : https://gisgeography.com/map-algebra-global-zonal-focal-local\n\n\nGlobal operations are used to summarize the matrix values of one or more matrices.\n\nglobal(elevation_utm, fun = \"mean\") #average values\n\n mean\nAltitude 80.01082\n\n\n\nglobal(elevation_utm, fun = \"sd\") #standard deviation\n\n sd\nAltitude 155.885\n\n\n\nfreq(lulc_2019_reclass) #frequency\n\n layer value count\n1 1 1 47485325\n2 1 2 13656289\n3 1 3 14880961\n4 1 4 37194979\n\ntable(lulc_2019_reclass[]) #contingency table\n\n\n 1 2 3 4 \n47485325 13656289 14880961 37194979 \n\n\nStatistical representations that summarize matrix information.\n\nhist(elevation_utm) #histogram\n\nWarning: [hist] a sample of 3% of the cells was used\n\n\n\n\n\n\n\n\ndensity(elevation_utm) #density\n\n\n\n\n\n\n\n\n\n\n4.5.4 Zonal operation\n\n\n\nSource : (Mennis 2015)\n\n\nZonal operations make it possible to summarize the matrix values of certain zones (groups of cells contiguous in space or in value).\n\n4.5.4.1 Zonal operation on an extraction\nAll global operations can be performed on an extraction of cells resulting from the functions crop(), mask(), segregate()…\nExample: average elevation for Thma Bang district (see section 5.4.3).\n\n# Average value of the \"mask\" raster over Thma Bang 
district\nglobal(mask_thma_bang, fun = \"mean\", na.rm=TRUE)\n\n mean\nAltitude 584.7703\n\n\n\n\n4.5.4.2 Zonal operation from a vector layer\nThe function extract() allows you to extract and manipulate the values of cells that intersect vector data.\nExample from polygons:\n\n# Average elevation for each polygon (district)\nelevation_by_dist <- extract(elevation_LowerGrid, district, fun=mean)\nhead(elevation_by_dist, 10)\n\n ID Altitude\n1 1 8.953352\n2 2 196.422240\n3 3 23.453937\n4 4 3.973118\n5 5 29.545801\n6 6 41.579593\n7 7 50.162749\n8 8 85.128777\n9 9 269.068091\n10 10 8.439041\n\n\n\n\n4.5.4.3 Zonal operation from raster\nZonal operations can be performed over areas bounded by the categorical values of a second raster. For this, the two rasters must have exactly the same extent and the same resolution.\n\n#create a second raster with same resolution and extent as \"elevation_clip\"\nelevation_clip <- rast(\"data_cambodia/elevation_clip.tif\")\nelevation_clip_utm <- project(x = elevation_clip, y = \"EPSG:32648\", method = \"bilinear\")\nsecond_raster_CLC <- rast(elevation_clip_utm)\n\n#resampling of lulc_2019_reclass \nsecond_raster_CLC <- resample(lulc_2019_reclass, second_raster_CLC, method = \"near\") \n \n#added a variable name for the second raster\nnames(second_raster_CLC) <- \"lulc_2019_reclass_resample\"\n\n\n\n\n\n\n\n\n\n\nCalculation of the average elevation for the different areas of the second raster.\n\n#average elevation for each area of the \"second_raster\"\nzonal(elevation_clip_utm, second_raster_CLC , \"mean\", na.rm=TRUE)\n\n lulc_2019_reclass_resample elevation_clip\n1 1 12.83846\n2 2 8.31809\n3 3 11.41178\n4 4 11.93546"
},
{
"objectID": "04-raster_data.html#transformation-and-conversion",
"href": "04-raster_data.html#transformation-and-conversion",
"title": "4 Using raster data",
"section": "4.6 Transformation and conversion",
"text": "4.6 Transformation and conversion\n\n4.6.1 Rasterization\nConvert polygons to raster format.\n\nchamkarmon = subset(district, district$ADM2_PCODE ==\"KH1201\") \nraster_district <- rasterize(x = chamkarmon, y = elevation_clip_utm)\n\n\nplot(raster_district)\n\n\n\n\n\n\n\n\nConvert points to raster format\n\n#rasterization of the centroids of the municipalities\nraster_dist_centroid <- rasterize(x = centroids(district), \n y = elevation_clip_utm, fun=sum)\nplot(raster_dist_centroid, col = \"red\")\nplot(district, add =TRUE)\n\n\n\n\nConvert lines in raster format\n\n#rasterization of municipal boundaries\nraster_dist_line <- rasterize(x = as.lines(district), y = elevation_clip_utm, fun=sum)\n\n\nplot(raster_dist_line)\n\n\n\n\n\n\n4.6.2 Vectorisation\nTransform a raster to vector polygons.\n\npolygon_elevation <- as.polygons(elevation_clip_utm)\n\n\nplot(polygon_elevation, y = 1, border=\"white\")\n\n\n\n\nTransform a raster to vector points.\n\npoints_elevation <- as.points(elevation_clip_utm)\n\n\nplot(points_elevation, y = 1, cex = 0.3)\n\n\n\n\nTransform a raster into vector lines.\n\nlines_elevation <- as.lines(elevation_clip_utm)\n\n\nplot(lines_elevation)\n\n\n\n\n\n\n4.6.3 terra, raster, sf, stars…\nReference packages for manipulating spatial data all rely o their own object class. It is sometimes necessary to convert these objects from one class to another class to take advance of all the features offered by these different packages.\nConversion functions for raster data:\n\n\n\nFROM/TO\nraster\nterra\nstars\n\n\n\n\nraster\n\nrast()\nst_as_stars()\n\n\nterra\nraster()\n\nst_as_stars()\n\n\nstars\nraster()\nas(x, ‘Raster’) + rast()\n\n\n\n\nConversion functions for vector data:\n\n\n\nFROM/TO\nsf\nsp\nterra\n\n\n\n\nsf\n\nas(x, ‘Spatial’)\nvect()\n\n\nsp\nst_as_sf()\n\nvect()\n\n\nterra\nst_as_sf()\nas(x, ‘Spatial’)\n\n\n\n\n\n\n\n\nHijmans, Robert J. 2022. “Terra: Spatial Data Analysis.” https://CRAN.R-project.org/package=terra.\n\n\nLi, Xingong. 
2009. “Map Algebra and Beyond : 1. Map Algebra for Scalar Fields.” https://slideplayer.com/slide/5822638/.\n\n\nMadelin, Malika. 2021. “Analyse d’images Raster (Et Télédétection).” https://mmadelin.github.io/sigr2021/SIGR2021_raster_MM.html.\n\n\nMennis, Jeremy. 2015. “Fundamentals of GIS : Raster Operations.” https://cupdf.com/document/gus-0262-fundamentals-of-gis-lecture-presentation-7-raster-operations-jeremy.html.\n\n\nNowosad, Jakub. 2021. “Image Processing and All Things Raster.” https://nowosad.github.io/SIGR2021/workshop2/workshop2.html.\n\n\nRacine, Etienne B. 2016. “The Visual Raster Cheat Sheet.” https://rpubs.com/etiennebr/visualraster.\n\n\nTomlin, C. Dana. 1990. Geographic Information Systems and Cartographic Modeling. Prentice Hall."
},
{
"objectID": "05-mapping_with_r.html",
"href": "05-mapping_with_r.html",
"title": "5 Mapping With R",
"section": "",
"text": "The fonction mf_map() is the central function of the package mapsf (Giraud 2022a). It makes it possible to carry out most of the usual representations in cartography. These main arguments are:\n\nx, an sf object ;\nvar, the name of variable to present ;\ntype, the type of presentation.\n\n\n\nThe following lines import the spatial information layers located in the geopackage cambodia.gpkg file.\n\nlibrary(sf)\n\n#Import Cambodia country border\ncountry = st_read(\"data_cambodia/cambodia.gpkg\", layer = \"country\", quiet = TRUE)\n#Import provincial administrative border of Cambodia\neducation = st_read(\"data_cambodia/cambodia.gpkg\", layer = \"education\", quiet = TRUE)\n#Import district administrative border of Cambodia\ndistrict = st_read(\"data_cambodia/cambodia.gpkg\", layer = \"district\", quiet = TRUE)\n#Import roads data in Cambodia\nroad = st_read(\"data_cambodia/cambodia.gpkg\", layer = \"road\", quiet = TRUE)\n#Import health center data in Cambodia\nhospital = st_read(\"data_cambodia/cambodia.gpkg\", layer = \"hospital\", quiet = TRUE)\n\n\nlibrary(mapsf)\n\nmf_map(x = district, border = \"white\")\nmf_map(x = country,lwd = 2, col = NA, add = TRUE)\nmf_map(x = road, lwd = .5, col = \"ivory4\", add = TRUE)\nmf_map(x = hospital, pch = 20, cex = 1, col = \"#FE9A2E\", add = TRUE) \n\n\n\n\n\n\n\nProportional symbol maps are used to represent inventory variables (absolute quantitative variables, sum and average make sense). The function mf_map(..., type = \"prop\") proposes this representation.\n\n#District\nmf_map(x = district) \n\n# Proportional symbol \nmf_map(\n x = district, \n var = \"T_POP\",\n val_max = 700000,\n type = \"prop\",\n col = \"#148F77\", \n leg_title = \"Population 2019\"\n)\n\n# Title\nmf_title(\"Distribution of population in provincial level\")\n\n\n\n\n\n\nIt is possible to fix the dimensions of the largest symbol corresponding to a certain value with the arguments inches and val_max. 
We can thus construct maps with comparable proportional symbols.\n\npar(mfrow = c(1,2)) #Displaying two maps facing each other\n\n#district\nmf_map(x = district, border = \"grey90\", lwd = .5) \n# Add male Population\nmf_map(\n x = district, \n var = \"Male\", \n type = \"prop\",\n col = \"#1F618D\",\n inches = 0.2, \n val_max = 300000, \n leg_title = \"Male\", \n leg_val_cex = 0.5\n)\nmf_title(\"Male Population by District\") #Adding map title\n\n#district\nmf_map(x = district, border = \"grey90\", lwd = .5) \n# Add female Population\nmf_map(\n x = district, \n var = \"Female\", \n type = \"prop\",\n col = \"#E74C3C\",\n inches = 0.2, \n val_max = 300000, \n leg_title =\"Female\", \n leg_val_cex = 0.5\n)\nmf_title(\"Female Population by District\") #Adding map title\n\n\n\n\nHere we have displayed two maps facing each other; see the section Displaying several maps on the same figure for more details.\n\n\n\n\nChoropleth maps are used to represent ratio variables (relative quantitative variables, for which the mean is meaningful but the sum is not).\nFor this type of representation, you must first:\n\nchoose a discretization method to transform a continuous statistical series into classes defined by intervals,\nchoose a number of classes,\nchoose a color palette.\n\nThe function mf_map(..., type = \"choro\") makes it possible to create choropleth maps. The arguments nbreaks and breaks are used to parameterize the discretizations, and the function mf_get_breaks() makes it possible to work on the discretizations outside the function mf_map(). 
Similarly, the argument pal is used to specify a color palette, but several functions (e.g. mf_get_pal()) can be used to define palettes outside the function.\n\n# Population density (inhabitants/km2) using the sf::st_area() function\ndistrict$DENS <- 1e6 * district$T_POP / as.numeric(st_area(district)) #Calculate population density \nmf_map(\n x = district,\n var = \"DENS\",\n type = \"choro\",\n breaks = \"quantile\",\n pal = \"BuGn\",\n lwd = 1,\n leg_title = \"Distribution of population\\n(inhabitants per km2)\", \n leg_val_rnd = 0\n)\nmf_title(\"Distribution of the population (2019)\")\n\n\n\n\n\ncases = st_read(\"data_cambodia/cambodia.gpkg\", layer = \"cases\", quiet = TRUE) # load cases layer\ncases = subset(cases, Disease == \"W fever\") # subset data to only keep W fever cases\npopulation <- read.csv(\"data_cambodia/khm_admpop_adm2_2016_v2.csv\") # read population data\npopulation <- population[, c(\"ADM2_PCODE\", \"T_TL\")] # just select few columns\npopulation$T_TL <- as.numeric(gsub(\",\",\"\",population$T_TL)) # Remove commas (not supposed to be in the dataframe)\ndistrict$cases <- lengths(st_intersects(district, cases)) # count points in polygons\ndistrict <- merge(district,\n population,\n by = \"ADM2_PCODE\") # merge shape with population data\ndistrict$incidence <- district$cases / district$T_TL * 100000 # calculate incidence\n\nmf_map(x = district,\n var = \"incidence\",\n type = \"choro\",\n leg_title = \"Incidence (per 100 000)\")\nmf_layout(title = \"Incidence of W Fever in Cambodia\")\n\n\n\n\n\n\nThe function mf_get_breaks() provides the classic discretization methods: quantiles, mean/standard deviation, equal amplitudes, nested means, Fisher-Jenks, geometric progression, etc.\n\neducation$enrol_g_pct = 100 * education$enrol_girl/education$t_enrol #Calculate percentage of enrolled girl students\n\nd1 = mf_get_breaks(education$enrol_g_pct, nbreaks = 6, breaks = \"equal\", freq = TRUE)\nd2 = mf_get_breaks(education$enrol_g_pct, 
nbreaks = 6, breaks = \"quantile\")\nd3 = mf_get_breaks(education$enrol_g_pct, nbreaks = 6, breaks = \"geom\")\nd4 = mf_get_breaks(education$enrol_g_pct, breaks = \"msd\", central = FALSE)\n\n\n\n\n\n\n\n\n\nThe argument pal of mf_map() is dedicated to choosing a color palette. The palettes provided by the function hcl.colors() can be used directly.\n\nmf_map(x = education, var = \"enrol_g_pct\", type = \"choro\",\n breaks = d3, pal = \"Reds 3\")\n\n\n\n\n\n\n\n\n\nThe function mf_get_pal() allows you to build a color palette. This function is especially useful for creating balanced asymmetrical diverging palettes.\n\nmypal <- mf_get_pal(n = c(4,6), palette = c(\"Burg\", \"Teal\"))\nimage(1:10, 1, as.matrix(1:10), col=mypal, xlab = \"\", ylab = \"\", xaxt = \"n\",\n yaxt = \"n\",bty = \"n\")\n\n\n\n\n\n\n\nIt is also possible to use this type of representation on point layers.\n\ndist_c <- st_centroid(district)\nmf_map(district)\nmf_map(\n x = dist_c,\n var = \"DENS\",\n type = \"choro\",\n breaks = \"quantile\",\n nbreaks = 5,\n pal = \"PuRd\",\n pch = 23,\n cex = 1.5,\n border = \"white\",\n lwd = .7,\n leg_pos = \"topleft\",\n leg_title = \"Distribution of population\\n(inhabitants per km2)\", \n leg_val_rnd = 0, \n add = TRUE\n)\nmf_title(\"Distribution of population (2019)\")\n\n\n\n\n\n\n\n\nTypology maps are used to represent qualitative variables. 
The function mf_map(..., type = \"typo\") proposes this representation.\n\nmf_map(\n x = district,\n var=\"Status\",\n type = \"typo\",\n pal = c('#E8F9FD','#FF7396','#E4BAD4','#FFE3FE'),\n lwd = .7,\n leg_title = \"\"\n)\nmf_title(\"Administrative status by size of area\")\n\n\n\n\n\n\nThe argument val_order is used to order the categories in the legend.\n\nmf_map(\n x = district,\n var=\"Status\",\n type = \"typo\",\n pal = c('#E8F9FD','#FF7396','#E4BAD4','#FFE3FE'),\n val_order = c(\"1st largest district\", \"2nd largest district\", \"3rd largest district\",\"<4500km2\"),\n lwd = .7,\n leg_title = \"\"\n)\nmf_title(\"Administrative status by size of area\")\n\n\n\n\n\n\n\nWhen the layer has point geometries, symbols are used to carry the colors of the typology.\n\n#extract centroid point of the district\ndist_ctr <- st_centroid(district[district$Status != \"<4500km2\", ])\nmf_map(district)\nmf_map(\n x = dist_ctr,\n var = \"Status\",\n type = \"typo\",\n cex = 2,\n pch = 22,\n pal = c('#FF7396','#E4BAD4','#FFE3FE'),\n leg_title = \"\",\n leg_pos = \"bottomright\",\n add = TRUE\n)\nmf_title(\"Administrative status by size of area\")\n\n\n\n\n\n\n\n\n#Selection of roads that intersect the city of Phnom Penh\npp <- district[district$ADM1_EN == \"Phnom Penh\", ]\nroad_pp <- road[st_intersects(x = road, y = pp, sparse = FALSE), ]\nmf_map(pp)\nmf_map(\n x = road_pp,\n var = \"fclass\",\n type = \"typo\",\n lwd = 1.2,\n pal = mf_get_pal(n = 6, \"Tropic\"),\n leg_title = \"Types of road\",\n leg_pos = \"topright\",\n leg_frame = T,\n add = TRUE\n)\nmf_title(\"Administrative status\")\n\n\n\n\n\n\n\n\nThe function mf_map(..., var = c(\"var1\", \"var2\"), type = \"prop_choro\") represents proportional symbols whose areas are proportional to the values of one variable and whose color is based on the discretization of a second variable. 
The function uses the arguments of the functions mf_map(..., type = \"prop\") and mf_map(..., type = \"choro\").\n\nmf_map(x = district)\nmf_map(\n x = district,\n var = c(\"T_POP\", \"DENS\"),\n val_max = 500000,\n type = \"prop_choro\",\n border = \"grey60\",\n lwd = 0.5,\n leg_pos = c(\"bottomright\", \"bottomleft\"),\n leg_title = c(\"Population\", \"Density of\\n population\\n(inhabitants per km2)\"),\n breaks = \"q6\",\n pal = \"Blues 3\",\n leg_val_rnd = c(0,1))\nmf_title(\"Population\")\n\n\n\n\n\n\n\nThe function mf_map(..., var = c(\"var1\", \"var2\"), type = \"prop_typo\") represents proportional symbols whose areas are proportional to the values of one variable and whose color is based on the categories of a second, qualitative variable. The function uses the arguments of the functions mf_map(..., type = \"prop\") and mf_map(..., type = \"typo\").\n\nmf_map(x = district)\nmf_map(\n x = district,\n var = c(\"Area.Km2.\", \"Status\"),\n type = \"prop_typo\",\n pal = c('#E8F9FD','#FF7396','#E4BAD4','#FFE3FE'),\n val_order = c(\"<4500km2\",\"1st largest district\", \"2nd largest district\", \"3rd largest district\"),\n leg_pos = c(\"bottomleft\",\"topleft\"),\n leg_title = c(\"Area\\n(km2)\",\n \"Administrative status\")\n)\nmf_title(\"Area and administrative status\")"
},
{
"objectID": "05-mapping_with_r.html#layout",
"href": "05-mapping_with_r.html#layout",
"title": "5 Mapping With R",
"section": "5.2 Layout",
"text": "5.2 Layout\nTo be finalized, a thematic map must contain certain additional elements such as: title, author, source, scale, orientation…\n\n5.2.1 Example data\nThe following lines import the spatial information layers located in the geopackage cambodia.gpkg file.\n\nlibrary(sf)\ncountry = st_read(\"data_cambodia/cambodia.gpkg\", layer = \"country\", quiet = TRUE) #Import Cambodia country border\neducation = st_read(\"data_cambodia/cambodia.gpkg\", layer = \"education\", quiet = TRUE) #Import provincial administrative border of Cambodia\ndistrict = st_read(\"data_cambodia/cambodia.gpkg\", layer = \"district\", quiet = TRUE) #Import district administrative border of Cambodia\nroad = st_read(\"data_cambodia/cambodia.gpkg\", layer = \"road\", quiet = TRUE) #Import roads data in Cambodia\nhospital = st_read(\"data_cambodia/cambodia.gpkg\", layer = \"hospital\", quiet = TRUE) #Import hospital data in Cambodia\ncases = st_read(\"data_cambodia/cambodia.gpkg\", layer = \"cases\", quiet = TRUE) #Import example data of fever_cases in Cambodia\n\n\n\n5.2.2 Themes\nThe function mf_theme() defines a cartographic theme. Using a theme allows you to define several graphic parameters which are then applied to the maps created with mapsf. These parameters are: the map margins, the main color, the background color, the position and the aspect of the title. 
A theme can also be defined with the mf_init() and function mf_export().\n\n5.2.2.1 Use a predefined theme\nA series of predefined themes are available by default (see ?mf_theme).\n\nlibrary(mapsf)\n# use of a background color for the figure, to see the use of margin\nopar <- par(mfrow = c(2,2))\n# Using a predefined theme\nmf_theme(\"default\")\nmf_map(district)\nmf_title(\"Theme : 'default'\")\n\nmf_theme(\"darkula\")\nmf_map(district)\nmf_title(\"Theme : 'darkula'\")\n\nmf_theme(\"candy\")\nmf_map(district)\nmf_title(\"Theme : 'candy'\")\n\nmf_theme(\"nevermind\")\nmf_map(district)\nmf_title(\"Theme : 'nevermind'\")\npar(opar)\n\n\n\n\n\n\n5.2.2.2 Modify an existing theme\nIt is possible to modify an existing theme. In this example, we are using the “default” theme and modifying a few settings.\n\nlibrary(mapsf)\nopar <- par(mfrow = c(1,2))\nmf_theme(\"default\")\nmf_map(district)\nmf_title(\"default\")\n\nmf_theme(\"default\", tab = FALSE, font = 4, bg = \"grey60\", pos = \"center\")\nmf_map(district)\nmf_title(\"modified default\")\npar(opar)\n\n\n\n\n\n\n5.2.2.3 Create a theme\nIt is also possible to create a theme.\n\nmf_theme(\n bg = \"lightblue\", # background color\n fg = \"tomato1\", # main color\n mar = c(1,0,1.5,0), # margin\n tab = FALSE, # \"tab\" style for the title\n inner = FALSE, # title inside or outside of map area\n line = 1.5, # space dedicated to title\n pos = \"center\", # heading position\n cex = 1.5, # title size\n font = 2 # font types for title\n)\nmf_map(district)\nmf_title(\"New theme\")\n\n\n\n\n\n\n\n5.2.3 Titles\nThe function mf_title() adds a title to a map.\n\nmf_theme(\"default\")\nmf_map(district)\nmf_title(\"Map title\")\n\n\n\n\nIt is possible to customize the appearance of the title\n\nmf_map(district)\nmf_title(\n txt = \"Map title\", \n pos = \"center\", \n tab = FALSE, \n bg = \"tomato3\", \n fg = \"lightblue\", \n cex = 1.5, \n line = 1.7, \n font = 1, \n inner = FALSE\n)\n\n\n\n\n\n\n5.2.4 Arrow orientation\nThe 
function mf_arrow() allows you to choose the position and aspect of orientation arrow.\n\nmf_map(district)\nmf_arrow()\n\n\n\n\n\n\n5.2.5 Scale\nThe function mf_scale() allows you to choose the position and the aspect of the scale.\n\nmf_map(district)\nmf_scale(\n size = 60,\n lwd = 1,\n cex = 0.7\n)\n\n\n\n\n\n\n5.2.6 Credits\nThe function mf_credits() displays a line of credits (sources, author, etc.).\n\nmf_map(district)\nmf_credits(\"IRD\\nInstitut Pasteur du Cambodge, 2022\")\n\n\n\n\n\n\n5.2.7 Complete dressing\nThe function mf_layout() displays all these elements.\n\nmf_map(district)\nmf_layout(\n title = \"Cambodia\",\n credits = \"IRD\\nInstitut Pasteur du Cambodge, 2022\",\n arrow = TRUE\n)\n\n\n\n\n\n\n5.2.8 Annotations\n\nmf_map(district)\nmf_annotation(district[district$ADM2_EN == \"Bakan\",], txt = \"Bakan\", col_txt = \"darkred\", halo = TRUE, cex = 1.5)\n\n\n\n\n\n\n5.2.9 Legends\n\nmf_map(district)\nmf_legend(\n type = \"prop\", \n val = c(1000,500,200,10), \n inches = .2, \n title = \"Population\", \n pos = \"topleft\"\n)\nmf_legend(\n type = \"choro\", \n val = c(0,10,20,30,40),\n pal = \"Greens\", \n pos = \"bottomright\", \n val_rnd = 0\n)\n\n\n\n\n\n\n5.2.10 Labels\nThe function mf_label() is dedicated to displaying labels.\n\ndist_selected <- district[st_intersects(district, district[district$ADM2_EN == \"Bakan\", ], sparse = F), ]\n\nmf_map(dist_selected)\nmf_label(\n x = dist_selected,\n var = \"ADM2_EN\",\n col= \"darkgreen\",\n halo = TRUE,\n overlap = FALSE, \n lines = FALSE\n)\nmf_scale()\n\n\n\n\nThe argument halo = TRUE allows to display a slight halo around the labels and the argument overlap = FALSE allows to create non-overlapping labels.\n\n\n5.2.11 Center the map on a region\nThe function mf_init() allows you to initialize a map by centering it on a spatial object.\n\nmf_init(x = dist_selected)\nmf_map(district, add = TRUE)\nmf_map(dist_selected, col = NA, border = \"#29a3a3\", lwd = 2, add = TRUE)\n\n\n\n\n\n\n5.2.12 Displaying 
several maps on the same figure\nHere you have to use the mfrow argument of the function par(). The first digit represents the number of rows and the second the number of columns.\n\n# define the figure layout (1 row, 2 columns)\npar(mfrow = c(1, 2))\n\n# first map\nmf_map(district)\nmf_map(district, \"Male\", \"prop\", val_max = 300000)\nmf_title(\"Population, male\")\n\n# second map\nmf_map(district)\nmf_map(district, \"Female\", \"prop\", val_max = 300000)\nmf_title(\"Population, female\")\n\n\n\n\n\n\n5.2.13 Exporting maps\nIt is quite difficult to export figures (maps or others) whose height/width ratio is satisfactory. The default ratio of figures in png format is 1 (480x480 pixels):\n\ndist_filter <- district[district$ADM2_PCODE == \"KH0808\", ]\npng(\"img/dist_filter_1.png\")\nmf_map(dist_filter)\nmf_title(\"Filtered district\")\ndev.off()\n\n\n\n\n\n\nOn this map a lot of space is lost to the left and right of the district.\nThe function mf_export() allows exports of maps whose height/width ratio is controlled and corresponds to that of a spatial object.\n\nmf_export(dist_filter, \"img/dist_filter_2.png\", width = 480)\nmf_map(dist_filter)\nmf_title(\"Filtered district\")\ndev.off()\n\n\n\n\n\n\nThe extent of this map is exactly that of the displayed region.\n\n\n5.2.14 Adding an image to a map\nThis can be useful for adding a logo or a pictogram. 
The function readPNG() of the package png allows adding images to a figure.\n\nmf_theme(\"default\", mar = c(0,0,0,0))\nlibrary(png)\n\nlogo <- readPNG(\"img/ird_logo.png\") #Import image\npp <- dim(logo)[2:1]*200 #Image dimension in map unit (width and height of the original image)\n\n#The upper left corner of the district's bounding box\nxy <- st_bbox(district)[c(1,4)]\nmf_map(district, col = \"#D1914D\", border = \"white\")\nrasterImage(\n image = logo,\n xleft = xy[1] ,\n ybottom = xy[2] - pp[2],\n xright = xy[1] + pp[1],\n ytop = xy[2]\n)\n\n\n\n\n\n\n5.2.15 Place an item precisely on the map\nThe function locator() allows clicking on the figure and obtaining the coordinates of a point in the coordinate system of the figure (of the map).\n\n# locator(1) # click to get coordinate on map\n# points(locator(1)) # click to plot point on map\n# text(locator(1), # click to place the item on map\n# labels =\"Located any texts on map\", \n# adj = c(0,0))\n\n\nVideo\nlocator() can be used on most graphics (but not those produced with ggplot2).\n\n\n\n\n\n\nHow to interactively position legends and layout elements on a map with cartography\n\n\n\n\n\n5.2.16 Add shading to a layer\nThe function mf_shadow() allows you to add a drop shadow to a layer of polygons.\n\nmf_shadow(district)\nmf_map(district, add=TRUE)\n\n\n\n\n\n\n5.2.17 Creating Boxes\nThe function mf_inset_on() allows you to start creating an inset box. 
You must then “close” the box with mf_inset_off().\n\nmf_init(x = dist_selected, theme = \"agolalight\", expandBB = c(0,.1,0,.5)) \nmf_map(district, add = TRUE)\nmf_map(dist_selected, col = \"tomato4\", border = \"tomato1\", lwd = 2, add = TRUE)\n\n# Cambodia inset box\nmf_inset_on(x = country, pos = \"topright\", cex = .3)\nmf_map(country, lwd = .5, border= \"grey90\")\nmf_map(dist_selected, col = \"tomato4\", border = \"tomato1\", lwd = .5, add = TRUE)\nmf_scale(size = 100, pos = \"bottomleft\", cex = .6, lwd = .5)\nmf_inset_off()\n\n# District inset box\nmf_inset_on(x = district, pos = \"bottomright\", cex = .3)\nmf_map(district, lwd = 0.5, border= \"grey90\")\nmf_map(dist_selected, col = \"tomato4\", border = \"tomato1\", lwd = .5, add = TRUE)\nmf_scale(size = 100, pos = \"bottomright\", cex = .6, lwd = .5)\nmf_inset_off()\n\n# World inset box\nmf_inset_on(x = \"worldmap\", pos = \"topleft\")\nmf_worldmap(dist_selected, land_col = \"#cccccc\", border_col = NA, \n water_col = \"#e3e3e3\", col = \"tomato4\")\n\nmf_inset_off()\nmf_title(\"Bakan district and its surroundings\")\nmf_scale(10, pos = 'bottomleft')"
},
{
"objectID": "05-mapping_with_r.html#d-maps",
"href": "05-mapping_with_r.html#d-maps",
"title": "5 Mapping With R",
"section": "5.3 3D maps",
"text": "5.3 3D maps\n\n5.3.1 linemap\nThe package linemap (Giraud 2021) allows you to make maps made up of lines.\n\nlibrary(linemap)\nlibrary(mapsf)\nlibrary(sf)\nlibrary(dplyr)\n\npp = st_read(\"data_cambodia/PP.gpkg\", quiet = TRUE) # import Phnom Penh administrative border\npp_pop_dens <- getgrid(x = pp, cellsize =1000, var = \"DENs\") # create population density in grid format (pop density/1km)\n\nmf_init(pp)\n\nlinemap(\n x = pp_pop_dens, \n var = \"DENs\",\n k = 1,\n threshold = 5, \n lwd = 1,\n col = \"ivory1\",\n border = \"ivory4\",\n add = T)\n\nmf_title(\"Phnom Penh Population Density, 2019\")\nmf_credits(\"Humanitarian Data Exchange, 2022\\nunit data:km2\")\n\n\n\n# url = \"https://data.humdata.org/dataset/1803994d-6218-4b98-ac3a-30c7f85c6dbc/resource/f30b0f4b-1c40-45f3-986d-2820375ea8dd/download/health_facility.zip\"\n# health_facility.zip = \"health_facility.zip\"\n# download.file(url, destfile = health_facility.zip)\n# unzip(health_facility.zip) # Unzipped files are in a new folder named Health\n# list.files(path=\"Health\")\n\n\n\n5.3.2 Relief Tanaka\nWe use the tanaka package (Giraud 2022b) which provides a method (Tanaka 1950) used to improve the perception of relief.\n\nlibrary(tanaka)\nlibrary(terra)\n\nrpop <- rast(\"data_cambodia/khm_pd_2019_1km_utm.tif\") # Import population raster data (in UTM)\ndistrict = st_read(\"data_cambodia/cambodia.gpkg\", layer = \"district\", quiet = TRUE) # Import Cambodian districts layer\ndistrict <- st_transform(district, st_crs(rpop)) # Transform data into the same coordinate system\n\nmat <- focalMat(x = rpop, d = c(1500), type = \"Gauss\") # Raster smoothing\nrpopl <- focal(x = rpop, w = mat, fun = sum, na.rm = TRUE)\n\n# Mapping\ncols <- hcl.colors(8, \"Reds\", alpha = 1, rev = T)[-1]\nmf_theme(\"agolalight\")\nmf_init(district)\ntanaka(x = rpop, breaks = c(0,10,25,50,100,250,500,64265),\n col = cols, add = T, mask = district, legend.pos = \"n\")\nmf_legend(type = \"choro\", pos = \"bottomright\", \n val = 
c(0,10,25,50,100,250,500,64265), pal = cols,\n bg = \"#EDF4F5\", fg = NA, frame = T, val_rnd = 0,\n title = \"Population\\nper km2\")\nmf_title(\"Population density of Cambodia, 2019\")\nmf_credits(\"Humanitarian Data Exchange, 2022\",\n bg = \"#EDF4F5\")\n\n\n\n\n\n\n\n\n\n\nThe tanaka package"
},
{
"objectID": "05-mapping_with_r.html#cartographic-transformation",
"href": "05-mapping_with_r.html#cartographic-transformation",
"title": "5 Mapping With R",
"section": "5.4 Cartographic Transformation",
"text": "5.4 Cartographic Transformation\n\nclassical anamorphosis is a representation of States(or any cells) by rectangle or any polygons according to a quantities attached to them. (…) We strive to keep the general arrangement of meshes or the silhouette of the continent.”\nBrunet, Ferras, and Théry (1993)\n\n3 types of anamorphoses or cartograms are presented here:\n\nDorling’s cartograms (Dorling 1996)\nNon-contiguous cartograms (Olson 1976)\nContiguous cartograms (Dougenik, Chrisman, and Niemeyer 1985)\n\n\n\n\n\n\n\nA comprehensive course on anamorphoses : Les anamorphoses cartographiques (Lambert 2015).\n\n\n\n\n\n\n\n\n\nMake cartograms with R\n\n\n\nTo make the cartograms we use the package cartogram (Jeworutzki 2020).\n\n5.4.1 Dorling’s cartograms\nThe territories are represented by figures (circles, squares or rectangles) which do not overlap, the surface of which are proportional to a variable. The proportion of the figures are defined according to the starting positions.\n\n\n\n\n\n\n\n\nSpace is quite poorly identified.\nYou can name the circles to get your bearings and/or use the color to make clusters appear and better identify the geographical blocks.\n\n\n\n\n\nThe perception of quantities is very good. The circle sizes are really comarable.\n\n\n\nlibrary(mapsf)\nlibrary(cartogram)\ndistrict <- st_read(\"data_cambodia/cambodia.gpkg\", layer = \"district\" , quiet = TRUE)\ndist_dorling <- cartogram_dorling(x = district, weight = \"T_POP\", k = 0.7)\nmf_map(dist_dorling, col = \"#40E0D0\", border= \"white\")\nmf_label(\n x = dist_dorling[order(dist_dorling$T_POP, decreasing = TRUE), ][1:10,], \n var = \"ADM2_EN\",\n overlap = FALSE, \n # show.lines = FALSE,\n halo = TRUE, \n r = 0.15\n)\nmf_title(\"Population of District - Dorling Cartogram\")\n\n\n\n\nThe parameter k allows to vary the expansion factor of the circles.\n\n\n5.4.2 Non-continuous cartograms\nThe size of the polygons is proportional to a variable. 
The arrangement of the polygons relative to each other is preserved. The shape of the polygons is preserved.\n\n\n\n\n\n(Cauvin, Escobar, and Serradj 2013)\n\n\n\nThe topology of the regions is lost.\n\n\n\n\n\nThe conservation of the polygons’ shape is optimal.\n\n\n\ndist_ncont <- cartogram_ncont(x = district, weight = \"T_POP\", k = 1.2)\nmf_map(district, col = NA, border = \"#FDFEFE\", lwd = 1.5)\nmf_map(dist_ncont, col = \"#20B2AA\", border= \"white\", add = TRUE)\nmf_title(\"Population of District - Non-contiguous cartograms\")\n\n\n\n\nThe parameter k allows you to vary the expansion of the polygons.\n\n\n5.4.3 Continuous cartograms\nThe size of the polygons is proportional to a variable. The arrangement of the polygons relative to each other is preserved. To maintain contiguity, the shape of the polygons is heavily transformed.\n\n\n\n\n\n(Paull and Hennig 2016)\n\n\n\nThe shape of the polygons is strongly distorted.\n\n\n\n\n\nIt is a “real geographical map”: topology and contiguity are preserved.\n\n\n\ndist_cont <- cartogram_cont(x = district, weight = \"DENs\", maxSizeError = 6)\n\nMean size error for iteration 1: 15.8686749410166\n\n\nMean size error for iteration 2: 12.1107731631101\n\n\nMean size error for iteration 3: 9.98940057337996\n\n\nMean size error for iteration 4: 8.62323208787643\n\n\nMean size error for iteration 5: 7.60706404894655\n\n\nMean size error for iteration 6: 6.83561617758241\n\n\nMean size error for iteration 7: 10.1399490743501\n\n\nMean size error for iteration 8: 5.79418495291592\n\nmf_map(dist_cont, col = \"#66CDAA\", border= \"white\", add = FALSE)\nmf_title(\"Population of District - Continuous cartograms\")\nmf_inset_on(district, cex = .2, pos = \"bottomleft\")\nmf_map(district, lwd = .5)\nmf_inset_off()\n\n\n\n\n\n\n5.4.4 Strengths and weaknesses of cartograms\nCartograms are cartographic representations perceived as innovative (although the method is 40 years old). 
These highly generalized images capture quantities and gradients well. They are true communication images that provoke, arouse interest, convey a strong message and challenge the reader.\nBut cartograms induce a loss of visual cues (it is difficult to find one’s country or region on the map), require a reading effort that can be significant, and do not make it possible to handle missing data.\n\n\n\n\nBrunet, Roger, Robert Ferras, and Hervé Théry. 1993. Les Mots de La géographie: Dictionnaire Critique. 03) 911 BRU.\n\n\nCauvin, Colette, Francisco Escobar, and Aziz Serradj. 2013. Thematic Cartography, Cartography and the Impact of the Quantitative Revolution. Vol. 2. John Wiley & Sons.\n\n\nDorling, Daniel. 1996. Area Cartograms: Their Use and Creation, Concepts and Techniques in Modern Geography. Vol. 59. CATMOG: Concepts and Techniques in Modern Geography. Institute of British Geographers.\n\n\nDougenik, James A, Nicholas R Chrisman, and Duane R Niemeyer. 1985. “An Algorithm to Construct Continuous Area Cartograms.” The Professional Geographer 37 (1): 75–81.\n\n\nGiraud, Timothée. 2021. “Linemap: Line Maps.” https://CRAN.R-project.org/package=linemap.\n\n\n———. 2022a. “Mapsf: Thematic Cartography.” https://CRAN.R-project.org/package=mapsf.\n\n\n———. 2022b. “Tanaka: Design Shaded Contour Lines (or Tanaka) Maps.” https://CRAN.R-project.org/package=tanaka.\n\n\nJeworutzki, Sebastian. 2020. “Cartogram: Create Cartograms with r.” https://CRAN.R-project.org/package=cartogram.\n\n\nLambert, Nicolas. 2015. “Les Anamorphoses Cartographiques.” Blog. Carnet Néocartographique. https://neocarto.hypotheses.org/366.\n\n\nOlson, Judy M. 1976. “Noncontiguous Area Cartograms.” The Professional Geographer 28 (4): 371–80.\n\n\nPaull, John, and Benjamin Hennig. 2016. “Atlas of Organics: Four Maps of the World of Organic Agriculture.” Journal of Organics 3 (1): 25–32.\n\n\nTanaka, Kitiro. 1950. “The Relief Contour Method of Representing Topography on Maps.” Geographical Review 40 (3): 444. 
https://doi.org/10.2307/211219."
},
{
"objectID": "07-basic_statistics.html",
"href": "07-basic_statistics.html",
"title": "6 Basic statistics for spatial analysis",
"text": "This section aims at providing some basic statistical tools to study the spatial distribution of epidemiological data. If you wish to go further into spatial statistics applied to epidemiology and their limitations you can consult the tutorial “Spatial Epidemiology” from M. Kramer from which the statistical analysis of this section was adapted."
},
{
"objectID": "07-basic_statistics.html#import-and-visualize-epidemiological-data",
"href": "07-basic_statistics.html#import-and-visualize-epidemiological-data",
"title": "6 Basic statistics for spatial analysis",
"section": "6.1 Import and visualize epidemiological data",
"text": "6.1 Import and visualize epidemiological data\nIn this section, we load data that reference the cases of an imaginary disease, the W fever, throughout Cambodia. Each point corresponds to the geo-localization of a case.\n\nlibrary(dplyr)\nlibrary(sf)\n\n#Import Cambodia country border\ncountry <- st_read(\"data_cambodia/cambodia.gpkg\", layer = \"country\", quiet = TRUE)\n#Import provincial administrative border of Cambodia\neducation <- st_read(\"data_cambodia/cambodia.gpkg\", layer = \"education\", quiet = TRUE)\n#Import district administrative border of Cambodia\ndistrict <- st_read(\"data_cambodia/cambodia.gpkg\", layer = \"district\", quiet = TRUE)\n\n# Import locations of cases from an imaginary disease\ncases <- st_read(\"data_cambodia/cambodia.gpkg\", layer = \"cases\", quiet = TRUE)\ncases <- subset(cases, Disease == \"W fever\")\n\nThe first step of any statistical analysis always consists on visualizing the data to check they were correctly loaded and to observe general pattern of the cases.\n\n# View the cases object\nhead(cases)\n\nSimple feature collection with 6 features and 2 fields\nGeometry type: MULTIPOINT\nDimension: XY\nBounding box: xmin: 255891 ymin: 1179092 xmax: 506647.4 ymax: 1467441\nProjected CRS: WGS 84 / UTM zone 48N\n id Disease geom\n1 0 W fever MULTIPOINT ((280036.2 12841...\n2 1 W fever MULTIPOINT ((451859.5 11790...\n3 2 W fever MULTIPOINT ((255891 1467441))\n4 5 W fever MULTIPOINT ((506647.4 12322...\n5 6 W fever MULTIPOINT ((440668 1197958))\n6 7 W fever MULTIPOINT ((481594.5 12714...\n\n# Map the cases\nlibrary(mapsf)\n\nmf_map(x = district, border = \"white\")\nmf_map(x = country,lwd = 2, col = NA, add = TRUE)\nmf_map(x = cases, lwd = .5, col = \"#990000\", pch = 20, add = TRUE)\nmf_layout(title = \"W Fever infections in Cambodia\")\n\n\n\n\nIn epidemiology, the true meaning of point is very questionable. 
While a point usually gives the location of an observation, we cannot always tell whether this observation represents an event of interest (e.g., illness, death, …) or a person at risk (e.g., a participant that may or may not experience the disease). While the population at risk may be considered uniformly distributed within a small area (within a city for example), this is likely not the case at a country scale. Considering a ratio of events relative to a population at risk is often more informative than just considering cases. Administrative divisions of countries are convenient areal units for aggregating cases since data on population counts and structures are available for them. In this study, we will use the district as the areal unit.\n\n# Aggregate cases over districts\ndistrict$cases <- lengths(st_intersects(district, cases))\n\nThe incidence (\\(\\frac{cases}{population}\\)), expressed per 100,000 population, is commonly used to represent the distribution of cases relative to population density, but other indicators exist. For example, the standardized incidence ratio (SIR) represents the deviation between observed and expected numbers of cases and is expressed as \\(SIR = \\frac{Y_i}{E_i}\\) with \\(Y_i\\), the observed number of cases and \\(E_i\\), the expected number of cases. In this study, we computed the expected number of cases in each district by assuming infections are homogeneously distributed across Cambodia, i.e., the incidence is the same in each district. 
The SIR therefore represents the deviation of incidence compared to the average incidence across Cambodia.\n\n# Compute incidence in each district (per 100 000 population)\ndistrict$incidence <- district$cases/district$T_POP * 100000\n\n# Compute the global risk\nrate <- sum(district$cases)/sum(district$T_POP)\n\n# Compute expected number of cases \ndistrict$expected <- district$T_POP * rate\n\n# Compute SIR\ndistrict$SIR <- district$cases / district$expected\n\n\npar(mfrow = c(1, 3))\n# Plot number of cases using proportional symbol \nmf_map(x = district) \nmf_map(\n x = district, \n var = \"cases\",\n val_max = 50,\n type = \"prop\",\n col = \"#990000\", \n leg_title = \"Cases\")\nmf_layout(title = \"Number of cases of W Fever\")\n\n# Plot incidence \nmf_map(x = district,\n var = \"incidence\",\n type = \"choro\",\n pal = \"Reds 3\",\n breaks = exp(mf_get_breaks(log(district$incidence+1), breaks = \"pretty\"))-1,\n leg_title = \"Incidence \\n(per 100 000)\")\nmf_layout(title = \"Incidence of W Fever\")\n\n# Plot SIRs\n# create breaks and associated color palette\nbreak_SIR <- c(0,exp(mf_get_breaks(log(district$SIR), nbreaks = 8, breaks = \"pretty\")))\ncol_pal <- c(\"#273871\", \"#3267AD\", \"#6496C8\", \"#9BBFDD\", \"#CDE3F0\", \"#FFCEBC\", \"#FF967E\", \"#F64D41\", \"#B90E36\")\n\nmf_map(x = district,\n var = \"SIR\",\n type = \"choro\",\n breaks = break_SIR, \n pal = col_pal, \n cex = 2,\n leg_title = \"SIR\")\nmf_layout(title = \"Standardized Incidence Ratio of W Fever\")\n\n\n\n\nThese maps illustrate the spatial heterogeneity of the cases. 
The incidence shows how the disease varies from one district to another, while the SIR highlights districts that have:\n\nhigher risk than average (SIR > 1) when standardized for population\nlower risk than average (SIR < 1) when standardized for population\naverage risk (SIR ~ 1) when standardized for population\n\n\n\n\n\n\n\nTo go further …\n\n\n\nIn this example, we standardized the distribution of cases for population count. This simple standardization assumes that the risk of contracting the disease is similar for each person. However, this assumption does not hold for all diseases and for all observed events since confounding effects can bias the interpretation (e.g., the numbers of childhood illness and death outcomes in a district are usually related to the age pyramid), and you should keep in mind that other standardizations can be performed based on variables known to have an effect but that you don’t want to analyze (e.g., sex ratio, occupations, age pyramid).\nIn addition, one can wonder what an SIR ~ 1 means, i.e., what is the threshold to decide whether the SIR is greater than, lower than or equivalent to 1. The significance of the SIR can be tested globally (to determine whether or not the incidence is homogeneously distributed) and locally in each district (to determine which districts have an SIR different from 1). We won’t perform these analyses in this tutorial, but you can look at the functions ?achisq.test() (from the DCluster package (Gómez-Rubio et al. 2015)) and ?probmap() (from the spdep package (R. Bivand et al. 2015)) to compute these statistics."
},
{
"objectID": "07-basic_statistics.html#cluster-analysis",
"href": "07-basic_statistics.html#cluster-analysis",
"title": "6 Basic statistics for spatial analysis",
"section": "6.2 Cluster analysis",
"text": "6.2 Cluster analysis\n\n6.2.1 General introduction\nWhy studying clusters in epidemiology? Cluster analysis help identifying unusual patterns that occurs during a given period of time. The underlying ultimate goal of such analysis is to explain the observation of such patterns. In epidemiology, we can distinguish two types of process that would explain heterogeneity in case distribution:\n\nThe 1st order effects are the spatial variations of cases distribution caused by underlying properties of environment or the population structure itself. In such process individual get infected independently from the rest of the population. Such process includes the infection through an environment at risk as, for example, air pollution, contaminated waters or soils and UV exposition. This effect assume that the observed pattern is caused by a difference in risk intensity.\nThe 2nd order effects describes process of spread, contagion and diffusion of diseases caused by interactions between individuals. This includes transmission of infectious disease by proximity, but also the transmission of non-infectious disease, for example, with the diffusion of social norms within networks. This effect assume that the observed pattern is caused by correlations or co-variations.\n\nNo statistical methods could distinguish between these competing processes since their outcome results in similar pattern of points. The cluster analysis help describing the magnitude and the location of pattern but in no way could answer the question of why such patterns occurs. It is therefore a step that help detecting cluster for description and surveillance purpose and rising hypothesis on the underlying process that will lead further investigations.\nKnowledge about the disease and its transmission process could orientate the choice of the methods of study. 
We present in this brief tutorial two methods of cluster detection: the Moran’s I test, which tests for spatial independence (likely related to 2nd order effects), and the scan statistics, which test for homogeneous distribution (likely related to 1st order effects). It is up to the epidemiologist to select the tools that best serve the studied question.\n\n\n\n\n\n\nStatistical tests and distributions\n\n\n\nIn statistics, problems are usually expressed by defining two hypotheses: the null hypothesis (H0), i.e., an a priori hypothesis about the studied phenomenon (e.g., the situation is random), and the alternative hypothesis (H1), e.g., the situation is not random. The main principle is to measure how likely the observed situation is to belong to the ensemble of situations that are possible under the H0 hypothesis.\nIn mathematics, a probability distribution is a mathematical expression that represents what we would expect due to random chance. The choice of the probability distribution depends on the type of data you use (continuous, count, binary). In general, three distributions are used when studying disease rates: the binomial, the Poisson and the Poisson-gamma mixture (also known as negative binomial) distributions.\nMany statistical tests assume by default that data are normally distributed. This implies that your variable is continuous and that all data can easily be represented by two parameters, the mean and the variance, i.e., each value has the same level of certainty. 
While many measures can be assessed under the normality assumption, this is usually not the case in epidemiology, with strictly positive rates and count values that 1) do not fit the normal distribution and 2) do not come with the same degree of certainty, since variances likely differ between districts due to different population sizes, i.e., some districts have very sparse data (with high variance) while others have adequate data (with lower variance).\n\n# dataset statistics\nm_cases <- mean(district$incidence)\nsd_cases <- sd(district$incidence)\n\nhist(district$incidence, probability = TRUE, ylim = c(0, 0.4), xlim = c(-5, 16), xlab = \"Incidence (per 100 000)\", ylab = \"Probability\", main = \"Histogram of observed incidence compared\\nto Normal and Poisson distributions\")\ncurve(dnorm(x, m_cases, sd_cases), col = \"blue\", lwd = 1, add = TRUE)\npoints(0:max(district$incidence), dpois(0:max(district$incidence), m_cases), type = 'b', pch = 20, col = \"red\", ylim = c(0, 0.6), lty = 2)\n\nlegend(\"topright\", legend = c(\"Normal distribution\", \"Poisson distribution\", \"Observed distribution\"), col = c(\"blue\", \"red\", \"black\"), pch = c(NA, 20, NA), lty = c(1, 2, 1))\n\n\n\n\nIn this tutorial, we use the Poisson distribution in our statistical tests.\n\n\n\n\n6.2.2 Test for spatial autocorrelation (Moran’s I test)\n\n6.2.2.1 The global Moran’s I test\nA popular test for spatial autocorrelation is Moran’s I test. This test tells us whether nearby units tend to exhibit similar incidences. It ranges from -1 to +1. 
A value of -1 denotes that units with low rates are located near other units with high rates, while a Moran’s I value of +1 indicates a concentration of spatial units exhibiting similar rates.\n\n\n\n\n\nMoran’s I test\n\n\n\nThe Moran’s I statistic is:\n\\[I = \\frac{N}{\\sum_{i=1}^N\\sum_{j=1}^Nw_{ij}}\\frac{\\sum_{i=1}^N\\sum_{j=1}^Nw_{ij}(Y_i-\\bar{Y})(Y_j - \\bar{Y})}{\\sum_{i=1}^N(Y_i-\\bar{Y})^2}\\] with:\n\n\\(N\\): the number of polygons,\n\\(w_{ij}\\): a matrix of spatial weights with zeroes on the diagonal (i.e., \\(w_{ii}=0\\)). For example, if polygons are neighbors, the weight takes the value \\(1\\); otherwise it takes the value \\(0\\).\n\\(Y_i\\): the variable of interest,\n\\(\\bar{Y}\\): the mean value of \\(Y\\).\n\nUnder the Moran’s test, the hypotheses are:\n\nH0: the distribution of cases is spatially independent, i.e., \\(I=0\\).\nH1: the distribution of cases is spatially autocorrelated, i.e., \\(I\\ne0\\).\n\n\n\nWe will compute the Moran’s I statistic using the spdep (R. Bivand et al. 2015) and DCluster (Gómez-Rubio et al. 2015) packages. The spdep package provides a collection of functions to analyze spatial correlations of polygons and works with sp objects. In this example, we use poly2nb() and nb2listw(). These functions respectively detect the neighboring polygons and assign weights corresponding to \\(1/\\#\\ of\\ neighbors\\). 
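To make the formula above concrete, here is a minimal toy computation of Moran’s I by hand (a hypothetical 3-unit chain where A–B and B–C are neighbors; the values and weights are invented for illustration and are not part of the tutorial’s Cambodia data):

```r
# Hypothetical 3-unit chain: A-B and B-C are neighbors
# Row-standardized weights: w_ij = 1 / (number of neighbors of unit i)
Y <- c(10, 12, 1)                  # variable of interest
W <- matrix(c(0,   1,   0,
              1/2, 0,   1/2,
              0,   1,   0),
            nrow = 3, byrow = TRUE)
Yc <- Y - mean(Y)                  # centered values
N <- length(Y)
I <- (N / sum(W)) * sum(W * outer(Yc, Yc)) / sum(Yc^2)
round(I, 2)                        # -0.41: unit C differs strongly from its neighbor B
```

This mirrors the formula term by term; poly2nb() and nb2listw() simply automate the construction of W from the polygons’ geometry.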
The DCluster package provides a set of functions for the detection of spatial clusters of disease using count data.\n\n#install.packages(\"spdep\")\n#install.packages(\"DCluster\")\nlibrary(spdep) # Functions for creating spatial weights, spatial analysis\nlibrary(DCluster) # Package with functions for spatial cluster analysis\n\nqueen_nb <- poly2nb(district) # Neighbors according to queen case\nq_listw <- nb2listw(queen_nb, style = 'W') # row-standardized weights\n\n# Moran's I test\nm_test <- moranI.test(cases ~ offset(log(expected)), \n data = district,\n model = 'poisson',\n R = 499,\n listw = q_listw,\n n = length(district$cases), # number of regions\n S0 = Szero(q_listw)) # Global sum of weights\nprint(m_test)\n\nMoran's I test of spatial autocorrelation \n\n Type of boots.: parametric \n Model used when sampling: Poisson \n Number of simulations: 499 \n Statistic: 0.1566449 \n p-value : 0.006 \n\nplot(m_test)\n\n\n\n\nThe Moran’s I statistic is here \\(I =\\) 0.16. When comparing its value to the H0 distribution (built from 499 simulations), the probability of observing such an I value under the null hypothesis, i.e., the distribution of cases is spatially independent, is \\(p_{value} =\\) 0.006. We therefore reject H0 with an error risk of \\(\\alpha = 5\\%\\). The distribution of cases is therefore autocorrelated across districts in Cambodia.\n\n\n6.2.2.2 The Local Moran’s I LISA test\nThe global Moran’s test provides a global statistic indicating whether autocorrelation occurs over the territory but does not tell us where these correlations occur, i.e., the locations of the clusters. To identify such clusters, we can decompose the Moran’s I statistic to extract local information on the level of correlation between each district and its neighbors. This is called the Local Moran’s I LISA statistic. 
Because the Local Moran’s I LISA statistic tests each district for autocorrelation independently, concerns are raised about multiple testing, which increases the Type I error (\\(\\alpha\\)) of the statistical tests. Local tests should therefore be used to explore and describe clusters once the global test has detected autocorrelation.\n\n\n\n\n\n\nStatistical test\n\n\n\nFor each district \\(i\\), the Local Moran’s I statistic is:\n\\[I_i = \\frac{(Y_i-\\bar{Y})}{\\sum_{i=1}^N(Y_i-\\bar{Y})^2}\\sum_{j=1}^Nw_{ij}(Y_j - \\bar{Y}) \\text{ with } I = \\sum_{i=1}^NI_i/N\\]\n\n\nThe localmoran() function from the package spdep treats the variable of interest as if it were normally distributed. In some cases, this assumption could be reasonable for incidence rates, especially when the areal units of analysis have sufficiently large population counts, suggesting that the values have similar levels of variance. Unfortunately, the local Moran’s test has not been implemented for the Poisson distribution (population not large enough in some districts) in the spdep package. However, Bivand et al. (R. S. Bivand et al. 
2008) provided some code to manually perform the analysis using the Poisson distribution, and this code was further implemented in the course “Spatial Epidemiology”.\n\n# Step 1 - Create the standardized deviation of observed from expected\nsd_lm <- (district$cases - district$expected) / sqrt(district$expected)\n\n# Step 2 - Create a spatially lagged version of standardized deviation of neighbors\nwsd_lm <- lag.listw(q_listw, sd_lm)\n\n# Step 3 - the local Moran's I is the product of step 1 and step 2\ndistrict$I_lm <- sd_lm * wsd_lm\n\n# Step 4 - setup parameters for simulation of the null distribution\n\n# Specify number of simulations to run\nnsim <- 499\n\n# Specify dimensions of result based on number of regions\nN <- length(district$expected)\n\n# Create a matrix of zeros to hold results, with a row for each district, and a column for each simulation\nsims <- matrix(0, ncol = nsim, nrow = N)\n\n# Step 5 - Start a for-loop to iterate over simulation columns\nfor(i in 1:nsim){\n y <- rpois(N, lambda = district$expected) # generate a random event count, given expected\n sd_lmi <- (y - district$expected) / sqrt(district$expected) # standardized local measure\n wsd_lmi <- lag.listw(q_listw, sd_lmi) # standardized spatially lagged measure\n sims[, i] <- sd_lmi * wsd_lmi # this is the I(i) statistic under this iteration of null\n}\n\n# Step 6 - For each district, test where the observed value ranks with respect to the null simulations\nxrank <- apply(cbind(district$I_lm, sims), 1, function(x) rank(x)[1])\n\n# Step 7 - Calculate the difference between observed rank and total possible (nsim)\ndiff <- nsim - xrank\ndiff <- ifelse(diff > 0, diff, 0)\n\n# Step 8 - Assuming a uniform distribution of ranks, calculate p-value for observed\n# given the null distribution generated from simulations\ndistrict$pval_lm <- punif((diff + 1) / (nsim + 1))\n\nBriefly, the process consists of 1) computing the I statistic for the observed data, 2) estimating the null distribution of the I 
statistic by performing random sampling from a Poisson distribution, and 3) comparing the observed I statistic with the null distribution to determine the probability of observing such a value if the numbers of cases were spatially independent. For each district, we obtain a p-value based on the comparison of the observed value and the null distribution.\nA conventional way of plotting these results is to classify the districts into 5 classes based on the local Moran’s I output. The classification of clusters that are significantly autocorrelated with their neighbors is based on a comparison of the scaled incidence in the district with the scaled weighted average incidence of its neighboring districts (computed with lag.listw()):\n\nDistricts that have higher-than-average rates in both index regions and their neighbors and show statistically significant positive values for the local \\(I_i\\) statistic are defined as High-High (hotspot of the disease).\nDistricts that have lower-than-average rates in both index regions and their neighbors and show statistically significant positive values for the local \\(I_i\\) statistic are defined as Low-Low (cold spot of the disease).\nDistricts that have higher-than-average rates in the index regions and lower-than-average rates in their neighbors, and show statistically significant negative values for the local \\(I_i\\) statistic, are defined as High-Low (outlier with high incidence in an area with low incidence).\nDistricts that have lower-than-average rates in the index regions and higher-than-average rates in their neighbors, and show statistically significant negative values for the local \\(I_i\\) statistic, are defined as Low-High (outlier of low incidence in an area with high incidence).\nDistricts with non-significant values for the \\(I_i\\) statistic are defined as Non-significant.\n\n\n# create lagged local raw_rate - in other words the average of the queen neighbors' values\n# values are scaled (centered 
and reduced) to be compared to the average\ndistrict$lag_std <- scale(lag.listw(q_listw, var = district$incidence))\ndistrict$incidence_std <- scale(district$incidence)\n\n# extract pvalues\n# district$lm_pv <- lm_test[,5]\n\n# Classify local moran's outputs\ndistrict$lm_class <- NA\ndistrict$lm_class[district$incidence_std >=0 & district$lag_std >=0] <- 'High-High'\ndistrict$lm_class[district$incidence_std <=0 & district$lag_std <=0] <- 'Low-Low'\ndistrict$lm_class[district$incidence_std <=0 & district$lag_std >=0] <- 'Low-High'\ndistrict$lm_class[district$incidence_std >=0 & district$lag_std <=0] <- 'High-Low'\ndistrict$lm_class[district$pval_lm >= 0.05] <- 'Non-significant'\n\ndistrict$lm_class <- factor(district$lm_class, levels=c(\"High-High\", \"Low-Low\", \"High-Low\", \"Low-High\", \"Non-significant\") )\n\n# create map\nmf_map(x = district,\n var = \"lm_class\",\n type = \"typo\",\n cex = 2,\n col_na = \"white\",\n #val_order = c(\"High-High\", \"Low-Low\", \"High-Low\", \"Low-High\", \"Non-significant\") ,\n pal = c(\"#6D0026\" , \"blue\", \"white\") , # \"#FF755F\",\"#7FABD3\" ,\n leg_title = \"Clusters\")\n\nmf_layout(title = \"Cluster using Local Moran's I statistic\")\n\n\n\n\n\n\n\n6.2.3 Spatial scan statistics\nWhile Moran’s indices focus on testing for autocorrelation between neighboring polygons (under the null assumption of spatial independence), the spatial scan statistic aims to identify an abnormally high risk in a given region compared to the risk outside this region (under the null assumption of homogeneous distribution). 
The conception of a cluster is therefore different between the two methods.\nThe function kulldorff from the package SpatialEpi (Kim and Wakefield 2010) is a simple tool to implement spatial-only scan statistics.\n\n\n\n\n\n\nKulldorff test\n\n\n\nUnder the Kulldorff test, the hypotheses are:\n\nH0: the risk is constant over the area, i.e., there is spatial homogeneity of the incidence.\nH1: a particular window has a higher incidence than the rest of the area, i.e., there is spatial heterogeneity of incidence.\n\n\n\nBriefly, the Kulldorff scan statistic scans the area for clusters using several steps:\n\nIt creates a circular window of observation by defining a single location and an associated radius varying from 0 to a large number that depends on the population distribution (the largest radius could include 50% of the population).\nIt aggregates the count of events and the population at risk (or an expected count of events) inside and outside the window of observation.\nFinally, it computes the likelihood ratio and tests whether the risk is equal inside versus outside the window (H0) or greater inside the observed window (H1). The H0 distribution is estimated by simulating the distribution of counts under the null hypothesis (homogeneous risk).\nThese 3 steps are repeated for each location and each possible window radius.\n\nSince we test the significance of a large number of observation windows, one can raise concerns about multiple testing and Type I error. This approach, however, assumes that we are not interested in a set of significant clusters but only in a most likely cluster. This a priori restriction eliminates concerns about multiple comparisons since the test is reduced to the statistical significance of one single most likely cluster.\nBecause we tested all possible locations and window radii, we can also choose to look at secondary clusters. 
In this case, you should keep in mind that increasing the number of secondary clusters you select increases the risk of Type I error.\n\n#install.packages(\"SpatialEpi\")\nlibrary(\"SpatialEpi\")\n\nThe kulldorff() function does not accept R spatial objects. It instead uses a matrix of xy coordinates that represents the centroids of the districts. A given district is included in the observed circular window if its centroid falls into the circle.\n\ndistrict_xy <- st_centroid(district) %>% \n st_coordinates()\n\nhead(district_xy)\n\n X Y\n1 330823.3 1464560\n2 749758.3 1541787\n3 468384.0 1277007\n4 494548.2 1215261\n5 459644.2 1194615\n6 360528.3 1516339\n\n\nWe can then call the kulldorff function (you are strongly encouraged to call ?kulldorff to properly call the function). The alpha.level threshold filters the secondary clusters that will be retained. The most likely cluster will be saved whatever its significance.\n\nkd_Wfever <- kulldorff(district_xy, \n cases = district$cases,\n population = district$T_POP,\n expected.cases = district$expected,\n pop.upper.bound = 0.5, # include maximum 50% of the population in a window\n n.simulations = 499,\n alpha.level = 0.2)\n\n\n\n\nThe function plots the histogram of the distribution of the log-likelihood ratio simulated under the null hypothesis, estimated based on Monte Carlo simulations. The observed value of the most significant cluster identified from all possible scans is compared to this distribution to determine significance. All outputs are saved into an R object, here called kd_Wfever. 
Unfortunately, the package does not provide any summary or visualization of the results, but we can explore the output object.\n\nnames(kd_Wfever)\n\n[1] \"most.likely.cluster\" \"secondary.clusters\" \"type\" \n[4] \"log.lkhd\" \"simulated.log.lkhd\" \n\n\nFirst, we can focus on the most likely cluster and explore its characteristics.\n\n# We can see which districts (row number) belong to this cluster\nkd_Wfever$most.likely.cluster$location.IDs.included\n\n [1] 48 93 66 180 133 29 194 118 50 144 31 141 3 117 22 43 142\n\n# standardized incidence ratio\nkd_Wfever$most.likely.cluster$SMR\n\n[1] 2.303106\n\n# number of observed and expected cases in this cluster\nkd_Wfever$most.likely.cluster$number.of.cases\n\n[1] 122\n\nkd_Wfever$most.likely.cluster$expected.cases\n\n[1] 52.97195\n\n\n17 districts belong to the cluster, and its number of cases is 2.3 times higher than the expected number of cases.\nSimilarly, we could study the secondary clusters. Results are saved in a list.\n\n# We can see how many secondary clusters were identified\nlength(kd_Wfever$secondary.clusters)\n\n[1] 1\n\n# retrieve data for all secondary clusters into a table\ndf_secondary_clusters <- data.frame(SMR = sapply(kd_Wfever$secondary.clusters, '[[', 5), \n number.of.cases = sapply(kd_Wfever$secondary.clusters, '[[', 3),\n expected.cases = sapply(kd_Wfever$secondary.clusters, '[[', 4),\n p.value = sapply(kd_Wfever$secondary.clusters, '[[', 8))\n\nprint(df_secondary_clusters)\n\n SMR number.of.cases expected.cases p.value\n1 3.767698 16 4.246625 0.012\n\n\nWe only have one secondary cluster composed of one district.\n\n# create empty column to store cluster information\ndistrict$k_cluster <- NA\n\n# save cluster information from kulldorff outputs\ndistrict$k_cluster[kd_Wfever$most.likely.cluster$location.IDs.included] <- 'Most likely cluster'\n\nfor(i in 1:length(kd_Wfever$secondary.clusters)){\ndistrict$k_cluster[kd_Wfever$secondary.clusters[[i]]$location.IDs.included] <- paste(\n 
'Secondary cluster', i, sep = '')\n}\n\n#district$k_cluster[is.na(district$k_cluster)] <- \"No cluster\"\n\n\n# create map\nmf_map(x = district,\n var = \"k_cluster\",\n type = \"typo\",\n cex = 2,\n col_na = \"white\",\n pal = mf_get_pal(palette = \"Reds\", n = 3)[1:2],\n leg_title = \"Clusters\")\n\nmf_layout(title = \"Cluster using Kulldorff scan statistic\")\n\n\n\n\n\n\n\n\n\n\nTo go further …\n\n\n\nIn this example, the expected number of cases was defined using the population count, but note that standardization over other variables such as age could also be implemented with the strata parameter in the kulldorff() function.\nIn addition, this cluster analysis was performed solely using the spatial scan, but you should keep in mind that this method of cluster detection can be implemented for spatio-temporal data as well, where a cluster is defined as an abnormal number of cases in a delimited spatial area during a given period of time. The windows of observation are then defined by a center, a radius and a time period. You should look at the scan_eb_poisson() function in the package scanstatistics (Allévius 2018) for this analysis.\n\n\n\n\n\n\nAllévius, Benjamin. 2018. “Scanstatistics: Space-Time Anomaly Detection Using Scan Statistics.” Journal of Open Source Software 3 (25): 515.\n\n\nBivand, Roger S, Edzer J Pebesma, Virgilio Gómez-Rubio, and Edzer Jan Pebesma. 2008. Applied Spatial Data Analysis with r. Vol. 747248717. Springer.\n\n\nBivand, Roger, Micah Altman, Luc Anselin, Renato Assunção, Olaf Berke, Andrew Bernat, and Guillaume Blanchet. 2015. “Package ‘Spdep’.” The Comprehensive R Archive Network.\n\n\nGómez-Rubio, Virgilio, Juan Ferrándiz-Ferragud, Antonio López-Quı́lez, et al. 2015. “Package ‘DCluster’.”\n\n\nKim, Albert Y, and Jon Wakefield. 2010. “R Data and Methods for Spatial Epidemiology: The SpatialEpi Package.” Dept of Statistics, University of Washington."
},
{
"objectID": "references.html",
"href": "references.html",
"title": "References",
"section": "",
"text": "Agafonkin, Vladimir. 2015. “Leaflet Javascript Library.”\n\n\nAllévius, Benjamin. 2018. “Scanstatistics: Space-Time Anomaly\nDetection Using Scan Statistics.” Journal of Open Source\nSoftware 3 (25): 515.\n\n\nAppelhans, Tim, Florian Detsch, Christoph Reudenbach, and Stefan\nWoellauer. 2022. “Mapview: Interactive Viewing of Spatial Data in\nr.” https://CRAN.R-project.org/package=mapview.\n\n\nAppelhans, Tim, Kenton Russell, and Lorenzo Busetto. 2020.\n“Mapedit: Interactive Editing of Spatial Data in r.” https://CRAN.R-project.org/package=mapedit.\n\n\nBivand, Roger S, Edzer J Pebesma, Virgilio Gómez-Rubio, and Edzer Jan\nPebesma. 2008. Applied Spatial Data Analysis with r. Vol.\n747248717. Springer.\n\n\nBivand, Roger, Micah Altman, Luc Anselin, Renato Assunção, Olaf Berke,\nAndrew Bernat, and Guillaume Blanchet. 2015. “Package\n‘Spdep’.” The Comprehensive R Archive\nNetwork.\n\n\nBivand, Roger, Tim Keitt, and Barry Rowlingson. 2022. “Rgdal:\nBindings for the ’Geospatial’ Data Abstraction Library.” https://CRAN.R-project.org/package=rgdal.\n\n\nBivand, Roger, and Colin Rundel. 2021. “Rgeos: Interface to\nGeometry Engine - Open Source (’GEOS’).” https://CRAN.R-project.org/package=rgeos.\n\n\nBrunet, Roger, Robert Ferras, and Hervé Théry. 1993. Les Mots de La\ngéographie: Dictionnaire Critique. 03) 911 BRU.\n\n\nCambon, Jesse, Diego Hernangómez, Christopher Belanger, and Daniel\nPossenriede. 2021. “Tidygeocoder: An r Package for\nGeocoding” 6: 3544. https://doi.org/10.21105/joss.03544.\n\n\nCauvin, Colette, Francisco Escobar, and Aziz Serradj. 2013. Thematic\nCartography, Cartography and the Impact of the Quantitative\nRevolution. Vol. 2. John Wiley & Sons.\n\n\nCheng, Joe, Bhaskar Karambelkar, and Yihui Xie. 2022. “Leaflet:\nCreate Interactive Web Maps with the JavaScript ’Leaflet’\nLibrary.” https://CRAN.R-project.org/package=leaflet.\n\n\nDicko, Ahmadou. 2021. R Client for the geoBoundaries API, Providing\nCountry Political Administrative Boundaries. 
https://dickoa.gitlab.io/rgeoboundaries/index.html.\n\n\nDorling, Daniel. 1996. Area Cartograms: Their Use and Creation,\nConcepts and Techniques in Modern Geography. Vol. 59. CATMOG:\nConcepts and Techniques in Modern Geography. Institute of British\nGeographers.\n\n\nDougenik, James A, Nicholas R Chrisman, and Duane R Niemeyer. 1985.\n“An Algorithm to Construct Continuous Area Cartograms.”\nThe Professional Geographer 37 (1): 75–81.\n\n\nDunnington, Dewey. 2021. “Ggspatial: Spatial Data Framework for\nGgplot2.” https://CRAN.R-project.org/package=ggspatial.\n\n\nGDAL/OGR contributors. n.d. GDAL/OGR Geospatial Data\nAbstraction Software Library. Open Source Geospatial Foundation. https://gdal.org.\n\n\nGilardi, Andrea, and Robin Lovelace. 2021. “Osmextract: Download\nand Import Open Street Map Data Extracts.” https://CRAN.R-project.org/package=osmextract.\n\n\nGiraud, Timothée. 2021a. “Linemap: Line Maps.” https://CRAN.R-project.org/package=linemap.\n\n\n———. 2021b. “Maptiles: Download and Display Map Tiles.” https://CRAN.R-project.org/package=maptiles.\n\n\n———. 2022a. “Mapsf: Thematic Cartography.” https://CRAN.R-project.org/package=mapsf.\n\n\n———. 2022b. “Tanaka: Design Shaded Contour Lines (or Tanaka)\nMaps.” https://CRAN.R-project.org/package=tanaka.\n\n\nGiraud, Timothée, and Nicolas Lambert. 2016. “Cartography: Create\nand Integrate Maps in Your r Workflow” 1. https://doi.org/10.21105/joss.00054.\n\n\nGombin, Joel, and Paul-Antoine Chevalier. 2022. “banR: R Client\nfor the BAN API.”\n\n\nGómez-Rubio, Virgilio, Juan Ferrándiz-Ferragud, Antonio López-Quı́lez, et\nal. 2015. “Package ‘DCluster’.”\n\n\nGuevarra, Ernest. 2021. Gadmr: An r Interface to the GADM Map\nRepository. https://github.com/SpatialWorks/gadmr.\n\n\nHijmans, Robert J. 2022a. “Raster: Geographic Data Analysis and\nModeling.” https://CRAN.R-project.org/package=raster.\n\n\n———. 2022b. “Terra: Spatial Data Analysis.” https://CRAN.R-project.org/package=terra.\n\n\nJeworutzki, Sebastian. 2020. 
“Cartogram: Create Cartograms with\nr.” https://CRAN.R-project.org/package=cartogram.\n\n\nKim, Albert Y, and Jon Wakefield. 2010. “R Data and Methods for\nSpatial Epidemiology: The SpatialEpi Package.” Dept of\nStatistics, University of Washington.\n\n\nLambert, Nicolas. 2015. “Les Anamorphoses Cartographiques.”\nBlog. Carnet Néocartographique. https://neocarto.hypotheses.org/366.\n\n\nLi, Xingong. 2009. “Map Algebra and Beyond : 1. Map Algebra for\nScalar Fields.” https://slideplayer.com/slide/5822638/.\n\n\nMadelin, Malika. 2021. “Analyse d’images Raster (Et\nTélédétection).” https://mmadelin.github.io/sigr2021/SIGR2021_raster_MM.html.\n\n\nMennis, Jeremy. 2015. “Fundamentals of GIS : Raster\nOperations.” https://cupdf.com/document/gus-0262-fundamentals-of-gis-lecture-presentation-7-raster-operations-jeremy.html.\n\n\nNowosad, Jakub. 2021. “Image Processing and All Things\nRaster.” https://nowosad.github.io/SIGR2021/workshop2/workshop2.html.\n\n\nOlson, Judy M. 1976. “Noncontiguous Area Cartograms.”\nThe Professional Geographer 28 (4): 371–80.\n\n\nPadgham, Mark, Bob Rudis, Robin Lovelace, and Maëlle Salmon. 2017a.\n“Osmdata” 2. https://doi.org/10.21105/joss.00305.\n\n\n———. 2017b. “Osmdata.” The Journal of Open Source\nSoftware 2 (14). https://doi.org/10.21105/joss.00305.\n\n\nPaull, John, and Benjamin Hennig. 2016. “Atlas of Organics: Four\nMaps of the World of Organic Agriculture.” Journal of\nOrganics 3 (1): 25–32.\n\n\nPebesma, Edzer. 2018b. “Simple Features for r:\nStandardized Support for Spatial Vector Data” 10. https://doi.org/10.32614/RJ-2018-009.\n\n\n———. 2018a. “Simple Features for R: Standardized Support for\nSpatial Vector Data.” The R Journal 10 (1): 439. https://doi.org/10.32614/rj-2018-009.\n\n\n———. 2021. “Stars: Spatiotemporal Arrays, Raster and Vector Data\nCubes.” https://CRAN.R-project.org/package=stars.\n\n\nPebesma, Edzer J., and Roger S. Bivand. 2005. “Classes and Methods\nfor Spatial Data in r” 5. 
https://CRAN.R-project.org/doc/Rnews/.\n\n\nPROJ contributors. 2021. PROJ Coordinate Transformation\nSoftware Library. Open Source Geospatial Foundation. https://proj.org/.\n\n\nRacine, Etienne B. 2016. “The Visual Raster Cheat Sheet.”\nhttps://rpubs.com/etiennebr/visualraster.\n\n\nRowlingson, Barry. 2019. Geonames: Interface to the \"Geonames\"\nSpatial Query Web Service. https://CRAN.R-project.org/package=geonames.\n\n\nSouth, Andy. 2017. “Rnaturalearth: World Map Data from Natural\nEarth.” https://CRAN.R-project.org/package=rnaturalearth.\n\n\nTanaka, Kitiro. 1950. “The Relief Contour Method of Representing\nTopography on Maps.” Geographical Review 40 (3): 444. https://doi.org/10.2307/211219.\n\n\nTennekes, Martijn. 2018. “Tmap: Thematic\nMaps in r” 84. https://doi.org/10.18637/jss.v084.i06.\n\n\nTomlin, C. Dana. 1990. Geographic Information Systems and\nCartographic Modeling. Prentice Hall.\n\n\nWeidmann, Nils B., Guy Schvitz, and Luc Girardin. 2021. Cshapes: The\nCShapes 2.0 Dataset and Utilities. https://CRAN.R-project.org/package=cshapes.\n\n\nWickham, Hadley. 2016. “Ggplot2: Elegant Graphics for Data\nAnalysis.” https://ggplot2.tidyverse.org."
},
{
"objectID": "07-basic_statistics.html#conclusion",
"href": "07-basic_statistics.html#conclusion",
"title": "6 Basic statistics for spatial analysis",
"section": "6.3 Conclusion",
"text": "6.3 Conclusion\n\npar(mfrow = c(1, 2))\n\n# create map\nmf_map(x = district,\n var = \"lm_class\",\n type = \"typo\",\n cex = 2,\n col_na = \"white\",\n pal = c(\"#6D0026\" , \"blue\", \"white\") , # \"#FF755F\",\"#7FABD3\" ,\n leg_title = \"Clusters\")\n\nmf_layout(title = \"Clusters using the Local Moran's I statistic\")\n\n# create map\nmf_map(x = district,\n var = \"k_cluster\",\n type = \"typo\",\n cex = 2,\n col_na = \"white\",\n pal = mf_get_pal(palette = \"Reds\", n = 3)[1:2],\n leg_title = \"Clusters\")\n\nmf_layout(title = \"Clusters using the Kulldorff scan statistic\")\n\n\n\n\nBoth methods identified significant clusters and, after standardization for population counts, both detected a cluster around Phnom Penh. However, the identified clusters do not rely on the same assumptions. While Moran's test asks whether there is any autocorrelation between neighboring districts (i.e., second-order effects of infection), the Kulldorff scan statistic asks whether there is any heterogeneity in the case distribution. Neither of these tests can identify the infection process (first- or second-order) of the studied disease, and prior knowledge of the disease will help in selecting the most appropriate test.\n\n\n\n\n\n\nTip\n\n\n\nIn this example, Cambodia is treated as an island, i.e. there are no data outside of its borders. In reality, some clusters can occur across a country's borders, and you should be aware that such clusters will likely not be detected by these analyses. This border effect is still a hot topic in spatial studies and there is no conventional way to deal with it. The literature offers some suggestions on how to handle it, such as assigning weights or extrapolating data.\n\n\n\n\n\n\nAllévius, Benjamin. 2018. “Scanstatistics: Space-Time Anomaly Detection Using Scan Statistics.” Journal of Open Source Software 3 (25): 515.\n\n\nBivand, Roger S, Edzer J Pebesma, Virgilio Gómez-Rubio, and Edzer Jan Pebesma. 2008. 
Applied Spatial Data Analysis with r. Vol. 747248717. Springer.\n\n\nBivand, Roger, Micah Altman, Luc Anselin, Renato Assunção, Olaf Berke, Andrew Bernat, and Guillaume Blanchet. 2015. “Package ‘Spdep’.” The Comprehensive R Archive Network.\n\n\nGómez-Rubio, Virgilio, Juan Ferrándiz-Ferragud, Antonio López-Quı́lez, et al. 2015. “Package ‘DCluster’.”\n\n\nKim, Albert Y, and Jon Wakefield. 2010. “R Data and Methods for Spatial Epidemiology: The SpatialEpi Package.” Dept of Statistics, University of Washington."