Skip to content
Snippets Groups Projects
02-data_acquisition.qmd 14.5 KiB
Newer Older
lucas.longour_ird.fr's avatar
lucas.longour_ird.fr committed
---
bibliography: references.bib
---
lucas.longour_ird.fr's avatar
lucas.longour_ird.fr committed
# Data Acquisition

All the data needed to complete this training can be retrieved at [this link](https://e1.pcloud.link/publink/show?code=XZQ60YZxwN09vWJ8bmuSivWezhuG8u94lCV).

lucas.longour_ird.fr's avatar
lucas.longour_ird.fr committed
## Online databases

### On global scale

Since the appearance of the `sf` package, which has greatly contributed to the popularization of spatial data manipulation with R, many packages for making geographic data (geometries and/or attributes) available have been developed. Most of them are API packages that allow to query data made available on the Web, directly with R. This chapter presents a non-exhaustive list of them.

-   <img src="img/geom.svg" alt="Geometries" width="20"/> `rnaturalearth` [@R-rnaturalearth]: retrieves [Natural Earth map data](https://www.naturalearthdata.com/).\
-   <img src="img/geom.svg" alt="Geometries" width="20"/> `gadmr` [@R-gadmr]: retrieves data from the [GADM](https://gadm.org/index.html) (national and sub-national administrative divisions of all countries in the world).\
-   <img src="img/geom.svg" alt="Geometries" width="20"/> `rgeoboundaries` [@R-rgeoboundaries] : R client for the [geoBoundaries API](https://www.geoboundaries.org/index.html), providing political administrative boundaries of countries.
-   <img src="img/geom.svg" alt="Geometries" width="20"/> `cshapes` [@R-cshapes]: makes available national boundaries, from 1886 to present.\
-   <img src="img/geom.svg" alt="Geometries" width="20"/> `osmextract` [@R-osmextract]: allows importing [OpenStreetMap data](https://www.openstreetmap.org/).
-   <img src="img/geom.svg" alt="Geometries" width="20"/> `osmdata` [@osmdata2017]: to download and use OpenStreetMap data.\
-   <img src="img/geom.svg" alt="Geometries" width="20"/> `maptiles` [@R-maptiles] : This package downloads, composes and displays tiles from a large number of providers (*OpenStreetMap*, *Stamen*, *Esri*, *CARTO* or *Thunderforest*).\
-   <img src="img/table.svg" alt="attribute data" width="20"/> `geonames` [@R-geonames] : allows you to query the [geonames DB](http://www.geonames.org/), which provides locations in particular.
-   <img src="img/table.svg" alt="attribute data" width="20"/> `wbstats` [@wbstats2020] and `WDI` [@R-WDI]: provide access to World Bank data and statistics.\
-   <img src="img/geom.svg" alt="Geometries" width="20"/> `sen2r` [@R-sen2r]: allows automatic download and preprocessing of Sentinel-2 satellite data.\
-   <img src="img/geom.svg" alt="Geometries" width="20"/> `MODIStsp` [@MODIStsp2016]: find, download and process *MODIS* images.\
-   <img src="img/geom.svg" alt="Geometries" width="20"/> `geodata` [@R-geodata]: provides access to [data](https://github.com/rspatial/geodata) on climate, elevation, soil, species occurrence and administrative boundaries.\
-   <img src="img/geom.svg" alt="Geometries" width="20"/> `elevatr` [@R-elevatr]: provides access to elevation data made available by [*Amazon Web Services Terrain Tiles*](https://registry.opendata.aws/terrain-tiles/), the [*Open Topography Global Datasets API*](https://opentopography.org/developers/) and the [*USGS Elevation Point Query Service*](https://nationalmap.gov/epqs/).
-   <img src="img/geom.svg" alt="Geometries" width="20"/> `rgee` [@R-rgee]: allows use of the [Google Earth Engine](https://www.google.com/intl/fr_in/earth/education/tools/google-earth-engine/) API, a public data catalog and computational infrastructure for satellite images.\
-   <img src="img/geom.svg" alt="Geometries" width="20"/> `nasapower` [@nasapower2018]: *NASA* client API (global energy resource forecasting, meteorology, surface solar energy, and climatology).
-   <img src="img/geom.svg" alt="Geometries" width="20"/> `geoknife` [@geoknife2015]: allows processing (online) of large raster data from the *Geo Data Portal* of the *U.S. Geological Survey*.\
-   <img src="img/table.svg" alt="attribute data" width="20"/> `wopr` [@R-wopr]: provides API access to the [*WorldPop Open Population Repository*](https://wopr.worldpop.org/) database.\
-   <img src="img/geom.svg" alt="Geometries" width="20"/> <img src="img/table.svg" alt="attribute data" width="20"/> `rdhs` [@rdhs2019] : [Demographic and Health Survey (DHS) client API and data managements](https://dhsprogram.com/).

Several data sets are referenced by the ESoR (Environnement, Societies and Health Risk) research group [here](https://www.netvibes.com/geohealth?page=geohealth#Online_databases)

lucas.longour_ird.fr's avatar
lucas.longour_ird.fr committed
## OpenStreetMap

::: {style="float: right"}
<img src="img/Openstreetmap_logo.svg" width="150px" padding="1em"/>
:::

Lea's avatar
Lea committed
[OpenStreetMap (OSM)](https://www.openstreetmap.org){target="_blank"} is a participatory mapping project that aims to built a free geographic database on a global scale. OpenStreetMap lets you view, edit and use geographic data around the world.
lucas.longour_ird.fr's avatar
lucas.longour_ird.fr committed

**Terms of use**

> OpenStreetMap is open data : you are free to use it for ant purpose as long as you credit OpenStreetMap and its contributers. If you modify or rely data in any way, you may distribute the result only under the same license. (...)

**Contributors**

> (...) Our contributors incloude enthusiastic mapmakers, GIS professional, engineers running OSM servers, humanitarians mapping disaster-stricken areas and many mmore.(...)

### Display and interactive map

The two main packages that allow to display as interactive map based on OSM are `leaflet` [@leaflet] and `mapview` [@mapview].

#### `leaflet`

<img src="img/logo_leaflet.png" align="right" width="200"/> `leaflet` uses the javascript library Leaflet [@JS-Leaflet] to create interactive maps.

```{r leaflet, eval=TRUE, cache = FALSE, class.output="code-out", warning=FALSE, message=FALSE}
library(sf)
library(leaflet)

district <- st_read("data_cambodia/cambodia.gpkg", layer = "district", quiet = TRUE)
hospital <- st_read("data_cambodia/cambodia.gpkg", layer = "hospital", quiet = TRUE)


banan <- district[district$ADM2_PCODE == "KH0201", ]     #Select one district (Banan district: KH0201)
health_banan <- hospital[hospital$DCODE == "201", ]      #Select Health centers in Banan

banan <- st_transform(banan, 4326)                       #Transform coordinate system to WGS84
health_banan <- st_transform(health_banan, 4326)

banan_map <- leaflet(banan) %>%                          #Create interactive map
  addTiles() %>%
  addPolygons() %>%
  addMarkers(data = health_banan)
banan_map
```

::: {.callout-note appearance="minimal" icon="false"}
lucas.longour_ird.fr's avatar
lucas.longour_ird.fr committed
Website of `leaflet`\
lucas.longour_ird.fr's avatar
lucas.longour_ird.fr committed
[Leaflet for R](https://rstudio.github.io/leaflet/)
:::

#### `mapview`

<img src="img/logo_mapview.gif" align="right" width="150"/> `mapview` relies on `leaflet` to create interactive maps, its use is easier and its documentation is a bit dense.

```{r mapview, cache=FALSE, class.output="code-out", warning=FALSE, message=FALSE}
library(mapview)
mapview(banan) + mapview(health_banan)
```

::: {.callout-note appearance="minimal" icon="false"}
lucas.longour_ird.fr's avatar
lucas.longour_ird.fr committed
Website of `mapview`\
lucas.longour_ird.fr's avatar
lucas.longour_ird.fr committed
[`mapview`](https://r-spatial.github.io/mapview/)
:::

### Import basemaps

The package `maptiles` [@maptiles] allows downlaoding and displaying raster basemaps.\
The function `get_tiles()` allow you to download OSM background maps and the function `plot_tiles()` allows to display them.\
Renders are better if the input data used the same coordinate system as the tiles ([EPSG:3857](https://github.com/riatelab/maptiles#projection)).

```{r display_point, fig.width=6, fig.height=6, eval=TRUE, class.output="code-out", warning=FALSE, message=FALSE}
library(sf)
library(maptiles)
district <- st_read("data_cambodia/cambodia.gpkg", layer = "district", quiet = TRUE)
district <- st_transform(district, 3857)
osm_tiles <- get_tiles(x = district, zoom = 10, crop = TRUE)
plot_tiles(osm_tiles)
plot(st_geometry(district), border = "grey20", lwd = .7, add = TRUE)
mtext(side = 1, line = -2, text = get_credit("OpenStreetMap"), col="tomato")
lucas.longour_ird.fr's avatar
lucas.longour_ird.fr committed
```

### Import OSM data

#### `osmdata`

<img src="img/logo_osmdata.png" align="right" height="140" width="122/"/> The package `osmdata` [@osmdata] allows extracting vector data from OSM using the [Overpass turbo](https://wiki.openstreetmap.org/wiki/Overpass_turbo) API.

```{r osmdata, eval=TRUE, class.output="code-out", warning=FALSE, message=FALSE}
library(sf)
library(osmdata)
library(sf)

country <- st_read("data_cambodia/cambodia.gpkg", layer = "country", quiet = TRUE)
ext <- opq(bbox = st_bbox(st_transform(country, 4326)))                    #Define the bounding box
query <- add_osm_feature(opq = ext, key = 'amenity', value = "hospital")   #Health Center Extraction
hospital <- osmdata_sf(query)
hospital <- unique_osmdata(hospital)                                       #Result reduction (points composing polygon are detected)
```

lucas.longour_ird.fr's avatar
lucas.longour_ird.fr committed
The result contains a point layer and a polygon layer. The polygon layer contains polygons that represent hospitals. To obtain a coherent point layer we can use the centroids of the polygons.
lucas.longour_ird.fr's avatar
lucas.longour_ird.fr committed
```{r echo=FALSE, eval=TRUE, warning=FALSE, error=FALSE}
sf_use_s2(FALSE)
```

lucas.longour_ird.fr's avatar
lucas.longour_ird.fr committed
```{r osmdata2, class.output="code-out", warning=FALSE, message=FALSE}
hospital_point <- hospital$osm_points
hospital_poly <- hospital$osm_polygons                                                             #Extracting centroids of polygons
hospital_poly_centroid <- st_centroid(hospital_poly)

cambodia_point <- intersect(names(hospital_point), names(hospital_poly_centroid))                  #Identify fields in Cambodia boundary
hospitals <- rbind(hospital_point[, cambodia_point], hospital_poly_centroid[, cambodia_point])     #Gather the 2 objects
```

**Result display**

```{r osmdatadisplay, cache=FALSE, class.output="code-out", warning=FALSE, message=FALSE}
library(mapview)
mapview(country) + mapview(hospitals)
```

::: {.callout-note appearance="minimal" icon="false"}
lucas.longour_ird.fr's avatar
lucas.longour_ird.fr committed
Website of `osmdata`\
lucas.longour_ird.fr's avatar
lucas.longour_ird.fr committed
[`osmdata`](https://docs.ropensci.org/osmdata/)
:::

#### `osmextract`

<img src="img/logo_osmextract.svg" align="right" width="120"/> The package `osmextract` [@osmextract] allows to extract data from an OSM database directly. This package make it possible to work on very large volumes of data.

::: {.callout-note appearance="minimal" icon="false"}
lucas.longour_ird.fr's avatar
lucas.longour_ird.fr committed
Website of `osmextract`\
lucas.longour_ird.fr's avatar
lucas.longour_ird.fr committed
[`osmextract`](https://docs.ropensci.org/osmextract/)
:::

For administrative boundaries, check [here](https://wiki.openstreetmap.org/wiki/Tag:boundary%3Dadministrative#10_admin_level_values_for_specific_countries) the administrative levels by country:

```{r, class.output="code-out", warning=FALSE, message=FALSE}
library(osmextract)
library(mapsf)
province <- oe_get(
  place = "Cambodia",
  download_directory = "data_cambodia/",
  layer = "multipolygons",
  extra_tags = c("wikidata", "ISO3166-2", "wikipedia", "name:en"),
  vectortranslate_options = c(
    "-t_srs", "EPSG:32648",
    "-nlt", "PROMOTE_TO_MULTI",
    "-where", "type = 'boundary' AND boundary = 'administrative' AND admin_level = '4'"
  ))

mf_map(x = province)
```

```{r, class.output="code-out", warning=FALSE, message=FALSE}
roads <- oe_get(
  place = "Cambodia",
  download_directory = "data_cambodia/",
  layer = "lines",
  extra_tags = c("access", "service", "maxspeed"),
  vectortranslate_options = c(
  "-t_srs", "EPSG:32648",
  "-nlt", "PROMOTE_TO_MULTI",
  "-where", "
    highway IS NOT NULL
    AND
    highway NOT IN (
    'abandonded', 'bus_guideway', 'byway', 'construction', 'corridor', 'elevator',
    'fixme', 'escalator', 'gallop', 'historic', 'no', 'planned', 'platform',
    'proposed', 'cycleway', 'pedestrian', 'bridleway', 'footway',
    'steps', 'path', 'raceway', 'road', 'service', 'track'
    )
    "
),
  boundary = subset(province, name_en == "Phnom Penh"),
  boundary_type = "clipsrc"
)
mf_map(x = roads)
```

lucas.longour_ird.fr's avatar
lucas.longour_ird.fr committed
## Import from lat / long file

The function `st_as_sf()` makes it possible to transform a `data.frame` container of geographic coordinates into an object `sf`. Here we use the `data.frame` `places2` created in the previous point.

```{r build_sf, eval = TRUE, echo = TRUE, nm = TRUE, fig.width=8, class.output="code-out", warning=FALSE, message=FALSE}
library(sf)
place_sf <- st_as_sf(read.csv("data_cambodia/adress.csv"), coords = c("long", "lat"), crs = 4326)
place_sf
```

```{r echo=FALSE, eval=TRUE}
sf_use_s2(FALSE)
```

To create a `sf` POINT type object with only one pair of coordinate (WGS84, longitude=0.5, latitude = 45.5) :

```{r build_sf2, eval = TRUE, echo = TRUE, nm = TRUE, fig.width=8, class.output="code-out", warning=FALSE, message=FALSE}
library(sf)
test_point <- st_as_sf(data.frame(x = 0.5, y = 45.5), coords = c("x", "y"), crs = 4326)
test_point
```

We can display this object `sf` on an [OpenStreetMap](https://www.openstreetmap.org/) basesmap with the package maptiles `maptiles` [@maptiles].

```{r display_sf, eval = TRUE, echo = TRUE, nm = TRUE, out.width="788px", warning=FALSE, message=FALSE}
library(maptiles)
osm <- get_tiles(x = place_sf, zoom = 12)
plot_tiles(osm)
plot(st_geometry(place_sf), pch = 2, cex = 2, col = "red", add = TRUE)
```

lucas.longour_ird.fr's avatar
lucas.longour_ird.fr committed
## Geocoding

Serveral pakages alow you to geocode addresses. <img src="img/logo_tidygeocoder.png" align="right" width="120"/> The package `tidygeocoder` [@tidygeocoder] allow the use of a [large number of online geocoding sevices](https://jessecambon.github.io/tidygeocoder/articles/geocoder_services.html). The package `banR` [@banR], which is based on the [National Address Base](https://adresse.data.gouv.fr/), is the particularly suitable for geocoding addresses in France.

### `tidygeocoder`

```{r tidygeocoder, eval=TRUE, class.output="code-out", warning=FALSE, message=FALSE}
library(tidygeocoder)
test_adresses <- data.frame(
  address = c("Phnom Penh International Airport, Phnom Penh, Cambodia",
              "Khmer Soviet Friendship Hospital, Phnom Penh, Cambodia"))
places1 <- geocode(test_adresses, address)
places1
```

::: {.callout-note appearance="minimal" icon="false"}
lucas.longour_ird.fr's avatar
lucas.longour_ird.fr committed
Website by `tidygeocoder` :\
lucas.longour_ird.fr's avatar
lucas.longour_ird.fr committed
[tidygeocoder](https://jessecambon.github.io/tidygeocoder/)
:::

### `banR` (Base Adresse Nationale)

```{r banR, eval=TRUE, class.output="code-out", warning=FALSE, message=FALSE}
# remotes::install_github("joelgombin/banR")
library(banR)
mes_adresses <- data.frame(
  address = c("19 rue Michel Bakounine, 29600 Morlaix, France",
              "2 Allee Emile Pouget, 920128 Boulogne-Billancourt")
)
places2 <- geocode_tbl(tbl = mes_adresses, adresse = address)
places2
```

::: {.callout-note appearance="minimal" icon="false"}
lucas.longour_ird.fr's avatar
lucas.longour_ird.fr committed
Website of `banR` :\
lucas.longour_ird.fr's avatar
lucas.longour_ird.fr committed
[An R client for the BAN API](http://joelgombin.github.io/banR/)
:::

## Digitization

The package `mapedit` [@mapedit] allows you to digitize base map directly in R. Although it can be practical in some cases, in package cannot replace the functionalities of a GIS for important digitization tasks.

lucas.longour_ird.fr's avatar
lucas.longour_ird.fr committed
![<small>Gif taken from [`mapedit` website](https://r-spatial.org/r/2019/03/31/mapedit_leafpm.html)</small>](img/mapedit-leafpm-1.gif)