Preparing the weather data cube

The modspa models (SAMIR and SAFY) require preprocessed weather data to run. The SAMIR model needs spatialized precipitation and reference evapotranspiration as inputs. Several raw weather variables are therefore downloaded and processed into a data cube with the same structure and georeferencing as the NDVI data cube.

Note

As of now, only the ERA5 data download is implemented. It makes it possible to download weather data for any location in the world through a simple Python API.

The weather dataset can be automatically created with the following function:

modspa_pixel.preprocessing.download_ERA5_weather.request_ER5_weather(config_file: str, ndvi_path: str, raw_S2_image_ref: str | None = None, shapefile: str | None = None, mode: str = 'pixel') str[source]

Download ERA5 reanalysis daily weather files, concatenate and calculate ET0 to obtain a netCDF4 dataset for precipitation and ET0 values. Weather data reprojection and conversion can take some time for large spatial windows.

Arguments

  1. config_file: str

    json configuration file

  2. ndvi_path: str

    path to ndvi cube, used for weather data reprojection

  3. raw_S2_image_ref: str default = None

    unmodified sentinel-2 image at correct resolution for weather data reprojection in pixel mode

  4. shapefile: str default = None

    path to shapefile for extraction in parcel mode

  5. mode: str default = 'pixel'

    choose between 'pixel' and 'parcel' mode

Returns

  1. weather_file: str

    path to netCDF4 file containing weather data

Precipitation

No special calculation is necessary for this variable. Only the reprojection to the NDVI data cube is necessary. This variable is also used to calculate the reference evapotranspiration.

Reference evapotranspiration

Reference crop evapotranspiration or reference evapotranspiration is the estimation of the evapotranspiration from the “reference surface.” The reference surface is a hypothetical grass reference crop with an assumed crop height of 0.12 m, a fixed surface resistance of 70 s/m and an albedo of 0.23. The reference surface closely resembles an extensive surface of green, well-watered grass of uniform height, actively growing and completely shading the ground.
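For reference, the surface parameters quoted above are those of the FAO-56 Penman-Monteith method. Assuming the ETo module follows that formulation (an assumption, since the exact implementation is not described here), the daily reference evapotranspiration is written as:

```latex
\[
ET_0 = \frac{0.408\,\Delta\,(R_n - G) + \gamma\,\dfrac{900}{T + 273}\,u_2\,(e_s - e_a)}
            {\Delta + \gamma\,(1 + 0.34\,u_2)}
\]
```

Here \(ET_0\) is in mm/day, \(R_n\) is the net radiation and \(G\) the soil heat flux (MJ/m²/day), \(T\) is the mean daily air temperature at 2 m (°C), \(u_2\) is the wind speed at 2 m (m/s), \(e_s - e_a\) is the vapour pressure deficit (kPa), \(\Delta\) is the slope of the saturation vapour pressure curve (kPa/°C) and \(\gamma\) is the psychrometric constant (kPa/°C).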

The calculation of the reference evapotranspiration is done using the ETo module. The following variables are downloaded for this calculation:

  • 2m_temperature: air temperature 2 meters above the surface.

  • 2m_dewpoint_temperature: air dewpoint temperature 2 meters above the surface.

  • surface_solar_radiation_downward: total downward solar radiation.

  • total_precipitation: total precipitation.

  • 10m_u_component_of_wind: eastward (east/west) wind speed 10 meters above the surface.

  • 10m_v_component_of_wind: northward (north/south) wind speed 10 meters above the surface.
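The raw variables listed above are not directly in the units usually expected by an ET0 calculation. The sketch below illustrates typical conversions, assuming the standard ERA5-Land units (kelvin for temperatures, metres for precipitation, J/m² for radiation, m/s for wind components). The helper names are hypothetical and are not part of modspa; whether these conversions happen in modspa itself or inside the ETo module is not stated here.

```python
import math


def kelvin_to_celsius(t_kelvin: float) -> float:
    """Convert a temperature from kelvin to degrees Celsius."""
    return t_kelvin - 273.15


def metres_to_millimetres(depth_m: float) -> float:
    """Convert a water depth (e.g. precipitation) from metres to millimetres."""
    return depth_m * 1000.0


def joules_to_megajoules(energy_j_m2: float) -> float:
    """Convert accumulated radiation from J/m2 to MJ/m2."""
    return energy_j_m2 / 1.0e6


def wind_speed(u: float, v: float) -> float:
    """Wind speed magnitude from the eastward (u) and northward (v) components."""
    return math.hypot(u, v)
```

The wind speed obtained this way is still valid at 10 meters above the surface, the height at which the two components are given.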

The following function is called by the main request_ER5_weather function to download the data. It calls other necessary sub-functions for the request. It produces netCDF4 datasets for each month of the requested time window, with daily values in each of those datasets.

modspa_pixel.preprocessing.lib_era5_land_pixel.call_era5land_daily_for_MODSPA(start_date: str, end_date: str, area: List[float], output_path: str, processes: int = 9) None[source]

Request the ERA5-Land daily variables needed for ET0 calculation and MODSPA forcing reanalysis_era5

Information on requested variables

Requested land surface variables:
  • 2m_temperature

  • 2m_dewpoint_temperature

  • surface_solar_radiation_downward

  • surface_net_solar_radiation

  • surface_pressure

  • mean_sea_level_pressure

  • potential_evaporation

  • evaporation

  • total_evaporation

  • total_precipitation

  • snowfall

  • 10m_u_component_of_wind

  • 10m_v_component_of_wind

Arguments

  1. start_date: str

    start date in YYYY-MM-DD format

  2. end_date: str

    end date in YYYY-MM-DD format

  3. area: List[float]

    bounding box of the requested area, given as area = [lat_max, lon_min, lat_min, lon_max]

  4. output_path: str

    output file name, .nc extension

  5. processes: int default = 9

    number of logical processors on which to run the download command. Can be higher than your actual number of processor cores, since download operations have a low CPU demand.

Returns

None
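The area argument uses a north, west, south, east order, which is easy to get wrong when starting from the more common (lon_min, lat_min, lon_max, lat_max) bounds. The sketch below shows one way to build it, and how a date range maps to the monthly files mentioned above. Both helpers are hypothetical illustrations and not modspa functions.

```python
from datetime import date
from typing import List, Tuple


def bounds_to_area(lon_min: float, lat_min: float,
                   lon_max: float, lat_max: float) -> List[float]:
    """Reorder (lon_min, lat_min, lon_max, lat_max) bounds into
    [lat_max, lon_min, lat_min, lon_max]."""
    return [lat_max, lon_min, lat_min, lon_max]


def monthly_windows(start_date: str, end_date: str) -> List[Tuple[int, int]]:
    """List the (year, month) pairs covered by two YYYY-MM-DD dates, inclusive."""
    start = date.fromisoformat(start_date)
    end = date.fromisoformat(end_date)
    months = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        months.append((year, month))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return months


# Example bounding box and time window (arbitrary illustrative values)
area = bounds_to_area(lon_min=1.0, lat_min=43.0, lon_max=2.0, lat_max=44.0)
windows = monthly_windows("2019-11-15", "2020-02-03")
```

With these values, one monthly dataset would be expected for each of the four months from November 2019 to February 2020.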

This then produces a spatialized \(ET_0\) dataset.

Formatting and preprocessing the raw data

Pixel mode

The raw dataset now has a precipitation variable and an \(ET_0\) variable.

Data downloaded from the ERA5-Land server is on a WGS 84 latitude/longitude grid, at 0.1° resolution. This data then needs to be reprojected onto the same grid as the NDVI data, which is usually at a much higher resolution (of the order of 10 meters). The reprojection and clipping are done by the OTB **Superimpose** application, which has the advantage of being very efficient for large datasets. However, it only handles Geotiff images and not netCDF4 files. The low resolution dataset therefore first has to be saved as Geotiff files (one per variable), then reprojected and clipped with otbcli_Superimpose, and finally opened and converted to a netCDF4 file. This can take some time for large datasets.
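As an illustration of the reprojection step, the sketch below assembles an otbcli_Superimpose command line. The flags -inr (reference image), -inm (image to reproject), -out, -interpolator and -ram (in MB) are Superimpose parameters. The helper name and file names are hypothetical, and the exact options that modspa passes are not stated here.

```python
from typing import List


def superimpose_command(reference_image: str, input_image: str,
                        output_image: str, ram_mb: int = 4096,
                        interpolator: str = "bco") -> List[str]:
    """Build (without running) an otbcli_Superimpose command.

    reference_image: high resolution image defining the target grid
    input_image: low resolution Geotiff to reproject and clip
    interpolator: 'bco' (OTB default), 'nn' or 'linear'
    """
    return [
        "otbcli_Superimpose",
        "-inr", reference_image,
        "-inm", input_image,
        "-out", output_image,
        "-interpolator", interpolator,
        "-ram", str(ram_mb),
    ]


# Hypothetical file names, for illustration only
cmd = superimpose_command("s2_ref.tif", "rain_lowres.tif",
                          "rain_reprojected.tif", ram_mb=2048)
```

The resulting list could be passed to subprocess.run(cmd, check=True) on a machine where OTB is installed.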

The output dataset is the one used by the processing chain to run the models. This dataset can be created using this function:

modspa_pixel.preprocessing.lib_era5_land_pixel.era5Land_daily_to_yearly_pixel(list_era5land_files: List[str], output_file: str, raw_S2_image_ref: str, ndvi_path: str, h: float = 10, max_ram: int = 8, weather_overwrite: bool = False, remove: bool = True) str[source]

Calculate ET0 values from the ERA5 netcdf weather variables. The output netcdf contains the ET0 and precipitation values for each day in the selected time period, reprojected onto the same grid as the NDVI values (the reprojection runs on two processors).

Arguments

  1. list_era5land_files: List[str]

    list of netcdf files containing the necessary variables

  2. output_file: str

    output file name without extension

  3. raw_S2_image_ref: str

    raw Sentinel 2 image at right resolution for reprojection

  4. ndvi_path: str

    path to ndvi dataset, used for attributes and coordinates

  5. h: float default = 10

    height of ERA5 wind measurements in meters

  6. max_ram: int default = 8

    max ram (in GiB) for reprojection and conversion. Two subprocesses are spawned for OTB, each receiving half of the requested memory.

  7. weather_overwrite: bool default = False

    boolean to choose to overwrite weather netCDF

  8. remove: bool default = True

    whether to remove temporary files

Returns

  1. output_file_final: str

    path to netCDF4 file containing precipitation and ET0 data

\(ET_0\) calculation is done with this function:

modspa_pixel.preprocessing.lib_era5_land_pixel.calculate_ET0_pixel(pixel_dataset: Dataset, lat: float, lon: float, h: float = 10) ndarray[source]

Calculate ET0 over the year for a single pixel of the ERA5 weather dataset.

Arguments

  1. pixel_dataset: xr.Dataset

    extracted dataset that contains all information for a single pixel

  2. lat: float

    latitudinal coordinate of that pixel

  3. lon: float

    longitudinal coordinate of that pixel

  4. h: float default = 10

    height of ERA5 wind measurement in meters

Returns

  1. ET0_values: np.ndarray

    numpy array containing the ET0 values for each day
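The h argument matters because ERA5 wind is given at 10 meters, whereas the FAO-56 reference evapotranspiration equation expects wind speed at 2 meters. A minimal sketch of the standard FAO-56 logarithmic wind profile adjustment is given below. It is an assumption that the ETo module applies this same correction, and the function name is hypothetical.

```python
import math


def wind_speed_at_2m(wind_speed_h: float, h: float = 10.0) -> float:
    """Adjust a wind speed measured at height h (in meters) to 2 meters
    above the surface, using the FAO-56 logarithmic wind profile."""
    return wind_speed_h * 4.87 / math.log(67.8 * h - 5.42)


# A 5 m/s wind at 10 m is reduced by roughly a quarter at 2 m
u2 = wind_speed_at_2m(5.0, h=10.0)
```

For h = 2 the correction factor is very close to 1, so a wind speed already given at 2 meters is left almost unchanged.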

And the Geotiff conversion to netCDF4 files is done with this function:

modspa_pixel.preprocessing.lib_era5_land_pixel.combine_weather2netcdf(rain_file: str, ET0_tile: str, ndvi_path: str, save_path: str, available_ram: int) None[source]

Convert the Rain and ET0 geotiffs into a single weather netcdf dataset.

Arguments

  1. rain_file: str

    path to Rain tif

  2. ET0_tile: str

    path to ET0 tif

  3. ndvi_path: str

    path to ndvi cube

  4. save_path: str

    save path of weather netcdf dataset

  5. available_ram: int

    available ram in GiB for conversion

Returns

None

Parcel mode

In this mode the reprojection to the NDVI data grid is not necessary. A function similar to era5Land_daily_to_yearly_pixel (it uses the same function for the \(ET_0\) calculation) is used to create a multiband Geotiff:

modspa_pixel.preprocessing.lib_era5_land_pixel.era5Land_daily_to_yearly_parcel(list_era5land_files: List[str], output_file: str, h: float = 10) str[source]

Calculate ET0 values from the ERA5 netcdf weather variables. Output netcdf contains the ET0 and precipitation values for each day in the selected time period.

Arguments

  1. list_era5land_files: List[str]

    list of netcdf files containing the necessary variables

  2. output_file: str

    output file name without extension

  3. h: float default = 10

    height of ERA5 wind measurements in meters

Returns

  1. output_file_rain: str

    path to Geotiff file containing precipitation data

  2. output_file_ET0: str

    path to Geotiff file containing ET0 data

Zonal extraction for each polygon is done directly on the low resolution data. An output dataframe containing the weather variables for each polygon and each date is produced (same structure as the NDVI dataframe). This is done with this function:

modspa_pixel.preprocessing.lib_era5_land_pixel.extract_weather_dataframe(rain_path: str, ET0_path: str, shapefile: str, config_file: str, save_path: str) None[source]

Extract a weather dataframe for each variable (Rain, ET0) and merge them into one dataframe. This dataframe is saved as a csv file.

Arguments

  1. rain_path: str

    path to rain Geotiff file

  2. ET0_path: str

    path to ET0 Geotiff file

  3. shapefile: str

    path to shapefile

  4. config_file: str

    path to config file

  5. save_path: str

    save path for weather dataframe

Returns

None

This function in turn calls the following function with multiprocessing:

modspa_pixel.preprocessing.lib_era5_land_pixel.extract_rasterstats(args: tuple) List[float][source]

Generate a dataframe for a given raster and a geopandas shapefile object. It iterates over the features of the shapefile geometry (polygons). This information is stored in a list.

It returns a list that contains the raster values, a feature id and the date for the image and every polygon in the shapefile geometry. It also has identification data relative to the shapefile: land cover (LC) and land cover identifier (id). This list is returned to be later aggregated in a DataFrame.

This function is used to allow multiprocessing for weather extraction.

Arguments (packed in args: tuple)

  1. raster_path: str

    path to multiband Geotiff

  2. shapefile: str

    path to shapefile

  3. config_file: str

    path to config file

Returns

  1. raster_stats: List[float]

    list containing weather values and feature information for every polygon in the shapefile
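Packing all arguments in a single tuple lets a worker be used with map-style interfaces that pass one argument per task, which is the usual pattern with multiprocessing.Pool.map. The sketch below shows the pattern on a toy worker. The worker, its arguments and the values are hypothetical and unrelated to the real extract_rasterstats internals, and the built-in map is used here to keep the example self-contained.

```python
from typing import List, Tuple


def extract_mean(args: Tuple[str, List[float], int]) -> List[object]:
    """Toy worker: unpack one task tuple and return [date, feature id, mean value]."""
    date, values, feature_id = args
    return [date, feature_id, sum(values) / len(values)]


# One tuple per task: (date, pixel values inside the polygon, feature id)
tasks = [
    ("2020-01-01", [1.0, 3.0], 1),
    ("2020-01-01", [2.0, 2.0, 5.0], 2),
]

# Pool.map(extract_mean, tasks) would accept the same worker and task list
rows = list(map(extract_mean, tasks))
```

Each element of rows is one record, which can then be aggregated into a dataframe.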

The dataframe is then converted to an xarray dataset using this function:

modspa_pixel.preprocessing.parcel_to_pixel.convert_dataframe_to_xarray(dataframe_in: str | DataFrame, save_path: str, variables: List[str], data_types: List[str], time_dimension: bool = True) None[source]

Convert pandas dataframes of the parcel mode into xarray datasets for the model calculations. The resulting xarray dataset has dimensions: time: number of dates, y: 1, x: number of polygons (to make a 3D dataset),

or dimensions: y: 1, x: number of polygons (to make a 2D dataset)

Arguments

  1. dataframe_in: Union[str, pd.DataFrame]

    dataframe or path to dataframe to convert

  2. save_path: str

    save path of output xarray dataset

  3. variables: List[str]

    names of the variables (or variable, the list can have one element) to put in the output dataset

  4. data_types: List[str]

    xarray datatypes corresponding to the variable names, for correct saving of the dataset

  5. time_dimension: bool default = True

    boolean to indicate if the dataframe has a time dimension

Returns

None
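To make the target layout concrete, the sketch below rearranges long-format records (date, polygon id, value) into a nested list with shape (number of dates, 1, number of polygons), mirroring the time, y, x dimensions described above. It uses plain Python only to illustrate the layout. The function name is hypothetical, and the actual conversion relies on pandas and xarray.

```python
from typing import List, Tuple

Record = Tuple[str, int, float]  # (date, polygon id, value)


def records_to_cube(records: List[Record]) -> Tuple[List[str], List[int], List[List[List[float]]]]:
    """Arrange records as cube[time][y][x] with a single y row.

    Assumes exactly one value for every (date, polygon id) pair.
    """
    dates = sorted({d for d, _, _ in records})
    ids = sorted({i for _, i, _ in records})
    lookup = {(d, i): v for d, i, v in records}
    # One row (y = 1) per date, one column (x) per polygon
    cube = [[[lookup[(d, i)] for i in ids]] for d in dates]
    return dates, ids, cube


# Illustrative values only
records = [
    ("2020-01-01", 1, 0.0), ("2020-01-01", 2, 1.5),
    ("2020-01-02", 1, 2.0), ("2020-01-02", 2, 0.5),
]
dates, ids, cube = records_to_cube(records)
```

Here cube has 2 dates, 1 row and 2 polygons, so a model written for a (time, y, x) grid can process parcels as if they were a single row of pixels.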

This dataset can then be used for the model calculation.