Preparing the weather data cube
The modspa models (SAMIR and SAFY) require preprocessed weather data to run. The SAMIR model needs spatialized precipitation and reference evapotranspiration as inputs. Several raw weather variables must therefore be downloaded and processed into a data cube with the same structure and georeferencing as the NDVI data cube.
Note
Currently, only the ERA5 data download is implemented. It allows weather data to be downloaded for any location in the world through a simple Python API.
The weather dataset can be automatically created with the following function:
- modspa_pixel.preprocessing.download_ERA5_weather.request_ER5_weather(config_file: str, ndvi_path: str, raw_S2_image_ref: str | None = None, shapefile: str | None = None, mode: str = 'pixel') -> str
Download ERA5 reanalysis daily weather files, concatenate them and calculate ET0 to obtain a netCDF4 dataset of precipitation and ET0 values. Weather data reprojection and conversion can take some time for large spatial windows.
Arguments
- config_file (str): json configuration file
- ndvi_path (str): path to the NDVI cube, used for weather data reprojection
- raw_S2_image_ref (str, default = None): unmodified Sentinel-2 image at the correct resolution, used for weather data reprojection in pixel mode
- shapefile (str, default = None): path to the shapefile used for extraction in parcel mode
- mode (str, default = 'pixel'): choose between 'pixel' and 'parcel' mode
Returns
- weather_file (str): path to the netCDF4 file containing the weather data
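The argument descriptions suggest that pixel mode relies on `raw_S2_image_ref` and parcel mode on `shapefile`. The helper below is a hypothetical pre-flight check (not part of modspa_pixel) that a calling script might run before `request_ER5_weather`; the actual call is only shown in comments, with placeholder file names.

```python
from typing import Optional


def check_weather_request(mode: str,
                          raw_S2_image_ref: Optional[str] = None,
                          shapefile: Optional[str] = None) -> None:
    """Hypothetical check mirroring the documented arguments of request_ER5_weather."""
    if mode not in ("pixel", "parcel"):
        raise ValueError(f"mode must be 'pixel' or 'parcel', got {mode!r}")
    # Pixel mode reprojects the weather data onto a Sentinel-2 reference image.
    if mode == "pixel" and raw_S2_image_ref is None:
        raise ValueError("pixel mode expects raw_S2_image_ref")
    # Parcel mode extracts weather values for each polygon of a shapefile.
    if mode == "parcel" and shapefile is None:
        raise ValueError("parcel mode expects shapefile")


# Example call (placeholder paths, requires modspa_pixel to be installed):
# from modspa_pixel.preprocessing.download_ERA5_weather import request_ER5_weather
# check_weather_request("pixel", raw_S2_image_ref="S2_reference.tif")
# weather_file = request_ER5_weather("config.json", "ndvi_cube.nc",
#                                    raw_S2_image_ref="S2_reference.tif", mode="pixel")
```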
Precipitation
No special calculation is necessary for this variable. Only the reprojection to the NDVI data cube is necessary. This variable is also used to calculate the reference evapotranspiration.
Reference evapotranspiration
Reference crop evapotranspiration or reference evapotranspiration is the estimation of the evapotranspiration from the “reference surface.” The reference surface is a hypothetical grass reference crop with an assumed crop height of 0.12 m, a fixed surface resistance of 70 s/m and an albedo of 0.23. The reference surface closely resembles an extensive surface of green, well-watered grass of uniform height, actively growing and completely shading the ground.
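For reference, FAO Irrigation and Drainage Paper 56 computes the daily reference evapotranspiration of this surface with the FAO Penman-Monteith equation. The ETo package implements FAO-56 methods; which exact variant and options modspa calls is not stated here, so the equation is given as background:

\[
ET_0 = \frac{0.408\,\Delta\,(R_n - G) + \gamma\,\frac{900}{T + 273}\,u_2\,(e_s - e_a)}{\Delta + \gamma\,(1 + 0.34\,u_2)}
\]

where \(ET_0\) is in mm/day, \(R_n\) is the net radiation at the crop surface and \(G\) the soil heat flux density (both in MJ m⁻² day⁻¹, with \(G \approx 0\) at the daily time step), \(T\) is the mean daily air temperature at 2 m (°C), \(u_2\) the wind speed at 2 m (m/s), \(e_s\) and \(e_a\) the saturation and actual vapour pressures (kPa), \(\Delta\) the slope of the saturation vapour pressure curve and \(\gamma\) the psychrometric constant (both in kPa/°C).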
The calculation of the reference evapotranspiration is done using the ETo module. The following variables are downloaded for this calculation:
- 2m_temperature: air temperature 2 meters above the surface.
- 2m_dewpoint_temperature: dewpoint temperature 2 meters above the surface.
- surface_solar_radiation_downward: total downward solar radiation at the surface.
- total_precipitation: total precipitation.
- 10m_u_component_of_wind: eastward (west/east) wind component 10 meters above the surface.
- 10m_v_component_of_wind: northward (south/north) wind component 10 meters above the surface.
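These raw variables are not directly the inputs of the Penman-Monteith equation. The sketch below shows the standard FAO-56 conversions, assuming the usual ERA5 units (kelvin for temperatures, J m⁻² accumulated over the day for radiation, metres of water for precipitation, m/s for the wind components, with u eastward and v northward). It illustrates the steps and is not the module's own code; the wind height `h` plays the same role as the `h` argument (10 m) of the functions documented below.

```python
import math


def wind_speed(u: float, v: float) -> float:
    """Horizontal wind speed (m/s) from the eastward (u) and northward (v) components."""
    return math.hypot(u, v)


def wind_at_2m(u_z: float, h: float = 10.0) -> float:
    """FAO-56 logarithmic wind profile: wind measured at height h (m) converted to 2 m."""
    return u_z * 4.87 / math.log(67.8 * h - 5.42)


def kelvin_to_celsius(t_k: float) -> float:
    """ERA5 temperatures are in kelvin."""
    return t_k - 273.15


def actual_vapour_pressure(dewpoint_c: float) -> float:
    """Actual vapour pressure e_a (kPa) = saturation vapour pressure at the dewpoint (degC)."""
    return 0.6108 * math.exp(17.27 * dewpoint_c / (dewpoint_c + 237.3))


def j_to_mj(radiation_j_m2: float) -> float:
    """Daily accumulated radiation: J/m2 -> MJ/m2/day (assumes a daily total)."""
    return radiation_j_m2 / 1e6


def m_to_mm(precip_m: float) -> float:
    """Precipitation: metres of water -> mm."""
    return precip_m * 1000.0
```

For example, `wind_at_2m(wind_speed(u10, v10))` gives the \(u_2\) term from the two 10 m components of a given day.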
The following function is called by the main request_ER5_weather function to download the data. It calls the other necessary sub-functions for the request and produces one netCDF4 dataset for each month of the requested time window, each containing daily values.
- modspa_pixel.preprocessing.lib_era5_land_pixel.call_era5land_daily_for_MODSPA(start_date: str, end_date: str, area: List[float], output_path: str, processes: int = 9) -> None
Request the ERA5-Land daily variables needed for the ET0 calculation and the MODSPA forcing (reanalysis_era5).
Information on requested variables
- requested land surface variables:
  - 2m_temperature
  - 2m_dewpoint_temperature
  - surface_solar_radiation_downward
  - surface_net_solar_radiation
  - surface_pressure
  - mean_sea_level_pressure
  - potential_evaporation
  - evaporation
  - total_evaporation
  - total_precipitation
  - snowfall
  - 10m_u_component_of_wind
  - 10m_v_component_of_wind
Arguments
- start_date (str): start date in YYYY-MM-DD format
- end_date (str): end date in YYYY-MM-DD format
- area (List[float]): bounding box of the requested area, area = [lat_max, lon_min, lat_min, lon_max]
- output_path (str): output file name, with .nc extension
- processes (int, default = 9): number of logical processors on which to run the download command. It can be higher than your actual number of processor cores, since download operations have a low CPU demand.
Returns
None
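The request is split into monthly datasets over an area given as [lat_max, lon_min, lat_min, lon_max], and downloads can use more workers than CPU cores. The sketch below illustrates these three points with hypothetical helpers: splitting an inclusive date range into monthly windows, reordering usual (min_lon, min_lat, max_lon, max_lat) bounds into the documented `area` order, and dispatching one download per month. The download function is a stand-in supplied by the caller, and a thread pool is used here in place of whatever pool the module actually uses.

```python
import calendar
from concurrent.futures import ThreadPoolExecutor
from datetime import date, timedelta
from typing import Callable, List, Tuple


def monthly_windows(start_date: str, end_date: str) -> List[Tuple[str, str]]:
    """Split [start_date, end_date] (YYYY-MM-DD, inclusive) into per-month windows."""
    start = date.fromisoformat(start_date)
    end = date.fromisoformat(end_date)
    windows = []
    current = start
    while current <= end:
        last_day = calendar.monthrange(current.year, current.month)[1]
        month_end = min(date(current.year, current.month, last_day), end)
        windows.append((current.isoformat(), month_end.isoformat()))
        current = month_end + timedelta(days=1)
    return windows


def bounds_to_area(min_lon: float, min_lat: float,
                   max_lon: float, max_lat: float) -> List[float]:
    """Reorder bounds into the documented [lat_max, lon_min, lat_min, lon_max] order."""
    return [max_lat, min_lon, min_lat, max_lon]


def download_all(windows: List[Tuple[str, str]],
                 download_month: Callable[[str, str], str],
                 processes: int = 9) -> List[str]:
    """Run one hypothetical download per month; I/O-bound, so workers may exceed cores."""
    with ThreadPoolExecutor(max_workers=processes) as pool:
        # pool.map keeps the results in the same order as the windows.
        return list(pool.map(lambda window: download_month(*window), windows))
```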
These variables are then used to produce a spatialized \(ET_0\) dataset.
Formatting and preprocessing the raw data
Pixel mode
The raw dataset now has a precipitation variable and an \(ET_0\) variable.
Data downloaded from the ERA5-Land server is provided on a WGS 84 latitude/longitude grid at 0.1° resolution. It must then be reprojected onto the same grid as the NDVI data, which usually has a much higher resolution (of the order of 10 meters). The reprojection and clipping are done by the OTB Superimpose application, which is very efficient for large datasets. However, it only handles Geotiff images, not netCDF4 files. The low-resolution dataset is therefore first saved as Geotiff files (one per variable), reprojected and clipped with otbcli_Superimpose, and then opened and converted to a netCDF4 file. This can take some time for large datasets.
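The Superimpose step can be sketched as building one otbcli_Superimpose command per variable Geotiff and running the commands side by side, each with half of the memory budget (the max_ram argument of era5Land_daily_to_yearly_pixel is split this way between two OTB subprocesses). The file names are placeholders, and the nearest-neighbour interpolator (`nn`, which keeps the original 0.1° cell values) is an assumption; the module may use another interpolator.

```python
import subprocess
from typing import List


def superimpose_command(reference: str, low_res: str, output: str,
                        ram_mb: int, interpolator: str = "nn") -> List[str]:
    """Build an otbcli_Superimpose call resampling `low_res` onto the grid of `reference`."""
    return [
        "otbcli_Superimpose",
        "-inr", reference,           # reference image defining the output grid
        "-inm", low_res,             # low-resolution ERA5 Geotiff to reproject
        "-out", output,
        "-interpolator", interpolator,
        "-ram", str(ram_mb),         # OTB memory budget, in MB
    ]


def run_in_parallel(commands: List[List[str]]) -> None:
    """Start all commands at once, wait for them and raise if one fails."""
    procs = [subprocess.Popen(cmd) for cmd in commands]
    for cmd, proc in zip(commands, procs):
        if proc.wait() != 0:
            raise RuntimeError(f"command failed: {' '.join(cmd)}")


# Example: 8 GiB in total, split between the rain and ET0 reprojections.
max_ram_gib = 8
ram_each = max_ram_gib * 1024 // 2
commands = [
    superimpose_command("S2_reference.tif", "rain_era5.tif", "rain_10m.tif", ram_each),
    superimpose_command("S2_reference.tif", "ET0_era5.tif", "ET0_10m.tif", ram_each),
]
# run_in_parallel(commands)  # requires OTB installed and on the PATH
```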
The output dataset is the one used by the processing chain to run the models. It can be created with this function:
- modspa_pixel.preprocessing.lib_era5_land_pixel.era5Land_daily_to_yearly_pixel(list_era5land_files: List[str], output_file: str, raw_S2_image_ref: str, ndvi_path: str, h: float = 10, max_ram: int = 8, weather_overwrite: bool = False, remove: bool = True) -> str
Calculate ET0 values from the ERA5 netCDF weather variables. The output netCDF contains the ET0 and precipitation values for each day of the selected time period, reprojected (the reprojection runs on two processors) onto the same grid as the NDVI values.
Arguments
- list_era5land_files (List[str]): list of netCDF files containing the necessary variables
- output_file (str): output file name without extension
- raw_S2_image_ref (str): raw Sentinel-2 image at the right resolution for reprojection
- ndvi_path (str): path to the NDVI dataset, used for attributes and coordinates
- h (float, default = 10): height of the ERA5 wind measurements in meters
- max_ram (int, default = 8): maximum RAM (in GiB) for reprojection and conversion. Two subprocesses are spawned for OTB, each receiving half of the requested memory.
- weather_overwrite (bool, default = False): whether to overwrite an existing weather netCDF file
- remove (bool, default = True): whether to remove temporary files
Returns
- output_file_final (str): path to the netCDF4 file containing precipitation and ET0 data
\(ET_0\) calculation is done with this function:
- modspa_pixel.preprocessing.lib_era5_land_pixel.calculate_ET0_pixel(pixel_dataset: Dataset, lat: float, lon: float, h: float = 10) -> ndarray
Calculate ET0 over the year for a single pixel of the ERA5 weather dataset.
Arguments
- pixel_dataset (xr.Dataset): extracted dataset that contains all the information for a single pixel
- lat (float): latitude of that pixel
- lon (float): longitude of that pixel
- h (float, default = 10): height of the ERA5 wind measurement in meters
Returns
- ET0_values (np.ndarray): numpy array containing the ET0 values for each day
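To make the per-pixel calculation concrete, here is a self-contained daily FAO-56 Penman-Monteith sketch for one pixel and one day. It assumes the inputs are already converted (°C, kPa, MJ m⁻² day⁻¹, wind at 2 m) and takes the latitude to compute the extraterrestrial radiation, similar in spirit to the `lat` argument above. It is not the ETo module's code: it neglects the soil heat flux (G = 0 at the daily step, as FAO-56 suggests) and uses a simplified fallback when the clear-sky radiation is zero (polar night).

```python
import math

SIGMA = 4.903e-9  # Stefan-Boltzmann constant, MJ K-4 m-2 day-1
GSC = 0.0820      # solar constant, MJ m-2 min-1


def sat_vapour_pressure(t_c: float) -> float:
    """Saturation vapour pressure (kPa) at temperature t_c (degC)."""
    return 0.6108 * math.exp(17.27 * t_c / (t_c + 237.3))


def et0_fao56_daily(tmax_c: float, tmin_c: float, ea_kpa: float, rs_mj: float,
                    u2: float, lat_deg: float, doy: int, elevation_m: float) -> float:
    """Daily FAO-56 Penman-Monteith ET0 (mm/day) for a single pixel, with G = 0."""
    tmean = (tmax_c + tmin_c) / 2.0
    # Psychrometric constant from the standard-atmosphere pressure at this elevation.
    pressure = 101.3 * ((293.0 - 0.0065 * elevation_m) / 293.0) ** 5.26
    gamma = 0.000665 * pressure
    es = (sat_vapour_pressure(tmax_c) + sat_vapour_pressure(tmin_c)) / 2.0
    delta = 4098.0 * sat_vapour_pressure(tmean) / (tmean + 237.3) ** 2

    # Extraterrestrial radiation Ra from latitude and day of year.
    phi = math.radians(lat_deg)
    dr = 1.0 + 0.033 * math.cos(2.0 * math.pi * doy / 365.0)
    decl = 0.409 * math.sin(2.0 * math.pi * doy / 365.0 - 1.39)
    ws = math.acos(max(-1.0, min(1.0, -math.tan(phi) * math.tan(decl))))
    ra = (24.0 * 60.0 / math.pi) * GSC * dr * (
        ws * math.sin(phi) * math.sin(decl)
        + math.cos(phi) * math.cos(decl) * math.sin(ws))

    # Net radiation = net shortwave (albedo 0.23) - net longwave.
    rso = (0.75 + 2e-5 * elevation_m) * ra
    rns = (1.0 - 0.23) * rs_mj
    rel_rs = min(rs_mj / rso, 1.0) if rso > 0 else 1.0  # simplified polar-night fallback
    tk4 = ((tmax_c + 273.16) ** 4 + (tmin_c + 273.16) ** 4) / 2.0
    rnl = SIGMA * tk4 * (0.34 - 0.14 * math.sqrt(ea_kpa)) * (1.35 * rel_rs - 0.35)
    rn = rns - rnl
    g = 0.0  # soil heat flux neglected at the daily time step

    numerator = (0.408 * delta * (rn - g)
                 + gamma * (900.0 / (tmean + 273.0)) * u2 * (es - ea_kpa))
    denominator = delta + gamma * (1.0 + 0.34 * u2)
    return numerator / denominator
```

With the inputs of FAO-56 Example 18 (Brussels, 6 July: `et0_fao56_daily(21.5, 12.3, 1.409, 22.07, 2.078, 50.8, 187, 100.0)`), the sketch returns about 3.9 mm/day, the value reported in that example.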
The conversion of the Geotiff files to a netCDF4 file is done with this function:
- modspa_pixel.preprocessing.lib_era5_land_pixel.combine_weather2netcdf(rain_file: str, ET0_tile: str, ndvi_path: str, save_path: str, available_ram: int) -> None
Convert the Rain and ET0 geotiffs into a single weather netcdf dataset.
Arguments
- rain_file (str): path to the Rain tif
- ET0_tile (str): path to the ET0 tif
- ndvi_path (str): path to the NDVI cube
- save_path (str): save path of the weather netCDF dataset
- available_ram (int): available RAM in GiB for the conversion
Returns
None
Parcel mode
In this mode, reprojection to the NDVI data grid is not necessary. A function similar to era5Land_daily_to_yearly_pixel (it uses the same function for the \(ET_0\) calculation) is used to create multiband Geotiff files:
- modspa_pixel.preprocessing.lib_era5_land_pixel.era5Land_daily_to_yearly_parcel(list_era5land_files: List[str], output_file: str, h: float = 10) -> str
Calculate ET0 values from the ERA5 netCDF weather variables. The output files contain the ET0 and precipitation values for each day of the selected time period.
Arguments
- list_era5land_files (List[str]): list of netCDF files containing the necessary variables
- output_file (str): output file name without extension
- h (float, default = 10): height of the ERA5 wind measurements in meters
Returns
- output_file_rain (str): path to the Geotiff file containing precipitation data
- output_file_ET0 (str): path to the Geotiff file containing ET0 data
Zonal extraction for each polygon is done directly on the low resolution data. An output dataframe containing the weather variables for each polygon and each date is produced (same structure as the NDVI dataframe). This is done with this function:
- modspa_pixel.preprocessing.lib_era5_land_pixel.extract_weather_dataframe(rain_path: str, ET0_path: str, shapefile: str, config_file: str, save_path: str) -> None
Extract a weather dataframe for each variable (Rain, ET0) and merge them into one dataframe. This dataframe is saved as a csv file.
Arguments
- rain_path (str): path to the rain Geotiff file
- ET0_path (str): path to the ET0 Geotiff file
- shapefile (str): path to the shapefile
- config_file (str): path to the config file
- save_path (str): save path for the weather dataframe
Returns
None
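The merge step of this function can be pictured as an inner join of the per-variable tables on the (polygon id, date) key, followed by a csv export. The stdlib sketch below assumes a hypothetical row layout and hypothetical column names (`id`, `date`, `Rain`, `ET0`); the module's actual csv schema may differ. With pandas, the same step would typically be a `merge` on both key columns.

```python
import csv
import io
from typing import Dict, List, Tuple

Row = Tuple[int, str, float]  # hypothetical layout: (polygon id, date, value)


def merge_weather(rain: List[Row], et0: List[Row]) -> List[dict]:
    """Inner-join rain and ET0 rows on (id, date)."""
    et0_lookup: Dict[Tuple[int, str], float] = {(i, d): v for i, d, v in et0}
    merged = []
    for poly_id, day, rain_value in rain:
        if (poly_id, day) in et0_lookup:
            merged.append({"id": poly_id, "date": day,
                           "Rain": rain_value, "ET0": et0_lookup[(poly_id, day)]})
    # Sort by date, then polygon id, for a readable csv.
    merged.sort(key=lambda row: (row["date"], row["id"]))
    return merged


def to_csv(rows: List[dict]) -> str:
    """Serialize merged rows to csv text (written to a file in practice)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["id", "date", "Rain", "ET0"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```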
It calls the following function with multiprocessing:
- modspa_pixel.preprocessing.lib_era5_land_pixel.extract_rasterstats(args: tuple) -> List[float]
Generate the rows of a dataframe for a given raster and a geopandas shapefile object. It iterates over the features (polygons) of the shapefile geometry and stores this information in a list.
It returns a list that contains the raster values, a feature id and the date for the image, for every polygon in the shapefile geometry. It also contains identification data from the shapefile: land cover (LC) and land cover identifier (id). This list is returned to be later aggregated into a DataFrame. This function is used to allow multiprocessing for weather extraction.
Arguments (packed in args: tuple)
- raster_path (str): path to the multiband Geotiff
- shapefile (str): path to the shapefile
- config_file (str): path to the config file
Returns
- raster_stats (List[float]): list containing weather values and feature information for every polygon in the shapefile
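Packing the arguments in a single tuple is what lets a function like this be mapped over a process pool. The sketch below reproduces that pattern with a toy zonal mean: each task is one band (one date) plus a rasterized grid of polygon ids, and the per-task lists are flattened into rows ready for a DataFrame. The label-grid representation, the use of a mean and all names are assumptions for illustration; the real function reads the Geotiff and shapefile itself and also returns land cover information. On platforms that start processes with spawn (Windows, macOS), call `extract_all` with `processes > 1` only under an `if __name__ == "__main__":` guard.

```python
from multiprocessing import Pool
from typing import Dict, List, Sequence, Tuple

Grid = Sequence[Sequence[float]]
Labels = Sequence[Sequence[int]]


def zonal_means(args: Tuple[Grid, Labels, str]) -> List[list]:
    """Mean band value for every polygon id of a rasterized label grid."""
    band, labels, day = args  # packed in one tuple so that Pool.map can pass it
    sums: Dict[int, float] = {}
    counts: Dict[int, int] = {}
    for band_row, label_row in zip(band, labels):
        for value, poly_id in zip(band_row, label_row):
            if poly_id == 0:  # 0 marks pixels outside every polygon
                continue
            sums[poly_id] = sums.get(poly_id, 0.0) + value
            counts[poly_id] = counts.get(poly_id, 0) + 1
    return [[poly_id, day, sums[poly_id] / counts[poly_id]] for poly_id in sorted(sums)]


def extract_all(bands: List[Grid], labels: Labels, dates: List[str],
                processes: int = 1) -> List[list]:
    """Run zonal_means for each (band, date), in parallel when processes > 1."""
    tasks = [(band, labels, day) for band, day in zip(bands, dates)]
    if processes > 1:
        with Pool(processes) as pool:
            results = pool.map(zonal_means, tasks)
    else:
        results = [zonal_means(task) for task in tasks]
    # Flatten the per-date lists into one list of [id, date, value] rows.
    return [row for rows in results for row in rows]
```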
The dataframe is then converted to an xarray dataset using this function:
- modspa_pixel.preprocessing.parcel_to_pixel.convert_dataframe_to_xarray(dataframe_in: str | DataFrame, save_path: str, variables: List[str], data_types: List[str], time_dimension: bool = True) -> None
Convert pandas dataframes of the parcel mode into xarray datasets for the model calculations. The resulting xarray dataset has dimensions time: number of dates, y: 1, x: number of polygons (to make a 3D dataset), or dimensions y: 1, x: number of polygons (to make a 2D dataset).
Arguments
- dataframe_in (Union[str, pd.DataFrame]): dataframe or path to the dataframe to convert
- save_path (str): save path of the output xarray dataset
- variables (List[str]): names of the variables (the list can have a single element) to put in the output dataset
- data_types (List[str]): xarray data types corresponding to the variable names, for correct saving of the dataset
- time_dimension (bool, default = True): whether the dataframe has a time dimension
Returns
None
This dataset can then be used for the model calculation.
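The 3D parcel-mode layout (time, y of size 1, x with one entry per polygon) can be illustrated by pivoting long-format rows into a nested list. The stdlib sketch below uses hypothetical (id, date, value) rows and leaves missing combinations as NaN. With pandas and xarray, a similar result would typically come from `DataFrame.pivot` followed by wrapping the array in an `xarray.DataArray` with an added `y` dimension, but this is not necessarily how `convert_dataframe_to_xarray` does it.

```python
from typing import List, Tuple

Cube = List[List[List[float]]]  # indexed as cube[time][y][x]


def pivot_to_cube(rows: List[Tuple[int, str, float]]) -> Tuple[List[str], List[int], Cube]:
    """Arrange (polygon id, date, value) rows into a [time][y=1][x=polygon] nested list."""
    dates = sorted({day for _, day, _ in rows})
    ids = sorted({poly_id for poly_id, _, _ in rows})
    t_index = {day: k for k, day in enumerate(dates)}
    x_index = {poly_id: k for k, poly_id in enumerate(ids)}
    # One y row per time step; NaN marks polygon/date pairs absent from the input.
    cube = [[[float("nan")] * len(ids)] for _ in dates]
    for poly_id, day, value in rows:
        cube[t_index[day]][0][x_index[poly_id]] = value
    return dates, ids, cube
```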