Understanding the JSON config file

All the global parameters for the modspa processing chain are located in the JSON configuration file (/modspa_pixel/config/config_modspa.json).

This file lets you choose where and how to run the modspa_pixel processing chain. Different parts of the processing chain are detailed in the inputs and samir sections. You can keep multiple configuration files, distinguished by a name extension (name_ext corresponds to the file config_modspa.name_ext.json). To select one configuration file for the run, use the select_config_modspa.py script (in the modspa_pixel/config directory) as follows:

(modspa_pixel) $ python select_config_modspa.py name_ext
source file: /modspa_pixel/config/config_modspa.name_ext.json
destination file: /modspa_pixel/config/config_modspa.json
/modspa_pixel/config/config_modspa.json updated

The name extension argument name_ext selects the configuration file whose content is copied into the regular /modspa_pixel/config/config_modspa.json file.
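The copy step described above can be approximated in a few lines of Python. This is a minimal sketch, not the actual source of select_config_modspa.py: the function name select_config and its config_dir argument are assumptions made for illustration.

```python
import shutil
from pathlib import Path


def select_config(name_ext, config_dir):
    """Copy config_modspa.<name_ext>.json over config_modspa.json.

    Hypothetical helper: the real script may behave differently.
    """
    config_dir = Path(config_dir)
    source = config_dir / f"config_modspa.{name_ext}.json"
    destination = config_dir / "config_modspa.json"
    if not source.is_file():
        raise FileNotFoundError(f"No configuration file found: {source}")
    # Overwrite the regular configuration file with the selected one.
    shutil.copyfile(source, destination)
    return destination
```

Note that the previous content of config_modspa.json is overwritten, so any edits you want to keep should live in a named configuration file.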

Example of a configuration file:

{
    "_comment": "Start date of the period on which the model will run",
    "start_date": "2019-01-01",

    "_comment": "End date of the period on which the model will run",
    "end_date": "2019-12-31",

    "_comment": "Choose between parcel or pixel mode for the run. You need a parcel shapefile for the parcel mode, and a boundary shapefile for the pixel mode.",
    "mode": "parcel",

    "_comment": "Name of the current run, all output files will be saved under a subdirectory of Saves/ with that name, log file will also have that name",
    "run_name": "Aurade_parcel",

    "_comment": "Path to eodag configuration file for sentinel-2 image download",
    "path_to_eodag_config_file": "/home/auclairj/.config/eodag/eodag.yml",

    "_comment": "Preferred S2 data provider, choices = theia, copernicus",
    "preferred_provider": "copernicus",

    "_comment": "Maximum cloud cover percentage to download data, images with higher cloud cover will not be downloaded",
    "cloud_cover_limit": 80,

    "_comment": "Path to the directory on which the satellite image data will be downloaded",
    "download_path": "/mnt/e/DATA",

    "_comment": "Output path for output netcdf files (outputs will be stored in a subdirectory carrying the run_name)",
    "output_path": "/mnt/e/DATA/OUTPUT",

    "_comment": "Output path for netcdf era5 files (Weather)",
    "era5_path": "/mnt/e/DATA/WEATHER",

    "_comment": "Path to soil netCDF4 dataset",
    "soil_path": "/mnt/e/DATA/SOIL/Aurade_parcel/Soil_interpolated.nc",

    "_comment": "Path to Land Cover file (netCDF4 or GeoTiff)",
    "land_cover_path": "/mnt/e/DATA/LAND_COVER/Aurade_parcel/Aurade_parcel_new_LC.nc",

    "_comment": "Path to the shapefile to run the model on",
    "shapefile_path": "/home/auclairj/notebooks/Shapefiles/Aurade_parcel/Aurade_parcel.shp",

    "_comment": "Path to SAMIR parameter csv file",
    "param_csv_file": "/home/auclairj/GIT/modspa_pixel/parameters/csv_files/params_samir.Aurade_test.csv",

    "_comment": "Resolution in meters in which to run the processing chain. Choice is either 10 or 20.",
    "resolution": 10,

    "_comment": "Overwrite NDVI images or not (set to true if you want the code to rewrite NDVI images, takes longer)",
    "ndvi_overwrite": false,

    "_comment": "Overwrite Weather dataset or not (set to true if you want the code to rewrite weather dataset, takes longer)",
    "weather_overwrite": false,

    "_comment": "Boolean to choose to automatically open the Dask Dashboard on default browser",
    "open_browser": false,

    "_comment": "Max number of processor cores to use for multiprocessing calculations",
    "max_cpu": 3,

    "_comment": "Max amount of RAM memory (in GiB) to use for calculations",
    "max_ram": 16,

    "_comment": "List of variables to save as shapefiles for the parcel mode",
    "parcel_shapefile_vars": ["ETR"]
}
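The example repeats the "_comment" key before every parameter. When such a file is read with Python's standard json module, a repeated key keeps only its last value, so the comments act as in-file documentation and the parameters themselves are unaffected. The sketch below illustrates this on an abridged copy of the example; it is not taken from the modspa_pixel code, and the variable names are chosen for illustration.

```python
import json

# Abridged copy of the example configuration, for illustration only.
config_text = """
{
    "_comment": "Start date of the period on which the model will run",
    "start_date": "2019-01-01",
    "_comment": "End date of the period on which the model will run",
    "end_date": "2019-12-31",
    "_comment": "Resolution in meters",
    "resolution": 10
}
"""

config = json.loads(config_text)

# Only the last "_comment" survives parsing; drop it to keep the parameters.
params = {key: value for key, value in config.items() if key != "_comment"}
print(params)
```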

Parameter detail:

  1. start_date: str, start date of the simulation in YYYY-MM-DD format.

  2. end_date: str, end date of the simulation in YYYY-MM-DD format.

  3. mode: str, parameter to choose between the 'pixel' and 'parcel' modes. Both modes require a shapefile. A simple box is enough for the 'pixel' mode (the Python code only uses the total bounds of the shapefile). The 'parcel' mode requires a shapefile with one polygon per parcel and an attribute giving the land cover of each polygon.

  4. run_name: str, all preprocessed input files and output files will be saved in a directory with this name. Files will also carry the run_name in their name.

  5. path_to_eodag_config_file: str, path to the configuration file of the eodag module. This is needed to download sentinel-2 images to your machine.

  6. preferred_provider: str, choose between Copernicus or Theia sentinel-2 images (the atmospheric corrections are different for the two providers).

  7. cloud_cover_limit: int, the eodag module has an option to filter out images with more than a certain percentage of cloud cover (0 % is no clouds). These images will not be downloaded.

  8. download_path: str, path to directory where the raw input data will be downloaded. Subdirectories will be created for each input type (optical imagery, weather data, soil data, land cover data).

Warning

For large spatial windows or long time windows, make sure you have enough disk space for all the input data (raw and preprocessed).
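One way to act on this warning is to check the free space under download_path before starting a run. The following is a minimal sketch using the standard library; it is not part of the processing chain as far as this page states, and required_gib is your own estimate of the space the inputs will need.

```python
import shutil

GIB = 1024 ** 3


def free_space_gib(path):
    """Return the free disk space, in GiB, on the filesystem containing path."""
    return shutil.disk_usage(path).free / GIB


def has_enough_space(path, required_gib):
    """Return True if at least required_gib GiB are free under path."""
    return free_space_gib(path) >= required_gib
```

For example, has_enough_space("/mnt/e/DATA", 50) would tell you whether 50 GiB are still free on the drive used in the example configuration. The path must already exist.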

  9. output_path: str, output path for output netcdf files (outputs will be stored in a subdirectory carrying the run_name).

  10. era5_path: str, path to directory where ERA5-Land data will be downloaded and processed.

  11. soil_path: str, path to soil netCDF4 file containing the Wilting Point and Field Capacity rasters.

  12. land_cover_path: str, path to land cover raster (netCDF4 or GeoTiff). The land cover should have integer values ranging from 1 to the number of classes, each integer representing a class.

  13. shapefile_path: str, path to the shapefile that defines your parcels (in parcel mode) or delimits the window on which you want to run the model (in pixel mode).

  14. param_csv_file: str, path to SAMIR csv parameter file.

  15. resolution: int, resolution in meters in which to run the processing chain.

Warning

Only 10 and 20 meters are currently handled.
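Several of the constraints listed so far (ISO dates, the two modes, the two resolutions, a cloud cover percentage) are easy to check before launching a run. The function below is a hedged sketch of such a check; validate_config is a hypothetical helper, not a function provided by modspa_pixel, and the 0 to 100 range for cloud_cover_limit is an assumption based on it being a percentage.

```python
from datetime import date

ALLOWED_MODES = ("pixel", "parcel")
ALLOWED_RESOLUTIONS = (10, 20)


def validate_config(config):
    """Return a list of problems found in config (empty if none were found)."""
    problems = []
    try:
        start = date.fromisoformat(config["start_date"])
        end = date.fromisoformat(config["end_date"])
        if end < start:
            problems.append("end_date is earlier than start_date")
    except (KeyError, TypeError, ValueError) as error:
        problems.append(f"invalid start_date or end_date: {error}")
    if config.get("mode") not in ALLOWED_MODES:
        problems.append(f"mode must be one of {ALLOWED_MODES}")
    if config.get("resolution") not in ALLOWED_RESOLUTIONS:
        problems.append(f"resolution must be one of {ALLOWED_RESOLUTIONS}")
    cloud = config.get("cloud_cover_limit")
    if not isinstance(cloud, int) or not 0 <= cloud <= 100:
        problems.append("cloud_cover_limit must be an integer between 0 and 100")
    return problems
```

Returning a list rather than raising on the first error lets you report every problem in the file at once.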

  16. ndvi_overwrite: bool, choose whether to rewrite the NDVI data cube (true = rewrite, false = don't rewrite).

Warning

The calculation of the NDVI data cube can take some time for large spatial windows and long time windows depending on your machine (up to an hour or more).
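The effect of an overwrite flag such as ndvi_overwrite can be pictured with the sketch below. It assumes that the cube is also computed when no file exists yet, which this page does not state explicitly; needs_ndvi_computation and ndvi_cube_path are hypothetical names used only for illustration.

```python
from pathlib import Path


def needs_ndvi_computation(ndvi_cube_path, ndvi_overwrite):
    """Return True when the NDVI cube has to be computed or recomputed.

    Assumption: a missing file always triggers the computation.
    """
    return ndvi_overwrite or not Path(ndvi_cube_path).exists()
```

With ndvi_overwrite set to false, an existing cube is reused and the long computation is skipped.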

  17. open_browser: bool, choose whether to automatically open the Dask Dashboard in the default browser when running the 'pixel' NDVI calculation functions.

  18. max_cpu: int, maximum number of processor cores to use for calculation. Using more cores completes operations faster, but take care not to overload your machine. Set this value carefully when using the chain on a high-performance cluster.

  19. max_ram: int, maximum amount of RAM (in GiB) to allocate for calculation. Allocating more memory completes operations faster. Large spatial windows or long time windows can make the chain very hungry for RAM, so set an amount that your machine or cluster can support. If the amount you set is higher than the memory actually available on your machine (for example because other programs are using memory), the script will tell you so and exit without running anything. Make sure that the amount you set is consistent with the memory available at the moment you run the scripts.

  20. parcel_shapefile_vars: List[str], list of variables to save as shapefiles for the parcel mode. One shapefile per variable will be written in the output directory.
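The max_ram check described above (refuse to run when the requested amount exceeds the available memory) can be sketched as follows. This is not the chain's actual implementation: both function names are hypothetical, and available_memory_bytes relies on os.sysconf names that exist on Linux but may be missing on other systems, in which case it returns None.

```python
import os

GIB = 1024 ** 3


def available_memory_bytes():
    """Best-effort estimate of free physical memory, or None if unknown."""
    try:
        return os.sysconf("SC_AVPHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")
    except (AttributeError, ValueError, OSError):
        return None


def check_max_ram(max_ram_gib, available_bytes):
    """Return the requested memory in bytes, or raise if it is not available."""
    requested = max_ram_gib * GIB
    if requested > available_bytes:
        raise MemoryError(
            f"max_ram is {max_ram_gib} GiB but only "
            f"{available_bytes / GIB:.1f} GiB are available"
        )
    return requested
```

Passing the available memory as an argument keeps the comparison itself independent of the platform-specific lookup.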

Note

A graphical interface to modify the json configuration file is planned for later versions.