===================================
Setting up recurrent routines: Jobs
===================================

Sen2Chain uses jobs to run complete processing chains (downloading L1C, computing L2A with Sen2Cor, masking clouds, and producing indices) on any tile.
All parameters of Sen2Chain's functions can be specified, so multiple products can be generated in one go.


Jobs can be launched once or at scheduled hours using crontab.


Job configuration files are stored in ``~/sen2chain_data/config/jobs/``.

Each job consists of two files:

- ``job_jid.cfg``, which configures the job
- ``job_jid.py``, created automatically once the job is configured

where *jid* is the job identifier.
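
The naming scheme above can be sketched in plain Python. This is a minimal illustration, not part of Sen2Chain: ``job_files`` is a hypothetical helper, and the default directory is simply the path given in this page.

.. code-block:: python

    from pathlib import Path

    # Directory named in this documentation; adjust it if your install differs.
    JOBS_DIR = Path.home() / "sen2chain_data" / "config" / "jobs"


    def job_files(jid, jobs_dir=JOBS_DIR):
        """Return the expected (config, script) paths for a job identifier.

        Hypothetical helper, not a Sen2Chain function.
        """
        jobs_dir = Path(jobs_dir)
        return jobs_dir / f"job_{jid}.cfg", jobs_dir / f"job_{jid}.py"


    cfg_path, py_path = job_files("test")
    # Report whether each of the two files is present on disk.
    print(cfg_path.name, cfg_path.exists())
    print(py_path.name, py_path.exists())
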


Job listing
***********

The *Jobs* class is used to list all jobs created in your Sen2Chain install.

.. code-block:: python

    >>> from sen2chain import Jobs
    >>> Jobs()
           job_id  config_file  python_script logging       timing cron_status cron_timing
    0  0123456789         True           True   False    0 5 * * *      absent        None
    1         335         True           True   False  10 10 * * *      absent        None
    2         012         True          False   False    * * * * *      absent        None
    3         tes         True          False   False    * * * * *      absent        None

Jobs can be removed with the *remove* method, using their *jid* identifier:

.. code-block:: python

    >>> Jobs().remove("335")
    10094:2022-03-17 17:03:35:INFO:sen2chain.jobs:Removing Python script...
    10094:2022-03-17 17:03:35:INFO:sen2chain.jobs:Removing config file...


Job
***

Create a new Job
-----------------

To create a new job or select an existing one, call ``Job(jid="jid")`` from a Python session:

.. code-block:: python
    
    from sen2chain import Job

    # Create the job "test", or load it if it already exists
    j = Job(jid="test")
    
    
This command creates a configuration file in ``~/sen2chain_data/config/jobs/``:

.. code-block:: text

    logs = True
    timing = 0 20 * * *
    provider = peps
    tries = 2
    sleep = 4
    nb_proc = 18
    copy_L2A_sideproducts = False
    clean_before = True
    clean_after = True

    tile;date_min;date_max;max_clouds;l1c;l2a;cloudmasks;indices;remove;comments
    40KCB;;today;80;True;True;CM004-CSH1-CMP1-CHP1-TCI1-ITER1;NDVI/NDWIGAO/MNDWI/NDRE/IRECI/BIGR/BIRNIR/BIBG/EVI/NBR;l1c/l2a;

The first section of the configuration file lists the global parameters for the job execution:

- **logs**: Enable logging for the job: True | False
- **timing**: Recurrence of the job when added to cron, in cron format
- **provider**: Provider to download L1C products from. Default: peps. Possible values: peps | scihub
- **tries**: Number of download attempts before stopping, used when requesting OFFLINE products
- **sleep**: Time in minutes to wait between download attempts
- **nb_proc**: Number of CPU cores to use for this job. Default: 8
- **copy_L2A_sideproducts**: Copy *msk_cldprb_20m* and *scl_20m* from the L2A folder to the Cloudmask folder after L2A production. Useful if you plan to remove L2A products to save disk space but want to keep these two files for cloudmask generation and better extraction. Possible values: True | False
- **clean_before / clean_after**: Clean corrupted files in the tile folder before and/or after job execution: True | False
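
To illustrate the layout of the global section, the ``key = value`` lines can be read with a few lines of plain Python. This is a minimal sketch and not Sen2Chain's own parser: ``parse_global_params`` is a hypothetical helper, and the sample text is copied from the configuration file shown earlier.

.. code-block:: python

    SAMPLE = """\
    logs = True
    timing = 0 20 * * *
    provider = peps
    tries = 2
    sleep = 4
    nb_proc = 18
    copy_L2A_sideproducts = False
    clean_before = True
    clean_after = True
    """


    def parse_global_params(text):
        """Parse 'key = value' lines into a dict with basic type conversion.

        Hypothetical helper, not a Sen2Chain function.
        """
        params = {}
        for line in text.splitlines():
            # Skip blank lines and the ';'-separated task table.
            if "=" not in line or ";" in line:
                continue
            key, _, value = line.partition("=")
            key, value = key.strip(), value.strip()
            if value in ("True", "False"):
                params[key] = value == "True"
            elif value.isdigit():
                params[key] = int(value)
            else:
                params[key] = value  # e.g. the cron string or provider name
        return params


    params = parse_global_params(SAMPLE)
    print(params["timing"], params["nb_proc"], params["logs"])

Booleans and integers are converted, while the cron expression in ``timing`` is kept as a string.
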

The second section of the configuration file is a list of tasks, each of which is processed on a single tile when the job is executed:

- **tile**: Tile identifier, two digits followed by three letters (e.g. 40KCB). Prefix the tile name with ! to comment out the line
- **date_min**: Start date for this task. Possible values: empty (2015-01-01 is used) | any date | today-xx (xx days before today)
- **date_max**: End date for this task. Possible values: empty (9999-12-31 is used) | any date | today
- **max_clouds**: Maximum cloud cover to consider when downloading images
- **l1c**: Download L1C products: True | False
- **l2a**: Compute L2A products with Sen2Chain: True | False
- **cloudmasks**: Cloudmask(s) to compute and use to mask indices. Possible values range from none (False) to multiple cloudmasks: False | CM001/CM002/CM003-PRB1-ITER5/CM004-CSH1-CMP1-CHP1-TCI1-ITER0/etc.
- **indices**: Indices to produce: False | All | NDVI/NDWIGAO/etc.
- **remove**: Remove downloaded L1C and/or produced L2A products when the task is done. Possible values: False | l1c | l2a | l1c/l2a
- **comments**: Free user comments, e.g. tile name
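
The task table described above is plain semicolon-separated text, so it can be inspected with Python's ``csv`` module. The sketch below is only an illustration and not Sen2Chain's own reader: ``parse_tasks`` is a hypothetical helper, and the second task line is made up to show a line commented out with ``!``.

.. code-block:: python

    import csv
    import io

    # First task line taken from the sample config (indices shortened);
    # the second line is an invented example of a commented-out task.
    SAMPLE_TASKS = """\
    tile;date_min;date_max;max_clouds;l1c;l2a;cloudmasks;indices;remove;comments
    40KCB;;today;80;True;True;CM004-CSH1-CMP1-CHP1-TCI1-ITER1;NDVI/NDWIGAO/MNDWI;l1c/l2a;
    !40KEC;today-10;today;50;True;True;False;False;False;disabled task
    """


    def parse_tasks(text):
        """Return (active, disabled) lists of task dicts from the task table.

        Hypothetical helper, not a Sen2Chain function.
        """
        reader = csv.DictReader(io.StringIO(text), delimiter=";")
        active, disabled = [], []
        for row in reader:
            # '/'-separated fields hold several values; "False" means none.
            for field in ("indices", "remove"):
                value = row[field]
                row[field] = [] if value in ("", "False") else value.split("/")
            if row["tile"].startswith("!"):
                row["tile"] = row["tile"][1:]  # drop the comment marker
                disabled.append(row)
            else:
                active.append(row)
        return active, disabled


    active, disabled = parse_tasks(SAMPLE_TASKS)
    for task in active:
        print(task["tile"], task["indices"], task["remove"])
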

Configure Job
---------------

To configure a job with a large number of tasks on different tiles, we recommend editing the configuration file manually, for example with ``nano ~/sen2chain_data/config/jobs/job_jid.cfg``.


Make sure to keep the same table structure. A job can also be configured from Python.


Add a task to a job config file with the *task_add()* method:

.. code-block:: python

    >>> from sen2chain import Job
    >>> j = Job(jid="jid")
    INFO:sen2chain.jobs:Reading existing config...
    >>> j.task_add()
    
Edit a task with ``task_edit(task_id, **kwargs)``:

.. code-block:: python

    >>> j.task_edit(task_id=0, tile='40KEC', remove='l1c')

Remove a task with *task_remove(task_id)*:

.. code-block:: python

    >>> j.task_remove(task_id=0)
    

Save and Launch a Job
---------------------
Save the config file to your local database. A job that has not been saved cannot be loaded again later.

.. code-block:: python

    j.save()
    
To launch a job directly from a Python session:

.. code-block:: python

    j.run()
    
    
Job in cron
-----------

You can add, deactivate and delete a job in cron with *cron_enable()*, *cron_disable()* and *cron_remove()*.

Cron will run the job at the frequency specified by the *timing* parameter (minute, hour, day of month, month, day of week) in ``job_jid.cfg``.

.. code-block:: python

    j.save()
    j.cron_enable()
    j.cron_disable()
    j.cron_remove()
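
To make the cron format of the *timing* parameter concrete, the sketch below splits an expression into its five named fields. It is a standalone illustration that does not call Sen2Chain: ``split_timing`` and the field names are hypothetical, chosen here for readability.

.. code-block:: python

    # Names for the five standard cron fields, in order.
    CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


    def split_timing(timing):
        """Map a five-field cron expression to named fields.

        Hypothetical helper, not a Sen2Chain function.
        """
        parts = timing.split()
        if len(parts) != len(CRON_FIELDS):
            raise ValueError(
                f"expected 5 cron fields, got {len(parts)}: {timing!r}"
            )
        return dict(zip(CRON_FIELDS, parts))


    # "0 20 * * *" is the timing from the sample config: every day at 20:00.
    print(split_timing("0 20 * * *"))

A ``*`` in a field means "every value", so the sample timing ``0 20 * * *`` runs the job daily at 20:00.
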