# GrAnnoT

GrAnnoT is an annotation transfer tool for pangenome graphs. It can transfer the annotation of a linear genome to a pangenome graph containing that genome, and transfer the graph's annotation to the genomes it contains. It also outputs complementary information, such as the alignments of the transferred genes or a presence-absence matrix.
This project is young and under active development: errors will be corrected, improvements will be made, and new features will appear in the near future.
## Installation

To install GrAnnoT, download a file from [/GrAnnoT/dist](/GrAnnoT/dist/) and run ```pip install downloaded_file``` in your terminal.

To get the latest version, you can also clone the GrAnnoT repository and run ```python3 -m pip install path/to/grannot/```.

GrAnnoT requires [Python 3.10](https://www.python.org/downloads/release/python-3100/) and [bedtools](https://bedtools.readthedocs.io/en/latest/content/installation.html).

GrAnnoT requires the graph to be in GFA format, with W-lines or P-lines that describe the walks or paths of the genomes. Minigraph-Cactus and VG, for example, produce this kind of output. If the graph has P-lines, the path names must follow this pattern: ```IRGSP#0#Chr1```, where the three fields are the sample name, the haplotype, and the sequence name (chromosome or contig).
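
The naming rule for P-lines can be checked before running the tool. The sketch below is not part of GrAnnoT: `parse_path_name` is a hypothetical helper, and it assumes that a valid name contains exactly three non-empty fields separated by `#`.

```python
def parse_path_name(path_name):
    """Split a P-line path name such as 'IRGSP#0#Chr1' into its three fields.

    Hypothetical helper (not part of GrAnnoT). Assumes exactly three
    non-empty '#'-separated fields: sample name, haplotype, sequence name.
    """
    fields = path_name.split("#")
    if len(fields) != 3 or not all(fields):
        raise ValueError(
            f"path name {path_name!r} does not look like sample#haplotype#sequence"
        )
    sample, haplotype, sequence = fields
    return sample, haplotype, sequence


if __name__ == "__main__":
    print(parse_path_name("IRGSP#0#Chr1"))
```

A path name that lacks one of the fields (for example a bare ```Chr1```) raises a `ValueError` in this sketch, which makes the problem visible before the graph is passed to GrAnnoT.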

The sequence names in the paths or walks (4th field of the W lines) must match the GFF sequence names (1st field of the GFF). 
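
A quick way to verify this requirement is to compare the two sets of names. The following is a minimal sketch, not GrAnnoT code: the function names are hypothetical, the GFA and GFF files are assumed to be tab-separated, and the GFF is assumed to have no embedded `##FASTA` section.

```python
def gfa_sequence_names(gfa_lines):
    """Collect sequence names from the W-lines and P-lines of a GFA file."""
    names = set()
    for line in gfa_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "W" and len(fields) >= 4:
            # W-line: the sequence name is the 4th field.
            names.add(fields[3])
        elif fields[0] == "P" and len(fields) >= 2:
            # P-line: the sequence name is the last '#'-separated part
            # of the path name (e.g. 'Chr1' in 'IRGSP#0#Chr1').
            names.add(fields[1].split("#")[-1])
    return names


def gff_sequence_names(gff_lines):
    """Collect sequence names (1st column) from a GFF file."""
    names = set()
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments, directives and blank lines
        names.add(line.split("\t", 1)[0])
    return names


def names_missing_from_graph(gfa_lines, gff_lines):
    """Return the GFF sequence names that no walk or path in the graph uses."""
    return gff_sequence_names(gff_lines) - gfa_sequence_names(gfa_lines)
```

Each function accepts any iterable of lines, so an open file handle can be passed directly, for example `names_missing_from_graph(open("graph.gfa"), open("annotation.gff"))`. An empty result means every annotated sequence has a matching name in the graph.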

The graph must be full (option ```--gfa full``` for Minigraph-Cactus).

Please note that graphs built with PGGB are not yet supported. Improvements will be made in the future to include these graphs.
You can run GrAnnoT by typing ```grannot``` in your terminal with the following arguments:
- The GFA file, with walks or paths
- The annotation file of a genome embedded in the graph
- The name of the source genome, which the annotation file refers to
- The option corresponding to the operation you want GrAnnoT to perform. For example, the option for transferring the annotation to the graph is ```-gaf```, and the option for transferring the annotation to the target genomes is ```-ann```.
- Optionally, ```-t``` followed by the names of the target genomes (these names must appear in the walks or paths of the graph). If no name is specified, the annotation is transferred to all the genomes in the graph.
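
When GrAnnoT is called from a script, the arguments above can be assembled programmatically. This is only a sketch: `build_grannot_command` is a hypothetical helper, and the argument order is assumed from the examples in this README rather than taken from GrAnnoT's source.

```python
import shlex


def build_grannot_command(gfa, annotation, source_genome,
                          operation="-ann", targets=()):
    """Assemble a grannot command line as a list of arguments.

    Hypothetical helper (not part of GrAnnoT). 'operation' is one of the
    options described above, such as '-gaf' or '-ann'; 'targets' is an
    optional sequence of target genome names passed after '-t'.
    """
    command = ["grannot", gfa, annotation, source_genome, operation]
    if targets:
        command += ["-t", *targets]
    return command


if __name__ == "__main__":
    command = build_grannot_command(
        "graph.gfa", "annotation.gff", "source_genome",
        targets=["target_genome1", "target_genome2"],
    )
    # Print the command as it would be typed in a shell.
    print(shlex.join(command))
```

The returned list can be passed to `subprocess.run(command, check=True)` to launch the transfer, provided `grannot` is installed and on the `PATH`.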

The other options can be found [here](GrAnnoT/grannot_help.md), or by running ```grannot --help```.
### Output options
GrAnnoT can output the following files:
- For the graph:

    - a GFF and a GAF file with the annotation of the graph

- For each target genome:

    - the GFF file with the annotation of the target genome
    - a text file with the details of the variations within the annotated regions
    - a text file with the alignments of the genes from the source genome and from the target genome

- For all the target genomes:

    - a presence-absence (PAV) matrix summarizing the transfers

### Example use
To transfer the annotation to the graph in GAF format: \
```grannot graph.gfa annotation.gff source_genome -gaf```

To transfer the annotation to genomes in the graph with a coverage and identity > 90: \
```grannot graph.gfa annotation.gff source_genome -ann -t target_genome1 target_genome2 -cov 90 -id 90```

To transfer the annotation to all the genomes and output a presence-absence matrix for all these transfers: \
```grannot graph.gfa annotation.gff source_genome -ann -pav```

<!-- ## Future improvements : 
- [ ] Adapt the code to handle PGGB graphs
- [ ] Allow parallelization
- [ ] Transfer multiple annotations in the graph -->

If you have any questions or suggestions, feel free to open an issue or contact me by email: nina.marthe@ird.fr

If you use this program, please cite Nina Marthe, Francois Sabot and Matthias Zytnicki.
GNU GPLv3 License