GrAnnoT
GrAnnoT is an annotation transfer tool for pangenome graphs. It can transfer linear genome annotations to a pangenome graph containing the genome, and transfer the pangenome graph's annotations to the genomes it contains. It also outputs complementary information such as the alignments of the transferred genes and a presence-absence matrix.
This project is young and under active development: errors will be corrected, improvements made, and new features added in the near future.
Installation
To install GrAnnoT, download a file from /GrAnnoT/dist and run pip install downloaded_file in your terminal.
To get the latest version, you can also clone the grannot repository and run python3 -m pip install path/to/grannot/ from your terminal.
GrAnnoT requires Python 3.10 and bedtools.
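Before installing, it can help to confirm that both requirements are available. The sketch below is not part of GrAnnoT: the function name check_requirements is hypothetical, and it assumes that Python 3.10 is a minimum version and that bedtools is expected on the PATH under that exact name.

```python
import shutil
import sys


def check_requirements(min_python=(3, 10), tool="bedtools"):
    """Return a list of problems found; the list is empty when nothing is missing.

    Assumption: 3.10 is treated as a minimum version, not an exact one.
    """
    problems = []
    # Compare only (major, minor) against the required version.
    if sys.version_info[:2] < tuple(min_python):
        found = ".".join(str(n) for n in sys.version_info[:2])
        wanted = ".".join(str(n) for n in min_python)
        problems.append(f"Python {wanted} required, found {found}")
    # shutil.which returns None when the executable is not on the PATH.
    if shutil.which(tool) is None:
        problems.append(f"{tool} not found on PATH")
    return problems


for problem in check_requirements():
    print(problem)
```

If nothing is printed, both checks passed on the current machine.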
Usage
GrAnnoT requires the graph to be in GFA format, with W-lines or P-lines that describe the walks or paths of the genomes. For example, minigraph-cactus and VG produce this kind of output. If the graph has P-lines, the path names must follow the pattern IRGSP#0#Chr1, with three fields giving the sample name, the haplotype, and the sequence name (chromosome or contig).
The sequence names in the paths or walks (4th field of the W lines) must match the GFF sequence names (1st field of the GFF).
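The naming rules above can be checked before running GrAnnoT. The following is a minimal sketch, not GrAnnoT's own code: the function names are hypothetical, and it assumes tab-separated GFA and GFF lines, with the sequence name in the 4th field of W-lines, in the third part of the P-line path name, and in the 1st field of GFF lines.

```python
def parse_path_name(name):
    """Split a P-line path name such as 'IRGSP#0#Chr1' into its three fields."""
    parts = name.split("#", 2)
    if len(parts) != 3:
        raise ValueError(f"expected sample#haplotype#sequence, got {name!r}")
    sample, haplotype, sequence = parts
    return sample, haplotype, sequence


def gfa_sequence_names(gfa_lines):
    """Collect the sequence names declared by W-lines and P-lines."""
    names = set()
    for line in gfa_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "W" and len(fields) >= 4:
            names.add(fields[3])  # 4th field of a W-line
        elif fields[0] == "P" and len(fields) >= 2:
            names.add(parse_path_name(fields[1])[2])
    return names


def gff_sequence_names(gff_lines):
    """Collect the sequence names found in the 1st field of GFF feature lines."""
    names = set()
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # an embedded FASTA section, if any, ends the features
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.split("\t", 1)[0])
    return names


# Made-up lines, for illustration only.
gfa = ["S\t1\tACGT", "W\tIRGSP\t0\tChr1\t0\t4\t>1", "P\tsampleB#0#Chr2\t1+\t*"]
gff = ["##gff-version 3", "Chr1\tsrc\tgene\t1\t4\t.\t+\t.\tID=g1"]
missing = gff_sequence_names(gff) - gfa_sequence_names(gfa)
print("GFF sequences absent from the graph:", sorted(missing))
```

Any name reported as absent from the graph points to a mismatch between the GFF and the paths or walks.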
The graph must be full (option -gfa full for minigraph-cactus).
Please note that graphs built with PGGB are not yet supported. Improvements will be made in the future to include these graphs.
You can run GrAnnoT by typing grannot in your terminal with the following arguments:
- The GFA file with walks
- The annotation file of a genome embedded in the graph
- The name of the source genome, which the annotation file refers to
- The option corresponding to the operation you want GrAnnoT to perform. For example, the option for transferring the annotation to the graph is -gaf, and the option for transferring the annotation to the target genomes is -ann.
- Optionally, -t followed by the names of the target genomes (these names must be included in the walks or paths of the graph). If no name is specified, the annotation will be transferred to all the genomes in the graph.
The other options can be found here, or by running grannot --help.
Output options
GrAnnoT can output the following files:
- For the graph:
  - a GFF and a GAF file with the annotation of the graph
- For each target genome:
  - a GFF file with the annotation of the target genome
  - a text file with the details of the variations within the annotated regions
  - a text file with the alignments of the genes from the source genome and from the target genome
- For all the target genomes:
  - a PAV matrix summarizing the transfers
Example use:
To transfer the annotation to the graph in GAF format:
grannot graph.gfa annotation.gff source_genome -gaf
To transfer the annotation to genomes in the graph with coverage and identity above 90:
grannot graph.gfa annotation.gff source_genome -ann -t target_genome1 target_genome2 -cov 90 -id 90
To transfer the annotation to all the genomes and output a presence-absence matrix for these transfers:
grannot graph.gfa annotation.gff source_genome -ann -pav
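When GrAnnoT is called from a script, the command lines above can be assembled programmatically. The sketch below is an illustration, not part of GrAnnoT: build_grannot_command and its keyword names are hypothetical, and it uses only the options shown in the examples (-gaf, -ann, -t, -cov, -id, -pav).

```python
import shlex


def build_grannot_command(gfa, gff, source, *, to_graph=False, to_genomes=False,
                          targets=(), coverage=None, identity=None, pav=False):
    """Assemble a grannot argument list from the options shown in the examples."""
    cmd = ["grannot", gfa, gff, source]
    if to_graph:
        cmd.append("-gaf")
    if to_genomes:
        cmd.append("-ann")
    if targets:
        cmd += ["-t", *targets]
    if coverage is not None:
        cmd += ["-cov", str(coverage)]
    if identity is not None:
        cmd += ["-id", str(identity)]
    if pav:
        cmd.append("-pav")
    return cmd


cmd = build_grannot_command(
    "graph.gfa", "annotation.gff", "source_genome",
    to_genomes=True, targets=["target_genome1", "target_genome2"],
    coverage=90, identity=90,
)
# shlex.join renders the list as a single shell-style command line.
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) to actually launch GrAnnoT, provided it is installed.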
Support
If you have any questions or suggestions, feel free to open an issue or contact me by email: nina.marthe@ird.fr
Authors
If you use this program, please cite Nina Marthe, Francois Sabot and Matthias Zytnicki.
GNU GPLv3 License