GrAnnoT

GrAnnoT is an annotation transfer tool for pangenome graphs. It can transfer the annotation of a genome to a pangenome graph that contains this genome, and transfer the annotation of the pangenome graph to the genomes it contains. It also outputs complementary information such as the alignments of the transferred genes and a presence-absence matrix.

This project is young and still in development: errors will be corrected, improvements will be made, and new features will appear in the near future.

Usage

GrAnnoT requires the graph to be in GFA format with W-lines or P-lines that describe the walks or paths of the genomes. Minigraph-cactus and VG, for example, produce this kind of output. If the graph has P-lines, the path names must have the form IRGSP#0#Chr1, with three fields: the genome name, the haplotype, and the chromosome/contig name. The graphs must be full (option -gfa full for minigraph-cactus). Please note that graphs built with PGGB are not supported yet; support for these graphs is planned.
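The path-name convention above can be checked before running the tool. The following is a minimal sketch, not GrAnnoT's own code: the names `PathName`, `parse_path_name` and `genomes_in_gfa` are hypothetical, and it assumes the standard GFA layout in which the second column of a W-line is the sample name and the second column of a P-line is the path name.

```python
from typing import Iterable, NamedTuple, Set


class PathName(NamedTuple):
    """The three '#'-separated fields of a P-line path name."""
    genome: str
    haplotype: str
    contig: str


def parse_path_name(name: str) -> PathName:
    """Split a path name such as 'IRGSP#0#Chr1' into its three fields."""
    fields = name.split("#")
    if len(fields) != 3 or not all(fields):
        raise ValueError(f"expected genome#haplotype#contig, got {name!r}")
    return PathName(*fields)


def genomes_in_gfa(lines: Iterable[str]) -> Set[str]:
    """Collect the genome names found in the W-lines and P-lines of a GFA."""
    genomes = set()
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 2:
            continue
        if cols[0] == "W":
            # W-line: the second column is the sample (genome) name.
            genomes.add(cols[1])
        elif cols[0] == "P":
            # P-line: the second column is the path name, which must follow
            # the genome#haplotype#contig convention.
            genomes.add(parse_path_name(cols[1]).genome)
    return genomes
```

Calling `genomes_in_gfa(open("graph.gfa"))` on a hypothetical file `graph.gfa` would list the genome names that can be used as source or target genomes, and raise a `ValueError` on the first P-line whose name does not have three fields.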

GrAnnoT also requires bedtools to be installed.
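A quick way to confirm that bedtools is reachable is to look it up on the PATH. This is a small sketch using only the Python standard library; the helper name `find_executable` is hypothetical and is not part of GrAnnoT.

```python
import shutil


def find_executable(name: str = "bedtools") -> str:
    """Return the full path of an executable found on the PATH, or raise."""
    path = shutil.which(name)
    if path is None:
        raise RuntimeError(f"{name} was not found on the PATH; please install it first")
    return path
```

Calling `find_executable()` returns the location of bedtools when it is installed and raises a `RuntimeError` otherwise.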

You can run GrAnnoT via the script main.py with the following arguments:

  • GFA file with walks
  • annotation file of a genome embedded in the graph
  • the name of the source genome, that the annotation file refers to
  • optionally, the names of the target genomes (these names must appear in the walks or paths of the graph). If no name is specified, the annotation will be transferred to all the genomes in the graph.

The other options are listed when you run "main.py -h".

GrAnnoT can output the following files:

For the graph:

  • a GFF and a GAF file with the annotation of the graph

For each target genome:

  • the GFF file with the annotation of the target genome
  • a text file with the details of the variations within the annotated regions
  • a text file with the alignments of the genes from the source genome and from the target genome

For all the target genomes:

  • a presence-absence variation (PAV) matrix summarizing the transfers

Support

If you have any questions or suggestions, feel free to contact me by email: nina.marthe@ird.fr

Authors

If you use this program, please cite Nina Marthe, Francois Sabot and Matthias Zytnicki.

MIT License