# GrAnnoT

GrAnnoT is an annotation transfer tool for pangenome graphs. It can transfer genome annotations to a pangenome graph containing the genome, and also transfer the pangenome graph's annotations to the genomes it contains. It also outputs complementary information such as the alignments of the transferred genes or a presence-absence matrix.

This project is young and still in development: errors will be corrected, improvements will be made, and new features will appear in the near future.
## Usage
GrAnnoT requires the graph to be in GFA format with W-lines or P-lines that describe the walks or paths of the genomes. Tools such as minigraph-cactus and VG produce this kind of output. If the graph has P-lines, the path names must have the form IRGSP#0#Chr1, with three fields: the genome name, the haplotype, and the chromosome/contig name. The graphs must be full (option -gfa full for minigraph-cactus).
Please note that graphs built with PGGB are not supported yet. Improvements will be made in the future to include these graphs.
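
Before running GrAnnoT, it can help to check that a graph meets the requirements above. The following is a minimal sketch, not part of GrAnnoT: the function names `parse_path_name` and `summarize_gfa` are hypothetical, and the column positions assume the standard GFA layout (path name in the second column of a P-line, sample name in the second column of a W-line).

```python
def parse_path_name(name):
    """Split a P-line path name such as 'IRGSP#0#Chr1' into its three fields."""
    fields = name.split("#")
    if len(fields) != 3 or not all(fields):
        raise ValueError(f"expected genome#haplotype#contig, got {name!r}")
    genome, haplotype, contig = fields
    return genome, haplotype, contig


def summarize_gfa(lines):
    """Count W- and P-lines and collect P-line names that lack the three-field form."""
    summary = {"walks": 0, "paths": 0, "genomes": set(), "bad_path_names": []}
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if cols[0] == "W" and len(cols) >= 4:
            summary["walks"] += 1
            summary["genomes"].add(cols[1])  # sample name of the walk
        elif cols[0] == "P" and len(cols) >= 2:
            summary["paths"] += 1
            try:
                genome, _, _ = parse_path_name(cols[1])
                summary["genomes"].add(genome)
            except ValueError:
                summary["bad_path_names"].append(cols[1])
    return summary
```

Calling `summarize_gfa(open("graph.gfa"))` (with a placeholder file name) returns the number of walks and paths, the genome names found, and any P-line names that would need renaming. A graph with neither walks nor paths, or with entries in `bad_path_names`, does not match the input described above.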

GrAnnoT also requires bedtools to be installed.
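
One way to confirm that bedtools is available is to look it up on the PATH. This is a small sketch using only the Python standard library; the helper name `find_bedtools` is hypothetical and GrAnnoT itself may detect bedtools differently.

```python
import shutil


def find_bedtools(executable="bedtools"):
    """Return the full path of the executable, or None if it is not on PATH."""
    return shutil.which(executable)


if __name__ == "__main__":
    path = find_bedtools()
    if path is None:
        print("bedtools was not found on PATH; install it before running GrAnnoT")
    else:
        print(f"bedtools found at {path}")
```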


You can run GrAnnoT via the script main.py with the following arguments:
- a GFA file with walks or paths
- an annotation file for a genome embedded in the graph
- the name of the source genome that the annotation file refers to
- optionally, the names of the target genomes (these names must be included in the walks or paths of the graph). If no name is specified, the annotation will be transferred to all the genomes in the graph.

The other options are listed when you run "main.py -h".

It can output the following files:
- a GFF and a GAF file with the annotation of the graph

For each target genome:
- a GFF file with the annotation of the target genome
- a text file with the details of the variations within the annotated regions
- a text file with the alignments of the genes from the source genome and from the target genome

For all the target genomes:

- a PAV (presence-absence variation) matrix summarizing the transfers
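
To illustrate what a presence-absence matrix conveys, here is a minimal sketch that builds one from sets of transferred gene IDs. It is an assumption-laden illustration, not GrAnnoT's code: the function name `build_pav_matrix`, its inputs, and the 1/0 row layout are hypothetical, and the file GrAnnoT writes may be organized differently.

```python
def build_pav_matrix(source_genes, transferred):
    """Return a header row plus one row per gene: 1 if present in a genome, 0 if absent.

    source_genes: gene IDs from the source annotation (hypothetical input).
    transferred: dict mapping each target genome name to the set of gene IDs
                 that were transferred to it (hypothetical input).
    """
    genomes = sorted(transferred)
    rows = [["gene"] + genomes]
    for gene in source_genes:
        rows.append([gene] + [1 if gene in transferred[g] else 0 for g in genomes])
    return rows
```

Each row then shows at a glance which target genomes carry a given source gene.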

If you have any questions or suggestions, feel free to contact me by email: nina.marthe@ird.fr

If you use this program, please cite Nina Marthe, Francois Sabot and Matthias Zytnicki.
MIT License