Commit 07284910 authored by nina.marthe_ird.fr

updated readme and added a doc file in md

parent b3c12d0f
# GrAnnoT usage
To run GrAnnoT, your command line must look like this : ```grannot graph.gfa annotation.gff source_genome [options]```
### Positional arguments
GrAnnoT has 3 positional arguments :
- A pangenome graph in GFA format.
- The annotation of a genome included in the graph in GFF/GTF format.
- The name of the annotated genome.
The graph must have W or P lines with no overlap between the segments. If the graph has P-lines, the names of the paths must look like this : ```IRGSP#0#Chr1```, with three fields being the genome name, the haplotype, and the chromosome/contig name.\
The name of the annotated genome must be in the second field of the W line, or in the first part of the second field of the P line (see [GFA format specification](https://gfa-spec.github.io/GFA-spec/GFA1.html)).
If the annotation file refers to a specific haplotype of your source genome, use the option ```-sh``` or ```--source_haplotype``` to specify the name of the haplotype to consider as the source of the annotation. This option requires the option ```-ht``` or ```--haplotype```.
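To check that a graph follows this naming before running GrAnnoT, the genome name can be read from each W or P line. The following is only a minimal sketch and not GrAnnoT's own parser: the function name `parse_gfa_path_id` and the example lines are hypothetical, and it assumes tab-separated GFA lines with `#` as the separator in P-line names.

```python
from typing import NamedTuple, Optional


class PathId(NamedTuple):
    genome: str
    haplotype: str
    contig: str


def parse_gfa_path_id(line: str) -> Optional[PathId]:
    """Return the genome, haplotype and contig named by a GFA W or P line."""
    fields = line.rstrip("\n").split("\t")
    if fields[0] == "W" and len(fields) >= 4:
        # W <sample> <haplotype index> <sequence id> <start> <end> <walk>
        return PathId(fields[1], fields[2], fields[3])
    if fields[0] == "P" and len(fields) >= 2:
        # P <path name> <segments> <overlaps>; the name holds three '#'-separated fields
        parts = fields[1].split("#")
        if len(parts) != 3:
            raise ValueError(f"unexpected path name: {fields[1]!r}")
        return PathId(*parts)
    return None  # not a walk or path line


# Hypothetical example lines, made up for illustration
w_line = "W\tIRGSP\t0\tChr1\t0\t12\t>s1>s2"
p_line = "P\tIRGSP#0#Chr1\ts1+,s2+\t*"
print(parse_gfa_path_id(w_line))
print(parse_gfa_path_id(p_line))
```

Both example lines give the genome name ```IRGSP```, which is the value to pass as the third positional argument.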
### Graph annotation transfer
To transfer annotations on the graph, you have 2 options :
- ```-gff``` or ```--graph_gff``` : outputs the annotation of the graph in GFF format. Not recommended.
- ```-gaf``` or ```--graph_gaf``` : outputs the annotation of the graph in [GAF](https://github.com/lh3/gfatools/blob/master/doc/rGFA.md#the-graph-alignment-format-gaf) format.
### Genome annotation transfer
To transfer annotations to the chosen genomes, you must use the option ```-ann``` or ```--annotation```.
To choose where the annotations will be transferred, use these options:
- ```-t``` or ```--target``` : this option can take several arguments if you want to transfer to several genomes. If the option is not specified, GrAnnoT will transfer the annotation to all the genomes in the graph.
- ```-ht``` or ```--haplotype``` : by default GrAnnoT doesn't look at the haplotype field in the walks or paths. With this option the annotation will be transferred separately to the different haplotypes of the target genomes.
Additional files that give information about the transfer can be obtained with these options :
- ```-aln``` or ```--alignment``` : outputs the alignments of the annotated features in the source genome and in a target genome.
- ```-var``` or ```--variation``` : outputs the detail of the variations between the features in the source genome and in the target genome.
- ```-pav``` or ```--pav_matrix``` : outputs a presence-absence variation matrix to recapitulate the annotation transfers. Requires at least one option among ```-ann```, ```-aln``` and ```-var```.
<!-- could change the -pav requirements -->
You can filter the annotations you transfer with these options :
- ```-cov``` or ```--coverage``` : specify the minimum requested coverage percentage between the original feature and the feature found in the target genome. Default is 80.
- ```-id``` or ```--identity``` : specify the minimum requested sequence identity percentage between the original feature and the feature found in the target genome. Default is 80.
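To illustrate how the two thresholds and the presence-absence matrix relate, here is a small sketch. It is not GrAnnoT's implementation or output format: the record layout, the function names and the choice to keep a transfer when both percentages are greater than or equal to the minimum are all assumptions made for the example. Only the default value of 80 comes from the options above.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical record: (feature id, target genome, coverage %, identity %)
Transfer = Tuple[str, str, float, float]


def keep_transfer(coverage: float, identity: float,
                  min_cov: float = 80.0, min_id: float = 80.0) -> bool:
    """Assumed rule: keep a transfer when both percentages reach their minimum."""
    return coverage >= min_cov and identity >= min_id


def pav_matrix(transfers: Iterable[Transfer], genomes: List[str],
               min_cov: float = 80.0, min_id: float = 80.0) -> Dict[str, List[int]]:
    """Map each feature to a row of 1 (kept) or 0 (absent or filtered), one per genome."""
    matrix: Dict[str, List[int]] = {}
    for feature, genome, cov, ident in transfers:
        row = matrix.setdefault(feature, [0] * len(genomes))
        if genome in genomes and keep_transfer(cov, ident, min_cov, min_id):
            row[genomes.index(genome)] = 1
    return matrix


# Made-up values for illustration only
transfers = [
    ("gene1", "target_genome1", 95.0, 99.0),
    ("gene1", "target_genome2", 85.0, 70.0),  # identity below the default of 80
    ("gene2", "target_genome2", 100.0, 100.0),
]
genomes = ["target_genome1", "target_genome2"]
print(pav_matrix(transfers, genomes))
# prints {'gene1': [1, 0], 'gene2': [0, 1]}
```

Raising the thresholds turns more cells to 0, which is why the matrix depends on the ```-cov``` and ```-id``` values used for the run.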
### General options
- ```-coord``` or ```--segment_coordinates_path``` : specify the path of the directory ```seg_coord/``` made by GrAnnoT in a previous run. Use this only if you run GrAnnoT multiple times on the same graph for the same genomes. This directory must contain bed files for all the walks or paths of the genomes you need (source and target), and the files ```segments.txt``` and ```walks.txt```. If not specified GrAnnoT will recreate these files.
- ```-o``` or ```--outdir``` : stores all the output files in the given directory. By default they will be stored in the current directory.
- ```-v``` or ```--verbose``` : makes GrAnnoT more verbose.
- ```-V``` or ```--version``` : displays GrAnnoT version and exits.
- ```-h``` or ```--help``` : displays a help message and exits.
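When GrAnnoT is run several times on the same graph, the command line can be assembled from a script. The helper below is a sketch under stated assumptions: `grannot_command` is a hypothetical function, not part of GrAnnoT, and the directory names `results` and `results/seg_coord` are placeholders. It only uses the flags listed in this document and builds the argument list without running anything.

```python
import shlex
from typing import List, Optional, Sequence


def grannot_command(graph: str, annotation: str, source: str,
                    targets: Sequence[str] = (),
                    outdir: Optional[str] = None,
                    seg_coord: Optional[str] = None,
                    extra: Sequence[str] = ()) -> List[str]:
    """Assemble a grannot argument list from the options documented above."""
    # Positional arguments first, then the operation and filter flags
    cmd = ["grannot", graph, annotation, source, *extra]
    if targets:
        cmd += ["-t", *targets]
    if outdir:
        cmd += ["-o", outdir]
    if seg_coord:
        # Reuse the seg_coord/ directory written by a previous run
        cmd += ["-coord", seg_coord]
    return cmd


cmd = grannot_command(
    "graph.gfa", "annotation.gff", "source_genome",
    targets=["target_genome1", "target_genome2"],
    outdir="results",                # placeholder directory
    seg_coord="results/seg_coord",   # placeholder path to an earlier seg_coord/
    extra=["-ann", "-cov", "90", "-id", "90"],
)
print(shlex.join(cmd))
```

With GrAnnoT installed, the returned list could be passed to ```subprocess.run(cmd, check=True)```.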
# GrAnnoT
GrAnnoT is an annotation transfer tool for pangenome graphs. It can transfer linear genome annotations to a pangenome graph containing the genome, and also transfer the pangenome graph's annotations to the genomes it contains. It also outputs complementary information such as the alignments of the transferred genes, or a presence-absence matrix.
This project is young and in development: some errors will be corrected, improvements will be made and new features will appear in the near future.
## Installation
To install GrAnnoT, download a file from [/GrAnnoT/dist](/GrAnnoT/dist/) and run ```pip install downloaded_file``` in your terminal.
GrAnnoT requires [python 3.10](https://www.python.org/downloads/release/python-3100/) and [bedtools](https://bedtools.readthedocs.io/en/latest/content/installation.html).
## Usage
GrAnnoT requires the graph to be in GFA format with W-lines or P-lines that describe the walks or paths of the genomes. For example minigraph-cactus or VG give that kind of output. If the graph has P-lines, the names of the paths must look like this : ```IRGSP#0#Chr1```, with three fields being the genome name, the haplotype, and the chromosome/contig name. The graph must be full (option ```-gfa full``` for minigraph-cactus).\
Please note that graphs built with PGGB are not yet supported. Improvements will be made in the future to include these graphs.
You can run GrAnnoT by typing ```grannot``` in your terminal with the following arguments :
- The GFA file with walks
- The annotation file of a genome embedded in the graph
- The name of the source genome that the annotation file refers to
- Optionally, the names of the target genomes, given with ```-t``` (these names must be included in the walks or paths of the graph). If no name is specified, the annotation will be transferred to all the genomes in the graph.
- The option corresponding to the operation you want GrAnnoT to do. For example the option for transferring the annotation to the graph is ```-gaf``` and the option for transferring the annotation to the target genomes is ```-ann```.
The other options can be found [here](GrAnnoT/grannot_help.md), or by running ```grannot --help```.
### Output options
GrAnnoT can output the following files :
- For the graph :
  - a GFF and a GAF file with the annotation of the graph
- For each target genome :
  - the GFF file with the annotation of the target genome
  - a text file with the details of the variations within the annotated regions
  - a text file with the alignments of the genes from the source genome and from the target genome
- For all the target genomes :
  - a PAV matrix recapitulating the transfers
### Example use :
To transfer the annotation to the graph in GAF format : \
```grannot graph.gfa annotation.gff source_genome -gaf```

To transfer the annotation to genomes in the graph with a minimum coverage and identity of 90 : \
```grannot graph.gfa annotation.gff source_genome -ann -t target_genome1 target_genome2 -cov 90 -id 90```

To transfer the annotation to all the genomes and output a presence-absence matrix for all these transfers : \
```grannot graph.gfa annotation.gff source_genome -ann -pav```

<!-- ## Future improvements :
- [ ] Adapt the code to handle PGGB graphs
- [ ] Allow parallelization
- [ ] Transfer multiple annotations in the graph -->
## Support
......