Commit 2ac24a64 authored by christine.tranchant_ird.fr

Integrating plot ref

parent e96b93da
%% Cell type:code id:f0516f8e tags:
``` python
out_dir = '/scratch/tranchant/rice-output'
fastq_dir = '/scratch/tranchant/data_test/fastq'
group_file = '/scratch/tranchant/data_test/rice_group.txt'
ref_file = '/scratch/tranchant/data_test/ref.fasta'
vec_file = '/scratch/tranchant/data_test/bank/UniVec_Core'
ref_png = '/scratch/tranchant/rice-output/04-stats/04-plots/00_ref.png'
ref_csv = '/scratch/tranchant/rice-output/04-stats/04-summary/00_ref.txt'
```
%% Cell type:markdown id: tags:
***
[<img src="Images/up-arrow.png" alt="Top" width=2% align="right">](#home "Go back to the top")
# <span style="color: #3987C4;">I - Workflow configuration <a class="anchor" id="workflow"></a></span>
### <span style="color: #919395"> _Parameters_ <a class="anchor" id="configinput"></a></span>
%% Cell type:code id: tags:
``` python
# Echo the parameters defined in the first cell (project_name is not defined there)
print(out_dir, ref_file, vec_file, group_file, fastq_dir)
```
%% Cell type:markdown id: tags:
### <span style="color: #919395">_Preparing Genome Reference for next analysis_</span>
#### __Genome indexation__ and __Genome dashboard__
This step runs `bwa index` if the index files are absent. Indexing is required before mapping reads against the reference genome.
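The "index only if absent" check can be sketched as follows. This is a minimal illustration, not the workflow's own code: the helper names `missing_bwa_index_files` and `index_reference_if_needed` are hypothetical, and the sketch assumes classic `bwa index`, which writes `.amb`, `.ann`, `.bwt`, `.pac` and `.sa` files next to the FASTA file.

```python
import subprocess
from pathlib import Path

# Files written by classic `bwa index` next to the reference FASTA.
BWA_INDEX_SUFFIXES = (".amb", ".ann", ".bwt", ".pac", ".sa")


def missing_bwa_index_files(ref_fasta):
    """Return the expected index files that do not exist yet (hypothetical helper)."""
    candidates = [Path(str(ref_fasta) + suffix) for suffix in BWA_INDEX_SUFFIXES]
    return [path for path in candidates if not path.exists()]


def index_reference_if_needed(ref_fasta, run=subprocess.run):
    """Launch `bwa index` only when at least one index file is missing.

    `run` is injectable so the decision logic can be checked without bwa installed.
    Returns True if indexing was launched, False if the index was already complete.
    """
    if not missing_bwa_index_files(ref_fasta):
        return False
    run(["bwa", "index", str(ref_fasta)], check=True)
    return True
```

In this notebook the call would be `index_reference_if_needed(ref_file)`, with `ref_file` taken from the parameters cell.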
%% Cell type:code id: tags:
``` python
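import sys

# Make the workflow's helper scripts importable (path is specific to the author's machine)
sys.path.append("/home/christine/Documents/Dev/frangiPANe_snake/workflow")
from scripts import generate_stats as gs

# Reference dashboard, using the plot and summary paths set in the parameters cell
gs.dashboard_genome2("400", ref_png, ref_csv)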
```
%% Cell type:markdown id: tags:
### <span style="color: #919395">_Analyzing Group File_</span>
%% Cell type:code id: tags:
``` python
# Reading group file
id_dict, df_group = read_group_file(group_file.value,logger)
# Group file dashboard
dashboard_group(df_group)
bgc('LightBlue')
```
%% Output
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-1-e7c314ec7bb8> in <module>
1 # Reading group file
----> 2 id_dict, df_group = read_group_file(group_file.value,logger)
3
4 # Group file dashboard
5 dashboard_group(df_group)
NameError: name 'read_group_file' is not defined
%% Cell type:markdown id: tags:
***
[<img src="Images/up-arrow.png" alt="Top" width=2% align="right">](#home "Go back to the top")
# <span style="color: #3987C4;">II - frangiPANe Workflow <a class="anchor" id="workflow"></a></span>
### <span style="color: #919395"> _1 - Stats about raw data (fastq files)_</span>
#### __Generating fastq statistics with `fastq_stats`__
After this analysis, several files are saved in the `00_fastq_stats` directory:
* one file (fastq-stat) per fastq file
* one file with all stats: all_fastq-stats.csv
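To give an idea of the kind of per-file numbers involved, here is a small sketch that counts reads and bases in one FASTQ file and derives an approximate sequencing depth. It does not reproduce the output of `fastq_stats`; the function names are hypothetical, and the sketch assumes standard 4-line FASTQ records (no wrapped sequences).

```python
import gzip


def fastq_basic_stats(path):
    """Count reads and bases in a FASTQ file, plain text or gzip-compressed."""
    opener = gzip.open if str(path).endswith(".gz") else open
    n_reads = 0
    n_bases = 0
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # second line of each 4-line record holds the sequence
                n_reads += 1
                n_bases += len(line.strip())
    mean_length = n_bases / n_reads if n_reads else 0.0
    return {"reads": n_reads, "bases": n_bases, "mean_length": mean_length}


def estimated_depth(n_bases, genome_size):
    """Approximate sequencing depth: sequenced bases divided by genome size."""
    return n_bases / genome_size
```

The depth estimate is one reason a dashboard over raw data would need the total genome size in addition to the per-file statistics.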
%% Cell type:code id: tags:
``` python
#Raw data dashboard
dashboard_fastq(fastqstat_csv,total_genome_size,df_group)
```
%% Cell type:markdown id: tags:
### <span style="color: #919395">_2 - Mapping the individuals reads against the reference genome_ <a class="anchor" id="mapping"></a></span>
%% Cell type:markdown id: tags:
#### __Generating mapping stats__ <a class="anchor" id="mappingstat"></a>
Statistics are generated by `samtools flagstat` and saved in the _stat_ subdirectory of _01_mapping-against_reference_:
* One flagstat file is generated for each BAM file (http://www.htslib.org/doc/samtools-flagstat.html).
* The _all_flagstat.csv_ file compiles all the stats.
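As an illustration of how one flagstat file could be turned into values suitable for a compiled table, the sketch below parses the default text output of `samtools flagstat` (lines of the form `N + M label`). The function names are hypothetical, and how the workflow itself builds _all_flagstat.csv_ is not shown here.

```python
import re

# Each flagstat line looks like: "<QC-passed> + <QC-failed> <label>"
_LINE = re.compile(r"^(\d+) \+ (\d+) (.+)$")
# Strip trailing percentages such as "(95.00% : N/A)" and the note on the first line,
# but keep qualifiers such as "(mapQ>=5)" that are part of the label.
_TRAILER = re.compile(
    r"\s*\((?:[^()]*%[^()]*|N/A : N/A|QC-passed reads \+ QC-failed reads)\)$"
)


def parse_flagstat(text):
    """Map each flagstat label to a (qc_passed, qc_failed) tuple of read counts."""
    stats = {}
    for line in text.splitlines():
        match = _LINE.match(line.strip())
        if not match:
            continue
        passed, failed, label = match.groups()
        stats[_TRAILER.sub("", label)] = (int(passed), int(failed))
    return stats


def mapping_rate(stats):
    """Fraction of QC-passed reads that are mapped (0.0 when there are no reads)."""
    total = stats["in total"][0]
    return stats["mapped"][0] / total if total else 0.0
```

Parsing one file per BAM and collecting the resulting dictionaries row by row is one simple way to obtain a per-sample summary table.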
%% Cell type:code id: tags:
``` python
### Dashboard
dashboard_flagstat(stat_file,df_group)
bgc('LightBlue')
```
%% Cell type:markdown id: tags:
[<img src="Images/up-arrow.png" alt="Top" width=2% align="right">](#home "Go back to the top")
### <span style="color: #919395">3 - Assembly of the individuals' reads that do not map (properly) on the reference genome <a class="anchor" id="assembly"></a></span>
%% Cell type:code id: tags:
``` python
dashboard_ab(stat_len,stats_N,stats_L,output_assembly_testplots)
bgc('LightBlue')
```
%% Cell type:markdown id: tags:
#### __Assembly step 2 : assembly with the final k value__
### Running ABySS for each individual
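When assemblies built with different k values are compared, contiguity metrics such as N50 and L50 are commonly used. The sketch below computes both from a list of contig lengths; it is a generic illustration, and whether the workflow selects its final k value with exactly these metrics is an assumption.

```python
def n50_l50(contig_lengths):
    """Return (N50, L50) for a list of contig lengths.

    N50 is the length of the contig at which the cumulative length, counted
    from the longest contig down, first reaches half of the assembly size.
    L50 is the number of contigs needed to reach that point.
    """
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for count, length in enumerate(lengths, start=1):
        running += length
        if running >= half:
            return length, count
    return 0, 0  # empty assembly
```

For example, contigs of 100, 80, 60, 40 and 20 bp total 300 bp; the two longest contigs cover 180 bp, which passes the 150 bp halfway mark, so N50 is 80 and L50 is 2.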
%% Cell type:code id: tags:
``` python
dashboard_assembly(stat_file,df_group)
```
%% Cell type:markdown id: tags:
[<img src="Images/up-arrow.png" alt="Top" width=2% align="right">](#home "Go back to the top")
### <span style="color: #919395"> 4 - Removing contamination<a class="anchor" id="contamination"></a></span>
#### __VecScreen__
%% Cell type:code id: tags:
``` python
dashboard_ass(final_stat_file,df_group)
bgc('LightBlue')
```
%% Cell type:markdown id: tags:
[<img src="Images/up-arrow.png" alt="Top" width=2% align="right">](#home "Go back to the top")
### <span style="color: #919395"> 5 - Reducing Sequence Redundancy<a class="anchor" id="redundancy"></a></span>
frangiPANe uses CD-HIT to cluster sequences and reduce sequence redundancy (inter- and intra-species).
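To see what the clustering produces, the `.clstr` file written by CD-HIT can be read back: each cluster starts with a `>Cluster N` line, and the representative sequence is marked with `*`. The sketch below is a minimal parser under that assumption; `parse_clstr` is a hypothetical helper, not part of frangiPANe.

```python
import re

# Member line, e.g. "0\t1500nt, >contigA... *" or "1\t1480nt, >contigB... at +/99.10%"
_MEMBER = re.compile(r"^\d+\s+(\d+)(?:nt|aa), >(.+?)\.\.\. (\*|at .+)$")


def parse_clstr(text):
    """Return one dict per cluster with its representative and (name, length) members."""
    clusters = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">Cluster"):
            clusters.append({"representative": None, "members": []})
            continue
        match = _MEMBER.match(line)
        if match and clusters:
            length, name, status = match.groups()
            clusters[-1]["members"].append((name, int(length)))
            if status == "*":  # CD-HIT marks the representative sequence with "*"
                clusters[-1]["representative"] = name
    return clusters
```

The number of clusters and the size of each one give a quick measure of how much redundancy was removed.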
%% Cell type:code id: tags:
``` python
#Dashboard
dashboard_cdhit(df_cdhit)
bgc('LightBlue')
```
%% Cell type:markdown id: tags:
[<img src="Images/up-arrow.png" alt="Top" width=2% align="right">](#home "Go back to the top")
### <span style="color: #919395"> 6 - Anchoring Clusters on Reference Genome<a class="anchor" id="anchoring"></a></span>
#### __Generating panreference__
%% Cell type:code id: tags:
``` python
dashboard_flagstat(stat2_file,df_group)
bgc('LightBlue')
```
%% Cell type:markdown id: tags:
#### __Panreference dashboard__
%% Cell type:code id: tags:
``` python
dashboard_anchoring(cdhit_fasta,panref_keep_file,panref_bed_file, output_dir, anc_stat_dict)
bgc('LightBlue')
```