'Initial' commit for these copy, after reached the free storage limit of 10 GB...

'Initial' commit for these copy, after reached the free storage limit of 10 GB on main first one... :s

'Initial' commit for these copy, after reached the free storage limit of 10 GB...
289f33c5 · Nicolas FERNANDEZ NUÑEZ · 289f33c5 · 289f33c5 · 289f33c5 · 289f33c5
Commit 289f33c5 authored 3 years ago by Nicolas FERNANDEZ NUÑEZ
--- a/.gitignore
+++ b/.gitignore
+*~
+*#
+.DS_Store
+#.snakemake
+*.temp
+*.fastq
+*.fastq.gz
+*.done
+*.pdf
+*.txt
+/results
\ No newline at end of file
--- a/README.md
+++ b/README.md
+# RQCP: Reads Quality Control Pipeline #
+## Description ##
+RQCP check NGS (illumina) reads quality and clean it if needed, as you set, using: 
+- Cutadapts to trim NGS sequencing adapters  
+- Sickle-trim to trim reads on base-calling quality score  
+- Fastq-join to join mates reads (forward R1 and Reverse R2) when it's possible  
+- FastQC to check global quality
+- FastqScreen to check putative contamination(s)
+- MultiQC to generate HTML reports  
+## Badges ##
+![Maintener](<https://badgen.net/badge/Maintener/Nicolas Fernandez/blue?scale=0.9>)
+![MacOS](<https://badgen.net/badge/icon/Hight Sierra (10.13),Catalina (10.15),Big Sure (11)/cyan?icon=apple&label&list=|&scale=0.9>)
+![Issues closed](<https://badgen.net/badge/Issues closed/0/green?scale=0.9>)
+![Issues opened](<https://badgen.net/badge/Issues opened/0/yellow?scale=0.9>)
+![Maintened](<https://badgen.net/badge/Maintened/No/red?scale=0.9>)
+![Wiki](<https://badgen.net/badge/icon/Wiki/pink?icon=wiki&label&scale=0.9>)
+![Open Source](<https://badgen.net/badge/icon/Open Source/purple?icon=https://upload.wikimedia.org/wikipedia/commons/4/44/Corazón.svg&label&scale=0.9>)
+![GNU AGPL v3](<https://badgen.net/badge/Licence/GNU AGPL v3/grey?scale=0.9>)
+![Bash](<https://badgen.net/badge/icon/Bash 3.2.57/black?icon=terminal&label&scale=0.9>)
+![Python](<https://badgen.net/badge/icon/Python 3.8.7/black?icon=https://upload.wikimedia.org/wikipedia/commons/0/0a/Python.svg&label&scale=0.9>)
+![Snakemake](<https://badgen.net/badge/icon/Snakemake 5.11.2/black?icon=https://upload.wikimedia.org/wikipedia/commons/d/d3/Python_icon_%28black_and_white%29.svg&label&scale=0.9>)
+![Conda](<https://badgen.net/badge/icon/Conda 4.10.3/black?icon=codacy&label&scale=0.9>)
+## Visuals ##
+_Good idea to include screenshots or GIFs (see ttygif or Asciinema)_  
+## Installation ##
+### Command-Line ###
+Clone with SSH when you want to authenticate only one time  
+`git clone git@gitlab.com:ird_transvihmi/Reads_Quality_Control_Pipeline.git`  
+Authenticate with GitLab by following the instructions in the [SSH documentation] (https://docs.gitlab.com/ee/ssh/index.html)  
+Clone with HTTPS when you want to authenticate each time you perform an operation between your computer and GitLab  
+`git clone https://gitlab.com/ird_transvihmi/Reads_Quality_Control_Pipeline.git`  
+### Web-Interface ###
+Download reads_quality_control project directory  
+![Image of download option](./Download_screen.png)  
+### Difference between download and clone ###
+To create a copy of a remote repository’s files on your computer, you can either download or clone the repository  
+If you download it, you cannot sync the repository with the remote repository on GitLab  
+Cloning a repository is the same as downloading, except it preserves the Git connection with the remote repository  
+You can then modify the files locally and upload the changes to the remote repository on GitLab  
+## Usage ##
+Make sure your bash script is executable  
+_Edit config.yaml file on config directory (option)_  
+_Edit fastq-screen config file on config directory (option)_  
+Run RQC.sh bash script by double-clicking on it  
+###  manual ###
+_to do quick !_
+## Support ##
+1. RTFM! (Read The Fabulous Manual! ^^.)
+2. Read de awsome wiki ;)
+3. Create a new issue: Issues > New issue > Describe your issue
+4. Send an email to [nicolas.fernandez@ird.fr](url)
+5. Call me to `+33.(0)4.67.41.55.°°` ; No, don't please ! O_o 
+## Roadmap ##
+Add new features    
+## Contributing ##
+Open to contributions :)  
+Testing code, finding issues, asking for update, proposing new features ...  
+Use Git tools to share !  
+## Authors and acknowledgment ##
+- Nicolas Fernandez (Maintener)  
+- Christelle Butel for testing all versions of this script  
+## License ##
+GPLv3 : https://www.gnu.org/licenses/gpl-3.0.html  
+## Project status ##
+I'm out of time for this project, development has slowed down, close to stopped completely  
+You can be volunteer to step in as a maintainer ;)  
+Or choose to fork this project allowing this project to keep going !  
+For information :  
+- **Guests** are _not active contributors_ in private projects, they can only see, and leave comments and issues.  
+- **Reporters** are _read-only contributors_, they can't write to the repository, but can on issues.  
+- **Developers** are _direct contributors_, they have access to everything to go from idea to production, unless something has been explicitly restricted.  
+- **Maintainers** are _super-developers_, they are able to push to master, deploy to production. This role is often held by maintainers and engineering managers.  
+- **Owners** are essentially _group-admins_, they can give access to groups and have destructive capabilities.  
+## Project status
+If you have run out of energy or time for your project, put a note at the top of the README saying that development has slowed down or stopped completely. Someone may choose to fork your project or volunteer to step in as a maintainer or owner, allowing your project to keep going. You can also make an explicit request for maintainers.
--- a/RQCP.sh
+++ b/RQCP.sh
+#!/bin/bash
+##### About #####
+echo ""
+echo "##### ABOUT #####"
+echo "-----------------"
+echo "Name: Reads Quality Control Pipeline"
+echo "Author: Nicolas Fernandez"
+echo "Affiliation: IRD_U233_TransVIHMI"
+echo "Aim: Reads Quality Control and Cleaning"
+echo "Date: 2021.04.30"
+echo "Run: snakemake --use-conda -s readsquality_control_pipeline.smk --cores"
+echo "Latest modification: 2021.10.05"
+echo "Todo: na"
+echo "________________________________________________________________________"
+###### Hardware check #####
+echo ""
+echo "##### HARDWARE #####"
+echo "--------------------"
+physicalcpu=$(sysctl -n hw.physicalcpu)   # Get physical cpu
+echo "Physical CPU: ${physicalcpu}"       # Print physical cpu
+logicalcpu=$(sysctl -n hw.logicalcpu)     # Get logical cpu
+echo "Logical CPU: ${logicalcpu}"         # Print logical cpu
+hwmemsize=$(sysctl -n hw.memsize)         # Get memory size
+ramsize=$(expr $hwmemsize / $((1024**3))) # 1024**3 = GB
+echo "System Memory: ${ramsize} GB"       # Print RAM size
+echo "________________________________________________________________________"
+###### Working directory
+echo ""
+echo "##### WORKING DIRECTORY #####"
+echo "-----------------------------"
+WORKDIR=${0%/*}
+echo "CWD: ${WORKDIR}"
+echo "________________________________________________________________________"
+###### Rename samples, comment first line if you want do keep barcode-ID in name
+rename -v 's/_S\d+_/_/' ${WORKDIR}/resources/reads/*.fastq.gz 2> /dev/null                # Remove barcode-ID like {_S001_}
+rename -v 's/_L\d+_/_/' ${WORKDIR}/resources/reads/*.fastq.gz 2> /dev/null                # Remove line-ID ID like {_L001_}
+rename -v 's/_\d+.fastq.gz/.fastq.gz/' ${WORKDIR}/resources/reads/*.fastq.gz 2> /dev/null # Remove end-name ID like {_001}.fastq.gz
+###### Create links for rax samples
+mkdir ${WORKDIR}/results/ ${WORKDIR}/results/reads/ ${WORKDIR}/results/reads/raw/ 2> /dev/null
+for FASTQ in ${WORKDIR}/resources/reads/*.fastq.gz ; do ln -s $FASTQ ${WORKDIR}/results/reads/raw/$(basename $FASTQ) 2> /dev/null ; done # Create Symbol links
+##### Create dir for QC
+mkdir ${WORKDIR}/results/reads/cutadapt/ ${WORKDIR}/results/reads/sickle-trim/ ${WORKDIR}/results/reads/fastq-join/ 2> /dev/null
+###### Call snakemake pipeline
+snakemake --directory ${WORKDIR} --use-conda --keep-going --rerun-incomplete -s ${WORKDIR}/workflow/rules/reads_quality_control_pipeline.smk --cores --unlock # unlock first, if previous error
+snakemake --directory ${WORKDIR} --use-conda --keep-going --rerun-incomplete -s ${WORKDIR}/workflow/rules/reads_quality_control_pipeline.smk --cores --dryrun # then dry run, check error like missing file
+snakemake --directory ${WORKDIR} --use-conda --keep-going --rerun-incomplete -s ${WORKDIR}/workflow/rules/reads_quality_control_pipeline.smk --cores          # at last, real run 
+###### Create usefull graphs and summary
+mkdir ${WORKDIR}/results/Graphics/ 2> /dev/null
+snakemake --directory ${WORKDIR} --use-conda -s ${WORKDIR}/workflow/rules/reads_quality_control_pipeline.smk --dag       | dot -Tpdf > ${WORKDIR}/results/Graphics/DAGraph.pdf
+snakemake --directory ${WORKDIR} --use-conda -s ${WORKDIR}/workflow/rules/reads_quality_control_pipeline.smk --rulegraph | dot -Tpdf > ${WORKDIR}/results/Graphics/RuleGraph.pdf
+snakemake --directory ${WORKDIR} --use-conda -s ${WORKDIR}/workflow/rules/reads_quality_control_pipeline.smk --filegraph | dot -Tpdf > ${WORKDIR}/results/Graphics/FileGraph.pdf
+snakemake --directory ${WORKDIR} --use-conda -s ${WORKDIR}/workflow/rules/reads_quality_control_pipeline.smk --summary   >             ${WORKDIR}/results/Graphics/Summary.txt
--- a/config/config.yaml
+++ b/config/config.yaml
+---
+###############################################################################
+# Author: Nicolas Fernandez
+# Affiliation: IRD_TransVIHMI
+# Aim: Config file for the Reads Quality Control Pipeline (RQCP)
+# Date: 2021.04.30
+# Use: Edit or (de)comment values you want modify
+# Latest modification: 2021.10.04
+# Todo: ...
+###############################################################################
+## CLUSTER -------------------------------------------------------------------------------------------
+resources:
+  cpus: 12      # cpus
+  mem_mb: 64000 # mem in Mb
+  mem_gb: 64    # mem in Gb
+  time: 1140    # time in Min
+## ENVIRONNEMENTS ------------------------------------------------------------------------------------
+conda:
+  cutadapt:     '../envs/cutadapt-3.4.yaml'
+  sickle-trim:  '../envs/sickle-trim-1.33.yaml'
+  fastq-join:   '../envs/fastq-join-1.3.1.yaml'
+  fastqc:       '../envs/fastqc-0.11.9.yaml'
+  fastq-screen: '../envs/fastq-screen-0.14.0.yaml'
+  multiqc:      '../envs/multiqc-1.10.1.yaml'
+## DATASETS ------------------------------------------------------------------------------------------
+datasets:
+  quality-directory:
+    - 'raw'
+    - 'cutadapt'
+    - 'sickle-trim'
+    #- 'sickle-single'
+    - 'fastq-join'
+    #- 'fastq-join-unique'
+  quality-tool:
+    - 'fastqc'
+    - 'fastq-screen'
+## CUTADAPT ------------------------------------------------------------------------------------------
+cutadapt:
+  length: '75' # Discard reads shorter than length, after trim
+  kits:        # Sequence of an adapter ligated to the 3' end of the first read
+    truseq:      'AGATCGGAAGAGC'                     # Illumina TruSeq / ScriptSeq based kits libraries
+    nextera:     'CTGTCTCTTATACACATC'                # Illumina Nextera / TruSight based kits libraries
+    small:       'TGGAATTCTCGGGTGCCAAGG'             # Illumina Small based kits libraries
+    #TruSeqR1:    'AGATCGGAAGAGCACACGTCTGAACTCCAGTCA' # TruSeq-LT and TruSeq-HT based kits R1
+    #TruSeqR2:    'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT' # TruSeq-LT and TruSeq-HT based kits R2
+    #ScriptSeqR1: 'AGATCGGAAGAGCACACGTCTGAAC'         # ScriptSeq and TruSeq DNA Methylation based kits R1
+    #ScriptSeqR2: 'AGATCGGAAGAGCGTCGTGTAGGGA'         # ScriptSeq and TruSeq DNA Methylation based kits R2
+    #TruSeqRibo:  'AGATCGGAAGAGCACACGTCT'             # TruSeq Ribo Profile based kits
+## SICKLE-TRIM ---------------------------------------------------------------------------------------
+sickle-trim:
+  command: # choose one
+    #- 'se' # if single-end sequence
+    - 'pe' # if paired-end sequence
+  encoding: # choose one
+    - 'sanger'   # if sanger   (CASAVA >= 1.8)
+    #- 'illumina' # if illumina (CASAVA 1.3 to 1.7)
+    #- 'solexa'   # if solexa   (CASAVA < 1.3)
+  quality: '30'   # phred score limit
+  length: '75'    # read length limit, after trim
+## FASTQ-JOIN ----------------------------------------------------------------------------------------
+fastq-join:
+  percent: '10' # N-percent maximum difference (default: 8)
+  overlap: '10' # N-minimum overlap (default: 6)
+## FASTQSCREEN ---------------------------------------------------------------------------------------
+fastq-screen:
+  config: 'config/fastq-screen.conf' # path to the fastq-screen
+  subset: '1000' #
+  aligner:
+    - 'bwa'      #
+    #- 'bowtie'  #
+    #- 'bowtie2' #
+## ---------------------------------------------------------------------------------------------------
--- a/config/fastq-screen.conf
+++ b/config/fastq-screen.conf
+# This is Nicolas Fernandez (IRD) version of configuration file for FastQ-Screen
+############################
+## Bowtie, Bowtie 2 or BWA #
+############################
+## If the Bowtie, Bowtie 2 or BWA binary is not in your PATH, you can set  this value to tell the program where to find your chosen aligner.
+## Uncomment the relevant line below and set the appropriate location.
+## Please note, this path should INCLUDE the executable filename.
+#BOWTIE /usr/local/bowtie
+#BOWTIE2    /usr/local/bowtie2-2.3.4.1/bowtie2
+BWA .snakemake/conda/db59fc2b/bin/bwa
+############################################
+## Bismark (for bisulfite sequencing only) #
+############################################
+## If the Bismark binary is not in your PATH then you can set this value to tell the program where to find it.
+## Uncomment the line below and set the appropriate location.
+## Please note, this path should INCLUDE the executable filename.
+#BISMARK	/usr/local/bin/bismark/bismark
+############
+## Threads #
+############
+## Genome aligners can be made to run across multiple CPU cores to speed up searches.
+## Set this value to the number of cores you want for mapping reads.
+THREADS 1
+##############
+## DATABASES #
+##############
+## This section enables you to configure multiple genomes databases (aligner index files) to search against in your screen.
+## For each genome you need to provide a database name (which can't contain spaces) and the location of the aligner index files.
+##
+## The path to the index files SHOULD INCLUDE THE BASENAME of the index, e.g: /opt/Genomes/Human_Bowtie/GRCh37/Homo_sapiens.GRCh37
+## Thus, the index files (Homo_sapiens.GRCh37.1.bt2, Homo_sapiens.GRCh37.2.bt2, etc.) are found in a folder named 'GRCh37'.
+##
+## If, for example, the Bowtie, Bowtie2 and BWA indices of a given genome reside in the SAME FOLDER, a SINLGE path may be provided to ALL the of indices.
+## The index used will be the one compatible with the chosen aligner (as specified using the --aligner flag).  
+##
+## The entries shown below are only suggested examples, you can add as many DATABASE sections as required, and you can comment out or remove as many of the existing entries as desired.
+## We suggest including genomes and sequences that may be sources of contamination either because they where run on your sequencer previously, or may have contaminated your sample during the library preparation step.
+##
+## Human h38
+#DATABASE	Human	 resources/databases/bwa/Human/Homo_sapiens_h38
+##
+## Mouse m39
+#DATABASE	Mouse	 resources/databases/bwa/Mouse/Mus_musculus_m39
+##
+## Gorilla g4	
+#DATABASE	Gorilla	 resources/databases/bwa/Gorilla/Gorilla_gorilla_g4
+##
+## Chimpanzee
+#DATABASE	Chimpanzee	 resources/databases/bwa/Chimpanzee/Pan_troglodytes_t3
+##
+## Bat 10
+#DATABASE	Bat	 resources/databases/bwa/Bat/Pteropus_vampyrus_v1
+##
+## Ecoli - sequence available from EMBL accession U00096.2
+DATABASE	Ecoli	 resources/databases/bwa/Ecoli/Echerichia_coli_U00096
+##
+## HIV - HXB2
+DATABASE	HIV	 resources/databases/bwa/HIV/HXB2
+##
+## Ebola - sequence available from NCBI accession NC_002549
+DATABASE	Ebola	 resources/databases/bwa/Ebola/ZEBOV
+##
+## SARS-CoV-2 - sequence from Whuhan available from NCBI accession NC_045512.2
+DATABASE	SARS-CoV-2	 resources/databases/bwa/SARS-CoV-2/Wuhan-Hu-1
+##
+## PhiX - sequence available from Refseq accession NC_001422.1
+DATABASE	PhiX	 resources/databases/bwa/Phix/PhiX
+##
+## Adapters - sequence derived from the FastQC contaminats file found at: www.bioinformatics.babraham.ac.uk/projects/fastqc
+DATABASE	Adapters        resources/databases/bwa/Adapters/Adapters
+##
+## Vector - Sequence taken from the UniVec database: http://www.ncbi.nlm.nih.gov/VecScreen/UniVec.html, BUT, without phi-X174
+DATABASE	Vectors		 resources/databases/bwa/Vectors/UniVec_wo_phi-X174
+##
+## Arabidopsis thaliana - sequence from NCBI: ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/735/GCF_000001735.3_TAIR10/GCF_000001735.3_TAIR10_genomic.fna.gz
+DATABASE	Arabidopsis	 resources/databases/bwa/Arabidopsis/Arabidopsis_thaliana_t10
--- a/resources/reads/.gitkeep
+++ b/resources/reads/.gitkeep
--- a/visuals/download_button.png
+++ b/visuals/download_button.png
--- a/workflow/envs/cutadapt-3.4.yaml
+++ b/workflow/envs/cutadapt-3.4.yaml
+name: cutadapt-3.4
+channels:
+  - bioconda
+  - conda-forge
+  - r
+  - anaconda
+  - defaults
+dependencies:
+  - ca-certificates=2021.4.13=hecd8cb5_1
+  - certifi=2020.12.5=py39h6e9494a_1
+  - cutadapt=3.4=py39ha492530_1
+  - dnaio=0.5.1=py39ha492530_0
+  - isa-l=2.30.0=h0d85af4_4
+  - libcxx=11.1.0=habf9029_0
+  - libffi=3.3=h046ec9c_2
+  - ncurses=6.2=h2e338ed_4
+  - openssl=1.1.1k=h0d85af4_0
+  - pigz=2.6=h5dbffcc_0
+  - pip=21.1=pyhd8ed1ab_0
+  - python=3.9.4=h88f2d9e_0
+  - python-isal=0.10.0=py39h89e85a6_0
+  - python_abi=3.9=1_cp39
+  - readline=8.1=h05e3726_0
+  - setuptools=52.0.0=py39hecd8cb5_0
+  - sqlite=3.35.5=h44b9ce1_0
+  - tk=8.6.10=h0419947_1
+  - tzdata=2021a=he74cb21_0
+  - wheel=0.36.2=pyhd3deb0d_0
+  - xopen=1.1.0=py39h6e9494a_2
+  - xz=5.2.5=haf1e3a3_1
+  - zlib=1.2.11=h7795811_1010
+prefix: /Users/2021nf001/miniconda3/envs/cutadapt-3.4
--- a/workflow/envs/fastq-join-1.3.1.yaml
+++ b/workflow/envs/fastq-join-1.3.1.yaml
+name: fastq-join-1.3.1
+channels:
+  - bioconda
+  - conda-forge
+  - r
+  - anaconda
+  - defaults
+dependencies:
+  - fastq-join=1.3.1=hb280591_4
+  - libcxx=11.1.0=habf9029_0
+prefix: /Users/2021nf001/miniconda3/envs/fastq-join-1.3.1
--- a/workflow/envs/fastq-screen-0.14.0.yaml
+++ b/workflow/envs/fastq-screen-0.14.0.yaml
+name: fastq-screen
+channels:
+  - bioconda
+  - conda-forge
+  - r
+  - anaconda
+  - defaults
+dependencies:
+  - bowtie=1.0.0=1
+  - bowtie2=2.4.2=py39h7f6fa2c_2
+  - bwa=0.7.17=h188c3c3_8
+  - ca-certificates=2021.4.13=hecd8cb5_1
+  - certifi=2020.12.5=py39h6e9494a_1
+  - expat=2.3.0=he49afe7_0
+  - fastq-screen=0.14.0=pl5262hdfd78af_1
+  - fontconfig=2.13.1=h10f422b_1005
+  - freetype=2.10.4=h4cff582_1
+  - giflib=5.2.1=hbcb3906_2
+  - icu=67.1=hb1e8313_0
+  - jpeg=9d=hbcb3906_0
+  - libcxx=11.1.0=habf9029_0
+  - libffi=3.3=h046ec9c_2
+  - libgd=2.2.5=h190906e_1008
+  - libiconv=1.16=haf1e3a3_0
+  - libpng=1.6.37=h7cec526_2
+  - libtiff=4.2.0=h7c11950_1
+  - libwebp=1.2.0=h1648767_0
+  - libwebp-base=1.2.0=h0d85af4_2
+  - libxml2=2.9.10=h2c6e4a5_2
+  - lz4-c=1.9.3=h046ec9c_0
+  - ncurses=6.2=h2e338ed_4
+  - openssl=1.1.1k=h0d85af4_0
+  - perl=5.26.2=hbcb3906_1008
+  - perl-gd=2.71=pl526h5c9b4e4_0
+  - perl-gdgraph=1.54=pl526_0
+  - perl-gdtextutil=0.86=pl526h1de35cc_5
+  - pip=21.1=pyhd8ed1ab_0
+  - python=3.9.4=h88f2d9e_0
+  - python_abi=3.9=1_cp39
+  - readline=8.1=h05e3726_0
+  - setuptools=52.0.0=py39hecd8cb5_0
+  - sqlite=3.35.5=h44b9ce1_0
+  - tbb=2020.3=h879752b_0
+  - tk=8.6.10=h0419947_1
+  - tzdata=2021a=he74cb21_0
+  - wheel=0.36.2=pyhd3deb0d_0
+  - xz=5.2.5=haf1e3a3_1
+  - zlib=1.2.11=h7795811_1010
+  - zstd=1.4.9=h582d3a0_0
+prefix: /Users/2021nf001/miniconda3/envs/fastq-screen
--- a/workflow/envs/fastqc-0.11.9.yaml
+++ b/workflow/envs/fastqc-0.11.9.yaml
+name: fastqc-0.11.9
+channels:
+  - bioconda
+  - conda-forge
+  - r
+  - anaconda
+  - defaults
+dependencies:
+  - fastqc=0.11.9=hdfd78af_1
+  - font-ttf-dejavu-sans-mono=2.37=hab24e00_0
+  - fontconfig=2.13.1=h10f422b_1005
+  - freetype=2.10.4=h4cff582_1
+  - icu=68.1=h74dc148_0
+  - libcxx=11.1.0=habf9029_0
+  - libiconv=1.16=haf1e3a3_0
+  - libpng=1.6.37=h7cec526_2
+  - libxml2=2.9.10=h93ec3fd_4
+  - openjdk=11.0.9.1=hcf210ce_1
+  - perl=5.32.0=hbcb3906_0
+  - xz=5.2.5=haf1e3a3_1
+  - zlib=1.2.11=h7795811_1010
+prefix: /Users/2021nf001/miniconda3/envs/fastqc
--- a/workflow/envs/multiqc-1.10.1.yaml
+++ b/workflow/envs/multiqc-1.10.1.yaml
+name: multiqc-1.10.1
+channels:
+  - bioconda
+  - conda-forge
+  - r
+  - anaconda
+  - defaults
+dependencies:
+  - brotlipy=0.7.0=py39hcbf5805_1001
+  - ca-certificates=2021.4.13=hecd8cb5_1
+  - certifi=2020.12.5=py39h6e9494a_1
+  - cffi=1.14.5=py39h319c39b_0
+  - chardet=4.0.0=py39h6e9494a_1
+  - click=7.1.2=pyh9f0ad1d_0
+  - colorama=0.4.4=pyh9f0ad1d_0
+  - coloredlogs=15.0=py39h6e9494a_0
+  - colormath=3.0.0=py_2
+  - commonmark=0.9.1=py_0
+  - cryptography=3.4.7=py39ha2c9959_0
+  - cycler=0.10.0=py_2
+  - decorator=5.0.7=pyhd8ed1ab_0
+  - freetype=2.10.4=h4cff582_1
+  - future=0.18.2=py39h6e9494a_3
+  - humanfriendly=9.1=py39h6e9494a_0
+  - idna=2.10=pyh9f0ad1d_0
+  - importlib-metadata=4.0.1=py39h6e9494a_0
+  - jinja2=2.11.3=pyh44b312d_0
+  - jpeg=9d=hbcb3906_0
+  - kiwisolver=1.3.1=py39hedf5dff_1
+  - lcms2=2.12=h577c468_0
+  - libblas=3.9.0=8_openblas
+  - libcblas=3.9.0=8_openblas
+  - libcxx=11.1.0=habf9029_0
+  - libffi=3.3=h046ec9c_2
+  - libgfortran=5.0.0=9_3_0_h6c81a4c_22
+  - libgfortran5=9.3.0=h6c81a4c_22
+  - liblapack=3.9.0=8_openblas
+  - libopenblas=0.3.12=openmp_h54245bb_1
+  - libpng=1.6.37=h7cec526_2
+  - libtiff=4.2.0=h7c11950_1
+  - libwebp-base=1.2.0=h0d85af4_2
+  - llvm-openmp=11.1.0=hda6cdc1_1
+  - lz4-c=1.9.3=h046ec9c_0
+  - lzstring=1.0.4=py_1001
+  - markdown=3.3.4=pyhd8ed1ab_0
+  - markupsafe=1.1.1=py39hcbf5805_3
+  - matplotlib-base=3.4.1=py39hb07454d_0
+  - multiqc=1.10.1=py_0
+  - ncurses=6.2=h2e338ed_4
+  - networkx=2.5=py_0
+  - numpy=1.20.2=py39h7eed0ac_0
+  - olefile=0.46=pyh9f0ad1d_1
+  - openssl=1.1.1k=h0d85af4_0
+  - pillow=8.2.0=py39h5270095_0
+  - pip=21.1=pyhd8ed1ab_0
+  - pycparser=2.20=pyh9f0ad1d_2
+  - pygments=2.8.1=pyhd8ed1ab_0
+  - pyopenssl=20.0.1=pyhd8ed1ab_0
+  - pyparsing=2.4.7=pyh9f0ad1d_0
+  - pysocks=1.7.1=py39h6e9494a_3
+  - python=3.9.4=h88f2d9e_0
+  - python-dateutil=2.8.1=py_0
+  - python_abi=3.9=1_cp39
+  - pyyaml=5.4.1=py39hcbf5805_0
+  - readline=8.1=h05e3726_0
+  - requests=2.25.1=pyhd3deb0d_0
+  - rich=10.1.0=py39h6e9494a_0
+  - setuptools=52.0.0=py39hecd8cb5_0
+  - simplejson=3.17.2=py39hcbf5805_2
+  - six=1.15.0=pyh9f0ad1d_0
+  - spectra=0.0.11=py_1
+  - sqlite=3.35.5=h44b9ce1_0
+  - tk=8.6.10=h0419947_1
+  - tornado=6.1=py39hcbf5805_1
+  - typing_extensions=3.7.4.3=py_0
+  - tzdata=2021a=he74cb21_0
+  - urllib3=1.26.4=pyhd8ed1ab_0
+  - wheel=0.36.2=pyhd3deb0d_0
+  - xz=5.2.5=haf1e3a3_1
+  - yaml=0.2.5=haf1e3a3_0
+  - zipp=3.4.1=pyhd8ed1ab_0
+  - zlib=1.2.11=h7795811_1010
+  - zstd=1.4.9=h582d3a0_0
+prefix: /Users/2021nf001/miniconda3/envs/multiqc
--- a/workflow/envs/sickle-trim-1.33.yaml
+++ b/workflow/envs/sickle-trim-1.33.yaml
+name: sickle-trim-1.33
+channels:
+  - bioconda
+  - conda-forge
+  - r
+  - anaconda
+  - defaults
+dependencies:
+  - sickle-trim=1.33=h188c3c3_6
+  - zlib=1.2.11=h7795811_1010
+prefix: /Users/2021nf001/miniconda3/envs/sickle-1.33
--- a/workflow/rules/reads_quality_control_pipeline.smk
+++ b/workflow/rules/reads_quality_control_pipeline.smk
+###############################################################################
+# Name: Reads Quality Control Pipeline
+# Author: Nicolas Fernandez
+# Affiliation: IRD_U233_TransVIHMI
+# Aim: Reads Quality Control
+# Date: 2021.04.30
+# Run: snakemake --use-conda -s reads_quality_control_pipeline.smk --cores 
+# Latest modification: 2021.10.04
+# Todo: ND
+###############################################################################
+# CONFIGURATION #
+configfile: "config/config.yaml"
+# FUNCTIONS #
+# WILDCARDS #
+SAMPLE, = glob_wildcards("results/reads/raw/{sample}_R1.fastq.gz")
+QCDIR = config["datasets"]["quality-directory"]
+# ENVIRONMENT #
+CUTADAPT = config["conda"]["cutadapt"]      # Cutadapt
+SICKLETRIM = config["conda"]["sickle-trim"] # Sickle-trim
+FASTQJOIN = config["conda"]["fastq-join"]   # Fastq-join
+FASTQC = config["conda"]["fastqc"]            # FastQC
+FASTQSCREEN = config["conda"]["fastq-screen"] # FastQ Screen
+MULTIQC = config["conda"]["multiqc"]          # MultiQC
+# RESOURCES #
+CPUS = config["resources"]["cpus"]     # Resources thread
+MEM_GB = config["resources"]["mem_gb"] # Resources mem in Gb
+# PARAMETERS #
+LENGTHC = config["cutadapt"]["length"]          # Cutadapt --minimum-length
+TRUSEQ = config["cutadapt"]["kits"]["truseq"]   # Cutadapt --adapter Illumina TruSeq
+NEXTERA = config["cutadapt"]["kits"]["nextera"] # Cutadapt --adapter Illumina Nextera
+SMALL = config["cutadapt"]["kits"]["small"]     # Cutadapt --adapter Illumina Small
+COMMAND = config["sickle-trim"]["command"]   # Sickle-trim command
+ENCODING = config["sickle-trim"]["encoding"] # Sickle-trim --qual-type 
+QUALITY = config["sickle-trim"]["quality"]   # Sickle-trim --qual-threshold
+LENGTHS = config["sickle-trim"]["length"]    # Sickle-trim --length-treshold
+PERCENT = config["fastq-join"]["percent"]  # Fastq-join -p (percent maximum difference)
+OVERLAP = config ["fastq-join"]["overlap"] # Fastq-join -m (minimum overlap)
+CONFIG = config["fastq-screen"]["config"]   # Fastq-screen --conf
+ALIGNER = config["fastq-screen"]["aligner"] # Fastq-screen --aligner
+SUBSET = config["fastq-screen"]["subset"]   # Fastq-screen --subset
+###############################################################################
+rule all:
+    input:
+        joined = expand("results/reads/fastq-join/{sample}_cutadapt_sickle-trim_fastq-join_Mate.fastq.gz",
+                         sample = SAMPLE),
+        multiqc = expand("results/quality/{qcdir}/multiqc/",
+                         qcdir = QCDIR)
+###############################################################################
+rule multiqc:
+    # Aim: aggregates bioinformatics analyses results into a single report
+    # Use: multiqc [OPTIONS] [--output dir] <analysis directory>
+    message:
+        "MultiQC aggregate quality results from {wildcards.qcdir} reads"
+    conda:
+        MULTIQC
+    log:
+       "results/reports/multiqc/{qcdir}.log"
+    input:
+        fastqc = "results/quality/{qcdir}/fastqc/",
+        fastqscreen = "results/quality/{qcdir}/fastqscreen/"
+    output:
+        multiqc = directory("results/quality/{qcdir}/multiqc/")
+    log:
+        "results/reports/multiqc/{qcdir}.log"
+    shell:
+        "multiqc "                   # Multiqc, searches in given directories for analysis & compiles a HTML report
+        "--quiet "                   # -q: Only show log warning
+        "--outdir {output.multiqc} " # -o: Create report in the specified output directory
+        "{input.fastqc} "            # Input FastQC files
+        "{input.fastqscreen} "       # Input Fastq-Screen files
+        #"--no-ansi "                 # Disable coloured log
+        "&> {log}"                   # Add redirection for log
+###############################################################################
+rule fastqscreen:
+    # Aim: screen if the composition of the library matches with  what you expect
+    # Use fastq_screen [OPTIONS] <fastq.1>... <fastq.n>
+    message:
+        "Fastq-Screen on samples {wildcards.qcdir} reads"
+    conda:
+        FASTQSCREEN
+    resources:
+        cpus = CPUS
+    params:
+        config = CONFIG,
+        aligner = ALIGNER,
+        subset = SUBSET
+    input:
+        fastq = "results/reads/{qcdir}/"
+    output:
+        fastqscreen = directory("results/quality/{qcdir}/fastqscreen/")
+    log:
+        "results/reports/fastq-screen/{qcdir}.log"
+    shell:
+        "fastq_screen "                  # FastqScreen, what did you expect ?
+        "--quiet "                       # -q: Only show log warning
+        "--threads {resources.cpus} "    # -t: Specifies across how many threads bowtie will be allowed to run
+        "--conf {params.config} "        # path to configuration file
+        "--aligner {params.aligner} "    # -a: choose aligner 'bowtie', 'bowtie2', 'bwa'
+        "--subset {params.subset} "      # Don't use the whole sequence file, but create a subset of specified size
+        "--outdir {output.fastqscreen} " # Output directory
+        "{input.fastq}/*.fastq.gz "      # Input file.fastq
+        "&> {log}"                       # Add redirection for log
+###############################################################################
+rule fastqc:
+    # Aim: reads sequence files and produces a quality control report
+    # Use: fastqc [OPTIONS] [--output dir] <fastq.1> ... <fastq.n>
+    message:
+        "FastQC on samples {wildcards.qcdir} reads"
+    conda:
+        FASTQC
+    resources:
+        cpus = CPUS
+    input:
+        fastq = "results/reads/{qcdir}/"
+    output:
+        fastqc = directory("results/quality/{qcdir}/fastqc/")
+    log:
+        "results/reports/fastqc/{qcdir}.log"
+    shell:
+        "fastqc "                     # FastQC, a high throughput sequence QC analysis tool
+        "--quiet "                    # -q: Supress all progress messages on stdout and only report errors
+        "--threads {resources.cpus} " # -t: Specifies files number which can be processed simultaneously
+        "--outdir {output.fastqc} "   # -o: Create all output files in the specified output directory
+        "{input.fastq}/*.fastq.gz "   # Input file.fastq
+        "&> {log}"                    # Add redirection for log
+###############################################################################
+rule fastqjoin:
+    # Aim: joins two paired-end reads on the overlapping ends
+    # Use: fastq-join [OPTIONS] <read1.fastq> <read2.fastq> [mate.fastq] -o <read.fastq> -o <read.fastq> -o <read.fastq>
+    priority: 1
+    message:
+        "Fastq-join assemble R1 and R2 from {wildcards.sample} reads"
+    conda:
+        FASTQJOIN
+    params:
+        percent = PERCENT,
+        overlap = OVERLAP
+    input:
+        forward = "results/reads/sickle-trim/{sample}_cutadapt_sickle-trim_R1.fastq.gz",
+        reverse = "results/reads/sickle-trim/{sample}_cutadapt_sickle-trim_R2.fastq.gz"
+    output:
+        forward= "results/reads/fastq-join-unique/{sample}_cutadapt_sickle-trim_fastq-join_Unique_R1.fastq.gz",
+        reverse = "results/reads/fastq-join-unique/{sample}_cutadapt_sickle-trim_fastq-join_Unique_R2.fastq.gz",
+        joined = "results/reads/fastq-join/{sample}_cutadapt_sickle-trim_fastq-join_Mate.fastq.gz"
+    log:
+        "results/reports/fastq-join/{sample}.log"
+    shell:
+        "fastq-join "          # Fastq-Join, joins two paired-end reads on the overlapping ends
+        "{input.forward} "     # Input forward
+        "{input.reverse} "     # Input reverse
+        "-p {params.percent} " # Percent maximum difference (default: 8)
+        "-m {params.overlap} " # Minimum overlap (default: 6)
+        "-o {output.forward} " # for unique forward files
+        "-o {output.reverse} " # for unique reverse files
+        "-o {output.joined} "  # for join files
+        #"-r {log} "            # Verbose stitch length report
+        "&> {log}"             # Add redirection for log
+###############################################################################
+rule sickletrim:
+    # Aim: windowed adaptive trimming tool for FASTQ files using quality
+    # Use: sickle <command> [OPTIONS]
+    message:
+        "Sickle-trim remove low quality sequences from  {wildcards.sample} reads"
+    conda:
+        SICKLETRIM
+    params:
+        command = COMMAND,
+        encoding = ENCODING,
+        quality = QUALITY,
+        length = LENGTHS
+    input:
+        forward = "results/reads/cutadapt/{sample}_cutadapt_R1.fastq.gz",
+        reverse = "results/reads/cutadapt/{sample}_cutadapt_R2.fastq.gz"
+    output:
+        forward = "results/reads/sickle-trim/{sample}_cutadapt_sickle-trim_R1.fastq.gz",
+        reverse = "results/reads/sickle-trim/{sample}_cutadapt_sickle-trim_R2.fastq.gz",
+        single = "results/reads/sickle-single/{sample}_cutadapt_sickle-trim_Single.fastq.gz"
+    log:
+        "results/reports/sickle-trim/{sample}.log"
+    shell:
+       "sickle "                # Sickle, a windowed adaptive trimming tool for FASTQ files using quality
+        "{params.command} "     # Paired-end or single-end sequence trimming
+        "-t {params.encoding} " # --qual-type: Type of quality values, solexa ; illumina ; sanger ; CASAVA, < 1.3 ; 1.3 to 1.7 ; >= 1.8
+        "-q {params.quality} "  # --qual-threshold: Threshold for trimming based on average quality in a window (default: 20)
+        "-l {params.length} "   # --length-treshold: Threshold to keep a read based on length after trimming (default: 20)
+        "-f {input.forward} "   # --pe-file1: Input paired-end forward fastq file
+        "-r {input.reverse} "   # --pe-file2: Input paired-end reverse fastq file
+        "-g "                   # --gzip-output: Output gzipped files
+        "-o {output.forward} "  # --output-pe1: Output trimmed forward fastq file
+        "-p {output.reverse} "  # --output-pe2: Output trimmed reverse fastq file (must use -s option)
+        "-s {output.single} "   # --output-single: Output trimmed singles fastq file
+        "&> {log}"              # Add redirection for log
+###############################################################################
+rule cutadapt:
+    # Aim: finds and removes adapter sequences, primers, poly-A tails and other types of unwanted sequence from your high-throughput sequencing reads
+    # Use: cutadapt -a ADAPTER [OPTIONS] [-o output.forward] [-p output.reverse] <input.forward> <input.reverse>
+    # Rmq: multiple adapter sequences can be given using further -a options, but only the best-matching adapter will be removed
+    message:
+        "Cutadapt remove adapters from {wildcards.sample} reads"
+    conda:
+        CUTADAPT
+    resources:
+        cpus = CPUS
+    params:
+        length = LENGTHC,
+        truseq = TRUSEQ,
+        nextera = NEXTERA,
+        small = SMALL
+    input:
+        forward = "results/reads/raw/{sample}_R1.fastq.gz",
+        reverse = "results/reads/raw/{sample}_R2.fastq.gz"
+    output:
+        forward = "results/reads/cutadapt/{sample}_cutadapt_R1.fastq.gz",
+        reverse = "results/reads/cutadapt/{sample}_cutadapt_R2.fastq.gz"
+    log:
+        "results/reports/cutadapt/{sample}.log"
+    shell:
+        "cutadapt "                         # Cutadapt, finds and removes unwanted sequence from your HT-seq reads
+        "--cores {resources.cpus} "         # -j: Number of CPU cores to use. Use 0 to auto-detect (default: 1)
+        "--trim-n "                         # --trim-n: Trim N's on ends of reads
+        "--minimum-length {params.length} " # -m: Discard reads shorter than length
+        "--adapter {params.truseq} "        # -a: Sequence of an adapter ligated to the 3' end of the first read
+        "-A {params.truseq} "               # -A: 3' adapter to be removed from second read in a pair
+        "--adapter {params.nextera} "       # -a: Sequence of an adapter ligated to the 3' end of the first read
+        "-A {params.nextera} "              # -A: 3' adapter to be removed from second read in a pair
+        "--adapter {params.small} "         # -a: Sequence of an adapter ligated to the 3' end of the first read
+        "-A {params.small} "                # -A: 3' adapter to be removed from second read in a pair
+        "--output {output.forward} "        # -o: Write trimmed reads to FILE
+        "--paired-output {output.reverse} " # -p: Write second read in a pair to FILE
+        "{input.forward} "                  # Input forward reads R1.fastq
+        "{input.reverse} "                  # Input reverse reads R2.fastq
+        "&> {log} "                         # Add redirection for log
+###############################################################################