RQC: Reads Quality Control

Maintener MacOSX Intel/M1/M2 GNU-Linux Ubuntu WSL/WSL2 Issues closed Issues opened Maintened Wiki Open Source GNU AGPL v3 Gitlab Bash Python Snakemake Conda

~ ABOUT ~

RQC

RQC is a FAIR, open-source, scalable, modular and traceable Snakemake pipeline for quality control of Illumina Inc. short reads.
RQC is included as the first step of the GeVarLi workflow.

Genomic sequencing, a public health tool

The establishment of a surveillance and sequencing network is an essential public health tool for detecting and containing pathogens with epidemic potential. Genomic sequencing makes it possible to identify pathogens, monitor the emergence and impact of variants, and adapt public health policies accordingly.

The Covid-19 epidemic has highlighted the disparities that remain between continents in terms of surveillance and sequencing systems. At the end of October 2021, of the 4,600,000 sequences shared on the public and free GISAID tool worldwide, only 49,000 came from the African continent, i.e. less than 1% of the cases of Covid-19 diagnosed on this continent.

Features

  • Reads quality control
    • Fastq-Screen
    • FastQC
    • MultiQC (html report)

Version

V.2022.11

Rulegraph

~ SUPPORT ~

  1. Read The Fabulous Manual!
  2. Read the Awesome Wiki!
  3. Create a new issue: Issues > New issue > Describe your issue
  4. Send an email to nicolas.fernandez@ird.fr

~ CITATION ~

If you use this pipeline, please cite RQC, its GitLab IRDForge repository and its authors:

GitLab IRDForge repository: https://forge.ird.fr/transvihmi/nfernandez/RQC

RQC, a FAIR, open-source, scalable, modular and traceable Snakemake pipeline for quality control of Illumina Inc. short reads.

Nicolas FERNANDEZ NUÑEZ (1)
(1) UMI 233 - Recherches Translationnelles sur le VIH et les Maladies Infectieuses endémiques et émergentes (TransVIHMI), University of Montpellier (UM), French Institute
of Health and Medical Research (INSERM), French National Research Institute for Sustainable Development (IRD)

~ AUTHORS & ACKNOWLEDGMENTS ~

  • Nicolas Fernandez - IRD (Developer and Maintainer)
  • Christelle Butel - IRD (Reporter)
  • DALL•E mini - OpenAI Git (Repo. avatar)

~ LICENSE ~

Licensed under GPLv3
Intellectual property belongs to IRD and authors.

~ ROADMAP ~

  • Add MultiQC config template

~ PROJECT STATUS ~

This project is regularly updated and actively maintained.
However, you can volunteer to step in as a developer or maintainer.

~ CONTRIBUTING ~

Open to contributions!

  • Asking for an update
  • Proposing a new feature
  • Reporting an issue
  • Fixing an issue
  • Sharing code
  • Citing the tool

~ INSTALLATIONS ~

Conda (dependencies)

RQC uses the Conda environment manager.
If Conda is not already installed, install it first.

Download and install the latest Miniconda installer for your operating system.

e.g. for MacOSX-64-bit systems:

curl https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh -o ~/Miniconda3-latest-MacOSX-x86_64.sh && \
bash ~/Miniconda3-latest-MacOSX-x86_64.sh -b -p ~/miniconda3/ && \
rm -f ~/Miniconda3-latest-MacOSX-x86_64.sh && \
~/miniconda3/condabin/conda update conda --yes && \
~/miniconda3/condabin/conda init && \
exit

e.g. for Linux-64-bit systems:

curl https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -o ~/Miniconda3-latest-Linux-x86_64.sh && \
bash ~/Miniconda3-latest-Linux-x86_64.sh -b -p ~/miniconda3/ && \
rm -f ~/Miniconda3-latest-Linux-x86_64.sh && \
~/miniconda3/condabin/conda update conda --yes && \
~/miniconda3/condabin/conda init && \
exit
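
The two examples above cover Intel macOS and x86_64 Linux. For other machines (e.g. Apple M1/M2), the installer file name can be derived from the output of uname. The following is a minimal sketch only: the function name miniconda_installer_name is hypothetical, and the arm64 / aarch64 file names are assumed to follow the same naming pattern as the two installers shown above.

```shell
# Print the Miniconda installer file name for an OS / CPU architecture pair.
# Both arguments default to what `uname` reports on the current machine.
# (Hypothetical helper, not part of RQC.)
miniconda_installer_name() {
    local os="${1:-$(uname -s)}"
    local arch="${2:-$(uname -m)}"
    case "${os}-${arch}" in
        Darwin-x86_64) echo "Miniconda3-latest-MacOSX-x86_64.sh" ;;
        Darwin-arm64)  echo "Miniconda3-latest-MacOSX-arm64.sh" ;;   # assumed name
        Linux-x86_64)  echo "Miniconda3-latest-Linux-x86_64.sh" ;;
        Linux-aarch64) echo "Miniconda3-latest-Linux-aarch64.sh" ;;  # assumed name
        *)
            echo "No known installer for ${os} / ${arch}" >&2
            return 1
            ;;
    esac
}

# Usage: only download when Conda is missing, e.g.
#   command -v conda >/dev/null 2>&1 || \
#       curl "https://repo.anaconda.com/miniconda/$(miniconda_installer_name)" -o ~/miniconda.sh
```

The function only prints a name and downloads nothing, so it can be tried safely before running any of the install commands.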

Update Conda:

conda update -n base -c defaults conda

RQC

Clone the RQC GitLab IRDForge repository (ID: 404) into your home directory:

git clone https://forge.ird.fr/transvihmi/nfernandez/RQC.git ~/RQC/

Update RQC (this discards any local changes to tracked files):

cd ~/RQC/ && git reset --hard HEAD && git pull --verbose

~ USAGE ~

  1. Copy your reads (single or paired-end), as .fastq.gz or .fastq files, into the ./resources/reads/ directory

  2. Execute the Start_RQC.sh bash script to run the RQC pipeline, in whichever way you prefer:

    • with a double-click on it (if you have made .sh files executable with Terminal.app)
    • or with a right-click > Open with > Terminal.app
    • or with the CLI from a terminal:
      bash Start_RQC.sh

  3. Your analyses will start, with the default configuration settings

Option-1: Edit config.yaml file in ./config/ directory
Option-2: Edit fastq-screen.conf file in ./config/ directory

The first run will automatically create (only once):

  • the Snakemake-Base conda environment (Snakemake, Mamba, Rename, GraphViz)
  • the RQC conda environments (one for each tool used by RQC)
  • the indexes for the BWA aligner (one for each fasta genome in resources)

This may take some time, depending on your internet connection and your computer.
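
Before starting a run, it can help to check which samples the reads directory actually contains. The sketch below assumes the {SAMPLE}_R1 / {SAMPLE}_R2 naming shown in the RQC map, and assumes that single-end files also carry the _R1 suffix. The function name list_samples is hypothetical and is not part of RQC.

```shell
# List the samples found in a reads directory, one per line, and report
# whether each R1 file has a matching R2 mate with the same extension.
# (Hypothetical helper, assuming {SAMPLE}_R1 / {SAMPLE}_R2 file names.)
list_samples() {
    local reads_dir="${1:-./resources/reads}"
    local r1 base sample ext
    for r1 in "${reads_dir}"/*_R1.fastq.gz "${reads_dir}"/*_R1.fastq; do
        [ -e "${r1}" ] || continue           # glob pattern matched nothing
        base="$(basename "${r1}")"
        sample="${base%_R1.fastq*}"          # strip "_R1.fastq" or "_R1.fastq.gz"
        ext="${base#"${sample}_R1"}"         # ".fastq" or ".fastq.gz"
        if [ -e "${reads_dir}/${sample}_R2${ext}" ]; then
            printf '%s\tpaired\n' "${sample}"
        else
            printf '%s\tsingle\n' "${sample}"
        fi
    done
}

# Usage, from the root of the repository:
#   list_samples ./resources/reads
```

A sample reported as single when paired-end data was expected usually points to a missing or misnamed R2 file.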

~ RESULTS ~

Your results are available in the ./results/ directory, as follows:

 🧩 Reads_Quality_Control/
  └── 📂 results/
       ├── 🌐 All_readsQC_reports.html
       ├── 📂 00_Quality_Control/
       │    ├── 📂 fastq-screen/
       │    │    ├── 🌐 {SAMPLE}_R{1/2}_screen.html
       │    │    ├── 📈 {SAMPLE}_R{1/2}_screen.png
       │    │    └── 📄 {SAMPLE}_R{1/2}_screen.txt
       │    ├── 📂 fastqc/
       │    │    ├── 🌐 {SAMPLE}_R{1/2}_fastqc.html
       │    │    └── 📦 {SAMPLE}_R{1/2}_fastqc.zip
       │    └── 📂 multiqc/
       │         ├── 🌐 multiqc_report.html
       │         └── 📂 multiqc_data/
       │             ├── 📝 multiqc.log
       │             ├── 📄 multiqc_citations.txt
       │             ├── 🌀 multiqc_data.json
       │             ├── 📄 multiqc_fastq_screen.txt
       │             ├── 📄 multiqc_fastqc.txt
       │             ├── 📄 multiqc_general_stats.txt
       │             └── 📄 multiqc_sources.txt
       └── 📂 10_Reports/
            ├── ⚙️  config.log
            ├── 📝 settings.log
            ├── 🍜 RQC-Base_v.{VERSION}.yaml
            ├── 📂 files-summaries/
            │    └── 📄 Reads_Quality_Control_files-summary.txt
            ├── 📂 graphs/
            │    ├── 📈 Reads_Quality_Control_dag.{PNG/PDF}
            │    ├── 📈 Reads_Quality_Control_filegraph.{PNG/PDF}
            │    └── 📈 Reads_Quality_Control_rulegraph.{PNG/PDF}
            └── 📂 tools-log/
                 ├── 📂 bowtie2/
                 ├── 📂 bwa/
                 ├── 📝 fastq-screen.log
                 ├── 📝 fastqc.log
                 └── 📝 multiqc.log
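
A quick way to confirm that a run finished is to check that the two HTML reports from the tree above exist and are not empty. This is a minimal sketch: the function name check_reports is hypothetical, and the paths are taken from the tree as shown.

```shell
# Check that the main HTML reports exist and are non-empty.
# Returns 0 when both are present, 1 otherwise.
# (Hypothetical helper; paths copied from the results tree above.)
check_reports() {
    local results_dir="${1:-./results}"
    local missing=0
    local report
    for report in \
        "All_readsQC_reports.html" \
        "00_Quality_Control/multiqc/multiqc_report.html"
    do
        if [ -s "${results_dir}/${report}" ]; then
            echo "found:   ${report}"
        else
            echo "missing: ${report}"
            missing=1
        fi
    done
    return "${missing}"
}

# Usage, from the root of the repository:
#   check_reports ./results || echo "Run incomplete, see ./results/10_Reports/tools-log/"
```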

fastq-screen

Screens your libraries against the genomes of the organisms you work on, along with PhiX, vectors,
or other contaminants commonly seen in sequencing experiments.
More about fastq-screen

fastqc

Modular set of analyses which you can use to give a quick impression of whether
your data has any problems of which you should be aware before doing any further analysis.
More about fastqc
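
Each {SAMPLE}_R{1/2}_fastqc.zip archive contains a summary.txt file with one tab-separated line per analysis module (status, module name, file name). The sketch below lists only the modules flagged WARN or FAIL. The function name fastqc_flagged_modules is hypothetical.

```shell
# Read a FastQC summary.txt on standard input and print the modules
# whose status is WARN or FAIL, as "STATUS<TAB>module name".
# (Hypothetical helper, not part of RQC.)
fastqc_flagged_modules() {
    awk -F '\t' '$1 == "FAIL" || $1 == "WARN" { print $1 "\t" $2 }'
}

# Usage, streaming summary.txt straight out of a report archive:
#   unzip -p ./results/00_Quality_Control/fastqc/SAMPLE_R1_fastqc.zip '*/summary.txt' \
#       | fastqc_flagged_modules
```

SAMPLE in the usage line is a placeholder for one of your own sample names.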

multiqc

Compiled HTML report. More about multiqc

~ CONFIGURATION ~

You can edit the default settings in the config.yaml file, in the ./config/ directory:

Resources

Edit to match your hardware configuration

  • cpus: maximum number of CPUs that multithreaded tools (e.g. bwa) may use in parallel (default config: '8')
    Note: snakemake (with the default Start bash script) will always use all CPUs to parallelize jobs
  • ram: maximum memory usage, in Gb, for tools that support a memory limit (e.g. samtools) (default config: '16' Gb)
  • tmpdir: directory for temporary files, for tools that support it (e.g. pangolin) (default config: '$TMPDIR')
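
To choose values for cpus and ram that match your hardware, you can query the machine first. This is a rough sketch with hypothetical function names (detect_cpus, detect_ram_gb). It only covers Linux and macOS, and the memory figure is rounded down to whole gigabytes.

```shell
# Print the number of online CPUs.
# (Hypothetical helper, not part of RQC.)
detect_cpus() {
    getconf _NPROCESSORS_ONLN 2>/dev/null || nproc
}

# Print the total physical memory, in whole gigabytes (rounded down).
detect_ram_gb() {
    case "$(uname -s)" in
        Linux)
            # MemTotal is reported in kB
            awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 1024 }' /proc/meminfo
            ;;
        Darwin)
            # hw.memsize is reported in bytes
            echo $(( $(sysctl -n hw.memsize) / 1024 / 1024 / 1024 ))
            ;;
        *)
            echo "Unsupported operating system" >&2
            return 1
            ;;
    esac
}

# Usage:
#   echo "Detected CPUs: $(detect_cpus), RAM: $(detect_ram_gb) Gb"
```

Setting ram a little below the detected value leaves room for the operating system and other programs.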

Environments

Edit if you want to change some environments (e.g. to test a new version), in the ./workflow/envs/{tools}_v.{version}.yaml files

Fastq-Screen

  • config: path to the fastq-screen configuration file (default config: ./config/fastq-screen.conf)
  • subset: do not use the whole sequence file, but create a temporary dataset of the specified number of reads (default config: '1000')
  • aligner: specify the aligner to use for the mapping. Valid arguments are 'bowtie', 'bowtie2' or 'bwa' (default config: 'bwa')

fastq-screen.conf

  • databases: enables you to configure multiple genome databases (aligner index files) to search against
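
In a FastQ Screen configuration file, each database is declared on a DATABASE line giving a name and the path to the index basename. The sketch below prints one such line per BWA index found in a directory, assuming the {GENOME}.bwt layout shown in the RQC map. The function name print_database_lines is hypothetical, and whether the fastq-screen.conf shipped with RQC uses exactly these names and paths is an assumption.

```shell
# Print one tab-separated "DATABASE <name> <index basename>" line
# for every BWA index (*.bwt) found in the given directory.
# (Hypothetical helper, assuming {GENOME}.bwt index file names.)
print_database_lines() {
    local index_dir="${1:-./resources/indexes/bwa}"
    local bwt genome
    for bwt in "${index_dir}"/*.bwt; do
        [ -e "${bwt}" ] || continue          # no index in this directory
        genome="$(basename "${bwt}" .bwt)"
        printf 'DATABASE\t%s\t%s/%s\n' "${genome}" "${index_dir}" "${genome}"
    done
}

# Usage: review the output, then copy the lines you want into ./config/fastq-screen.conf
#   print_database_lines ./resources/indexes/bwa
```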

RQC map

 🧩 Reads_Quality_Control/
 ├── 🖥️  Start_RQC.sh
 ├── 📚 README.md
 ├── 🪪 LICENSE
 ├── 🛑 .gitignore
 ├── 📂 .git/
 ├── 📂 .snakemake/
 ├── 📂 config/
 │    ├── ⚙️  config.yaml
 │    └── ⚙️  fastq-screen.conf
 ├── 📂 resources/
 │    ├── 📂 genomes/
 │    │    ├── 🧬 SARS-CoV-2_Wuhan_MN908947-3.fasta
 │    │    ├── 🧬 Monkeypox-virus_Zaire_AF380138-1.fasta
 │    │    ├── 🧬 Monkeypox-virus_UK_MT903345-1.fasta
 │    │    ├── 🧬 Swinepox-virus_India_MW036632-1.fasta
 │    │    ├── 🧬 Ebola-virus_Zaire_AF272001-1.fasta
 │    │    ├── 🧬 Nipah-virus_Malaysia_AJ564622-1.fasta
 │    │    ├── 🧬 HIV-1_HXB2_K03455-1.fasta
 │    │    ├── 🧬 {your_favorite_genome_reference}.fasta
 │    │    ├── 🧬 QC_Echerichia-coli_CP060121-1.fasta
 │    │    ├── 🧬 QC_Kanamycin-Resistance-Gene.fasta
 │    │    ├── 🧬 QC_NGS-adapters.fasta
 │    │    ├── 🧬 QC_phi-X174_Coliphage_NC-001422-1.fasta
 │    │    ├── 🧬 QC_UniVec_wo_phiX_and_kanamycin.fasta
 │    │    └── 🧬 {your_favorite_qc_reference}.fasta
 │    ├── 📂 indexes/
 │    │    └── 📂 bwa/
 │    │         ├── 🗂️  {GENOME}.amb
 │    │         ├── 🗂️  {GENOME}.ann
 │    │         ├── 🗂️  {GENOME}.bwt
 │    │         ├── 🗂️  {GENOME}.pac
 │    │         └── 🗂️  {GENOME}.sa
 │    ├── 📂 reads/
 │    │    ├── 🛡️  .gitkeep
 │    │    ├── 📦 {SAMPLE}_R1.fastq.gz
 │    │    └── 📦 {SAMPLE}_R2.fastq.gz
 │    └── 📂 visuals/
 │         └── 📈 quality_control_rulegraph.png
 └── 📂 workflow/
      ├── 📂 envs/
      │    ├── 📂 linux/
      │    │    ├── 🍜 bwa_v.0.7.17.yaml
      │    │    ├── 🍜 fastq-screen_v.0.15.2.yaml
      │    │    ├── 🍜 fastqc_v.0.11.9.yaml
      │    │    ├── 🍜 multiqc_v.1.12.yaml
      │    │    └── 🍜 snakemake-base_v.2023.02.yaml
      │    └── 📂 osx/
      │         ├── 🍜 bwa_v.0.7.17.yaml
      │         ├── 🍜 fastq-screen_v.0.15.2.yaml
      │         ├── 🍜 fastqc_v.0.11.9.yaml
      │         ├── 🍜 multiqc_v.1.12.yaml
      │         └── 🍜 snakemake-base_v.2023.02.yaml
      └── 📂 rules/
           ├── 📜 indexing_genomes.smk
           └── 📜 quality_control.smk

~ REFERENCES ~

Sustainable data analysis with Snakemake
Felix Mölder, Kim Philipp Jablonski, Brice Letcher, Michael B. Hall, Christopher H. Tomkins-Tinch, Vanessa Sochat, Jan Forster, Soohyun Lee, Sven O. Twardziok, Alexander Kanitz, Andreas Wilm, Manuel Holtgrewe, Sven Rahmann, Sven Nahnsen, Johannes Köster
F1000Research (2021)
DOI: https://doi.org/10.12688/f1000research.29032.2
Publication: https://f1000research.com/articles/10-33/v1
Source code: https://github.com/snakemake/snakemake
Documentation: https://snakemake.readthedocs.io/en/stable/index.html

Anaconda Software Distribution
Team
Computer software (2016)
DOI:
Publication: https://www.anaconda.com
Source code: https://github.com/conda/conda (conda)
Documentation: https://docs.conda.io (conda)
Source code: https://github.com/mamba-org/mamba (mamba)
Documentation: https://mamba.readthedocs.io/en/latest/index.html (mamba)

Fast and accurate short read alignment with Burrows-Wheeler Transform
Heng Li and Richard Durbin
Bioinformatics, Volume 25, Issue 14, Pages 1754-60 (2009)
DOI: https://doi.org/10.1093/bioinformatics/btp324
Publication: https://pubmed.ncbi.nlm.nih.gov/19451168
Source code: https://github.com/lh3/bwa
Documentation: http://bio-bwa.sourceforge.net

MultiQC: summarize analysis results for multiple tools and samples in a single report
Philip Ewels, Måns Magnusson, Sverker Lundin and Max Käller
Bioinformatics, Volume 32, Issue 19 (2016)
DOI: https://doi.org/10.1093/bioinformatics/btw354
Publication: https://academic.oup.com/bioinformatics/article/32/19/3047/2196507
Source code: https://github.com/ewels/MultiQC
Documentation: https://multiqc.info

FastQ Screen: A tool for multi-genome mapping and quality control
Wingett SW and Andrews S
F1000Research (2018)
DOI: https://doi.org/10.12688/f1000research.15931.2
Publication: https://f1000research.com/articles/7-1338/v2
Source code: https://github.com/StevenWingett/FastQ-Screen
Documentation: https://www.bioinformatics.babraham.ac.uk/projects/fastq_screen

FastQC: A quality control tool for high throughput sequence data
Simon Andrews
Online (2010)
DOI: https://doi.org/
Publication:
Source code: https://github.com/s-andrews/FastQC
Documentation: https://www.bioinformatics.babraham.ac.uk/projects/fastqc
