RQC: Reads Quality Control
~ ABOUT ~
RQC is a pipeline for checking the quality of reads from NGS sequencing.
Features
- Control reads quality (multiQC html report)
Version
V.2022.11
Citation
none
Rulegraph

~ INSTALLATIONS ~
Conda (mandatory)
RQC (with Snakemake) uses the useful Conda environment manager.
If, and only if, Conda is not already installed, please install it first!
Download and install the latest Miniconda installer for your operating system.
e.g. for MacOSX-64-bit systems:
curl https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh -o ~/Miniconda3-latest-MacOSX-x86_64.sh && \
bash ~/Miniconda3-latest-MacOSX-x86_64.sh -b -p ~/miniconda3/ && \
rm -f ~/Miniconda3-latest-MacOSX-x86_64.sh && \
~/miniconda3/condabin/conda update conda --yes && \
~/miniconda3/condabin/conda init && \
exit
e.g. for Linux-64-bit systems:
curl https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -o ~/Miniconda3-latest-Linux-x86_64.sh && \
bash ~/Miniconda3-latest-Linux-x86_64.sh -b -p ~/miniconda3/ && \
rm -f ~/Miniconda3-latest-Linux-x86_64.sh && \
~/miniconda3/condabin/conda update conda --yes && \
~/miniconda3/condabin/conda init && \
exit
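Before running either block above, it can help to check whether Conda is already on your PATH and which installer matches your machine. The sketch below is a minimal, non-authoritative example: the helper name miniconda_installer is hypothetical, and it assumes the installer file names follow the Miniconda3-latest-{OS}-{ARCH}.sh pattern used in the two examples above.

```shell
# Hypothetical helper: build the Miniconda installer file name
# from a kernel name (uname -s) and a machine type (uname -m).
miniconda_installer() {
    case "$1" in
        Darwin) os="MacOSX" ;;
        Linux)  os="Linux" ;;
        *) echo "Unsupported OS: $1" >&2; return 1 ;;
    esac
    echo "Miniconda3-latest-${os}-$2.sh"
}

# Only suggest an installer when Conda is not already available.
if command -v conda >/dev/null 2>&1; then
    echo "Conda is already installed: $(conda --version)"
else
    echo "Conda not found, installer to download: $(miniconda_installer "$(uname -s)" "$(uname -m)")"
fi
```

If Conda is reported as already installed, the installation step can be skipped.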
RQC
Clone (with HTTPS) the RQC repository on GitLab (ID: 404):
git clone https://forge.ird.fr/transvihmi/RQC.git ~/Reads_Quality_Control/ && \
cd ~/Reads_Quality_Control/
Difference between Download and Clone:
- To create a copy of a remote repository’s files on your computer, you can either Download or Clone the repository
- If you download it, you cannot sync the repository with the remote repository on GitLab
- Cloning a repository is the same as downloading, except it preserves the Git connection with the remote repository
- You can then modify the files locally and upload the changes to the remote repository on GitLab
- You can then update the files locally and download the changes from the remote repository on GitLab
git reset --hard HEAD && \
git pull --verbose
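Note that git reset --hard HEAD discards any uncommitted change to tracked files before pulling. A more cautious variant, sketched below, lists such changes first. The helper name tracked_changes is hypothetical, and the path assumes the clone location used above.

```shell
# Hypothetical helper: list uncommitted changes to tracked files in a repository.
# An empty output means 'git reset --hard HEAD' would not discard anything.
tracked_changes() {
    git -C "$1" status --porcelain --untracked-files=no
}

repo="$HOME/Reads_Quality_Control"
if [ -d "$repo/.git" ]; then
    changes="$(tracked_changes "$repo")"
    if [ -n "$changes" ]; then
        echo "Local changes that a hard reset would discard:"
        echo "$changes"
    else
        echo "No local changes, safe to run: git reset --hard HEAD && git pull --verbose"
    fi
else
    echo "No clone found in $repo"
fi
```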
~ USAGE ~
- Copy your reads (single or paired-end), as .fastq.gz or .fastq files, into the ./resources/reads/ directory
- Execute the Start_RQC.sh bash script to run the RQC pipeline (according to your choice):
- either with a double-click on it (if you have made .sh files executable with Terminal.app)
- or with a right-click > Open with > Terminal.app
- or with the CLI, from a terminal:
bash Start_RQC.sh
- Your analyses will start, with default configuration settings
Option-1: Edit config.yaml file in ./config/ directory
Option-2: Edit fastq-screen.conf file in ./config/ directory
The first run will automatically create (only once):
- RQC-Base conda environment (Snakemake, Mamba, Rename, GraphViz)
- Snakemake-conda environments (one for each tool used by RQC)
- Indexes for the BWA and BOWTIE2 aligners (for each fasta genome in resources)
This may take some time, depending on your internet connection and your computer
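Before launching Start_RQC.sh, it can be worth confirming that the reads are where the pipeline expects them. This is a minimal sketch, assuming it is run from the repository root. count_reads is a hypothetical helper, not part of RQC.

```shell
# Hypothetical helper: count .fastq.gz and .fastq files directly inside a directory.
count_reads() {
    find "$1" -maxdepth 1 -type f \( -name '*.fastq.gz' -o -name '*.fastq' \) | wc -l | tr -d ' '
}

reads_dir="./resources/reads"
if [ -d "$reads_dir" ]; then
    echo "Found $(count_reads "$reads_dir") read file(s) in $reads_dir"
else
    echo "Directory $reads_dir not found, run this from the repository root"
fi
```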
~ RESULTS ~
Your results are available in the ./results/ directory, as follows:
🧩 Reads_Quality_Control/
└── 📂 results/
├── 🌐 All_readsQC_reports.html
├── 📂 00_Quality_Control/
│ ├── 📂 fastq-screen/
│ │ ├── 🌐 {SAMPLE}_R{1/2}_screen.html
│ │ ├── 📈 {SAMPLE}_R{1/2}_screen.png
│ │ └── 📄 {SAMPLE}_R{1/2}_screen.txt
│ ├── 📂 fastqc/
│ │ ├── 🌐 {SAMPLE}_R{1/2}_fastqc.html
│ │ └── 📦 {SAMPLE}_R{1/2}_fastqc.zip
│ └── 📂 multiqc/
│ ├── 🌐 multiqc_report.html
│ └── 📂 multiqc_data/
│ ├── 📝 multiqc.log
│ ├── 📄 multiqc_citations.txt
│ ├── 🌀 multiqc_data.json
│ ├── 📄 multiqc_fastq_screen.txt
│ ├── 📄 multiqc_fastqc.txt
│ ├── 📄 multiqc_general_stats.txt
│ └── 📄 multiqc_sources.txt
└── 📂 10_Reports/
├── ⚙️ config.log
├── 📝 settings.log
├── 🍜 RQC-Base_v.{VERSION}.yaml
├── 📂 files-summaries/
│ └── 📄 Reads_Quality_Control_files-summary.txt
├── 📂 graphs/
│ ├── 📈 Reads_Quality_Control_dag.{PNG/PDF}
│ ├── 📈 Reads_Quality_Control_filegraph.{PNG/PDF}
│ └── 📈 Reads_Quality_Control_rulegraph.{PNG/PDF}
└── 📂 tools-log/
├── 📂 bowtie2/
├── 📂 bwa/
├── 📝 fastq-screen.log
├── 📝 fastqc.log
└── 📝 multiqc.log
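The naming scheme in the tree above can be used to check that the per-sample reports were produced. The sketch below only prints and tests the paths expected for one sample, assuming it is run from the repository root and that reads follow the {SAMPLE}_R{1/2} naming. expected_reports is a hypothetical helper and MySample is a placeholder.

```shell
# Hypothetical helper: print the per-sample report paths shown in the results tree.
# $1: sample name, $2: read number (1 or 2)
expected_reports() {
    qc="results/00_Quality_Control"
    echo "$qc/fastq-screen/${1}_R${2}_screen.html"
    echo "$qc/fastqc/${1}_R${2}_fastqc.html"
}

# Report which expected files are present for a placeholder sample.
for path in $(expected_reports "MySample" 1); do
    if [ -f "$path" ]; then
        echo "found:   $path"
    else
        echo "missing: $path"
    fi
done
```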
fastq-screen
Screens your libraries against the genomes of the organisms you work on, along with PhiX, vectors,
and other contaminants commonly seen in sequencing experiments.
More about fastq-screen
fastqc
Modular set of analyses which you can use to give a quick impression of whether
your data has any problems of which you should be aware before doing any further analysis.
More about fastqc
multiqc
Compiled HTML report. More about multiqc
~ CONFIGURATION ~
You can edit the default settings in the config.yaml file, in the ./config/ directory:
Resources
Edit to match your hardware configuration
- cpus: for tools that support it (e.g. bwa), use at most n cpus to run in parallel (default config: '8')
Note: snakemake (with the default Start bash script) will always use all cpus to parallelize jobs
- ram: for tools that support it (e.g. samtools), limit memory usage to at most n Gb (default config: '16' Gb)
- tmpdir: for tools that support it (e.g. pangolin), specify where temporary files are written (default config: '$TMPDIR')
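To choose values that match your hardware, you can first query the machine. The sketch below is one possible way to do so on Linux and macOS. The function names are hypothetical, and the printed values are only a starting point for the cpus and ram settings.

```shell
# Number of CPUs currently online (getconf works on both Linux and macOS).
detect_cpus() {
    getconf _NPROCESSORS_ONLN
}

# Total RAM in whole Gb, rounded down.
detect_ram_gb() {
    if [ -r /proc/meminfo ]; then
        # Linux: MemTotal is reported in kB
        awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 1024 }' /proc/meminfo
    else
        # macOS: hw.memsize is reported in bytes
        sysctl -n hw.memsize | awk '{ printf "%d\n", $1 / 1024 / 1024 / 1024 }'
    fi
}

echo "cpus available: $(detect_cpus)"
echo "ram available:  $(detect_ram_gb) Gb"
```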
Environments
Edit if you want to change some environments (e.g. to test a new version) in the ./workflow/envs/{tools}_v.{version}.yaml files
Fastq-Screen
- config: path to the fastq-screen configuration file (default config: ./config/fastq-screen.conf)
- subset: do not use the whole sequence file, but create a temporary dataset of the specified number of reads (default config: '1000')
- aligner: specify the aligner to use for the mapping. Valid arguments are 'bowtie', 'bowtie2' or 'bwa' (default config: 'bwa')
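For orientation, these three settings correspond to the --conf, --subset and --aligner options of the fastq_screen command line. The sketch below only prints such a command, it does not run it. How RQC actually invokes fastq_screen is not shown in this README, so the function name and the reads file name are assumptions.

```shell
# Hypothetical helper: print a fastq_screen command built from the three settings.
# $1: config file, $2: subset size, $3: aligner, $4: reads file
fastq_screen_cmd() {
    echo "fastq_screen --conf $1 --subset $2 --aligner $3 $4"
}

# Defaults listed above, with a placeholder reads file.
fastq_screen_cmd ./config/fastq-screen.conf 1000 bwa ./resources/reads/MySample_R1.fastq.gz
```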
fastq-screen.conf
- databases: enables you to configure multiple genomes databases (aligner index files) to search against
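In fastq-screen.conf, each database is declared on one tab-separated line: the DATABASE keyword, a label, and the path to the index basename. The sketch below prints such a line for a given genome, assuming the index layout under ./resources/indexes/ described in the RQC map. database_line is a hypothetical helper, and the paths written in the configuration shipped with RQC may differ.

```shell
# Hypothetical helper: print a fastq-screen.conf DATABASE entry.
# $1: genome fasta file, $2: aligner directory name (bwa or bowtie2)
database_line() {
    name="$(basename "$1" .fasta)"
    printf 'DATABASE\t%s\t%s\n' "$name" "./resources/indexes/$2/$name"
}

# Example with one of the QC references listed in resources/genomes/.
database_line ./resources/genomes/QC_NGS-adapters.fasta bwa
```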
~ SUPPORT ~
- Read The Fabulous Manual!
- Read the Awesome Wiki! (todo...)
- Create a new issue: Issues > New issue > Describe your issue
- Send an email to nicolas.fernandez@ird.fr
~ ROADMAP ~
- Open to suggestions
~ AUTHORS & ACKNOWLEDGMENTS ~
- Nicolas Fernandez - IRD (Developer and Maintainer)
- Christelle Butel - IRD (Reporter, User-addict, Features inspiration source)
- Eddy Kinganda-Lusamaki - INRB (looking for an open-source, Unix- and biologist-friendly pipeline)
~ CONTRIBUTING ~
- Open to contributions!
- Testing code, finding issues, asking for update, proposing new features...
- Use Git tools to share!
~ PROJECT STATUS ~
This project is regularly updated and actively maintained
However, you can volunteer to step in as a developer or maintainer
For information about main git roles:
- Guests are not active contributors in private projects, they can only see, and leave comments and issues
- Reporters are read-only contributors, they can't write to the repository, but can on issues
- Developers are direct contributors, they have access to everything to go from idea to production, unless something has been explicitly restricted
- Maintainers are super-developers, they are able to push to master and deploy to production. This role is often held by maintainers and engineering managers
- Owners are essentially group-admins, they can give access to groups and have destructive capabilities
~ LICENSE ~
Licensed under GPLv3
Intellectual property belongs to IRD and authors.
RQC map
🧩 Reads_Quality_Control/
├── 🖥️ Start_GeVarLi.sh
├── 📚 README.md
├── 🪪 LICENSE
├── 🛑 .gitignore
├── 📂 .git/
├── 📂 .snakemake/
├── 📂 config/
│ ├── ⚙️ config.yaml
│ └── ⚙️ fastq-screen.conf
├── 📂 resources/
│ ├── 📂 genomes/
│ │ ├── 🧬 SARS-CoV-2_Wuhan_MN908947-3.fasta
│ │ ├── 🧬 Monkeypox-virus_Zaire_AF380138-1.fasta
│ │ ├── 🧬 Monkeypox-virus_UK_MT903345-1.fasta
│ │ ├── 🧬 Swinepox-virus_India_MW036632-1.fasta
│ │ ├── 🧬 Ebola-virus_Zaire_AF272001-1.fasta
│ │ ├── 🧬 Nipah-virus_Malaysia_AJ564622-1.fasta
│ │ ├── 🧬 HIV-1_HXB2_K03455-1.fasta.fasta
│ │ ├── 🧬 {your_favorite_genome_reference}.fasta
│ │ ├── 🧬 QC_Echerichia-coli_CP060121-1.fasta
│ │ ├── 🧬 QC_Kanamycin-Resistance-Gene.fasta
│ │ ├── 🧬 QC_NGS-adapters.fasta
│ │ ├── 🧬 QC_phi-X174_Coliphage_NC-001422-1.fasta
│ │ ├── 🧬 QC_UniVec_wo_phiX_and_kanamycin.fasta
│ │ └── 🧬 {your_favorite_qc_reference}.fasta
│ ├── 📂 indexes/
│ │ ├── 📂 bwa/
│ │ │ ├── 🗂️ {GENOME}.amb
│ │ │ ├── 🗂️ {GENOME}.ann
│ │ │ ├── 🗂️ {GENOME}.bwt
│ │ │ ├── 🗂️ {GENOME}.pac
│ │ │ └── 🗂️ {GENOME}.sa
│ │ └── 📂 bowtie2/
│ │ ├── 🗂️ {GENOME}.1.bt2
│ │ ├── 🗂️ {GENOME}.2.bt2
│ │ ├── 🗂️ {GENOME}.3.bt2
│ │ ├── 🗂️ {GENOME}.4.bt2
│ │ ├── 🗂️ {GENOME}.rev.1.bt2
│ │ └── 🗂️ {GENOME}.rev.2.bt2
│ ├── 📂 reads/
│ │ ├── 🛡️ .gitkeep
│ │ ├── 📦 {SAMPLE}_R1.fastq.gz
│ │ └── 📦 {SAMPLE}_R2.fastq.gz
│ └── 📂 visuals/
│ └── 📈 quality_control_rulegraph.png
└── 📂 workflow/
├── 📂 envs/
│ ├── 📂 linux/
│ │ ├── 🍜 bowtie2_v.2.4.5.yaml
│ │ ├── 🍜 bwa_v.0.7.17.yaml
│ │ ├── 🍜 fastq-screen_v.0.15.2.yaml
│ │ ├── 🍜 fastqc_v.0.11.9.yaml
│ │ ├── 🍜 gevarli-base_v.2022.11.yaml
│ │ └── 🍜 multiqc_v.1.12.yaml
│ └── 📂 osx/
│ ├── 🍜 bowtie2_v.2.4.5.yaml
│ ├── 🍜 bwa_v.0.7.17.yaml
│ ├── 🍜 fastq-screen_v.0.15.2.yaml
│ ├── 🍜 fastqc_v.0.11.9.yaml
│ ├── 🍜 gevarli-base_v.2022.11.yaml
│ └── 🍜 multiqc_v.1.12.yaml
└── 📂 rules/
├── 📜 indexing_genomes.smk
└── 📜 quality_control.smk
~ REFERENCES ~
Sustainable data analysis with Snakemake
Felix Mölder, Kim Philipp Jablonski, Brice Letcher, Michael B. Hall, Christopher H. Tomkins-Tinch, Vanessa Sochat, Jan Forster, Soohyun Lee, Sven O. Twardziok, Alexander Kanitz, Andreas Wilm, Manuel Holtgrewe, Sven Rahmann, Sven Nahnsen, Johannes Köster
F1000Research (2021)
DOI: https://doi.org/10.12688/f1000research.29032.2
Publication: https://f1000research.com/articles/10-33/v1
Source code: https://github.com/snakemake/snakemake
Documentation: https://snakemake.readthedocs.io/en/stable/index.html
Anaconda Software Distribution
Team
Computer software (2016)
DOI:
Publication: https://www.anaconda.com
Source code: https://github.com/conda/conda (conda)
Documentation: https://docs.conda.io/en/latest/ (conda)
Source code: https://github.com/mamba-org/mamba (mamba)
Documentation: https://mamba.readthedocs.io/en/latest/index.html (mamba)
Fast and accurate short read alignment with Burrows-Wheeler Transform
Heng Li and Richard Durbin
Bioinformatics, Volume 25, Issue 14, Pages 1754-60 (2009)
DOI: https://doi.org/10.1093/bioinformatics/btp324
Publication: https://pubmed.ncbi.nlm.nih.gov/19451168
Source code: https://github.com/lh3/bwa
Documentation: http://bio-bwa.sourceforge.net
MultiQC: summarize analysis results for multiple tools and samples in a single report
Philip Ewels, Måns Magnusson, Sverker Lundin and Max Käller
Bioinformatics, Volume 32, Issue 19 (2016)
DOI: https://doi.org/10.1093/bioinformatics/btw354
Publication: https://academic.oup.com/bioinformatics/article/32/19/3047/2196507
Source code: https://github.com/ewels/MultiQC
Documentation: https://multiqc.info
FastQ Screen: A tool for multi-genome mapping and quality control
Wingett SW and Andrews S
F1000Research (2018)
DOI: https://doi.org/10.12688/f1000research.15931.2
Publication: https://f1000research.com/articles/7-1338/v2
Source code: https://github.com/StevenWingett/FastQ-Screen
Documentation: https://www.bioinformatics.babraham.ac.uk/projects/fastq_screen
FastQC: A quality control tool for high throughput sequence data
Simon Andrews
Online (2010)
DOI: https://doi.org/
Publication:
Source code: https://github.com/s-andrews/FastQC
Documentation: https://www.bioinformatics.babraham.ac.uk/projects/fastqc