RQC: Reads Quality Control

Description

RQC is a bioinformatics pipeline for checking the quality of reads from NGS sequencing.

This is the macOS version (it uses macOS-specific Conda environments).

Badges

Maintainer MacOSX Issues closed Issues opened Maintained Wiki Open Source GNU AGPL v3 Gitlab Bash Python Snakemake Conda

Visuals

Image of rulegraph

Installation

Conda (prerequisite)

Install Conda (e.g. Miniconda3 with Python 3.9 for 64-bit macOS)
Latest Miniconda Installer
Follow the on-screen instructions

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh
bash Miniconda3-latest-MacOSX-x86_64.sh
rm Miniconda3-latest-MacOSX-x86_64.sh

Restart the shell (close the terminal window and open a new one)

Snakemake (prerequisite)

Install Snakemake (e.g. v6.12.1) using Conda
Follow the on-screen instructions

conda install -c conda-forge mamba --yes
mamba install -c bioconda rename --yes
mamba install -c conda-forge -c bioconda snakemake=6.12.1 --yes  
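
Once both installation steps are done, it can help to confirm that the tools are reachable from a fresh shell before running the pipeline. The sketch below is not part of RQC: `require_tool` is a hypothetical helper built on the standard `command -v` builtin, and the list of tool names simply mirrors the commands above.

```shell
#!/bin/sh
# Hypothetical helper (not part of RQC): report whether a command is on PATH.
require_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found:   $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Check the tools installed in the steps above; keep going after a miss.
for tool in conda mamba snakemake; do
    require_tool "$tool" || true
done
```

If a tool is reported as missing right after installation, restarting the shell as described above is the first thing to try.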

RQC

Clone the RQC pipeline project

HTTPS

  • Clone with HTTPS (when you want to authenticate each time you perform an operation between your computer and GitLab)
    Authenticate with GitLab by following the instructions in the 2FA documentation
git clone https://gitlab.com/ird_transvihmi/Reads_Quality_Control_Pipeline.git
mv Reads_Quality_Control_Pipeline/ ~/Desktop/RQC_Pipeline/
cd ~/Desktop/RQC_Pipeline/

SSH

  • Clone with SSH (when you want to authenticate only one time)
    Authenticate with GitLab by following the instructions in the SSH documentation
git clone git@gitlab.com:ird_transvihmi/Reads_Quality_Control_Pipeline.git
mv Reads_Quality_Control_Pipeline/ ~/Desktop/RQC_Pipeline/
cd ~/Desktop/RQC_Pipeline/

Difference between Download and Clone

To create a copy of a remote repository’s files on your computer, you can either download or clone the repository
If you download it, you cannot sync the repository with the remote repository on GitLab
Cloning a repository is the same as downloading, except it preserves the Git connection with the remote repository
You can then modify the files locally and upload the changes to the remote repository on GitLab
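
To tell which kind of copy you have, one simple heuristic is to look for the `.git` directory that a clone carries and a plain download does not. The sketch below is not part of RQC: `is_git_clone` is a hypothetical helper, and the path is assumed from the clone commands above.

```shell
#!/bin/sh
# Hypothetical helper (not part of RQC): a simple heuristic that tells a
# clone from a plain download by looking for the .git directory.
is_git_clone() {
    [ -d "$1/.git" ]
}

# Assumed location, taken from the clone commands above.
repo="$HOME/Desktop/RQC_Pipeline"
if is_git_clone "$repo"; then
    echo "$repo is a clone: 'git pull' can fetch updates"
else
    echo "$repo is not a clone: no Git connection to the remote"
fi
```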

Usage

  • Copy your paired-end read files (fastq.gz format) into the ./resources/reads/ directory
  • (optional) Edit the config.yaml file in the ./config/ directory, if needed
  • (optional) Edit the fastq-screen.conf file in the ./config/ directory, if needed
  • Run the RQC.sh bash script by double-clicking on it (a terminal window will open and the analyses will start)
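
Because the pipeline expects paired-end reads, a quick check that every forward file has a mate can catch copy mistakes before a run. The sketch below is not part of RQC. It assumes, purely for illustration, that mates are named `<sample>_R1.fastq.gz` and `<sample>_R2.fastq.gz`; the naming RQC actually expects is not stated here, so adapt the suffixes. `check_pairs` is a hypothetical helper, and it only looks for R1 files lacking an R2 mate.

```shell
#!/bin/sh
# Hypothetical helper (not part of RQC). ASSUMPTION: mates are named
# <sample>_R1.fastq.gz and <sample>_R2.fastq.gz; adapt to your own naming.
check_pairs() {
    status=0
    for r1 in "$1"/*_R1.fastq.gz; do
        [ -e "$r1" ] || continue            # glob matched nothing
        r2="${r1%_R1.fastq.gz}_R2.fastq.gz"
        if [ ! -e "$r2" ]; then
            echo "missing mate for: $(basename "$r1")"
            status=1
        fi
    done
    return "$status"
}

# Directory named in the usage steps above.
if [ -d ./resources/reads ]; then
    check_pairs ./resources/reads && echo "all R1 files have an R2 mate"
fi
```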

Results

Your results are available in the results directory, as follows:

  1. fastq-screen: your search libraries might contain the genomes of all of the organisms you work on, along with PhiX, Vectors or other contaminants commonly seen in sequencing experiments. More about fastq-screen
  2. fastqc: modular set of analyses which you can use to give a quick impression of whether your data has any problems of which you should be aware before doing any further analysis. More about fastqc
  3. multiqc: compiled HTML report. More about multiqc
  4. reports: log from tools
  5. summary: snakemake rules graph and files summary.

Configuration

./config/config.yaml

Resources

Edit to match your hardware configuration (it is displayed when you run RQC.sh)

Environments

Edit only if you change an environment (e.g. a new tool version) in the ./workflow/envs/tools-version.yaml files (you should not need to change this)

Fastq-Screen
  • config: Path to the fastq-screen configuration file (default config: ./config/fastq-screen.conf)
  • subset: Don't use the whole sequence file, but create a temporary dataset of the specified number of reads (default config: '10000', set '0' to use the whole dataset)
  • aligner: Specify the aligner to use for the mapping. Valid arguments are 'bowtie', 'bowtie2' or 'bwa' (default config: 'bwa')
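
The constraints on `subset` and `aligner` described above can be checked before editing ./config/config.yaml. The sketch below is not part of RQC: `check_fastq_screen_opts` is a hypothetical helper that only encodes the rules listed here (a non-negative integer, and one of the three aligner names).

```shell
#!/bin/sh
# Hypothetical helper (not part of RQC): sanity-check the two Fastq-Screen
# values described above before editing ./config/config.yaml.
check_fastq_screen_opts() {
    subset=$1
    aligner=$2
    case $subset in
        ''|*[!0-9]*) echo "subset must be a non-negative integer, got '$subset'"; return 1 ;;
    esac
    case $aligner in
        bowtie|bowtie2|bwa) ;;
        *) echo "aligner must be bowtie, bowtie2 or bwa, got '$aligner'"; return 1 ;;
    esac
    echo "ok: subset=$subset aligner=$aligner"
}

check_fastq_screen_opts 10000 bwa   # the documented defaults
```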

./config/fastq-screen.conf

databases

For each genome you need to provide a database name (which can't contain spaces) and the location of the aligner index files

The path to the index files should include the basename of the index (e.g. ./resources/databases/Human/Homo_sapiens_h38)
Thus, the index files (Homo_sapiens_h38.1.bt2, Homo_sapiens_h38.2.bt2, etc.) are found in a folder named 'Human'
If, for example, the Bowtie, Bowtie2 and BWA indices of a given genome reside in the same folder,
a single path may be provided for all of the indices

The index used will be the one compatible with the chosen aligner (as specified using the --aligner option)

The entries shown in ./config/fastq-screen.conf are only suggested examples:

  • You can add as many database sections as required
  • You can comment out or remove as many of the existing entries as desired
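
To see which databases are currently active after adding or commenting out entries, the configuration file can be listed from the shell. The sketch below is not part of RQC: `list_databases` is a hypothetical helper, and it assumes the file follows the usual FastQ Screen layout, where each active entry is a line of the form `DATABASE <name> <path/to/index_basename>` and disabled entries start with `#`.

```shell
#!/bin/sh
# Hypothetical helper (not part of RQC): print "name<TAB>index basename" for
# every active DATABASE line. ASSUMPTION: the file follows the usual
# FastQ Screen layout, i.e. "DATABASE <name> <path/to/index_basename>".
list_databases() {
    awk '$1 == "DATABASE" { print $2 "\t" $3 }' "$1"
}

conf=./config/fastq-screen.conf
if [ -f "$conf" ]; then
    list_databases "$conf"
fi
```

Commented-out entries are skipped because their first field is no longer exactly `DATABASE`.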

We suggest including genomes and sequences that may be sources of contamination, either because they:

  • were run on your sequencer previously, or
  • may have contaminated your sample during the library preparation step

For IRD_U233_TransVIHMI, we can provide:

  • Human: main source of laboratory contamination
  • Mouse: main model in biological experimentation, very frequent in NGS core facilities
  • Arabidopsis: frequent plant model in NGS core facilities associated with plant research (IRD, CIRAD, INRAE, ...)
  • Ecoli: frequent bacterial model, also an indicator of human contamination, also found in feces and stool samples
  • PhiX: useful control in Illumina sequencing runs
  • Adapters: used for library generation
  • Vector: used in general molecular biology
  • Gorilla: species studied in TransVIHMI
  • Chimpanzee: species studied in TransVIHMI
  • Bat: species studied in TransVIHMI
  • HIV: species studied in TransVIHMI
  • Ebola: species studied in TransVIHMI
  • SARS-CoV-2: species studied in TransVIHMI
  • Coronavirus: species studied in TransVIHMI

Indexes for larger genomes can be heavy (~3 GB each), and GitLab limits each project to 10 GB.
Downloading all these databases can also take a very long time, so we share the code on GitLab but not the resources.
The data can be downloaded separately from dedicated servers.
You can also ask us to share it (on physical media or through FileSender) to add it to your analyses.
You can also request new indexes for your favorite genomes that are not yet included.

Support

  1. RTFM! (Read The Fabulous Manual! ^^.)
  2. Read the awesome wiki ;)
  3. Create a new issue: Issues > New issue > Describe your issue
  4. Send an email to nicolas.fernandez@ird.fr
  5. Call me at +33.(0)4.67.41.55.xx (No, please don't O_o!)

Roadmap

Add a wiki!
Finish documentation about "terminal" and "results"
Add new features

Contributing

Open to contributions :)
Testing code, finding issues, asking for updates, proposing new features ...
Use Git tools to share!

Authors and acknowledgment

  • Nicolas Fernandez (Developer and Maintainer)
  • Christelle Butel (Reporter, User-addict, Features inspiration source)

License

GPLv3

Project status

This project is regularly updated and actively maintained
However, you can volunteer to step in as a maintainer

For information about main git roles:

  • Guests are not active contributors in private projects, they can only see, and leave comments and issues.
  • Reporters are read-only contributors, they can't write to the repository, but can write on issues.
  • Developers are direct contributors, they have access to everything to go from idea to production,
    unless something has been explicitly restricted.
  • Maintainers are super-developers, they are able to push to master, deploy to production.
    This role is often held by maintainers and engineering managers.
  • Owners are essentially group-admins, they can give access to groups and have destructive capabilities.