RQC: Reads Quality Control
Description
RQC is a bioinformatics pipeline that checks the quality of reads from NGS sequencing.
This is the macOS version (it uses macOS-specific Conda environments).
Installation
Conda (prerequisite)
Install Conda (e.g. Miniconda3 with Python 3.9 on macOS 64-bit)
Latest Miniconda Installer
Follow the screen prompt instructions
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh
bash Miniconda3-latest-MacOSX-x86_64.sh
rm Miniconda3-latest-MacOSX-x86_64.sh
Restart your shell (close the terminal window and open a new one)
Snakemake (prerequisite)
Install Snakemake (e.g. v6.12.1) using Conda
Follow the screen prompt instructions
conda install -c conda-forge mamba --yes
mamba install -c bioconda rename --yes
mamba install -c conda-forge -c bioconda snakemake=6.12.1 --yes
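Once the installation steps above have finished, it can help to confirm that the tools are reachable from a new terminal. The sketch below is only an illustration for bash or sh: `require_cmd` is a hypothetical helper name (not part of RQC), and the loop simply reports which of `conda`, `mamba` and `snakemake` are on your PATH.

```shell
# Report whether a prerequisite is on the PATH.
# require_cmd is a hypothetical helper name, not part of RQC.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Check the three tools installed in the steps above.
missing=0
for tool in conda mamba snakemake; do
    require_cmd "$tool" || missing=1
done

# When Snakemake is present, print its version (6.12.1 if you pinned it as above).
if command -v snakemake >/dev/null 2>&1; then
    snakemake --version
fi
```

If a tool is reported as missing, restarting the shell or repeating the corresponding install step is usually the first thing to try.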
RQC
Clone the RQC pipeline project
HTTPS
- Clone with HTTPS (when you want to authenticate each time you perform an operation between your computer and GitLab)
Authenticate with GitLab by following the instructions in the 2FA documentation
git clone https://gitlab.com/ird_transvihmi/Reads_Quality_Control_Pipeline.git
mv Reads_Quality_Control_Pipeline/ ~/Desktop/RQC_Pipeline/
cd ~/Desktop/RQC_Pipeline/
SSH
- Clone with SSH (when you want to authenticate only once)
Authenticate with GitLab by following the instructions in the SSH documentation
git clone git@gitlab.com:ird_transvihmi/Reads_Quality_Control_Pipeline.git
mv Reads_Quality_Control_Pipeline/ ~/Desktop/RQC_Pipeline/
cd ~/Desktop/RQC_Pipeline/
Difference between Download and Clone
To create a copy of a remote repository’s files on your computer, you can either download or clone the repository
If you download it, you cannot sync the repository with the remote repository on GitLab
Cloning a repository is the same as downloading, except it preserves the Git connection with the remote repository
You can then modify the files locally and upload the changes to the remote repository on GitLab
Usage
- Copy your paired-end read files (fastq.gz format) into the ./resources/reads/ directory
- (optional) Edit the config.yaml file in the ./config/ directory, if needed
- (optional) Edit the fastq-screen.conf file in the ./config/ directory, if needed
- Run the RQC.sh bash script by double-clicking on it (a terminal window will open and the analyses will start)
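Before launching the pipeline, you may want to check that every forward read file has its mate. The sketch below is an illustration for bash or sh, not part of RQC: `check_pairs` is a hypothetical helper, and the `<sample>_R1.fastq.gz` / `<sample>_R2.fastq.gz` naming is an assumption that you should adapt to your own file names.

```shell
# check_pairs DIR: list R1 files in DIR whose R2 mate is missing.
# Hypothetical helper. It assumes mates are named
# <sample>_R1.fastq.gz and <sample>_R2.fastq.gz (adjust to your naming).
check_pairs() {
    dir=$1
    rc=0
    for r1 in "$dir"/*_R1.fastq.gz; do
        [ -e "$r1" ] || continue              # the glob matched nothing
        r2="${r1%_R1.fastq.gz}_R2.fastq.gz"   # expected mate file
        if [ ! -e "$r2" ]; then
            echo "missing mate for: $(basename "$r1")"
            rc=1
        fi
    done
    return "$rc"
}

# Possible use from the pipeline directory (placeholder source path):
# cp /path/to/my_reads/*.fastq.gz ./resources/reads/
# check_pairs ./resources/reads/
```

The function returns a non-zero status when at least one mate is missing, so it can be chained with `&&` in front of whatever command you use to start the analysis.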
Results
Your results are available in the results directory, as follows:
- fastq-screen: screens your reads against a set of search libraries. Your search libraries might contain the genomes of all of the organisms you work on, along with PhiX, vectors or other contaminants commonly seen in sequencing experiments. More about fastq-screen
- fastqc: a modular set of analyses that gives a quick impression of whether your data has any problems you should be aware of before doing any further analysis. More about fastqc
- multiqc: compiled HTML report. More about multiqc
- reports: logs from the tools
- summary: snakemake rules graph and files summary.
Configuration
./config/config.yaml
Resources
Edit to match your hardware configuration (it is displayed when you run RQC.sh)
Environments
Edit only if you change an environment (e.g. a new tool version) in the ./workflow/envs/tools-version.yaml files (you should not normally need to change this)
Fastq-Screen
- config: Path to the fastq-screen configuration file (default config: ./config/fastq-screen.conf)
- subset: Don't use the whole sequence file, but create a temporary dataset of the specified number of reads (default config: '10000', set '0' to use the whole dataset)
- aligner: Specify the aligner to use for the mapping. Valid arguments are 'bowtie', 'bowtie2' or 'bwa' (default config: 'bwa')
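The three settings above mirror the `--conf`, `--subset` and `--aligner` options of the fastq_screen command line. As a rough illustration, the sketch below prints (without running it) a manual call that uses the default values. How the pipeline itself passes these values is not shown here and is an assumption; `print_screen_cmd` and the sample file name are hypothetical.

```shell
# print_screen_cmd READS: print (do not run) a fastq_screen call that uses
# the default values listed above. Hypothetical helper, not part of RQC.
print_screen_cmd() {
    conf=./config/fastq-screen.conf   # 'config' setting
    subset=10000                      # 'subset' setting (0 = whole dataset)
    aligner=bwa                       # 'aligner' setting
    echo "fastq_screen --conf $conf --subset $subset --aligner $aligner $1"
}

# Hypothetical read file name, for illustration only.
print_screen_cmd ./resources/reads/sample_R1.fastq.gz
```

Printing the command first makes it easy to review the options before running anything by hand.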
./config/fastq-screen.conf
databases
For each genome you need to provide a database name (which can't contain spaces) and the location of the aligner index files
The path to the index files should include the basename of the index (e.g. ./resources/databases//Human/Homo_sapiens_h38)
Thus, the index files (Homo_sapiens_h38.1.bt2, Homo_sapiens_h38.2.bt2, etc.) are found in a folder named 'Human'
If, for example, the Bowtie, Bowtie2 and BWA indices of a given genome reside in the same folder,
a single path may be provided for all of the indices
The index used will be the one compatible with the chosen aligner (as specified using the --aligner option)
The entries shown in ./config/fastq-screen.conf are only suggested examples:
- You can add as many database sections as required
- You can comment out or remove as many of the existing entries as desired
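In the fastq_screen configuration format, each database is declared on one tab-separated line: the keyword `DATABASE`, the database name, then the index basename. The sketch below is a small illustration for bash or sh. `database_entry` is a hypothetical helper (not part of RQC) that prints such a line and rejects names containing spaces, and the index path in the example is a placeholder.

```shell
# database_entry NAME INDEX_BASENAME: print one fastq-screen.conf entry.
# Hypothetical helper. The tab-separated layout
# "DATABASE <name> <index basename>" follows the fastq_screen format.
database_entry() {
    case "$1" in
        ""|*[[:space:]]*)
            echo "error: database name must be non-empty and contain no spaces" >&2
            return 1
            ;;
    esac
    printf 'DATABASE\t%s\t%s\n' "$1" "$2"
}

# Example with a placeholder path (point it at your own index basename):
# database_entry Human /path/to/indexes/Human/Homo_sapiens_h38 >> ./config/fastq-screen.conf
```

Lines starting with `#` in fastq-screen.conf are comments, which is how an existing entry can be disabled without deleting it.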
We suggest including genomes and sequences that:
- may be sources of contamination because they were run on your sequencer previously
- may have contaminated your sample during the library preparation step
For IRD_U233_TransVIHMI, we can provide:
- Human: main source of laboratory contamination
- Mouse: main model organism in biological experimentation, very frequent in NGS core facilities
- Arabidopsis: frequent plant model in NGS core facilities associated with plant research (IRD, CIRAD, INRAE, ...)
- Ecoli: frequent bacterial model, also an indicator of human contamination, and present in feces and stool samples
- PhiX: useful control in Illumina sequencing runs
- Adapters: used for library generation
- Vector: used in general molecular biology
- Gorilla: species studied in TransVIHMI
- Chimpanzee: species studied in TransVIHMI
- Bat: species studied in TransVIHMI
- HIV: species studied in TransVIHMI
- Ebola: species studied in TransVIHMI
- SARS-CoV-2: species studied in TransVIHMI
- Coronavirus: species studied in Trans
Indexes for larger genomes can be large (~3 GB), and GitLab limits each project to 10 GB.
Downloading all these databases can also take a very long time. So we share the code on GitLab, but not the resources.
This data can be downloaded separately from dedicated servers.
Or you can freely ask us to share it (on physical media or via FileSender) so that you can add it to your analyses.
You can also ask for new indexes for your favorite genomes that are not yet included.
Support
- RTFM! (Read The Fabulous Manual! ^^.)
- Read the awesome wiki ;)
- Create a new issue: Issues > New issue > Describe your issue
- Send an email to nicolas.fernandez@ird.fr
- Call me at +33.(0)4.67.41.55.xx (No, please don't O_o!)
Roadmap
Add a wiki !
Finish documentation about "terminal" and "results"
Add new features
Contributing
Open to contributions :)
Testing code, finding issues, asking for updates, proposing new features...
Use Git tools to share!
Authors and acknowledgment
- Nicolas Fernandez (Developer and Maintainer)
- Christelle Butel (Reporter, User-addict, Feature inspiration source)
License
Project status
This project is regularly updated and actively maintained
However, you can volunteer to step in as a maintainer
For information, the main GitLab roles are:
- Guests are not active contributors in private projects; they can only see, and leave comments and issues.
- Reporters are read-only contributors; they can't write to the repository, but can on issues.
- Developers are direct contributors; they have access to everything to go from idea to production, unless something has been explicitly restricted.
- Maintainers are super-developers; they are able to push to master and deploy to production. This role is often held by maintainers and engineering managers.
- Owners are essentially group admins; they can give access to groups and have destructive capabilities.