Commit 06fd2b5d authored by nicolas.fernandez_ird.fr's avatar nicolas.fernandez_ird.fr :shinto_shrine:

remove big files for lighter archive

parent 8d86a931
# RQC: Reads Quality Control #
![Maintener](<https://badgen.net/badge/Maintener/Nicolas Fernandez/blue?scale=0.9>)
![MacOSX Intel/M1/M2](<https://badgen.net/badge/icon/Hight Sierra (10.13.6) | Catalina (10.15.7) | Big Sure (11.6.3) | Monterey (12.6.0) | Ventura (13.0.1)/E6055C?icon=apple&label&list=|&scale=0.9>)
![GNU-Linux Ubuntu](<https://badgen.net/badge/icon/Bionic Beaver (18.04) | Focal Fossa (20.04) | Jammy Jellyfish (22.04)/772953?icon=https://www.svgrepo.com/show/25424/ubuntu-logo.svg&label&list=|&scale=0.9>)
![WSL/WSL2](<https://badgen.net/badge/icon/Bionic Beaver (18.04) | Focal Fossa (20.04) | Jammy Jellyfish (22.04)/00BCF2?icon=windows&label&list=|&scale=0.9>)
![Issues closed](<https://badgen.net/badge/Issues closed/0/green?scale=0.9>)
![Issues opened](<https://badgen.net/badge/Issues opened/0/yellow?scale=0.9>)
![Maintened](<https://badgen.net/badge/Maintened/Yes/red?scale=0.9>)
![Wiki](<https://badgen.net/badge/icon/Wiki/pink?icon=wiki&label&scale=0.9>)
![Open Source](<https://badgen.net/badge/icon/Open Source/purple?icon=https://upload.wikimedia.org/wikipedia/commons/4/44/Corazón.svg&label&scale=0.9>)
![GNU AGPL v3](<https://badgen.net/badge/Licence/GNU AGPL v3/grey?scale=0.9>)
![Gitlab](<https://badgen.net/badge/icon/Gitlab/orange?icon=gitlab&label&scale=0.9>)
![Bash](<https://badgen.net/badge/icon/Bash 3.2.57/black?icon=terminal&label&scale=0.9>)
![Python](<https://badgen.net/badge/icon/Python 3.9.5/black?icon=https://upload.wikimedia.org/wikipedia/commons/0/0a/Python.svg&label&scale=0.9>)
![Snakemake](<https://badgen.net/badge/icon/Snakemake 6.12.1/black?icon=https://upload.wikimedia.org/wikipedia/commons/d/d3/Python_icon_%28black_and_white%29.svg&label&scale=0.9>)
![Conda](<https://badgen.net/badge/icon/Conda 4.10.3/black?icon=codacy&label&scale=0.9>)
## ~ ABOUT ~ ##
### RQC ###
RQC is a FAIR, open-source, scalable, modular and traceable Snakemake pipeline, used for quality control of Illumina Inc. short reads.
RQC is included as the first step of the **[GeVarLi](https://www.afroscreen.org/)** workflow.
### Genomic sequencing, a public health tool ###
The establishment of a surveillance and sequencing network is an essential public health tool for detecting and containing pathogens with epidemic potential. Genomic sequencing makes it possible to identify pathogens, monitor the emergence and impact of variants, and adapt public health policies accordingly.
The Covid-19 epidemic has highlighted the disparities that remain between continents in terms of surveillance and sequencing systems. At the end of October 2021, of the 4,600,000 sequences shared on the public and free GISAID tool worldwide, only 49,000 came from the African continent, i.e. less than 1% of the cases of Covid-19 diagnosed on this continent.
### Features ###
- Reads quality control
- Fastq-Screen
- FastQC
- MultiQC (_html report_)
### Version ###
*V.2022.11*
### Rulegraph ###
<img src="./resources/visuals/quality_control_rulegraph.png" width="250" height="150">
## ~ SUPPORT ~ ##
1. Read The Fabulous Manual!
2. Read the Awesome Wiki!
3. Create a new issue: Issues > New issue > Describe your issue
4. Send an email to [nicolas.fernandez@ird.fr](mailto:nicolas.fernandez@ird.fr)
## ~ CITATION ~ ##
If you use this pipeline, *please* cite the *RQC* GitLab IRDForge repository and its authors:
GitLab IRDForge repository: [https://forge.ird.fr/transvihmi/nfernandez/RQC](https://forge.ird.fr/transvihmi/nfernandez/RQC)
RQC, a FAIR, open-source, scalable, modular and traceable Snakemake pipeline,
for quality control of Illumina Inc. short reads.
Nicolas FERNANDEZ NUÑEZ _(1)_
_(1) UMI 233 - Recherches Translationnelles sur le VIH et les Maladies Infectieuses endémiques et émergentes (TransVIHMI), University of Montpellier (UM), French Institute of Health and Medical Research (INSERM), French National Research Institute for Sustainable Development (IRD)_
## ~ AUTHORS & ACKNOWLEDGMENTS ~ ##
- Nicolas Fernandez - IRD _(Developer and Maintainer)_
- Christelle Butel - IRD _(Reporter)_
- DALL•E mini - OpenAI [Git](https://github.com/borisdayma/dalle-mini) _(Repo. avatar)_
## ~ LICENSE ~ ##
Licensed under [GPLv3](https://www.gnu.org/licenses/gpl-3.0.html)
Intellectual property belongs to [IRD](https://www.ird.fr/) and authors.
## ~ ROADMAP ~ ##
- Add MultiQC config template
## ~ PROJECT STATUS ~ ##
This project is **regularly updated** and **actively maintained**
However, you can volunteer to step in as **developer** or **maintainer**
## ~ CONTRIBUTING ~ ##
Open to contributions!
- Asking for update
- Proposing new feature
- Reporting issue
- Fixing issue
- Sharing code
- Citing tool
## ~ INSTALLATIONS ~ ##
### Conda _(dependencies)_ ###
RQC uses the **Conda** environment manager.
If Conda is not already installed, please install it first.
Download and install your OS adapted version of [Latest Miniconda Installer](https://docs.conda.io/en/latest/miniconda.html#latest-miniconda-installer-links)
e.g. for **MacOSX-64-bit** systems:
```shell
curl https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh -o ~/Miniconda3-latest-MacOSX-x86_64.sh && \
bash ~/Miniconda3-latest-MacOSX-x86_64.sh -b -p ~/miniconda3/ && \
rm -f ~/Miniconda3-latest-MacOSX-x86_64.sh && \
~/miniconda3/condabin/conda update conda --yes && \
~/miniconda3/condabin/conda init && \
exit
```
e.g. for **Linux-64-bit** systems:
```shell
curl https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -o ~/Miniconda3-latest-Linux-x86_64.sh && \
bash ~/Miniconda3-latest-Linux-x86_64.sh -b -p ~/miniconda3/ && \
rm -f ~/Miniconda3-latest-Linux-x86_64.sh && \
~/miniconda3/condabin/conda update conda --yes && \
~/miniconda3/condabin/conda init && \
exit
```
Update Conda:
```shell
conda update -n base -c defaults conda
```
### RQC ###
Clone the [RQC](https://forge.ird.fr/transvihmi/nfernandez/Reads_Quality_Control) GitLab IRDForge repository _(ID: 404)_ into your home directory:
```shell
git clone https://forge.ird.fr/transvihmi/nfernandez/RQC.git ~/RQC/
```
Update RQC:
```shell
cd ~/RQC/ && git reset --hard HEAD && git pull --verbose
```
## ~ USAGE ~ ##
1. Copy your **reads** _(single or paired-end)_, as **.fastq.gz** or **.fastq** files, into the **./resources/reads/** directory
2. Execute the **Start_RQC.sh** bash script to run the RQC pipeline _(choose one of the following)_:
- with a **Double-click** on it _(if you have made .sh files executable with Terminal.app)_
- or with a **Right-click** > **Open with** > **Terminal.app**
- or with the **CLI** from a terminal:
```shell
bash Start_RQC.sh
```
3. Your analyses will start with the default configuration settings
_Option-1: Edit the **config.yaml** file in the **./config/** directory_
_Option-2: Edit the **fastq-screen.conf** file in the **./config/** directory_
The first run will automatically create _(only once)_:
- Snakemake-Base conda environment _(Snakemake, Mamba, Rename, GraphViz)_
- RQC conda environments _(one for each tool used by RQC)_
- Indexes for the BWA aligner _(one for each fasta genome in resources)_
_This may take some time, depending on your internet connection and your computer_
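Before launching the pipeline, it can help to check that the files copied in step 1 are named as expected. The following is a minimal sketch, not part of RQC: the function name `check_read_names` is hypothetical, and the `{SAMPLE}_R1` / `{SAMPLE}_R2` naming pattern is an assumption taken from the directory maps in this README.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with RQC): report read files whose names
# do not follow the assumed {SAMPLE}_R1 / {SAMPLE}_R2 pattern.
check_read_names() {
    local reads_dir="${1:-./resources/reads}"
    local file name status=0
    for file in "${reads_dir}"/*.fastq "${reads_dir}"/*.fastq.gz; do
        [ -e "${file}" ] || continue            # skip glob patterns that matched nothing
        name="$(basename "${file}")"
        case "${name}" in
            *_R1.fastq|*_R2.fastq|*_R1.fastq.gz|*_R2.fastq.gz) ;;   # expected naming
            *) echo "Unexpected read file name: ${name}"; status=1 ;;
        esac
    done
    return "${status}"                          # 0 if every file matched, 1 otherwise
}
```

Calling `check_read_names ./resources/reads` from the repository root prints one line per file that does not match and returns a non-zero status if any was found.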
## ~ RESULTS ~ ##
Your results are available in the **./results/** directory, as follows:
```shell
🧩 Reads_Quality_Control/
└── 📂 results/
    ├── 🌐 All_readsQC_reports.html
    ├── 📂 00_Quality_Control/
    │   ├── 📂 fastq-screen/
    │   │   ├── 🌐 {SAMPLE}_R{1/2}_screen.html
    │   │   ├── 📈 {SAMPLE}_R{1/2}_screen.png
    │   │   └── 📄 {SAMPLE}_R{1/2}_screen.txt
    │   ├── 📂 fastqc/
    │   │   ├── 🌐 {SAMPLE}_R{1/2}_fastqc.html
    │   │   └── 📦 {SAMPLE}_R{1/2}_fastqc.zip
    │   └── 📂 multiqc/
    │       ├── 🌐 multiqc_report.html
    │       └── 📂 multiqc_data/
    │           ├── 📝 multiqc.log
    │           ├── 📄 multiqc_citations.txt
    │           ├── 🌀 multiqc_data.json
    │           ├── 📄 multiqc_fastq_screen.txt
    │           ├── 📄 multiqc_fastqc.txt
    │           ├── 📄 multiqc_general_stats.txt
    │           └── 📄 multiqc_sources.txt
    └── 📂 10_Reports/
        ├── ⚙️ config.log
        ├── 📝 settings.log
        ├── 🍜 RQC-Base_v.{VERSION}.yaml
        ├── 📂 files-summaries/
        │   └── 📄 Reads_Quality_Control_files-summary.txt
        ├── 📂 graphs/
        │   ├── 📈 Reads_Quality_Control_dag.{PNG/PDF}
        │   ├── 📈 Reads_Quality_Control_filegraph.{PNG/PDF}
        │   └── 📈 Reads_Quality_Control_rulegraph.{PNG/PDF}
        └── 📂 tools-log/
            ├── 📂 bowtie2/
            ├── 📂 bwa/
            ├── 📝 fastq-screen.log
            ├── 📝 fastqc.log
            └── 📝 multiqc.log
```
### fastq-screen ###
Screens your libraries against the genomes of the organisms you work on, along with PhiX, vectors,
or other contaminants commonly seen in sequencing experiments.
More about [fastq-screen](https://www.bioinformatics.babraham.ac.uk/projects/fastq_screen/)
### fastqc ###
Modular set of analyses which you can use to give a quick impression of whether
your data has any problems of which you should be aware before doing any further analysis.
More about [fastqc](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)
### multiqc ###
Aggregates the FastQC and Fastq-Screen results into a single compiled HTML report. More about [multiqc](https://multiqc.info/)
## ~ CONFIGURATION ~ ##
You can edit the default settings in the **config.yaml** file, in the **./config/** directory:
### Resources ###
Edit to match your hardware configuration:
- **cpus**: for tools that support multithreading _(e.g. bwa)_, use at most n CPUs in parallel _(default config: '8')_
_**Note**: Snakemake (with the default Start bash script) will always use all available CPUs to parallelize jobs_
- **ram**: for tools that support it _(e.g. samtools)_, limit memory usage to at most n Gb _(default config: '16' Gb)_
- **tmpdir**: for tools that support it _(e.g. pangolin)_, directory where temporary files are written _(default config: '$TMPDIR')_
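To choose values for **cpus** and **ram**, you may first want to check what your machine offers. The sketch below is an illustration only and is not part of RQC: the function names `detect_cpus` and `detect_ram_gb` are hypothetical, and the printed values are meant to be copied by hand into **config.yaml**.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not shipped with RQC) to inspect the local hardware.

# Print the number of logical CPUs (GNU/Linux first, then macOS).
detect_cpus() {
    if command -v nproc >/dev/null 2>&1; then
        nproc                      # GNU coreutils, usually present on Linux
    else
        sysctl -n hw.ncpu          # macOS
    fi
}

# Print the total RAM, rounded down to whole Gb.
detect_ram_gb() {
    if [ -r /proc/meminfo ]; then
        # Linux: MemTotal is reported in kB
        awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 1024 }' /proc/meminfo
    else
        # macOS: hw.memsize is reported in bytes
        sysctl -n hw.memsize | awk '{ printf "%d\n", $1 / 1024 / 1024 / 1024 }'
    fi
}

echo "Available CPUs: $(detect_cpus)"
echo "Total RAM (Gb): $(detect_ram_gb)"
```

Leaving a little headroom below the detected values is a common choice, so that the operating system and other applications keep some resources.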
### Environments ###
Edit the ./workflow/envs/{tools}_v.{version}.yaml files if you want to change some environments _(e.g. to test a new version)_
### Fastq-Screen ###
- **config**: path to the fastq-screen configuration file _(default config: ./config/fastq-screen.conf)_
- **subset**: do not use the whole sequence file, but create a temporary dataset of the specified number of reads _(default config: '1000')_
- **aligner**: specify the aligner to use for the mapping. Valid arguments are 'bowtie', 'bowtie2' or 'bwa' _(default config: 'bwa')_
#### fastq-screen.conf ####
- **databases**: enables you to configure multiple genomes databases _(aligner index files)_ to search against
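In a FastQ Screen configuration file, each database is declared on a `DATABASE` line giving a name and the path to the index basename. As a rough sketch, such lines could be generated from the fasta files present in the genomes directory. The function name `print_database_lines` is hypothetical, and both default paths are assumptions based on the directory layout described in this README; compare the output with your existing **fastq-screen.conf** before using it.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with RQC): print one FastQ Screen
# "DATABASE" line per FASTA file found in a genomes directory.
print_database_lines() {
    local genomes_dir="${1:-./resources/genomes}"      # assumed default location
    local indexes_dir="${2:-./resources/indexes/bwa}"  # assumed default location
    local fasta genome
    for fasta in "${genomes_dir}"/*.fasta; do
        [ -e "${fasta}" ] || continue                  # no FASTA file matched
        genome="$(basename "${fasta}" .fasta)"
        # Tab-separated fields: DATABASE <name> <path to index basename>
        printf 'DATABASE\t%s\t%s\n' "${genome}" "${indexes_dir}/${genome}"
    done
}
```

Running `print_database_lines` from the repository root prints the candidate lines to the terminal, so nothing is written to the configuration file until you paste them in yourself.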
### RQC map ###
```shell
🧩 Reads_Quality_Control/
├── 🖥️ Start_RQC.sh
├── 📚 README.md
├── 🪪 LICENSE
├── 🛑 .gitignore
├── 📂 .git/
├── 📂 .snakemake/
├── 📂 config/
│   ├── ⚙️ config.yaml
│   └── ⚙️ fastq-screen.conf
├── 📂 resources/
│   ├── 📂 genomes/
│   │   ├── 🧬 SARS-CoV-2_Wuhan_MN908947-3.fasta
│   │   ├── 🧬 Monkeypox-virus_Zaire_AF380138-1.fasta
│   │   ├── 🧬 Monkeypox-virus_UK_MT903345-1.fasta
│   │   ├── 🧬 Swinepox-virus_India_MW036632-1.fasta
│   │   ├── 🧬 Ebola-virus_Zaire_AF272001-1.fasta
│   │   ├── 🧬 Nipah-virus_Malaysia_AJ564622-1.fasta
│   │   ├── 🧬 HIV-1_HXB2_K03455-1.fasta.fasta
│   │   ├── 🧬 {your_favorite_genome_reference}.fasta
│   │   ├── 🧬 QC_Echerichia-coli_CP060121-1.fasta
│   │   ├── 🧬 QC_Kanamycin-Resistance-Gene.fasta
│   │   ├── 🧬 QC_NGS-adapters.fasta
│   │   ├── 🧬 QC_phi-X174_Coliphage_NC-001422-1.fasta
│   │   ├── 🧬 QC_UniVec_wo_phiX_and_kanamycin.fasta
│   │   └── 🧬 {your_favorite_qc_reference}.fasta
│   ├── 📂 indexes/
│   │   └── 📂 bwa/
│   │       ├── 🗂️ {GENOME}.amb
│   │       ├── 🗂️ {GENOME}.ann
│   │       ├── 🗂️ {GENOME}.bwt
│   │       ├── 🗂️ {GENOME}.pac
│   │       └── 🗂️ {GENOME}.sa
│   ├── 📂 reads/
│   │   ├── 🛡️ .gitkeep
│   │   ├── 📦 {SAMPLE}_R1.fastq.gz
│   │   └── 📦 {SAMPLE}_R2.fastq.gz
│   └── 📂 visuals/
│       └── 📈 quality_control_rulegraph.png
└── 📂 workflow/
    ├── 📂 envs/
    │   ├── 📂 linux/
    │   │   ├── 🍜 bwa_v.0.7.17.yaml
    │   │   ├── 🍜 fastq-screen_v.0.15.2.yaml
    │   │   ├── 🍜 fastqc_v.0.11.9.yaml
    │   │   ├── 🍜 multiqc_v.1.12.yaml
    │   │   └── 🍜 snakemake-base_v.2023.02.yaml
    │   └── 📂 osx/
    │       ├── 🍜 bwa_v.0.7.17.yaml
    │       ├── 🍜 fastq-screen_v.0.15.2.yaml
    │       ├── 🍜 fastqc_v.0.11.9.yaml
    │       ├── 🍜 multiqc_v.1.12.yaml
    │       └── 🍜 snakemake-base_v.2023.02.yaml
    └── 📂 rules/
        ├── 📜 indexing_genomes.smk
        └── 📜 quality_control.smk
```
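The map above lists five BWA index files per genome _(.amb, .ann, .bwt, .pac, .sa)_. If a run stops during indexing, a quick way to see what is missing is to test for each of those files. This is only a sketch: the function name `missing_bwa_index_files` is hypothetical, and the default indexes path is an assumption taken from the map.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with RQC): list the BWA index files
# that are missing for a given genome name.
missing_bwa_index_files() {
    local genome="$1"
    local indexes_dir="${2:-./resources/indexes/bwa}"  # assumed default location
    local ext
    for ext in amb ann bwt pac sa; do
        # Print the file name only when the index file does not exist
        [ -e "${indexes_dir}/${genome}.${ext}" ] || echo "${genome}.${ext}"
    done
}
```

For example, `missing_bwa_index_files SARS-CoV-2_Wuhan_MN908947-3` prints nothing when the five index files for that genome are present.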
## ~ REFERENCES ~ ##
**Sustainable data analysis with Snakemake**
Felix Mölder, Kim Philipp Jablonski, Brice Letcher, Michael B. Hall, Christopher H. Tomkins-Tinch, Vanessa Sochat, Jan Forster, Soohyun Lee, Sven O. Twardziok, Alexander Kanitz, Andreas Wilm, Manuel Holtgrewe, Sven Rahmann, Sven Nahnsen, Johannes Köster
_F1000Research (2021)_
**DOI**: [https://doi.org/10.12688/f1000research.29032.2](https://doi.org/10.12688/f1000research.29032.2)
**Publication**: [https://f1000research.com/articles/10-33/v1](https://f1000research.com/articles/10-33/v1)
**Source code**: [https://github.com/snakemake/snakemake](https://github.com/snakemake/snakemake)
**Documentation**: [https://snakemake.readthedocs.io/en/stable/index.html](https://snakemake.readthedocs.io/en/stable/index.html)
**Anaconda Software Distribution**
Team
_Computer software (2016)_
**DOI**: []()
**Publication**: [https://www.anaconda.com](https://www.anaconda.com)
**Source code**: [https://github.com/conda/conda](https://github.com/conda/conda) (conda)
**Documentation**: [https://docs.conda.io/en/latest/](https://docs.conda.io/en/latest/) (conda)
**Source code**: [https://github.com/mamba-org/mamba](https://github.com/mamba-org/mamba) (mamba)
**Documentation**: [https://mamba.readthedocs.io/en/latest/index.html](https://mamba.readthedocs.io/en/latest/index.html) (mamba)
**Fast and accurate short read alignment with Burrows-Wheeler Transform**
Heng Li and Richard Durbin
_Bioinformatics, Volume 25, Issue 14, Pages 1754-1760 (2009)_
**DOI**: [https://doi.org/10.1093/bioinformatics/btp324](https://doi.org/10.1093/bioinformatics/btp324)
**Publication**: [https://pubmed.ncbi.nlm.nih.gov/19451168](https://pubmed.ncbi.nlm.nih.gov/19451168)
**Source code**: [https://github.com/lh3/bwa](https://github.com/lh3/bwa)
**Documentation**: [http://bio-bwa.sourceforge.net](http://bio-bwa.sourceforge.net)
**MultiQC: summarize analysis results for multiple tools and samples in a single report**
Philip Ewels, Måns Magnusson, Sverker Lundin and Max Käller
_Bioinformatics, Volume 32, Issue 19 (2016)_
**DOI**: [https://doi.org/10.1093/bioinformatics/btw354](https://doi.org/10.1093/bioinformatics/btw354)
**Publication**: [https://academic.oup.com/bioinformatics/article/32/19/3047/2196507](https://academic.oup.com/bioinformatics/article/32/19/3047/2196507)
**Source code**: [https://github.com/ewels/MultiQC](https://github.com/ewels/MultiQC)
**Documentation**: [https://multiqc.info](https://multiqc.info)
**FastQ Screen: A tool for multi-genome mapping and quality control**
Wingett SW and Andrews S
_F1000Research (2018)_
**DOI**: [https://doi.org/10.12688/f1000research.15931.2](https://doi.org/10.12688/f1000research.15931.2)
**Publication**: [https://f1000research.com/articles/7-1338/v2](https://f1000research.com/articles/7-1338/v2)
**Source code**: [https://github.com/StevenWingett/FastQ-Screen](https://github.com/StevenWingett/FastQ-Screen)
**Documentation**: [https://www.bioinformatics.babraham.ac.uk/projects/fastq_screen](https://www.bioinformatics.babraham.ac.uk/projects/fastq_screen)
**FastQC: A quality control tool for high throughput sequence data**
Simon Andrews
_Online (2010)_
**DOI**: [https://doi.org/](https://doi.org/)
**Publication**: []()
**Source code**: [https://github.com/s-andrews/FastQC](https://github.com/s-andrews/FastQC)
**Documentation**: [https://www.bioinformatics.babraham.ac.uk/projects/fastqc](https://www.bioinformatics.babraham.ac.uk/projects/fastqc)
###############################################################################