diff --git a/OLD_README.md b/OLD_README.md deleted file mode 100644 index ff4f65140a1c444820f0e5bdd05f7f1699b7331a..0000000000000000000000000000000000000000 --- a/OLD_README.md +++ /dev/null @@ -1,391 +0,0 @@ -# RQC: Reads Quality Control # - - - | Catalina (10.15.7) | Big Sure (11.6.3) | Monterey (12.6.0) | Ventura (13.0.1)/E6055C?icon=apple&label&list=|&scale=0.9>) - | Focal Fossa (20.04) | Jammy Jellyfish (22.04)/772953?icon=https://www.svgrepo.com/show/25424/ubuntu-logo.svg&label&list=|&scale=0.9>) - | Focal Fossa (20.04) | Jammy Jellyfish (22.04)/00BCF2?icon=windows&label&list=|&scale=0.9>) - - - - - - - - - - - - - -## ~ ABOUT ~ ## - - -### RQC ### - -RQC is a FAIR, open-source, scalable, modulable and traceable snakemake pipeline, used for Illumina Inc. short reads quality controls. -RQC is included as first step of **[GeVarLi](https://www.afroscreen.org/)** workflow. - - -### Genomic sequencing, a public health tool ### - -The establishment of a surveillance and sequencing network is an essential public health tool for detecting and containing pathogens with epidemic potential. Genomic sequencing mak\ -es it possible to identify pathogens, monitor the emergence and impact of variants, and adapt public health policies accordingly. - -The Covid-19 epidemic has highlighted the disparities that remain between continents in terms of surveillance and sequencing systems. At the end of October 2021, of the 4,600,000 s\ -equences shared on the public and free GISAID tool worldwide, only 49,000 came from the African continent, i.e. less than 1% of the cases of Covid-19 diagnosed on this continent. - -### Features ### - -- Reads quality control - - Fastq-Screen - - FastQC - - MultiQC (_html report_) - - -### Version ### - -*V.2022.11* - - -### Rulegraph ### - -<img src="./resources/visuals/quality_control_rulegraph.png" width="250" height="150"> - - -## ~ SUPPORT ~ ## - -1. Read The Fabulous Manual! -2. Read de Awsome Wiki! -3. 
Create a new issue: Issues > New issue > Describe your issue -4. Send an email to [nicolas.fernandez@ird.fr](url) - -## ~ CITATION ~ ## - -If you use this pipeline, *please* cite this *RQC*, GitLab IRDForge repository and authors: - -GitLab IRDForge repository: [https://forge.ird.fr/transvihmi/nfernandez/RQC](https://forge.ird.fr/transvihmi/nfernandez/RQC) - -RQC, a FAIR, open-source, scalable, modulable and traceable snakemake pipeline, -for Illumina Inc. short reads quality controls. - -Nicolas FERNANDEZ NUÑEZ _(1)_ -_(1) UMI 233 - Recherches Translationnelles sur le VIH et les Maladies Infectieuses endémiques et émergentes (TransVIHMI), University of Montpellier (UM), French Institute\ - of Health and Medical Research (INSERM), French National Research Institute for Sustainable Development (IRD)_ - - -## ~ AUTHORS & ACKNOWLEDGMENTS ~ ## - -- Nicolas Fernandez - IRD _(Developer and Maintener)_ -- Christelle Butel - IRD _(Reporter)_ -- DALL•E mini - OpenAI [Git](https://github.com/borisdayma/dalle-mini) _(Repo. avatar)_ - - -## ~ LICENSE ~ ## - -Licencied under [GPLv3](https://www.gnu.org/licenses/gpl-3.0.html) -Intellectual property belongs to [IRD](https://www.ird.fr/) and authors. - - -## ~ ROADMAP ~ ## - -- Add MultiQC config template - - -## ~ PROJECT STATUS ~ ## - -This project is **regularly update** and **actively maintened** -However, you can be volunteer to step in as **developer** or **maintainer** - - -## ~ CONTRIBUTING ~ ## - -Open to contributions! - -- Asking for update -- Proposing new feature -- Reporting issue -- Fixing issue -- Sharing code -- Citing tool - - -## ~ INSTALLATIONS ~ ## - -# Conda _(dependencies)_ # - -RQC use the usefull **Conda** environment manager -So, if and only if, it's required _(Conda not already installed)_, please, first install **Conda**! - -Download and install your OS adapted version of [Latest Miniconda Installer](https://docs.conda.io/en/latest/miniconda.html#latest-miniconda-installer-links) - -e.g. 
for **MacOSX-64-bit** systems: -```shell -curl https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh -o ~/Miniconda3-latest-MacOSX-x86_64.sh && \ -bash ~/Miniconda3-latest-MacOSX-x86_64.sh -b -p ~/miniconda3/ && \ -rm -f ~/Miniconda3-latest-MacOSX-x86_64.sh && \ -~/miniconda3/condabin/conda update conda --yes && \ -~/miniconda3/condabin/conda init && \ -exit -``` - -e.g. for **Linux-64-bit** systems: -```shell -curl https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -o ~/Miniconda3-latest-Linux-x86_64.sh && \ -bash ~/Miniconda3-latest-Linux-x86_64.sh -b -p ~/miniconda3/ && \ -rm -f ~/Miniconda3-latest-Linux-x86_64.sh && \ -~/miniconda3/condabin/conda update conda --yes && \ -~/miniconda3/condabin/conda init && \ -exit -``` - -Update Conda: -``` -conda update -n base -c defaults conda -``` - - -# RQC # - -Clone to your home/ [RQC](https://forge.ird.fr/transvihmi/nfernandez/Reads_Quality_Control) GitLab IRDForg repository _(ID: 404)_: -```shell -git https://forge.ird.fr/transvihmi/nfernandez/RQC.git ~/RQC/ -``` - -Update RQC: - -```shell -cd ~/RQC/ && git reset --hard HEAD && git pull --verbose -``` - -## ~ USAGE ~ ## - -1. Copy your **reads** _(single or paired-ends)_ in **.fastq.gz** or **fastq** formats files into: **./resources/reads/** directory - -2. Execute **Start_RQC.sh** bash script to run GeVarLi pipeline _(according to your choice)_: - - or with a **Double-click** on it _(if you make .sh files executable files with Terminal.app)_ - - or with a **Right-click** > **Open with** > **Terminal.app** - - or with **CLI** from a terminal: -```shell -bash Start_RQC.sh -``` -3. 
Yours analyzes will start, with default configuration settings - -_Option-1: Edit **config.yaml** file in **./config/** directory_ -_Option-2: Edit **fastq-screen.conf** file in **./config/** directory_ - -First run will auto-created _(only once)_: - - Snakemake-Base conda environment _(Snakemake, Mamba, Rename, GraphViz)_ - - RQC-conda environments _(for each tools used by RQC)_ - - Indexes for BWA aligner _(for each fasta genomes in resources)_ - -_This may take some time, depending on your internet connection and your computer_ - - -## ~ RESULTS ~ ## - -Yours results are available in **./results/** directory, as follow: - -```shell - 🧩 Reads_Quality_Control/ - └── 📂 results/ - ├── 🌐 All_readsQC_reports.html - ├── 📂 00_Quality_Control/ - │ ├── 📂 fastq-screen/ - │ │ ├── 🌐 {SAMPLE}_R{1/2}_screen.html - │ │ ├── 📈 {SAMPLE}_R{1/2}_screen.png - │ │ └── 📄 {SAMPLE}_R{1/2}_screen.txt - │ ├── 📂 fastqc/ - │ │ ├── 🌐 {SAMPLE}_R{1/2}_fastqc.html - │ │ └── 📦 {SAMPLE}_R{1/2}_fastqc.zip - │ └── 📂 multiqc/ - │ ├── 🌐 multiqc_report.html - │ └── 📂 multiqc_data/ - │ ├── 📝 multiqc.log - │ ├── 📄 multiqc_citations.txt - │ ├── 🌀 multiqc_data.json - │ ├── 📄 multiqc_fastq_screen.txt - │ ├── 📄 multiqc_fastqc.txt - │ ├── 📄 multiqc_general_stats.txt - | └── 📄 multiqc_sources.txt - └── 📂 10_Reports/ - ├── ⚙️ config.log - ├── 📝 settings.log - ├── 🍜 RQC-Base_v.{VERSION}.yaml - ├── 📂 files-summaries - │ └── 📄 Reads_Quality_Control_files-summary.txt - ├── 📂 graphs/ - │ ├── 📈 Reads_Quality_Control_dag.{PNG/PDF} - │ ├── 📈 Reads_Quality_Control_filegraph.{PNG/PDF} - │ └── 📈 Reads_Quality_Control_rulegraph.{PNG/PDF} - └── 📂 tools-log/ - ├── 📂 bowtie2/ - ├── 📂 bwa/ - ├── 📝 fastq-screen.log - ├── 📝 fastqc.log - └── 📝 multiqc.log -``` - -### fastq-screen ### - -Search in your libraries if the genomes of organisms you work on, along with PhiX, Vectors, -or other contaminants commonly seen in sequencing experiments. 
-More about [fastq-screen](https://www.bioinformatics.babraham.ac.uk/projects/fastq_screen/) - - -### fastqc ### - -Modular set of analyses which you can use to give a quick impression of whether -your data has any problems of which you should be aware before doing any further analysis. -More about [fastqc](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) - - -### multiqc ### - -Compiled HTML report. More about [multiqc](https://multiqc.info/) - - -## ~ CONFIGURATION ~ ## - -You can edit default settings in **config.yaml** file into **./config/** directory: - -### Resources ### - -Edit to match your hardware configuration -- **cpus**: for tools that can _(i.e. bwa)_, could be use at most n cpus to run in parallel _(default config: '8')_ -_**Note**: snakemake (with default Start bash script) will always use all cpus to parallelize jobs_ -- **ram**: for tools that can _(i.e. samtools)_, limit memory usage to max n Gb _(default config: '16' Gb)_ -- **tmpdir**: for tools that can _(i.e. pangolin)_, specify where you want the temp stuff _(default config: '$TMPDIR')_ - - -### Environments ### - -Edit if you want change some environments _(e.g. test a new version)_ in ./workflow/envs/{tools}_v.{version}.yaml files - - -### Fastq-Screen ### - -- **config**: path to the fastq-screen configuration file _(default config: ./config/fastq-screen.conf)_ -- **subset**: do not use the whole sequence file, but create a temporary dataset of this specified number of read _(default config: '1000')_ -- **aligner**: specify the aligner to use for the mapping. 
Valid arguments are 'bowtie', bowtie2' or 'bwa' _(default config: 'bwa')_ - -#### fastq-screen.conf #### - -- **databases**: enables you to configure multiple genomes databases _(aligner index files)_ to search against - - -### RQC map ### - -```shell - 🧩 Reads_Quality_Control/ - ├── 🖥️ Start_GeVarLi.sh - ├── 📚 README.md - ├── 🪪 LICENSE - ├── 🛑 .gitignore - ├── 📂 .git/ - ├── 📂 .snakemake/ - ├── 📂 config/ - │ ├── ⚙️ config.yaml - │ └── ⚙️ fastq-screen.conf - ├── 📂 resources/ - │ ├── 📂 genomes/ - │ │ ├── 🧬 SARS-CoV-2_Wuhan_MN908947-3.fasta - │ │ ├── 🧬 Monkeypox-virus_Zaire_AF380138-1.fasta - │ │ ├── 🧬 Monkeypox-virus_UK_MT903345-1.fasta - │ │ ├── 🧬 Swinepox-virus_India_MW036632-1.fasta - │ │ ├── 🧬 Ebola-virus_Zaire_AF272001-1.fasta - │ │ ├── 🧬 Nipah-virus_Malaysia_AJ564622-1.fasta - │ │ ├── 🧬 HIV-1_HXB2_K03455-1.fasta.fasta - │ │ ├── 🧬 (your_favorite_genome_reference}.fasta - │ │ ├── 🧬 QC_Echerichia-coli_CP060121-1.fasta - │ │ ├── 🧬 QC_Kanamycin-Resistance-Gene.fasta - │ │ ├── 🧬 QC_NGS-adapters.fasta - │ │ ├── 🧬 QC_phi-X174_Coliphage_NC-001422-1.fasta - │ │ ├── 🧬 QC_UniVec_wo_phiX_and_kanamycin.fasta - │ │ └── 🧬 {your_favorite_qc_reference}.fasta - │ ├── 📂 indexes/ - │ │ └── 📂 bwa/ - │ │ ├── 🗂️ {GENOME}.amb - │ │ ├── 🗂️ {GENOME}.ann - │ │ ├── 🗂️ {GENOME}.bwt - │ │ ├── 🗂️ {GENOME}.pac - │ │ └── 🗂️ {GENOME}.sa - │ ├── 📂 reads/ - │ │ ├── 🛡️ .gitkeep - │ │ ├── 📦 {SAMPLE}_R1.fastq.gz - │ │ └── 📦 {SAMPLE}_R2.fastq.gz - │ └── 📂 visuals/ - │ └── 📈 quality_control_rulegraph.png - └── 📂 workflow/ - ├── 📂 envs/ - │ ├── 📂 linux/ - │ │ ├── 🍜 bwa_v.0.7.17.yaml - │ │ ├── 🍜 fastq-screen_v.0.15.2.yaml - │ │ ├── 🍜 fastqc_v.0.11.9.yaml - │ │ ├── 🍜 multiqc_v.1.12.yaml - │ │ └── 🍜 snakemake-base_v.2023.02.yaml - │ └── 📂 osx/ - │ ├── 🍜 bwa_v.0.7.17.yaml - │ ├── 🍜 fastq-screen_v.0.15.2.yaml - │ ├── 🍜 fastqc_v.0.11.9.yaml - │ ├── 🍜 multiqc_v.1.12.yaml - │ └── 🍜 snakemake-base_v.2023.02.yaml - └── 📂 rules/ - ├── 📜 indexing_genomes.smk - └── 📜 quality_control.smk -``` 
- - -## ~ REFERENCES ~ ## - -**Sustainable data analysis with Snakemake** -Felix Mölder, Kim Philipp Jablonski, Brice Letcher, Michael B. Hall, Christopher H. Tomkins-Tinch, Vanessa Sochat, Jan Forster, Soohyun Lee, Sven O. Twardziok, Alexander Kanitz, Andreas Wilm, Manuel Holtgrewe, Sven Rahmann, Sven Nahnsen, Johannes Köster -_F1000Research (2021)_ -**DOI**: [https://doi.org/10.12688/f1000research.29032.2](https://doi.org/10.12688/f1000research.29032.2) -**Publication**: [https://f1000research.com/articles/10-33/v1](https://f1000research.com/articles/10-33/v1) -**Source code**: [https://github.com/snakemake/snakemake](https://github.com/snakemake/snakemake) -**Documentation**: [https://snakemake.readthedocs.io/en/stable/index.html](https://snakemake.readthedocs.io/en/stable/index.html) - -**Anaconda Software Distribution** -Team -_Computer software (2016)_ -**DOI**: []() -**Publication**: [https://www.anaconda.com](https://www.anaconda.com) -**Source code**: [https://github.com/snakemake/snakemake](https://github.com/snakemake/snakemake) (conda) -**Documentation**: [https://snakemake.readthedocs.io/en/stable/index.html](https://snakemake.readthedocs.io/en/stable/index.html) (conda) -**Source code**: [https://github.com/mamba-org/mamba](https://github.com/mamba-org/mamba) (mamba) -**Documentation**: [https://mamba.readthedocs.io/en/latest/index.html](https://mamba.readthedocs.io/en/latest/index.html) (mamba) - -**Fast and accurate short read alignment with Burrows-Wheeler Transform** -Heng Li and Richard Durbin -_Bioinformatics, Volume 25, Aricle 1754-60 (2009)_ -**DOI**: [https://doi.org/10.1093/bioinformatics/btp324](https://doi.org/10.1093/bioinformatics/btp324) -**Publication**: [https://pubmed.ncbi.nlm.nih.gov/19451168@](https://pubmed.ncbi.nlm.nih.gov/19451168) -**Source code**: [https://github.com/lh3/bwa](https://github.com/lh3/bwa) -**Documentation**: [http://bio-bwa.sourceforge.net](http://bio-bwa.sourceforge.net) - -**MultiQC: summarize analysis results 
for multiple tools and samples in a single report** -Philip Ewels, Måns Magnusson, Sverker Lundin and Max Käller -_Bioinformatics, Volume 32, Issue 19 (2016)_ -**DOI**: [https://doi.org/10.1093/bioinformatics/btw354](https://doi.org/10.1093/bioinformatics/btw354) -**Publication**: [https://academic.oup.com/bioinformatics/article/32/19/3047/2196507](https://academic.oup.com/bioinformatics/article/32/19/3047/2196507) -**Source code**: [https://github.com/ewels/MultiQC](https://github.com/ewels/MultiQC) -**Documentation**: [https://multiqc.info](https://multiqc.info) - -**FastQ Screen: A tool for multi-genome mapping and quality control** -Wingett SW and Andrews S -_F1000Research (2018)_ -**DOI**: [https://doi.org/10.12688/f1000research.15931.2](https://doi.org/10.12688/f1000research.15931.2) -**Publication**: [https://f1000research.com/articles/7-1338/v2](https://f1000research.com/articles/7-1338/v2) -**Source code**: [https://github.com/StevenWingett/FastQ-Screen](https://github.com/StevenWingett/FastQ-Screen) -**Documentation**: [https://www.bioinformatics.babraham.ac.uk/projects/fastq_screen](https://www.bioinformatics.babraham.ac.uk/projects/fastq_screen) - -**FastQC: A quality control tool for high throughput sequence data** -Simon Andrews -_Online (2010)_ -**DOI**: [https://doi.org/](https://doi.org/) -**Publication**: []() -**Source code**: [https://github.com/s-andrews/FastQC](https://github.com/s-andrews/FastQC) -**Documentation**: [https://www.bioinformatics.babraham.ac.uk/projects/fastqc](https://www.bioinformatics.babraham.ac.uk/projects/fastqc) - - -############################################################################### diff --git a/resources/data_test/SARS-CoV-2_Omicron-BA-1-1_Covid-Seq-Lib-on-MiSeq_250000-reads_R1.fastq.gz b/resources/data_test/SARS-CoV-2_Omicron-BA-1-1_Covid-Seq-Lib-on-MiSeq_250000-reads_R1.fastq.gz deleted file mode 100644 index efda6523124e705811d70e9ee78e11f542bdd582..0000000000000000000000000000000000000000 Binary files 
a/resources/data_test/SARS-CoV-2_Omicron-BA-1-1_Covid-Seq-Lib-on-MiSeq_250000-reads_R1.fastq.gz and /dev/null differ diff --git a/resources/data_test/SARS-CoV-2_Omicron-BA-1-1_Covid-Seq-Lib-on-MiSeq_250000-reads_R2.fastq.gz b/resources/data_test/SARS-CoV-2_Omicron-BA-1-1_Covid-Seq-Lib-on-MiSeq_250000-reads_R2.fastq.gz deleted file mode 100644 index 852cc3f5735a9b9c2628b78f12b8e1f2719d7d6f..0000000000000000000000000000000000000000 Binary files a/resources/data_test/SARS-CoV-2_Omicron-BA-1-1_Covid-Seq-Lib-on-MiSeq_250000-reads_R2.fastq.gz and /dev/null differ