# metagWGS: Documentation

## Introduction
**metagWGS** is a [Nextflow](https://www.nextflow.io/docs/latest/index.html#) bioinformatics analysis pipeline for **metag**enomic **W**hole **G**enome **S**hotgun sequencing data (Illumina HiSeq3000 or NovaSeq, paired-end, 2\*150 bp; PacBio HiFi reads, single-end).

### Pipeline graphical representation
The workflow processes raw data from `.fastq/.fastq.gz` input and/or assemblies (contigs) `.fa/.fasta` and uses the modules represented in this figure:
![](docs/source/images/metagwgs_metro_map.png)

### metagWGS steps

metagWGS is split into different steps that correspond to different parts of the bioinformatics analysis.
Many of these steps are optional and their necessity depends on the desired analysis.

* `S01_CLEAN_QC`
   * trims adapter sequences and removes low-quality reads ([Cutadapt](https://cutadapt.readthedocs.io/en/stable/#), [Sickle](https://github.com/najoshi/sickle))
   * removes host contaminants ([BWA-MEM2](https://github.com/bwa-mem2/bwa-mem2) or [Minimap2](https://github.com/lh3/minimap2) + [Samtools](http://www.htslib.org/))
   * controls the quality of raw and cleaned data ([FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/))
   * makes a taxonomic classification of cleaned reads ([Kaiju MEM](https://github.com/bioinformatics-centre/kaiju) + [kronaTools](https://github.com/marbl/Krona/wiki/KronaTools) + [plot_kaiju_stat.py](bin/plot_kaiju_stat.py) + [merge_kaiju_results.py](bin/merge_kaiju_results.py))
* `S02_ASSEMBLY` 
   * assembles reads ([metaSPAdes](https://github.com/ablab/spades) or [Megahit](https://github.com/voutcn/megahit) or [Hifiasm_meta](https://github.com/lh3/hifiasm-meta), [metaFlye](https://github.com/fenderglass/Flye))
   * assesses the quality of assembly ([metaQUAST](http://quast.sourceforge.net/metaquast))
   * deduplicates short reads and aligns them against contigs ([BWA-MEM2](https://github.com/bwa-mem2/bwa-mem2) + [Samtools](http://www.htslib.org/))
   * aligns HiFi reads against contigs ([Minimap2](https://github.com/lh3/minimap2) + [Samtools](http://www.htslib.org/))
* `S03_FILTERING` 
   * filters contigs with low CPM value ([filter_contig_per_cpm.py](bin/filter_contig_per_cpm.py) + [metaQUAST](http://quast.sourceforge.net/metaquast))
* `S04_STRUCTURAL_ANNOT` 
   * makes a structural annotation of genes ([Prodigal](https://github.com/hyattpd/Prodigal) + [Barrnap](https://github.com/tseemann/barrnap) + [tRNAscan-SE](https://github.com/UCSC-LoweLab/tRNAscan-SE) + [merge_annotations.py](bin/merge_annotations.py))
* `S05_PROTEIN_ALIGNMENT`
   * aligns the protein sequence of genes against a protein database ([DIAMOND](https://github.com/bbuchfink/diamond))
* `S06_FUNC_ANNOT` 
   * clusters proteins per sample and globally ([cd-hit](https://www.bioinformatics.org/cd-hit/) + [cd_hit_produce_table_clstr.py](bin/cd_hit_produce_table_clstr.py))
   * quantifies reads that align with the genes ([featureCounts](http://subread.sourceforge.net/) + [quantification_clusters.py](bin/quantification_clusters.py))
   * makes a functional annotation of genes and a quantification of reads by function ([eggNOG-mapper](http://eggnog-mapper.embl.de/) + [merge_abundance_and_functional_annotations.py](bin/merge_abundance_and_functional_annotations.py) + [quantification_by_functional_annotation.py](bin/quantification_by_functional_annotation.py))
* `S07_TAXO_AFFI` 
   * taxonomically affiliates the genes ([Samtools](http://www.htslib.org/) + [aln_to_tax_affi.py](bin/aln_to_tax_affi.py))
   * taxonomically affiliates the contigs ([Samtools](http://www.htslib.org/) + [aln_to_tax_affi.py](bin/aln_to_tax_affi.py))
   * counts the number of reads and contigs, for each taxonomic affiliation, per taxonomic level ([Samtools](http://www.htslib.org/) + [merge_contig_quantif_perlineage.py](bin/merge_contig_quantif_perlineage.py) + [quantification_by_contig_lineage.py](bin/quantification_by_contig_lineage.py))
* `S08_BINNING` 
![](docs/source/images/08_binning.png)
   * aligns sample reads against assemblies (according to the strategy used) ([BWA-MEM2](https://github.com/bwa-mem2/bwa-mem2) or [Minimap2](https://github.com/lh3/minimap2))
   * performs metagenome binning ([METABAT2](https://bitbucket.org/berkeleylab/metabat/src/master/) + [MAXBIN2](https://sourceforge.net/projects/maxbin/) + [CONCOCT](https://github.com/BinPro/CONCOCT))
   * refines bin sets ([BINETTE](https://github.com/genotoul-bioinfo/Binette)). Circular contigs, when present (HiFi reads), are used as a bin set
   * dereplicates bins between samples ([DREP](https://github.com/MrOlm/drep))
   * taxonomically affiliates the bins ([GTDBTK](https://github.com/Ecogenomics/GTDBTk))
   * calculates bin abundances across samples ([BWA-MEM2](https://github.com/bwa-mem2/bwa-mem2) or [Minimap2](https://github.com/lh3/minimap2) + [SAMTOOLS](http://www.htslib.org/))
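
The CPM filter of `S03_FILTERING` can be illustrated with a short sketch. It assumes that CPM (counts per million) of a contig is the number of reads mapped to it, divided by the total number of mapped reads and multiplied by one million, and that contigs at or above the threshold are kept. The function names, the `min_cpm` parameter and the example counts are hypothetical; this is not the code of `filter_contig_per_cpm.py`.

```python
def contig_cpm(read_counts):
    """Return the CPM of each contig, given reads mapped per contig.

    Assumption: CPM = mapped reads on the contig * 1e6 / total mapped reads.
    """
    total = sum(read_counts.values())
    if total == 0:
        # No mapped reads at all: every contig gets a CPM of zero.
        return {contig: 0.0 for contig in read_counts}
    return {contig: n * 1_000_000 / total for contig, n in read_counts.items()}


def filter_contigs_by_cpm(read_counts, min_cpm):
    """Keep contigs whose CPM is at least min_cpm (hypothetical threshold rule)."""
    cpm = contig_cpm(read_counts)
    return [contig for contig in read_counts if cpm[contig] >= min_cpm]


# Made-up counts for illustration only.
counts = {"contig_1": 900, "contig_2": 99, "contig_3": 1}
kept = filter_contigs_by_cpm(counts, min_cpm=10_000)
print(kept)
```

With these made-up counts, `contig_3` carries one read out of 1000 and falls below the threshold, so it is dropped.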

All steps are launched one after another by default. Use the `--stop_at_[STEP]` and `--skip_[STEP]` parameters to adjust execution to your needs.
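
As an illustration of how these two parameter families combine, here is a minimal sketch that assembles a command line from step labels. The labels `STEP_A` and `STEP_B` and the script path `main.nf` are placeholders, not real metagWGS values; the actual step names accepted after `--stop_at_` and `--skip_` are listed in the usage documentation, and a real run needs further parameters (inputs, profile) that are omitted here.

```python
def build_command(pipeline, stop_at=None, skip=()):
    """Build a `nextflow run` argument list with optional step controls.

    `pipeline`, `stop_at` and `skip` are hypothetical helper arguments;
    only the `--stop_at_[STEP]` / `--skip_[STEP]` pattern comes from the README.
    """
    cmd = ["nextflow", "run", pipeline]
    if stop_at is not None:
        # Stop the workflow once the named step has completed.
        cmd.append(f"--stop_at_{stop_at}")
    # Add one --skip_ flag per step to leave out.
    cmd.extend(f"--skip_{step}" for step in skip)
    return cmd


# Placeholder labels: replace them with step names from the usage page.
cmd = build_command("main.nf", stop_at="STEP_A", skip=["STEP_B"])
print(" ".join(cmd))
```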

An HTML report is generated at the end of the workflow with [MultiQC](https://multiqc.info/).
The pipeline is built using [Nextflow](https://www.nextflow.io/docs/latest/index.html#), a bioinformatics workflow tool to run tasks across multiple compute infrastructures in a very portable manner.
Two [Singularity](https://sylabs.io/docs/) containers are available, which simplifies installation and makes results highly reproducible.

## Documentation

The metagWGS documentation can be found in the following pages:

   * [Installation](/docs/source/installation.md)
      * The pipeline installation procedure. You can also see this documentation [here](https://genotoul-bioinfo.pages.mia.inra.fr/metagwgs/master/installation.html).
   * [Usage](/docs/source/usage.md)
      * An overview of how the pipeline works, how to run it and a description of all of the different command-line flags. You can also see this documentation [here](https://genotoul-bioinfo.pages.mia.inra.fr/metagwgs/master/usage.html).
   * [Output](/docs/source/output.md)
      * An overview of the different output files and directories produced by the pipeline. You can also see this documentation [here](https://genotoul-bioinfo.pages.mia.inra.fr/metagwgs/master/output.html).
   * [Use case](/docs/source/use_case.md) (WARNING: currently out of date)
      * A tutorial on launching the pipeline on a test dataset on the [genobioinfo cluster](http://bioinfo.genotoul.fr/).
   * [Functional tests](/docs/source/functionnal_tests.md)
      * (for developers) A tool to launch a new version of the pipeline on curated input data and compare its results with known output.

Comprehensive documentation for metagWGS is available at <https://genotoul-bioinfo.pages.mia.inra.fr/metagwgs/master>.

## Contact us

If you have any questions or suggestions for improvement, please contact us at claire.hoede[@]inrae.fr.

## Cite us

For the moment, if you use metagWGS in your research, please cite:

Joanna Fourquet, Jean Mainguy, Maïna Vienne, Céline Noirot, Pierre Martin, et al. metagWGS: a workflow to analyse short and long HiFi metagenomic reads Taxonomic profile HiFi vs Short reads assembly. JOBIM 2022, Jul 2022, Rennes, France. ⟨10.15454/1.5572369328961167E12⟩. ⟨hal-03771202⟩