# metagWGS

## Introduction
**metagWGS** is a [Nextflow](https://www.nextflow.io/docs/latest/index.html#) bioinformatics analysis pipeline used for **metag**enomic **W**hole **G**enome **S**hotgun sequencing data (Illumina HiSeq3000 or NovaSeq, paired, 2\*150bp).

### Pipeline graphical representation
The workflow processes raw data from `.fastq` or `.fastq.gz` inputs and runs the modules shown in this figure:
![](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/raw/master/docs/Pipeline.png)

### metagWGS steps

metagWGS is split into steps that correspond to different parts of the bioinformatics analysis:

* `01_clean_qc` (can be skipped)
   * trims adapter sequences and removes low-quality reads ([Cutadapt](https://cutadapt.readthedocs.io/en/stable/#), [Sickle](https://github.com/najoshi/sickle))
   * removes host contaminants ([BWA](http://bio-bwa.sourceforge.net/) + [Samtools](http://www.htslib.org/) + [Bedtools](https://bedtools.readthedocs.io/en/latest/))
   * checks the quality of raw and cleaned data ([FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/))
   * taxonomically classifies cleaned reads ([Kaiju MEM](https://github.com/bioinformatics-centre/kaiju) + [kronaTools](https://github.com/marbl/Krona/wiki/KronaTools) + [Generate_barplot_kaiju.py](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/master/bin/Generate_barplot_kaiju.py) + [merge_kaiju_results.py](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/master/bin/merge_kaiju_results.py))
* `02_assembly`
   * assembles cleaned reads (when `01_clean_qc` is run) or raw reads (when the `--skip_01_clean_qc` parameter is used) ([metaSPAdes](https://github.com/ablab/spades) or [Megahit](https://github.com/voutcn/megahit))
   * assesses the quality of the assembly ([metaQUAST](http://quast.sourceforge.net/metaquast))
   * deduplicates cleaned reads (when `01_clean_qc` is run) or raw reads (when the `--skip_01_clean_qc` parameter is used) ([BWA](http://bio-bwa.sourceforge.net/) + [Samtools](http://www.htslib.org/) + [Bedtools](https://bedtools.readthedocs.io/en/latest/))
* `03_filtering` (can be skipped)
   * filters out contigs with a low CPM (counts per million) value ([Filter_contig_per_cpm.py](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/master/bin/Filter_contig_per_cpm.py) + [metaQUAST](http://quast.sourceforge.net/metaquast))
* `04_structural_annot`
   * structurally annotates genes ([Prokka](https://github.com/tseemann/prokka) + [Rename_contigs_and_genes.py](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/master/bin/Rename_contigs_and_genes.py))
* `05_alignment`
   * aligns reads to the contigs ([BWA](http://bio-bwa.sourceforge.net/) + [Samtools](http://www.htslib.org/))
   * aligns the protein sequence of genes against a protein database ([DIAMOND](https://github.com/bbuchfink/diamond))
* `06_func_annot`
   * clusters genes per sample and globally ([cd-hit-est](http://weizhongli-lab.org/cd-hit/) + [cd_hit_produce_table_clstr.py](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/master/bin/cd_hit_produce_table_clstr.py))
   * quantifies reads that align with the genes ([featureCounts](http://subread.sourceforge.net/) + [Quantification_clusters.py](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/master/bin/Quantification_clusters.py))
   * functionally annotates genes and quantifies reads by function ([eggNOG-mapper](http://eggnog-mapper.embl.de/) + [best_bitscore_diamond.py](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/master/bin/best_bitscore_diamond.py) + [merge_abundance_and_functional_annotations.py](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/master/bin/merge_abundance_and_functional_annotations.py) + [quantification_by_functional_annotation.py](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/master/bin/quantification_by_functional_annotation.py))
* `07_taxo_affi`
   * taxonomically affiliates the genes ([Samtools](http://www.htslib.org/) + [aln2taxaffi.py](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/master/bin/aln2taxaffi.py))
   * taxonomically affiliates the contigs ([Samtools](http://www.htslib.org/) + [aln2taxaffi.py](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/master/bin/aln2taxaffi.py))
   * counts the number of reads and contigs, for each taxonomic affiliation, per taxonomic level ([Samtools](http://www.htslib.org/) + [merge_contig_quantif_perlineage.py](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/master/bin/merge_contig_quantif_perlineage.py) + [quantification_by_contig_lineage.py](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/master/bin/quantification_by_contig_lineage.py))
* `08_binning` from [nf-core/mag 1.0.0](https://github.com/nf-core/mag/releases/tag/1.0.0)
   * bins contigs ([MetaBAT2](https://bitbucket.org/berkeleylab/metabat/src/master/))
   * assesses bins ([BUSCO](https://busco.ezlab.org/) + [metaQUAST](http://quast.sourceforge.net/metaquast) + [summary_busco.py](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/master/bin/summary_busco.py) and [combine_tables.py](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/master/bin/combine_tables.py) from [nf-core/mag](https://github.com/nf-core/mag))
   * taxonomically affiliates the bins ([BAT](https://github.com/dutilh/CAT))
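
To illustrate the idea behind the `03_filtering` step, here is a minimal Python sketch of CPM-based contig filtering. It is not the code of `Filter_contig_per_cpm.py`: the function names, the normalization by the total number of mapped reads, the `>=` comparison, and the threshold value are all assumptions made for illustration.

```python
from typing import Dict, List


def counts_per_million(read_counts: Dict[str, int]) -> Dict[str, float]:
    """Convert per-contig mapped-read counts to counts per million (CPM).

    Assumption: CPM is normalized by the sum of reads mapped to all contigs.
    The real script may normalize differently.
    """
    total = sum(read_counts.values())
    if total == 0:
        # No mapped reads at all: every contig gets a CPM of zero.
        return {contig: 0.0 for contig in read_counts}
    return {
        contig: count * 1_000_000 / total
        for contig, count in read_counts.items()
    }


def filter_contigs_by_cpm(read_counts: Dict[str, int], min_cpm: float) -> List[str]:
    """Return the contigs whose CPM is at least min_cpm (hypothetical rule)."""
    cpm = counts_per_million(read_counts)
    return [contig for contig, value in cpm.items() if value >= min_cpm]


# Toy data: 1000 mapped reads spread over three contigs.
counts = {"contig_1": 900, "contig_2": 99, "contig_3": 1}
kept = filter_contigs_by_cpm(counts, min_cpm=10_000)
print(kept)  # contig_3 has a CPM of 1000 and is dropped
```

With these toy counts, `contig_1` and `contig_2` have CPM values of 900,000 and 99,000 and are kept, while `contig_3` falls below the illustrative threshold.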

An HTML report is generated at the end of the workflow with [MultiQC](https://multiqc.info/).
The pipeline is built using [Nextflow](https://www.nextflow.io/docs/latest/index.html#), a bioinformatics workflow tool that runs tasks across multiple compute infrastructures in a very portable manner.
Three [Singularity](https://sylabs.io/docs/) containers are available, making installation trivial and results highly reproducible.
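
As a rough summary of the step list under "metagWGS steps", the Python sketch below models which steps can be skipped and which reads reach the assembly. The step names and the role of `--skip_01_clean_qc` come from this README. The helper names, and the assumption that the remaining steps run in numeric order, are hypothetical and do not describe how the Nextflow pipeline is implemented.

```python
from typing import List, Set

# Step names as listed in this README.
STEPS: List[str] = [
    "01_clean_qc",
    "02_assembly",
    "03_filtering",
    "04_structural_annot",
    "05_alignment",
    "06_func_annot",
    "07_taxo_affi",
    "08_binning",
]

# Only the steps marked "can be skipped" in the README.
SKIPPABLE: Set[str] = {"01_clean_qc", "03_filtering"}


def plan_steps(skip: Set[str]) -> List[str]:
    """Return the steps left after skipping (hypothetical helper)."""
    not_allowed = skip - SKIPPABLE
    if not_allowed:
        raise ValueError(f"these steps cannot be skipped: {sorted(not_allowed)}")
    return [step for step in STEPS if step not in skip]


def assembly_input(skip: Set[str]) -> str:
    """Reads given to 02_assembly: raw if 01_clean_qc is skipped, else cleaned."""
    return "raw reads" if "01_clean_qc" in skip else "cleaned reads"


skipped = {"01_clean_qc"}
print(plan_steps(skipped))
print(assembly_input(skipped))
```

Skipping `01_clean_qc` in this model corresponds to passing `--skip_01_clean_qc`, in which case assembly and deduplication work on raw reads.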
## Documentation

metagWGS documentation is available [here](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/tree/master/docs).

## License
metagWGS is distributed under the GNU General Public License v3.

## Copyright
2021 INRAE

## Funded by
* Anti-Selfish (Labex ECOFECT – N° 00002455-CT15000562)
* France Génomique National Infrastructure (funded as part of the "Investissements d’avenir" program managed by the Agence Nationale de la Recherche, contract ANR-10-INBS-09)
* With the participation of SeqOccIn members, financed by FEDER-FSE MIDI-PYRENEES ET GARONNE 2014-2020.

## Citation
metagWGS has been presented at JOBIM 2020:

Poster "Whole metagenome analysis with metagWGS", J. Fourquet, C. Noirot, C. Klopp, P. Pinton, S. Combes, C. Hoede, G. Pascal.

https://www.sfbi.fr/sites/sfbi.fr/files/jobim/jobim2020/posters/compressed/jobim2020_poster_9.pdf

metagWGS has been presented at JOBIM 2019 and at Genotoul Biostat Bioinfo day:

Poster "Whole metagenome analysis with metagWGS", J. Fourquet, A. Chaubet, H. Chiapello, C. Gaspin, M. Haenni, C. Klopp, A. Lupo, J. Mainguy, C. Noirot, T. Rochegue, M. Zytnicki, T. Ferry, C. Hoede.