Overview

MetaPathways [CIT2002] is a meta’omic analysis pipeline for the annotation and analysis of environmental sequence information. It accepts metagenomic or metatranscriptomic sequence data in one of several file formats (.fasta, .gff, or .gbk). The pipeline consists of five operational stages.

Pipeline Overview

MetaPathways is composed of five general stages, encompassing a number of analytical or data handling steps (Figure 1):

  1. QC and ORF Prediction: Here MetaPathways performs basic quality control (QC), including duplicate-sequence removal and sequence trimming. Open Reading Frame (ORF) prediction is then performed on the QC’ed sequences using Prodigal [PRODIGAL2010] or GeneMark [GeneMark12]. The translated ORFs are then trimmed according to a user-defined setting.
    • MetaPathways steps: PREPROCESS INPUT, ORF PREDICTION, and FILTER AMINOS
  2. Functional and Taxonomic Annotation: Using the seed-and-extend homology search algorithms BLAST [BLAST90] or LAST [LAST11], MetaPathways conducts searches against functional and taxonomic databases.
    • MetaPathways steps: FUNC SEARCH, PARSE FUNC SEARCH, SCAN rRNA, and ANNOTATE ORFS
  3. Analyses: After sequence annotation, MetaPathways performs further analyses, including taxonomic assignment with the Lowest Common Ancestor (LCA) algorithm [MEGAN07] and tRNA detection with tRNAscan-SE [TRNASCAN97], and prepares the detected annotations for environmental Pathway/Genome Database (ePGDB) creation via Pathway Tools.
    • MetaPathways steps: PATHOLOGIC INPUT, CREATE ANNOT REPORTS, and COMPUTE RPKM
  4. ePGDB Creation: MetaPathways then predicts MetaCyc pathways using the Pathway Tools software and its pathway prediction algorithm PathoLogic [KARP11], resulting in the creation of an environmental Pathway/Genome Database (ePGDB), a data structure that combines sequences, genes, pathways, and literature annotations for integrative interpretation.
    • MetaPathways steps: BUILD ePGDB
  5. Pathway Export: Here MetaCyc pathways or reactions are exported in a tabular format for downstream analysis. As of the v2.5 release, MetaPathways performs this step automatically.
    • MetaPathways steps: BUILD ePGDB
http://i.imgur.com/HOacG2l.png
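
The Lowest Common Ancestor idea used in stage 3 can be illustrated with a minimal Python sketch. It is not the MetaPathways or MEGAN implementation: the function name `lowest_common_ancestor` and the example lineages are hypothetical, and it assumes each homology hit has already been mapped to a root-to-leaf taxonomic lineage. Real implementations such as MEGAN also filter hits by alignment score before computing the ancestor, which this sketch omits.

```python
from typing import List, Sequence


def lowest_common_ancestor(lineages: Sequence[Sequence[str]]) -> List[str]:
    """Return the deepest lineage prefix shared by every input lineage.

    Each lineage is a root-to-leaf list of taxon names (an assumption of
    this sketch; a real pipeline would derive these from a taxonomy tree).
    """
    if not lineages:
        return []
    common: List[str] = []
    # zip stops at the shortest lineage; each tuple holds one taxon per
    # lineage at the same depth.
    for taxa_at_depth in zip(*lineages):
        if len(set(taxa_at_depth)) != 1:
            break  # the lineages diverge here
        common.append(taxa_at_depth[0])
    return common


# Hypothetical lineages for three hits of a single ORF.
hits = [
    ["root", "Bacteria", "Proteobacteria", "Gammaproteobacteria"],
    ["root", "Bacteria", "Proteobacteria", "Alphaproteobacteria"],
    ["root", "Bacteria", "Firmicutes"],
]
print(lowest_common_ancestor(hits))  # ['root', 'Bacteria']
```

In this example the ORF would be assigned to Bacteria, the deepest taxon on which all three hits agree.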

Output Format

Visualizing Output

[CIT2002] K. M. Konwar, N. W. Hanson, A. P. Pagé, S. J. Hallam, MetaPathways: a modular pipeline for constructing pathway/genome databases from environmental sequence information. BMC Bioinformatics 14, 202 (2013). http://www.biomedcentral.com/1471-2105/14/202
[PRODIGAL2010] D. Hyatt et al., Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11, 119 (2010).
[GeneMark12] D. Hyatt, P. F. LoCascio, L. J. Hauser, E. C. Uberbacher, Gene and translation initiation site prediction in metagenomic sequences. Bioinformatics 28, 2223–2230 (2012).
[BLAST90] S. F. Altschul, W. Gish, W. Miller, E. W. Myers, D. J. Lipman, Basic local alignment search tool. J Mol Biol 215, 403–410 (1990).
[LAST11] S. M. Kiełbasa, R. Wan, K. Sato, P. Horton, M. C. Frith, Adaptive seeds tame genomic sequence comparison. Genome Res 21, 487–493 (2011).
[MEGAN07] D. H. Huson, A. F. Auch, J. Qi, S. C. Schuster, MEGAN analysis of metagenomic data. Genome Res 17, 377–386 (2007).
[TRNASCAN97] T. M. Lowe, S. R. Eddy, tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Research 25, 955–964 (1997).
[KARP11] P. D. Karp, M. Latendresse, R. Caspi, The pathway tools pathway prediction algorithm. Stand Genomic Sci 5, 424–429 (2011).