Installation¶

Setup a virtual environment¶

Create a python virtual environment python Virtual enviroments venv (for Python 3) allow you to manage separate package installations for different projects. They essentially allow you to create a “virtual” isolated Python installation and install packages into that virtual installation. When you switch projects, you can simply create a new virtual environment and not have to worry about breaking the packages installed in the other environments. It is always recommended to use a virtual environment while trying out new Python applications.

The following command creates a new virtual environment with a name mynewenv with Python 3

$ virtualenv -p /usr/bin/python3 mynewenv

Activate the new virtual environment by running

$ source mynewenv/bin/activate

Deactivate If you want to switch projects or otherwise leave your virtual environment, simply run:

$ deactivate

pip install MetaPathways¶

Install MetaPathways by running:

$ pip3 install metapathways

To make sure MetaPathways is installed type

$ MetaPathways --version

which, if MetaPathways, is properly installed, will print a version number. For example

MetaPathways: Version 3.5.0

Install Binaries¶

Next we install trnascan-1.4, rpkm, prodigal, FAST and bwa

Download the source code as

$ wget https://github.com/kishori82/MetaPathways_Python.3.0/raw/kmk-develop/c_cpp_sources.1.0.tar.gz

untar the files, make and install, which takes a few minutes

$ tar -zxvf c_cpp_sources.1.0.tar.gz
$ cd c_cpp_sources
$ make`
$ sudo make install

NOTE: if you would like to unstall then type

$ sudo make uninstall

Install ncbi-blast+ locally. Visit the download page.

For Ubuntu/Debian

$ sudo apt-get install ncbi-blast+

Reference Sequences¶

Create the following reference folder structure under a folder. Here we use the example name MetaPathways_DBs

$ mkdir -p MetaPathways_DBs/taxonomic/formatted
$ mkdir -p MetaPathways_DBs/functional/formatted
$ mkdir -p MetaPathways_DBs/ncbi_tree
$ mkdir -p MetaPathways_DBs/functional_categories

MetaPathways_DBs/
├── functional
│   ├── formatted
├── functional_categories
├── ncbi_tree
└── taxonomic
    └── formatted

Download and unzip the NCBI taxonomy file to the MetaPathways_DBs/ncbi_tree folder

$ cd MetaPathways_DBs/ncbi_tree
$ wget https://github.com/kishori82/MetaPathways_Python.3.0/raw/kmk-develop/data/refdata/ncbi_taxonomy_tree.txt.gz
$ wget https://github.com/kishori82/MetaPathways_Python.3.0/raw/kmk-develop/data/refdata/ncbi.map.gz

Download and unzip functional classification files to MetaPathways_DBs/functional_hierarchy folder

$ cd MetaPathways_DBs/functional_hierarchy
$ wget https://github.com/kishori82/MetaPathways_Python.3.0/raw/kmk-develop/data/refdata/CAZY_hierarchy.txt.gz
$ wget https://github.com/kishori82/MetaPathways_Python.3.0/raw/kmk-develop/data/refdata/COG_categories.txt.gz
$ wget https://github.com/kishori82/MetaPathways_Python.3.0/raw/kmk-develop/data/refdata/KO_classification.txt.gz
$ wget https://github.com/kishori82/MetaPathways_Python.3.0/raw/kmk-develop/data/refdata/SEED_subsystems.txt.gz

and we should see the following structure

MetaPathways_DBs/
├── functional
│   ├── formatted
├── functional_categories
│   ├── CAZY_hierarchy.txt.gz
│   ├── COG_categories.txt.gz
│   ├── KO_classification.txt.gz
│   ├── SEED_subsystems.txt.gz
├── ncbi_tree
│   ├── ncbi_taxonomy_tree.txt.gz
│   ├── ncbi.map.gz
└── taxonomic
    └── formatted

Functional Reference¶

The functional references are protein reference sequences used for functional and taxonomic annotation. Any set of protein references in the FASTA format can be used, e.g., we show a few lines

>WP_096046812.1 hypothetical protein [Sulfurospirillum sp. JPD-1]
MSKKAFLFLILLVMSLQSLLVACGGSCLECHSKLRPYINDQNHAILNECITCHNQPSKNGQCGRDCFDCHSQEKVYAQKDVNAHQELKT
CGTCHKEKVDFTTPKQSIISNQQNLIHLFK
>WP_096046815.1 hypothetical protein [Sulfurospirillum sp. JPD-1]
MKKLLIILALISRLIAEDSSDLDEIKEEDIPKILSIIKDGTKEHLPMMLDDYTTLVDIVSVNNAIEYRNRINSANEHVKTILKADKGTLI
KTTFDNNKSYLCSDYETRSLLKKGAVFIYVFYDMNNAELFKFSIQEKDCQ
>WP_016244176.1 hypothetical protein [Escherichia coli]
MTDITDRHTLRRMSWSELFTAAQEAEFQRDYERARIVWSFALHVATTTINKNLSIAHIRRCDTLLHKSKTVPGNNTGGRSVCLRPQHPRR
...........

Formatting Reference Sequences¶

For the purpose of demonstration we walk you through the process of preparing a small set of protein reference sequences from the NCBI Refseq protein databases. Download the example protein reference sequence file refseq-mini.fasta.gz to the functional folder as follows

$ cd MetaPathways_DBs/functional
$ wget https://github.com/kishori82/MetaPathways_Python.3.0/raw/kmk-develop/data/refdata/refseq-mini.fasta.gz
$ gunzip refseq-mini.fasta.gz

rename to remove the fasta suffix

$ mv refseq-mini.fasta  refseq-mini
$ cat refseq-mini | grep ">" > formatted/refseq-mini-names.txt

FAST¶

BLAST¶

Format the database for blastp as follows:

$ cd MetaPathways_DBs/functional
$ makeblastdb -dbtype prot -in refseq-mini -out formatted/refseq-mini