Introduction

Installation

With pip

ngs_toolkit is available for Python 3 only.

To install, run:

pip install ngs-toolkit

You may need to add the --user flag if you are neither root nor working in a virtual environment.

This also installs all the required Python dependencies. A full list of the Python dependencies used is available in the project documentation.
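To confirm that the installation worked, you can query the installed version of the distribution. The following is a minimal sketch, assuming Python 3.8 or later (for importlib.metadata); the helper name installed_version is hypothetical and not part of ngs_toolkit.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a pip distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "ngs-toolkit" is the distribution name used in the pip command above.
    print(installed_version("ngs-toolkit"))
```

A return value of None means pip did not install the package into the interpreter you are running.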

To also install the optional libraries that interface with R, add the [rstats] extra to the pip call. The quotes stop shells such as zsh from interpreting the square brackets:

pip install "ngs-toolkit[rstats]"

Non-Python optional requirements

ngs_toolkit makes use of some non-Python dependencies.

The following are required only for some data or analysis types:

  • bedtools: required for some ATAC-seq and ChIP-seq functions. It underlies pybedtools, the Python interface to bedtools, which can itself be installed without bedtools.
  • R and the following Bioconductor libraries (optional):
    • cqn: used for GC-content aware normalization of NGS data.
    • DESeq2: used for differential testing of genes/regulatory elements.
  • Kent tools (optional): the ‘2bitToFa’ binary from UCSC’s Kent bioinformatics toolkit is used to convert sequences from the 2bit format to FASTA.
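Because these tools are called as external programs, they need to be on your PATH. The sketch below checks for them. It assumes the executable names bedtools, Rscript (shipped with R) and 2bitToFa. The helper find_missing_tools is hypothetical and not part of ngs_toolkit.

```python
import shutil
from typing import Iterable, List

# Assumed executable names for the optional non-Python tools listed above.
OPTIONAL_TOOLS = ("bedtools", "Rscript", "2bitToFa")


def find_missing_tools(tools: Iterable[str] = OPTIONAL_TOOLS) -> List[str]:
    """Return the names of executables that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All optional tools were found on PATH.")
```

A missing tool only matters if you use the data or analysis types that depend on it.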

Note

The bedtools version should be below 2.24.0 (version 2.20.1 is used for testing).
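To check that your bedtools satisfies this constraint, you can parse the output of bedtools --version, which prints a line such as "bedtools v2.20.1". The following is a minimal sketch that assumes this output format. The helper names are hypothetical and not part of ngs_toolkit.

```python
import re
import subprocess
from typing import Optional, Tuple

Version = Tuple[int, int, int]
MAX_EXCLUSIVE: Version = (2, 24, 0)  # bedtools must be below 2.24.0


def parse_bedtools_version(text: str) -> Optional[Version]:
    """Extract (major, minor, patch) from output such as 'bedtools v2.20.1'."""
    match = re.search(r"v?(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        return None
    major, minor, patch = (int(group) for group in match.groups())
    return (major, minor, patch)


def is_supported(found: Version) -> bool:
    """Tuples compare element-wise, so (2, 20, 1) < (2, 24, 0) is True."""
    return found < MAX_EXCLUSIVE


def installed_bedtools_version(executable: str = "bedtools") -> Optional[Version]:
    """Run '<executable> --version'; return None if it is missing or fails."""
    try:
        result = subprocess.run(
            [executable, "--version"], capture_output=True, text=True, check=True
        )
    except (FileNotFoundError, subprocess.CalledProcessError):
        return None
    return parse_bedtools_version(result.stdout + result.stderr)


if __name__ == "__main__":
    found = installed_bedtools_version()
    if found is None:
        print("bedtools not found, or its version string was not recognized")
    else:
        status = "OK" if is_supported(found) else "unsupported (must be below 2.24.0)"
        print("bedtools", ".".join(map(str, found)), status)
```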

Using a conda environment

Download the latest Python 3 Miniconda installer from the conda website and follow its instructions to install Miniconda and activate the environment.

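Set up the bioconda channel together with the defaults and conda-forge channels. Each --add call places the channel at the top of the list, so conda-forge ends up with the highest priority: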

conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge

Install the dependencies:

conda install -y bedtools==2.20.1
conda install -y ucsc-twobittofa
conda install -y bioconductor-deseq2
conda install -y bioconductor-cqn
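To verify that R can load the Bioconductor packages installed above, you can ask Rscript to load each one and report the result through its exit status. This is a minimal sketch, assuming Rscript is on your PATH. The helper r_package_available is hypothetical and not part of ngs_toolkit.

```python
import subprocess


def r_package_available(package: str, rscript: str = "Rscript") -> bool:
    """Return True if R can load `package` (e.g. "DESeq2" or "cqn")."""
    # requireNamespace() returns FALSE instead of raising an error, so convert
    # its result into a process exit status: 0 if loadable, 1 otherwise.
    expression = (
        f'quit(save = "no", status = as.integer(!requireNamespace("{package}", quietly = TRUE)))'
    )
    try:
        result = subprocess.run([rscript, "-e", expression], capture_output=True)
    except FileNotFoundError:
        # Rscript itself is not installed or not on PATH.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    for name in ("DESeq2", "cqn"):
        print(name, "available" if r_package_available(name) else "missing")
```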

Then install the ngs-toolkit library with pip (it is available only through PyPI):

pip install ngs-toolkit

API usage

To use a particular class or function from the toolkit, import it from within Python or IPython, for example:

from ngs_toolkit import ATACSeqAnalysis
from ngs_toolkit.utils import log_pvalues