Recipes¶
ngs_toolkit provides scripts to perform routine tasks on NGS data; they are called recipes.
Recipes are distributed with ngs_toolkit and can be seen in the GitHub repository.
To make it convenient to run the scripts on data from a project, recipes can also be run with the command projectmanager recipe <recipe_name> <project_config.yaml>.
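As a minimal sketch, the projectmanager command above can be assembled as an argument list in Python. The helper name recipe_command and the configuration path are hypothetical, and the recipe name is assumed to be the last component of the module path (for example ngs_analysis).

```python
import shlex


def recipe_command(recipe_name, project_config):
    """Build the argument list for `projectmanager recipe` (hypothetical helper)."""
    return ["projectmanager", "recipe", recipe_name, project_config]


# The recipe name and configuration path are illustrative assumptions.
cmd = recipe_command("ngs_analysis", "project_config.yaml")

# shlex.join needs Python 3.8+; it only renders the command for display.
print(shlex.join(cmd))
```

To actually run the recipe, the list could be passed to subprocess.run(cmd, check=True) in an environment where ngs_toolkit is installed.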
ngs_toolkit.recipes.ngs_analysis¶
Perform full end-to-end analysis of ATAC-seq, ChIP-seq or RNA-seq data.
Produces quantification matrices, normalizes them, performs unsupervised and supervised analysis as well as enrichment analysis of differential features, all accompanied by powerful visualizations.
Supervised analysis will only be performed if the PEP configuration file contains a comparison table field.
In addition, this recipe uses the variables project_name, sample_attributes and group_attributes provided in the project configuration file.
usage: python -m ngs_toolkit.recipes.ngs_analysis [-h] [-n NAME]
[-o RESULTS_DIR]
[-t {ATAC-seq,RNA-seq,ChIP-seq}]
[-q] [-a ALPHA]
[-f ABS_FOLD_CHANGE]
config_file
Positional Arguments¶
- config_file
YAML project configuration file.
Named Arguments¶
- -n, --analysis-name
Name of analysis. Will be the prefix of output_files. By default it will be the name of the Project given in the YAML configuration.
- -o, --results-output
Directory for analysis output files. Default is ‘results’ under the project root directory.
Default: “results”
- -t, --data-type
Possible choices: ATAC-seq, RNA-seq, ChIP-seq
Data type to restrict analysis to. Default is to run separate analysis for each data type.
- -q, --pass-qc
Whether only samples with a ‘pass_qc’ value of ‘1’ in the annotation sheet should be used.
Default: False
- -a, --alpha
Alpha value of confidence for supervised analysis.
Default: 0.05
- -f, --fold-change
Absolute log2 fold change value for supervised analysis.
Default: 0
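The options above can be combined into a single call. The following is a sketch only: the helper ngs_analysis_command is hypothetical and simply mirrors the usage line, checking the data type against the three documented choices before building the argument list.

```python
DATA_TYPES = ("ATAC-seq", "RNA-seq", "ChIP-seq")  # choices documented for -t


def ngs_analysis_command(config_file, data_type=None, pass_qc=False,
                         alpha=0.05, abs_fold_change=0):
    """Assemble a `python -m ngs_toolkit.recipes.ngs_analysis` call as a list.

    Hypothetical helper; the flags mirror the usage line of the recipe.
    """
    cmd = ["python", "-m", "ngs_toolkit.recipes.ngs_analysis"]
    if data_type is not None:
        if data_type not in DATA_TYPES:
            raise ValueError(f"data_type must be one of {DATA_TYPES}")
        cmd += ["-t", data_type]
    if pass_qc:
        cmd.append("-q")  # flag without a value
    cmd += ["-a", str(alpha), "-f", str(abs_fold_change)]
    cmd.append(config_file)  # positional argument goes last
    return cmd


# Restrict to ATAC-seq, keep only QC-passing samples, tighten both thresholds.
cmd = ngs_analysis_command("project_config.yaml", data_type="ATAC-seq",
                           pass_qc=True, alpha=0.01, abs_fold_change=1)
```

Leaving data_type unset omits -t, which according to the description above runs a separate analysis for each data type.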
ngs_toolkit.recipes.call_peaks¶
Call peaks for ChIP-seq samples given a comparison table mapping foreground-background relationships between samples.
usage: python -m ngs_toolkit.recipes.call_peaks [-h] [-c COMPARISON_TABLE]
[-t] [-qc] [-j]
[-o RESULTS_DIR]
config_file
Positional Arguments¶
- config_file
YAML project configuration file.
Named Arguments¶
- -c, --comparison-table
Comparison table to use for peak calling. If not provided, will use a file named comparison_table.csv in the same directory as the given YAML Project configuration file.
- -t, --only-toggle
Whether only comparisons with ‘toggle’ value of ‘1’ or ‘True’ should be performed.
Default: False
- -qc, --pass-qc
Whether only samples with a ‘pass_qc’ attribute should be included. Default is False.
Default: False
- -j, --as-jobs
Whether jobs should be created for each sample, or it should run in serial mode.
Default: False
- -o, --results-output
Directory for analysis output files. Default is ‘results’ under the project root directory.
Default: “results”
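The fallback described for -c (a comparison_table.csv next to the project configuration file) can be sketched as a path computation. The function name and the example paths are hypothetical, and this is not taken from the recipe's source.

```python
from pathlib import PurePosixPath


def default_comparison_table(config_file):
    """Return the comparison table path assumed when -c is not given.

    Illustrative only: it places comparison_table.csv in the same
    directory as the YAML project configuration file.
    """
    return PurePosixPath(config_file).parent / "comparison_table.csv"


# Hypothetical project layout.
table = default_comparison_table("metadata/project_config.yaml")
print(table)
```

PurePosixPath is used so the example behaves the same on every platform; real code would more likely use pathlib.Path.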
ngs_toolkit.recipes.coverage¶
A helper script to calculate the read coverage of a BAM file in regions from a BED file. Ensures the same order and number of lines as the input BED file.
Software requirements:
None
usage: python -m ngs_toolkit.recipes.coverage [-h] [--no-overwrite]
bed_file bam_file output_bed
Positional Arguments¶
- bed_file
Input BED file with regions to quantify.
- bam_file
Input BAM file with reads.
- output_bed
Output BED file with counts for each region.
Named Arguments¶
- --no-overwrite
Whether results should not be overwritten if existing.
Default: True
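The guarantee of "same order and number of lines" can be illustrated with a small pure-Python sketch. It is not the recipe's implementation: regions and reads are plain tuples here (hypothetical data, 0-based half-open coordinates as in BED), whereas the recipe works on real BED and BAM files.

```python
def count_reads_per_region(regions, reads):
    """Return one (chrom, start, end, count) row per region, in input order.

    Regions without any overlapping read still get a row with count 0,
    so the output has exactly as many rows as the input.
    """
    rows = []
    for chrom, start, end in regions:
        count = sum(
            1
            for r_chrom, r_start, r_end in reads
            # half-open interval overlap test
            if r_chrom == chrom and r_start < end and r_end > start
        )
        rows.append((chrom, start, end, count))
    return rows


# Hypothetical, deliberately unsorted regions and a few reads.
regions = [("chr2", 100, 200), ("chr1", 0, 50), ("chr1", 500, 600)]
reads = [("chr1", 10, 40), ("chr1", 45, 80), ("chr2", 150, 250)]
rows = count_reads_per_region(regions, reads)
```

The output keeps the unsorted input order and includes the region with no reads, which is what makes the result safe to join back to the input BED file line by line.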
ngs_toolkit.recipes.deseq2¶
Perform differential expression using DESeq2 by comparing sample groups using a formula.
Software requirements:
DESeq2
usage: python -m ngs_toolkit.recipes.deseq2 [-h]
[--output-prefix OUTPUT_PREFIX]
[--formula FORMULA]
[--alpha ALPHA] [-d] [--overwrite]
[--no-save-inputs]
work_dir
Positional Arguments¶
- work_dir
Working directory. Should contain required files for DESeq2.
Named Arguments¶
- --output-prefix
Prefix for output files.
Default: “differential_analysis”
- --formula
R-style formula for differential expression. Defaults to ‘~ sample_group’.
Default: “~ sample_group”
- --alpha
Significance level to call differential expression. All results will be output anyway.
Default: 0.05
- -d, --dry-run
Don’t actually do anything.
Default: False
- --overwrite
Overwrite any existing directory or file.
Default: False
- --no-save-inputs
Don’t write inputs to disk.
Default: True
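The note on --alpha (all results are output regardless of significance) can be read as "annotate, don't filter". The sketch below illustrates that idea under stated assumptions: it uses the padj column that DESeq2 results carry, a strict padj < alpha comparison, and made-up gene names. None of this is confirmed to match the recipe's internals.

```python
def flag_differential(results, alpha=0.05):
    """Add a 'significant' flag to every row without dropping any row.

    Assumption: significance is judged on the adjusted p-value ('padj').
    Rows with a missing padj are kept and flagged as not significant.
    """
    flagged = []
    for row in results:
        padj = row.get("padj")
        significant = padj is not None and padj < alpha
        flagged.append({**row, "significant": significant})
    return flagged


# Hypothetical results table.
results = [
    {"gene": "A", "padj": 0.001},
    {"gene": "B", "padj": 0.2},
    {"gene": "C", "padj": None},
]
flagged = flag_differential(results, alpha=0.05)
```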
ngs_toolkit.recipes.enrichr¶
A helper script to run enrichment analysis using the Enrichr API on a gene set.
Software requirements: None
usage: python -m ngs_toolkit.recipes.enrichr [-h] [-a MAX_ATTEMPTS]
[--no-overwrite]
input_file output_file
Positional Arguments¶
- input_file
Input file with a gene name per row and no header.
- output_file
Output CSV file with results.
Named Arguments¶
- -a, --max-attempts
Maximum attempts to retry the API before giving up.
Default: 5
- --no-overwrite
Whether results should not be overwritten if existing.
Default: True
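The -a/--max-attempts option describes a retry-until-limit pattern. A generic sketch of that pattern follows; call_with_retries and flaky are hypothetical names, the stand-in function makes no network call, and the recipe's actual retry logic may differ.

```python
import time


def call_with_retries(func, max_attempts=5, wait_seconds=0.0):
    """Call func() until it succeeds or max_attempts is reached."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception as error:  # broad on purpose for the sketch
            last_error = error
            if attempt < max_attempts:
                time.sleep(wait_seconds)
    raise RuntimeError(f"gave up after {max_attempts} attempts") from last_error


# Stand-in for an API request that fails twice before succeeding.
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


result = call_with_retries(flaky, max_attempts=5)
```

With the default limit of 5, the stand-in succeeds on the third call; a limit below 3 would end in the RuntimeError instead.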
ngs_toolkit.recipes.generate_project¶
A helper script to generate synthetic data for a project in PEP format.
usage: python -m ngs_toolkit.recipes.generate_project [-h]
[--output-dir OUTPUT_DIR]
[--project-name PROJECT_NAME]
[--organism ORGANISM]
[--genome-assembly GENOME_ASSEMBLY]
[--data-type DATA_TYPE]
[--n-factors N_FACTORS]
[--only-metadata ONLY_METADATA]
[--sample-input-files SAMPLE_INPUT_FILES]
[--debug]
Named Arguments¶
- --output-dir
- --project-name
Default: “test_project”
- --organism
Default: “human”
- --genome-assembly
Default: “hg38”
- --data-type
Default: “ATAC-seq”
- --n-factors
Default: 1
- --only-metadata
Default: False
- --sample-input-files
Default: False
- --debug
Default: False
ngs_toolkit.recipes.lola¶
A helper script to run Location Overlap Analysis (LOLA) of a single region set in various sets of region-based annotations.
Software requirements:
LOLA
usage: python -m ngs_toolkit.recipes.lola [-h] [--no-overwrite] [-c CPUS]
bed_file universe_file output_folder
genome
Positional Arguments¶
- bed_file
BED file with query set regions.
- universe_file
BED file with universe where the query set came from.
- output_folder
Output directory for produced files.
- genome
Genome assembly of the region set.
Named Arguments¶
- --no-overwrite
Don’t overwrite existing output files.
Default: True
- -c, --cpus
Number of CPUs/threads to use for analysis.
ngs_toolkit.recipes.merge_signal¶
Merge signal from various ATAC-seq or ChIP-seq samples given a set of attributes to group samples by.
It produces merged BAM and bigWig files for all signal in the samples. If the data is paired-end sequenced, it can also produce these files separately for nucleosomal and nucleosome-free signal, based on the fragment length distribution. The signal may optionally be normalized for each group, and the work can be parallelized in jobs.
Software requirements:
samtools
sambamba
deeptools
usage: python -m ngs_toolkit.recipes.merge_signal [-h] [-a ATTRIBUTES] [-q]
[-j] [--cpus CPUS]
[--normalization-method NORMALIZATION_METHOD]
[--nucleosome] [--overwrite]
[-o OUTPUT_DIR] [-d]
config_file
Positional Arguments¶
- config_file
YAML project configuration file.
Named Arguments¶
- -a, --attributes
Attributes to merge samples by. A comma-delimited string with no spaces. By default will use values in the project config group_attributes.
- -q, --pass-qc
Whether only samples with a ‘pass_qc’ value of ‘1’ in the annotation sheet should be used.
Default: False
- -j, --as-jobs
Whether jobs should be created for each sample, or it should run in serial mode.
Default: False
- --cpus
CPUs/Threads to use per job if --as-jobs is on.
Default: 8
- --normalization-method
Method to normalize tracks regarding sequenced depth. One of the methods in https://deeptools.readthedocs.io/en/develop/content/tools/bamCoverage.html#Read%20coverage%20normalization%20options
Default: “RPGC”
- --nucleosome
Whether to produce nucleosome/nucleosome-free signal files.
Default: False
- --overwrite
Whether to overwrite existing files.
Default: False
- -o, --output-dir
Directory for output files. Default is ‘merged’ under the project root directory.
Default: “merged”
- -d, --dry-run
Whether to do everything except running commands.
Default: False
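Grouping by a comma-delimited attribute string, as -a expects, can be sketched as follows. The sample records, the attribute names cell_type and treatment, and the helper group_samples are all hypothetical; the recipe itself works on the samples of the project.

```python
from collections import defaultdict


def group_samples(samples, attributes):
    """Group sample names by the values of the given attributes.

    `attributes` is a comma-delimited string with no spaces,
    in the format described for -a/--attributes.
    """
    keys = attributes.split(",")
    groups = defaultdict(list)
    for sample in samples:
        group = tuple(sample[key] for key in keys)
        groups[group].append(sample["sample_name"])
    return dict(groups)


# Hypothetical sample annotation.
samples = [
    {"sample_name": "s1", "cell_type": "T", "treatment": "ctrl"},
    {"sample_name": "s2", "cell_type": "T", "treatment": "ctrl"},
    {"sample_name": "s3", "cell_type": "T", "treatment": "drug"},
    {"sample_name": "s4", "cell_type": "B", "treatment": "ctrl"},
]
groups = group_samples(samples, "cell_type,treatment")
```

Each key of the resulting dictionary corresponds to one group of samples whose signal would be merged together.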
ngs_toolkit.recipes.region_enrichment¶
A helper script to run enrichment analysis of a single region set in a region-based set of annotations.
usage: python -m ngs_toolkit.recipes.region_enrichment [-h]
[--output-file OUTPUT_FILE]
[--overwrite]
bed_file pep
Positional Arguments¶
- bed_file
BED file with regions.
- pep
The analysis’ PEP config file.
Named Arguments¶
- --output-file
Output file.
Default: “region_type_enrichment.csv”
- --overwrite
Overwrite any existing directory or file.
Default: False
ngs_toolkit.recipes.region_set_frip¶
Compute fraction of reads in peaks (FRiP) based on a consensus set of regions derived from several samples.
A consensus region set can be passed; otherwise the recipe will either try to use an existing one for that analysis or produce one on the fly.
Software requirements:
awk
samtools
usage: python -m ngs_toolkit.recipes.region_set_frip [-h] [-r REGION_SET] [-q]
[--computing-configuration COMPUTING_CONFIGURATION]
[--permissive]
config_file
Positional Arguments¶
- config_file
YAML project configuration file.
Named Arguments¶
- -r, --region-set
BED file with a region set derived from several samples, or an Oracle region set. If unset, the recipe will try to use the sites attribute of an existing analysis object; otherwise it will create a region set from the peaks of all samples.
- -q, --pass-qc
Whether only samples with a ‘pass_qc’ value of ‘1’ in the annotation sheet should be used.
Default: False
- --computing-configuration
Which divvy computing configuration to use for distributed jobs. Type divvy list to see all options. Defaults to the value in the ngs_toolkit configuration.
- --permissive
If creating a region set, allow sample files to be missing and use what is present.
Default: False
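The quantity this recipe reports, FRiP, is a simple ratio. A minimal sketch follows, assuming the two read counts are already available; the function name is hypothetical, and exactly which reads count toward the total (for example, only mapped reads) is an assumption not specified here.

```python
def frip(reads_in_regions, total_reads):
    """Fraction of reads in peaks: reads inside the region set / all reads."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return reads_in_regions / total_reads


# Hypothetical counts for one sample against a consensus region set.
value = frip(reads_in_regions=2500, total_reads=10000)
print(value)  # 0.25
```

Using one consensus region set for all samples makes the values comparable across samples, since every sample is scored against the same regions.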