Recipes¶
ngs_toolkit provides scripts to perform routine tasks on NGS data; these are called recipes.
Recipes are distributed with ngs_toolkit and can be seen in the GitHub repository.
To make it convenient to run the scripts on data from a project, recipes can also be run with the command projectmanager recipe <recipe_name> <project_config.yaml>.
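As a minimal sketch, the two invocation styles can be assembled as argument lists (for example, to pass to subprocess.run). The helper name recipe_commands and the config path are hypothetical; only the projectmanager recipe <recipe_name> <project_config.yaml> form and the python -m ngs_toolkit.recipes.<recipe_name> form come from this page. The sketch only applies to recipes whose positional argument is a project configuration file.

```python
def recipe_commands(recipe_name, project_config):
    """Return argv lists for the two ways of running a recipe.

    Hypothetical helper: it only builds the commands, it does not run them.
    """
    via_projectmanager = ["projectmanager", "recipe", recipe_name, project_config]
    via_module = ["python", "-m", f"ngs_toolkit.recipes.{recipe_name}", project_config]
    return via_projectmanager, via_module


pm_cmd, module_cmd = recipe_commands("ngs_analysis", "project_config.yaml")
print(" ".join(pm_cmd))
print(" ".join(module_cmd))
```

Building the command as a list rather than a single string avoids shell quoting problems when a path contains spaces.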
ngs_toolkit.recipes.ngs_analysis¶
Perform full end-to-end analysis of ATAC-seq, ChIP-seq or RNA-seq data.
Produces quantification matrices, normalizes them, and performs unsupervised and supervised analysis as well as enrichment analysis of differential features, all accompanied by visualizations.
Supervised analysis will only be performed if the PEP configuration file contains a comparison table field.
In addition, this recipe uses the variables project_name, sample_attributes and group_attributes provided in the project configuration file.
usage: python -m ngs_toolkit.recipes.ngs_analysis [-h] [-n NAME]
[-o RESULTS_DIR]
[-t {ATAC-seq,RNA-seq,ChIP-seq}]
[-q] [-a ALPHA]
[-f ABS_FOLD_CHANGE]
config_file
Positional Arguments¶
- config_file
YAML project configuration file.
Named Arguments¶
- -n, --analysis-name
Name of analysis. Will be the prefix of output_files. By default it will be the name of the Project given in the YAML configuration.
- -o, --results-output
Directory for analysis output files. Default is ‘results’ under the project root directory.
Default: “results”
- -t, --data-type
Possible choices: ATAC-seq, RNA-seq, ChIP-seq
Data type to restrict analysis to. Default is to run separate analysis for each data type.
- -q, --pass-qc
Whether only samples with a ‘pass_qc’ value of ‘1’ in the annotation sheet should be used.
Default: False
- -a, --alpha
Significance level (alpha) for supervised analysis.
Default: 0.05
- -f, --fold-change
Absolute log2 fold change value for supervised analysis.
Default: 0
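To illustrate how --alpha and --fold-change might act together, here is a sketch that filters a results table by both thresholds. It assumes (this page does not say so) that a feature is kept when its adjusted p-value is below alpha and its absolute log2 fold change exceeds the threshold. The record fields padj and log2_fold_change, the function name, and the data are all hypothetical.

```python
def select_differential(features, alpha=0.05, abs_fold_change=0.0):
    """Keep features passing both thresholds (assumed semantics, not the recipe's code)."""
    return [
        f["name"]
        for f in features
        if f["padj"] < alpha and abs(f["log2_fold_change"]) > abs_fold_change
    ]


# Made-up records purely for illustration.
features = [
    {"name": "region_1", "padj": 0.001, "log2_fold_change": 2.3},
    {"name": "region_2", "padj": 0.200, "log2_fold_change": 3.1},
    {"name": "region_3", "padj": 0.010, "log2_fold_change": -0.4},
]

default_hits = select_differential(features)
strict_hits = select_differential(features, alpha=0.05, abs_fold_change=1.0)
```

With the default fold-change threshold of 0, significance alone decides; raising it to 1 additionally drops region_3, whose change is small despite being significant.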
ngs_toolkit.recipes.call_peaks¶
Call peaks for ChIP-seq samples given a comparison table mapping foreground-background relationships between samples.
usage: python -m ngs_toolkit.recipes.call_peaks [-h] [-c COMPARISON_TABLE]
[-t] [-qc] [-j]
[-o RESULTS_DIR]
config_file
Positional Arguments¶
- config_file
YAML project configuration file.
Named Arguments¶
- -c, --comparison-table
Comparison table to use for peak calling. If not provided, a file named comparison_table.csv in the same directory as the given YAML project configuration file will be used.
- -t, --only-toggle
Whether only comparisons with ‘toggle’ value of ‘1’ or ‘True’ should be performed.
Default: False
- -qc, --pass-qc
Whether only samples with a ‘pass_qc’ attribute should be included.
Default: False
- -j, --as-jobs
Whether jobs should be created for each sample, or it should run in serial mode.
Default: False
- -o, --results-output
Directory for analysis output files. Default is ‘results’ under the project root directory.
Default: “results”
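The effect of --only-toggle can be sketched as a filter over the comparison table. Only the column name toggle and the accepted values ‘1’ and ‘True’ come from the text above; the other columns, the table contents, and the function name are hypothetical, and the recipe's real table layout may differ.

```python
import csv
import io

# Hypothetical comparison table; only the 'toggle' column is named on this page.
TABLE = """comparison_name,sample_name,toggle
treated_vs_input,sample_a,1
treated_vs_input,sample_b,1
old_comparison,sample_c,0
pilot,sample_d,True
"""


def toggled_rows(handle):
    """Yield rows whose 'toggle' value is '1' or 'True'."""
    for row in csv.DictReader(handle):
        if row["toggle"].strip() in {"1", "True"}:
            yield row


selected = [row["sample_name"] for row in toggled_rows(io.StringIO(TABLE))]
```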
ngs_toolkit.recipes.coverage¶
A helper script to calculate the read coverage of a BAM file in regions from a BED file. Ensures the same order and number of lines as the input BED file.
usage: python -m ngs_toolkit.recipes.coverage [-h] [--no-overwrite]
bed_file bam_file output_bed
Positional Arguments¶
- bed_file
Input BED file with regions to quantify.
- bam_file
Input BAM file with reads.
- output_bed
Output BED file with counts for each region.
Named Arguments¶
- --no-overwrite
Whether results should not be overwritten if existing.
Default: True
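The guarantee of "same order and number of lines" can be illustrated with a small sketch. Plain tuples stand in for BED regions and BAM reads here; the function name and data are hypothetical, and this is not the recipe's implementation, which is not described on this page.

```python
def count_overlaps(regions, reads):
    """Count reads overlapping each region, emitting one row per input region.

    Intervals are (chrom, start, end) tuples with half-open coordinates, as in BED.
    """
    counts = []
    for chrom, start, end in regions:
        n = sum(
            1
            for r_chrom, r_start, r_end in reads
            if r_chrom == chrom and r_start < end and r_end > start
        )
        # Regions with zero reads are still written, so line counts match the input.
        counts.append((chrom, start, end, n))
    return counts


# Deliberately unsorted regions: the output keeps the input order.
regions = [("chr2", 100, 200), ("chr1", 0, 50), ("chr1", 500, 600)]
reads = [("chr1", 10, 60), ("chr1", 40, 90), ("chr2", 199, 249), ("chr2", 200, 250)]
result = count_overlaps(regions, reads)
```

Keeping zero-count regions and the original order means the output can be joined to the input BED file line by line.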
ngs_toolkit.recipes.deseq2¶
Perform differential expression using DESeq2 by comparing sample groups using a formula.
usage: python -m ngs_toolkit.recipes.deseq2 [-h]
[--output_prefix OUTPUT_PREFIX]
[--formula FORMULA]
[--alpha ALPHA] [-d] [--overwrite]
[--no-save-inputs]
work_dir
Positional Arguments¶
- work_dir
Working directory. Should contain required files for DESeq2.
Named Arguments¶
- --output_prefix
Prefix for output files.
Default: “differential_analysis”
- --formula
R-style formula for differential expression.
Default: “~ sample_group”
- --alpha
Significance level to call differential expression. All results will be output anyway.
Default: 0.05
- -d, --dry-run
Don’t actually do anything.
Default: False
- --overwrite
Whether to overwrite any existing directory or file.
Default: False
- --no-save-inputs
Don’t write inputs to disk.
Default: True
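A "Default: True" listed for a negative flag such as --no-save-inputs can look contradictory. One plausible reading, assuming (this page does not confirm it) that such flags are declared with argparse's store_false action, is that the default is the stored value when the flag is absent. The parser below is a stand-alone sketch, not the recipe's parser.

```python
import argparse

parser = argparse.ArgumentParser(prog="sketch")
# store_false: the stored value starts as True and becomes False when the flag is given.
parser.add_argument("--no-save-inputs", action="store_false")
# store_true: the stored value starts as False and becomes True when the flag is given.
parser.add_argument("-d", "--dry-run", action="store_true")

defaults = parser.parse_args([])
flagged = parser.parse_args(["--no-save-inputs", "--dry-run"])

print(defaults.no_save_inputs, defaults.dry_run)
print(flagged.no_save_inputs, flagged.dry_run)
```

Under this reading, passing --no-save-inputs flips the stored value to False, which is what disables saving.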
ngs_toolkit.recipes.enrichr¶
A helper script to run enrichment analysis using the Enrichr API on a gene set.
usage: python -m ngs_toolkit.recipes.enrichr [-h] [-a MAX_ATTEMPTS]
[--no-overwrite]
input_file output_file
Positional Arguments¶
- input_file
Input file with gene names.
- output_file
Output CSV file with results.
Named Arguments¶
- -a, --max-attempts
Maximum attempts to retry the API before giving up.
Default: 5
- --no-overwrite
Whether results should not be overwritten if existing.
Default: True
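The behaviour of --max-attempts can be sketched as a bounded retry loop. This is a generic illustration with a stand-in request function; it does not call the Enrichr API, and the function names and the choice of ConnectionError as the retried error are assumptions.

```python
def call_with_retries(request, max_attempts=5):
    """Call `request` until it succeeds or `max_attempts` attempts have failed.

    Returns the result together with the number of attempts used.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return request(), attempt
        except ConnectionError as error:
            last_error = error
    raise RuntimeError(f"gave up after {max_attempts} attempts") from last_error


# Stand-in for an API call: fails twice, then succeeds.
calls = {"n": 0}


def flaky_request():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


result, attempts_used = call_with_retries(flaky_request, max_attempts=5)
```

A real client would usually also wait between attempts; the pause is left out here to keep the sketch short.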
ngs_toolkit.recipes.generate_project¶
A helper script to generate synthetic data for a project in PEP format.
usage: python -m ngs_toolkit.recipes.generate_project [-h]
[--output-dir OUTPUT_DIR]
[--project-name PROJECT_NAME]
[--organism ORGANISM]
[--genome-assembly GENOME_ASSEMBLY]
[--data-type DATA_TYPE]
[--n-factors N_FACTORS]
[--only-metadata ONLY_METADATA]
[--sample-input-files SAMPLE_INPUT_FILES]
[--debug]
Named Arguments¶
- --output-dir
- --project-name
Default: “test_project”
- --organism
Default: “human”
- --genome-assembly
Default: “hg38”
- --data-type
Default: “ATAC-seq”
- --n-factors
Default: 1
- --only-metadata
Default: False
- --sample-input-files
Default: False
- --debug
Default: False
ngs_toolkit.recipes.lola¶
A helper script to run Location Overlap Analysis (LOLA) of a single region set in various sets of region-based annotations.
usage: python -m ngs_toolkit.recipes.lola [-h] [--no-overwrite] [-c CPUS]
bed_file universe_file output_folder
genome
Positional Arguments¶
- bed_file
BED file with query set regions.
- universe_file
BED file with universe where the query set came from.
- output_folder
Output directory for produced files.
- genome
Genome assembly of the region set.
Named Arguments¶
- --no-overwrite
Don’t overwrite existing output files.
Default: True
- -c, --cpus
Number of CPUs/threads to use for analysis.
ngs_toolkit.recipes.merge_signal¶
Merge signal from various ATAC-seq or ChIP-seq samples given a set of attributes to group samples by.
It produces merged BAM and bigWig files for all signal in the samples and, if the data is paired-end, can also produce them separately for nucleosomal and nucleosome-free signal based on the fragment length distribution. The signal may optionally be normalized for each group. Work can also be parallelized as jobs.
usage: python -m ngs_toolkit.recipes.merge_signal [-h] [-a ATTRIBUTES] [-q]
[-j] [--cpus CPUS]
[--normalize] [--nucleosome]
[--overwrite]
[-o OUTPUT_DIR] [-d]
config_file
Positional Arguments¶
- config_file
YAML project configuration file.
Named Arguments¶
- -a, --attributes
Attributes to merge samples by. A comma-delimited string with no spaces. By default, the values of group_attributes in the project configuration are used.
- -q, --pass-qc
Whether only samples with a ‘pass_qc’ value of ‘1’ in the annotation sheet should be used.
Default: False
- -j, --as-jobs
Whether jobs should be created for each sample, or it should run in serial mode.
Default: False
- --cpus
CPUs/threads to use per job if --as-jobs is on.
Default: 8
- --normalize
Whether tracks should be normalized to total sequenced depth.
Default: False
- --nucleosome
Whether to produce nucleosome/nucleosome-free signal files.
Default: False
- --overwrite
Whether to overwrite existing files.
Default: False
- -o, --output-dir
Directory for output files. Default is ‘merged’ under the project root directory.
Default: “merged”
- -d, --dry-run
Whether to do everything except running commands.
Default: False
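How a comma-delimited --attributes string partitions samples can be sketched as follows. The sample records, attribute names, function name, and the underscore-joined group names are all hypothetical; the recipe's own naming of merged groups is not described on this page.

```python
from collections import defaultdict


def group_samples(samples, attributes):
    """Group sample names by the values of a comma-delimited attribute string."""
    keys = attributes.split(",")  # e.g. "cell_type,treatment", with no spaces
    groups = defaultdict(list)
    for sample in samples:
        group_name = "_".join(str(sample[key]) for key in keys)
        groups[group_name].append(sample["sample_name"])
    return dict(groups)


# Made-up sample annotation purely for illustration.
samples = [
    {"sample_name": "s1", "cell_type": "Tcell", "treatment": "drug"},
    {"sample_name": "s2", "cell_type": "Tcell", "treatment": "drug"},
    {"sample_name": "s3", "cell_type": "Tcell", "treatment": "control"},
    {"sample_name": "s4", "cell_type": "Bcell", "treatment": "control"},
]

groups = group_samples(samples, "cell_type,treatment")
```

Each resulting group corresponds to one set of samples whose signal would be merged together.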
ngs_toolkit.recipes.region_enrichment¶
A helper script to run enrichment analysis of a single region set against a set of region-based annotations.
usage: python -m ngs_toolkit.recipes.region_enrichment [-h]
[--output-file OUTPUT_FILE]
[--overwrite]
bed_file pep
Positional Arguments¶
- bed_file
BED file with regions.
- pep
The analysis’ PEP config file.
Named Arguments¶
- --output-file
Output file.
Default: “region_type_enrichment.csv”
- --overwrite
Whether to overwrite any existing directory or file.
Default: False
ngs_toolkit.recipes.region_set_frip¶
Compute fraction of reads in peaks (FRiP) based on a consensus set of regions derived from several samples.
usage: python -m ngs_toolkit.recipes.region_set_frip [-h] [-d DATA_TYPE]
[-n NAME] [-r REGION_SET]
[-q] [-j] [-o OUTPUT_DIR]
[-s]
config_file
Positional Arguments¶
- config_file
YAML project configuration file.
Named Arguments¶
- -d, --data-type
Data types to perform analysis on. Will be done separately for each.
- -n, --analysis-name
Name of analysis. Will be the prefix of output_files. By default it will be the name of the Project given in the YAML configuration.
- -r, --region-set
BED file with region set derived from several samples or Oracle region set. If unset, the sites attribute of an existing analysis object is used if available; otherwise a region set is created from the peaks of all samples.
- -q, --pass-qc
Whether only samples with a ‘pass_qc’ value of ‘1’ in the annotation sheet should be used.
Default: False
- -j, --as-jobs
Whether jobs should be created for each sample, or it should run in serial mode.
Default: False
- -o, --results-output
Directory for analysis output files. Default is ‘results’ under the project root directory.
Default: “results”
- -s, --strict
Whether to throw an error if files cannot be created.
Default: False
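For reference, FRiP is the number of reads falling inside the region set divided by the total number of reads for a sample. The sketch below computes it from read counts that are made up for illustration; the function name and the way the counts are obtained are assumptions, not the recipe's code.

```python
def frip(reads_in_peaks, total_reads):
    """Fraction of reads in peaks: reads inside the region set over all reads."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return reads_in_peaks / total_reads


# Hypothetical per-sample counts: (reads in consensus regions, total reads).
counts = {
    "sample_a": (2_500_000, 10_000_000),
    "sample_b": (900_000, 6_000_000),
}

scores = {name: frip(in_peaks, total) for name, (in_peaks, total) in counts.items()}
```

Using one consensus region set for all samples makes the resulting fractions directly comparable between samples.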