Difference between revisions of "Bioinformatics Software"
From CsWiki
Revision as of 15:17, 10 November 2020
This is the list of software that is installed on the cluster.
To use the software, run the module command with the name listed in the Module column:
module load <name_of_module>
When you see None in the Module column, the software is already in the path and no action needs to be taken.
Container means that the software is embedded in a Singularity container. The singularity module should be loaded first to work with it.
module load singularity
Here is a wiki on working with containers: https://wiki.cs.huji.ac.il/wiki/Containers
To see a list of all software that can be used by loading modules, use this command:
module avail
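The steps above can be sketched as a small shell helper. This is only an illustration: `load_and_check` is a hypothetical function name, and `samtools/1.10` is used because it appears in the Module column below. It assumes the `module` command is provided by the cluster's login environment.

```shell
#!/usr/bin/env bash
# Sketch: load a module from the Module column, then confirm the tool is on the PATH.
# load_and_check is a hypothetical helper, not something installed on the cluster.

load_and_check() {
    local module_name="$1" tool="$2"
    # `module` is normally a shell function set up by the cluster environment.
    if ! command -v module >/dev/null 2>&1; then
        echo "module command not available in this shell"
        return 1
    fi
    module load "$module_name" || return 1
    # Print where the tool was found; fails if it is still not on the PATH.
    command -v "$tool"
}

load_and_check samtools/1.10 samtools || echo "samtools could not be loaded here"
```

For entries marked None, the load step is unnecessary and `command -v <tool>` alone should find the program.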
Name | Module | Description |
---|---|---|
Annovar | None | ANNOVAR is an efficient software tool that uses up-to-date information to functionally annotate genetic variants detected from diverse genomes (including human genome hg18, hg19, hg38, as well as mouse, worm, fly, yeast and many others) |
aws cli | None | AWS command line version 2 |
Bamtools | bio | BamTools provides both a programmer's API and an end user's toolkit for handling BAM files |
Bcl2fastq | bcl2fastq/2.20.0 | bcl2fastq Conversion Software both demultiplexes data and converts BCL files generated by Illumina sequencing systems to standard FASTQ file formats for downstream analysis |
BCFtools | samtools/1.9 samtools/1.10 | A set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF |
bedgraphtobigwig | bedgraphtobw/377 | Convert a bedGraph file to bigWig format |
Bedops | bio | BEDOPS is an open-source command-line toolkit that performs highly efficient and scalable Boolean and other set operations, statistical calculations, archiving, conversion and other management of genomic data of arbitrary scale. Tasks can be easily split by chromosome for distributing whole-genome analyses across a computational cluster. |
Bedtools | bio | A swiss-army knife of tools for a wide-range of genomics analysis tasks. The most widely-used tools enable genome arithmetic: that is, set theory on the genome. For example, bedtools allows one to intersect, merge, count, complement, and shuffle genomic intervals from multiple files in widely-used genomic file formats such as BAM, BED, GFF/GTF, VCF |
Biscuit | bio | A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data |
Bismark | bio | A program to map bisulfite treated sequencing reads to a genome of interest and perform methylation calls in a single step. The output can be easily imported into a genome viewer, such as SeqMonk, and enables a researcher to analyze the methylation levels of their samples straight away |
Blast | blast | BLAST+ is a new suite of BLAST tools that utilizes the NCBI C++ Toolkit |
Boost & Boost-cpp | bio | Boost provides free peer-reviewed portable C++ source libraries |
Bsmap | bio | BSMAP is a short reads mapping software for bisulfite sequencing reads |
bsmapz | bsmapz | An optimized fork of BSMAP. BSMAP is a short reads mapping software for bisulfite sequencing reads. Bisulfite treatment converts unmethylated Cytosines into Uracils (sequenced as Thymine) and leaves methylated Cytosines unchanged, hence provides a way to study DNA cytosine methylation at single-nucleotide resolution. BSMAP aligns the Ts in the reads to both Cs and Ts in the reference |
Busco | tib | BUSCO provides measures for quantitative assessment of genome assembly, gene set, and transcriptome completeness based on evolutionarily informed expectations of gene content from near-universal single-copy orthologs selected from OrthoDB |
BroadPeak | None | BroadPeak calling algorithm for diffuse ChIP-seq datasets |
BWA | elkind | The BWA read mapper |
Cellranger | cellranger | Cell Ranger is a set of analysis pipelines that process Chromium single-cell RNA-seq output to align reads, generate feature-barcode matrices, and perform clustering and gene expression analysis. Cell Ranger includes four pipelines relevant to single-cell gene expression experiments |
Chimerax | chimerax | UCSF ChimeraX (or simply ChimeraX) is the next-generation molecular visualization program |
Circos | bio | Circos is a software package for visualizing data and information. It visualizes data in a circular layout — this makes Circos ideal for exploring relationships between objects or positions |
CellPhoneDB | cellphonedb | CellPhoneDB is a publicly available repository of curated receptors, ligands, and their interactions |
CRISPRCasFinder | container | Enables the easy detection of CRISPRs and cas genes in user-submitted sequence data |
CTAT | container | The Trinity Cancer Transcriptome Analysis Toolkit (CTAT) aims to provide tools for leveraging RNA-Seq to gain insights into the biology of cancer transcriptomes |
Cutadapt | cutadapt/1.18 (Python 2) cutadapt/2.10 (Python 3) | Finds and removes adapter sequences, primers, poly-A tails and other types of unwanted sequence from your high-throughput sequencing reads |
Darts | darts | Deep-learning Augmented RNA-seq analysis of Transcript Splicing |
Deeptools | deeptools | A set of user-friendly tools for normalization and visualization of deep-sequencing data |
Dxpy | dxpy | Command-line client, tools for building and debugging apps, and other utilities for working with DNA data on the DNAnexus platform |
Eigensoft | eigensoft | The EIGENSTRAT method uses principal components analysis to explicitly model ancestry differences between cases and controls along continuous axes of variation; the resulting correction is specific to a candidate marker’s variation in frequency across ancestral populations, minimizing spurious associations while maximizing power to detect true associations. The EIGENSOFT package has a built-in plotting script and supports multiple file formats and quantitative phenotypes. |
Fastqc | elkind | A quality control tool for high throughput sequence data |
Finestruc | bio | fine-structure is a fast and powerful algorithm for identifying population structure using dense sequencing data |
Gatk | bio | The GATK is the industry standard for identifying SNPs and indels in germline DNA and RNAseq data |
Gatk4 | gatk4 | The GATK is the industry standard for identifying SNPs and indels in germline DNA and RNAseq data |
GRAND-SLAM | slam | Globally refined analysis of newly transcribed RNA and decay rates using SLAM-seq |
Gsl | bio | The GNU Scientific Library (GSL) is a numerical library for C and C++ programmers |
Homer | homer | Software for motif discovery and next-generation sequencing analysis |
HTSeq | htseq | A Python library to facilitate processing and analysis of data from high-throughput sequencing (HTS) experiments |
IBAMR | None | IBAMR is a distributed-memory parallel implementation of the immersed boundary (IB) method with support for Cartesian grid adaptive mesh refinement (AMR) |
IDR | idr | The IDR (Irreproducible Discovery Rate) framework is a unified approach to measure the reproducibility of findings identified from replicate experiments and provide highly stable thresholds based on reproducibility |
Igblast | blast | A tool for analyzing immunoglobulin (IG) and T cell receptor (TR) sequences |
Igv | bio | Integrative Genomics Viewer. Fast, efficient, scalable visualization tool for genomics data and annotations |
Igvtools | bio | Command-line tools for IGV |
IMP | imp | IMP provides an open source C++ and Python toolbox for solving complex modeling problems, and a number of applications for tackling some common problems in a user-friendly way. IMP can also be used from the Chimera molecular modeling system |
Iqtree | tib | Efficient phylogenomic software by maximum likelihood |
Jags & Rjags | jags | JAGS is Just Another Gibbs Sampler. Rjags is an interface to the JAGS MCMC library |
LeafCutter | r4 | Leafcutter quantifies RNA splicing variation using short-read RNA-seq data |
Macs2 | macs2 | Model-Based Analysis for ChIP-Seq data |
Mafft | bio | Multiple alignment program for amino acid or nucleotide sequences |
Magicblast | blast | NCBI BLAST next generation read mapper |
Minimap2 | elkind | A versatile pairwise aligner for genomic and spliced nucleotide sequences |
miRExpress | None | Analyzing high-throughput sequencing data for profiling microRNA expression |
Mega2 | None | “Manipulation Environment for Genetic Analyses” A data-handling program for facilitating genetic linkage and association analyses |
Metaseq | metaseq | Integrative analysis of high-throughput sequencing data |
Mochiview | mochiview | MochiView is Java software that integrates browsing of genomic sequences, features, and data with DNA motif visualization and analysis |
Meme | meme | Motif based sequence Analysis tools |
Mirdeep2 | bio | A completely overhauled tool which discovers microRNA genes by analyzing sequenced RNAs |
Mutscan | mutscan | Detect and visualize target mutations by scanning FastQ files directly |
NucleoATAC | nucleoatac | Python package for calling nucleosomes using ATAC-Seq data. Also includes general scripts for working with paired-end ATAC-Seq data |
Peakachu | peakachu | Peak calling tool for CLIP-seq data |
Phase | None | Reconstructing haplotypes from population data |
Phast | phast | Phylogenetic Analysis with Space/Time models |
Picard | picard | Java tools for working with NGS data in the BAM format |
Plink1.9 & Plink2 | None | Whole-genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner |
Pysam | samtools/1.9 samtools/1.10 | An interface for reading and writing SAM files |
Qiime2 | qiime | A next-generation microbiome bioinformatics platform |
Qualimap | qualimap | Quality control of alignment sequencing data and its derivatives like feature counts |
R ver 4 | r4 | R is a free software environment for statistical computing and graphics |
r-ichorcna | bio | Estimating tumor fraction in cell-free DNA from ultra-low-pass whole-genome sequencing |
Randfold | bio | Minimum free energy of folding randomization test software |
Rclone | rclone | A command line program to manage files on cloud storage |
Regtools | None | A set of tools that integrate DNA-seq and RNA-seq data to help interpret mutations in a regulatory and splicing context |
Relion | relion | RELION (for REgularised LIkelihood OptimisatioN, pronounce rely-on) is a stand-alone computer program that employs an empirical Bayesian approach to refinement of (multiple) 3D reconstructions or 2D class averages in electron cryo-microscopy (cryo-EM) |
Rmats | rmats | MATS is a computational tool to detect differential alternative splicing events from RNA-Seq data |
rmats2sashimiplot | rmats | A tool that generates sashimi plots from rMATS outputs |
rsem | rsem | RSEM is a software package for estimating gene and isoform expression levels from RNA-Seq data |
Salmon | bio | Highly-accurate & wicked fast transcript-level quantification from RNA-seq reads using selective alignment |
SAMtools | samtools/1.9 samtools/1.10 | Tools for dealing with SAM, BAM and CRAM files. Version 1.9 is for Python 2.7 and version 1.10 is for Python 3.7 |
Seqmonk | seqmonk | Visualize and analyze high throughput mapped sequence data |
Seqtk | elkind | A fast and lightweight tool for processing sequences in the FASTA or FASTQ format |
Seqkit | elkind | Cross-platform and ultrafast toolkit for FASTA/Q file manipulation |
Slamdunk | slam | SlamDunk is a novel, fully automated software tool for automated, robust, scalable and reproducible SLAMseq data analysis |
Snakemake | snakemake | The Snakemake workflow management system is a tool to create reproducible and scalable data analyses |
SortMeRNA | None | A program tool for filtering, mapping and OTU-picking NGS reads in metatranscriptomic and metagenomic data |
Spades | None | SPAdes – St. Petersburg genome assembler – is an assembly toolkit containing various assembly pipelines |
Statsmodel | bio | A Python module that provides classes and functions for the estimation of many different statistical models, as well as for conducting statistical tests, and statistical data exploration |
SRprism | blast | Single Read Paired Read Indel Substitution Minimizer |
Sratoolkit | sratoolkit | A set of compiled binaries and corresponding source code for tools that download, manipulate and validate next-generation sequencing data stored in the NCBI SRA archive |
STAR | None | Spliced Transcripts Alignment to a Reference |
STAR-Fusion | container | STAR-Fusion uses the STAR aligner to identify candidate fusion transcripts supported by Illumina reads |
Subread | bio | The Subread package comprises a suite of software programs for processing next-gen sequencing read data |
Tetoolkit | tetoolkit | Tools for estimating differential enrichment of Transposable Elements and other highly repetitive regions |
TEtranscripts | tetranscripts | A package for including transposable elements in differential enrichment analysis of sequencing datasets |
Tobias | tobias | TOBIAS - Transcription factor Occupancy prediction By Investigation of ATAC-seq Signal |
Uropa | tobias | UROPA (Universal RObust Peak Annotator) is a command-line based tool, intended for genomic region annotation from e.g. peak calling. It detects the most appropriate annotation by taking parameters such as feature type, anchor, direction, and strand into account. Furthermore, it allows filtering for GTF attribute values, e.g. protein_coding |
TopHat | tophat | A fast splice junction mapper for RNA-Seq reads. It aligns RNA-Seq reads to mammalian-sized genomes using the ultra high-throughput short read aligner Bowtie and then analyzes the mapping results to identify splice junctions between exons |
trim_galore | trim_galore/0.3.7 trim_galore/0.6.5 | Trim Galore! is a wrapper script to automate quality and adapter trimming as well as quality control |
Trimmomatic | elkind | A flexible read trimming tool for Illumina NGS data |
Trinity | tib | Trinity assembles transcript sequences from Illumina RNA-Seq data |
Vcftools | bio | A set of tools written in Perl and C++ for working with VCF files. This package only contains the C++ libraries whereas the package perl-vcftools-vcf contains the perl libraries |
Viennarna | bio | Vienna RNA package -- RNA secondary structure prediction and comparison |
Weblogo | bio | Web-based application designed to make the generation of sequence logos as easy and painless as possible |