Bioinformatics Software

From CsWiki
Revision as of 15:48, 30 June 2020

Python packages, mainly from the Bioconda channel, which specializes in bioinformatics software.

To use these packages, run the following command:

module load bio

This will load these packages: bamtools, bedtools, bismark, circos, bedops, bsmap, gatk, r-ichorcna, igblast, igv, igvtools, mirdeep2, viennarna, plink, randfold, salmon, weblogo, biscuit, finestructure, boost-cpp, boost, gal, Statsmodels, Subread
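
As a quick sanity check, the sketch below loads the module and then reports which tools are visible on PATH. It assumes the executable names (bamtools, bedtools, salmon) match the package names listed above; check_tools is a hypothetical helper and not part of the module system.

```shell
# Hypothetical helper: report whether each named executable is on PATH.
# Returns the number of missing tools (0 means all were found).
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool"
            missing=$((missing + 1))
        fi
    done
    return "$missing"
}

# Load the bio module only where the `module` command exists.
if type module >/dev/null 2>&1; then
    module load bio
fi

# Executable names are assumed to match the package names.
check_tools bamtools bedtools salmon || echo "some tools are not on PATH"
```

If a tool is reported as missing after loading, it may be installed under a different executable name or may need its own module.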

The rest of the packages can be loaded with:

module load <name_of_package>
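
Several modules are often needed in one session or job script. The sketch below loads a list of modules and stops at the first failure. The module names samtools and star are assumptions based on the table entries (check the exact spelling with module avail), and load_modules is a hypothetical helper.

```shell
# Hypothetical helper: load each named module, stopping at the first failure.
load_modules() {
    if ! type module >/dev/null 2>&1; then
        echo "module command not available in this shell" >&2
        return 1
    fi
    for name in "$@"; do
        module load "$name" || { echo "could not load: $name" >&2; return 1; }
    done
}

# Module names below are assumed; adjust them to what is installed.
load_modules bio samtools star || echo "modules not loaded"
```

Stopping at the first failure keeps a later analysis step from running with an incomplete environment.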

The list of installed modules can be seen with:

module avail
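
The full list can be long, so it helps to filter it. The sketch below assumes the module system writes the module avail listing to stderr, which is the usual behaviour of Environment Modules and Lmod; find_module is a hypothetical helper.

```shell
# Hypothetical helper: list available modules whose name contains a fragment.
# `module avail` output is assumed to go to stderr, so it is merged into
# stdout before filtering (case-insensitive match).
find_module() {
    module avail 2>&1 | grep -i -- "$1"
}

# Only run the search where the module system is present.
if type module >/dev/null 2>&1; then
    find_module sam || echo "no module matches 'sam'"
fi
```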
{|
! Name !! Description
|-
| Bamtools || BamTools provides both a programmer's API and an end user's toolkit for handling BAM files
|-
| BCFtools || A set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF
|-
| bedgraphtobigwig || Convert a bedGraph file to bigWig format
|-
| Bedops || BEDOPS is an open-source command-line toolkit that performs highly efficient and scalable Boolean and other set operations, statistical calculations, archiving, conversion and other management of genomic data of arbitrary scale. Tasks can be easily split by chromosome for distributing whole-genome analyses across a computational cluster
|-
| Bedtools || A swiss-army knife of tools for a wide range of genomics analysis tasks. The most widely used tools enable genome arithmetic: that is, set theory on the genome. For example, bedtools allows one to intersect, merge, count, complement, and shuffle genomic intervals from multiple files in widely used genomic file formats such as BAM, BED, GFF/GTF, VCF
|-
| Biscuit || A utility for analyzing sodium bisulfite conversion-based DNA methylation/modification data
|-
| Bismark || A program to map bisulfite treated sequencing reads to a genome of interest and perform methylation calls in a single step. The output can be easily imported into a genome viewer, such as SeqMonk, and enables a researcher to analyze the methylation levels of their samples straight away
|-
| Boost & Boost-cpp || Boost provides free peer-reviewed portable C++ source libraries
|-
| [https://code.google.com/archive/p/bsmap/ Bsmap] || BSMAP is a short reads mapping software for bisulfite sequencing reads. Bisulfite treatment converts unmethylated Cytosines into Uracils (sequenced as Thymine) and leaves methylated Cytosines unchanged, hence provides a way to study DNA cytosine methylation at single-nucleotide resolution. BSMAP aligns the Ts in the reads to both Cs and Ts in the reference
|-
| [http://busco.ezlab.org/ Busco] || BUSCO provides measures for quantitative assessment of genome assembly, gene set, and transcriptome completeness based on evolutionarily informed expectations of gene content from near-universal single-copy orthologs selected from OrthoDB
|-
| [https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/what-is-cell-ranger Cellranger] || Cell Ranger is a set of analysis pipelines that process Chromium single-cell RNA-seq output to align reads, generate feature-barcode matrices, and perform clustering and gene expression analysis. Cell Ranger includes four pipelines relevant to single-cell gene expression experiments
|-
| Circos || Circos is a software package for visualizing data and information. It visualizes data in a circular layout, which makes Circos ideal for exploring relationships between objects or positions
|-
| Cutadapt || Finds and removes adapter sequences, primers, poly-A tails and other types of unwanted sequence from your high-throughput sequencing reads
|-
| Darts || Deep-learning Augmented RNA-seq analysis of Transcript Splicing
|-
| Eigensoft || The EIGENSTRAT method uses principal components analysis to explicitly model ancestry differences between cases and controls along continuous axes of variation; the resulting correction is specific to a candidate marker's variation in frequency across ancestral populations, minimizing spurious associations while maximizing power to detect true associations. The EIGENSOFT package has a built-in plotting script and supports multiple file formats and quantitative phenotypes
|-
| Emase || Expectation-Maximization algorithm for Allele-Specific Expression
|-
| Finestruc || fine-structure is a fast and powerful algorithm for identifying population structure using dense sequencing data
|-
| Gatk || The GATK is the industry standard for identifying SNPs and indels in germline DNA and RNAseq data
|-
| Gatk4 || The GATK is the industry standard for identifying SNPs and indels in germline DNA and RNAseq data
|-
| Gsl || The GNU Scientific Library (GSL) is a numerical library for C and C++ programmers
|-
| Homer || Software for motif discovery and next-generation sequencing analysis
|-
| HTSeq || A Python library to facilitate processing and analysis of data from high-throughput sequencing (HTS) experiments
|-
| Igblast || A tool for analyzing immunoglobulin (IG) and T cell receptor (TR) sequences
|-
| Igv || Integrative Genomics Viewer. Fast, efficient, scalable visualization tool for genomics data and annotations
|-
| [http://www.broadinstitute.org/igv/ Igvtools] || Command-line tools for IGV
|-
| [http://www.iqtree.org/ Iqtree] || Efficient phylogenomic software by maximum likelihood
|-
| [https://github.com/taoliu/MACS/ Macs2] || Model-Based Analysis for ChIP-Seq data
|-
| Mafft || Multiple alignment program for amino acid or nucleotide sequences
|-
| Meme || Motif-based sequence analysis tools
|-
| Mirdeep2 || A completely overhauled tool which discovers microRNA genes by analyzing sequenced RNAs
|-
| Peakachu || Peak calling tool for CLIP-seq data
|-
| r-ichorcna || Estimating tumor fraction in cell-free DNA from ultra-low-pass whole-genome sequencing
|-
| Randfold || Minimum free energy of folding randomization test software
|-
| Rmats || MATS is a computational tool to detect differential alternative splicing events from RNA-Seq data
|-
| rmats2sashimiplot || A tool that generates sashimi plots from rMATS outputs
|-
| Salmon || Highly-accurate & wicked fast transcript-level quantification from RNA-seq reads using selective alignment
|-
| SAMtools || Tools for dealing with SAM, BAM and CRAM files
|-
| Statsmodels || A Python module that provides classes and functions for the estimation of many different statistical models, as well as for conducting statistical tests, and statistical data exploration
|-
| Subread || The Subread package comprises a suite of software programs for processing next-gen sequencing read data
|-
| [http://hammelllab.labsites.cshl.edu/software#TEToolkit Tetoolkit] || Tools for estimating differential enrichment of Transposable Elements and other highly repetitive regions
|-
| [http://hammelllab.labsites.cshl.edu/software/#TEtranscripts TEtranscripts] || A package for including transposable elements in differential enrichment analysis of sequencing datasets
|-
| [https://github.com/loosolab/TOBIAS/ Tobias] || TOBIAS: Transcription factor Occupancy prediction By Investigation of ATAC-seq Signal
|-
| Uropa || UROPA (Universal RObust Peak Annotator) is a command-line based tool, intended for genomic region annotation from e.g. peak calling. It detects the most appropriate annotation by taking parameters such as feature type, anchor, direction, and strand into account. Furthermore, it allows filtering for GTF attribute values, e.g. protein_coding
|-
| [https://ccb.jhu.edu/software/tophat/index.shtml TopHat] || A fast splice junction mapper for RNA-Seq reads. It aligns RNA-Seq reads to mammalian-sized genomes using the ultra high-throughput short read aligner Bowtie and then analyzes the mapping results to identify splice junctions between exons
|-
| [https://github.com/trinityrnaseq/trinityrnaseq/ Trinity] || Trinity assembles transcript sequences from Illumina RNA-Seq data
|-
| [http://www.tbi.univie.ac.at/RNA/ Viennarna] || Vienna RNA package: RNA secondary structure prediction and comparison
|-
| Weblogo || Web-based application designed to make the generation of sequence logos as easy and painless as possible
|}


More software


{|
! Name !! Description
|-
| Annovar || ANNOVAR is an efficient software tool to utilize up-to-date information to functionally annotate genetic variants detected from diverse genomes (including human genome hg18, hg19, hg38, as well as mouse, worm, fly, yeast and many others)
|-
| Bcl2fastq || bcl2fastq Conversion Software both demultiplexes data and converts BCL files generated by Illumina sequencing systems to standard FASTQ file formats for downstream analysis
|-
| BroadPeak || BroadPeak calling algorithm for diffuse ChIP-seq datasets
|-
| CellPhoneDB || CellPhoneDB is a publicly available repository of curated receptors, ligands, and their interactions
|-
| Chimerax || UCSF ChimeraX (or simply ChimeraX) is the next-generation molecular visualization program
|-
| IBAMR || IBAMR is a distributed-memory parallel implementation of the immersed boundary (IB) method with support for Cartesian grid adaptive mesh refinement (AMR)
|-
| Mega2 || "Manipulation Environment for Genetic Analyses": a data-handling program for facilitating genetic linkage and association analyses
|-
| miRExpress || Analyzing high-throughput sequencing data for profiling microRNA expression
|-
| Mochiview || MochiView is Java software that integrates browsing of genomic sequences, features, and data with DNA motif visualization and analysis
|-
| Plink2 || Whole-genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner
|-
| Relion || RELION (for REgularised LIkelihood OptimisatioN, pronounced rely-on) is a stand-alone computer program that employs an empirical Bayesian approach to refinement of (multiple) 3D reconstructions or 2D class averages in electron cryo-microscopy (cryo-EM)
|-
| Sratoolkit || A set of compiled binaries and corresponding source code for tools that download, manipulate and validate next-generation sequencing data stored in the NCBI SRA archive
|-
| STAR || Spliced Transcripts Alignment to a Reference
|-
| trim_galore || Trim Galore! is a wrapper script to automate quality and adapter trimming as well as quality control
|}