nf-rnaSeqCount: A Nextflow pipeline for obtaining raw read counts from RNA-seq data.

Mpangase, Phelelani T; Frost, Jacqueline; Tikly, Mohammed; Ramsay, Michèle; Hazelhurst, Scott.

S Afr Comput J ; 33(2)2021 Dec.

Article in English | MEDLINE | ID: mdl-35574063

ABSTRACT

The rate of raw sequence production through Next-Generation Sequencing (NGS) has been growing exponentially due to improved technology and reduced costs. This has enabled researchers to answer many biological questions through "multi-omics" data analyses. Even though such data promise new insights into how biological systems function and into disease mechanisms, computational analyses performed on such large datasets come with their own challenges and potential pitfalls. The aim of this study was to develop a robust, portable and reproducible bioinformatic pipeline for the automation of RNA sequencing (RNA-seq) data analyses. Using Nextflow as a workflow management system and Singularity for application containerisation, the nf-rnaSeqCount pipeline was developed for mapping raw RNA-seq reads to a reference genome and quantifying abundance of identified genomic features for differential gene expression analyses. The pipeline provides a quick and efficient way to obtain a matrix of read counts that can be used with tools such as DESeq2 and edgeR for differential expression analysis. Robust and flexible bioinformatic and computational pipelines for RNA-seq data analysis, from QC to sequence alignment and comparative analyses, will reduce analysis time and increase the accuracy and reproducibility of findings to promote transcriptome research.
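To illustrate what a "matrix of read counts" looks like, the following minimal Python sketch merges per-sample gene counts into a single genes-by-samples table. It is not taken from nf-rnaSeqCount (which is a Nextflow pipeline); the function names `build_count_matrix` and `to_tsv`, the input layout, and the choice to fill missing genes with 0 are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def build_count_matrix(
    sample_counts: Dict[str, Dict[str, int]]
) -> Tuple[List[str], List[str], List[List[int]]]:
    """Merge per-sample {gene: count} tables into a genes x samples matrix.

    Hypothetical helper: genes absent from a sample are assumed to have 0 reads.
    """
    samples = sorted(sample_counts)
    # Union of all gene identifiers seen in any sample, in a stable order.
    genes = sorted({g for counts in sample_counts.values() for g in counts})
    matrix = [[sample_counts[s].get(g, 0) for s in samples] for g in genes]
    return genes, samples, matrix


def to_tsv(genes: List[str], samples: List[str], matrix: List[List[int]]) -> str:
    """Render the matrix as tab-separated text, one row per gene."""
    lines = ["gene\t" + "\t".join(samples)]
    for gene, row in zip(genes, matrix):
        lines.append(gene + "\t" + "\t".join(str(c) for c in row))
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up counts for two samples, purely for demonstration.
    counts = {
        "sample1": {"geneA": 5, "geneB": 2},
        "sample2": {"geneA": 7},
    }
    print(to_tsv(*build_count_matrix(counts)))
```

A table in this shape (genes as rows, samples as columns, raw integer counts) is the kind of input that differential expression tools such as DESeq2 and edgeR expect.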

Functional characterisation of the transcriptome from leaf tissue of the fluoroacetate-producing plant, Dichapetalum cymosum, in response to mechanical wounding.

Sooklal, Selisha A; Mpangase, Phelelani T; Tomescu, Mihai-Silviu; Aron, Shaun; Hazelhurst, Scott; Archer, Robert H; Rumbold, Karl.

Sci Rep ; 10(1): 20539, 2020 11 25.

Article in English | MEDLINE | ID: mdl-33239700

ABSTRACT

Dichapetalum cymosum produces the toxic fluorinated metabolite, fluoroacetate, presumably as a defence mechanism. Given the rarity of fluorinated metabolites in nature, the biosynthetic origin and function of fluoroacetate have been of particular interest. However, the mechanism for fluorination in D. cymosum has never been elucidated. More importantly, there is a severe lack of knowledge at the genetic level for fluorometabolite-producing plants, impeding research on the subject. Here, we report on the first transcriptome for D. cymosum and investigate the wound response for insights into fluorometabolite production. Mechanical wounding studies were performed and libraries of the unwounded (control) and wounded (30 and 60 min post wounding) plant were sequenced using the Illumina HiSeq platform. A combined reference assembly generated 77,845 transcripts. Using the SwissProt, TrEMBL, GO, eggNOG, KEGG, Pfam, EC and PlantTFDB databases, a 69% annotation rate was achieved. Differential expression analysis revealed the regulation of 364 genes in response to wounding. The wound responses in D. cymosum included key mechanisms relating to signalling cascades, phytohormone regulation, transcription factors and defence-related secondary metabolites. However, the role of fluoroacetate in inducible wound responses remains unclear. Bacterial fluorinases were searched against the D. cymosum transcriptome, but transcripts with homology were not detected, suggesting the presence of a potentially different fluorinating enzyme in plants. Nevertheless, the transcriptome produced in this study significantly increases the genetic resources available for D. cymosum and will assist with future research into fluorometabolite-producing plants.

Subject(s)

Fluoroacetates/metabolism , Magnoliopsida/genetics , Plant Leaves/genetics , Stress, Mechanical , Transcriptome/genetics , Biosynthetic Pathways/genetics , Down-Regulation/genetics , Gene Expression Regulation, Plant , Gene Ontology , Molecular Sequence Annotation , Plant Proteins/genetics , Plant Proteins/metabolism , RNA, Messenger/genetics , RNA, Messenger/metabolism , Up-Regulation/genetics

Developing reproducible bioinformatics analysis workflows for heterogeneous computing environments to support African genomics.

Baichoo, Shakuntala; Souilmi, Yassine; Panji, Sumir; Botha, Gerrit; Meintjes, Ayton; Hazelhurst, Scott; Bendou, Hocine; Beste, Eugene de; Mpangase, Phelelani T; Souiai, Oussema; Alghali, Mustafa; Yi, Long; O'Connor, Brian D; Crusoe, Michael; Armstrong, Don; Aron, Shaun; Joubert, Fourie; Ahmed, Azza E; Mbiyavanga, Mamana; Heusden, Peter van; Magosi, Lerato E; Zermeno, Jennie; Mainzer, Liudmila Sergeevna; Fadlelmola, Faisal M; Jongeneel, C Victor; Mulder, Nicola.

BMC Bioinformatics ; 19(1): 457, 2018 Nov 29.

Article in English | MEDLINE | ID: mdl-30486782

ABSTRACT

BACKGROUND: The Pan-African bioinformatics network, H3ABioNet, comprises 27 research institutions in 17 African countries. H3ABioNet is part of the Human Heredity and Health in Africa program (H3Africa), an African-led research consortium funded by the US National Institutes of Health and the UK Wellcome Trust, aimed at using genomics to study and improve the health of Africans. A key role of H3ABioNet is to support H3Africa projects by building bioinformatics infrastructure such as portable and reproducible bioinformatics workflows for use on heterogeneous African computing environments. Processing and analysis of genomic data is an example of a big data application requiring complex interdependent data analysis workflows. Such bioinformatics workflows take the primary and secondary input data through several computationally-intensive processing steps using different software packages, where some of the outputs form inputs for other steps. Implementing scalable, reproducible, portable and easy-to-use workflows is particularly challenging. RESULTS: H3ABioNet has built four workflows to support (1) the calling of variants from high-throughput sequencing data; (2) the analysis of microbial populations from 16S rDNA sequence data; (3) genotyping and genome-wide association studies; and (4) single nucleotide polymorphism imputation. A week-long hackathon was organized in August 2016 with participants from six African bioinformatics groups, and US and European collaborators. Two of the workflows are built using the Common Workflow Language framework (CWL) and two using Nextflow. All the workflows are containerized for improved portability and reproducibility using Docker, and are publicly available for use by members of the H3Africa consortium and the international research community. CONCLUSION: The H3ABioNet workflows have been implemented with a view to offering ease of use for the end user and high levels of reproducibility and portability, all while following modern state-of-the-art bioinformatics data processing protocols. The H3ABioNet workflows will service the H3Africa consortium projects and are currently in use. All four workflows are also publicly available for research scientists worldwide to use and adapt for their respective needs. The H3ABioNet workflows will help develop bioinformatics capacity and assist genomics research within Africa and serve to increase the scientific output of H3Africa and its Pan-African Bioinformatics Network.

Subject(s)

Computational Biology/methods , Genomics/methods , Africa , Humans , Reproducibility of Results

Organizing and running bioinformatics hackathons within Africa: The H3ABioNet cloud computing experience.

Ahmed, Azza E; Mpangase, Phelelani T; Panji, Sumir; Baichoo, Shakuntala; Souilmi, Yassine; Fadlelmola, Faisal M; Alghali, Mustafa; Aron, Shaun; Bendou, Hocine; De Beste, Eugene; Mbiyavanga, Mamana; Souiai, Oussema; Yi, Long; Zermeno, Jennie; Armstrong, Don; O'Connor, Brian D; Mainzer, Liudmila Sergeevna; Crusoe, Michael R; Meintjes, Ayton; Van Heusden, Peter; Botha, Gerrit; Joubert, Fourie; Jongeneel, C Victor; Hazelhurst, Scott; Mulder, Nicola.

AAS Open Res ; 1: 9, 2018.

Article in English | MEDLINE | ID: mdl-32382696

ABSTRACT

The need for portable and reproducible genomics analysis pipelines is growing globally as well as in Africa, especially with the growth of collaborative projects like the Human Heredity and Health in Africa Consortium (H3Africa). The Pan-African H3Africa Bioinformatics Network (H3ABioNet) recognized the need for portable, reproducible pipelines adapted to heterogeneous compute environments, and for the nurturing of technical expertise in workflow languages and containerization technologies. To address this need, in 2016 H3ABioNet arranged its first Cloud Computing and Reproducible Workflows Hackathon, with the purpose of building key genomics analysis pipelines able to run on heterogeneous computing environments and meeting the needs of H3Africa research projects. This paper describes the preparations for this hackathon and reflects upon the lessons learned about its impact on building the technical and scientific expertise of African researchers. The workflows developed were made publicly available in GitHub repositories and deposited as container images on quay.io.

Whole genome sequence of Oscheius sp. TEL-2014 entomopathogenic nematodes isolated from South Africa.

Lephoto, Tiisetso E; Mpangase, Phelelani T; Aron, Shaun; Gray, Vincent M.

Genom Data ; 7: 259-61, 2016 Mar.

Article in English | MEDLINE | ID: mdl-27054091

ABSTRACT

We present the annotation of the draft genome sequence of Oscheius sp. TEL-2014 (GenBank accession number KM492926). This entomopathogenic nematode was isolated from grassland in Suikerbosrand Nature Reserve near Johannesburg in South Africa. Oscheius sp. strain TEL has a genome size of 110,599,558 bp and a GC content of 42.24%. The genome sequence can be accessed at DDBJ/EMBL/GenBank under the accession number LNBV00000000.

Draft Genome Sequence and Assembly of Photorhabdus heterorhabditis Strain VMG, a Bacterial Symbiont Associated with the Entomopathogenic Nematode Heterorhabditis zealandica.

Naidoo, Stephanie; Mothupi, Boipelo; Featherston, Jonathan; Mpangase, Phelelani T; Gray, Vincent M.

Genome Announc ; 3(5)2015 Oct 29.

Article in English | MEDLINE | ID: mdl-26514768

ABSTRACT

Here, we report the draft genome sequence of Photorhabdus heterorhabditis strain VMG, a symbiont of the entomopathogenic nematode Heterorhabditis zealandica in South Africa. The draft genome sequence is 4,878,919 bp long and contains 4,023 protein-coding genes. The genome assembly contains 262 contigs with a G+C content of 42.22%.
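Genome announcements such as the two above summarise an assembly by its total length and G+C content. As a minimal sketch of how such a figure is derived, the Python function below computes the G+C percentage over a set of contig sequences. The function name `gc_content` and the decision to count every character (including ambiguous bases such as N) in the denominator are assumptions for illustration, not the method used by the cited authors.

```python
from typing import Iterable


def gc_content(sequences: Iterable[str]) -> float:
    """Return the G+C percentage across all given sequences.

    Assumption: every character counts towards total length, so ambiguous
    bases (e.g. N) lower the reported percentage slightly.
    """
    total = 0
    gc = 0
    for seq in sequences:
        s = seq.upper()  # accept soft-masked (lowercase) sequence
        total += len(s)
        gc += s.count("G") + s.count("C")
    if total == 0:
        raise ValueError("no sequence data supplied")
    return 100.0 * gc / total


if __name__ == "__main__":
    # Two toy contigs, not real assembly data.
    contigs = ["GGCC", "AATT"]
    print(f"{gc_content(contigs):.2f}%")
```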

SpeeDB: fast structural protein searches.

Robillard, David E; Mpangase, Phelelani T; Hazelhurst, Scott; Dehne, Frank.

Bioinformatics ; 31(18): 3027-34, 2015 Sep 15.

Article in English | MEDLINE | ID: mdl-25979473

ABSTRACT

MOTIVATION: Interactions between amino acids are important determinants of the structure, stability and function of proteins. Several tools have been developed for the identification and analysis of such interactions in proteins based on the extensive studies carried out on high-resolution structures from the Protein Data Bank (PDB). Although these tools allow users to identify and analyze interactions, analysis can only be performed on one structure at a time. This makes it difficult and time consuming to study the significance of these interactions on a large scale. RESULTS: SpeeDB is a web-based tool for the identification of protein structures based on structural properties. SpeeDB queries are executed on all structures in the PDB at once, quickly enough for interactive use. SpeeDB includes standard queries based on published criteria for identifying various structures: disulphide bonds, catalytic triads and aromatic-aromatic, sulphur-aromatic, cation-π and ionic interactions. Users can also construct custom queries in the user interface without any programming. Results can be downloaded in a Comma Separated Value (CSV) format for further analysis with other tools. Case studies presented in this article demonstrate how SpeeDB can be used to answer various biological questions. Analysis of human proteases revealed that disulphide bonds are the predominant type of interaction and are located close to the active site, where they promote substrate specificity. When comparing the two homologous G protein-coupled receptors and the two protein kinase paralogs analyzed, the differences in the types of interactions responsible for stability account for the differences in specificity and functionality of the structures. AVAILABILITY AND IMPLEMENTATION: SpeeDB is available at http://www.parallelcomputing.ca as a web service. CONTACT: d@drobilla.net SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Algorithms , Computational Biology/methods , Peptide Hydrolases/chemistry , Software , Amino Acids/chemistry , Databases, Protein , Humans , Hydrogen Bonding , Models, Molecular , Protein Conformation , Structural Homology, Protein
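The SpeeDB abstract above mentions queries for structural features such as disulphide bonds. A distance-based version of such a query can be sketched in Python as follows. This is not SpeeDB's implementation or its published criteria: the function name `find_disulphide_candidates`, the input format, and the 2.5 Å cutoff between cysteine SG atoms are assumptions chosen only to illustrate the idea.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

# (residue label, (x, y, z) coordinates of the cysteine SG atom in angstroms)
SgAtom = Tuple[str, Sequence[float]]


def find_disulphide_candidates(
    sg_atoms: List[SgAtom], max_distance: float = 2.5
) -> List[Tuple[str, str, float]]:
    """Return cysteine pairs whose SG atoms lie within max_distance.

    Assumption: the 2.5 angstrom default is an illustrative cutoff, not a
    criterion taken from SpeeDB or the literature it cites.
    """
    pairs = []
    for (id_a, xyz_a), (id_b, xyz_b) in combinations(sg_atoms, 2):
        d = math.dist(xyz_a, xyz_b)  # Euclidean distance (Python 3.8+)
        if d <= max_distance:
            pairs.append((id_a, id_b, round(d, 2)))
    return pairs


if __name__ == "__main__":
    # Toy coordinates, not taken from a real PDB entry.
    atoms = [
        ("CYS 5", (0.0, 0.0, 0.0)),
        ("CYS 30", (2.0, 0.0, 0.0)),
        ("CYS 80", (10.0, 0.0, 0.0)),
    ]
    print(find_disulphide_candidates(atoms))
```

The all-pairs comparison here is quadratic in the number of cysteines, which is acceptable for one structure; querying every structure in the PDB interactively, as the abstract describes, would need indexing or parallelism beyond this sketch.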

Discovery-2: an interactive resource for the rational selection and comparison of putative drug target proteins in malaria.

Mpangase, Phelelani T; Szolkiewicz, Michal J; le Grange, Misha; Smit, Jeanré H; Burger, Pieter B; Joubert, Fourie.

Malar J ; 12: 116, 2013 Mar 28.

Article in English | MEDLINE | ID: mdl-23537208

ABSTRACT

BACKGROUND: Drug resistance to anti-malarial compounds remains a serious problem, with resistance to newer pharmaceuticals developing at an alarming rate. The development of new anti-malarials remains a priority, and the rational selection of putative targets is a key element of this process. Discovery-2 is an update of the original Discovery in silico resource for the rational selection of putative drug target proteins, enabling researchers to obtain information for a protein which may be useful for the selection of putative drug targets, and to perform advanced filtering of proteins encoded by the malaria genome based on a series of molecular properties. METHODS: An updated in silico resource has been developed where researchers are able to mine information on malaria proteins and predicted ligands, as well as perform comparisons to the human and mosquito host characteristics. Protein properties used include domains, motifs, EC numbers, GO terms, orthologs, protein-protein interactions and protein-ligand interactions. Newly added features include druggability measures from ChEMBL, automated literature relations and links to clinical trial information. Searching by chemical structure is also available. RESULTS: The updated functionality of the Discovery-2 resource is presented, together with a detailed case study of the Plasmodium falciparum S-adenosyl-L-homocysteine hydrolase (PfSAHH) protein. A short example of a chemical search with pyrimethamine is also illustrated. CONCLUSION: The updated Discovery-2 resource allows researchers to obtain detailed properties of proteins from the malaria genome, which may be of interest in the target selection process, and to perform advanced filtering and selection of proteins based on a relevant range of molecular characteristics.

Subject(s)

Antimalarials/isolation & purification , Computational Biology/methods , Data Mining/methods , Drug Discovery/methods , Plasmodium falciparum/drug effects , Plasmodium falciparum/genetics , Humans
