Search | VHL Regional Portal

1.

A pan-African pathogen genomics data sharing platform to support disease outbreaks.

Christoffels, Alan; Mboowa, Gerald; van Heusden, Peter; Makhubela, Sello; Githinji, George; Mwangi, Sarah; Onywera, Harris; Nnaemeka, Ndodo; Amoako, Daniel Gyamfi; Olawoye, Idowu; Diallo, Amadou; Mbala-Kingebeni, Placide; Oyola, Samuel O; Adu, Bright; Mvelase, Christopher; Ondoa, Pascale; Dratibi, Fred Athanasius; Sow, Abdourahmane; Gumede, Nicksy; Tessema, Sofonias K; Ouma, Ahmed Ogwell; Tebeje, Yenew Kebede.

Nat Med ; 29(5): 1052-1055, 2023 05.

Article in English | MEDLINE | ID: mdl-37161068

Subject(s)

Disease Outbreaks , Genomics , Information Dissemination , Humans

2.

Galaxy Training: A powerful framework for teaching!

Hiltemann, Saskia; Rasche, Helena; Gladman, Simon; Hotz, Hans-Rudolf; Larivière, Delphine; Blankenberg, Daniel; Jagtap, Pratik D; Wollmann, Thomas; Bretaudeau, Anthony; Goué, Nadia; Griffin, Timothy J; Royaux, Coline; Le Bras, Yvan; Mehta, Subina; Syme, Anna; Coppens, Frederik; Droesbeke, Bert; Soranzo, Nicola; Bacon, Wendi; Psomopoulos, Fotis; Gallardo-Alba, Cristóbal; Davis, John; Föll, Melanie Christine; Fahrner, Matthias; Doyle, Maria A; Serrano-Solano, Beatriz; Fouilloux, Anne Claire; van Heusden, Peter; Maier, Wolfgang; Clements, Dave; Heyl, Florian; Grüning, Björn; Batut, Bérénice.

PLoS Comput Biol ; 19(1): e1010752, 2023 01.

Article in English | MEDLINE | ID: mdl-36622853

ABSTRACT

There is an ongoing explosion of scientific datasets being generated, brought on by recent technological advances in many areas of the natural sciences. As a result, the life sciences have become increasingly computational in nature, and bioinformatics has taken on a central role in research studies. However, basic computational skills, data analysis, and stewardship are still rarely taught in life science educational programs, resulting in a skills gap in many of the researchers tasked with analysing these big datasets. In order to address this skills gap and empower researchers to perform their own data analyses, the Galaxy Training Network (GTN) has previously developed the Galaxy Training Platform (https://training.galaxyproject.org), an open access, community-driven framework for the collection of FAIR (Findable, Accessible, Interoperable, Reusable) training materials for data analysis utilizing the user-friendly Galaxy framework as its primary data analysis platform. Since its inception, this training platform has thrived, with the number of tutorials and contributors growing rapidly, and the range of topics extending beyond life sciences to include topics such as climatology, cheminformatics, and machine learning. While initially aimed at supporting researchers directly, the GTN framework has proven to be an invaluable resource for educators as well. We have focused our efforts in recent years on adding increased support for this growing community of instructors. We have added new features to facilitate the use of the materials in a classroom setting, simplified the contribution flow for new materials, and added a set of train-the-trainer lessons. Here, we present the latest developments in the GTN project, aimed at facilitating the use of the Galaxy Training materials by educators, and its usage in different learning environments.

Subject(s)

Computational Biology , Software , Humans , Computational Biology/methods , Data Analysis , Research Personnel

3.

Establishing MinION Sequencing and Genome Assembly Procedures for the Analysis of the Rooibos (Aspalathus linearis) Genome.

Mgwatyu, Yamkela; Cornelissen, Stephanie; van Heusden, Peter; Stander, Allison; Ranketse, Mary; Hesse, Uljana.

Plants (Basel) ; 11(16), 2022 Aug 19.

Article in English | MEDLINE | ID: mdl-36015459

ABSTRACT

While plant genome analysis is gaining speed worldwide, few plant genomes have been sequenced and analyzed on the African continent. Yet, this information holds the potential to transform diverse industries as it unlocks medicinally and industrially relevant biosynthesis pathways for bioprospecting. Considering that South Africa is home to the highly diverse Cape Floristic Region, local establishment of methods for plant genome analysis is essential. Long-read sequencing is becoming standard procedure for plant genome research, as these reads can span repetitive regions of the DNA, substantially facilitating reassembly of a contiguous genome. With the MinION, Oxford Nanopore offers a cost-efficient sequencing method to generate long reads; however, DNA purification protocols must be adapted for each plant species to generate ultra-pure DNA, essential for these analyses. Here, we describe a cost-effective procedure for the extraction and purification of plant DNA and evaluate diverse genome assembly approaches for the reconstruction of the genome of rooibos (Aspalathus linearis), an endemic South African medicinal plant widely used for tea production. We discuss the pros and cons of nine tested assembly programs, specifically Redbean and NextDenovo, which generated the most contiguous assemblies, and Flye, which produced an assembly closest to the predicted genome size.

4.

Genomic surveillance of Rift Valley fever virus: from sequencing to lineage assignment.

Juma, John; Fonseca, Vagner; Konongoi, Samson L; van Heusden, Peter; Roesel, Kristina; Sang, Rosemary; Bett, Bernard; Christoffels, Alan; de Oliveira, Tulio; Oyola, Samuel O.

BMC Genomics ; 23(1): 520, 2022 Jul 18.

Article in English | MEDLINE | ID: mdl-35850574

ABSTRACT

Genetic evolution of Rift Valley fever virus (RVFV) in Africa has been shaped mainly by environmental changes such as abnormal rainfall patterns and climate change that has occurred over the last few decades. These gradual environmental changes are believed to have effected gene migration from macro (geographical) to micro (reassortment) levels. Presently, 15 lineages of RVFV have been identified to be circulating within the Sub-Saharan Africa. International trade in livestock and movement of mosquitoes are thought to be responsible for the outbreaks occurring outside endemic or enzootic regions. Virus spillover events contribute to outbreaks as was demonstrated by the largest epidemic of 1977 in Egypt. Genomic surveillance of the virus evolution is crucial in developing intervention strategies. Therefore, we have developed a computational tool for rapidly classifying and assigning lineages of the RVFV isolates. The computational method is presented both as a command line tool and a web application hosted at https://www.genomedetective.com/app/typingtool/rvfv/ . Validation of the tool has been performed on a large dataset using glycoprotein gene (Gn) and whole genome sequences of the Large (L), Medium (M) and Small (S) segments of the RVFV retrieved from the National Center for Biotechnology Information (NCBI) GenBank database. Using the Gn nucleotide sequences, the RVFV typing tool was able to correctly classify all 234 RVFV sequences at species level with 100% specificity, sensitivity and accuracy. All the sequences in lineages A (n = 10), B (n = 1), C (n = 88), D (n = 1), E (n = 3), F (n = 2), G (n = 2), H (n = 105), I (n = 2), J (n = 1), K (n = 4), L (n = 8), M (n = 1), N (n = 5) and O (n = 1) were also correctly classified at phylogenetic level. Lineage assignment using whole RVFV genome sequences (L, M and S-segments) did not achieve 100% specificity, sensitivity and accuracy for all the sequences analyzed. 
We further tested our tool using genomic data that we generated by sequencing 5 samples collected following a recent RVF outbreak in Kenya. All 5 samples were assigned to lineage C by both the partial (Gn) and whole genome sequence classifiers. The tool is useful in tracing the origin of outbreaks and supporting surveillance efforts. Availability: https://github.com/ajodeh-juma/rvfvtyping.
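
As a worked check on the validation figures in the abstract above: the per-lineage counts should add up to the 234 Gn sequences reported, and sensitivity, specificity and accuracy follow from a standard confusion matrix. The sketch below is illustrative only. The function name `classification_metrics` and the confusion-matrix counts passed to it are assumptions made for this example and are not taken from the rvfvtyping code or the paper.

```python
# Lineage counts exactly as listed in the abstract (Gn nucleotide sequences).
LINEAGE_COUNTS = {
    "A": 10, "B": 1, "C": 88, "D": 1, "E": 3, "F": 2, "G": 2, "H": 105,
    "I": 2, "J": 1, "K": 4, "L": 8, "M": 1, "N": 5, "O": 1,
}


def classification_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts.

    Uses the usual textbook definitions; raises ValueError when a
    denominator would be zero instead of returning a misleading value.
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return sensitivity, specificity, accuracy


# The 15 lineage counts sum to the total stated in the abstract.
total_sequences = sum(LINEAGE_COUNTS.values())
print(total_sequences)

# Hypothetical counts: every RVFV sequence called correctly, plus an
# assumed set of 50 non-RVFV negatives (this number is invented here).
print(classification_metrics(tp=total_sequences, fp=0, tn=50, fn=0))
```

With no false positives and no false negatives, all three metrics equal 1.0, which corresponds to the 100% figures quoted for the Gn classifier.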

Subject(s)

Rift Valley Fever , Rift Valley fever virus , Animals , Commerce , Genomics , Internationality , Kenya , Phylogeny , Rift Valley Fever/epidemiology , Rift Valley fever virus/genetics

5.

Expanding the Galaxy's reference data.

VijayKrishna, Nagampalli; Joshi, Jayadev; Coraor, Nate; Hillman-Jackson, Jennifer; Bouvier, Dave; van den Beek, Marius; Eguinoa, Ignacio; Coppens, Frederik; Davis, John; Stolarczyk, Michal; Sheffield, Nathan C; Gladman, Simon; Cuccuru, Gianmauro; Grüning, Björn; Soranzo, Nicola; Rasche, Helena; Langhorst, Bradley W; Bernt, Matthias; Fornika, Dan; de Lima Morais, David Anderson; Barrette, Michel; van Heusden, Peter; Petrillo, Mauro; Puertas-Gallardo, Antonio; Patak, Alex; Hotz, Hans-Rudolf; Blankenberg, Daniel.

Bioinform Adv ; 2(1): vbac030, 2022.

Article in English | MEDLINE | ID: mdl-35669346

ABSTRACT

Summary: Properly and effectively managing reference datasets is an important task for many bioinformatics analyses. Refgenie is a reference asset management system that allows users to easily organize, retrieve and share such datasets. Here, we describe the integration of refgenie into the Galaxy platform. Server administrators are able to configure Galaxy to make use of reference datasets made available on a refgenie instance. In addition, a Galaxy Data Manager tool has been developed to provide a graphical interface to refgenie's remote reference retrieval functionality. A large collection of reference datasets has also been made available using the CVMFS (CernVM File System) repository from GalaxyProject.org, with mirrors across the USA, Canada, Europe and Australia, enabling easy use outside of Galaxy. Availability and implementation: The ability of Galaxy to use refgenie assets was added to the core Galaxy framework in version 22.01, which is available from https://github.com/galaxyproject/galaxy under the Academic Free License version 3.0. The refgenie Data Manager tool can be installed via the Galaxy ToolShed, with source code managed at https://github.com/BlankenbergLab/galaxy-tools-blankenberg/tree/main/data_managers/data_manager_refgenie_pull and released using an MIT license. Access to existing data is also available through CVMFS, with instructions at https://galaxyproject.org/admin/reference-data-repo/. No new data were generated or analyzed in support of this research.

6.

microRNA profile of Hermetia illucens (black soldier fly) and its implications on mass rearing.

DeRaedt, Sarah; Bierman, Anandi; van Heusden, Peter; Richards, Cameron; Christoffels, Alan.

PLoS One ; 17(3): e0265492, 2022.

Article in English | MEDLINE | ID: mdl-35298540

ABSTRACT

The growing demands on protein producers and the dwindling available resources have made Hermetia illucens (the black soldier fly, BSF) an economically important species. Insights into the genome of this insect will better allow for robust breeding protocols, and more efficient production to be used as a replacement of animal feed protein. The use of microRNA as a method to understand how gene regulation allows insect species to adapt to changes in their environment has been established in multiple species. The baseline and life stage expression levels established in this study allow for insight into the development and sex-linked microRNA regulation in BSF. To accomplish this, microRNA was extracted and sequenced from 15 different libraries with each life stage in triplicate. Of the total 192 microRNAs found, 168 were orthologous to known arthropod microRNAs and 24 microRNAs were unique to BSF. Twenty-six of the 168 microRNAs conserved across arthropods had a statistically significant (p < 0.05) differential expression between the egg and larval stages. The development from larva to pupa was characterized by 16 statistically significant differentially expressed microRNA. Seven and 9 microRNA were detected as statistically significant between pupa to adult female and pupa to adult male, respectively. All life stages had a nearly equal split between up and down regulated microRNAs. Ten of the unique 24 miRNA were detected exclusively in one life stage. The egg life stage expressed five microRNA (hil-miR-m, hil-miR-p, hil-miR-r, hil-miR-s, and hil-miR-u) not seen in any other life stages. The female adult and pupa life stages expressed one miRNA each, hil-miR-h and hil-miR-ac, respectively. Both male and female adult life stages expressed hil-miR-a, hil-miR-b, and hil-miR-y. There were no unique microRNAs found only in the larva stage. Twenty-two microRNAs with 56 experimentally validated target genes in the closely related Drosophila melanogaster were identified. 
Thus, the microRNA found display the unique evolution of BSF, along with the life stages and potential genes to target for robust mass rearing. Understanding of the microRNA expression in BSF will further their use in the crucial search for alternative and sustainable protein sources.

Subject(s)

Diptera , MicroRNAs , Animal Feed/analysis , Animals , Drosophila melanogaster , Female , Larva , Male , MicroRNAs/genetics , MicroRNAs/metabolism , Pupa

7.

The COMBAT-TB Workbench: Making Powerful Mycobacterium tuberculosis Bioinformatics Accessible.

van Heusden, Peter; Mashologu, Ziphozakhe; Lose, Thoba; Warren, Robin; Christoffels, Alan.

mSphere ; 7(1): e0099121, 2022 02 23.

Article in English | MEDLINE | ID: mdl-35138128

ABSTRACT

Whole-genome sequencing (WGS) is a powerful method for detecting drug resistance, genetic diversity, and transmission dynamics of Mycobacterium tuberculosis. Implementation of WGS in public health microbiology laboratories is impeded by a lack of user-friendly, automated, and semiautomated pipelines. We present the COMBAT-TB Workbench, a modular, easy-to-install application that provides a web-based environment for Mycobacterium tuberculosis bioinformatics. The COMBAT-TB Workbench is built using two main software components: the IRIDA platform for its web-based user interface and data management capabilities and the Galaxy bioinformatics workflow platform for workflow execution. These components are combined into a single easy-to-install application using Docker container technology. We implemented two workflows, for M. tuberculosis sample analysis and phylogeny, in Galaxy. Building our workflows involved updating some Galaxy tools (Trimmomatic, snippy, and snp-sites) and writing new Galaxy tools (snp-dists, TB-Profiler, tb_variant_filter, and TB Variant Report). The irida-wf-ga2xml tool was updated to be able to work with recent versions of Galaxy and was further developed into IRIDA plugins for both workflows. In the case of the M. tuberculosis sample analysis, an interface was added to update the metadata stored for each sequence sample with results gleaned from the Galaxy workflow output. Data can be loaded into the COMBAT-TB Workbench via the web interface or via the command line IRIDA uploader tool. The COMBAT-TB Workbench application deploys IRIDA, the COMBAT-TB IRIDA plugins, the MariaDB database, and Galaxy using Docker containers (https://github.com/COMBAT-TB/irida-galaxy-deploy). 
IMPORTANCE While the reduction in the cost of WGS is making sequencing more affordable in lower- and middle-income countries (LMICs), public health laboratories in these countries seldom have access to bioinformaticians and system support engineers adept at using the Linux command line and complex bioinformatics software. The COMBAT-TB Workbench provides an open-source, modular, easy-to-deploy and -use environment for managing and analyzing M. tuberculosis WGS data and thereby makes WGS usable in practice in the LMIC context.
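
The phylogeny workflow in the abstract above includes a pairwise SNP distance step (snp-dists). The Python sketch below shows the general idea of such a distance matrix and is not the snp-dists implementation. It assumes that only positions where both sequences carry an unambiguous base (A, C, G or T) are counted as differences. The function names and the toy alignment are hypothetical.

```python
from itertools import combinations

VALID_BASES = set("ACGT")


def snp_distance(seq_a, seq_b):
    """Count positions where two aligned sequences differ.

    Assumption for this sketch: positions with gaps or ambiguous bases
    (for example 'N' or '-') in either sequence are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != b and a in VALID_BASES and b in VALID_BASES
    )


def distance_matrix(alignment):
    """Build a symmetric name -> name -> distance mapping."""
    names = list(alignment)
    matrix = {n: {m: 0 for m in names} for n in names}
    for n, m in combinations(names, 2):
        d = snp_distance(alignment[n], alignment[m])
        matrix[n][m] = d
        matrix[m][n] = d
    return matrix


# Toy alignment with invented sample names and sequences.
toy_alignment = {
    "sample1": "ACGTACGT",
    "sample2": "ACGTACGA",
    "sample3": "ACNTTCGA",
}
toy_matrix = distance_matrix(toy_alignment)
for name, row in toy_matrix.items():
    print(name, row)
```

In the toy alignment the `N` in `sample3` is skipped, so it does not add to any distance. A small distance between two samples is the kind of signal a transmission analysis would then examine further.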

Subject(s)

Mycobacterium tuberculosis , Tuberculosis , Computational Biology/methods , Humans , Mycobacterium tuberculosis/genetics , Software , Workflow

8.

A framework for the promotion of ethical benefit sharing in health research.

Bedeker, Anja; Nichols, Michelle; Allie, Taryn; Tamuhla, Tsaone; van Heusden, Peter; Olorunsogbon, Olorunyomi; Tiffin, Nicki.

BMJ Glob Health ; 7(2), 2022 02.

Article in English | MEDLINE | ID: mdl-35144922

ABSTRACT

There is an increasing recognition of the importance of including benefit sharing in research programmes in order to ensure equitable and just distribution of the benefits arising from research. Whilst there are global efforts to promote benefit sharing when using non-human biological resources, benefit sharing plans and implementation do not yet feature prominently in research programmes, funding applications or requirements by ethics review boards. Whilst many research stakeholders may agree with the concept of benefit sharing, it can be difficult to operationalise benefit sharing within research programmes. We present a framework designed to assist with identifying benefit sharing opportunities in research programmes. The framework has two dimensions: the first represents microlevel, mesolevel and macrolevel stakeholders as defined using a socioecological model; and the second identifies nine different types of benefit sharing that might be achieved during a research programme. We provide an example matrix identifying different types of benefit sharing that might be undertaken during genomics research, and present a case study evaluating benefit sharing in Africa during the SARS-CoV-2 pandemic. This framework, with examples, is intended as a practical tool to assist research stakeholders with identifying opportunities for benefit sharing, and inculcating intentional benefit sharing in their research programmes from inception.

Subject(s)

Biomedical Research , COVID-19 , Africa , Humans , SARS-CoV-2

9.

Capacity building for whole genome sequencing of Mycobacterium tuberculosis and bioinformatics in high TB burden countries.

Rivière, Emmanuel; Heupink, Tim H; Ismail, Nabila; Dippenaar, Anzaan; Clarke, Charlene; Abebe, Gemeda; Heusden, Peter; Warren, Rob; Meehan, Conor J; Van Rie, Annelies.

Brief Bioinform ; 22(4), 2021 07 20.

Article in English | MEDLINE | ID: mdl-33009560

ABSTRACT

BACKGROUND: Whole genome sequencing (WGS) is increasingly used for Mycobacterium tuberculosis (Mtb) research. Countries with the highest tuberculosis (TB) burden face important challenges to integrate WGS into surveillance and research. METHODS: We assessed the global status of Mtb WGS and developed a 3-week training course coupled with long-term mentoring and WGS infrastructure building. Training focused on genome sequencing, bioinformatics and development of a locally relevant WGS research project. The aim of the long-term mentoring was to support trainees in project implementation and funding acquisition. The focus of WGS infrastructure building was on the DNA extraction process and bioinformatics. FINDINGS: Compared to their TB burden, Asia and Africa are grossly underrepresented in Mtb WGS research. Challenges faced resulted in adaptations to the training, mentoring and infrastructure building. Out-of-date laptop hardware and operating systems were overcome by using online tools and a Galaxy WGS analysis pipeline. A case studies approach created a safe atmosphere for students to formulate and defend opinions. Because quality DNA extraction is paramount for WGS, a biosafety level 3 and general laboratory skill training session were added, use of commercial DNA extraction kits was introduced and a 2-week training in a highly equipped laboratory was combined with a 1-week training in the local setting. INTERPRETATION: By developing and sharing the components of and experiences with a sequencing and bioinformatics training program, we hope to stimulate capacity building programs for Mtb WGS and empower high-burden countries to play an important role in WGS-based TB surveillance and research.

Subject(s)

Computational Biology , Genome, Bacterial , Mycobacterium tuberculosis/genetics , Tuberculosis/genetics , Whole Genome Sequencing , Africa/epidemiology , Asia/epidemiology , Cost of Illness , Humans , Mycobacterium tuberculosis/isolation & purification , Tuberculosis/epidemiology

10.

Genome Sequencing of a Severe Acute Respiratory Syndrome Coronavirus 2 Isolate Obtained from a South African Patient with Coronavirus Disease 2019.

Allam, Mushal; Ismail, Arshad; Khumalo, Zamantungwa T H; Kwenda, Stanford; van Heusden, Peter; Cloete, Ruben; Wibmer, Constantinos Kurt; Mtshali, Phillip Senzo; Mnyameni, Florah; Mohale, Thabo; Subramoney, Kathleen; Walaza, Sibongile; Ngubane, Wendy; Govender, Nevashan; Motaze, Nkengafac V; Bhiman, Jinal N.

Microbiol Resour Announc ; 9(27), 2020 Jul 02.

Article in English | MEDLINE | ID: mdl-32616644

ABSTRACT

As a contribution to the global efforts to track and trace the ongoing coronavirus pandemic, here we present the sequence, phylogenetic analysis, and modeling of nonsynonymous mutations for a severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genome that was detected in a South African patient with coronavirus disease 2019 (COVID-19).

11.

Transcriptomics of the Rooibos (Aspalathus linearis) Species Complex.

Stander, Emily Amor; Williams, Wesley; Mgwatyu, Yamkela; Heusden, Peter van; Rautenbach, Fanie; Marnewick, Jeanine; Roes-Hill, Marilize Le; Hesse, Uljana.

BioTech (Basel) ; 9(4), 2020 Sep 23.

Article in English | MEDLINE | ID: mdl-35822822

ABSTRACT

Rooibos (Aspalathus linearis), widely known as a herbal tea, is endemic to the Cape Floristic Region of South Africa (SA). It produces a wide range of phenolic compounds that have been associated with diverse health promoting properties of the plant. The species comprises several growth forms that differ in their morphology and biochemical composition, only one of which is cultivated and used commercially. Here, we established methodologies for non-invasive transcriptome research of wild-growing South African plant species, including (1) harvesting and transport of plant material suitable for RNA sequencing; (2) inexpensive, high-throughput biochemical sample screening; (3) extraction of high-quality RNA from recalcitrant, polysaccharide- and polyphenol rich plant material; and (4) biocomputational analysis of Illumina sequencing data, together with the evaluation of programs for transcriptome assembly (Trinity, IDBA-Trans, SOAPdenovo-Trans, CLC), protein prediction, as well as functional and taxonomic transcript annotation. In the process, we established a biochemically characterized sample pool from 44 distinct rooibos ecotypes (1-5 harvests) and generated four in-depth annotated transcriptomes (each comprising on average ≈86,000 transcripts) from rooibos plants that represent distinct growth forms and differ in their biochemical profiles. These resources will serve future rooibos research and plant breeding endeavours.

12.

COMBAT-TB-NeoDB: fostering tuberculosis research through integrative analysis using graph database technologies.

Lose, Thoba; van Heusden, Peter; Christoffels, Alan.

Bioinformatics ; 36(3): 982-983, 2020 02 01.

Article in English | MEDLINE | ID: mdl-31504165

ABSTRACT

MOTIVATION: Recent advancements in genomic technologies have enabled high throughput cost-effective generation of 'omics' data from M.tuberculosis (M.tb) isolates, which then gets shared via a number of heterogeneous publicly available biological databases. Albeit useful, fragmented curation negatively impacts the researcher's ability to leverage the data via federated queries. RESULTS: We present Combat-TB-NeoDB, an integrated M.tb 'omics' knowledge-base. Combat-TB-NeoDB is based on Neo4j and was created by binding the labeled property graph model to a suitable ontology namely Chado. Combat-TB-NeoDB enables researchers to execute complex federated queries by linking prominent biological databases, and supplementary M.tb variants data from published literature. AVAILABILITY AND IMPLEMENTATION: The Combat-TB-NeoDB (https://neodb.sanbi.ac.za) repository and all tools mentioned in this manuscript are freely available at https://github.com/COMBAT-TB. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Mycobacterium tuberculosis , Tuberculosis , Databases, Factual , Genome , Genomics , Humans , Software

13.

Advancing HIV Vaccine Research With Low-Cost High-Performance Computing Infrastructure: An Alternative Approach for Resource-Limited Settings.

Mabvakure, Batsirai M; Rott, Raymond; Dobrowsky, Leslie; Van Heusden, Peter; Morris, Lynn; Scheepers, Cathrine; Moore, Penny L.

Bioinform Biol Insights ; 13: 1177932219882347, 2019.

Article in English | MEDLINE | ID: mdl-35173421

ABSTRACT

Next-generation sequencing (NGS) technologies have revolutionized biological research by generating genomic data that were once unaffordable by traditional first-generation sequencing technologies. These sequencing methodologies provide an opportunity for in-depth analyses of host and pathogen genomes as they are able to sequence millions of templates at a time. However, these large datasets can only be efficiently explored using bioinformatics analyses requiring huge data storage and computational resources adapted for high-performance processing. High-performance computing allows for efficient handling of large data and tasks that may require multi-threading and prolonged computational times, which is not feasible with ordinary computers. However, high-performance computing resources are costly and therefore not always readily available in low-income settings. We describe the establishment of an affordable high-performance computing bioinformatics cluster consisting of 3 nodes, constructed using ordinary desktop computers and open-source software including Linux Fedora, SLURM Workload Manager, and the Conda package manager. For the analysis of large antibody sequence datasets and for complex viral phylodynamic analyses, the cluster out-performed desktop computers. This has demonstrated that it is possible to construct high-performance computing capacity capable of analyzing large NGS data from relatively low-cost hardware and entirely free (open-source) software, even in resource-limited settings. Such a cluster design has broad utility beyond bioinformatics to other studies that require high-performance computing.

14.

Developing reproducible bioinformatics analysis workflows for heterogeneous computing environments to support African genomics.

Baichoo, Shakuntala; Souilmi, Yassine; Panji, Sumir; Botha, Gerrit; Meintjes, Ayton; Hazelhurst, Scott; Bendou, Hocine; Beste, Eugene de; Mpangase, Phelelani T; Souiai, Oussema; Alghali, Mustafa; Yi, Long; O'Connor, Brian D; Crusoe, Michael; Armstrong, Don; Aron, Shaun; Joubert, Fourie; Ahmed, Azza E; Mbiyavanga, Mamana; Heusden, Peter van; Magosi, Lerato E; Zermeno, Jennie; Mainzer, Liudmila Sergeevna; Fadlelmola, Faisal M; Jongeneel, C Victor; Mulder, Nicola.

BMC Bioinformatics ; 19(1): 457, 2018 Nov 29.

Article in English | MEDLINE | ID: mdl-30486782

ABSTRACT

BACKGROUND: The Pan-African bioinformatics network, H3ABioNet, comprises 27 research institutions in 17 African countries. H3ABioNet is part of the Human Health and Heredity in Africa program (H3Africa), an African-led research consortium funded by the US National Institutes of Health and the UK Wellcome Trust, aimed at using genomics to study and improve the health of Africans. A key role of H3ABioNet is to support H3Africa projects by building bioinformatics infrastructure such as portable and reproducible bioinformatics workflows for use on heterogeneous African computing environments. Processing and analysis of genomic data is an example of a big data application requiring complex interdependent data analysis workflows. Such bioinformatics workflows take the primary and secondary input data through several computationally-intensive processing steps using different software packages, where some of the outputs form inputs for other steps. Implementing scalable, reproducible, portable and easy-to-use workflows is particularly challenging. RESULTS: H3ABioNet has built four workflows to support (1) the calling of variants from high-throughput sequencing data; (2) the analysis of microbial populations from 16S rDNA sequence data; (3) genotyping and genome-wide association studies; and (4) single nucleotide polymorphism imputation. A week-long hackathon was organized in August 2016 with participants from six African bioinformatics groups, and US and European collaborators. Two of the workflows are built using the Common Workflow Language framework (CWL) and two using Nextflow. All the workflows are containerized for improved portability and reproducibility using Docker, and are publicly available for use by members of the H3Africa consortium and the international research community. 
CONCLUSION: The H3ABioNet workflows have been implemented in view of offering ease of use for the end user and high levels of reproducibility and portability, all while following modern state of the art bioinformatics data processing protocols. The H3ABioNet workflows will service the H3Africa consortium projects and are currently in use. All four workflows are also publicly available for research scientists worldwide to use and adapt for their respective needs. The H3ABioNet workflows will help develop bioinformatics capacity and assist genomics research within Africa and serve to increase the scientific output of H3Africa and its Pan-African Bioinformatics Network.

Subject(s)

Computational Biology/methods , Genomics/methods , Africa , Humans , Reproducibility of Results

15.

Organizing and running bioinformatics hackathons within Africa: The H3ABioNet cloud computing experience.

Ahmed, Azza E; Mpangase, Phelelani T; Panji, Sumir; Baichoo, Shakuntala; Souilmi, Yassine; Fadlelmola, Faisal M; Alghali, Mustafa; Aron, Shaun; Bendou, Hocine; De Beste, Eugene; Mbiyavanga, Mamana; Souiai, Oussema; Yi, Long; Zermeno, Jennie; Armstrong, Don; O'Connor, Brian D; Mainzer, Liudmila Sergeevna; Crusoe, Michael R; Meintjes, Ayton; Van Heusden, Peter; Botha, Gerrit; Joubert, Fourie; Jongeneel, C Victor; Hazelhurst, Scott; Mulder, Nicola.

AAS Open Res ; 1: 9, 2018.

Article in English | MEDLINE | ID: mdl-32382696

ABSTRACT

The need for portable and reproducible genomics analysis pipelines is growing globally as well as in Africa, especially with the growth of collaborative projects like the Human Health and Heredity in Africa Consortium (H3Africa). The Pan-African H3Africa Bioinformatics Network (H3ABioNet) recognized the need for portable, reproducible pipelines adapted to heterogeneous compute environments, and for the nurturing of technical expertise in workflow languages and containerization technologies. To address this need, in 2016 H3ABioNet arranged its first Cloud Computing and Reproducible Workflows Hackathon, with the purpose of building key genomics analysis pipelines able to run on heterogeneous computing environments and meeting the needs of H3Africa research projects. This paper describes the preparations for this hackathon and reflects upon the lessons learned about its impact on building the technical and scientific expertise of African researchers. The workflows developed were made publicly available in GitHub repositories and deposited as container images on quay.io.

16.

An integrated and comparative approach towards identification, characterization and functional annotation of candidate genes for drought tolerance in sorghum (Sorghum bicolor (L.) Moench).

Woldesemayat, Adugna Abdi; Van Heusden, Peter; Ndimba, Bongani K; Christoffels, Alan.

BMC Genet ; 18(1): 119, 2017 12 22.

Article in English | MEDLINE | ID: mdl-29273003

ABSTRACT

BACKGROUND: Drought is the most disastrous abiotic stress that severely affects agricultural productivity worldwide. Understanding the biological basis of drought-regulated traits requires identification and an in-depth characterization of genetic determinants using model organisms and high-throughput technologies. However, studies on drought tolerance have generally been limited to the traditional candidate gene approach that targets only a single gene in a pathway that is related to a trait. In this study, we used sorghum, one of the model crops that is well adapted to arid regions, to mine genes and define determinants for drought tolerance using drought expression libraries and RNA-seq data. RESULTS: We provide an integrated and comparative in silico candidate gene identification, characterization and annotation approach, with an emphasis on genes playing a prominent role in conferring drought tolerance in sorghum. A total of 470 non-redundant functionally annotated drought responsive genes (DRGs) were identified using experimental data from drought responses by employing pairwise sequence similarity searches, pathway and interpro-domain analysis, expression profiling and orthology relation. Comparison of the genomic locations between these genes and sorghum quantitative trait loci (QTLs) showed that 40% of these genes were co-localized with QTLs known for drought tolerance. The genome reannotation conducted using the Program to Assemble Spliced Alignment (PASA) resulted in 9.6% of existing single gene models being updated. In addition, 210 putative novel genes were identified using AUGUSTUS and PASA based analysis on the expression dataset. Among these, 50% were single exonic, 69.5% represented drought responsive and 5.7% were complete gene structure models. Analysis of biochemical metabolism revealed 14 metabolic pathways that are related to drought tolerance and also had a strong biological network among the categories of genes involved. 
Identification of these pathways signifies the interplay of biochemical reactions that make up the metabolic network, constituting a fundamental interface for the sorghum defence mechanism against drought stress. CONCLUSIONS: This study suggests untapped natural variability in sorghum that could be used for developing drought tolerance. The data presented here may be regarded as an initial reference point in functional and comparative genomics in the Gramineae family.

Subject(s)

Genes, Plant , Molecular Sequence Annotation , Sorghum/genetics , Sorghum/physiology , Computer Simulation , Droughts , Exons , Metabolic Networks and Pathways , Quantitative Trait Loci , Transcriptome

17.

Virome Assembly and Annotation: A Surprise in the Namib Desert.

Hesse, Uljana; van Heusden, Peter; Kirby, Bronwyn M; Olonade, Israel; van Zyl, Leonardo J; Trindade, Marla.

Front Microbiol ; 8: 13, 2017.

Article in English | MEDLINE | ID: mdl-28167933

ABSTRACT

Sequencing, assembly, and annotation of environmental virome samples is challenging. Methodological biases and differences in species abundance result in fragmentary read coverage; sequence reconstruction is further complicated by the mosaic nature of viral genomes. In this paper, we focus on biocomputational aspects of virome analysis, emphasizing latent pitfalls in sequence annotation. Using simulated viromes that mimic environmental data challenges we assessed the performance of five assemblers (CLC-Workbench, IDBA-UD, SPAdes, RayMeta, ABySS). Individual analyses of relevant scaffold length fractions revealed shortcomings of some programs in reconstruction of viral genomes with excessive read coverage (IDBA-UD, RayMeta), and in accurate assembly of scaffolds ≥50 kb (SPAdes, RayMeta, ABySS). The CLC-Workbench assembler performed best in terms of genome recovery (including highly covered genomes) and correct reconstruction of large scaffolds; and was used to assemble a virome from a copper rich site in the Namib Desert. We found that scaffold network analysis and cluster-specific read reassembly improved reconstruction of sequences with excessive read coverage, and that strict data filtering for non-viral sequences prior to downstream analyses was essential. In this study we describe novel viral genomes identified in the Namib Desert copper site virome. Taxonomic affiliations of diverse proteins in the dataset and phylogenetic analyses of circovirus-like proteins indicated links to the marine habitat. Considering additional evidence from this dataset we hypothesize that viruses may have been carried from the Atlantic Ocean into the Namib Desert by fog and wind, highlighting the impact of the extended environment on an investigated niche in metagenome studies.

18.

Correction: Chromosomal-Level Assembly of the Asian Seabass Genome Using Long Sequence Reads and Multi-layered Scaffolding.

Vij, Shubha; Kuhl, Heiner; Kuznetsova, Inna S; Komissarov, Aleksey; Yurchenko, Andrey A; Van Heusden, Peter; Singh, Siddharth; Thevasagayam, Natascha M; Prakki, Sai Rama Sridatta; Purushothaman, Kathiresan; Saju, Jolly M; Jiang, Junhui; Mbandi, Stanley Kimbung; Jonas, Mario; Hin Yan Tong, Amy; Mwangi, Sarah; Lau, Doreen; Ngoh, Si Yan; Liew, Woei Chang; Shen, Xueyan; Hon, Lawrence S; Drake, James P; Boitano, Matthew; Hall, Richard; Chin, Chen-Shan; Lachumanan, Ramkumar; Korlach, Jonas; Trifonov, Vladimir; Kabilov, Marsel; Tupikin, Alexey; Green, Darrell; Moxon, Simon; Garvin, Tyler; Sedlazeck, Fritz J; Vurture, Gregory W; Gopalapillai, Gopikrishna; Katneni, Vinaya Kumar; Noble, Tansyn H; Scaria, Vinod; Sivasubbu, Sridhar; Jerry, Dean R; O'Brien, Stephen J; Schatz, Michael C; Dalmay, Tamás; Turner, Stephen W; Lok, Si; Christoffels, Alan; Orbán, László.

PLoS Genet ; 12(12): e1006500, 2016 Dec.

Article in English | MEDLINE | ID: mdl-27935956

ABSTRACT

[This corrects the article DOI: 10.1371/journal.pgen.1005954.].

19.

Chromosomal-Level Assembly of the Asian Seabass Genome Using Long Sequence Reads and Multi-layered Scaffolding.

Vij, Shubha; Kuhl, Heiner; Kuznetsova, Inna S; Komissarov, Aleksey; Yurchenko, Andrey A; Van Heusden, Peter; Singh, Siddharth; Thevasagayam, Natascha M; Prakki, Sai Rama Sridatta; Purushothaman, Kathiresan; Saju, Jolly M; Jiang, Junhui; Mbandi, Stanley Kimbung; Jonas, Mario; Hin Yan Tong, Amy; Mwangi, Sarah; Lau, Doreen; Ngoh, Si Yan; Liew, Woei Chang; Shen, Xueyan; Hon, Lawrence S; Drake, James P; Boitano, Matthew; Hall, Richard; Chin, Chen-Shan; Lachumanan, Ramkumar; Korlach, Jonas; Trifonov, Vladimir; Kabilov, Marsel; Tupikin, Alexey; Green, Darrell; Moxon, Simon; Garvin, Tyler; Sedlazeck, Fritz J; Vurture, Gregory W; Gopalapillai, Gopikrishna; Kumar Katneni, Vinaya; Noble, Tansyn H; Scaria, Vinod; Sivasubbu, Sridhar; Jerry, Dean R; O'Brien, Stephen J; Schatz, Michael C; Dalmay, Tamás; Turner, Stephen W; Lok, Si; Christoffels, Alan; Orbán, László.

PLoS Genet ; 12(4): e1005954, 2016 Apr.

Article in English | MEDLINE | ID: mdl-27082250

ABSTRACT

We report here the ~670 Mb genome assembly of the Asian seabass (Lates calcarifer), a tropical marine teleost. We used long-read sequencing augmented by transcriptomics, optical and genetic mapping along with shared synteny from closely related fish species to derive a chromosome-level assembly with a contig N50 size over 1 Mb and scaffold N50 size over 25 Mb that span ~90% of the genome. The population structure of L. calcarifer species complex was analyzed by re-sequencing 61 individuals representing various regions across the species' native range. SNP analyses identified high levels of genetic diversity and confirmed earlier indications of a population stratification comprising three clades with signs of admixture apparent in the South-East Asian population. The quality of the Asian seabass genome assembly far exceeds that of any other fish species, and will serve as a new standard for fish genomics.

Subject(s)

Bass/genetics , Chromosome Mapping , Animals , Bass/classification , Genome , In Situ Hybridization, Fluorescence , Phylogeny

20.

NetCapDB: measuring bioinformatics capacity development in Africa.

Bendou, Hocine; Entfellner, Jean-Baka Domelevo; van Heusden, Peter; Gamieldien, Junaid; Tiffin, Nicki.

BMC Res Notes ; 9: 144, 2016 Mar 05.

Article in English | MEDLINE | ID: mdl-26945860

ABSTRACT

BACKGROUND: The National Institutes of Health (USA) has committed 5 years of funding to the Bioinformatics Network of the Human Heredity and Health in Africa initiative. This pan-African network aims to develop capacity for bioinformatics research, in order to provide support to human health genomics research programs ongoing on the continent. Over the 5 years of funding, it is imperative to track changes in bioinformatics capacity at the funded centres and to document how the funding has translated into capacity development during this time frame. RESULTS: The Network capacity database, NetCapDB, is a relational database that captures quantitative metrics for bioinformatics capacity and tracks the changes in these metrics over time. A graphical user interface allows for straightforward, browser-based data entry by users across Africa, and for visual and graph-based exploration of captured data. A reporting interface allows for semi-automated generation of standardized reports for monitoring and evaluation purposes.

Subject(s)

Computational Biology/economics , Genome, Human , National Institutes of Health (U.S.)/economics , Program Evaluation/statistics & numerical data , Africa , Capital Financing , Computational Biology/instrumentation , Computational Biology/methods , Databases, Factual , Humans , United States , User-Computer Interface
