1.
Neurol Genet ; 9(4): e200077, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37346932

ABSTRACT

Background and Objectives: Amyotrophic lateral sclerosis (ALS) is a degenerative condition of the brain and spinal cord in which protein-coding variants in known ALS disease genes explain a minority of sporadic cases. There is a growing interest in the role of noncoding structural variants (SVs) as ALS risk variants or genetic modifiers of ALS phenotype. In small European samples, specific short SV alleles in noncoding regulatory regions of SCAF4, SQSTM1, and STMN2 have been reported to be associated with ALS, and several groups have investigated the possible role of SMN1/SMN2 gene copy numbers in ALS susceptibility and clinical severity. Methods: Using short-read whole genome sequencing (WGS) data, we investigated putative ALS-susceptibility SCAF4 (3'UTR poly-T repeat), SQSTM1 (intron 5 AAAC insertion), and STMN2 (intron 3 CA repeat) alleles in African ancestry patients with ALS and described the architecture of the SMN1/SMN2 gene region. South African cases with ALS (n = 114) were compared with ancestry-matched controls (n = 150), 1000 Genomes Project samples (n = 2,336), and H3Africa Genotyping Chip Project samples (n = 347). Results: There was no association of the previously reported SCAF4 poly-T repeat, SQSTM1 AAAC insertion, and long STMN2 CA alleles with ALS risk in South Africans (p > 0.2). Similarly, SMN1 and SMN2 gene copy numbers did not differ between South Africans with ALS and matched population controls (p > 0.9). Notably, 20% of the African samples in this study had no SMN2 gene copies, which is a higher frequency than that reported in Europeans (approximately 7%). Discussion: We did not replicate the reported association of SCAF4, SQSTM1, and STMN2 short SVs with ALS in a small South African sample. In addition, we found no link between SMN1 and SMN2 copy numbers and susceptibility to ALS in this South African sample, which is similar to the conclusion of a recent meta-analysis of European studies. However, the SMN gene region findings in Africans replicate previous results from East and West Africa and highlight the importance of including diverse population groups in disease gene discovery efforts. The clinically relevant differences in the SMN gene architecture between African and non-African populations may affect the effectiveness of targeted SMN2 gene therapy for related diseases such as spinal muscular atrophy.

2.
PLoS Comput Biol ; 19(6): e1011163, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37327214

ABSTRACT

BACKGROUND: Microbiome research is providing important new insights into the metabolic interactions of complex microbial ecosystems involved in fields as diverse as the pathogenesis of human diseases, agriculture and climate change. Poor correlations typically observed between RNA and protein expression datasets make it hard to accurately infer microbial protein synthesis from metagenomic data. Additionally, mass spectrometry-based metaproteomic analyses typically rely on focused search sequence databases based on prior knowledge for protein identification that may not represent all the proteins present in a set of samples. Metagenomic 16S rRNA sequencing only targets the bacterial component, while whole genome sequencing is at best an indirect measure of expressed proteomes. Here we describe a novel approach, MetaNovo, that combines existing open-source software tools to perform scalable de novo sequence tag matching with a novel algorithm for probabilistic optimization of the entire UniProt knowledgebase to create tailored sequence databases for target-decoy searches directly at the proteome level, enabling metaproteomic analyses without prior expectation of sample composition or metagenomic data generation and compatible with standard downstream analysis pipelines. RESULTS: We compared MetaNovo to published results from the MetaPro-IQ pipeline on 8 human mucosal-luminal interface samples, with comparable numbers of peptide and protein identifications, many shared peptide sequences and a similar bacterial taxonomic distribution compared to that found using a matched metagenome sequence database, but simultaneously identified many more non-bacterial peptides than the previous approaches.
MetaNovo was also benchmarked on samples of known microbial composition against matched metagenomic and whole genomic sequence database workflows, yielding many more MS/MS identifications for the expected taxa, with improved taxonomic representation, while also highlighting previously described genome sequencing quality concerns for one of the organisms, and identifying an experimental sample contaminant without prior expectation. CONCLUSIONS: By estimating taxonomic and peptide level information directly on microbiome samples from tandem mass spectrometry data, MetaNovo enables the simultaneous identification of peptides from all domains of life in metaproteome samples, bypassing the need for curated sequence databases to search. We show that the MetaNovo approach to mass spectrometry metaproteomics is more accurate than current gold standard approaches of tailored or matched genomic sequence database searches, can identify sample contaminants without prior expectation and yields insights into previously unidentified metaproteomic signals, building on the potential for complex mass spectrometry metaproteomic data to speak for itself.


Subjects
Microbiota , Tandem Mass Spectrometry , Humans , 16S Ribosomal RNA/genetics , Protein Databases , Peptides/genetics , Peptides/analysis , Microbiota/genetics , Bacteria/genetics , Proteome/genetics
3.
Commun Biol ; 6(1): 49, 2023 01 14.
Article in English | MEDLINE | ID: mdl-36641522

ABSTRACT

Pulmonary function is an indicator of well-being, and pulmonary pathologies are the third major cause of death worldwide. We analysed the UK Biobank genome-wide association summary statistics of pulmonary function for Europeans and individuals of recent African descent to identify variants associated with the trait in the two ancestries. Here, we show 627 variants in Europeans and 3 in Africans associated with three pulmonary function parameters. In addition to the 110 variants in Europeans previously reported to be associated with phenotypes related to pulmonary function, we identify 279 novel loci, including an ISX intergenic variant rs369476290 on chromosome 22 in Africans. Remarkably, we find no shared variants among Africans and Europeans. Furthermore, enrichment analyses of variants separately for each ancestry background reveal significant enrichment for terms related to pulmonary phenotypes in Europeans but not Africans. Further analysis of studies of pulmonary phenotypes reveals that individuals of European background are disproportionally overrepresented in datasets compared to Africans, with the gap widening over the past five years. Our findings extend our understanding of the different variants that modify the pulmonary function in Africans and Europeans, a promising finding for future GWASs and medical studies.


Subjects
Biological Specimen Banks , Genome-Wide Association Study , Lung , Humans , Black People/genetics , Lung/physiology , United Kingdom , European People/genetics
4.
Neurol Genet ; 8(1): e654, 2022 Feb.
Article in English | MEDLINE | ID: mdl-35047667

ABSTRACT

BACKGROUND AND OBJECTIVES: To perform the first screen of 44 amyotrophic lateral sclerosis (ALS) genes in a cohort of African genetic ancestry individuals with ALS using whole-genome sequencing (WGS) data. METHODS: One hundred three consecutive cases with probable/definite ALS (using the revised El Escorial criteria), and self-categorized as African genetic ancestry, underwent WGS using various Illumina platforms. As population controls, 238 samples from various African WGS data sets were included. Our analysis was restricted to 44 ALS genes, which were curated for rare sequence variants and classified according to the American College of Medical Genetics guidelines as likely benign, uncertain significance, likely pathogenic, or pathogenic variants. RESULTS: Thirteen percent of 103 ALS cases harbored pathogenic variants; 5 different SOD1 variants (N87S, G94D, I114T, L145S, and L145F) in 5 individuals (5%, 1 familial case), pathogenic C9orf72 repeat expansions in 7 individuals (7%, 1 familial case) and a likely pathogenic ANXA11 (G38R) variant in 1 individual. Thirty individuals (29%) harbored ≥1 variant of uncertain significance; 10 of these variants had limited pathogenic evidence, although this was insufficient to permit confident classification as pathogenic. DISCUSSION: Our findings show that known ALS genes can be expected to identify a genetic cause of disease in >11% of sporadic ALS cases of African genetic ancestry. Similar to European cohorts, the 2 most frequent genes harboring pathogenic variants in this population group are C9orf72 and SOD1.

5.
Brief Bioinform ; 22(4)2021 07 20.
Article in English | MEDLINE | ID: mdl-33341897

ABSTRACT

Current variant calling (VC) approaches have been designed to leverage populations of long-range haplotypes and were benchmarked using populations of European descent, whereas most genetic diversity is found in non-European populations, such as those of Africa. Working with these genetically diverse populations, VC tools may produce false positive and false negative results, which may produce misleading conclusions in prioritization of mutations, clinical relevancy and actionability of genes. The most prominent question is which tool or pipeline has a high rate of sensitivity and precision when analysing African data with either low or high sequence coverage, given the high genetic diversity and heterogeneity of this data. Here, a total of 100 synthetic Whole Genome Sequencing (WGS) samples, mimicking the genetic profile of African and European subjects for different specific coverage levels (high/low), have been generated to assess the performance of nine different VC tools on these contrasting datasets. The performance of these tools was assessed in terms of false positive and false negative call rates by comparing the simulated golden variants to the variants identified by each VC tool. Combining our results on sensitivity and positive predictive value (PPV), VarDict [PPV = 0.999 and Matthews correlation coefficient (MCC) = 0.832] and BCFtools (PPV = 0.999 and MCC = 0.813) perform best on African population data at both high and low coverage. Overall, current VC tools produce high false positive and false negative rates when analysing African compared with European data. This highlights the need for development of VC approaches with high sensitivity and precision tailored for populations characterized by high genetic variation and low linkage disequilibrium.
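
As a reading aid for the metrics named in this abstract, the following minimal sketch shows how sensitivity, PPV and MCC are conventionally derived from a caller's confusion counts. It is not the authors' benchmarking code: the function name `call_metrics` and the example counts are assumptions made up for illustration.

```python
import math

def call_metrics(tp, fp, fn, tn):
    """Return sensitivity, PPV and MCC from one caller's confusion counts."""
    # Sensitivity: fraction of true (golden) variants that were called.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Positive predictive value: fraction of called variants that are true.
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    # Matthews correlation coefficient; defined as 0 when the denominator is 0.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sensitivity": sensitivity, "ppv": ppv, "mcc": mcc}

# Hypothetical counts, for illustration only (not taken from the study).
metrics = call_metrics(tp=90, fp=10, fn=30, tn=870)
print(metrics)
```

A caller can score a high PPV while still missing many true variants, which is why a balanced measure such as MCC is reported alongside it.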


Subjects
Black People/genetics , Nucleic Acid Databases , Genetic Variation , Human Genome , White People/genetics , Whole Genome Sequencing , Humans , Linkage Disequilibrium
6.
Front Hum Neurosci ; 15: 761424, 2021.
Article in English | MEDLINE | ID: mdl-35002653

ABSTRACT

Networks are present in many aspects of our lives, and networks in neuroscience have recently gained much attention leading to novel representations of brain connectivity. The integration of neuroimaging characteristics and genetics data allows a better understanding of the effects of gene expression on brain structural and functional connections. The current work uses whole-brain tractography in a longitudinal setting and studies the neurodegeneration of Alzheimer's disease by measuring changes in brain structural connectivity. This is accomplished by examining the effect of targeted genetic risk factors on the most common local and global brain connectivity measures. Furthermore, we examined the extent to which Clinical Dementia Rating relates to brain connections longitudinally, as well as to gene expression. For instance, here we show that expression of the PLAU gene increases the change over time in betweenness centrality related to the fusiform gyrus. We also show that the betweenness centrality metric impacts dementia-related changes in distinct brain regions. Our findings provide insights into the complex longitudinal interplay between genetics and brain characteristics and highlight the role of Alzheimer's genetic risk factors in the estimation of regional brain connectivity alterations.
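
To make "change over time in betweenness centrality" concrete, here is a small sketch that computes unnormalized betweenness on two toy undirected graphs and takes the per-node difference. It uses Brandes' algorithm for unweighted graphs, which is a standard method but not necessarily the one used in the study; the node labels and both graphs are hypothetical stand-ins for brain regions at baseline and follow-up.

```python
from collections import deque

def betweenness_centrality(adj):
    """Unnormalized betweenness for an undirected, unweighted graph (Brandes)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack = []
        preds = {v: [] for v in adj}   # predecessors on shortest paths from s
        sigma = {v: 0 for v in adj}    # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                   # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:                   # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: c / 2 for v, c in bc.items()}

# Hypothetical connectivity at two visits; labels are placeholders, not real regions.
baseline = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
followup = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "D"], "D": ["C"]}

bc_base = betweenness_centrality(baseline)
bc_follow = betweenness_centrality(followup)
change = {v: bc_follow[v] - bc_base[v] for v in baseline}
print(change)
```

In this toy case the new A-C connection bypasses node B, so B's centrality drops while C's is unchanged; a longitudinal analysis would relate such per-region changes to genetic or clinical covariates.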

7.
Brief Bioinform ; 22(4)2021 07 20.
Article in English | MEDLINE | ID: mdl-33129201

ABSTRACT

Advances in high-throughput sequencing technologies have resulted in an exponential growth of publicly accessible biological datasets. In the 'big data' driven 'post-genomic' context, much work is being done to explore human protein-protein interactions (PPIs) for a systems level based analysis to uncover useful signals and gain more insights to advance current knowledge and answer specific biological and health questions. These PPIs are experimentally or computationally predicted and stored in different online databases, some of which are updated regularly. As with many biological datasets, such regular updates continuously render older PPI datasets potentially outdated. Moreover, while many of these interactions are shared between these online resources, each resource includes its own identified PPIs and none of these databases exhaustively contains all existing human PPI maps. In this context, it is essential to enable the integration of interaction datasets from different resources, to generate a PPI map with increased coverage and confidence. To allow researchers to produce an integrated human PPI dataset in real time, we introduce the integrated human protein-protein interaction network generator (IHP-PING) tool. IHP-PING is a flexible Python package which generates a human PPI network from freely available online resources. This tool extracts and integrates heterogeneous PPI datasets to generate a unified PPI network, which is stored locally for further applications.
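
The integration idea described here can be sketched in a few lines: treat each interaction as an unordered protein pair, merge the pairs from every resource, and remember which resources support each pair. This is only an illustration of the general idea and not the IHP-PING API; the function `integrate_ppi`, the resource names `db_one` and `db_two`, and the edge lists are assumptions invented for the example.

```python
from collections import defaultdict

def integrate_ppi(sources):
    """Merge undirected PPI edge lists into {protein pair: set of source names}."""
    merged = defaultdict(set)
    for name, edges in sources.items():
        for a, b in edges:
            if a == b:
                continue  # this sketch ignores self-interactions
            # Sort so that (A, B) and (B, A) map to the same key.
            pair = tuple(sorted((a, b)))
            merged[pair].add(name)
    return dict(merged)

# Hypothetical resources and edge lists, for illustration only.
sources = {
    "db_one": [("TP53", "MDM2"), ("BRCA1", "BARD1")],
    "db_two": [("MDM2", "TP53"), ("EGFR", "GRB2")],
}

network = integrate_ppi(sources)
# Pairs reported by at least two resources can be treated as higher confidence.
supported_by_both = [pair for pair, names in network.items() if len(names) >= 2]
print(len(network), supported_by_both)
```

Keeping the set of supporting resources per pair is one simple way to express both goals stated above: the union of pairs increases coverage, and the number of supporting resources gives a rough confidence signal.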


Subjects
Protein Databases , Programming Languages , Protein Interaction Mapping , Protein Interaction Maps , Humans
8.
Sci Rep ; 10(1): 1433, 2020 01 29.
Article in English | MEDLINE | ID: mdl-31996736

ABSTRACT

Variations in the human genome have been found to be an essential factor that affects susceptibility to Alzheimer's disease. Genome-wide association studies (GWAS) have identified genetic loci that significantly contribute to the risk of Alzheimer's disease. The availability of genetic data, coupled with brain imaging technologies, has opened the door for further discoveries, by using data integration methodologies and new study designs. Although methods have been proposed for integrating image characteristics and genetic information for studying Alzheimer's disease, the measurement of disease is often taken at a single time point, which does not allow the disease progression to be taken into consideration. In longitudinal settings, we analyzed neuroimaging and single nucleotide polymorphism datasets obtained from the Alzheimer's Disease Neuroimaging Initiative for three clinical stages of the disease, including healthy control, early mild cognitive impairment and Alzheimer's disease subjects. We conducted a GWAS regressing the absolute change of global connectivity metrics on the genetic variants, and used the GWAS summary statistics to compute the gene and pathway scores. We observed significant associations between the change in structural brain connectivity defined by tractography and genes, which have previously been reported to biologically manipulate the risk and progression of certain neurodegenerative disorders, including Alzheimer's disease.


Subjects
Alzheimer Disease/genetics , Brain/physiology , Alzheimer Disease/diagnosis , Brain/diagnostic imaging , Connectome , Disease Progression , Gene Ontology , Gene Regulatory Networks , Genetic Predisposition to Disease , Genome-Wide Association Study , Humans , Insulin-Like Growth Factor I/genetics , Phenotype , Single Nucleotide Polymorphism , G-Protein-Coupled Receptors/genetics , Peptide Receptors/genetics , Synaptic Transmission
9.
Brief Funct Genomics ; 19(1): 49-59, 2020 01 22.
Article in English | MEDLINE | ID: mdl-31867604

ABSTRACT

In silico DNA sequence generation is a powerful technology to evaluate and validate bioinformatics tools, and accordingly more than 35 DNA sequence simulation tools have been developed. With such a diverse array of tools to choose from, an important question is: Which tool should be used for a desired outcome? This question is largely unanswered as documentation for many of these DNA simulation tools is sparse. To address this, we performed a review of DNA sequence simulation tools developed to date and evaluated 20 state-of-the-art DNA sequence simulation tools on their ability to produce accurate reads based on their implemented sequence error model. We provide a succinct description of each tool and suggest which tool is most appropriate for different scenarios. Given the multitude of similar yet non-identical tools, researchers can use this review as a guide to inform their choice of DNA sequence simulation tool. This paves the way towards assessing existing tools in a unified framework, as well as enabling different simulation scenario analysis within the same framework.


Subjects
Computer Simulation , DNA/analysis , DNA/genetics , Human Genome , Genomics/methods , DNA Sequence Analysis/methods , Software , High-Throughput Nucleotide Sequencing , Humans
10.
Genes (Basel) ; 10(12)2019 11 21.
Article in English | MEDLINE | ID: mdl-31766582

ABSTRACT

Hearing impairment (HI) is a common sensory disorder that is defined as the partial or complete inability to detect sound in one or both ears. This diverse pathology is associated with a myriad of phenotypic expressions and can be non-syndromic or syndromic. HI can be caused by various genetic, environmental, and/or unknown factors. Some ontologies capture some HI forms, phenotypes, and syndromes, but there is no comprehensive knowledge portal which includes aspects specific to the HI disease state. This hampers inter-study comparability, integration, and interoperability within and across disciplines. This work describes the HI Ontology (HIO) that was developed based on the Sickle Cell Disease Ontology (SCDO) model. This is a collaboratively developed resource built around the 'Hearing Impairment' concept by a group of experts in different aspects of HI and ontologies. HIO is the first comprehensive, standardized, hierarchical, and logical representation of existing HI knowledge. HIO allows researchers and clinicians alike to readily access standardized HI-related knowledge in a single location and promotes collaborations and HI information sharing, including epidemiological, socio-environmental, biomedical, genetic, and phenotypic information. Furthermore, this ontology illustrates the adaptability of the SCDO framework for use in developing a disease-specific ontology.


Subjects
Biological Ontologies , Hearing Loss , Biomedical Research , Cooperative Behavior , Humans , Knowledge
11.
Brainlesion ; 11383: 239-250, 2019.
Article in English | MEDLINE | ID: mdl-31482151

ABSTRACT

Glioblastoma is the most aggressive malignant primary brain tumor with a poor prognosis. Glioblastoma's heterogeneous neuroimaging, pathologic, and molecular features provide opportunities for subclassification, prognostication, and the development of targeted therapies. Magnetic resonance imaging has the capability of quantifying specific phenotypic imaging features of these tumors. Additional insight into disease mechanism can be gained by exploring genetic foundations. Here, we use gene expression to evaluate the associations with various quantitative imaging phenomic features extracted from magnetic resonance imaging. We highlight a novel correlation by carrying out multi-stage genome-wide association tests at the gene level through a non-parametric correlation framework that allows testing multiple hypotheses about the integrated relationship of imaging phenotype-genotype more efficiently and at lower computational cost. Our results showed several novel genes, previously associated with glioblastoma and other types of cancer, such as LRRC46 (chromosome 17), EPGN (chromosome 4) and TUBA1C (chromosome 12), all associated with our radiographic tumor features.

12.
BMC Bioinformatics ; 20(1): 741, 2019 Dec 30.
Article in English | MEDLINE | ID: mdl-31888443

ABSTRACT

BACKGROUND: Currently, formal mechanisms for bioinformatics support are limited. The H3Africa Bioinformatics Network has implemented a public and freely available Helpdesk (HD), which provides generic bioinformatics support to researchers through an online ticketing platform. The following article reports on the H3ABioNet HD (H3A-HD)'s development, outlining its design, management, usage and evaluation framework, as well as the lessons learned through implementation. RESULTS: The H3A-HD was evaluated using automatically generated usage logs, user feedback and qualitative ticket evaluation. Evaluation revealed that communication methods, ticketing strategies and the technical platforms used are some of the primary factors which may influence the effectiveness of the HD. CONCLUSION: To continuously improve the H3A-HD services, the resource should be regularly monitored and evaluated. The H3A-HD design, implementation and evaluation framework could be easily adapted for use by interested stakeholders within the Bioinformatics community and beyond.


Subjects
Computational Biology/methods , User-Computer Interface , Africa , Genomics , Research
13.
Brief Bioinform ; 20(5): 1709-1724, 2019 09 27.
Article in English | MEDLINE | ID: mdl-30010715

ABSTRACT

Over the past decade, studies of admixed populations have increasingly gained interest in both medical and population genetics. These studies have so far shed light on the patterns of genetic variation throughout modern human evolution and have improved our understanding of the demographics and adaptive processes of human populations. To date, there exist about 20 methods or tools to deconvolve local ancestry. These methods have merits and drawbacks in estimating local ancestry in multiway admixed populations. In this article, we survey existing ancestry deconvolution methods, with special emphasis on multiway admixture, and compare these methods based on simulation results reported by different studies, computational approaches used, including mathematical and statistical models, and biological challenges related to each method. This should orient users on the choice of an appropriate method or tool for given population admixture characteristics and update researchers on current advances, challenges and opportunities behind existing ancestry deconvolution methods.


Subjects
Molecular Evolution , Human Genome , Genetic Models , Humans
14.
Brief Funct Genomics ; 17(1): 34-41, 2018 01 01.
Article in English | MEDLINE | ID: mdl-28968683

ABSTRACT

Drug repositioning is the process of finding new therapeutic uses for existing, approved drugs, a process that has value when considering the exorbitant costs of novel drug development. Several computational strategies exist as a way to predict these alternative applications. In this study, we used datasets on: (1) human biological drug targets and (2) disease-associated genes and, based on a direct functional interaction between them, searched for potential opportunities for drug repositioning. From the set of 1125 unique drug targets and their 88 490 interactions with disease-associated genes, 30 drug targets were analyzed and (3) discussed in detail for the purpose of this article. The current indications of the drugs that target them were validated through the interactions, and new opportunities for repositioning were predicted. Among the set of drugs for potential repositioning were benzodiazepines for the treatment of autism spectrum disorders; nortriptyline for the treatment of melanoma, glioma and other cancers; and vitamin B6 in prevention of spontaneous abortions and cleft palate birth defects. Special emphasis was also placed on those new potential indications that pertained to orphan diseases, that is, diseases whose rarity means that development of novel treatment is not financially viable. This computational drug repositioning approach uses existing information on drugs and drug targets, and insights into the genetic basis of disease, as a means to systematically generate the most probable new uses for the drugs on offer, and in this way harness their true therapeutic power.


Subjects
Disease , Drug Repositioning , Systems Biology/methods , Computational Biology , Drug Discovery , Population Genetics , Humans , Proteins/metabolism
15.
Brief Bioinform ; 19(6): 1141-1152, 2018 11 27.
Article in English | MEDLINE | ID: mdl-28520909

ABSTRACT

Populations worldwide currently face several public health challenges, including growing prevalence of infections and the emergence of new pathogenic organisms. The cost and risk associated with drug development make the development of new drugs for several diseases, especially orphan or rare diseases, unappealing to the pharmaceutical industry. Proof of drug safety and efficacy is required before market approval, and rigorous testing makes the drug development process slow and expensive, and it frequently results in failure. This failure is often because of the use of irrelevant targets identified in the early steps of the drug discovery process, suggesting that target identification and validation are cornerstones for the success of drug discovery and development. Here, we present a large-scale data-driven integrative computational framework to extract essential targets and processes from an existing disease-associated data set and enhance target selection by leveraging drug-target-disease association at the systems level. We applied this framework to tuberculosis and Ebola virus diseases combining heterogeneous data from multiple sources, including protein-protein functional interaction, functional annotation and pharmaceutical data sets. Results obtained demonstrate the effectiveness of the pipeline, leading to the extraction of essential drug targets and to the rational use of existing approved drugs. This provides an opportunity to move toward optimal target-based strategies for screening available drugs and for drug discovery. There is potential for this model to bridge the gap in the production of orphan disease therapies, offering a systematic approach to predict new uses for existing drugs, thereby harnessing their full therapeutic potential.


Subjects
Datasets as Topic , Antitubercular Agents/chemistry , Antitubercular Agents/pharmacology , Antiviral Agents/chemistry , Antiviral Agents/pharmacology , Drug Development , Ebolavirus/drug effects , Ebola Hemorrhagic Fever/genetics , Host-Pathogen Interactions , Humans , Molecular Sequence Annotation , Mycobacterium tuberculosis/drug effects , Reproducibility of Results , Tuberculosis/genetics
16.
Bioinformatics ; 33(19): 2995-3002, 2017 Oct 01.
Article in English | MEDLINE | ID: mdl-28957497

ABSTRACT

MOTIVATION: Recent technological advances in high-throughput sequencing and genotyping have facilitated an improved understanding of genomic structure and disease-associated genetic factors. In this context, simulation models can play a critical role in revealing various evolutionary and demographic effects on genomic variation, enabling researchers to assess existing and design novel analytical approaches. Although various simulation frameworks have been suggested, they do not account for natural selection in admixture processes. Most are tailored to a single chromosome or a genomic region, very few capture large-scale genomic data, and most are not accessible for genomic communities. RESULTS: Here we develop a multi-scenario genome-wide medical population genetics simulation framework called 'FractalSIM'. FractalSIM has the capability to accurately mimic and generate genome-wide data under various genetic models on genetic diversity, genomic variation affecting diseases and DNA sequence patterns of admixed and/or homogeneous populations. Moreover, the framework accounts for natural selection in both homogeneous and admixture processes. The outputs of FractalSIM have been assessed using popular tools, and the results demonstrated its capability to accurately mimic real scenarios. They can be used to evaluate the performance of a range of genomic tools from ancestry inference to genome-wide association studies. AVAILABILITY AND IMPLEMENTATION: The FractalSIM package is available at http://www.cbio.uct.ac.za/FractalSIM. CONTACT: emile.chimusa@uct.ac.za. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Population Genetics/methods , Genomics/methods , Genetic Variation , Genome , Genome-Wide Association Study , High-Throughput Nucleotide Sequencing , Humans , Single Nucleotide Polymorphism , Genetic Selection , DNA Sequence Analysis , Software
17.
Glob Heart ; 12(2): 91-98, 2017 06.
Article in English | MEDLINE | ID: mdl-28302555

ABSTRACT

BACKGROUND: Although pockets of bioinformatics excellence have developed in Africa, generally, large-scale genomic data analysis has been limited by the availability of expertise and infrastructure. H3ABioNet, a pan-African bioinformatics network, was established to build capacity specifically to enable H3Africa (Human Heredity and Health in Africa) researchers to analyze their data in Africa. Since the inception of the H3Africa initiative, H3ABioNet's role has evolved in response to changing needs from the consortium and the African bioinformatics community. OBJECTIVES: H3ABioNet set out to develop core bioinformatics infrastructure and capacity for genomics research in various aspects of data collection, transfer, storage, and analysis. METHODS AND RESULTS: Various resources have been developed to address genomic data management and analysis needs of H3Africa researchers and other scientific communities on the continent. NetMap was developed and used to build an accurate picture of network performance within Africa and between Africa and the rest of the world, and Globus Online has been rolled out to facilitate data transfer. A participant recruitment database was developed to monitor participant enrollment, and data is being harmonized through the use of ontologies and controlled vocabularies. The standardized metadata will be integrated to provide a search facility for H3Africa data and biospecimens. Because H3Africa projects are generating large-scale genomic data, facilities for analysis and interpretation are critical. H3ABioNet is implementing several data analysis platforms that provide a large range of bioinformatics tools or workflows, such as Galaxy, the Job Management System, and eBiokits. A set of reproducible, portable, and cloud-scalable pipelines to support the multiple H3Africa data types are also being developed and dockerized to enable execution on multiple computing infrastructures. 
In addition, new tools have been developed for analysis of the uniquely divergent African data and for downstream interpretation of prioritized variants. To provide support for these and other bioinformatics queries, an online bioinformatics helpdesk backed by broad consortium expertise has been established. Further support is provided by means of various modes of bioinformatics training. CONCLUSIONS: For the past 4 years, the development of infrastructure support and human capacity through H3ABioNet has significantly contributed to the establishment of African scientific networks, data analysis facilities, and training programs. Here, we describe the infrastructure and how it has affected genomics and bioinformatics research in Africa.


Subjects
Biomedical Research/methods , Computational Biology/trends , Genomics/methods , Africa , Humans
18.
Brief Bioinform ; 18(5): 886-901, 2017 09 01.
Article in English | MEDLINE | ID: mdl-27473066

ABSTRACT

Gene Ontology (GO) semantic similarity tools enable retrieval of semantic similarity scores, which incorporate biological knowledge embedded in the GO structure for comparing or classifying different proteins or lists of proteins based on their GO annotations. This facilitates a better understanding of biological phenomena underlying the corresponding experiment and enables the identification of processes pertinent to different biological conditions. Currently, about 14 tools are available, which may play an important role in improving protein analyses at the functional level using different GO semantic similarity measures. Here we survey these tools to provide a comprehensive view of the challenges and advances made in this area, so as to avoid redundant effort in developing features that already exist or implementing ideas already proven to be obsolete in the context of GO. This helps researchers, tool developers, and end users understand the underlying semantic similarity measures implemented, through knowledge of pertinent features of, and issues related to, a particular tool. This should empower users to make appropriate choices for their biological applications and ensure effective knowledge discovery based on GO annotations.


Subjects
Gene Ontology , Humans , Molecular Sequence Annotation , Semantics , Surveys and Questionnaires
19.
Front Genet ; 7: 39, 2016.
Article in English | MEDLINE | ID: mdl-27066064

ABSTRACT

Advances in high-throughput sequencing technologies have yielded complete genome sequences of several organisms, including complete bacterial genomes. The growing number of available sequenced genomes has enabled analyses of their dynamics, as well as of the molecular and evolutionary processes to which these organisms are subject. Comparative genomics of different bacterial genomes has highlighted their genome size and gene content in association with lifestyles and adaptation to various environments, and has contributed to enhancing our understanding of the mechanisms of their evolution. Protein-protein functional interactions mediate many essential processes for maintaining the stability of biological systems under changing environmental conditions. Thus, these interactions play crucial roles in the evolutionary processes of different organisms, especially obligate intracellular bacteria, which have been shown to generally have reduced genome sizes compared to their nearest free-living relatives. In this study, we used an approach based on the Renormalization Group (RG) analysis technique and the Maximum-Excluded-Mass-Burning (MEMB) model to investigate the evolutionary process of genome reduction in relation to the organization of the functional networks of two organisms. Using a Mycobacterium leprae (MLP) network in comparison with a Mycobacterium tuberculosis (MTB) network as a case study, we show that reductive evolution in MLP resulted from the removal of important proteins from the neighbors of corresponding orthologous MTB proteins. While each orthologous MTB protein had an increase in the number of interacting partners in most instances, the corresponding MLP protein had lost some of them. This work provides a quantitative model for mapping reductive evolution and protein-protein functional interaction network organization in terms of the roles played by different proteins in the network structure.

20.
PLoS Comput Biol ; 12(2): e1004395, 2016 Feb.
Article in English | MEDLINE | ID: mdl-26845152

ABSTRACT

Bioinformatics is now a critical skill in many research and commercial environments as biological data are increasing in both size and complexity. South African researchers recognized this need in the mid-1990s and responded by working with the government as well as international bodies to develop initiatives to build bioinformatics capacity in the country. Significant injections of support from these bodies provided a springboard for the establishment of computational biology units at multiple universities throughout the country, which took on teaching, basic research and support roles. Several challenges were encountered, for example with unreliability of funding, lack of skills, and lack of infrastructure. However, the bioinformatics community worked together to overcome these, and South Africa is now arguably the leading country in bioinformatics on the African continent. Here we discuss how the discipline developed in the country, highlighting the challenges, successes, and lessons learnt.


Subjects
Computational Biology , Biotechnology , Computational Biology/education , Computational Biology/history , Computational Biology/organization & administration , History, 20th Century , History, 21st Century , Humans , South Africa
...