2.
PLoS One ; 18(8): e0288023, 2023.
Article in English | MEDLINE | ID: mdl-37556452

ABSTRACT

Computational prediction of absolute essential genes using machine learning has gained wide attention in recent years. However, essential genes are mostly conditional and not absolute. Experimental techniques provide a reliable approach to identifying conditionally essential genes; however, experimental methods are laborious and time- and resource-consuming, hence computational techniques have been used to complement them. Computational techniques such as supervised machine learning or flux balance analysis are severely limited by the unavailability of the data required for training the model or simulating the conditions for gene essentiality. This study developed a heuristic-enabled active machine learning method based on a light gradient boosting model to predict essential immune response and embryonic developmental genes in Drosophila melanogaster. We proposed a new sampling selection technique and introduced a heuristic function which replaces the human component in traditional active learning models. The heuristic function dynamically selects the unlabelled samples to improve the performance of the classifier in the next iteration. When tested on four benchmark datasets, the proposed model showed superior performance compared to traditional active learning models (random sampling and uncertainty sampling). Applying the model to identify conditionally essential genes, we identified four novel essential immune response genes and 48 novel genes that are essential under the embryonic developmental condition. We performed functional enrichment analysis of the predicted genes to elucidate their biological processes, and the results support our predictions. Immune response and embryonic development related processes were significantly enriched in the essential immune response and embryonic developmental genes, respectively.
Finally, we propose the predicted essential genes for future experimental studies and the use of the developed tool, accessible at http://heal.covenantuniversity.edu.ng, for conditional essentiality predictions.
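
For orientation, uncertainty sampling, one of the traditional baselines named in the abstract, can be sketched as follows. This is a minimal illustration and not the authors' heuristic function, which the abstract does not specify; the function name, the 0.5 decision boundary and the example probabilities are assumptions.

```python
def uncertainty_sample(probabilities, batch_size):
    """Pick the unlabelled samples the classifier is least sure about.

    `probabilities` holds the predicted positive-class probability for each
    unlabelled sample (hypothetical values); the samples closest to 0.5 are
    returned first.
    """
    ranked = sorted(range(len(probabilities)),
                    key=lambda i: abs(probabilities[i] - 0.5))
    return ranked[:batch_size]


# Hypothetical classifier outputs for five unlabelled genes.
pool_probs = [0.95, 0.52, 0.10, 0.45, 0.70]
picked = uncertainty_sample(pool_probs, 2)  # indices of the two most ambiguous genes
```

In a conventional active learning loop the selected samples would be sent to a human annotator before retraining; the abstract states that the proposed heuristic function takes over that role.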


Subjects
Drosophila melanogaster; Heuristics; Animals; Humans; Drosophila melanogaster/genetics; Supervised Machine Learning; Machine Learning; Genes, Essential
4.
NAR Genom Bioinform ; 3(4): lqab110, 2021 Dec.
Article in English | MEDLINE | ID: mdl-34859210

ABSTRACT

Identifying essential genes on a genome scale is resource intensive and has been performed for only a few eukaryotes. For less-studied organisms, essentiality might be predicted by gene homology. However, this approach cannot be applied to non-conserved genes. Additionally, divergent essentiality information is obtained from studying single cells or whole, multi-cellular organisms, particularly when derived from human cell line screens and human population studies. We employed machine learning across six model eukaryotes and 60 381 genes, using 41 635 features derived from the sequence, gene function information and network topology. Within a leave-one-organism-out cross-validation, the classifiers showed high generalizability, with an average accuracy close to 80% in the left-out species. As a case study, we applied the method to Tribolium castaneum and Bombyx mori and validated predictions experimentally, yielding similar performance. Finally, using the classifier based on the studied model organisms enabled linking the essentiality information of human cell line screens and population studies.
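
The leave-one-organism-out scheme mentioned in the abstract can be sketched as a split generator: every fold holds out all genes of one organism and trains on the rest. This is a minimal sketch under assumed names; the organism labels and the function are illustrative and do not reproduce the authors' pipeline.

```python
def leave_one_organism_out(organisms):
    """Yield (held_out, train_idx, test_idx) once per distinct organism.

    `organisms` gives the organism label of each gene, in gene order.
    """
    for held_out in sorted(set(organisms)):
        train_idx = [i for i, o in enumerate(organisms) if o != held_out]
        test_idx = [i for i, o in enumerate(organisms) if o == held_out]
        yield held_out, train_idx, test_idx


# Hypothetical organism label for each of five genes.
gene_organism = ["fly", "fly", "worm", "yeast", "worm"]
splits = list(leave_one_organism_out(gene_organism))
```

Because no gene from the held-out organism is seen during training, the accuracy measured on each fold estimates how well a classifier transfers to a species without its own essentiality labels.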

5.
Comput Struct Biotechnol J ; 19: 4581-4592, 2021.
Article in English | MEDLINE | ID: mdl-34471501

ABSTRACT

Pathogens that cause infections, particularly those that invade host cells, require the host cell machinery for efficient regeneration and proliferation during infection. Their life cycle depends on host proteins, and these Host Dependency Factors (HDF) may serve as therapeutic targets. Several screening studies have produced large lists of potential HDF, but with only marginal overlap. To bring consistency to the data of these experimental studies, we developed a machine learning pipeline. As a case study, we used publicly available lists of experimentally derived HDF from twelve different screening studies based on gene perturbation in Drosophila melanogaster cells or in vivo upon bacterial or protozoan infection. A total of 50,334 gene features were generated from diverse categories including their functional annotations, topology attributes in protein interaction networks, nucleotide and protein sequence features, homology properties and subcellular localization. Cross-validation revealed an excellent prediction performance. All feature categories contributed to the model. Predicted and experimentally derived HDF showed good consistency when investigating their common cellular processes and function. Cellular processes and molecular function of these genes were highly enriched in membrane trafficking, particularly in the trans-Golgi network, cell cycle and the Rab GTPase binding family. Using our machine learning approach, we show that HDF in organisms can be predicted with high accuracy, evidencing their common investigated characteristics. We elucidated cellular processes which are utilized by invading pathogens during infection. Finally, we provide a list of 208 novel HDF proposed for future experimental studies.

6.
Brief Bioinform ; 22(5)2021 09 02.
Article in English | MEDLINE | ID: mdl-33842944

ABSTRACT

Essential genes are critical for the growth and survival of any organism. The machine learning approach complements the experimental methods to minimize the resources required for essentiality assays. Previous studies revealed the need to discover relevant features that significantly classify essential genes, improve the generalizability of prediction models across organisms, and construct a robust gold standard as the class label for the training data to enhance prediction. Findings also show that a significant limitation of the machine learning approach is predicting conditionally essential genes. The essentiality status of a gene can change due to a specific condition of the organism. This review examines various methods applied to the essential gene prediction task, their strengths and limitations, and the factors responsible for effective computational prediction of essential genes. We discuss categories of features and how they contribute to the classification performance of essentiality prediction models. Five categories of features, namely, gene sequence, protein sequence, network topology, homology and gene ontology-based features, were generated for Caenorhabditis elegans to perform a comparative analysis of their essentiality prediction capacity. The gene ontology-based feature category outperformed the other categories of features, mainly due to its high correlation with the genes' biological functions. However, the topology feature category provided the highest discriminatory power, making it more suitable for essentiality prediction. The major limiting factor of machine learning in predicting conditional gene essentiality is the unavailability of labeled data for the conditions of interest that can train a classifier. Therefore, cooperative machine learning could further exploit models that can perform well in conditional essentiality predictions.
SHORT ABSTRACT: Identification of essential genes is imperative because it provides an understanding of the core structure and function, accelerating drug target discovery, among other functions. Recent studies have applied machine learning to complement the experimental identification of essential genes. However, several factors limit the performance of machine learning approaches. This review aims to present the standard procedure and resources available for predicting essential genes in organisms, and also to highlight the factors responsible for the current limitation in using machine learning for conditional gene essentiality prediction. The choice of features and ML technique was identified as an important factor in predicting essential genes effectively.


Subjects
Algorithms; Computational Biology/methods; Genes, Essential/genetics; Machine Learning; Support Vector Machine; Animals; Caenorhabditis elegans/genetics; Gene Ontology; Gene Regulatory Networks; Humans
7.
Comput Struct Biotechnol J ; 18: 612-621, 2020.
Article in English | MEDLINE | ID: mdl-32257045

ABSTRACT

Genes are termed essential if their loss of function compromises viability or results in profound loss of fitness. On the genome scale, these genes can be determined experimentally employing RNAi or knockout screens, but this is very resource intensive. Computational methods for essential gene prediction can overcome this drawback, particularly when intrinsic (e.g. from the protein sequence) as well as extrinsic features (e.g. from transcription profiles) are considered. In this work, we employed machine learning to predict essential genes in Drosophila melanogaster. A total of 27,340 features were generated based on a large variety of different aspects comprising nucleotide and protein sequences, gene networks, protein-protein interactions, evolutionary conservation and functional annotations. Employing cross-validation, we obtained an excellent prediction performance. For D. melanogaster, the best model achieved a ROC-AUC of 0.90, a PR-AUC of 0.30 and an F1 score of 0.34. Our approach considerably outperformed a benchmark method in which only features derived from the protein sequences were used (P < 0.001). Investigating which features contributed to this success, we found that all categories of features contributed, most prominently network topological, functional and sequence-based features. To evaluate our approach, we performed the same workflow for essential gene prediction in human and achieved a ROC-AUC of 0.97, a PR-AUC of 0.73 and an F1 score of 0.64. In summary, this study shows that using our well-elaborated assembly of features covering a broad range of intrinsic and extrinsic gene and protein features enabled intelligent systems to predict the essentiality of genes in an organism well.
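
As a reminder of how the reported F1 score relates to prediction counts, the sketch below computes it from true positives, false positives and false negatives. The counts are hypothetical and chosen only for illustration; they are not taken from the study.

```python
def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall, from raw prediction counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 30 essential genes found, 70 false alarms, 20 missed.
# Precision is 0.3 and recall is 0.6, giving an F1 of about 0.4.
score = f1_score(tp=30, fp=70, fn=20)
```

Unlike ROC-AUC, precision does not reward the large number of correctly rejected non-essential genes, which is one way a model on imbalanced data can pair a high ROC-AUC with a modest F1 score or PR-AUC.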

8.
Int J Genomics ; 2019: 1750291, 2019.
Article in English | MEDLINE | ID: mdl-31662957

ABSTRACT

Plasmodium falciparum, a malaria pathogen, has shown substantial resistance to treatment coupled with poor response to some vaccines, thereby requiring an urgent, holistic, and broad approach to prevent this endemic disease. Understanding the biology of the malaria parasite has been identified as a vital approach to overcome the threat of malaria. This study is aimed at identifying essential proteins unique to malaria parasites using a reconstructed iPfa genome-scale metabolic model (GEM) of the 3D7 strain of Plasmodium falciparum by filling gaps in the model with nineteen (19) metabolites and twenty-three (23) reactions obtained from the MetaCyc database. Twenty (20) currency metabolites were removed from the network because they have been identified to produce shortcuts that are biologically infeasible. The resulting modified iPfa GEM was modeled using the k-shortest path algorithm to identify possible alternative metabolic pathways in the glycolysis and pentose phosphate pathways of Plasmodium falciparum. A heuristic function was introduced for the optimal performance of the algorithm. To validate the prediction, the essentiality of the reactions in the reconstructed network was evaluated using the betweenness centrality measure, which was applied to every reaction within the pathways considered in this study. Thirty-two (32) essential reactions were predicted, among which our method validated fourteen (14) enzymes already predicted in the literature. The enzymatic proteins that catalyze these essential reactions were checked for homology with the host genome, and two (2) showed insignificant similarity, making them possible drug targets. In conclusion, the application of the intelligent search technique to the metabolic network of P. falciparum predicts potential biologically relevant alternative pathways using a graph theory-based approach.
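
The betweenness centrality measure used in the abstract to rank reactions can be illustrated with Brandes' algorithm on a small graph. This is a sketch under stated assumptions: the graph is treated as unweighted and undirected, the node names are hypothetical, and nothing here reproduces the iPfa model or the authors' k-shortest path search.

```python
from collections import deque


def betweenness_centrality(graph):
    """Brandes' algorithm for an unweighted, undirected graph.

    `graph` maps each node to a list of its neighbours. Scores are
    unnormalised: the number of shortest paths passing through a node.
    """
    scores = {v: 0.0 for v in graph}
    for source in graph:
        # Breadth-first search that counts shortest paths from `source`.
        stack = []
        predecessors = {v: [] for v in graph}
        n_paths = {v: 0 for v in graph}
        n_paths[source] = 1
        distance = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if w not in distance:
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:
                    n_paths[w] += n_paths[v]
                    predecessors[w].append(v)
        # Walk back from the farthest nodes, accumulating dependencies.
        dependency = {v: 0.0 for v in graph}
        while stack:
            w = stack.pop()
            for v in predecessors[w]:
                dependency[v] += n_paths[v] / n_paths[w] * (1 + dependency[w])
            if w != source:
                scores[w] += dependency[w]
    # Each undirected pair was counted once from each endpoint.
    return {v: s / 2 for v, s in scores.items()}


# Hypothetical linear pathway of four reactions: r1 - r2 - r3 - r4.
toy = {"r1": ["r2"], "r2": ["r1", "r3"], "r3": ["r2", "r4"], "r4": ["r3"]}
centrality = betweenness_centrality(toy)
```

In this toy pathway the two inner reactions lie on every shortest path between the ends, so they receive the highest scores; removing such a node would disconnect the pathway, which is the intuition behind using the measure as a proxy for essentiality.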

9.
Biomed Res Int ; 2018: 8985718, 2018.
Article in English | MEDLINE | ID: mdl-29789805

ABSTRACT

Malaria is an infectious disease that affects close to half a million individuals every year and Plasmodium falciparum is a major cause of malaria. This disease could be treated effectively if the essential enzymes of the parasite were specifically targeted. Nevertheless, the parasite's growing resistance to existing drugs makes the discovery of new drugs a priority. In this study, a novel computational model that makes the prediction of new and validated antimalarial drug targets cheaper, easier, and faster has been developed. We have identified new essential reactions as potential targets for drugs in the metabolic network of the parasite. Among the top seven (7) predicted essential reactions, four (4) have been identified in earlier studies with biological evidence and one (1) has been identified with computational evidence. The results from our study were compared with an extensive list of seventy-seven (77) essential reactions with biological evidence from a previous study. We present a list of thirty-one (31) potential candidates for drug targets in Plasmodium falciparum, which includes twenty-four (24) new potential candidates.


Subjects
Antimalarials/pharmacokinetics; Drug Discovery/methods; Malaria, Falciparum; Metabolome; Models, Biological; Plasmodium falciparum/metabolism; Antimalarials/therapeutic use; Humans; Malaria, Falciparum/drug therapy; Malaria, Falciparum/metabolism
10.
Bioinform Biol Insights ; 10: 237-253, 2016.
Article in English | MEDLINE | ID: mdl-27932867

ABSTRACT

Gene expression data hide vital information required to understand the biological processes that take place in a particular organism in relation to its environment. Deciphering the hidden patterns in gene expression data offers a tremendous opportunity to strengthen the understanding of functional genomics. The complexity of biological networks and the number of genes present increase the challenges of comprehending and interpreting the resulting mass of data, which consists of millions of measurements; these data also exhibit vagueness, imprecision, and noise. Therefore, the use of clustering techniques is a first step toward addressing these challenges, and it is essential in the data mining process to reveal natural structures and identify interesting patterns in the underlying data. The clustering of gene expression data has been proven to be useful in making known the natural structure inherent in gene expression data, understanding gene functions, cellular processes, and subtypes of cells, mining useful information from noisy data, and understanding gene regulation. The other benefit of clustering gene expression data is the identification of homology, which is very important in vaccine design. This review examines the various clustering algorithms applicable to gene expression data in order to discover and provide useful knowledge of the appropriate clustering technique that will guarantee stability and a high degree of accuracy in its analysis procedure.

...