1.
Comput Biol Med ; 173: 108339, 2024 May.
Article in English | MEDLINE | ID: mdl-38547658

ABSTRACT

The application of Artificial Intelligence (AI) to screen drug molecules with potential therapeutic effects has revolutionized the drug discovery process, with significantly lower economic cost and time consumption than the traditional drug discovery pipeline. With the great power of AI, it is possible to rapidly search the vast chemical space for potential drug-target interactions (DTIs) between candidate drug molecules and disease protein targets. However, only a small proportion of molecules have labelled DTIs, consequently limiting the performance of AI-based drug screening. To solve this problem, a machine learning-based approach with great ability to generalize DTI prediction across molecules is desirable. Many existing machine learning approaches for DTI identification failed to exploit the full information with respect to the topological structures of candidate molecules. To develop a better approach for DTI prediction, we propose GraphormerDTI, which employs the powerful Graph Transformer neural network to model molecular structures. GraphormerDTI embeds molecular graphs into vector-format representations through iterative Transformer-based message passing, which encodes molecules' structural characteristics by node centrality encoding, node spatial encoding and edge encoding. With a strong structural inductive bias, the proposed GraphormerDTI approach can effectively infer informative representations for out-of-sample molecules and as such, it is capable of predicting DTIs across molecules with an exceptional performance. GraphormerDTI integrates the Graph Transformer neural network with a 1-dimensional Convolutional Neural Network (1D-CNN) to extract the drugs' and target proteins' representations and leverages an attention mechanism to model the interactions between them. 
To examine GraphormerDTI's performance for DTI prediction, we conduct experiments on three benchmark datasets, where GraphormerDTI outperforms five state-of-the-art baselines for out-of-molecule DTI prediction (GNN-CPI, GNN-PT, DeepEmbedding-DTI, MolTrans and HyperAttentionDTI) and is on a par with the best baseline for transductive DTI prediction. The source code and datasets are publicly accessible at https://github.com/mengmeng34/GraphormerDTI.


Subjects
Artificial Intelligence , Drug Discovery , Drug Evaluation, Preclinical , Neural Networks, Computer , Benchmarking
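
The structural encodings named in the abstract above can be illustrated on a toy graph. The sketch below assumes, following the general Graphormer formulation rather than anything stated in this abstract, that centrality encoding is indexed by node degree and spatial encoding by the shortest-path distance between atom pairs. The function names and the three-atom example are hypothetical and are not taken from the GraphormerDTI repository.

```python
from collections import deque

def degrees(adjacency):
    """Node degree per atom; assumed here to index the centrality encoding."""
    return [len(neighbours) for neighbours in adjacency]

def shortest_path_distances(adjacency):
    """All-pairs shortest-path distances (in bonds) via breadth-first search.

    Unreachable pairs keep the value -1. These distances are assumed here
    to index the spatial encoding.
    """
    n = len(adjacency)
    dist = [[-1] * n for _ in range(n)]
    for source in range(n):
        dist[source][source] = 0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if dist[source][v] == -1:
                    dist[source][v] = dist[source][u] + 1
                    queue.append(v)
    return dist

# Hypothetical toy molecule: heavy atoms of ethanol, C0 - C1 - O2.
ethanol = [[1], [0, 2], [1]]
print(degrees(ethanol))
print(shortest_path_distances(ethanol))
```

In a full model these integers would typically be used as indices into learned embedding tables; that step is omitted here.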
2.
Brief Bioinform ; 24(6)2023 09 22.
Article in English | MEDLINE | ID: mdl-37874948

ABSTRACT

Proteases contribute to a broad spectrum of cellular functions. Given a relatively limited amount of experimental data, developing accurate sequence-based predictors of substrate cleavage sites facilitates a better understanding of protease functions and substrate specificity. While many protease-specific predictors of substrate cleavage sites were developed, these efforts are outpaced by the growth of the protease substrate cleavage data. In particular, since data for 100+ protease types are available and this number continues to grow, it becomes impractical to publish predictors for new protease types, and instead it might be better to provide a computational platform that helps users to quickly and efficiently build predictors that address their specific needs. To this end, we conceptualized, developed, tested and released a versatile bioinformatics platform, ProsperousPlus, that empowers users, even those with no programming or little bioinformatics background, to build fast and accurate predictors of substrate cleavage sites. ProsperousPlus facilitates the use of the rapidly accumulating substrate cleavage data to train, empirically assess and deploy predictive models for user-selected substrate types. Benchmarking tests on test datasets show that our platform produces predictors that on average exceed the predictive performance of current state-of-the-art approaches. ProsperousPlus is available as a webserver and a stand-alone software package at http://prosperousplus.unimelb-biotools.cloud.edu.au/.


Subjects
Machine Learning , Peptide Hydrolases , Peptide Hydrolases/metabolism , Substrate Specificity , Algorithms
3.
J Biomed Inform ; 147: 104509, 2023 11.
Article in English | MEDLINE | ID: mdl-37827477

ABSTRACT

The adoption of electronic health records (EHRs) has created opportunities to analyse historical data for predicting clinical outcomes and improving patient care. However, non-standardised data representations and anomalies pose major challenges to the use of EHRs in digital health research. To address these challenges, we have developed EHR-QC, a tool comprising two modules: the data standardisation module and the preprocessing module. The data standardisation module migrates source EHR data to a standard format using advanced concept mapping techniques, surpassing expert curation in benchmarking analysis. The preprocessing module includes several functions designed specifically to handle healthcare data subtleties. We provide automated detection of data anomalies and solutions to handle those anomalies. We believe that the development and adoption of tools like EHR-QC is critical for advancing digital health. Our ultimate goal is to accelerate clinical research by enabling rapid experimentation with data-driven observational research to generate robust, generalisable biomedical knowledge.


Subjects
Benchmarking , Electronic Health Records , Humans , Empirical Research , Research Design
4.
Brief Bioinform ; 24(4)2023 07 20.
Article in English | MEDLINE | ID: mdl-37291763

ABSTRACT

BACKGROUND: Promoters are DNA regions that initiate the transcription of specific genes near the transcription start sites. In bacteria, promoters are recognized by RNA polymerases and associated sigma factors. Effective promoter recognition is essential for synthesizing the gene-encoded products by bacteria to grow and adapt to different environmental conditions. A variety of machine learning-based predictors for bacterial promoters have been developed; however, most of them were designed specifically for a particular species. To date, only a few predictors are available for identifying general bacterial promoters with limited predictive performance. RESULTS: In this study, we developed TIMER, a Siamese neural network-based approach for identifying both general and species-specific bacterial promoters. Specifically, TIMER uses DNA sequences as the input and employs three Siamese neural networks with the attention layers to train and optimize the models for a total of 13 species-specific and general bacterial promoters. Extensive 10-fold cross-validation and independent tests demonstrated that TIMER achieves a competitive performance and outperforms several existing methods on both general and species-specific promoter prediction. As an implementation of the proposed method, the web server of TIMER is publicly accessible at http://web.unimelb-bioinfortools.cloud.edu.au/TIMER/.


Subjects
Bacteria , Neural Networks, Computer , Bacteria/genetics , Bacteria/metabolism , DNA-Directed RNA Polymerases/genetics , DNA-Directed RNA Polymerases/metabolism , Base Sequence , Promoter Regions, Genetic
5.
Br J Pharmacol ; 2023 May 10.
Article in English | MEDLINE | ID: mdl-37161878

ABSTRACT

The application of artificial intelligence (AI) approaches to drug discovery for G protein-coupled receptors (GPCRs) is a rapidly expanding area. Artificial intelligence can be used at multiple stages during the drug discovery process, from aiding our understanding of the fundamental actions of GPCRs to the discovery of new ligand-GPCR interactions or the prediction of clinical responses. Here, we provide an overview of the concepts behind artificial intelligence, including the subfields of machine learning and deep learning. We summarise the published applications of artificial intelligence to different stages of the GPCR drug discovery process. Finally, we reflect on the benefits and limitations of artificial intelligence and share our vision for the exciting potential for further development of applications to aid GPCR drug discovery. In addition to making the drug discovery process "faster, smarter and cheaper," we anticipate that the application of artificial intelligence will create exciting new opportunities for GPCR drug discovery.

6.
Bioinformatics ; 39(3)2023 03 01.
Article in English | MEDLINE | ID: mdl-36794913

ABSTRACT

MOTIVATION: The rapid accumulation of high-throughput sequence data demands the development of effective and efficient data-driven computational methods to functionally annotate proteins. However, most current approaches used for functional annotation simply focus on the use of protein-level information but ignore inter-relationships among annotations. RESULTS: Here, we established PFresGO, an attention-based deep-learning approach that incorporates hierarchical structures in Gene Ontology (GO) graphs and advances in natural language processing algorithms for the functional annotation of proteins. PFresGO employs a self-attention operation to capture the inter-relationships of GO terms, updates its embedding accordingly and uses a cross-attention operation to project protein representations and GO embedding into a common latent space to identify global protein sequence patterns and local functional residues. We demonstrate that PFresGO consistently achieves superior performance across GO categories when compared with 'state-of-the-art' methods. Importantly, we show that PFresGO can identify functionally important residues in protein sequences by assessing the distribution of attention weightings. PFresGO should serve as an effective tool for the accurate functional annotation of proteins and functional domains within proteins. AVAILABILITY AND IMPLEMENTATION: PFresGO is available for academic purposes at https://github.com/BioColLab/PFresGO. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Deep Learning , Molecular Sequence Annotation , Gene Ontology , Computational Biology/methods , Algorithms , Proteins/metabolism
7.
Br J Clin Pharmacol ; 89(2): 914-920, 2023 02.
Article in English | MEDLINE | ID: mdl-36301837

ABSTRACT

The COVID-19 pandemic has disrupted seeking and delivery of healthcare. Different Australian jurisdictions implemented different COVID-19 restrictions. We used Australian national pharmacy dispensing data to conduct interrupted time series analyses to examine the incidence and prevalence of opioid dispensing in different jurisdictions. Following nationwide COVID-19 restrictions, the incidence dropped by -0.40 (95% confidence interval [CI]: -0.50, -0.31), -0.33 (95% CI: -0.46, -0.21) and -0.21 (95% CI: -0.37, -0.04) per 1000 people per week and the prevalence dropped by -0.85 (95% CI: -1.39, -0.31), -0.54 (95% CI: -1.01, -0.07) and -0.62 (95% CI: -0.99, -0.25) per 1000 people per week in Victoria, New South Wales and other jurisdictions, respectively. Incidence and prevalence increased by 0.29 (95% CI: 0.13, 0.44) and 0.72 (95% CI: 0.11, 1.33) per 1000 people per week, respectively in Victoria post-lockdown; no significant changes were observed in other jurisdictions. No significant changes were observed in the initiation of long-term opioid use in any jurisdictions. More stringent restrictions coincided with more pronounced reductions in overall opioid initiation, but initiation of long-term opioid use did not change.


Subjects
COVID-19 , Opioid-Related Disorders , Humans , Analgesics, Opioid/therapeutic use , Australia/epidemiology , Prevalence , Incidence , Pandemics , COVID-19/epidemiology , Communicable Disease Control , Opioid-Related Disorders/epidemiology , Opioid-Related Disorders/prevention & control , Opioid-Related Disorders/drug therapy , Drug Prescriptions
8.
Curr Probl Cardiol ; 48(4): 101576, 2023 Apr.
Article in English | MEDLINE | ID: mdl-36586705

ABSTRACT

COVID-19 restrictions may have an unintended consequence of limiting access to cardiovascular care. Australia implemented adaptive interventions (eg, telehealth consultations, digital image prescriptions, continued dispensing, medication delivery) to maintain medication access. This study investigated whether COVID-19 restrictions in different jurisdictions coincided with changes in statin incidence, prevalence and adherence. Analysis of a 10% random sample of national medication claims data from January 2018 to December 2020 was conducted across 3 Australian jurisdictions. Weekly incidence and prevalence were estimated by dividing the number of statin initiations and of any statin dispensing by the Australian population aged 18-99 years. Statin adherence was analyzed across the jurisdictions and years, with adherence categorized as <40%, 40%-79% and ≥80% based on dispensing per calendar year. Overall, 309,123, 315,703 and 324,906 people were dispensed and 39,029, 39,816, and 44,979 initiated statins in 2018, 2019, and 2020 respectively. Two waves of COVID-19 restrictions in 2020 coincided with no meaningful change in statin incidence or prevalence per week when compared to 2018 and 2019. Incidence increased 0.3% from 23.7 to 26.2 per 1000 people across jurisdictions in 2020 compared to 2019. Prevalence increased 0.14% from 158.5 to 159.9 per 1000 people across jurisdictions in 2020 compared to 2019. The proportion of adults with ≥80% adherence increased by 3.3% in Victoria, 1.4% in NSW and 1.8% in other states and territories between 2019 and 2020. COVID-19 restrictions did not coincide with meaningful changes in the incidence or prevalence of, or adherence to, statins, suggesting adaptive interventions succeeded in maintaining access to cardiovascular medications.


Subjects
COVID-19 , Hydroxymethylglutaryl-CoA Reductase Inhibitors , Adult , Humans , Hydroxymethylglutaryl-CoA Reductase Inhibitors/therapeutic use , Incidence , Prevalence , Australia
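
The rate and adherence definitions in the abstract above reduce to simple arithmetic, sketched below. The function names are hypothetical, the example counts are invented for illustration only, and the handling of values between 79% and 80% (assigned to the middle band) is an assumption, since the abstract does not state how such values were treated.

```python
def rate_per_1000(events: int, population: int) -> float:
    """Events (initiations or people dispensed) per 1000 population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return 1000.0 * events / population

def adherence_category(percent: float) -> str:
    """Map an adherence percentage to the three bands named in the abstract.

    Assumption: values in [79, 80) fall in the middle band.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    if percent < 40:
        return "<40%"
    if percent < 80:
        return "40%-79%"
    return ">=80%"

# Hypothetical week: 50 initiations in a population of 100,000 adults.
print(rate_per_1000(50, 100_000))
print(adherence_category(85.0))
```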
9.
Brief Bioinform ; 23(6)2022 11 19.
Article in English | MEDLINE | ID: mdl-36341591

ABSTRACT

Subcellular localization of messenger RNAs (mRNAs) plays a key role in the spatial regulation of gene activity. The functions of mRNAs have been shown to be closely linked with their localizations. As such, understanding of the subcellular localizations of mRNAs can help elucidate gene regulatory networks. Although several computational methods have been developed to predict mRNA localizations within cells, there is still much room for improvement in predictive performance, especially for multiple-location prediction. In this study, we proposed a novel multi-label multi-class predictor, termed Clarion, for mRNA subcellular localization prediction. Clarion was developed based on a manually curated benchmark dataset and leveraged the weighted series method for multi-label transformation. Extensive benchmarking tests demonstrated that Clarion achieved competitive predictive performance and that the weighted series method plays a crucial role in securing the superior performance of Clarion. In addition, the independent test results indicate that Clarion outperformed the state-of-the-art methods and can secure accuracy of 81.47, 91.29, 79.77, 92.10, 89.15, 83.74, 80.74, 79.23 and 84.74% for chromatin, cytoplasm, cytosol, exosome, membrane, nucleolus, nucleoplasm, nucleus and ribosome, respectively. The webserver and local stand-alone tool of Clarion are freely available at http://monash.bioweb.cloud.edu.au/Clarion/.


Subjects
Cell Nucleus , Proteins , RNA, Messenger/genetics , Cell Nucleus/genetics , Computational Biology/methods , Databases, Protein
10.
J Chem Inf Model ; 62(17): 4270-4282, 2022 09 12.
Article in English | MEDLINE | ID: mdl-35973091

ABSTRACT

An essential step in engineering proteins and understanding disease-causing missense mutations is to accurately model protein stability changes when such mutations occur. Here, we developed a new sequence-based predictor for the protein stability (PROST) change (Gibbs free energy change, ΔΔG) upon a single-point missense mutation. PROST extracts multiple descriptors from the most promising sequence-based predictors, such as BoostDDG, SAAFEC-SEQ, and DDGun. PROST also extracts descriptors from iFeature and AlphaFold2. The extracted descriptors include sequence-based features, physicochemical properties, evolutionary information, evolutionary-based physicochemical properties, and predicted structural features. The PROST predictor is a weighted average ensemble model based on extreme gradient boosting (XGBoost) decision trees and an extra-trees regressor; PROST is trained on both direct and hypothetical reverse mutations using the S5294 data set (S2647 direct mutations + S2647 inverse mutations). The parameters for the PROST model are optimized using grid searching with 5-fold cross-validation, and feature importance analysis unveils the most relevant features. The performance of PROST is evaluated in a blinded manner, employing nine distinct data sets and existing state-of-the-art sequence-based and structure-based predictors. This method consistently performs well on frataxin, S217, S349, Ssym, S669, Myoglobin, and CAGI5 data sets in blind tests and similarly to the state-of-the-art predictors for p53 and S276 data sets. When the performance of PROST is compared with the latest predictors such as BoostDDG, SAAFEC-SEQ, ACDC-NN-seq, and DDGun, PROST outperforms these predictors. A case study of mutation scanning of the frataxin protein for nine wild-type residues demonstrates the utility of PROST. Taken together, these findings indicate that PROST is a well-suited predictor when no protein structural information is available. The source code of PROST, data sets, examples, and pretrained models along with how to use PROST are available at https://github.com/ShahidIqb/PROST and https://prost.erc.monash.edu/seq.


Subjects
Mutation, Missense , Zygote Intrafallopian Transfer , Protein Stability , Proteins/chemistry , Software
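
Two ideas from the abstract above, training on direct plus hypothetical reverse mutations and combining two regressors by a weighted average, can be sketched in a few lines. The sketch assumes the usual antisymmetry convention, ΔΔG(reverse) = -ΔΔG(direct), which the abstract does not state explicitly. The `Mutation` record, the function names and the example values are hypothetical and are not taken from the PROST code base.

```python
from typing import List, NamedTuple

class Mutation(NamedTuple):
    wild_type: str   # one-letter residue before mutation
    position: int    # residue position in the sequence
    mutant: str      # one-letter residue after mutation
    ddg: float       # stability change (sign convention as in the data set)

def with_reverse_mutations(direct: List[Mutation]) -> List[Mutation]:
    """Double a data set by appending the hypothetical reverse of each mutation.

    Assumption: the reverse mutation has the opposite-signed ddG.
    """
    reverse = [Mutation(m.mutant, m.position, m.wild_type, -m.ddg) for m in direct]
    return list(direct) + reverse

def weighted_average(pred_a: List[float], pred_b: List[float], weight_a: float) -> List[float]:
    """Blend two models' predictions; weight_a applies to the first model."""
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must be in [0, 1]")
    if len(pred_a) != len(pred_b):
        raise ValueError("prediction lists must have equal length")
    return [weight_a * a + (1.0 - weight_a) * b for a, b in zip(pred_a, pred_b)]

# Hypothetical example values.
augmented = with_reverse_mutations([Mutation("A", 10, "G", -1.5)])
blended = weighted_average([1.0, 2.0], [3.0, 4.0], weight_a=0.5)
```

Applied to a set of 2647 direct mutations, the augmentation would yield 5294 entries, which matches the data set sizes quoted in the abstract.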
11.
Bioinformatics ; 38(17): 4206-4213, 2022 09 02.
Article in English | MEDLINE | ID: mdl-35801909

ABSTRACT

MOTIVATION: The molecular subtyping of gastric cancer (adenocarcinoma) into four main subtypes based on integrated multiomics profiles, as proposed by The Cancer Genome Atlas (TCGA) initiative, represents an effective strategy for patient stratification. However, this approach requires the use of multiple technological platforms, and is quite expensive and time-consuming to perform. A computational approach that uses histopathological image data to infer molecular subtypes could be a practical, cost- and time-efficient complementary tool for prognostic and clinical management purposes. RESULTS: Here, we propose a deep learning ensemble approach (called DEMoS) capable of predicting the four recognized molecular subtypes of gastric cancer directly from histopathological images. DEMoS achieved tile-level area under the receiver-operating characteristic curve (AUROC) values of 0.785, 0.668, 0.762 and 0.811 for the prediction of these four subtypes of gastric cancer [i.e. (i) Epstein-Barr (EBV)-infected, (ii) microsatellite instability (MSI), (iii) genomically stable (GS) and (iv) chromosomally unstable tumors (CIN)] using an independent test dataset, respectively. At the patient-level, it achieved AUROC values of 0.897, 0.764, 0.890 and 0.898, respectively. Thus, these four subtypes are well-predicted by DEMoS. Benchmarking experiments further suggest that DEMoS is able to achieve an improved classification performance for image-based subtyping and prevent model overfitting. This study highlights the feasibility of using a deep learning ensemble-based method to rapidly and reliably subtype gastric cancer (adenocarcinoma) solely using features from histopathological images. AVAILABILITY AND IMPLEMENTATION: All whole slide images used in this study were collected from the TCGA database. This study builds upon our previously published HEAL framework, with related documentation and tutorials available at http://heal.erc.monash.edu.au. The source code and related models are freely accessible at https://github.com/Docurdt/DEMoS.git. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Adenocarcinoma , Deep Learning , Stomach Neoplasms , Humans , Stomach Neoplasms/diagnostic imaging , Stomach Neoplasms/genetics , Adenocarcinoma/diagnostic imaging , Adenocarcinoma/genetics , Microsatellite Instability
12.
Methods Mol Biol ; 2499: 205-219, 2022.
Article in English | MEDLINE | ID: mdl-35696083

ABSTRACT

Among various types of protein post-translational modifications (PTMs), lysine PTMs play an important role in regulating a wide range of functions and biological processes. Due to the generation and accumulation of an enormous amount of protein sequence data by ongoing whole-genome sequencing projects, systematic identification of different types of lysine PTM substrates and their specific PTM sites in the entire proteome is increasingly important and has therefore received much attention. Accordingly, a variety of computational methods for lysine PTM identification have been developed based on the combination of various handcrafted sequence features and machine-learning techniques. In this chapter, we first briefly review existing computational methods for lysine PTM identification and then introduce a recently developed deep learning-based method, termed MUscADEL (Multiple Scalable Accurate Deep Learner for lysine PTMs). Specifically, MUscADEL employs bidirectional long short-term memory (BiLSTM) recurrent neural networks and is capable of predicting eight major types of lysine PTMs in both the human and mouse proteomes. The web server of MUscADEL is publicly available at http://muscadel.erc.monash.edu/ for the research community to use.


Subjects
Lysine , Protein Processing, Post-Translational , Amino Acid Sequence , Animals , Lysine/metabolism , Machine Learning , Mice , Proteome/metabolism
13.
NPJ Precis Oncol ; 6(1): 45, 2022 Jun 23.
Article in English | MEDLINE | ID: mdl-35739342

ABSTRACT

Gastric cancer is one of the deadliest cancers worldwide. An accurate prognosis is essential for effective clinical assessment and treatment. Spatial patterns in the tumor microenvironment (TME) are conceptually indicative of the staging and progression of gastric cancer patients. Using spatial patterns of the TME by integrating and transforming the multiplexed immunohistochemistry (mIHC) images as Cell-Graphs, we propose a graph neural network-based approach, termed Cell-Graph Signature or CGSignature, powered by artificial intelligence, for the digital staging of TME and precise prediction of patient survival in gastric cancer. In this study, patient survival prediction is formulated as either a binary (short-term and long-term) or ternary (short-term, medium-term, and long-term) classification task. Extensive benchmarking experiments demonstrate that the CGSignature achieves outstanding model performance, with Area Under the Receiver Operating Characteristic curve of 0.960 ± 0.01, and 0.771 ± 0.024 to 0.904 ± 0.012 for the binary- and ternary-classification, respectively. Moreover, Kaplan-Meier survival analysis indicates that the "digital grade" cancer staging produced by CGSignature provides a remarkable capability in discriminating both binary and ternary classes with statistical significance (P value < 0.0001), significantly outperforming the AJCC 8th edition Tumor Node Metastasis staging system. Using Cell-Graphs extracted from mIHC images, CGSignature improves the assessment of the link between the TME spatial patterns and patient prognosis. Our study suggests the feasibility and benefits of such an artificial intelligence-powered digital staging system in diagnostic pathology and precision oncology.

14.
Brief Bioinform ; 23(2)2022 03 10.
Article in English | MEDLINE | ID: mdl-35176756

ABSTRACT

Protein secretion has a pivotal role in many biological processes and is particularly important for intercellular communication, from the cytoplasm to the host or external environment. Gram-positive bacteria can secrete proteins through multiple secretion pathways. Among these secretion pathways, the non-classical secretion pathway has recently received increasing attention, but its exact mechanism remains unclear. Non-classical secreted proteins (NCSPs) are a class of secreted proteins lacking signal peptides and motifs. Several NCSP predictors have been proposed to identify NCSPs and most of them employed the whole amino acid sequence of NCSPs to construct the model. However, the sequence length of different proteins varies greatly. In addition, not all regions of the protein are equally important and some local regions are not relevant to the secretion. The functional regions of the protein, particularly in the N- and C-terminal regions, contain important determinants for secretion. In this study, we propose a new hybrid deep learning-based framework, referred to as ASPIRER, which improves the prediction of NCSPs from amino acid sequences. More specifically, it combines a whole sequence-based XGBoost model and an N-terminal sequence-based convolutional neural network model; 5-fold cross-validation and independent tests demonstrate that ASPIRER outperforms existing state-of-the-art approaches. The source code and curated datasets of ASPIRER are publicly available at https://github.com/yanwu20/ASPIRER/. ASPIRER is anticipated to be a useful tool for improved prediction of novel putative NCSPs from sequence information and prioritization of candidate proteins for follow-up experimental validation.


Subjects
Deep Learning , Amino Acid Sequence , Computational Biology , Neural Networks, Computer , Proteins/chemistry , Software
15.
Brief Bioinform ; 23(2)2022 03 10.
Article in English | MEDLINE | ID: mdl-35021193

ABSTRACT

Promoters are crucial regulatory DNA regions for gene transcriptional activation. Rapid advances in next-generation sequencing technologies have accelerated the accumulation of genome sequences, providing increased training data to inform computational approaches for both prokaryotic and eukaryotic promoter prediction. However, it remains a significant challenge to accurately identify species-specific promoter sequences using computational approaches. To advance computational support for promoter prediction, in this study, we curated 58 comprehensive, up-to-date, benchmark datasets for 7 different species (i.e. Escherichia coli, Bacillus subtilis, Homo sapiens, Mus musculus, Arabidopsis thaliana, Zea mays and Drosophila melanogaster) to assist the research community to assess the relative functionality of alternative approaches and support future research on both prokaryotic and eukaryotic promoters. We revisited 106 predictors published since 2000 for promoter identification (40 for prokaryotic promoter, 61 for eukaryotic promoter, and 5 for both). We systematically evaluated their training datasets, computational methodologies, calculated features, performance and software usability. On the basis of these benchmark datasets, we benchmarked 19 predictors with functioning webservers/local tools and assessed their prediction performance. We found that deep learning and traditional machine learning-based approaches generally outperformed scoring function-based approaches. Taken together, the curated benchmark dataset repository and the benchmarking analysis in this study serve to inform the design and implementation of computational approaches for promoter prediction and facilitate more rigorous comparison of new techniques in the future.


Subjects
Drosophila melanogaster , Eukaryota , Animals , Computational Biology/methods , Drosophila melanogaster/genetics , Eukaryotic Cells , Mice , Prokaryotic Cells , Promoter Regions, Genetic
16.
Brief Bioinform ; 23(1)2022 01 17.
Article in English | MEDLINE | ID: mdl-34729589

ABSTRACT

Conventional supervised binary classification algorithms have been widely applied to address significant research questions using biological and biomedical data. This classification scheme requires two fully labeled classes of data (e.g. positive and negative samples) to train a classification model. However, in many bioinformatics applications, labeling data is laborious, and the negative samples might be mislabeled due to the limited sensitivity of the experimental equipment. The positive unlabeled (PU) learning scheme was therefore proposed to enable the classifier to learn directly from limited positive samples and a large number of unlabeled samples (i.e. a mixture of positive and negative samples). To date, several PU learning algorithms have been developed to address various biological questions, such as sequence identification, functional site characterization and interaction prediction. In this paper, we revisit a collection of 29 state-of-the-art PU learning applications in bioinformatics. Various important aspects are extensively discussed, including PU learning methodology, biological application, classifier design and evaluation strategy. We also comment on the existing issues of PU learning and offer our perspectives for the future development of PU learning applications. We anticipate that our work will serve as an instrumental guideline for a better understanding of the PU learning framework in bioinformatics and for further developing next-generation PU learning frameworks for critical biological applications.


Subjects
Algorithms , Computational Biology , Computational Biology/methods , Supervised Machine Learning
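
To make the PU setting described in the abstract above concrete, the sketch below shows one well-known PU technique, the Elkan-Noto probability correction. It is offered only as an illustration; the abstract does not say which of the reviewed applications, if any, use it. The sketch assumes that a separate classifier has already been trained to separate labeled from unlabeled samples and that its output scores are available; the function names and example scores are hypothetical.

```python
from typing import List

def estimate_label_frequency(scores_on_labeled_positives: List[float]) -> float:
    """Estimate c = P(labeled | positive) as the mean classifier score
    over held-out samples that are known (labeled) positives."""
    if not scores_on_labeled_positives:
        raise ValueError("need at least one labeled positive")
    return sum(scores_on_labeled_positives) / len(scores_on_labeled_positives)

def corrected_probabilities(scores: List[float], c: float) -> List[float]:
    """Convert P(labeled | x) into an estimate of P(positive | x) by
    dividing by c, clipped to 1.0."""
    if not 0.0 < c <= 1.0:
        raise ValueError("c must be in (0, 1]")
    return [min(1.0, s / c) for s in scores]

# Hypothetical scores from a labeled-vs-unlabeled classifier.
c = estimate_label_frequency([0.4, 0.6])
positive_probabilities = corrected_probabilities([0.1, 0.25, 0.9], c)
```

The correction relies on the assumption that labeled positives are a random sample of all positives, which may not hold for experimentally biased biological data.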
17.
Bioinformatics ; 37(21): 3986-3988, 2021 11 05.
Article in English | MEDLINE | ID: mdl-34061168

ABSTRACT

MOTIVATION: Tumor tile selection is a necessary prerequisite in patch-based cancer whole slide image analysis, which is labor-intensive and requires expertise. Whole slides are annotated as tumor or tumor free, but tiles within a tumor slide are not. As all tiles within a tumor free slide are tumor free, these can be used to capture tumor-free patterns using the one-class learning strategy. RESULTS: We present a Python package, termed OCTID, which combines a pretrained convolutional neural network (CNN) model, Uniform Manifold Approximation and Projection (UMAP) and one-class support vector machine to achieve accurate tumor tile classification using a training set of tumor free tiles. Benchmarking experiments on four H&E image datasets achieved remarkable performance in terms of F1-score (0.90 ± 0.06), Matthews correlation coefficient (0.93 ± 0.05) and accuracy (0.94 ± 0.03). AVAILABILITY AND IMPLEMENTATION: Detailed information can be found in the Supplementary File. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Image Processing, Computer-Assisted , Neoplasms , Neural Networks, Computer , Programming Languages , Neoplasms/diagnostic imaging , Humans , Image Processing, Computer-Assisted/methods , Machine Learning , Datasets as Topic
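
The three metrics reported in the abstract above (F1-score, Matthews correlation coefficient and accuracy) can all be computed from a binary confusion matrix. The sketch below uses the standard definitions; the function name and the example counts are hypothetical and are not taken from the OCTID package or its benchmark results.

```python
import math

def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy, F1-score and Matthews correlation coefficient (MCC)
    from true/false positive/negative counts."""
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    accuracy = (tp + tn) / total
    f1_denominator = 2 * tp + fp + fn
    f1 = 2 * tp / f1_denominator if f1_denominator else 0.0
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / mcc_denominator if mcc_denominator else 0.0
    return {"accuracy": accuracy, "f1": f1, "mcc": mcc}

# Hypothetical tile-level counts: tumor tiles are the positive class.
metrics = classification_metrics(tp=45, fp=5, fn=5, tn=45)
```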
18.
Brief Bioinform ; 22(6)2021 11 05.
Article in English | MEDLINE | ID: mdl-34058752

ABSTRACT

Understanding how a mutation might affect protein stability is of significant importance to protein engineering and to understanding protein evolution and genetic diseases. While a number of computational tools have been developed to predict the effect of missense mutations on protein stability, they are known to exhibit large biases, imparted in part by the data used to train and evaluate them. Here, we provide a comprehensive overview of predictive tools, which have provided evolving insight into the importance and relevance of features that can discern the effects of mutations on protein stability. A diverse selection of these freely available tools was benchmarked using a large mutation-level blind dataset of 1342 experimentally characterised mutations across 130 proteins from ThermoMutDB, a second test dataset encompassing 630 experimentally characterised mutations across 39 proteins from iStable2.0 and a third blind test dataset consisting of 268 mutations in 27 proteins from the newly published ProThermDB. The performance of the methods was further evaluated with respect to the site of mutation, the type of mutant residue, and across ranges of pH and temperature. Additionally, classification performance was evaluated by classifying the mutations as stabilizing (∆∆G ≥ 0) or destabilizing (∆∆G < 0). The results reveal that the performance of the predictors is affected by the site of mutation and the type of mutant residue. Further, the results show very low performance for pH values 6-8 and temperatures higher than 65 for all predictors except iStable2.0 on the S630 dataset. To illustrate how stability and structure change upon single point mutation, we considered four stabilizing, two destabilizing and two stabilizing mutations from two proteins, namely the toxin protein and bovine liver cytochrome. Overall, the results on the S268, S630 and S1342 datasets show that the performance of the integrated predictors is better than that of the mechanistic or individual machine learning predictors. We expect that this paper will provide useful guidance for the design and development of next-generation bioinformatic tools for predicting protein stability changes upon mutations.
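
The sign convention quoted in the abstract (stabilizing when ∆∆G ≥ 0, destabilizing when ∆∆G < 0) can be illustrated with a minimal sketch. The function names and the sample values below are hypothetical and are not taken from the paper or its datasets; the accuracy measure is one simple choice and is not necessarily the metric the authors report.

```python
def classify_mutation(ddg: float) -> str:
    """Label a mutation by the sign of its ddG, per the abstract's convention."""
    return "stabilizing" if ddg >= 0 else "destabilizing"


def accuracy(predicted, measured):
    """Fraction of mutations whose predicted and measured labels agree."""
    pairs = list(zip(predicted, measured))
    hits = sum(classify_mutation(p) == classify_mutation(m) for p, m in pairs)
    return hits / len(pairs)


# Hypothetical ddG values for illustration only (not from any benchmark dataset)
measured = [1.2, -0.8, 0.0, -2.5]
predicted = [0.4, -1.1, -0.3, 0.9]

print(accuracy(predicted, measured))  # prints 0.5
```

Under this convention a value of exactly zero counts as stabilizing, so a predictor that returns a slightly negative value for a neutral mutation is scored as a misclassification.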


Subjects
Computational Biology/methods; Mutation, Missense; Protein Stability; Proteins/chemistry; Proteins/genetics; Software; Algorithms; Databases, Protein; Evolution, Molecular; Machine Learning; Models, Molecular; Protein Conformation; Proteins/metabolism; Reproducibility of Results; Structure-Activity Relationship
19.
Bioinformatics ; 37(22): 4291-4295, 2021 11 18.
Article in English | MEDLINE | ID: mdl-34009289

ABSTRACT

MOTIVATION: Digital pathology supports large-scale analysis of histopathological images using deep learning methods. However, applications of deep learning in this area have been limited by the complexity of configuring the computational environment and of hyperparameter optimization, which hinders deployment and reduces reproducibility. RESULTS: Here, we propose HEAL, a deep learning-based automated framework for easy, flexible and multi-faceted histopathological image analysis. We demonstrate its utility and functionality by performing two case studies on lung cancer and one on colon cancer. Leveraging the capability of Docker, HEAL represents an ideal end-to-end tool to conduct complex histopathological analysis and enables deep learning in a broad range of applications for cancer image analysis. AVAILABILITY AND IMPLEMENTATION: The Docker image of HEAL is available at https://hub.docker.com/r/docurdt/heal and related documentation and datasets are available at http://heal.erc.monash.edu.au. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Colonic Neoplasms; Deep Learning; Humans; Software; Reproducibility of Results
20.
Nucleic Acids Res ; 49(10): e60, 2021 06 04.
Article in English | MEDLINE | ID: mdl-33660783

ABSTRACT

Sequence-based analysis and prediction are fundamental bioinformatic tasks that facilitate understanding of the sequence(-structure)-function paradigm for DNAs, RNAs and proteins. Rapid accumulation of sequences requires equally pervasive development of new predictive models, which depends on the availability of effective tools that support these efforts. We introduce iLearnPlus, the first machine-learning platform with graphical- and web-based interfaces for the construction of machine-learning pipelines for analysis and predictions using nucleic acid and protein sequences. iLearnPlus provides a comprehensive set of algorithms and automates sequence-based feature extraction and analysis, construction and deployment of models, assessment of predictive performance, statistical analysis, and data visualization; all without programming. iLearnPlus includes a wide range of feature sets which encode information from the input sequences and over twenty machine-learning algorithms that cover several deep-learning approaches, outnumbering the current solutions by a wide margin. Our solution caters to experienced bioinformaticians, given the broad range of options, and biologists with no programming background, given the point-and-click interface and easy-to-follow design process. We showcase iLearnPlus with two case studies concerning prediction of long noncoding RNAs (lncRNAs) from RNA transcripts and prediction of crotonylation sites in protein chains. iLearnPlus is an open-source platform available at https://github.com/Superzchen/iLearnPlus/ with the webserver at http://ilearnplus.erc.monash.edu/.


Subjects
Computational Biology/methods; Machine Learning; Sequence Analysis/methods; Software; Amino Acid Sequence; Animals; Base Sequence; Humans