1.
ArXiv ; 2023 Dec 13.
Article in English | MEDLINE | ID: mdl-38764593

ABSTRACT

Detecting protein-protein interactions (PPIs) is crucial for understanding genetic mechanisms, disease pathogenesis, and drug design. As the biomedical literature grows rapidly, there is an increasing need for automated and accurate extraction of PPIs to facilitate scientific knowledge discovery. Pre-trained language models, such as generative pre-trained transformers (GPT) and bidirectional encoder representations from transformers (BERT), have shown promising results in natural language processing (NLP) tasks. We evaluated how well multiple GPT and BERT models identify PPIs, using three manually curated gold-standard corpora: Learning Language in Logic (LLL) with 164 PPIs in 77 sentences, Human Protein Reference Database with 163 PPIs in 145 sentences, and Interaction Extraction Performance Assessment with 335 PPIs in 486 sentences. BERT-based models achieved the best overall performance, with BioBERT achieving the highest recall (91.95%) and F1-score (86.84%) and PubMedBERT achieving the highest precision (85.25%). Interestingly, despite not being explicitly trained on biomedical texts, GPT-4 achieved commendable performance, comparable to the top-performing BERT models: on the LLL dataset, it reached a precision of 88.37%, a recall of 85.14%, and an F1-score of 86.49%. These results suggest that GPT models can effectively detect PPIs from text data, offering promising avenues for application in biomedical literature mining. Further research could explore how these models might be fine-tuned for even more specialized tasks within the biomedical domain.
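
For readers unfamiliar with the metrics reported above, the following is a minimal sketch of how precision, recall, and F1-score are commonly computed from counts of predicted interactions. The function name and the counts are hypothetical and are not taken from the paper; the abstract does not state how scores were aggregated, and figures averaged over folds or corpora need not satisfy the single-run harmonic-mean identity shown here.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) for one set of predicted interactions.

    tp: predicted interactions that are in the gold standard
    fp: predicted interactions that are not in the gold standard
    fn: gold-standard interactions the model missed
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts for illustration only (not from the paper).
p, r, f1 = precision_recall_f1(tp=80, fp=20, fn=10)
print(f"precision={p:.2%} recall={r:.2%} F1={f1:.2%}")
```

Guarding against zero denominators keeps the function usable when a model predicts no interactions at all.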

2.
BMC Bioinformatics ; 22(1): 213, 2021 Apr 24.
Article in English | MEDLINE | ID: mdl-33894739

ABSTRACT

BACKGROUND: In this research, a system was developed using machine learning and data mining approaches to predict the risk level of cervical and ovarian cancer in association with stress. RESULTS: For the functioning factors and subfactors, several machine learning models (Logistic Regression, Random Forest, AdaBoost, Naïve Bayes, Neural Network, kNN, CN2 Rule Inducer, Decision Tree, and Quadratic Classifier) were compared using standard metrics, e.g., F1, AUC, and classification accuracy (CA). For certainty, information gain, gain ratio, and Gini index were reported for both cervical and ovarian cancer. Attributes were ranked using different feature selection evaluators, and the analysis then focused on the most significant factors. Factors such as children, age at first intercourse, age of husband, Pap test, and age are the most significant factors for cervical cancer. On the other hand, genital area infection, pregnancy problems, use of drugs, abortion, and the number of children are important factors for ovarian cancer. CONCLUSION: The resulting factors were merged, categorized, and weighted according to their significance level. The categorized factors were indexed using a ranker algorithm, which assigns each of them a weight. An algorithm was then formulated that can be used to predict the risk level of cervical and ovarian cancer in relation to women's mental health. The research will have a great impact on low-income countries like Bangladesh, as most women in low-income nations are unaware of it. As these two can be described as the most sensitive cancers for women, an application developed from the algorithm will also help to reduce women's mental stress. More data and parameters will be added in future research along these lines.
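
The attribute-scoring measures named in the abstract (information gain, gain ratio, Gini index) can be illustrated with a small sketch. This uses the standard textbook definitions and is not the authors' implementation; the attribute names and toy data below are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a list of categorical values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def gini(values):
    """Gini impurity of a list of categorical values."""
    n = len(values)
    return 1.0 - sum((c / n) ** 2 for c in Counter(values).values())


def split_by(feature, labels):
    """Group class labels by the value of a categorical feature."""
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    return list(groups.values())


def info_gain(feature, labels):
    """Reduction in label entropy after splitting on the feature."""
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in split_by(feature, labels))
    return entropy(labels) - remainder


def gain_ratio(feature, labels):
    """Information gain normalized by the entropy of the feature itself."""
    split_info = entropy(feature)
    return info_gain(feature, labels) / split_info if split_info else 0.0


def gini_decrease(feature, labels):
    """Reduction in Gini impurity after splitting on the feature."""
    n = len(labels)
    weighted = sum(len(g) / n * gini(g) for g in split_by(feature, labels))
    return gini(labels) - weighted


# Hypothetical toy data: attr_a separates the classes, attr_b does not.
labels = ["high", "high", "low", "low"]
attributes = {
    "attr_a": ["yes", "yes", "no", "no"],
    "attr_b": ["x", "y", "x", "y"],
}
ranked = sorted(attributes, key=lambda a: info_gain(attributes[a], labels), reverse=True)
print(ranked)
```

Ranking attributes by any of these scores gives an ordering of the kind the abstract describes; the three measures can disagree on real data, which is one reason to report more than one.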


Subject(s)
Machine Learning; Neoplasms; Algorithms; Bayes Theorem; Child; Female; Humans; Logistic Models; Neural Networks, Computer; Pregnancy