Results 1 - 20 of 26
1.
PLoS One ; 19(6): e0305366, 2024.
Article in English | MEDLINE | ID: mdl-38843169

ABSTRACT

[This corrects the article DOI: 10.1371/journal.pone.0275998.].

2.
Article in English | MEDLINE | ID: mdl-38324432

ABSTRACT

To automatically mine structured semantic topics from text, neural topic modeling has arisen and made some progress. However, most existing work focuses on designing mechanisms that enhance topic coherence but sacrifice the diversity of the extracted topics. To address this limitation, we propose the first neural topic modeling approach based purely on mutual information maximization, called the mutual information topic (MIT) model, in this article. The proposed MIT significantly improves topic diversity by maximizing the mutual information between the word distribution and the topic distribution. Meanwhile, MIT also uses a Dirichlet prior in the latent topic space to ensure the quality of the mined topics. Experimental results on three public benchmark text corpora show that MIT extracts topics with higher coherence values (across four topic coherence metrics) than competitive approaches and significantly improves the topic diversity metric. In addition, our experiments show that the proposed MIT converges faster and more stably than adversarial neural topic models.
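
The quantity this abstract maximizes can be illustrated with the textbook definition of mutual information on a small discrete joint distribution. This is only a sketch: the abstract does not say how MIT estimates or optimizes mutual information, and neural models typically rely on learned bounds rather than the exact sum below. The function name and the toy distributions are hypothetical.

```python
import math

def mutual_information(joint):
    """Return I(W;T) in nats for a joint distribution p(w, t).

    `joint` is a list of rows (one per word), each row holding the joint
    probability of that word with every topic. Illustrative only; this is
    not the estimator used by the MIT model.
    """
    p_w = [sum(row) for row in joint]          # marginal over words
    p_t = [sum(col) for col in zip(*joint)]    # marginal over topics
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:  # terms with zero probability contribute nothing
                mi += p * math.log(p / (p_w[i] * p_t[j]))
    return mi

# Hypothetical toy cases: words independent of topics vs. perfectly aligned.
independent = [[0.25, 0.25], [0.25, 0.25]]
aligned = [[0.5, 0.0], [0.0, 0.5]]
```

In the independent case the words carry no information about the topic, so the value is zero; in the aligned case each word identifies its topic, which gives the maximum for two equally likely topics.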

3.
Environ Sci Pollut Res Int ; 30(59): 123862-123881, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37995031

ABSTRACT

As a bridge between economy and ecology, green finance is vital in improving environmental quality and promoting sustainable development. Building on an environmental pollution index system, this paper constructs the [Formula: see text] model to explore the specific impact of green finance on environmental pollution using China's provincial panel data from 2007 to 2020. It also constructs an intermediary model to test the mechanism by which green finance reduces environmental pollution and discusses the regional heterogeneity of this effect. The results show that (1) green finance can significantly reduce environmental pollution: green credit has a pronounced effect, green investment has a relatively small effect, and green securities have no significant effect. (2) Green finance has the strongest inhibitory effect on solid pollution, a weaker inhibitory effect on air pollution, and no significant effect on water pollution. (3) Green technology innovation, industrial structure upgrading, and environmental regulation play an intermediary role in the process by which green finance reduces environmental pollution and improves environmental quality. (4) The effect of green finance in the eastern region and in carbon emission pilot areas is significantly better than in the central and western regions and in non-carbon emission pilot areas, respectively. Based on these results, suggestions are put forward to promote the development of green finance, which is of great significance for reducing environmental pollution and achieving sustainable development goals.


Subject(s)
Air Pollution , Air Pollution/prevention & control , Water Pollution , Carbon , Ecology , China , Economic Development
4.
Front Psychol ; 14: 1026638, 2023.
Article in English | MEDLINE | ID: mdl-36844331

ABSTRACT

This paper explores how older adults with different cognitive abilities perform the refusal speech act during cognitive assessment in memory clinics. The refusal speech act and its corresponding illocutionary force produced by nine Chinese older adults in the Montreal Cognitive Assessment-Basic were annotated and analyzed from a multimodal perspective. Overall, regardless of the older adults' cognitive ability, the most common discursive device for refusing is demonstrating their inability to carry out or continue the cognitive task. Individuals with lower cognitive ability were found to perform the refusal illocutionary force (hereafter RIF) with higher frequency and degree. Additionally, under the pragmatic compensation mechanism, which is influenced by cognitive ability, multiple expression devices (including prosodic features and non-verbal acts) interact dynamically and synergistically to help older adults carry out the refusal behavior and to reveal their intentional state and emotion. The findings indicate that both the degree and the frequency of performing the refusal speech act in the cognitive assessment are related to the cognitive ability of older adults.

5.
Article in English | MEDLINE | ID: mdl-35576420

ABSTRACT

Biomedical argument mining aims to automatically identify and extract the argumentative structure in biomedical text. It helps to determine not only what positions people adopt, but also why they hold such opinions, which provides valuable insights into medical decision making. Generally, biomedical argument mining consists of three subtasks: argument component identification, argument component classification, and relation identification. Current approaches employ a conventional multi-task learning framework to address the latter two subtasks jointly, and achieve some success. However, they ignore the explicit sequential dependency between these two subtasks, which is crucial for accurate biomedical argument mining. Moreover, relation identification is conducted solely on the argument component pair without considering its potentially valuable context. Therefore, in this paper, a novel sequential multi-task learning approach is proposed for biomedical argument mining. Specifically, to model the explicit sequential dependency between argument component classification and relation identification, an information transfer strategy is employed to capture the argument component type information and transfer it to relation identification. Furthermore, a graph convolutional network is employed to model the dependency relations among related argument component pairs. The proposed method has been evaluated on a benchmark dataset, and the experimental results show that it outperforms the state-of-the-art methods.


Subject(s)
Benchmarking , Clinical Decision-Making , Humans
6.
PLoS One ; 17(10): e0275998, 2022.
Article in English | MEDLINE | ID: mdl-36301794

ABSTRACT

The steam turbine is one of the major pieces of equipment in thermal power plants, and it is crucial to predict its output accurately. However, because of its complex coupling relationships with other equipment, this remains a challenging task. Previous methods mainly focus on the operation of the steam turbine in isolation while ignoring its coupling relationship with the condenser, which we believe is crucial for the prediction. Therefore, in this paper, to explore the coupling relationship between the steam turbine and the condenser, we propose a novel approach for steam turbine power prediction based on an encoder-decoder framework guided by the condenser vacuum degree (CVD-EDF). Specifically, the historical information within the condenser operating-condition data is encoded using a long short-term memory (LSTM) network. Moreover, a connection module consisting of an attention mechanism and a convolutional neural network is incorporated to capture the local and global information in the encoder. The steam turbine power is predicted based on all of this information. In this way, the coupling relationship between the condenser and the steam turbine is fully explored. Extensive experiments are conducted on real data from a power plant. The experimental results show that the proposed CVD-EDF achieves substantial improvements over several competitive methods: compared with the LSTM at one-minute intervals, it improves RMSE and MAE by 32.2% and 37.0%, respectively.


Subject(s)
Cardiovascular Diseases , Steam , Humans , Vacuum , Power Plants , Neural Networks, Computer
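
For readers unfamiliar with the two error measures reported in the abstract above, the sketch below gives their standard definitions. The `relative_improvement` helper assumes the reported percentages are relative reductions in error against the baseline; the abstract does not state the formula, so that reading is an assumption, and all function names here are hypothetical.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def relative_improvement(baseline_error, model_error):
    """Percentage reduction in error relative to a baseline (assumed definition)."""
    return 100.0 * (baseline_error - model_error) / baseline_error
```

Under this reading, a baseline error of 10.0 and a model error of 6.78 would correspond to a 32.2% improvement; these two numbers are made up purely to illustrate the arithmetic.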
7.
IEEE/ACM Trans Comput Biol Bioinform ; 19(4): 2365-2376, 2022.
Article in English | MEDLINE | ID: mdl-33974546

ABSTRACT

Biomedical factoid question answering is an important task in biomedical question answering applications. It has attracted much attention because of its reliability. In question answering systems, better representation of words is of great importance, and proper word embedding can significantly improve the performance of the system. With the success of pretrained models in general natural language processing tasks, pretrained models have been widely used in biomedical areas, and many pretrained model-based approaches have been proven effective in biomedical question answering tasks. In addition to proper word embedding, named entities also provide important information for biomedical question answering. Inspired by the concept of transfer learning, in this study, we developed a mechanism to fine-tune BioBERT with a named entity dataset to improve question answering performance. Furthermore, we applied BiLSTM to encode the question text to obtain sentence-level information. To better combine the question-level and token-level information, we use bagging to further improve the overall performance. The proposed framework was evaluated on the BioASQ 6b and 7b datasets, and the results show that it outperforms all baselines.


Subject(s)
Machine Learning , Natural Language Processing , Language , Learning , Reproducibility of Results
8.
Artif Intell Med ; 118: 102119, 2021 08.
Article in English | MEDLINE | ID: mdl-34412842

ABSTRACT

OBJECTIVE: Health issue identification in social media aims to predict whether writers have a disease based on their posts. Numerous posts and comments are shared on social media by users. Certain posts may reflect the writers' health condition, which can be employed for health issue identification. Usually, the health issue identification problem is formulated as a classification task. METHODS AND MATERIAL: In this paper, we propose novel multi-task hierarchical neural networks with topic attention for identifying health issues based on posts collected from social media platforms. Specifically, the model incorporates the hierarchical relationship among the document, sentences, and words via bidirectional gated recurrent units (BiGRUs). The global topic information shared across posts is combined with the hidden states of the BiGRUs to obtain topic-enhanced attention weights for words. In addition, the tasks of predicting whether the writers suffer from a disease (health issue identification) and predicting the specific domain of the posts (domain category classification) are learned jointly in a multi-task mechanism. RESULTS: The proposed method is evaluated on two datasets: a dementia issue dataset and a depression issue dataset. The proposed approach achieves F1 scores of 98.03% and 88.28% on the two datasets, outperforming the state-of-the-art approach by 0.73% and 0.4%, respectively. Further experimental analysis shows the effectiveness of incorporating both the multi-task learning framework and the topic attention mechanism.


Subject(s)
Social Media , Humans , Language , Neural Networks, Computer
9.
J Affect Disord ; 295: 148-155, 2021 12 01.
Article in English | MEDLINE | ID: mdl-34461370

ABSTRACT

BACKGROUND: Objective biomarkers are crucial for overcoming the clinical dilemma in major depressive disorder (MDD), and individualized diagnosis is essential to facilitate precision medicine for MDD. METHODS: Sleep disturbance-related magnetic resonance imaging (MRI) features were identified in the internal dataset (92 MDD patients) using the relevance vector regression algorithm and were further verified in 460 MDD patients of an independent, multicenter dataset. Subsequently, using these MRI features, the eXtreme Gradient Boosting classification model was constructed in the multicenter dataset (460 MDD patients and 470 normal controls). Meanwhile, the association between classification outputs and the severity of depressive symptoms was also investigated. RESULTS: In MDD patients, the combination of gray matter density and fractional amplitude of low-frequency fluctuation can accurately predict the individual sleep disturbance score, calculated as the sum of the item 4, item 5, and item 6 scores of the 17-Item Hamilton Rating Scale for Depression (HAMD-17) (R² = 0.158 in the internal dataset; R² = 0.110 in the multicenter dataset). Furthermore, the classification model based on these MRI features distinguished MDD patients from normal controls with 86.3% accuracy (area under the curve = 0.937). Importantly, the classification outputs significantly correlated with HAMD-17 scores in MDD patients. LIMITATION: The study lacked specialized tools to assess personal sleep quality, e.g., the Pittsburgh Sleep Quality Index. CONCLUSION: Neuroimaging features can accurately reflect individual sleep disturbance manifestation and serve as potential diagnostic biomarkers of MDD.


Subject(s)
Depressive Disorder, Major , Biomarkers , Depressive Disorder, Major/diagnostic imaging , Humans , Machine Learning , Neuroimaging , Sleep
10.
ACS Chem Neurosci ; 12(15): 2878-2886, 2021 08 04.
Article in English | MEDLINE | ID: mdl-34282889

ABSTRACT

Diagnosis of major depressive disorder (MDD) using resting-state functional connectivity (rs-FC) data faces many challenges, such as high dimensionality, small samples, and individual differences. To assess the clinical value of rs-FC in MDD and identify a potential rs-FC machine learning (ML) model for the individualized diagnosis of MDD, a progressive three-step ML analysis of the rs-FC data was performed, including six different ML algorithms and two dimension reduction methods, to investigate the classification performance of the ML models in a multicenter, large-sample dataset [1021 MDD patients and 1100 normal controls (NCs)]. Furthermore, a linear least-squares regression model was used to assess the relationships between rs-FC features and the severity of clinical symptoms in MDD patients. Among the ML methods used, the rs-FC model constructed by the eXtreme Gradient Boosting (XGBoost) method showed the best classification performance for distinguishing MDD patients from NCs at the individual level (accuracy = 0.728, sensitivity = 0.720, specificity = 0.739, area under the curve = 0.831). Meanwhile, the rs-FCs identified by the XGBoost model were primarily distributed within and between the default mode network, limbic network, and visual network. More importantly, the individual 17-item Hamilton Depression Scale scores of MDD patients can be accurately predicted using rs-FC features identified by the XGBoost model (adjusted R² = 0.180, root mean squared error = 0.946). The XGBoost model using rs-FCs showed the best classification performance between MDD patients and NCs, with good generalization and neuroscientific interpretability.


Subject(s)
Depressive Disorder, Major , Brain/diagnostic imaging , Brain Mapping , Humans , Machine Learning , Magnetic Resonance Imaging
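
The accuracy, sensitivity, and specificity figures quoted in the abstract above follow from a binary confusion matrix. The sketch below shows the standard definitions, treating MDD patients as the positive class; the counts in the usage note are hypothetical and are not taken from the study.

```python
def classification_metrics(tp, fn, tn, fp):
    """Compute standard binary classification metrics from confusion counts.

    tp/fn: patients correctly/incorrectly classified;
    tn/fp: controls correctly/incorrectly classified.
    """
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
    }

# Hypothetical counts for 100 patients and 100 controls.
example = classification_metrics(tp=72, fn=28, tn=74, fp=26)
```

With these made-up counts, sensitivity is 0.72, specificity is 0.74, and accuracy is 0.73. The area under the curve cannot be derived from a single confusion matrix, since it depends on the classifier's scores across all thresholds.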
11.
Ann Transl Med ; 9(4): 316, 2021 Feb.
Article in English | MEDLINE | ID: mdl-33708943

ABSTRACT

BACKGROUND: Diabetes has significant effects on bone metabolism. Both type 1 and type 2 diabetes can cause osteoporotic fracture. However, it remains challenging to diagnose osteoporosis in type 2 diabetes by bone mineral density, which lacks regular changes. Seen another way, osteoporosis can be ascribed to an imbalance of bone metabolism, which is closely related to diabetes as well. METHODS: To assist clinicians in diagnosing osteoporosis in type 2 diabetes, an efficient and simple SVM (support vector machine) model was established based on different combinations of biochemical indexes, which were collected from patients who underwent testing of bone turn-over markers (BTMs) from January 2016 to March 2018 in the Department of Endocrinology, Zhongda Hospital affiliated to Southeast University. The classification was done with a machine learning software package in Python. The classification performance was measured with the scikit-learn (SKLearn) package in Python and compared with the clinical diagnostic results. RESULTS: The prediction accuracy of the final model was above 88%, with the feature combination of sex, age, BMI (body mass index), TP1NP (total procollagen I N-terminal propeptide), and OSTEOC (osteocalcin). CONCLUSIONS: The model showed promising results for early detection and daily monitoring of type 2 diabetic osteoporosis.

12.
IEEE/ACM Trans Comput Biol Bioinform ; 17(6): 2029-2039, 2020.
Article in English | MEDLINE | ID: mdl-31095491

ABSTRACT

Biomedical event extraction plays an important role in the extraction of biological information from large-scale scientific publications. However, most state-of-the-art systems separate this task into several steps, which leads to cascading errors. In addition, it is complicated to generate features from syntactic and dependency analysis separately. Therefore, in this paper, we propose an end-to-end model based on long short-term memory (LSTM) to optimize biomedical event extraction. Experimental results demonstrate that our approach improves the performance of biomedical event extraction. We achieve average F1-scores of 59.68, 58.23, and 57.39 percent on the BioNLP09, BioNLP11, and BioNLP13's Genia event datasets, respectively. The experimental study has shown our proposed model's potential in biomedical event extraction.


Subject(s)
Biomedical Research/classification , Computational Biology/methods , Data Mining/methods , Neural Networks, Computer
13.
ACS Chem Neurosci ; 10(8): 3479-3485, 2019 08 21.
Article in English | MEDLINE | ID: mdl-31145586

ABSTRACT

The objective of the study was to explore the potential value of plasma indicators for identifying amnestic mild cognitive impairment (aMCI) and to determine whether levels of plasma indicators are related to cognitive performance and brain tissue volumes. In total, 155 participants (68 aMCI patients and 87 healthy controls) were recruited in the present cross-sectional study. The levels of plasma amyloid-β (Aβ) 40, Aβ42, total tau (t-tau), and neurofilament light (NFL) were measured using an ultrasensitive quantitative method. Machine learning algorithms were used to establish an optimal model for identifying aMCI. Compared with healthy controls, Aβ40 and Aβ42 levels were lower and NFL levels were higher in the plasma of aMCI patients, with the exception of t-tau levels. In aMCI patients, higher plasma Aβ40 levels were correlated with impaired episodic memory, and negative correlations were observed between plasma t-tau levels and both global cognitive function and gray matter (GM) volume. In addition, higher plasma NFL levels were correlated with reduced hippocampus volume and total GM volume of the left inferior and middle temporal gyrus. An integrated model including clinical features, hippocampus volume, and plasma Aβ42 and NFL had the highest accuracy for detecting aMCI patients (accuracy, 74.2%). We demonstrated that plasma Aβ40, Aβ42, t-tau, and NFL may be useful to identify aMCI and correlate with cognitive decline and brain atrophy. Among these plasma indicators, Aβ42 and NFL are more valuable as key members of a peripheral biomarker panel to detect aMCI.


Subject(s)
Amyloid beta-Peptides/blood , Cognitive Dysfunction/blood , Cognitive Dysfunction/diagnosis , Early Diagnosis , Neurofilament Proteins/blood , tau Proteins/blood , Aged , Alzheimer Disease/blood , Alzheimer Disease/diagnosis , Alzheimer Disease/pathology , Biomarkers/blood , Brain/pathology , Cognitive Dysfunction/pathology , Cross-Sectional Studies , Female , Humans , Male , Middle Aged
14.
Virus Genes ; 54(5): 662-671, 2018 Oct.
Article in English | MEDLINE | ID: mdl-30105631

ABSTRACT

Despite the notable success of combination antiretroviral therapy, how to eradicate latent HIV-1 from reservoirs poses a challenge. The Tat protein plays an indispensable role in HIV reactivation and histone demethylase LSD1 promotes Tat-mediated long terminal repeats (LTR) activation. However, the role of LSD1 in remodeling chromatin and the role of its component BHC80 in activation of latent HIV-1 in T cells are unknown. Our findings indicate that LSD1 could decrease the level of histone H3 lysine 4 trimethylation (H3K4me3) at the HIV-1 promoter by recruiting histone lysine demethylase 5A (KDM5A) and preventing histone methyltransferase Set1A and WD-40 repeat protein 5 (WDR5) from binding to LTR. Moreover, BHC80 is necessary for LSD1-triggered LTR activation and assists LSD1 in activating LTR by binding to nucleotides 305-631 of LTR. In activated J-Lat-A2 cells, BHC80 expression was elevated and its isoform BHC80-6 promoted the association of BHC80 with LSD1. These results suggest that the LSD1-BHC80 complex enhances HIV-1 transcription by a decrease of H3K4me3 level at the viral promoter. Therefore, it might be used as a new drug target to reactivate latent HIV-1.


Subject(s)
HIV-1/metabolism , Histone Deacetylases/metabolism , Histone Demethylases/metabolism , tat Gene Products, Human Immunodeficiency Virus/metabolism , Binding Sites , HEK293 Cells , HIV-1/genetics , HeLa Cells , Humans , Jurkat Cells , Promoter Regions, Genetic , Protein Binding , Sp1 Transcription Factor/metabolism , Terminal Repeat Sequences , Transcriptional Activation , Virus Activation
15.
Virol Sin ; 33(3): 261-269, 2018 Jun.
Article in English | MEDLINE | ID: mdl-29737506

ABSTRACT

Despite the success of combined antiretroviral therapy in recent years, the prevalence of human immunodeficiency virus (HIV)-associated neurocognitive disorders in people living with HIV-1 is increasing, significantly reducing the health-related quality of their lives. Although neurons cannot be infected by HIV-1, shed viral proteins such as transactivator of transcription (Tat) can cause dendritic damage. However, the detailed molecular mechanism of Tat-induced neuronal impairment remains unknown. In this study, we first showed that recombinant Tat (1-72 aa) induced neurotoxicity in primary cultured mouse neurons. Second, exposure to Tat1-72 was shown to reduce the length and number of dendrites in cultured neurons. Third, Tat1-72 (0-6 h) modulates protein phosphatase 1 (PP1) expression and enhances its activity by decreasing the phosphorylation level of PP1 at Thr320. Finally, Tat1-72 (24 h) downregulates CREB activity and CREB-mediated gene (BDNF, c-fos, Egr-1) expression. Together, these findings suggest that Tat1-72 might impair cognitive function by regulating the activity of PP1 and the CREB/BDNF pathway.


Subject(s)
Brain-Derived Neurotrophic Factor/metabolism , Cyclic AMP Response Element-Binding Protein/metabolism , Dendrites/metabolism , HIV-1/metabolism , Neurons/metabolism , Protein Phosphatase 1/metabolism , tat Gene Products, Human Immunodeficiency Virus/pharmacology , Animals , Blotting, Western , Cells, Cultured , Dendrites/drug effects , Mice , Mice, Inbred C57BL , Neurons/drug effects , Signal Transduction/physiology
16.
Artif Intell Med ; 87: 1-8, 2018 05.
Article in English | MEDLINE | ID: mdl-29559249

ABSTRACT

OBJECTIVE: A drug-drug interaction (DDI) is a situation in which a drug affects the activity of another drug synergistically or antagonistically when the two are administered together. Information about DDIs is crucial for healthcare professionals to prevent adverse drug events. Although some known DDIs can be found in purpose-built databases such as DrugBank, most information is still buried in scientific publications. Therefore, automatically extracting DDIs from biomedical texts is sorely needed. METHODS AND MATERIAL: In this paper, we propose a novel position-aware deep multi-task learning approach for extracting DDIs from biomedical texts. In particular, sentences are represented as a sequence of word embeddings and position embeddings. An attention-based bidirectional long short-term memory (BiLSTM) network is used to encode each sentence. The relative position information of words with respect to the target drugs in the text is combined with the hidden states of the BiLSTM to generate the position-aware attention weights. Moreover, the tasks of predicting whether or not two drugs interact with each other and further distinguishing the types of interactions are learned jointly in a multi-task learning framework. RESULTS: The proposed approach has been evaluated on the DDIExtraction challenge 2013 corpus. The results show that with position-aware attention only, our approach outperforms the state-of-the-art method by 0.99% for binary DDI classification, and with both position-aware attention and multi-task learning, it achieves a micro F-score of 72.99% on interaction type identification, outperforming the state-of-the-art approach by 1.51%, which demonstrates the effectiveness of the proposed approach.


Subject(s)
Data Mining , Deep Learning , Drug Interactions , Databases, Factual , Drug-Related Side Effects and Adverse Reactions
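
The "relative position information of words with the target drugs" mentioned in the abstract above is commonly built as a pair of signed token offsets per word, one for each target entity. The sketch below shows that common construction; the abstract does not specify the exact encoding the authors used, so this is an assumption, and the function name and example sentence are hypothetical.

```python
def relative_positions(tokens, drug1_index, drug2_index):
    """Return, for each token, its signed offsets to the two target drugs.

    A negative offset means the token precedes the drug mention.
    Illustrative only; in a model these offsets would typically be mapped
    to learned position embeddings.
    """
    return [(i - drug1_index, i - drug2_index) for i in range(len(tokens))]

# Hypothetical sentence with the two drug mentions already masked.
tokens = ["DRUG1", "may", "increase", "the", "effect", "of", "DRUG2"]
features = relative_positions(tokens, drug1_index=0, drug2_index=6)
```

Here the word "increase" at index 2 gets the pair (2, -4): two tokens after the first drug and four tokens before the second.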
17.
Virol Sin ; 31(3): 199-206, 2016 Jun.
Article in English | MEDLINE | ID: mdl-27007880

ABSTRACT

The multifunctional trans-activator Tat is an essential regulatory protein for HIV-1 replication and is characterized by high sequence diversity. Numerous experimental studies have examined Tat in HIV-1 subtype B, but research on subtype C Tat is lacking, despite the high prevalence of infections caused by subtype C worldwide. We hypothesized that amino acid differences contribute to functional differences among Tat proteins. In the present study, we found that subtype B NL4-3 Tat and subtype C isolate HIV1084i Tat exhibited differences in stability by overexpressing the fusion protein Tat-Flag. In addition, 1084i Tat can activate LTR and NF-κB more efficiently than NL4-3 Tat. In analyses of the activities of the truncated forms of Tat, we found that the carboxyl-terminal region of Tat regulates its stability and transactivity. According to our results, we speculated that the differences in stability between B-Tat and C-Tat result in differences in transactivation ability.


Subject(s)
HIV-1/metabolism , tat Gene Products, Human Immunodeficiency Virus/chemistry , tat Gene Products, Human Immunodeficiency Virus/metabolism , Amino Acid Substitution , HEK293 Cells , HIV-1/chemistry , HIV-1/genetics , Humans , NF-kappa B/metabolism , Protein Stability , Recombinant Fusion Proteins/chemistry , Recombinant Fusion Proteins/genetics , Sequence Deletion , Structure-Activity Relationship , Transcriptional Activation , Virus Replication , tat Gene Products, Human Immunodeficiency Virus/genetics , tat Gene Products, Human Immunodeficiency Virus/immunology
18.
Virus Genes ; 52(2): 179-88, 2016 Apr.
Article in English | MEDLINE | ID: mdl-26832332

ABSTRACT

The multifunctional transactivator Tat is an essential regulatory protein for HIV-1 replication, and it plays a role in the pathogenesis of HIV-1 infection. At present, numerous experimental studies of HIV-1 Tat focus on subtype B, and very few have examined subtype C Tat. In view of the amino acid variation of the clade-specific Tat proteins, we hypothesized that amino acid differences contribute to the differential function of Tat proteins. In the present study, by overexpressing the fusion protein Tat-EGFP, we documented that subtype B NL4-3 Tat and subtype C isolate HIV1084i Tat from a pediatric patient in Zambia exhibited distinct nuclear localization. Interestingly, 1084i Tat showed uniform nuclear distribution, whereas NL4-3 Tat primarily localized in the nucleolus. The 57th amino acid, highly conserved within B-Tat (arginine) and within C-Tat (serine), is located in the basic domain of Tat and played an important role in this subcellular localization. Meanwhile, we found that substitution of arginine with serine at site 57 decreases Tat transactivation of the HIV-1 LTR promoter.


Subject(s)
Amino Acid Substitution , Genotype , HIV-1/genetics , HIV-1/metabolism , tat Gene Products, Human Immunodeficiency Virus/metabolism , Active Transport, Cell Nucleus , Amino Acid Sequence , HIV Infections/virology , Humans , Intracellular Space , Mutation , Nuclear Localization Signals , Nucleic Acid Conformation , Position-Specific Scoring Matrices , Protein Transport , Recombinant Fusion Proteins , tat Gene Products, Human Immunodeficiency Virus/chemistry , tat Gene Products, Human Immunodeficiency Virus/genetics
19.
Artif Intell Med ; 64(1): 51-8, 2015 May.
Article in English | MEDLINE | ID: mdl-25863986

ABSTRACT

OBJECTIVES: Scientists have devoted decades of effort to understanding the interaction between proteins or RNA production. This information might strengthen current knowledge of drug reactions or the development of certain diseases. Nevertheless, life science literature, one of the most important sources of this information, lacks explicit structure, which prevents computer-based systems from accessing it. Therefore, biomedical event extraction, which automatically acquires knowledge of molecular events from research articles, has recently attracted community-wide efforts. Most approaches are based on statistical models, requiring large-scale annotated corpora to precisely estimate the models' parameters. However, such corpora are usually difficult to obtain in practice. Therefore, employing un-annotated data through semi-supervised learning for biomedical event extraction is a feasible solution and is attracting growing interest. METHODS AND MATERIAL: In this paper, a semi-supervised learning framework based on hidden topics for biomedical event extraction is presented. In this framework, sentences in the un-annotated corpus are automatically assigned event annotations based on their distances to sentences in the annotated corpus. More specifically, not only the structures of the sentences but also the hidden topics embedded in the sentences are used to describe the distance. The sentences and newly assigned event annotations, together with the annotated corpus, are employed for training. RESULTS: Experiments were conducted on the multi-level event extraction corpus, a gold standard corpus. Experimental results show that the proposed framework improves the F-score on biomedical event extraction by more than 2.2% compared with the state-of-the-art approach. CONCLUSION: The results suggest that by incorporating un-annotated data, the proposed framework indeed improves the performance of the state-of-the-art event extraction system, and that the similarity between sentences might be precisely described by hidden topics and structures of the sentences.


Subject(s)
Data Mining/methods , Medical Informatics/methods , Supervised Machine Learning
20.
Comput Math Methods Med ; 2014: 298473, 2014.
Article in English | MEDLINE | ID: mdl-25214883

ABSTRACT

Biomedical relation extraction aims to uncover high-quality relations from life science literature with high accuracy and efficiency. Early biomedical relation extraction tasks focused on capturing binary relations, such as protein-protein interactions, which are crucial for virtually every process in a living cell. Information about these interactions provides the foundations for new therapeutic approaches. In recent years, more interest has shifted to the extraction of complex relations such as biomolecular events. Complex relations go beyond binary relations: they involve more than two arguments and might also take another relation as an argument. In this paper, we conduct a thorough survey of the research in biomedical relation extraction. We first present a general framework for biomedical relation extraction and then discuss the approaches proposed for binary and complex relation extraction, with a focus on the latter since it is a much more difficult task than binary relation extraction. Finally, we discuss the challenges facing complex relation extraction and outline possible solutions and future directions.


Subject(s)
Biology/methods , Data Mining/methods , Medicine/methods , Humans