Search | VHL Regional Portal

Pathogenicity classification of missense mutations based on deep generative model.

Bai, Ke; Yang, Lu; Xue, Jian; Zhao, Lin; Hao, Fanchang.

Comput Biol Med ; 170: 107980, 2024 Mar.

Article in English | MEDLINE | ID: mdl-38242017

ABSTRACT

Missense mutations affect the function of human proteins and are closely associated with multiple acute and chronic diseases. The identification of disease-associated missense mutations and their classification for pathogenicity can provide insights into the genetic basis of disease and protein function. This paper proposes MLAE (Method based on LSTM-Ladder AutoEncoder), a deep learning classification model for identifying disease-associated missense mutations and classifying their pathogenicity based on the Variational AutoEncoder (VAE) framework. MLAE overcomes the limitations of the VAE framework by introducing the Ladder structure, combined with LSTM networks. This reduces the loss of original information during the transmission process, thereby making the model more effective in learning. In the experiment, MLAE classified all 27572 possible missense variants of the three input proteins with an average classification AUC of 0.941. This result provides evidence that MLAE is effective in predicting pathogenicity. Additionally, MLAE provides results for multi-label classification, with an average Hamming loss of 0.196, supporting the classification of complex variants. The proposed MLAE method provides an insightful approach to effectively capture amino acid sequence information and accurately predict the pathogenicity of mutations, thereby providing an analytical basis for the study and prevention of related diseases.
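The Hamming loss reported above for the multi-label setting is the fraction of individual label slots that are predicted incorrectly, averaged over all samples and labels. The sketch below illustrates that metric only. The toy label matrices and the function name `hamming_loss` are assumptions made for illustration; they are not the MLAE data or code.

```python
from typing import Sequence


def hamming_loss(y_true: Sequence[Sequence[int]],
                 y_pred: Sequence[Sequence[int]]) -> float:
    """Return the fraction of label slots where prediction and truth differ."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same number of samples")
    total = 0
    wrong = 0
    for true_row, pred_row in zip(y_true, y_pred):
        if len(true_row) != len(pred_row):
            raise ValueError("each sample must have the same number of labels")
        total += len(true_row)
        wrong += sum(1 for t, p in zip(true_row, pred_row) if t != p)
    if total == 0:
        raise ValueError("no labels to score")
    return wrong / total


# Hypothetical toy data: 4 variants, 3 binary labels each (not from the paper).
toy_true = [[1, 0, 0], [0, 1, 1], [1, 1, 0], [0, 0, 1]]
toy_pred = [[1, 0, 1], [0, 1, 1], [0, 1, 0], [0, 0, 1]]

# Two of the twelve label slots disagree.
toy_loss = hamming_loss(toy_true, toy_pred)
print(f"Hamming loss on toy data: {toy_loss:.3f}")
```

A lower value is better; 0 means every label of every sample was predicted correctly.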

Subject(s)

Mutation, Missense , Humans , Virulence , Mutation

A two-transcript biomarker of host classifier genes for discrimination of bacterial from viral infection in acute febrile illness: a multicentre discovery and validation study.

Xu, Nannan; Hao, Fanchang; Dong, Xiaomeng; Yao, Yongyuan; Guan, Yanyan; Yang, Lulu; Chen, Fengzhe; Zheng, Feng; Li, Qingyan; Liu, Wenguo; Zhao, Cui; Li, Wen; Palavecino, Elizabeth; Wang, Wei; Wang, Gang.

Lancet Digit Health ; 3(8): e507-e516, 2021 Aug.

Article in English | MEDLINE | ID: mdl-34325854

ABSTRACT

BACKGROUND: Acute febrile illness is one of the main reasons for outpatient hospital visits worldwide. However, differential diagnosis between bacterial and viral causes is challenging and misdiagnosis can result in antimicrobial overuse and hinder prompt treatment. We aimed to build and validate a diagnostic model to discriminate bacterial from viral infection in acute febrile illness by evaluating the expression of potential classifier host genes. METHODS: In this multicentre discovery and validation study, we included patients aged 14-85 years with acute febrile illness (fever for ≤14 days, axillary temperature of ≥38°C, and confirmed bacterial infection, viral infection, or non-infectious inflammatory disease), and healthy control participants (no significant medical history and no fever within the past 90 days) from four hospitals in Shandong province, China. Patients from the first hospital were divided into the screening, discovery, and internal validation groups, and patients from the three other hospitals comprised the external validation group. We measured expression of candidate genes in peripheral blood by RT-PCR, and patients for whom a successful RT-PCR result was recorded were included in the next-step analysis. For patients from the first hospital, those enrolled during the early phase of the study were assigned to the screening group, which was used to identify the optimal transcripts (IFI44L and PI3) for discrimination between bacterial and viral infections by screening four candidate genes (FAM89A, IFI44L, PI3, and ITGB2) by RT-PCR. The remaining patients were then randomly assigned (1:1) to discovery and internal validation groups by time of admission and blood drawing via the equidistant random sampling method. 
A logistic regression model integrating the mRNA levels of IFI44L and PI3 was built by use of the discovery group, and the diagnostic performance of the model was evaluated in the internal and external validation groups using area under the receiver operating characteristic curve (AUC), sensitivity, and specificity. FINDINGS: Between March 1, 2018, and Aug 31, 2019, we assessed 1658 individuals for inclusion in the study. After exclusion of ineligible participants, 458 participants were enrolled (178 patients with acute febrile illness caused by bacterial infection, 212 with acute febrile illness caused by viral infection, 38 with non-infectious inflammatory diseases, and 30 healthy controls). The 390 patients with bacterial or viral infections were assigned to one of four groups: screening (n=64, 33 with bacterial infections and 31 with viral infections), discovery (n=124, 55 with bacterial infections and 69 with viral infections), internal validation (n=124, 55 with bacterial infections and 69 with viral infections), and external validation (n=78, 35 with bacterial infections and 43 with viral infections). Of the four candidate host genes (FAM89A, IFI44L, PI3, and ITGB2), IFI44L and PI3 showed the most discriminative expression pattern and were used to build the logistic regression model. We established the optimal cutoff of the bacterial infection likelihood score to be 0·547598. With the diagnostic result from the gold standard tests (culture and PCR) as the reference, the two-transcript classifier model had an AUC of 0·969 (95% CI 0·937-1·000), sensitivity of 0·891 (0·782-0·949), and specificity of 0·971 (0·900-0·992) to discriminate bacterial and viral infections in the internal validation group. The model showed similar results in the external validation group (AUC 0·986, 95% CI 0·968-1·000; sensitivity 0·857, 0·706-0·937; and specificity 0·954, 0·845-0·987). 
INTERPRETATION: IFI44L and PI3 transcripts, measured by RT-PCR, are robust classifiers to discriminate bacterial from viral infection in acute febrile illness. This two-transcript biomarker has the potential to be transformed into a commercial panel and applied universally. FUNDING: None.
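The sensitivity and specificity quoted for the internal validation group can be related to the group sizes with a short calculation. The sketch below is illustrative only. The abstract gives the group sizes (55 bacterial, 69 viral) and the cutoff 0·547598, but not the confusion-matrix counts; the counts 49 and 67 are assumptions inferred here because they reproduce the reported proportions. Treating a score at or above the cutoff as "bacterial" is also an assumption, and the function names are hypothetical.

```python
CUTOFF = 0.547598  # optimal cutoff of the bacterial infection likelihood score (from the abstract)


def is_bacterial(score: float, cutoff: float = CUTOFF) -> bool:
    """Classify a likelihood score; using >= at the boundary is an assumption."""
    return score >= cutoff


def sensitivity(true_pos: int, false_neg: int) -> float:
    """Proportion of bacterial cases that the classifier calls bacterial."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Proportion of viral cases that the classifier calls viral."""
    return true_neg / (true_neg + false_pos)


# Internal validation group sizes as stated in the abstract.
N_BACTERIAL = 55
N_VIRAL = 69

# Assumed counts, inferred from the reported proportions (not stated in the abstract).
assumed_true_pos = 49
assumed_true_neg = 67

internal_sens = sensitivity(assumed_true_pos, N_BACTERIAL - assumed_true_pos)
internal_spec = specificity(assumed_true_neg, N_VIRAL - assumed_true_neg)

print(f"sensitivity = {internal_sens:.3f}")  # 49/55
print(f"specificity = {internal_spec:.3f}")  # 67/69
```

Under these assumed counts, 49/55 and 67/69 round to the reported 0·891 and 0·971.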

Subject(s)

Bacteria , Bacterial Infections/diagnosis , Fever/diagnosis , Mass Screening/methods , Models, Biological , Virus Diseases/diagnosis , Viruses , Adolescent , Adult , Aged , Aged, 80 and over , Area Under Curve , Bacteria/growth & development , Bacterial Infections/metabolism , Bacterial Infections/microbiology , Biomarkers/metabolism , China , Diagnosis, Differential , Female , Fever/metabolism , Fever/microbiology , Fever/virology , Gene Expression , Humans , Male , Middle Aged , ROC Curve , Reproducibility of Results , Virus Diseases/metabolism , Virus Diseases/virology , Viruses/growth & development , Young Adult

A 2-Approximation Scheme for Sorting Signed Permutations by Reversals, Transpositions, Transreversals, and Block-Interchanges.

Hao, FanChang; Zhang, Melvin; Leong, Hon Wai.

IEEE/ACM Trans Comput Biol Bioinform ; 16(5): 1702-1711, 2019.

Article in English | MEDLINE | ID: mdl-28678711

ABSTRACT

We consider the problem of sorting signed permutations by reversals, transpositions, transreversals, and block-interchanges and give a 2-approximation scheme, called the GSB (Genome Sorting by Bridges) scheme. Our result extends the 2-approximation algorithm of He and Chen [12], which allowed only reversals and block-interchanges, and also the 1.5-approximation algorithm of Hartman and Sharan [11], which allowed only transreversals and transpositions. We prove this result by introducing three bridge structures in the breakpoint graph, namely the L-bridge, T-bridge, and X-bridge, and show that they model "proper" reversals, transpositions and transreversals, and block-interchanges, respectively. We show that we can always find at least one of these three bridges in any breakpoint graph, thus giving an upper bound on the number of operations needed. We prove a lower bound on the distance and use it to show that GSB has a 2-approximation ratio. An $O(n^3)$ algorithm called GSB-I, based on the GSB approximation scheme presented in this paper, has recently been published by Yu, Hao, and Leong [17]. We note that our 2-approximation scheme admits many possible implementations by varying the order in which we search for proper rearrangement operations.
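The four rearrangement operations named in this abstract can be illustrated on a signed permutation stored as a list of non-zero integers. The sketch below uses the common textbook definitions and half-open index ranges. The function names and index conventions are assumptions, the paper's exact definitions (for example, which segment a transreversal inverts) may differ, and this is not the GSB scheme itself.

```python
from typing import List

Perm = List[int]


def _invert(segment: Perm) -> Perm:
    """Reverse a segment and flip the sign of every element."""
    return [-x for x in reversed(segment)]


def reversal(p: Perm, i: int, j: int) -> Perm:
    """Invert the segment p[i:j] in place (order reversed, signs flipped)."""
    return p[:i] + _invert(p[i:j]) + p[j:]


def transposition(p: Perm, i: int, j: int, k: int) -> Perm:
    """Exchange the adjacent segments p[i:j] and p[j:k]."""
    return p[:i] + p[j:k] + p[i:j] + p[k:]


def transreversal(p: Perm, i: int, j: int, k: int) -> Perm:
    """Exchange adjacent segments p[i:j] and p[j:k], inverting the first one.

    Assumption: only the variant that inverts the first segment is shown.
    """
    return p[:i] + p[j:k] + _invert(p[i:j]) + p[k:]


def block_interchange(p: Perm, i: int, j: int, k: int, l: int) -> Perm:
    """Exchange the non-overlapping segments p[i:j] and p[k:l], with i < j <= k < l."""
    return p[:i] + p[k:l] + p[j:k] + p[i:j] + p[l:]


# Each toy permutation below is sorted to the identity by a single operation.
print(reversal([-3, -2, -1, 4], 0, 3))
print(transposition([3, 4, 1, 2], 0, 2, 4))
print(transreversal([-4, -3, 1, 2], 0, 2, 4))
print(block_interchange([4, 2, 3, 1], 0, 1, 3, 4))
```

Sorting a signed permutation then means finding a shortest sequence of such operations that yields the identity permutation with all signs positive.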

Subject(s)

Gene Rearrangement/genetics , Genome/genetics , Genomics/methods , Algorithms , Models, Genetic

An $O(n^3)$ algorithm for sorting signed genomes by reversals, transpositions, transreversals and block-interchanges.

Yu, Shuzhi; Hao, Fanchang; Leong, Hon Wai.

J Bioinform Comput Biol ; 14(1): 1640002, 2016 Feb.

Article in English | MEDLINE | ID: mdl-26707923

ABSTRACT

We consider the problem of sorting signed permutations by reversals, transpositions, transreversals, and block-interchanges. The problem arises in the study of species evolution via large-scale genome rearrangement operations. Recently, Hao et al. gave a 2-approximation scheme called genome sorting by bridges (GSB) for solving this problem. Their result extended and unified the results of (i) He and Chen - a 2-approximation algorithm allowing reversals, transpositions, and block-interchanges (by also allowing transreversals) and (ii) Hartman and Sharan - a 1.5-approximation algorithm allowing reversals, transpositions, and transreversals (by also allowing block-interchanges). The GSB result is based on the introduction of three bridge structures in the breakpoint graph, the L-bridge, T-bridge, and X-bridge, which model good reversal, transposition/transreversal, and block-interchange, respectively. However, the paper by Hao et al. focused on proving the 2-approximation GSB scheme and only mentioned a straightforward [Formula: see text] algorithm. In this paper, we give an $O(n^3)$ algorithm for implementing the GSB scheme. The key idea behind our faster GSB algorithm is to represent cycles in the breakpoint graph by their canonical sequences, which greatly simplifies the search for these bridge structures. We also give some comparison results (running time and computed distances) against the original GSB implementation.

Subject(s)

Algorithms , Genomics/methods , Computational Biology/methods , DNA Transposable Elements , Genome , Models, Genetic
