Results 1 - 15 of 15
1.
Sci Rep ; 14(1): 10337, 2024 05 06.
Article in English | MEDLINE | ID: mdl-38710802

ABSTRACT

Infectious diseases have long shaped human history, which calls for a comprehensive understanding of their dynamics. This study introduces a co-evolution model that integrates epidemiological and evolutionary dynamics. Using a system of differential equations, the model represents the interactions among susceptible, infected, and recovered populations for both ancestral and evolved viral strains. The existence and uniqueness of the model's solutions have been verified, and the model accommodates both deterministic and stochastic cases. A range of graphical techniques is used to elucidate the model's dynamics. Beyond its theoretical contributions, the model serves as a critical instrument for public health strategy, particularly for predicting future outbreaks in scenarios where viral mutations compromise existing interventions.


Subject(s)
Stochastic Processes , Humans , Immune System/virology , Evolution, Molecular , Viruses/genetics , Viruses/immunology , Biological Evolution
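The abstract in entry 1 does not give its equations. As a minimal sketch, assuming the simplest two-strain SIR structure (one susceptible pool, an ancestral strain I1 that turns into an evolved strain I2 at a hypothetical per-capita mutation rate mu, and a shared recovered class), the deterministic case can be integrated with a classical fourth-order Runge-Kutta step. All parameter names and values are illustrative, not the authors'.

```python
from dataclasses import dataclass


@dataclass
class Params:
    beta1: float   # transmission rate of the ancestral strain (hypothetical)
    beta2: float   # transmission rate of the evolved strain (hypothetical)
    gamma1: float  # recovery rate of the ancestral strain
    gamma2: float  # recovery rate of the evolved strain
    mu: float      # per-capita rate at which ancestral infections become evolved-strain infections


def rhs(y, p):
    """Right-hand side of the assumed S, I1, I2, R system (population fractions)."""
    s, i1, i2, _r = y
    new1 = p.beta1 * s * i1
    new2 = p.beta2 * s * i2
    return (
        -new1 - new2,
        new1 - (p.gamma1 + p.mu) * i1,
        new2 + p.mu * i1 - p.gamma2 * i2,
        p.gamma1 * i1 + p.gamma2 * i2,
    )


def rk4(y0, p, dt, steps):
    """Integrate with classical fourth-order Runge-Kutta; returns the full trajectory."""
    y = tuple(y0)
    traj = [y]
    for _ in range(steps):
        k1 = rhs(y, p)
        k2 = rhs(tuple(a + 0.5 * dt * k for a, k in zip(y, k1)), p)
        k3 = rhs(tuple(a + 0.5 * dt * k for a, k in zip(y, k2)), p)
        k4 = rhs(tuple(a + dt * k for a, k in zip(y, k3)), p)
        y = tuple(
            a + dt / 6 * (b1 + 2 * b2 + 2 * b3 + b4)
            for a, b1, b2, b3, b4 in zip(y, k1, k2, k3, k4)
        )
        traj.append(y)
    return traj


params = Params(beta1=0.3, beta2=0.4, gamma1=0.1, gamma2=0.1, mu=0.01)
trajectory = rk4((0.99, 0.01, 0.0, 0.0), params, dt=0.1, steps=2000)
```

Because the four right-hand sides sum to zero, S + I1 + I2 + R stays at 1 up to rounding, which is a quick sanity check. The stochastic case mentioned in the abstract would need a different method (for example a Gillespie-type simulation) and is not shown.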
2.
Front Bioinform ; 3: 1276934, 2023.
Article in English | MEDLINE | ID: mdl-37900965

ABSTRACT

DNA, the storage medium of living organisms, can address shortcomings of existing electromagnetic storage media such as low information density, high maintenance power consumption, and short storage time. Current research on DNA storage mainly focuses on designing encoders that convert binary data into DNA base sequences satisfying biological constraints. We have created a new Chinese character code table that achieves exceptionally high information storage density for Chinese characters (compared with traditional UTF-8 encoding). To meet biological constraints, we devised a DNA shift coding scheme with low algorithmic complexity, which can encode any DNA strand, even one containing excessively long homopolymers. The designed DNA sequence is stored in a 744 bp double-stranded plasmid, ensuring high reliability during storage. Additionally, the plasmid's resistance to environmental interference ensures long-term stable information storage. Moreover, the plasmid can be replicated at lower cost.
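The abstract in entry 2 does not specify its shift coding scheme. To illustrate the general idea only, the sketch below uses a rotating ternary code of the kind used in earlier DNA storage work, not the authors' scheme: each base-3 digit selects one of the three bases that differ from the previous base, so no homopolymer longer than one base can arise. The 6-trits-per-byte packing and the seed base "A" are assumptions.

```python
BASES = "ACGT"


def encode_trits(trits, prev="A"):
    """Map base-3 digits to DNA; each base differs from the one before it."""
    out = []
    for t in trits:
        if t not in (0, 1, 2):
            raise ValueError("trits must be 0, 1 or 2")
        choices = [b for b in BASES if b != prev]  # the three bases unequal to the previous one
        prev = choices[t]
        out.append(prev)
    return "".join(out)


def decode_dna(dna, prev="A"):
    """Invert encode_trits (the same seed base must be used)."""
    trits = []
    for base in dna:
        choices = [b for b in BASES if b != prev]
        trits.append(choices.index(base))
        prev = base
    return trits


def bytes_to_trits(data):
    """Fixed-width packing: 6 trits per byte, since 3**6 = 729 >= 256."""
    trits = []
    for byte in data:
        digits = []
        for _ in range(6):
            digits.append(byte % 3)
            byte //= 3
        trits.extend(reversed(digits))  # most significant trit first
    return trits


def trits_to_bytes(trits):
    """Invert bytes_to_trits."""
    out = bytearray()
    for i in range(0, len(trits), 6):
        value = 0
        for t in trits[i:i + 6]:
            value = value * 3 + t
        out.append(value)
    return bytes(out)


data = "中文".encode("utf-8")
dna = encode_trits(bytes_to_trits(data))
```

The fixed 6-trit packing wastes capacity compared with a dedicated code table such as the one the authors describe; it is used here only to keep the example short.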

3.
Front Bioinform ; 2: 812314, 2022.
Article in English | MEDLINE | ID: mdl-36304271

ABSTRACT

Brain tumor research is essential to human health, and brain network research is crucial for understanding brain activity. Here, structural controllability theory is applied to three human brain-specific gene regulatory networks (GRNs): the forebrain GRN, the hindbrain GRN, and the neuron-associated-cell cancer-related GRN, whose nodes are neural genes and whose edges represent regulation of gene expression among those genes. The nodes are classified into two classes, critical nodes and ordinary nodes, according to how the number of driver nodes changes when each node is removed. Eight topological properties (out-degree DO, in-degree DI, degree D, betweenness B, closeness CA, in-closeness CI, out-closeness CO, and clustering coefficient CC) are calculated, and the results show that the critical genes score higher on these properties than the ordinary genes. Two bioinformatic analyses are then used to explore the biological significance of the critical genes. First, enrichment scores calculated against several gene databases reveal that the critical nodes are richer than the ordinary nodes in essential genes, cancer genes, and neuron-related disease genes, which indicates that the critical nodes may serve as biomarkers in brain-specific gene regulatory networks. Second, GO analysis and KEGG pathway analysis show that the critical genes mainly participate in 14 KEGG pathways, such as transcriptional misregulation in cancer and pathways in cancer, which indicates that the critical genes are related to brain tumors. Finally, a robustness analysis of the node classification, performed by deleting edges or routes in the network, shows that the classification is robust.
A comparison of the neuron-associated-cell cancer-related GRN with the normal brain-specific GRNs (forebrain and hindbrain) shows that the cancer-related GRN is more robust than the other types.
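Entry 3 classifies nodes by how the number of driver nodes changes when a node is removed. In the maximum-matching formulation of structural controllability (Liu, Slotine and Barabási, 2011), a directed network with N nodes needs max(N - |M*|, 1) driver nodes, where |M*| is the size of a maximum matching. A minimal sketch, assuming a node is "critical" when its removal increases that number and "ordinary" otherwise (the abstract does not give the exact rule):

```python
def max_matching(nodes, edges):
    """Size of a maximum matching between out-copies and in-copies of the nodes (Kuhn's algorithm)."""
    node_set = set(nodes)
    adj = {u: [] for u in node_set}
    for u, v in edges:
        if u in node_set and v in node_set:  # skip edges that touch a removed node
            adj[u].append(v)
    match_in = {}  # in-copy v -> out-copy u currently matched to it

    def augment(u, seen):
        for v in adj[u]:
            if v in seen:
                continue
            seen.add(v)
            if v not in match_in or augment(match_in[v], seen):
                match_in[v] = u
                return True
        return False

    return sum(1 for u in node_set if augment(u, set()))


def driver_nodes(nodes, edges):
    """Minimum number of driver nodes, max(N - |M*|, 1)."""
    nodes = list(nodes)
    return max(len(nodes) - max_matching(nodes, edges), 1)


def classify_nodes(nodes, edges):
    """Label each node 'critical' if removing it increases the driver-node count."""
    baseline = driver_nodes(nodes, edges)
    labels = {}
    for n in nodes:
        remaining = [m for m in nodes if m != n]
        labels[n] = "critical" if driver_nodes(remaining, edges) > baseline else "ordinary"
    return labels


chain = [("a", "b"), ("b", "c")]
labels = classify_nodes(["a", "b", "c"], chain)
```

In the chain a -> b -> c, removing b splits the network into two unconnected nodes and raises the driver-node count from 1 to 2, so b is labeled critical. For GRNs with thousands of genes, the recursive search here would hit Python's recursion limit; a Hopcroft-Karp routine such as the one in NetworkX is the practical choice.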

4.
ACS Synth Biol ; 11(7): 2504-2512, 2022 07 15.
Article in English | MEDLINE | ID: mdl-35771957

ABSTRACT

DNA computing has gained considerable attention because of its high-density information storage and highly parallel computation for solving computational problems. Building addressable logic gates from biomolecules is the basis for establishing biological computers. In current computation models, a multi-input AND operation often has to be realized through a multilevel cascade of logic gates. Experiments showed that the multilevel cascade causes signal leakage and affects the stability of the system. Using DNA strand displacement technology, we constructed a domino-like multi-input AND gate computing system that realizes multi-input AND computation on a single logic gate, replacing the traditional multilevel cascade of operations. Fluorescence experiments demonstrated that our method significantly reduces system construction costs and improves the stability and robustness of the system. Finally, we demonstrated the stability and robustness of the domino AND gate by simulating the tic-tac-toe process with a massively parallel computing strategy.


Subject(s)
DNA , Logic , Computers, Molecular , DNA/genetics
5.
Infect Dis Poverty ; 11(1): 50, 2022 May 04.
Article in English | MEDLINE | ID: mdl-35509019

ABSTRACT

BACKGROUND: Influenza B virus can cause epidemics with high pathogenicity and therefore poses a serious threat to public health. This paper proposes a feature representation algorithm to identify the pathogenicity phenotype of influenza B virus. METHODS: The dataset included all 11 influenza virus proteins encoded by the eight genome segments of 1724 strains. Two types of features were used hierarchically to build the prediction model. Amino acid features were derived directly from 67 feature descriptors and input into random forest classifiers, which output informative features about the class label and the probabilistic prediction. A sequential forward search strategy was used to optimize the informative features. The final features for each strain had low dimensionality and incorporated knowledge from different perspectives; they were used to build the machine learning model for pathogenicity identification. RESULTS: Forty signature positions were identified by entropy screening. Mutations at position 135 of the hemagglutinin protein had the highest entropy value (1.06). After the informative features were generated from the 67 random forest models, the dimensions of the class and probabilistic features were optimized to 4 and 3, respectively. The optimal class features had a maximum accuracy of 94.2% and a maximum Matthews correlation coefficient of 88.4%, while the optimal probabilistic features had a maximum accuracy of 94.1% and a maximum Matthews correlation coefficient of 88.2%. The optimized features outperformed the original informative features and the amino acid features from individual descriptors. The sequential forward search strategy performed better than the classical ensemble method. CONCLUSIONS: The optimized informative features performed best and were used to build a predictive model that identifies influenza B viruses with high pathogenicity and provides early risk warning for disease control.


Subject(s)
Amino Acids , Influenza B virus , Algorithms , Amino Acids/genetics , Influenza B virus/genetics , Machine Learning , Virulence
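Entry 5 screens signature positions by entropy. A minimal sketch of per-position Shannon entropy over an alignment of equal-length protein sequences, assuming natural logarithms (the abstract does not state the base behind the reported value of 1.06) and a hypothetical `top_k` cut-off:

```python
import math
from collections import Counter


def column_entropy(column, base=math.e):
    """Shannon entropy of the residues observed at one alignment position."""
    counts = Counter(column)
    n = len(column)
    return -sum((c / n) * math.log(c / n, base) for c in counts.values())


def signature_positions(alignment, top_k):
    """Return the top_k (entropy, 1-based position) pairs, most variable first."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to the same length")
    scored = [(column_entropy([seq[i] for seq in alignment]), i + 1) for i in range(length)]
    scored.sort(key=lambda t: (-t[0], t[1]))  # highest entropy first, ties by position
    return scored[:top_k]


top = signature_positions(["AKL", "AKM", "ARL", "ARV"], 1)
```

A fully conserved column scores 0; the score grows with the number and evenness of the residues observed, which is why high-entropy columns are candidate signature positions.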
6.
Infect Dis Poverty ; 10(1): 128, 2021 Oct 24.
Article in English | MEDLINE | ID: mdl-34689829

ABSTRACT

BACKGROUND: Coronaviruses can be isolated from bats, civets, pangolins, birds and other wild animals. As an animal-origin pathogen, coronavirus can cross the species barrier and cause pandemics in humans. In this study, a deep learning model for early prediction of pandemic risk was proposed based on the sequences of viral genomes. METHODS: A total of 3257 genomes were downloaded from the Coronavirus Genome Resource Library. We present a deep learning model of cross-species coronavirus infection that combines a bidirectional gated recurrent unit network with a one-dimensional convolution. The genome sequence of an animal-origin coronavirus was input directly to extract features and predict pandemic risk. The best-performing configuration was explored using a pre-trained DNA vector and an attention mechanism. The area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPR) were used to evaluate the predictive models. RESULTS: The six specific models performed well for the corresponding virus groups (1 for AUROC and 1 for AUPR). The general model with the pre-training vector and attention mechanism provided excellent predictions for all virus groups (1 for AUROC and 1 for AUPR), whereas models without the pre-training vector or the attention mechanism showed clearly reduced performance (by about 5-25%). Re-training experiments showed that the general model has good transfer learning capability (average for six groups: 0.968 for AUROC and 0.942 for AUPR) and should give reasonable predictions for a potential pathogen of the next pandemic. Artificial negative data, created by replacing the coding region of the spike protein, were also predicted correctly (100% accuracy). An easy-to-use tool implementing our predictor was created in the Python programming language.
CONCLUSIONS: The robust deep learning model with the pre-training vector and attention mechanism learned features from the whole genomes of animal-origin coronaviruses and could predict the risk of cross-species infection for early warning of the next pandemic.


Subject(s)
Coronavirus Infections , Coronavirus , Pandemics , Animals , Coronavirus/isolation & purification , Coronavirus Infections/epidemiology , Coronavirus Infections/veterinary , Deep Learning , Humans , Models, Statistical , Risk Assessment/methods
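Entry 6 evaluates its models by AUROC and AUPR. As a small, self-contained sketch (not the authors' evaluation code), AUROC can be computed as the probability that a randomly chosen positive outscores a randomly chosen negative, with ties counted as one half, and AUPR approximated by average precision over the ranked list:

```python
def auroc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Average precision over the list ranked by decreasing score (ties keep input order)."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    total_pos = sum(1 for y in labels if y == 1)
    if total_pos == 0:
        raise ValueError("need at least one positive example")
    hits, ap = 0, 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            ap += (hits / k) / total_pos  # precision at each recall step
    return ap


y_true = [1, 0, 1, 0]
y_score = [0.9, 0.8, 0.7, 0.1]
```

The pairwise loop is quadratic and only suitable for small test sets; library implementations sort once instead.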
7.
Comput Math Methods Med ; 2021: 6985008, 2021.
Article in English | MEDLINE | ID: mdl-34671417

ABSTRACT

Swine influenza viruses (SIVs) can unpredictably cross species barriers and directly infect humans, posing huge challenges for public health and triggering pandemic risk at irregular intervals. Computational tools are needed to predict the infection phenotype and early pandemic risk of SIVs. For this purpose, we propose a feature representation algorithm to predict cross-species infection by SIVs. We built a high-quality dataset of 1902 viruses. A feature representation learning scheme was applied to learn feature representations from 64 well-trained random forest models using multiple feature descriptors of mutant amino acids in the viral proteins, including compositional information, position-specific information, and physicochemical properties. Class and probabilistic information were integrated into the feature representations, and redundant features were removed by feature space optimization. High performance was achieved using 20 informative features and 22 probabilistic features. The proposed method will facilitate characterization of the transmission phenotype of SIVs.


Subject(s)
Influenza A virus/genetics , Influenza A virus/pathogenicity , Orthomyxoviridae Infections/veterinary , Swine Diseases/virology , Algorithms , Amino Acid Sequence , Amino Acids/analysis , Amino Acids/genetics , Animals , Computational Biology , Host Specificity , Humans , Influenza A Virus, H1N1 Subtype/genetics , Influenza A Virus, H1N2 Subtype/genetics , Influenza A Virus, H3N2 Subtype/genetics , Influenza A virus/classification , Influenza, Human/epidemiology , Influenza, Human/transmission , Influenza, Human/virology , Machine Learning , Models, Statistical , Mutation , Orthomyxoviridae Infections/virology , Pandemics , Risk Factors , Swine , Swine Diseases/transmission , Viral Proteins/chemistry , Viral Proteins/genetics
8.
Comput Biol Chem ; 88: 107315, 2020 Oct.
Article in English | MEDLINE | ID: mdl-32622177

ABSTRACT

Pulse diagnosis is an important part of Chinese medicine and has played an important role in the development of Chinese medical science. However, the pulse is traditionally assessed by palpation, so objective, standardized methods of pulse identification are lacking, which limits accuracy and feasibility. This research studied the processing and identification of four kinds of pulse: normal pulse, wiry pulse, smooth pulse, and thready pulse. Four frequency-domain characteristics of the pulse wave and six kinds of wavelet scale energy features were extracted, and a three-layer BP (backpropagation) neural network was established. The LM (Levenberg-Marquardt) algorithm and a genetic algorithm were each used to improve the BP neural network, which was trained on and used to predict experimental samples, yielding classification accuracies of 90% and 95%, respectively. Moreover, the BP neural network improved by the genetic algorithm showed superior performance in terms of convergence speed and low error rate.


Subject(s)
Algorithms , Pulse , Humans
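Entry 8 extracts wavelet scale energy features from the pulse wave but does not state the wavelet family or the decomposition depth. A minimal sketch, assuming an orthonormal Haar wavelet, that returns the relative energy of each detail level plus the final approximation; five levels yield six values, though whether this matches the authors' six features is an assumption. The signal length must be divisible by 2**levels.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    if len(signal) % 2:
        raise ValueError("signal length must be even at every level")
    s = 1 / math.sqrt(2)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail


def wavelet_scale_energies(signal, levels):
    """Relative energy of each detail level, followed by the residual approximation."""
    energies = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        energies.append(sum(d * d for d in detail))
    energies.append(sum(a * a for a in approx))  # energy left in the coarsest approximation
    total = sum(energies)
    return [e / total for e in energies] if total else energies


wave = [math.sin(i / 3) for i in range(64)]
features = wavelet_scale_energies(wave, 5)
```

Because the Haar transform is orthonormal, the level energies add up to the energy of the input, so the normalized values sum to 1 and can be fed to a classifier as scale-invariant features.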
9.
Infect Dis Poverty ; 9(1): 33, 2020 Mar 25.
Article in English | MEDLINE | ID: mdl-32209118

ABSTRACT

BACKGROUND: Coronaviruses can cross the species barrier and infect humans, causing severe respiratory syndromes. SARS-CoV-2, possibly of bat origin, is still circulating in China. In this study, a prediction model is proposed to evaluate the infection risk of non-human-origin coronaviruses for early warning. METHODS: The spike protein sequences of 2666 coronaviruses were collected from the 2019 Novel Coronavirus Resource (2019nCoVR) Database of the China National Genomics Data Center on Jan 29, 2020. A total of 507 human-origin viruses were regarded as positive samples, and 2159 non-human-origin viruses as negative samples. To capture the key information of the spike protein, three feature encoding algorithms (amino acid composition, AAC; parallel correlation-based pseudo-amino-acid composition, PC-PseAAC; and g-gap dipeptide composition, GGAP) were used to train 41 random forest models. The optimal feature with the best performance was identified by the multidimensional scaling method, which was used to explore the pattern of human coronaviruses. RESULTS: The 10-fold cross-validation results showed that good performance was achieved with the GGAP (g = 3) feature. The predictive model achieved a maximum ACC of 98.18% with a Matthews correlation coefficient (MCC) of 0.9638. Seven clusters were found for human coronaviruses (229E, NL63, OC43, HKU1, MERS-CoV, SARS-CoV, and SARS-CoV-2). The cluster for SARS-CoV-2 was very close to that for SARS-CoV, which suggests that both viruses use the same human receptor (angiotensin-converting enzyme II). The large gap in the distance curve suggests that the origin of SARS-CoV-2 is unclear and that field surveillance should continue. The smooth distance curve for SARS-CoV suggests that its close relatives still exist in nature and that public health remains challenged.
CONCLUSIONS: The optimal feature (GGAP, g = 3) performed well in predicting infection risk and could be used to explore evolutionary dynamics in a simple, fast and large-scale manner. The study may benefit field surveillance of coronavirus genome mutations.


Subject(s)
Betacoronavirus/immunology , Coronavirus Infections , Coronavirus/immunology , Disease Reservoirs/virology , Pandemics , Peptidyl-Dipeptidase A/metabolism , Pneumonia, Viral , Receptors, Virus/genetics , Spike Glycoprotein, Coronavirus/immunology , Algorithms , Amino Acids/genetics , Angiotensin-Converting Enzyme 2 , Animals , Betacoronavirus/genetics , COVID-19 , China , Chlorocebus aethiops , Coronavirus/genetics , Coronavirus/isolation & purification , Coronavirus Infections/genetics , Coronavirus Infections/transmission , Coronavirus Infections/virology , Endopeptidases/genetics , Endopeptidases/metabolism , Genome/genetics , Genome, Viral/genetics , Humans , Pandemics/prevention & control , Peptidyl-Dipeptidase A/genetics , Phylogeny , Pneumonia, Viral/genetics , Pneumonia, Viral/transmission , Pneumonia, Viral/virology , Receptors, Virus/metabolism , Risk Assessment , SARS-CoV-2
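Entry 9 encodes spike proteins with amino acid composition (AAC) and g-gap dipeptide composition (GGAP). A minimal sketch of both encodings, assuming the 20 standard residues (other symbols are skipped) and the convention that a g-gap pair is formed by residues i and i + g + 1, that is, with g residues between them; published tools differ slightly in this convention and in normalization.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def aac(seq):
    """Amino acid composition: fraction of each standard residue in seq."""
    counts = dict.fromkeys(AMINO_ACIDS, 0)
    for ch in seq:
        if ch in counts:
            counts[ch] += 1
    total = sum(counts.values())
    return [counts[a] / total if total else 0.0 for a in AMINO_ACIDS]


def g_gap_dipeptide(seq, g=3):
    """400-dimensional composition of residue pairs (seq[i], seq[i + g + 1])."""
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(seq) - g - 1):
        pair = seq[i] + seq[i + g + 1]
        if pair in counts:  # skip pairs containing non-standard symbols
            counts[pair] += 1
    total = sum(counts.values())
    return [counts[d] / total if total else 0.0 for d in DIPEPTIDES]


vec = g_gap_dipeptide("ACDEAC", g=3)
```

For "ACDEAC" with g = 3 the only pairs are (A, A) and (C, C), each with frequency 0.5; larger g values capture longer-range residue co-occurrence at the cost of fewer pairs per sequence.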
10.
Brief Bioinform ; 21(1): 11-23, 2020 Jan 17.
Article in English | MEDLINE | ID: mdl-30239616

ABSTRACT

Cell-penetrating peptides (CPPs) have been shown to act as transport vehicles that deliver cargoes into live cells, offering great potential as future therapeutics. Identifying CPPs is essential for better understanding their functional mechanisms. Machine learning-based methods have recently emerged as a main approach for the computational identification of CPPs. However, a major challenge is to design an effective feature representation model that sufficiently exploits the inner differences and relationships between CPPs and non-CPPs in order to improve predictive performance. In this paper, we have developed CPPred-FL, a powerful bioinformatics tool for fast, accurate and large-scale identification of CPPs. In our predictor, we introduce a new feature representation learning scheme that learns feature representations from a total of 45 well-trained random forest models built on multiple feature descriptors from different perspectives, such as compositional information, position-specific information and physicochemical properties. We integrate class and probabilistic information into our feature representations. To improve the feature representation ability, we further remove redundant and irrelevant features by feature space optimization. Benchmarking experiments showed that CPPred-FL, using only 19 informative features, achieves better performance than state-of-the-art predictors. We anticipate that CPPred-FL will be a powerful tool for large-scale identification of CPPs, facilitating the characterization of their functional mechanisms and accelerating their applications in clinical therapy.
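Entry 10 (and, similarly, entry 7) builds new features from the outputs of many per-descriptor random forests. The core idea can be sketched without a machine learning library by treating each trained model as a callable that returns a probability. The models, the 0.5 threshold, and the function names below are hypothetical stand-ins, and the redundancy-removal step (feature space optimization) is not shown.

```python
from typing import Callable, List, Sequence

# Hypothetical: any trained model that maps one encoded sample to P(class = 1).
Model = Callable[[Sequence[float]], float]


def learned_features(
    encodings: Sequence[Sequence[float]],
    models: Sequence[Model],
    threshold: float = 0.5,
) -> List[float]:
    """Concatenate predicted class labels and probabilities from per-descriptor models.

    encodings[k] is the sample encoded with descriptor k; models[k] was trained on that descriptor.
    """
    if len(encodings) != len(models):
        raise ValueError("need exactly one model per feature descriptor")
    probs = [model(x) for x, model in zip(encodings, models)]
    classes = [1.0 if p >= threshold else 0.0 for p in probs]
    return classes + probs


# Two toy "models" standing in for trained random forests.
toy_models = [lambda x: min(1.0, sum(x) / 10), lambda x: 0.3]
features = learned_features([[2.0, 6.0], [1.0]], toy_models)
```

The result is a short vector (two values per base model) that a second-stage classifier can consume, which is how a few dozen learned features can replace thousands of raw descriptor values.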

11.
BMC Bioinformatics ; 20(Suppl 8): 288, 2019 Jun 10.
Article in English | MEDLINE | ID: mdl-31182019

ABSTRACT

BACKGROUND: Avian influenza virus can directly cross species barriers and infect humans with high fatality. Because its antigens are novel to the human host, public health is seriously challenged. The pandemic risk of avian influenza viruses should be analyzed and a prediction model constructed for virology applications. RESULTS: 178 signature positions in 11 viral proteins were first screened as features by the scores of five amino acid factors and their random forest rankings. The support vector machine algorithm achieved good performance. The most important amino acid factor (Factor 5) and the minimal range of signature positions (63 amino acid residues) were also explored. Moreover, human-origin avian influenza viruses with three or four genome segments from human viruses had a high probability of pandemic risk. CONCLUSION: Using machine learning methods, the present paper scores amino acid mutations and predicts pandemic risk with good performance. Although long evolutionary distances between avian and human viruses suggest that avian influenza viruses in nature still need time to become established in the human host, it should be noted that H7N9 and H9N2 avian viruses carry high pandemic risk.


Subject(s)
Amino Acids/genetics , Birds/virology , Influenza in Birds/epidemiology , Influenza in Birds/virology , Mutation/genetics , Pandemics , Algorithms , Animals , Computer Simulation , Databases as Topic , Genome, Viral , Machine Learning , Reassortant Viruses/genetics , Risk Factors
12.
Comput Biol Chem ; 78: 455-459, 2019 Feb.
Article in English | MEDLINE | ID: mdl-30528510

ABSTRACT

Using wavelet packet decomposition, energy coefficients at the fifth level of viral protein sequences were computed to predict interspecies transmission. Because avian-origin influenza viruses can have high sequence similarity to human-origin avian influenza viruses and may themselves have the interspecies-transmission phenotype, the viral data should be filtered to prevent biased feature selection and falsely inflated performance of the predictive models. To balance this against data size, an empirical cut-off of 97% was used to screen avian-origin influenza viruses with high sequence similarity. The excellent cross-validation performance shows that the SVM model has the best capability for predicting transmission and evaluating the contributions of five amino acid factors. The robust model was finally used to evaluate the filtered avian-origin data, and the results confirmed that double-checking the ambiguous phenotype of high-similarity avian-origin viruses was necessary and that some of them are able to cross species barriers.


Subject(s)
Algorithms , Influenza A virus/genetics , Signal Processing, Computer-Assisted , Databases, Genetic , Humans , Phenotype
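Entry 12 screens avian-origin viruses against a 97% sequence-similarity cut-off. The abstract does not say how similarity was measured; the sketch below assumes pre-aligned, equal-length sequences, percent identity over columns that are not gaps in both sequences, and flags an avian-origin sequence when its best identity to any human-origin sequence reaches the cut-off. The dictionaries and names are illustrative.

```python
def percent_identity(a, b, gap="-"):
    """Fraction of identical residues over columns that are not gaps in both sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(a, b):
        if x == gap and y == gap:
            continue  # a shared gap column carries no information
        compared += 1
        if x == y:
            matches += 1
    return matches / compared if compared else 0.0


def split_by_similarity(avian, human, cutoff=0.97):
    """Return (kept, flagged) names; flagged sequences reach the cut-off against some human sequence."""
    kept, flagged = [], []
    for name, seq in avian.items():
        best = max((percent_identity(seq, h) for h in human.values()), default=0.0)
        (flagged if best >= cutoff else kept).append(name)
    return kept, flagged


human_seqs = {"h1": "A" * 100}
avian_seqs = {"a1": "A" * 98 + "CC", "a2": "A" * 90 + "C" * 10}
kept, flagged = split_by_similarity(avian_seqs, human_seqs)
```

Flagged sequences are set aside rather than discarded, which mirrors the abstract's later re-evaluation of the filtered avian-origin data with the trained model.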
13.
Front Genet ; 9: 495, 2018.
Article in English | MEDLINE | ID: mdl-30410501

ABSTRACT

As one of the well-studied RNA methylation modifications, N6-methyladenosine (m6A) plays important roles in various biological processes, such as RNA splicing and degradation. Identification of m6A sites is fundamentally important for better understanding their functional mechanisms. Recently, machine learning-based prediction methods have emerged as an effective approach for fast and accurate identification of m6A sites. In this paper, we proposed "M6AMRFS", a new machine learning-based predictor for the identification of m6A sites. In this predictor, we used a new feature representation algorithm to encode RNA sequences with two feature descriptors (dinucleotide binary encoding and local position-specific dinucleotide frequency), and used the F-score algorithm combined with SFS (Sequential Forward Search) to enhance the feature representation ability. To predict m6A sites, we employed the eXtreme Gradient Boosting (XGBoost) algorithm to build a predictive model. Benchmarking results showed that the proposed predictor is competitive with state-of-the-art predictors. Importantly, robust predictions for multiple species demonstrate that our predictive models have strong generalization ability. To the best of our knowledge, M6AMRFS is the first tool that can be used to identify m6A sites in multiple species. To facilitate the use of our predictor, we have established a user-friendly webserver implementing M6AMRFS, which is currently available at http://server.malab.cn/M6AMRFS/. We anticipate that it will be a useful tool for research on m6A sites.
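Entry 13 combines the F-score with sequential forward search (SFS) to select features. A minimal sketch, assuming the common two-class F-score (separation of the class means from the overall mean, divided by the sum of the within-class sample variances) and reading SFS as adding features one at a time in F-score order. `evaluate` is a hypothetical callback that, in practice, would return a cross-validated score of a classifier such as XGBoost on the chosen feature subset.

```python
def f_score(pos, neg):
    """F-score of each feature column for two classes (rows are samples).

    Needs at least two samples per class so the sample variances are defined.
    """
    scores = []
    for j in range(len(pos[0])):
        xp = [row[j] for row in pos]
        xn = [row[j] for row in neg]
        mean_p = sum(xp) / len(xp)
        mean_n = sum(xn) / len(xn)
        mean_all = (sum(xp) + sum(xn)) / (len(xp) + len(xn))
        between = (mean_p - mean_all) ** 2 + (mean_n - mean_all) ** 2
        within = (sum((x - mean_p) ** 2 for x in xp) / (len(xp) - 1)
                  + sum((x - mean_n) ** 2 for x in xn) / (len(xn) - 1))
        if within > 0:
            scores.append(between / within)
        else:
            scores.append(float("inf") if between > 0 else 0.0)
    return scores


def sequential_forward_search(ranked, evaluate):
    """Grow the feature set in ranked order and keep the best-scoring prefix."""
    best_subset, best_score = [], float("-inf")
    for k in range(1, len(ranked) + 1):
        subset = ranked[:k]
        score = evaluate(subset)
        if score > best_score:
            best_subset, best_score = subset, score
    return best_subset, best_score


pos = [[1.0, 5.0], [1.2, 5.1], [0.8, 4.9]]
neg = [[0.0, 5.0], [0.2, 5.2], [-0.2, 4.8]]
s = f_score(pos, neg)
ranked = sorted(range(len(s)), key=lambda j: -s[j])  # most discriminative feature first
```

Ranking first keeps the search linear in the number of features; a full greedy SFS that tries every remaining feature at each step is more thorough but quadratic.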

14.
Molecules ; 23(7)2018 Jun 29.
Article in English | MEDLINE | ID: mdl-29966263

ABSTRACT

Avian influenza virus (AIV) can directly cross species barriers and infect humans with high fatality. Using machine learning methods, the present paper scores amino acid mutations and predicts interspecies transmission. Initially, 183 signature positions in 11 viral proteins were screened by the scores of five amino acid factors and their random forest rankings. The most important amino acid factor (Factor 3) and the minimal range of signature positions (50 amino acid residues) were explored with a support vector machine (the highest-performing of the four classifiers tested). Based on these results, avian-to-human transmission of AIVs was analyzed and a prediction model was constructed for virology applications. The distributions of human-origin AIVs suggest that three molecular patterns of interspecies transmission occur in nature. The novel findings of this paper provide important clues for future epidemic surveillance.


Subject(s)
Amino Acid Substitution , Influenza A virus/genetics , Influenza in Birds/virology , Influenza, Human/transmission , Influenza, Human/virology , Mutation , Animals , Animals, Wild , Birds , Humans , Position-Specific Scoring Matrices , Reproducibility of Results
15.
IEEE Trans Nanobioscience ; 10(2): 94-8, 2011 Jun.
Article in English | MEDLINE | ID: mdl-21742570

ABSTRACT

The exponential explosion of the solution space caused by enumerating candidate solutions may be the biggest obstacle in DNA computing. In this paper, a new non-enumerative DNA computing model for the graph vertex coloring problem is presented, based on two techniques: 1) ordering the vertices of a given graph so that, as far as possible, any two consecutively labeled vertices i and i+1 are adjacent in the graph; 2) reducing the number of encodings representing colors according to the structure of the given graph. A triangle-free graph with 12 vertices is solved, and its initial solution space contains only 283 DNA strands, which is 0.0532% of 3^12 (the worst-case complexity).


Subject(s)
Computational Biology/methods , Computer Simulation , Computers, Molecular , DNA/chemistry , Polymerase Chain Reaction , Base Sequence , Color , DNA/metabolism , DNA Probes , Electrophoresis, Agar Gel , Molecular Sequence Data
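Entry 15 works with DNA strands, but the effect of its first technique can be illustrated in silico. The sketch below is a conventional backtracking search, not the authors' DNA model: it colors vertices in a given order and prunes any partial assignment that conflicts with an already-colored neighbor, so orders in which consecutive vertices are adjacent expose conflicts early. It returns the number of proper k-colorings and the number of partial assignments explored. The final line checks the ratio quoted in the abstract: 283 / 3**12 is about 5.33e-4, so the figure 0.0532 corresponds to a percentage.

```python
def count_colorings(n, edges, k=3, order=None):
    """Count proper k-colorings of a graph on vertices 0..n-1 by backtracking.

    Returns (number_of_proper_colorings, partial_assignments_explored).
    """
    order = list(range(n)) if order is None else list(order)
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    colour = {}
    explored = 0

    def extend(pos):
        nonlocal explored
        if pos == len(order):
            return 1
        v = order[pos]
        total = 0
        for c in range(k):
            # prune: skip colour c if an already-coloured neighbour uses it
            if all(colour.get(u) != c for u in adj[v]):
                explored += 1
                colour[v] = c
                total += extend(pos + 1)
                del colour[v]
        return total

    valid = extend(0)
    return valid, explored


cycle4 = [(0, 1), (1, 2), (2, 3), (3, 0)]
adjacent_order = count_colorings(4, cycle4, order=[0, 1, 2, 3])
scattered_order = count_colorings(4, cycle4, order=[0, 2, 1, 3])
share_percent = 283 / 3 ** 12 * 100  # share of the 3**12 worst case, in percent
```

On the 4-cycle, the order 0, 1, 2, 3 (consecutive vertices adjacent) explores 39 partial assignments against 42 for the order 0, 2, 1, 3, while both find the same 18 proper colorings; the gap widens on larger graphs.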