ABSTRACT
Advances in biological and medical technologies have been providing us with explosive volumes of biological and physiological data, such as medical images, electroencephalography recordings, and genomic and protein sequences. Learning from these data facilitates the understanding of human health and disease. Developed from artificial neural networks, deep learning-based algorithms show great promise in extracting features and learning patterns from complex data. The aim of this paper is to provide an overview of deep learning techniques and some of the state-of-the-art applications in the biomedical field. We first introduce the development of artificial neural networks and deep learning. We then describe two main components of deep learning, i.e., deep learning architectures and model optimization. Subsequently, we present examples of deep learning applications, including medical image classification, genomic sequence analysis, and protein structure classification and prediction. Finally, we offer our perspectives on future directions in the field of deep learning.
Subject(s)
Humans , Algorithms , Computational Biology , Methods , Diagnostic Imaging , Genomics , Methods , Image Interpretation, Computer-Assisted , Methods , Machine Learning , Neural Networks, Computer , Protein Structure, Secondary , Proteins , Metabolism
ABSTRACT
A novel subtype of influenza A virus, 09H1N1, has rapidly spread across the world. Evolutionary analyses of this virus have revealed that 09H1N1 is a triple reassortant of segments from swine, avian and human influenza viruses. In this study, we investigated the factors shaping the codon usage bias of 09H1N1 and carried out a cluster analysis of 60 strains of influenza A virus from different subtypes based on their codon usage bias. We found that the preferentially used codons of 09H1N1 are mostly A-ended or U-ended, and that the intra-genomic codon usage bias of 09H1N1 is quite low. Base composition constraint, dinucleotide biases and translational selection are the main factors influencing the codon usage bias of 09H1N1. At the genome level, we found that the codon usage bias of 09H1N1 is similar to that of H1N1 (A/swine/Kansas/77778/2007H1N1), H9N2 from Asia, H1N2 from Asia and North America, and H3N2 from North America. Our results provide insight into the processes governing the evolution of 09H1N1 and the regulation of its gene expression.
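As an illustration of how a preference for A-ended or U-ended codons is commonly quantified, the following minimal Python sketch computes relative synonymous codon usage (RSCU) for a coding sequence. The abstract does not state which measure the authors used, so the choice of RSCU, the function name `rscu`, and the use of the standard genetic code are assumptions made for illustration only.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G (assumed table).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds):
    """Return RSCU values for a coding sequence (hypothetical helper).

    RSCU of a codon = observed count / (family total / family size),
    where the family is the set of synonymous codons for one amino acid.
    """
    seq = cds.upper().replace("U", "T")  # accept RNA or DNA alphabet
    usable = len(seq) - len(seq) % 3     # ignore a trailing partial codon
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))

    # Group codons into synonymous families by amino acid.
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    result = {}
    for aa, codons in families.items():
        if aa == "*" or len(codons) == 1:
            continue  # stop codons, Met and Trp carry no synonymous choice
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent from this sequence
        expected = total / len(codons)
        for c in codons:
            result[c] = counts[c] / expected
    return result
```

For the toy input `rscu("GCAGCAGCAGCT")`, the alanine family has four codons and four observations, so `GCA` scores 3.0 and `GCT` scores 1.0. Values above 1 indicate a codon used more often than expected under uniform synonymous usage.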
ABSTRACT
With thousands of sequenced 16S rRNA genes available, and with advancements in oligonucleotide microarray technology, the detection of microorganisms in microbial communities consisting of hundreds of species may be possible. The existing algorithms developed for sequence-specific probe design are not suitable for large-scale bacteria detection because they lack coverage, flexibility and efficiency. Many other strategies developed for group-specific probe design focus on finding a unique group-specific probe that can specifically detect all target sequences of a group. However, a unique group-specific probe cannot always be found for every group. Hence, it is necessary to design non-unique probes. Each probe can specifically detect the target sequences of a different subgroup, and a combination of multiple probes can achieve higher coverage. However, evaluating all possible combinations is a time-consuming task. A feasible algorithm using relative entropy and a genetic algorithm (GA) to design group-specific non-unique probes is presented.
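The abstract does not describe how relative entropy enters the probe design. One plausible reading is that it scores how strongly the base composition of a target group differs from that of non-target sequences at a given alignment position. The Python sketch below illustrates that reading only; the function names, the pseudocount, and the column-wise formulation are assumptions, not the authors' method.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def base_freqs(column, pseudocount=1.0):
    """Smoothed base frequencies for one alignment column (hypothetical helper).

    The pseudocount keeps every frequency non-zero so the log ratio below
    is always defined; gaps and ambiguity codes are ignored.
    """
    counts = Counter(b for b in column.upper() if b in ALPHABET)
    total = sum(counts.values()) + len(ALPHABET) * pseudocount
    return {b: (counts[b] + pseudocount) / total for b in ALPHABET}


def relative_entropy(target_column, background_column):
    """Kullback-Leibler divergence D(target || background) in bits."""
    p = base_freqs(target_column)
    q = base_freqs(background_column)
    return sum(p[b] * math.log2(p[b] / q[b]) for b in ALPHABET)
```

Under this reading, a column where the two groups share the same composition scores zero, while a column such as `relative_entropy("AAAA", "CCCC")` scores well above zero and would be a candidate discriminating position for a probe.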
ABSTRACT
<p><b>OBJECTIVE</b> To study the genotype of the severe acute respiratory syndrome (SARS)-associated coronavirus and its characteristics.</p><p><b>METHODS</b> A SARS-associated coronavirus isolate named ZJ01 was obtained from throat swab samples taken from a patient in Hangzhou, Zhejiang Province. The complete genome sequence of ZJ01 consisted of 29,715 bp (GenBank accession: AY297028, version: gi: 30910859). Seventeen SARS-associated coronavirus genome sequences in GenBank were compared to analyze the common sequence variations and the probability of co-occurrence of multiple polymorphisms or mutations. Phylogenetic analysis of those sequences was performed.</p><p><b>RESULTS</b> Bioinformatic analysis showed that the nucleotides at the five loci in the ZJ01 genome were T, T, G, T and T, respectively. Comparison with other SARS-associated coronavirus genomes in the GenBank database revealed an A/G mutation in addition to the other four mutation loci (C:G:C:C/T:T:T:T) involved in this genetic signature. A new definition based on the five mutation loci was therefore put forward: SARS-associated coronavirus strains can be grouped into two genotypes (C:G:A:C:C/T:T:G:T:T), abbreviated as the SARS coronavirus C genotype and T genotype. Under this new definition, the ZJ01 isolate belongs to the SARS-associated coronavirus T genotype, first discovered and reported in mainland China. Phylogenetic analysis of the spike protein gene fragments of these SARS-associated coronavirus strains showed that the GZ01 isolate was phylogenetically distinct from the other isolates and that, compared with groups F1 and F2 of the T genotype, the BJ01 and CUHK-W1 isolates were more closely related to the GZ01 isolate.
Interestingly, two of the five mutation loci (A/G and C/T) occurred in the spike protein gene and caused changes of Asp to Gly and Thr to Ile in the protein, respectively.</p><p><b>CONCLUSION</b> Attention should be paid to whether these genotypes and mutation patterns are related to the virus's biological activities, epidemic characteristics and host clinical symptoms.</p>
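<p>The five-locus definition above can be expressed as a simple lookup. The Python sketch below is a minimal illustration: the function name and the "unclassified" fallback are assumptions, and the genomic positions of the five loci are not given in the abstract, so the caller is assumed to supply the five nucleotides already extracted, in signature order.</p>

```python
# Five-locus signatures as stated in the abstract (C:G:A:C:C / T:T:G:T:T).
C_SIGNATURE = ("C", "G", "A", "C", "C")
T_SIGNATURE = ("T", "T", "G", "T", "T")


def classify_genotype(loci):
    """Assign a genotype from the five signature nucleotides (hypothetical helper).

    `loci` is any sequence of five single-letter bases, in signature order.
    Mixed or unknown patterns are reported as "unclassified" (an assumption;
    the abstract does not say how such strains are handled).
    """
    bases = tuple(b.upper() for b in loci)
    if len(bases) != 5:
        raise ValueError("expected exactly five signature nucleotides")
    if bases == C_SIGNATURE:
        return "C genotype"
    if bases == T_SIGNATURE:
        return "T genotype"
    return "unclassified"
```

<p>With the nucleotides reported for ZJ01, <code>classify_genotype("TTGTT")</code> returns "T genotype", consistent with the assignment stated in the results.</p>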